Automatic Transcripts with Deepgram and Directus Automate
Published October 10th, 2023
Voice is one of the most common ways we communicate and yet one of the hardest for developers to use and understand. In this post, you'll use Deepgram's speech recognition API and Directus Automate to create and store transcripts whenever a new file is uploaded.
Before You Start
You will need a Directus project - check out our quickstart guide if you don't already have one. Generate an API Token that allows access to the location where your audio files will be uploaded. You will also need a Deepgram account and an API Key with "Member" privileges, plus an MP3 file to test with - here's one you can download and use.
Set Up Trigger
Create a new Flow from your Directus project settings - call it "Transcribe New Audio Files". Create an Event Hook trigger that is Non-Blocking - this means the flow will run asynchronously and not delay data being entered in the database. Finally, set the Scope to files.upload.
Check File Is Audio
Add a Condition operation with the following rule:
{
  "$trigger": {
    "payload": {
      "type": {
        "_contains": "audio"
      }
    }
  }
}
If a file is not audio, the Flow will end without any further steps being taken. Further steps in this post should be added from the resolved path of the condition.
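As a rough illustration of what this rule checks, the Python sketch below applies the same test to a trigger object. The function name `is_audio` and the sample payloads are made up for this example; inside Directus the rule is evaluated for you, so this only mirrors the logic.

```python
def is_audio(trigger: dict) -> bool:
    """Mirror the `_contains` rule: true when the file's MIME type includes "audio"."""
    # A missing or empty type is treated as "not audio".
    mime_type = trigger.get("payload", {}).get("type") or ""
    return "audio" in mime_type


# Hypothetical trigger payloads, trimmed to the one field the rule reads.
print(is_audio({"payload": {"type": "audio/mpeg"}}))  # True
print(is_audio({"payload": {"type": "image/png"}}))   # False
```

Matching on the substring "audio" rather than one exact MIME type lets the Flow accept any audio format, not only MP3.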
Generate Transcript with Deepgram
Create a Webhook / Request URL operation and set the key to deepgram. Set the Method to POST and the URL to https://api.deepgram.com/v1/listen?smart_format=true&diarize=true. Smart format adds formatting to the transcript to make it more human-readable. Diarize will add speaker labels, so you can tell what was said by different people.
Add an Authorization header with the value Token YOUR_KEY, replacing YOUR_KEY with your Deepgram API Key.
Finally, in the Request Body, provide a link to the file that triggered the Flow:
{
  "url": "YOUR_DIRECTUS_URL/assets/{{$trigger.key}}?access_token=TOKEN"
}
Replace YOUR_DIRECTUS_URL with the URL for your Directus project, and TOKEN with your Directus static token.
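If you want to check the request outside of a Flow, the sketch below builds the same call in Python with the standard library. The function name `build_deepgram_request`, the host https://directus.example.com and the other argument values are placeholders for illustration. The Content-Type header is an assumption added here and is not part of the Flow configuration described above.

```python
import json
import urllib.request

DEEPGRAM_URL = "https://api.deepgram.com/v1/listen?smart_format=true&diarize=true"


def build_deepgram_request(
    deepgram_key: str, directus_url: str, file_id: str, directus_token: str
) -> urllib.request.Request:
    # Same link the Flow sends: the asset URL of the uploaded file, with a
    # Directus token appended so Deepgram is allowed to download it.
    asset_url = f"{directus_url}/assets/{file_id}?access_token={directus_token}"
    body = json.dumps({"url": asset_url}).encode("utf-8")
    return urllib.request.Request(
        DEEPGRAM_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Token {deepgram_key}",
            # Assumption: the JSON body is declared explicitly.
            "Content-Type": "application/json",
        },
    )


# Placeholder values only. Nothing is sent until you pass the request
# to urllib.request.urlopen().
request = build_deepgram_request(
    "YOUR_KEY", "https://directus.example.com", "FILE_ID", "TOKEN"
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Building the request separately from sending it makes it easy to inspect the URL, header and body before spending any Deepgram credit.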
Save Transcript to File Description
From the resolved path of the previous operation, create an Update Data operation. Set the Collection to directus_files with Full Access permissions.
Set a System Collection
The dropdown in the collection field will only show user-created collections. To add directus_files, which is a system collection, click the {} button to turn the input to raw mode and type the collection name manually.
Add one item to the IDs tags - {{$trigger.key}} - which represents the ID of the file that was uploaded and triggered the Flow to run.
Deepgram provides a huge nested object in response to requests. To set the file description to the formatted transcript provided by Deepgram, set payload to the following:
{
  "description": "{{deepgram.data.results.channels[0].alternatives[0].paragraphs.transcript}}"
}
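To make that path easier to read, here is a small Python sketch that walks the same keys. The `sample_body` dictionary is a hand-written stand-in that keeps only the keys on the path, not a real Deepgram response, and `extract_transcript` is a hypothetical helper. It assumes the dictionary corresponds to what the Flow exposes as deepgram.data.

```python
def extract_transcript(response_body: dict) -> str:
    # Mirrors the template path:
    # results.channels[0].alternatives[0].paragraphs.transcript
    first_alternative = response_body["results"]["channels"][0]["alternatives"][0]
    return first_alternative["paragraphs"]["transcript"]


# Hand-written stand-in for a response body (illustrative text only).
sample_body = {
    "results": {
        "channels": [
            {
                "alternatives": [
                    {"paragraphs": {"transcript": "Speaker 0: Hello.\n\nSpeaker 1: Hi there."}}
                ]
            }
        ]
    }
}

# Same shape as the payload given to the Update Data operation.
payload = {"description": extract_transcript(sample_body)}
print(payload["description"])
```

The path always takes the first channel and the first alternative, so for multi-channel audio you would need to decide how to combine the other channels yourself.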
Save your flow and test it by uploading an audio file. Wait a few seconds, then open the file in the file editor and check that its description now contains the transcript.
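The final flow should have four steps: the Event Hook trigger, the Condition, the Webhook / Request URL operation with the key deepgram, and the Update Data operation.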
Summary & Next Steps
Now that you have transcripts for audio files in Directus, you can begin to run queries against the words spoken. Deepgram also provides us with the tools to build more accessible applications.
Check out the Deepgram documentation for an overview of all features you can use when making requests, including some that use machine learning to provide insights about topics and entities mentioned in the audio file.
If you have any questions, feel free to drop into our very active developer community Discord server.