Audio-to-Text Transcription with OpenAI Whisper
Build a Retool app that calls the Hugging Face Inference API to transcribe an audio file into text.
Prerequisite
A Hugging Face token, which is needed to call the Inference API
Intro
Whisper is a general-purpose speech recognition model. It is trained on a large dataset of diverse audio and is also a multitasking model that can perform multilingual speech recognition, speech translation, and language identification.
Whisper is open sourced by OpenAI and can be accessed via the Hugging Face Inference API.
Note: The Inference API is free to use but rate limited. If you need an inference solution for production, check out the Hugging Face Inference Endpoints service.
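As a rough picture of what accessing Whisper through the Inference API involves, the sketch below builds the HTTP request in Python without sending it. The URL, the `Authorization: Bearer` header, and the raw binary body are the ones used in this guide; the function name `build_transcribe_request`, the placeholder token `hf_xxx`, and the fake audio bytes are hypothetical and only there for illustration.

```python
import urllib.request

# Model endpoint used in this guide.
API_URL = "https://api-inference.huggingface.co/models/openai/whisper-large-v3"


def build_transcribe_request(audio_bytes: bytes, token: str) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying raw audio bytes."""
    return urllib.request.Request(
        API_URL,
        data=audio_bytes,  # binary body: the audio file itself
        headers={"Authorization": f"Bearer {token}"},
        method="POST",
    )


# Hypothetical placeholder values; substitute a real token and real audio bytes.
request = build_transcribe_request(b"fake-audio-bytes", "hf_xxx")
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would actually send it; that call is left out so the sketch runs without a token or network access.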
Let's start
61 steps in total
Open the Retool website. Here’s an invitation to get 20% off Retool.
1 - Click on Create
2 - Click on App
3 - Type "audio-to-text"
4 - Click on Create app
5 - Close the Get started window
6 - Click on Add (highlighted)
7 - Click on File Button icon
8 - Drag File Button to canvas
9 - Click on File Button
10 - Type "upload audio file"
11 - Right-click and select “Add components” in the context menu
12 - Click on Add components
13 - Click on Button icon
14 - A Button component is added to the main frame
15 - Click on Button
16 - Type "Transcribe"
17 - Adjust the position of fileButton: drag it to align left with the Transcribe button
18 - Click on Code (highlighted)
19 - Click on the highlighted element
20 - Click on Resource query
21 - Rename the query: type "transcribe"
22 - Select Resource
23 - Click on RESTQuery (restapi)
24 - Open the openai/whisper-large-v3 model page on Hugging Face
Click on Deploy
25 - Click on Inference API…
26 - We can transcribe audio by calling this API
27 - Copy the element titled "https://api-inference.huggingface.co/models/openai/whisper-large-v3"; this is the API URL
28 - Go back to the Retool app
Click on the URL field
29 - Paste "https://api-inference.huggingface.co/models/openai/whisper-large-v3" into the text area
30 - Click on GET to change the method to POST
31 - Click on POST
32 - Click on the Headers field
33 - Go back to Hugging Face
Click on "Authorization: Bearer xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx". This is the API secret key; it is very sensitive, so do not disclose it to others. Clicking the `show API token` button replaces `xxxxxxx` with the real API key
34 - Copy text titled "Authorization"
35 - Paste "Authorization" into the text area; the header key is `Authorization`
36 - Copy text titled "Bearer xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
37 - Paste "Bearer xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx" into text area
38 - Change the body type
39 - Click on binary
40 - Check the ID of the File Button component; it is `fileButton1`
41 - Enter the body content
42 - Enter {{fileButton1.value[0]}}. The variable fileButton1.value refers to the uploaded files, so fileButton1.value[0] gets the first file in the array. We only upload one file at a time
43 - Click on Preview query
44 - This is the API response; the audio was successfully transcribed to text
45 - Click on Save
46 - Click on Run… to test the query
47 - Select a file from the upload menu
48 - Right-click and select “Add components” in the context menu
49 - Click on Add components
50 - Type "text"
51 - Click on Text Area icon
52 - Click on Label
53 - Type "Transcribe Result"
54 - Click on textArea1
55 - Type {{transcribe.data.text}}
56 - Click on Close
57 - Click on button1…
58 - Click on Event handlers
59 - Click on Edit click handler…, so that every time the button is clicked, it triggers the transcribe query
60 - Click on Transcribe
61 - The app transcribes the audio
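The two expressions used in the steps above, {{fileButton1.value[0]}} and {{transcribe.data.text}}, can be mimicked outside Retool. The following is a minimal sketch, assuming the API responds with a JSON object that has a `text` field (which is what `transcribe.data.text` implies). The helper names and the sample values are hypothetical and are not part of Retool or the Hugging Face API.

```python
import json
from typing import Sequence, TypeVar

T = TypeVar("T")


def first_uploaded_file(files: Sequence[T]) -> T:
    """Mirror fileButton1.value[0]: take the first item of the upload list."""
    if not files:
        raise ValueError("no file uploaded")
    return files[0]


def extract_transcript(response_body: str) -> str:
    """Mirror transcribe.data.text: read the "text" field of the JSON response."""
    data = json.loads(response_body)
    return data["text"]


# Hypothetical sample values standing in for a real upload and a real response.
uploaded = ["first-audio", "second-audio"]
sample_response = '{"text": "hello world"}'

print(first_uploaded_file(uploaded))
print(extract_transcript(sample_response))
```

Because the app uploads one file at a time, index 0 is the only entry that matters; the empty-list check is a guard for the case where Transcribe is clicked before any file is selected.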
Demo
this is my test audio