Audio-to-Text Transcription with OpenAI Whisper

Build a Retool app that calls the Hugging Face Inference API to transcribe an audio file into text.

Nov 25, 2023

Prerequisite: a Hugging Face token, which we will use to call the Inference API.

Intro

Whisper is a general-purpose speech recognition model. It is trained on a large dataset of diverse audio and is also a multitasking model that can perform multilingual speech recognition, speech translation, and language identification.

Whisper is open sourced by OpenAI and can be accessed via the Hugging Face Inference API.

Note: The Inference API is free to use but rate limited. If you need an inference solution for production, check out the Hugging Face Inference Endpoints service.
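
Because the free Inference API is rate limited, a client that calls it from code may want to wait and retry instead of failing on the first rejected request. The following is a minimal sketch of that idea, not part of the Retool app built below. The helper name `call_with_backoff`, its parameters, and the choice of status codes 429 and 503 as "retry later" signals are assumptions for illustration.

```python
import time
from typing import Callable, Tuple

# Assumption: 429 (Too Many Requests) signals a rate limit and 503 signals a
# temporarily unavailable model; confirm against the responses you actually see.
RETRYABLE_STATUSES = {429, 503}


def call_with_backoff(
    send: Callable[[], Tuple[int, bytes]],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, bytes]:
    """Call send() up to max_attempts times, doubling the wait after each retryable status.

    send is a hypothetical callable that performs one request and returns
    (status_code, response_body).
    """
    status, body = send()
    for attempt in range(1, max_attempts):
        if status not in RETRYABLE_STATUSES:
            break  # success or a non-retryable error: stop retrying
        sleep(base_delay * 2 ** (attempt - 1))  # 1s, 2s, 4s, ... with the defaults
        status, body = send()
    return status, body
```

The `sleep` parameter is injectable so the helper can be exercised without real waiting.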

Let's start

61 steps in total

Open the Retool website. Here’s an invitation to get 20% off Retool.

1 - Click on Create

2 - Click on App

3 - Type "audio-to-text"

4 - Click on Create app

5 - Close the Get started window

6 - Click on the highlighted Add button

7 - Click on File Button icon

8 - Drag File Button to canvas

9 - Click on File Button

10 - Type "upload audio file"

11 - Right-click to open the context menu, which contains “Add components”

12 - Click on Add components

13 - Click on Button icon

14 - The Button component is added to the main frame

15 - Click on Button

16 - Type "Transcribe"

17 - Adjust the position of fileButton: drag it so that its left edge lines up with the Transcribe button

18 - Click on the highlighted Code button

19 - Click on the highlighted element

20 - Click on Resource query

21 - Rename the query: type "transcribe"

22 - Select Resource

23 - Click on RESTQuery (restapi)

24 - Open the openai/whisper-large-v3 page on Hugging Face

Click on Deploy

25 - Click on Inference API…

26 - We can transcribe audio by calling the API

27 - Copy the element titled "https://api-inference.huggingface.co/models/openai/whisper-large-v3"; this is the API URL

28 - Go back to the Retool app

Click on the URL field

Click on https://example_site.com/api/v2/endpoint.json…

29 - Paste "https://api-inference.huggingface.co/models/openai/whisper-large-v3" into the text area

30 - Click on GET to change the method to POST

31 - Click on POST

32 - Click on headers field

33 - Go back to Hugging Face

Click on "Authorization: Bearer xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx". This is the API secret key. It is very sensitive, so do not disclose it to others. Clicking the `show API token` button replaces `xxxxxxx` with the real API key.

34 - Copy text titled "Authorization"

35 - Paste "Authorization" into the text area; the header key is `Authorization`

36 - Copy text titled "Bearer xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"

37 - Paste "Bearer xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx" into the text area; this is the header value

38 - Change the body type

39 - Click on binary

40 - Check the fileButton ID; it is `fileButton1`

41 - Enter the body content

42 - Enter {{fileButton1.value[0]}}. The variable fileButton1.value refers to the uploaded files, so fileButton1.value[0] gets the first file in the array. We only upload one file at a time.
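
To check the request outside Retool, the query configured in steps 29 to 42 can be sketched in Python. This is a minimal sketch under stated assumptions, not what Retool runs internally. The URL, the POST method, the `Authorization: Bearer` header, and the raw binary body come from the steps above. The function names, the `content_type` default of `audio/flac`, and the expectation that the response is JSON with a `text` field are assumptions to verify against your own file and the API's actual response.

```python
import json
import urllib.request

API_URL = "https://api-inference.huggingface.co/models/openai/whisper-large-v3"


def build_transcribe_request(
    audio_bytes: bytes,
    token: str,
    content_type: str = "audio/flac",  # assumption: set this to match your audio file
) -> urllib.request.Request:
    """Build (but do not send) a POST request whose body is the raw audio bytes."""
    headers = {
        "Authorization": f"Bearer {token}",  # same header key and value as steps 35 and 37
        "Content-Type": content_type,
    }
    return urllib.request.Request(API_URL, data=audio_bytes, headers=headers, method="POST")


def extract_text(response_body: bytes) -> str:
    """Pull the transcript out of the response.

    Assumption: the API answers with JSON shaped like {"text": "..."}.
    """
    return json.loads(response_body)["text"]


def transcribe(path: str, token: str) -> str:
    """Read one audio file from disk, send it, and return the transcript."""
    with open(path, "rb") as f:
        request = build_transcribe_request(f.read(), token)
    with urllib.request.urlopen(request) as response:
        return extract_text(response.read())
```

Calling `transcribe("sample.flac", token)` with a hypothetical local file and your own token sends the same kind of request as the Retool query. As with the Retool header, keep the token out of shared code and screenshots.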