Text to speech with the OpenAI TTS model
Build a Retool app that calls the OpenAI TTS API to turn text into lifelike spoken audio.
This is the finished app.
The app has these features:
When you click the Speak button, the text is converted to an audio file and downloaded automatically
Preview the audio file online
Next, let’s learn how to develop the Retool app.
Prerequisites
Create an OpenAI API key
Create a Retool App. Here’s an invitation to get 20% off Retool
Intro
The OpenAI TTS API comes with 6 built-in voices (alloy, echo, fable, onyx, nova, and shimmer) and can be used to:
Narrate a written blog post
Produce spoken audio in multiple languages
Give real-time audio output using streaming
Here are some examples
Alloy
Echo
Fable
Onyx
Nova
Shimmer
The OpenAI TTS model generally follows the Whisper model in terms of language support. Whisper supports the following languages and performs well despite the current voices being optimized for English:
Afrikaans, Arabic, Armenian, Azerbaijani, Belarusian, Bosnian, Bulgarian, Catalan, Chinese, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, Galician, German, Greek, Hebrew, Hindi, Hungarian, Icelandic, Indonesian, Italian, Japanese, Kannada, Kazakh, Korean, Latvian, Lithuanian, Macedonian, Malay, Marathi, Maori, Nepali, Norwegian, Persian, Polish, Portuguese, Romanian, Russian, Serbian, Slovak, Slovenian, Spanish, Swahili, Swedish, Tagalog, Tamil, Thai, Turkish, Ukrainian, Urdu, Vietnamese, and Welsh.
You can generate spoken audio in these languages by providing the input text in the language of your choice.
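As a rough illustration of the point above, the request body only carries the text, a voice, and a model; the language is implied by the input text itself. The sketch below is plain JavaScript. `VOICES` and `buildSpeechBody` are hypothetical names introduced here for illustration, and the model name tts-1 is the one used in the steps later in this post.

```javascript
// The six built-in voices listed above.
const VOICES = ["alloy", "echo", "fable", "onyx", "nova", "shimmer"];

// Hypothetical helper: builds the JSON body for a speech request.
// The language is not passed separately; it comes from the input text.
function buildSpeechBody(text, voice = "alloy") {
  if (typeof text !== "string" || text.trim() === "") {
    throw new Error("input text must be a non-empty string");
  }
  if (!VOICES.includes(voice)) {
    throw new Error(`unknown voice: ${voice}`);
  }
  return { model: "tts-1", input: text, voice };
}

// Same call shape for English and Spanish input.
const english = buildSpeechBody("Hello, world", "nova");
const spanish = buildSpeechBody("Hola, mundo", "nova");
console.log(JSON.stringify(spanish));
```

Validating the voice name up front is a design choice of this sketch; it simply surfaces a typo before any request is sent.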
Develop
26 steps in total
1 - Right-click and select “Add components” in the context menu
2 - Type "text"
3 - Click on Text Area icon
4 - Type "button"
5 - Click on Button icon
6 - Click on Close
7 - Click on textArea1…
8 - Type Content
9 - Click on button1…
10 - Type Speak
11 - Click on highlight (Code)
12 - Click on highlight (Add query)
13 - Click on Resource query
14 - Change resource to RESTQuery
15 - Click on highlight (url)
Paste "https://api.openai.com/v1/audio/speech" into text area
16 - Change Action type from GET to POST
17 - Set headers
Add a header: the key is Authorization, and the value has the format Bearer <your OpenAI API key>
Paste "Authorization" into key text area
Paste "Bearer your-sk" into value text area
18 - Set body
There are 3 params. textArea1 is the ID of the Content text area component
model: tts-1
input: {{textArea1.value}}
voice: alloy
19 - Click on highlight (Add Success event handler)
20 - Change action from Control query to Run Script
Paste the code below, which converts the audio data returned by the OpenAI API into a downloadable file
utils.downloadFile({base64Binary: query1.data.base64Data}, 'aa', 'mp3')
21 - Click on Save
22 - Bind the query to the Speak button
Add a click event handler: the query is query1, and the method is Trigger
query1 calls the OpenAI TTS API and turns the text into a speech file
Set loading to {{query1.isFetching}} so that the Speak button shows a loading indicator while the query runs
Test run the query
Type "I am from the United States"
Click the Speak button to run the query; we will get an audio file named 'aa.mp3'
The query runs successfully
23 - Click on Add components
24 - Type "video"
25 - Click on Video icon
26 - Paste "data:audio/mpeg;base64,{{query1.data.base64Data}}" into video URL
The video URL is the MP3 file in base64 format, embedded as a data URI
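For readers who prefer code to clicks, the REST query configured in steps 14 to 18 and the preview URL from step 26 can be sketched in plain JavaScript. This is a sketch under assumptions, not a description of Retool's internals: `buildSpeechRequest` and `toAudioDataUri` are hypothetical helpers introduced here, the explicit Content-Type header is an assumption for use outside Retool, and the API key is the same placeholder used above.

```javascript
// Hypothetical helper mirroring the REST query from the steps above:
// POST to the speech endpoint with a Bearer token and a JSON body.
function buildSpeechRequest(apiKey, text, voice = "alloy") {
  return {
    url: "https://api.openai.com/v1/audio/speech",
    options: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`,
        // Assumption: set explicitly here because the body is JSON.
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ model: "tts-1", input: text, voice }),
    },
  };
}

// Hypothetical helper mirroring the video URL: wraps base64-encoded
// MP3 data in a data URI that a media player can load directly.
function toAudioDataUri(base64Data) {
  return `data:audio/mpeg;base64,${base64Data}`;
}

const request = buildSpeechRequest("your-sk", "Hello from Retool");
console.log(request.options.method, request.url);
console.log(toAudioDataUri("SUQz")); // "SUQz" is base64 for the bytes "ID3"
```

Outside Retool, the request could be sent with fetch(request.url, request.options). The response body is binary audio, so it would need to be base64-encoded before being passed to toAudioDataUri; inside Retool, the steps above read the already encoded value from query1.data.base64Data.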
That’s all. We’re done with the development, so let’s give it a try.
External links
https://platform.openai.com/docs/guides/text-to-speech