Text to speech with the OpenAI TTS model

Build a Retool app that calls the OpenAI TTS API to turn text into lifelike spoken audio

Dec 23, 2023

This is the finished app

The app has these features:

When you click the Speak button, the text is converted to an audio file and downloaded automatically
You can preview the audio file online

Next, let’s learn how to develop the Retool app

Prerequisites

Create an OpenAI API key
Create a Retool app. Here’s an invitation to get 20% off Retool

Intro

The OpenAI TTS API comes with six built-in voices (alloy, echo, fable, onyx, nova, and shimmer) and can be used to:

Narrate a written blog post
Produce spoken audio in multiple languages
Give real-time audio output using streaming
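Narrating a long blog post usually means sending it in several requests, because the speech endpoint limits the length of a single input (4096 characters as far as I know; please check the current API reference). Below is a minimal sketch of one way to split text on sentence boundaries. The names splitForTts and MAX_INPUT_CHARS are hypothetical and are not part of Retool or any OpenAI SDK.

```javascript
// Assumed input limit for one request; verify against the current API reference.
const MAX_INPUT_CHARS = 4096;

// Hypothetical helper: split text into chunks of at most maxChars characters,
// preferring to break between sentences.
function splitForTts(text, maxChars = MAX_INPUT_CHARS) {
  // Break after sentence-ending punctuation that is followed by whitespace.
  const sentences = text.split(/(?<=[.!?])\s+/);
  const chunks = [];
  let current = "";

  for (const sentence of sentences) {
    // A single sentence longer than the limit is hard-cut into pieces.
    if (sentence.length > maxChars) {
      if (current) {
        chunks.push(current);
        current = "";
      }
      for (let i = 0; i < sentence.length; i += maxChars) {
        chunks.push(sentence.slice(i, i + maxChars));
      }
      continue;
    }
    // Otherwise keep appending sentences until the next one would overflow.
    const candidate = current ? `${current} ${sentence}` : sentence;
    if (candidate.length > maxChars) {
      chunks.push(current);
      current = sentence;
    } else {
      current = candidate;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}
```

Each chunk can then be sent as the input of its own request, and the resulting audio files played in order.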

Here are sample clips of each voice (the audio players are not reproduced here; clip lengths are in parentheses):

Alloy (0:07)
Echo (0:08)
Fable (0:09)
Onyx (0:08)
Nova (0:09)
Shimmer (0:08)

The OpenAI TTS model generally follows the Whisper model in terms of language support. Whisper supports the following languages and performs well despite the current voices being optimized for English:

Afrikaans, Arabic, Armenian, Azerbaijani, Belarusian, Bosnian, Bulgarian, Catalan, Chinese, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, Galician, German, Greek, Hebrew, Hindi, Hungarian, Icelandic, Indonesian, Italian, Japanese, Kannada, Kazakh, Korean, Latvian, Lithuanian, Macedonian, Malay, Marathi, Maori, Nepali, Norwegian, Persian, Polish, Portuguese, Romanian, Russian, Serbian, Slovak, Slovenian, Spanish, Swahili, Swedish, Tagalog, Tamil, Thai, Turkish, Ukrainian, Urdu, Vietnamese, and Welsh.

You can generate spoken audio in these languages by providing the input text in the language of your choice.

Develop

27 steps in total

1 - Right-click the canvas and select “Add components” in the context menu

2 - Type "text"

3 - Click on the Text Area icon

4 - Type "button"

5 - Click on the Button icon

6 - Click on Close

7 - Click on textArea1…

8 - Type "Content"

9 - Click on button1…

10 - Type "Speak"

11 - Click on Code

12 - Click on Add query

13 - Click on Resource query

14 - Change the resource to RESTQuery (restapi)

15 - Click on the URL field and paste "https://api.openai.com/v1/audio/speech" into it

16 - Change the Action type from GET to POST

17 - Set the headers

Add a header whose key is Authorization and whose value has the format Bearer <your OpenAI API key>

Paste "Authorization" into the key text area

Paste "Bearer your-sk" into the value text area, replacing your-sk with your own API key

18 - Set the body

There are 3 params; textArea1 is the id of the Content text area component

model: tts-1
input: {{textArea1.value}}
voice: alloy
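For readers who want to see what the query amounts to outside Retool, here is a rough sketch of the same request as plain JavaScript. It uses the URL, the Authorization header, and the three body params from steps 15 to 18. The helper name buildSpeechRequest and the Content-Type header are my own assumptions (Retool normally sets the content type for a JSON body itself).

```javascript
// Voices listed in the Intro section of this post.
const VOICES = ["alloy", "echo", "fable", "onyx", "nova", "shimmer"];
const SPEECH_URL = "https://api.openai.com/v1/audio/speech";

// Hypothetical helper (not part of Retool or the OpenAI SDK): builds the
// same POST request that the Retool query is configured to send.
function buildSpeechRequest(apiKey, input, voice = "alloy") {
  if (!VOICES.includes(voice)) {
    throw new Error(`Unknown voice: ${voice}`);
  }
  return {
    url: SPEECH_URL,
    options: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`,
        // Assumption: sent explicitly here because the body is JSON.
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ model: "tts-1", input, voice }),
    },
  };
}

// Usage (needs a real key and network access, so it is left commented out):
// const { url, options } = buildSpeechRequest(process.env.OPENAI_API_KEY, "Hello");
// const response = await fetch(url, options);
// const audioBytes = Buffer.from(await response.arrayBuffer());
```

Building the request separately from sending it keeps the key out of the text itself and makes it easy to switch the voice.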

19 - Add a Success event handler

Click on Success

20 - Change the action from Control query to Run script

Paste the code below; it converts the audio data returned by the OpenAI API into a downloadable file

utils.downloadFile({base64Binary: query1.data.base64Data}, 'aa', 'mp3')
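If the response carries no audio data, the one-line script above would pass undefined to utils.downloadFile. A slightly more defensive variant could look like the sketch below. The function name downloadSpeech is hypothetical, and query and utils are passed in as arguments only so the function can be tried outside Retool; inside the Run script you would pass query1 and the global utils.

```javascript
// Hypothetical wrapper around the script from step 20.
// Returns true if a download was started, false if there was nothing to download.
function downloadSpeech(query, utils, fileName = "aa") {
  const base64 = query && query.data && query.data.base64Data;
  if (!base64) {
    // No audio in the response, for example because the request failed.
    return false;
  }
  // Same call as in the post: base64 audio data, file name, file type.
  utils.downloadFile({ base64Binary: base64 }, fileName, "mp3");
  return true;
}

// Inside Retool's Run script this would be called as:
// downloadSpeech(query1, utils, 'aa');
```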

21 - Click on Save

22 - Bind the query to the Speak button

Add a click event handler; the query is query1 and the method is Trigger

query1 calls the OpenAI TTS API and turns the text into a speech file

Set loading to {{query1.isFetching}}, so the Speak button shows a loading indicator while the query is running

23 - Test run the query

Type "I am from the United States"

Click the Speak button to run the query; we get an audio file named 'aa.mp3'

The query runs successfully

24 - Click on Add components

25 - Type "video"

26 - Click on the Video icon

27 - Paste "data:audio/mpeg;base64,{{query1.data.base64Data}}" into the video URL

The video URL is the MP3 file encoded as a base64 data URI
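The URL pasted into the Video component is just a fixed prefix followed by the base64 string from the query. As a small illustration, the same string can be built in plain JavaScript; toAudioDataUri is a hypothetical helper name, and the sample value is a stand-in, not real audio.

```javascript
// Hypothetical helper: wrap base64-encoded audio in a data URI,
// the same shape as "data:audio/mpeg;base64,{{query1.data.base64Data}}".
function toAudioDataUri(base64Data, mimeType = "audio/mpeg") {
  // Remove any whitespace or line breaks that may have crept into the string.
  const cleaned = String(base64Data).replace(/\s+/g, "");
  return `data:${mimeType};base64,${cleaned}`;
}

// Stand-in base64 text (the three ASCII bytes "ID3"), not a playable file.
const sampleUri = toAudioDataUri("SUQz");
console.log(sampleUri);
```

Because the whole file is embedded in the URL, this approach suits short clips better than long recordings.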

That’s all, we’re done with the development. Let’s give it a try

External links

https://platform.openai.com/docs/guides/text-to-speech


Thank you for reading Retool in Action. This post is public so feel free to share it.
