Text to Speech
Free web tool: Text to Speech
Uses the browser Web Speech API. Available voices vary by browser and operating system.
About Text to Speech
The Text to Speech tool converts any typed or pasted text into spoken audio directly in your browser using the Web Speech API. It is ideal for proofreading your writing by listening to it, creating audio previews, helping language learners hear correct pronunciation, or assisting users with reading difficulties. No files are uploaded, no accounts are needed, and there is nothing to install.
The tool lists every voice that your browser and operating system expose, which may include dozens of high-quality voices across multiple languages. A Korean voice is selected by default when one is available, making the tool immediately useful for Korean-language content. The speech rate slider ranges from 0.5x (very slow, great for learning) to 2.0x (fast, great for review), and the pitch slider ranges from a deep 0.5 to a high 2.0, with 1.0 as the default for both.
Technically, this tool uses the browser-native SpeechSynthesis interface defined in the W3C Web Speech API specification. Unlike cloud-based TTS services that require an API key and send your text to the service's servers, the tool hands your text only to the browser, which in most cases synthesizes it with your operating system's speech engine. Some browsers also offer network-backed voices (for example, the Google voices in desktop Chrome), and those are synthesized remotely. The available voices depend on your platform: Windows uses SAPI voices, macOS uses the built-in voices managed in System Settings (System Preferences on older versions), and Android and iOS use their respective TTS engines.
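As a rough illustration of the SpeechSynthesis flow described above, the sketch below queues one utterance with a rate and pitch. The `speak` function and `SpeakOptions` type are hypothetical names chosen for this example, not the tool's actual code; `speechSynthesis`, `SpeechSynthesisUtterance`, `cancel()`, and `speak()` are the standard Web Speech API members.

```typescript
// Hypothetical options type for this sketch (not part of the Web Speech API).
interface SpeakOptions {
  text: string;
  rate?: number;  // 1.0 is normal speed
  pitch?: number; // 1.0 is the default pitch
}

// Queues the text for speech. Returns false when the API is unavailable.
function speak(options: SpeakOptions): boolean {
  // Looked up through globalThis so the sketch also loads outside a browser.
  const g = globalThis as any;
  if (!g.speechSynthesis || !g.SpeechSynthesisUtterance) {
    return false;
  }
  const utterance = new g.SpeechSynthesisUtterance(options.text);
  utterance.rate = options.rate ?? 1.0;
  utterance.pitch = options.pitch ?? 1.0;
  g.speechSynthesis.cancel(); // stop anything that is already being spoken
  g.speechSynthesis.speak(utterance);
  return true;
}
```

In a page, a Play button handler could call `speak({ text: textarea.value, rate: 1.0 })`, and a Stop button could call `speechSynthesis.cancel()`.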
Key Features
- Converts any text to spoken audio instantly using the Web Speech API
- Lists all system voices installed on your device and browser
- Automatically selects a Korean voice when available
- Speech rate control from 0.5x to 2.0x for slow study or fast review
- Pitch control from 0.5 to 2.0 to customize voice character
- Play and Stop controls with live speaking state indicator
- No server-side processing by the tool: text goes only to your browser's speech engine, and on-device voices keep it local
- Available voices update dynamically when the browser loads new voice data
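The automatic Korean voice selection listed above could be approximated as follows. This is a minimal sketch under assumptions: `pickPreferredVoice` and `VoiceLike` are hypothetical names, and the fallback to the first voice in the list is a guess at reasonable behavior rather than the tool's confirmed logic. The `lang` field mirrors the BCP 47 language tag that `SpeechSynthesisVoice` exposes.

```typescript
// Minimal structural type; a browser SpeechSynthesisVoice has these fields.
interface VoiceLike {
  name: string;
  lang: string; // e.g. "ko-KR"
}

// Returns the first voice whose language matches the prefix, otherwise the
// first voice in the list, otherwise undefined for an empty list.
function pickPreferredVoice<V extends VoiceLike>(
  voices: V[],
  langPrefix: string = "ko",
): V | undefined {
  const prefix = langPrefix.toLowerCase();
  const matches = (voice: V): boolean => {
    // Assumption: some platforms reportedly use "ko_KR" instead of "ko-KR",
    // so the first underscore is normalised to a hyphen before comparing.
    const lang = voice.lang.toLowerCase().replace("_", "-");
    // Exact match or "ko-..." only, so "kok-IN" (Konkani) is not picked.
    return lang === prefix || lang.startsWith(prefix + "-");
  };
  return voices.find(matches) ?? voices[0];
}
```

In the browser, the input would typically be `speechSynthesis.getVoices()`, and the result would be assigned to the utterance's `voice` property.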
Frequently Asked Questions
What is the Text to Speech tool?
It is a free, browser-based tool that reads your text aloud using your device's built-in speech synthesis system. It uses the W3C Web Speech API, which is supported in Chrome, Edge, Safari, and most modern browsers.
What voices are available?
The available voices depend on your operating system and browser. Windows typically includes several Microsoft voices, macOS includes its built-in system voices, and mobile platforms include their own TTS voices. Chrome on desktop often includes additional Google voices as well.
How do I adjust the reading speed?
Use the Rate slider to control speaking speed. A value of 1.0 is the default normal speed. Set it below 1.0 (down to 0.5) to slow down, which is helpful for language learning or following along. Set it above 1.0 (up to 2.0) to speed up for faster review.
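A slider value can be kept inside the 0.5 to 2.0 range before it is applied to an utterance. The sketch below is one way to do that; the constant and function names are hypothetical, and falling back to 1.0 for non-numeric input is an assumption made for this example.

```typescript
const RATE_MIN = 0.5;
const RATE_MAX = 2.0;
const RATE_DEFAULT = 1.0;

// Clamps a requested rate to the slider range described above.
// Non-finite input (NaN, Infinity) falls back to the default rate.
function clampRate(value: number): number {
  if (!Number.isFinite(value)) {
    return RATE_DEFAULT;
  }
  return Math.min(RATE_MAX, Math.max(RATE_MIN, value));
}
```

The result can then be assigned to the `rate` property of a `SpeechSynthesisUtterance`. The same pattern would apply to pitch, since the tool uses the same 0.5 to 2.0 range for it.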
Can I use this for Korean text?
Yes. The tool automatically tries to select a Korean voice if one is installed on your system. On Windows, install the Korean language pack to get Korean TTS voices. On macOS, enable Korean voices in System Settings (System Preferences on older versions) > Accessibility > Spoken Content.
Why do I see no voices in the list?
Some browsers load voice data asynchronously after the page loads, so the list can be empty for a moment. If it stays empty, try refreshing the page. Also make sure your browser supports the Web Speech API; Chrome and Edge have the best support.
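The asynchronous loading mentioned here is usually handled by waiting for the `voiceschanged` event when `getVoices()` initially returns an empty list. The following is a sketch of that pattern, not the tool's actual code: `loadVoices` and `VoiceSource` are hypothetical names, and the 2000 ms timeout is an arbitrary assumption for browsers that never fire the event.

```typescript
// Minimal shape of the parts of speechSynthesis that this sketch relies on.
interface VoiceSource<V> {
  getVoices(): V[];
  addEventListener(type: "voiceschanged", listener: () => void): void;
  removeEventListener(type: "voiceschanged", listener: () => void): void;
}

// Resolves with the voice list, waiting for "voiceschanged" if it starts empty.
// After timeoutMs it resolves with whatever is available, possibly nothing.
function loadVoices<V>(source: VoiceSource<V>, timeoutMs: number = 2000): Promise<V[]> {
  return new Promise((resolve) => {
    const initial = source.getVoices();
    if (initial.length > 0) {
      resolve(initial);
      return;
    }
    const finish = (): void => {
      clearTimeout(timer);
      source.removeEventListener("voiceschanged", finish);
      resolve(source.getVoices());
    };
    const timer = setTimeout(finish, timeoutMs);
    source.addEventListener("voiceschanged", finish);
  });
}
```

In a page, this would be called as `loadVoices(window.speechSynthesis)` and the resolved list used to fill the voice dropdown.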
Can I download the speech as an audio file?
Not directly from this tool, as the Web Speech API does not provide audio recording capability. If you need to save the audio, screen recording tools or dedicated TTS software that exports MP3 files may better suit your needs.
Is my text sent to any server?
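Not by this tool. Your text is handed only to the browser's speech engine. With on-device voices, synthesis runs on your operating system's built-in engine, so nothing leaves your device and the tool works offline once the page is loaded. Some browsers also list network-backed voices (for example, the Google voices in desktop Chrome); those are synthesized remotely, so choose an on-device voice if privacy matters.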
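To check this per voice, the Web Speech API gives each voice a `localService` flag that reports whether the voice is synthesized on the device. The sketch below filters a voice list down to on-device voices; `onDeviceVoices` and `VoiceInfo` are hypothetical names for this example, and whether the tool applies such a filter is not stated in the source.

```typescript
// Minimal structural type; a browser SpeechSynthesisVoice has these fields.
interface VoiceInfo {
  name: string;
  localService: boolean; // true when the voice is synthesized on the device
}

// Keeps only the voices that the browser reports as on-device.
function onDeviceVoices<V extends VoiceInfo>(voices: V[]): V[] {
  return voices.filter((voice) => voice.localService);
}
```

Passing `speechSynthesis.getVoices()` through this function before filling the voice list would leave only voices that do not depend on a remote service.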
What is the pitch control for?
Pitch adjusts the frequency of the voice. A value of 1.0 is the default; values closer to 2.0 produce a higher voice, while values closer to 0.5 produce a deeper tone. This is useful for distinguishing different speakers when listening to multiple text sections.