Turn text into audio — real-world scenarios

When we launched the service, I assumed there'd be one use case — "person uploads their book." Reality turned out more interesting. People bring everything: lecture notes before exams, longreads from Substack, a behavioral economics textbook they don't have time for, an unpublished novel, a grandmother's memoirs. Each needs different settings, and a generic guide doesn't help.

So below are six concrete scenarios. The list isn't exhaustive, just the ones I see most. If yours doesn't match exactly, find the closest and adjust.

1. Lecture notes before an exam

Familiar setup: the semester is ending, you have forty lectures' worth of notes, and you can't face reading them in their dry written form. A friend of mine spent her whole junior year listening to her own notes at the gym and on the way to class. Her take: it actually helped, not because of AI magic but because hearing the same material days after writing it sticks better than re-reading it for the tenth time.

What to set: one narrator voice, neutral, businesslike. Pace 1.15–1.25, because notes aren't literature and listening fast doesn't hurt comprehension. No character casting.

What you should expect to lose: formulas and charts simply disappear. If your notes are mostly equations, listening won't help; go back to the original. Foreign-language terms can come out odd, so in critical spots it's worth transliterating them in the source. Splitting the notes into chapters by topic helps you jump back to a specific section later.

2. A long article off the internet

"Found a great longread, thirty thousand characters, no time to read now, want to listen on the train" — the most common scenario after books.

The prep step is the important one. Copy the text from the page into a notepad. Remove ad blocks, navigation, "see also," image captions. Keep only the body. Save as .txt or .md. If the article has subheadings, keep them as ## Heading — the service will turn them into chapters.
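To preview how a cleaned file will break into chapters before uploading, something like the sketch below can help. It only assumes what is described above, that a line starting with `## ` opens a new chapter; the service's real parsing rules may differ, and `split_into_chapters` is a name I made up for illustration.

```python
import re


def split_into_chapters(text):
    """Split cleaned article text into (title, body) pairs at '## ' headings.

    Assumption: a line of the form '## Heading' starts a new chapter.
    Text before the first heading is filed under 'Introduction'.
    Headings with no text under them are dropped.
    """
    chapters = []
    title, lines = "Introduction", []
    for line in text.splitlines():
        match = re.match(r"^##\s+(.*\S)\s*$", line)
        if match:
            # Close the previous chapter if it has any content.
            body = "\n".join(lines).strip()
            if body:
                chapters.append((title, body))
            title, lines = match.group(1), []
        else:
            lines.append(line)
    # Close the final chapter.
    body = "\n".join(lines).strip()
    if body:
        chapters.append((title, body))
    return chapters
```

Printing the titles with `len(body)` next to each one is a quick way to spot a chapter that is suspiciously short, which usually means a leftover caption or a stray "see also" block.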

Voice: depends on topic. Technical — businesslike. Journalism — slightly livelier. Personal essay — warmer. Pace 1.0; speed up in the player on the move.

One thing that helped me. Long articles tend to come with filler — repeated points, surplus examples, rhetorical detours. Before render I usually trim. Saves both listening time and the characters you're paying for.

And one caveat. Not every article survives audio. If the piece relies on infographics, screenshots, tables — half the meaning stays on the page. Pick longreads where the substance is in the words.

3. A textbook for self-study

Sometimes you buy a 300-page book on psychology, history, or marketing, and sitting down to read it cover to cover never happens. You'd at least like to walk through the material.

Ideal source — epub or fb2. PDF is possible but means a text conversion that always comes out uneven; you'll spend half an hour cleaning.

Voice: one, neutral, like a competent lecturer. Not too young, not too old, somewhere in between. Pace 1.0 for new material, 1.2 for review. Casting is usually unnecessary, with one exception: if the book has lots of attributed direct quotes ("Marx wrote…", "Jung observed…"), you can give the quotes a different voice, which can help retention.

What doesn't work in textbooks: footnotes (strip them before render, they break the flow) and tables and diagrams, which are simply gone. A statistics or econometrics textbook in audio loses half its value. A history, theory, or psychology textbook can actually gain, because many readers absorb verbal material better by ear than by eye.
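Stripping footnotes by hand gets tedious in a long book, so here is a rough sketch of automating it. It assumes one specific convention that your file may not follow: inline markers written as `[12]` and footnote texts on their own lines starting with the same bracketed number. The function name is hypothetical, and the pattern needs adapting if your source marks footnotes differently.

```python
import re


def strip_footnotes(text):
    """Remove bracketed numeric footnote markers and their definition lines.

    Assumed convention (check your file first):
      inline marker:    "as Marx wrote.[3]"
      definition line:  "[3] Capital, vol. 1."
    """
    kept = []
    for line in text.splitlines():
        # Drop whole lines that look like footnote definitions.
        if re.match(r"^\s*\[\d+\]\s", line):
            continue
        # Remove inline markers but keep the surrounding words.
        kept.append(re.sub(r"\[\d+\]", "", line))
    return "\n".join(kept)
```

Be aware that a body paragraph that happens to begin with a bracketed number would be dropped as well, so compare the result against the source before rendering.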

And remember: a 500-page textbook is twenty-plus hours of audio. You don't listen to it in one sitting, don't try.
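For a rough sense of where a number like that comes from, here is a back-of-the-envelope estimate. Both defaults are my assumptions rather than measurements from any service: about 1,800 characters per page and about 750 characters of speech per minute at pace 1.0. Real books and voices vary a lot.

```python
def estimate_audio_hours(pages, pace=1.0, chars_per_page=1800, chars_per_minute=750):
    """Rough listening time in hours for a book of the given length.

    chars_per_page and chars_per_minute are ballpark assumptions;
    measure your own text and voice if you need a tighter figure.
    """
    total_chars = pages * chars_per_page
    minutes = total_chars / (chars_per_minute * pace)
    return minutes / 60


print(estimate_audio_hours(500))                      # 20.0
print(round(estimate_audio_hours(500, pace=1.2), 1))  # 16.7
```

Under these assumptions a 500-page book comes to about twenty hours at pace 1.0, and still more than sixteen at the 1.2 review pace.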

4. Your own book before publication

This is the scenario I plug to every author I talk to. Pushing your own manuscript through AI narration is the best diagnostic you can give yourself before submitting to an editor.

For the narrator voice, deliberately don't pick the one that sounds like your inner reading voice. Pick the opposite, so you hear the text "through someone else's ears." This is critical — your text in your imagined voice gets auto-completed mentally; you'll fill in what you meant. You need distance.

Pace 0.95–1.0. Slow enough that you can hear when a sentence limps. Faster — you'll skim past it.

Casting: just accept the automatic assignment. The goal isn't a final product, it's hearing how your text flows. The important part is to listen with a notebook or notes app at hand. When you catch a clunky line, write it down and keep going. Three or four hours later you'll have a list of edits no eyes-only proofread will give you.

Pay extra attention to dialogue. Audio surfaces unnatural lines instantly. And to long paragraphs without breaks — if it sounds suffocating, you need to break it up.

5. A translation into another language

You have a translation of a book or article, your own or someone else's, and you want to hear how it lands in the target language.

The main thing is matching language to voice. English voice for English text, Russian for Russian. Don't experiment with cross-language combinations, they sound wrong. Keep the pace standard; listening to fast speech in a non-native language is a challenge of its own.

Quality is excellent for English and Russian, good for German and French, medium for Polish and Ukrainian (stress sometimes drifts), still experimental for Korean, Japanese, Arabic. I wouldn't ship those last three.

Audio is great at catching translation errors. Awkward phrases or typos jump out within the first chapter. It's basically the same diagnostic as for your own book.

6. Memoirs and letters from family

A scenario I treat with extra care. A grandmother wrote her wartime memoirs. A grandfather kept a journal. Someone transcribed letters from the front. You'd like to keep this not just as text but as audio.

Source is often handwritten and needs to be digitized. Either type it out (manageable for short volumes) or run OCR with a careful proofread — handwriting recognition still misses things.

Voice: older, warm, matching the author's gender. Don't try to artificially imitate the actual person's specific voice; it won't work. Pick a general tone that doesn't clash with the text.

Pace slow, around 0.9. Memoirs don't read fast, and in audio that's especially audible.

Emotion: restrained. Memoirs are often about heavy things, and an over-emoting AI voice turns the text into melodrama. Better to keep it even, calm, and respectful.

What I usually tell people who come with these projects: don't edit the text. Keep the author's stylistic quirks, their phrasings, even what looks to you like errors. That's part of the person's voice, and for a family archive that matters more than literary polish.

And don't forget proper nouns. Cities, people, events — verify stress; for a personal archive, this is critical. Format: mp3 or ogg, so it opens on every relative's phone without trouble.

What's true across all of them

Text prep is almost always more important than service settings. Ten minutes cleaning the source saves hours of redo. I usually read the first two pages as-is, and it's already obvious what should go.

Test on a short chapter. Don't render 500,000 characters of something you haven't sanity-checked at 5,000. This is a rule I had to learn the hard way.
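If the source has no conveniently short chapter, you can cut a test sample yourself. The sketch below takes roughly the first 5,000 characters and ends the sample on a paragraph break so the test render doesn't stop mid-sentence. The function name and the blank-line paragraph convention are my assumptions.

```python
def take_sample(text, limit=5000):
    """Return the opening of `text`, at most `limit` characters long.

    Prefers to end on a paragraph break (a blank line); falls back to
    the last sentence end, then to a hard cut at `limit`.
    """
    if len(text) <= limit:
        return text
    head = text[:limit]
    cut = head.rfind("\n\n")
    if cut > 0:
        return head[:cut]
    # No paragraph break in range: try the last sentence end instead.
    cut = head.rfind(". ")
    return head[:cut + 1] if cut > 0 else head
```

Render the sample, listen to it, adjust voice and pace, and only then send the full text.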

Listen on headphones, at least the first time. Speakers smooth out artifacts, so on speakers a book can seem perfect. Headphones surface the details that will send you back to the settings.

And finally, don't expect perfection. AI narration in 2026 is a good compromise, not a replacement for sitting down and reading attentively with a highlighter and margin notes. It's a different way of consuming text, with its own strengths and limits. Treated that way, it works.