AI narration for self-published authors — a guide

I know a fair number of authors who, five years ago, finished a good book, pushed it to KDP and a few indie storefronts, and then quietly let go of the idea of an audio edition. An audio edition meant $1,500 to $3,000 in a studio, and that was simply a budget most indies didn't have. In 2026 the same job costs $50–100 and a couple of days, without leaving your desk. People keep asking how, so here's the actual workflow.

First — is it even worth it

There's a temptation to ship audio because you suddenly can. Not every text needs it, and I'd answer three questions honestly before opening the wallet.

Ask your audience directly. On social media, in your reader chat, in your newsletter: "would you actually listen to this?" Sometimes 80% of replies tell you they read with their eyes and aren't switching. In that case audio is an exercise for you, not a product for them.

Think about genre. Non-fiction, YA, genre fiction land cleanly in audio. Literary fiction is more reliably read than listened to. Poetry in AI narration is something I wouldn't ship.

Think about length. A short story or novella is a "bonus to the ebook" — a promo, a gift to subscribers. A long novel is its own product with its own pricing and promotion plan, and that's a different project.

If those answers line up — go.

What you actually need

A finished text in a sane format — epub, fb2, markdown, txt. Finished, because any post-render edit means re-rendering, re-uploading, and a couple of extra hours of fiddly work.

An account with an AI narration service. Picking one is its own pain; I have a comparison post where I ran five popular ones through the same text.

A budget of $30–100 per book, depending on length and service.

A day to three of your time, total. Most of it is listening to early renders and tweaking.

Step one. Clean the text

You can get the file you upload into reasonable shape in about half an hour, and that half hour saves you several later.

Remove stray blank lines and ragged indents from your Word export. Add explicit chapter headers — # Chapter 1, # Chapter 2 — regardless of what your editor outputs. Check dialogue: it should be marked with quotes or em-dashes, ideally with attribution ("she said"). Footnotes and "see page 47" cross-references should either go or get reformatted; AI doesn't read them as humans do, and they break the flow in audio.
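The mechanical part of that cleanup can be roughed out in a few lines. This is a minimal sketch in Python, assuming a plain-text or markdown manuscript. The function names and the "see page N" pattern are my own illustration, not something any narration service requires, and the script only flags cross-references; deciding what to do with them is still your call.

```python
import re

# Hypothetical pattern for page cross-references such as "see page 47".
PAGE_REF = re.compile(r"\bsee\s+page\s+\d+\b", re.IGNORECASE)


def clean_manuscript(text: str) -> str:
    """Strip ragged indents and trailing spaces, collapse runs of blank lines."""
    lines = [line.strip() for line in text.splitlines()]
    cleaned = []
    for line in lines:
        # Keep at most one blank line between paragraphs, none at the top.
        if line == "" and (not cleaned or cleaned[-1] == ""):
            continue
        cleaned.append(line)
    # Drop a trailing blank line, if any.
    while cleaned and cleaned[-1] == "":
        cleaned.pop()
    return "\n".join(cleaned) + "\n"


def find_page_refs(text: str) -> list[tuple[int, str]]:
    """Return (line number, line) pairs that still contain a page cross-reference."""
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if PAGE_REF.search(line)
    ]


raw = "   # Chapter 1\n\n\n\n   It was late.  \n\tAs noted (see page 47), she left.\n\n"
tidy = clean_manuscript(raw)
print(tidy)
print(find_page_refs(tidy))
```

Run it on a copy of the manuscript, not the original, and read the flagged lines yourself before deleting anything.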

If you know specific words your model botches — proper nouns, technical terms, place names — and your service supports manual stress markers, mark them in the source. I once lost three render passes to a single character name before I just put + on the right vowel and was done with it.
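If you have more than a couple of problem words, a small pronunciation sheet applied by script is less error-prone than hand-editing. A sketch, with loud assumptions: the names below are invented, and I'm assuming the service reads a "+" placed immediately before the stressed vowel. Marker syntax and placement differ between services, so check yours before running anything like this on a full manuscript.

```python
import re

# Hypothetical pronunciation sheet: plain spelling -> spelling with a stress marker.
# Assumes "+" goes immediately before the stressed vowel; your service may differ.
STRESS_MARKS = {
    "Kovacs": "K+ovacs",
    "Aldermere": "Alderm+ere",
}


def apply_stress_marks(text: str, marks: dict[str, str]) -> tuple[str, dict[str, int]]:
    """Replace whole-word occurrences of each name and count the replacements."""
    counts = {}
    for plain, marked in marks.items():
        pattern = re.compile(rf"\b{re.escape(plain)}\b")
        # A function replacement keeps the marked spelling literal.
        text, counts[plain] = pattern.subn(lambda _match, m=marked: m, text)
    return text, counts


sample = "Kovacs left Aldermere at dawn. Nobody in Aldermere saw Kovacs go."
marked_text, counts = apply_stress_marks(sample, STRESS_MARKS)
print(marked_text)
print(counts)
```

The counts are the useful part: if a name you expect 200 times comes back with 3 replacements, the spelling in the sheet doesn't match the spelling in the book.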

Step two. Voices

The thing I want to say up front: don't trust full automation. It's good enough for a draft, to feel out how the book sounds in principle. The final pass needs human judgment, at least on the main roles.

The narrator voice is the headline. That's seventy percent of your listening time, and if you don't like it, you're miserable for fifteen hours. Pick one you personally want to listen to. Not "right for the genre" — the one that doesn't grate after thirty minutes.

Main characters, three to five of them, deserve manual selection. The voice should read as "different person" by their first line; otherwise listeners get lost in dialogue. Secondary characters can take whatever automation gives them, but spend a minute scanning the assignments to make sure nobody got an actively wrong tone (a grandmother shouldn't sound twelve).

Style hints help. Short and direct beats long and cinematic: "cold, detached," "with irony," "slow and thoughtful" land cleanly. Elaborate descriptions ("aggressive cold baritone of a senior officer") tend to confuse the model more than they help.
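I keep the casting in a small sheet and run a sanity check over it before rendering. The sketch below is one way to do that in Python. Everything in it is hypothetical: the voice IDs, the character names, and the four-word cutoff for what counts as a "short" style hint. It doesn't replace listening; it only catches two leads sharing a voice and hints that have grown too long.

```python
from dataclasses import dataclass

MAX_HINT_WORDS = 4  # Assumed cutoff for "short and direct"; tune to taste.


@dataclass
class Role:
    name: str
    voice: str          # Hypothetical voice ID from your service's catalogue.
    main: bool = False  # True for the narrator and the three to five leads.
    hint: str = ""      # Optional style hint, e.g. "cold, detached".


def check_cast(cast: list[Role]) -> list[str]:
    """Return human-readable warnings; an empty list means nothing was flagged."""
    warnings = []
    seen = {}  # voice ID -> first main role that uses it
    for role in cast:
        if role.main:
            if role.voice in seen:
                warnings.append(
                    f"{role.name} shares voice '{role.voice}' with {seen[role.voice]}"
                )
            else:
                seen[role.voice] = role.name
        if len(role.hint.split()) > MAX_HINT_WORDS:
            warnings.append(f"{role.name}: style hint is long, consider trimming")
    return warnings


cast = [
    Role("Narrator", "voice_a", main=True, hint="slow and thoughtful"),
    Role("Mara", "voice_b", main=True, hint="with irony"),
    Role("Captain Brandt", "voice_b", main=True,
         hint="aggressive cold baritone of a senior officer"),
    Role("Innkeeper", "voice_a"),
]
for warning in check_cast(cast):
    print(warning)
```

Secondary roles are deliberately allowed to reuse voices here, since those are the ones left to automation.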

Step three. One chapter, then everything else

Don't render the whole book in one shot. This rule cost me my first attempt: I queued 600,000 characters at once, came back hours later to a finished render, and discovered the protagonist's last name had stress on the wrong syllable in all 200 mentions. Full re-render.

So: render chapter one. Listen end to end, in headphones, not at 2x. What to listen for: stress on your own surnames and place names, whether the voices sound how you imagined, whether the pauses feel natural, and how long sentences hold up (that's where AI tends to thin out).

Chapter one's clean? Push the rest. Not? Adjust and try again. This isn't wasted time, it's insurance.
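If your service takes one file per upload, splitting the manuscript at the chapter headers makes the "one chapter first" rule easy to follow. A minimal sketch, assuming the explicit "# Chapter N" headers from step one; the function name is mine, and how you actually submit each piece depends on your service.

```python
import re

# Matches header lines that start with "# ", e.g. "# Chapter 1".
CHAPTER_HEADER = re.compile(r"^# .+$", re.MULTILINE)


def split_chapters(text: str) -> list[tuple[str, str]]:
    """Split a manuscript into (title, body) pairs at each '# ' header line."""
    headers = list(CHAPTER_HEADER.finditer(text))
    chapters = []
    for index, match in enumerate(headers):
        # A chapter runs until the next header, or to the end of the text.
        end = headers[index + 1].start() if index + 1 < len(headers) else len(text)
        body = text[match.end():end].strip()
        chapters.append((match.group(0)[2:].strip(), body))
    return chapters


book = "# Chapter 1\nIt was late.\n\n# Chapter 2\nMorning came slowly.\n"
chapters = split_chapters(book)
first_title, first_body = chapters[0]
print(f"Render first: {first_title} ({len(first_body)} characters)")
remaining = sum(len(body) for _, body in chapters[1:])
print(f"Hold back: {len(chapters) - 1} chapter(s), {remaining} characters")
```

The character counts also tell you how much of the budget rides on the part you haven't listened to yet.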

Step four. Cover and metadata

Audio lives by different rules than ebook on the platforms. The audiobook cover almost always needs to be made fresh — square 3000×3000, title legible at thumbnail size in a phone player. An ebook cover designed for portrait with a fine font and a wide landscape will turn into mush on a player.

Designer for the cover adaptation: $50–150. If you don't have one, you can put together a basic square in Canva in an evening.
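Before uploading, it's worth checking the file itself rather than trusting the export dialog. Here is a small sketch that reads the width and height from a PNG header using only the Python standard library and tests them against the square 3000×3000 figure above. The function names are mine, it handles PNG only (a JPEG needs a different parser or an imaging library), and each platform publishes its own cover requirements, so treat the constant as something to confirm.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
MIN_SIDE = 3000  # Square size quoted above; confirm against each platform's spec.


def png_size(header: bytes) -> tuple[int, int]:
    """Read (width, height) from the first 24 bytes of a PNG file."""
    if len(header) < 24 or header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        raise ValueError("not a PNG file")
    # Width and height are big-endian 32-bit integers at the start of IHDR.
    return struct.unpack(">II", header[16:24])


def cover_problems(width: int, height: int) -> list[str]:
    """List reasons a cover would not pass as a square audiobook cover."""
    problems = []
    if width != height:
        problems.append(f"not square: {width}x{height}")
    if min(width, height) < MIN_SIDE:
        problems.append(f"shorter side is under {MIN_SIDE}px")
    return problems


# Synthetic header standing in for a portrait ebook cover (1600x2560).
fake_header = PNG_SIGNATURE + struct.pack(">I4sII", 13, b"IHDR", 1600, 2560)
print(cover_problems(*png_size(fake_header)))
```

For a real file, open it in binary mode and pass the first 24 bytes to png_size. This checks dimensions only; whether the title survives at thumbnail size is still something you judge by eye on a phone.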

Metadata is duplicated from the ebook: title, author, description, genre, keywords. Same content, separate upload.

Step five. Where to put it

The big four for English-language indie audio: Audible (via ACX), Findaway Voices (which fans out to many platforms), Storytel, Kobo. ACX is the obvious entry point if you're aiming at the Audible audience. Findaway gives you the widest distribution from one upload. Storytel and Kobo have their own indie programs.

Royalty splits through these channels run 30–50%. It's a lot, but it's the price of distribution.

The alternative is selling directly through your own site, Substack, Patreon, or Bandcamp. You keep everything, but you handle the traffic and payments yourself. For an established audience this works; for a new one, less so.

Most indies I know in 2026 do a hybrid: main launch through the big platforms, exclusive formats and early access through their own channel.

Step six. It does not market itself

This is the part I want to underline. AI narration is a production technology, not marketing. Your book won't randomly surface in Audible's "new in audio" carousel. You have to bring your audience.

What actually moves the first hundred to thousand listens:

Short clips on TikTok and Reels — the first interesting minute, with a "find the rest at" caption.
Promo codes for your existing audience — newsletter, Discord, Patreon.
Guest swaps with same-genre indie authors, honest cross-pollination.
Beta listeners — ten to twenty people get the audio free in exchange for an honest review.
Indie publishing podcasts (there are many, they always want guests).

Things I've stepped in, so you don't have to

Quotes from other authors. If your book has direct quotations, verify the rights. Fair use varies by jurisdiction, and audio sometimes has stricter limits.

Final proof. Once audio is published, fixing a typo means re-rendering the chapter, re-uploading to platforms, sometimes refreshing metadata. A pile of micro-work. So a week before launch, do one more text pass.

The cheap-service trap. Saving 50% of the budget can cost you reviews from listeners who hear the electronic edge. If this is a book you care about, don't optimize for the lowest price.

Transparency with readers. In the audio edition's description I'd just state "AI narration" plainly. Most people are fine with it. A few are categorically not. But nobody appreciates finding out on their own. Trust matters more than the awkwardness of the disclosure.

In one paragraph

AI narration isn't magic. It's a new working tool that's replaced the studio for most indie scenarios. It won't make your book a bestseller on its own. It will give you a door that didn't exist five years ago — shipping audio without going broke. If your book already has readers in text, audio is the natural next step. If it doesn't, fix marketing first, audio later. No service routes around that order.