# I Added 5 Languages to DIALØGUE in 48 Hours

A few days ago, a user asked if DIALØGUE supported Spanish.

It didn’t.

48 hours later, it supports five languages—Spanish, Vietnamese, Japanese, Korean, and Mandarin Chinese. Not translated UI buttons. Full native localization. Podcasts that sound like they were made for each market.

Five languages in 48 hours sounds like I hired a team or pulled all-nighters. I didn’t.

I gave Claude Code high-level direction. It wrote the planning documents, then executed them—often running parallel agents on different files simultaneously.

## The Secret: Plan Once, Execute Fast

Here’s the interesting part: I didn’t write the planning documents. Claude Code did—with my guidance.

I described what I wanted: “Create a localization style guide that explains how to translate naturally, not literally. Include examples for Spanish, Vietnamese, Japanese, Korean, and Chinese. Cover formality levels and cultural context.”

Claude Code produced a 1,000+ line style guide, a comprehensive checklist, and a backend pipeline playbook. Then it followed its own plans to execute the work.

### 1. The Style Guide

Not a translation glossary—a philosophy document. It explains *how* to localize, with before/after examples for every language.

Example entry for Spanish:

| English | Literal (Wrong) | Natural (Right) |
|---------|-----------------|-----------------|
| “Ideas, Produced” | “Ideas, Producidas” | “De la Idea, al Podcast” |
| “Researched, Not Recycled” | “Investigado, No Reciclado” | “Contenido Original, Sin Copiar” |
| “You Direct” | “Tú Diriges” | “Tú Decides” |

The literal translations are grammatically correct but sound like Google Translate. The natural versions capture the *intent*—they’re what a Spanish-speaking marketer would actually write.

The guide covers formality levels (Japanese uses です/ます form, Korean uses 합쇼체), cultural context, and common mistakes to avoid. When Claude Code translates, it automatically follows the principles it wrote.

### 2. The Localization Checklist

A comprehensive checklist of every file that needs to change for a new language:

Required (must have):

- Frontend: `messages/{locale}.json` (~2,200 keys)
- Backend: `locales/{locale}.json` (~82 keys)
- Language Utils: 3 functions in `language_utils.py` (sketched after this list)
- Speech Generation: 1 function in `gemini_voice_instructions.py`
- Voice Preview: 7 localized sentences
- Content Moderation: Keywords + prompts
- Smoke Tests: Add to test matrix

Optional (enhanced quality):

- Host/audience profiles per style
- Localized prompt templates
- Voice instructions per style

This checklist ensures nothing gets missed. Claude Code works through it systematically.
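
For a sense of what the Language Utils entry involves, here is a rough sketch. The function names and shapes are hypothetical, not necessarily what’s in the repo:

```python
# Hypothetical sketch of the three language_utils.py entries.
# Real names and signatures in DIALØGUE may differ.
SUPPORTED_LANGUAGES = {"en", "es", "vi", "ja", "ko", "zh"}

LANGUAGE_NAMES = {
    "en": "English", "es": "Spanish", "vi": "Vietnamese",
    "ja": "Japanese", "ko": "Korean", "zh": "Mandarin Chinese",
}

def is_supported_language(code: str) -> bool:
    return code in SUPPORTED_LANGUAGES

def get_language_name(code: str) -> str:
    return LANGUAGE_NAMES[code]

def get_default_language() -> str:
    return "en"
```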

### 3. The Backend Playbook

Documents exactly how podcast generation handles language:

- How outline generation uses Gemini 3 Flash with language instructions
- How dialogue generation uses Claude Sonnet 4.5 with cultural context
- How speech generation uses Gemini TTS with language-specific guidance

With these three documents in place, adding a new language becomes mechanical. Claude Code reads the checklist, follows the style guide, and updates files according to the playbook.

## The Problem With “Just Translate It”

When most apps add languages, they run their strings through a translation API and call it done. “Create Podcast” becomes “Crear Podcast” and everyone moves on.

That approach creates garbage.

Here’s why: translation preserves words, localization preserves meaning. The style guide encodes this difference.

| Language | Literal Translation | Native Localization |
|----------|---------------------|---------------------|
| Vietnamese | “Ý tưởng, Được Sản Xuất” | “Từ Ý Tưởng → Đến Podcast” (From idea → to podcast) |
| Japanese | “アイデア、制作された” | “アイデアを、カタチに” (Ideas into form) |
| Korean | “아이디어, 제작됨” | “아이디어를, 현실로” (Ideas into reality) |
| Spanish | “Ideas, Producidas” | “De la Idea, al Podcast” (From the idea, to the podcast) |
| Chinese | “想法,已制作” | “从想法,到播客” (From idea, to podcast) |

Every language required this level of attention. Every button, every error message, every piece of marketing copy.

And that’s just the UI. The actual podcast generation is harder.

## Podcasts Need Native Hosts

Here’s something I didn’t anticipate: host names matter.

In the English version, the default hosts are Alex and Maya. Generic, forgettable, works fine.

But if you’re generating a Vietnamese podcast and the hosts are still named Alex and Maya? It sounds off. It sounds like a bad dub of an American show.

So each language got native host names:

| Language | Host 1 | Host 2 |
|----------|--------|--------|
| English | Alex | Maya |
| Vietnamese | Minh | Lan |
| Japanese | 太郎 (Tarō) | 花子 (Hanako) |
| Korean | 민준 (Minjun) | 수진 (Sujin) |
| Spanish | Carlos | María |
| Chinese | 明辉 (Mínghuī) | 雅琴 (Yǎqín) |

But names alone aren’t enough. The AI needs to understand cultural context.

A tech podcast in Japan has different conversational norms than one in the US. The level of formality, the way disagreement is expressed, the rhythm of back-and-forth—it’s all different.

So I created language-specific AI prompt instructions. The Vietnamese instructions specify “bạn” (informal but respectful “you”). The Japanese instructions specify です/ます form (polite but friendly). The Korean instructions specify 합쇼체 (formal polite).

This is the kind of knowledge you can’t automate away. You need to understand each culture’s communication patterns and encode them into the prompts. Once encoded, though, Claude Code applies them consistently across thousands of strings.
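
To make that concrete, here is a minimal sketch of how per-language instructions might be stored and injected into the dialogue prompt. The keys and wording are illustrative; the production prompts are longer and vary per podcast style:

```python
# Illustrative excerpt; the real prompts are longer and per-style.
LANGUAGE_INSTRUCTIONS = {
    "vi": "Hosts address each other as 'bạn' (informal but respectful). "
          "Keep the tone warm and conversational.",
    "ja": "Use です/ます form throughout: polite but friendly. "
          "Express disagreement indirectly, with softening phrases.",
    "ko": "Use 합쇼체 (formal polite speech). "
          "Hosts build on each other's points rather than talking over each other.",
}

def build_dialogue_prompt(outline: str, language: str) -> str:
    # Prepend the cultural instructions so every generation follows them
    instructions = LANGUAGE_INSTRUCTIONS.get(language, "")
    return f"{instructions}\n\nWrite the full two-host dialogue for this outline:\n{outline}"
```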

## The AI Model Selection That Actually Works

Here’s where it gets interesting from a technical perspective.

DIALØGUE uses three different AI models for three different tasks. Not because I wanted complexity, but because each model is genuinely better at its specific job.

### Task 1: Research & Outline → Gemini 3 Flash

When a user enters a topic, the first step is research. The AI needs to:

1. Search the web for current information

2. Synthesize findings into a structured outline

3. Cite sources properly

This requires grounding (real-time web search) AND structured output (JSON schema compliance).

Here’s the problem: most models can’t do both simultaneously. Gemini 2.5 Flash supports grounding. It supports structured output. But not together. You have to pick one.

Gemini 3 Flash Preview is the only model I’ve found that supports both features at the same time. So it handles the outline generation.

```python
GEMINI_MODEL = "gemini-3-flash-preview"

# Key capability:
GEMINI_MODEL_CONFIGS = {
    "gemini-3-flash-preview": {
        "supports_grounding": True,
        "supports_structured_output": True,
        "supports_grounding_with_structured_output": True,  # This is the key
    }
}
```

The outline comes back in about 60 seconds with real sources, properly structured.
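
For reference, here is roughly what that call looks like with the google-genai SDK. The `Outline` schema and prompt are simplified placeholders, not DIALØGUE’s actual code, but the pattern is the point: `tools` and `response_schema` in the same request, which this model accepts.

```python
from pydantic import BaseModel
from google import genai
from google.genai import types

class Outline(BaseModel):  # simplified placeholder schema
    title: str
    sections: list[str]
    sources: list[str]

client = genai.Client()  # reads GEMINI_API_KEY from the environment

response = client.models.generate_content(
    model="gemini-3-flash-preview",
    contents="Research and outline a podcast on: solid-state batteries",
    config=types.GenerateContentConfig(
        tools=[types.Tool(google_search=types.GoogleSearch())],  # grounding
        response_mime_type="application/json",                   # structured output
        response_schema=Outline,
    ),
)
outline = response.parsed  # an Outline instance backed by real search results
```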

### Task 2: Dialogue Generation → Claude Sonnet 4.5

Once the user approves the outline, the AI writes the full script—every word both hosts will say.

This is creative writing. It needs personality, natural rhythm, the feel of a real conversation.

I tested this with multiple models. Claude’s dialogue consistently sounds more human. The back-and-forth feels authentic. The hosts interrupt each other naturally. They build on each other’s points.

It’s subjective, but after generating hundreds of test podcasts, the difference is noticeable. Claude Sonnet 4.5 handles dialogue generation.

```python
ANTHROPIC_MODEL = "claude-sonnet-4-5-20250929"
```
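
The call itself is plain Anthropic Messages API. A minimal sketch, with placeholder inputs standing in for the style guide and the approved outline:

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Placeholders; in practice these come from the style guide,
# the language instructions, and the user-approved outline.
system_prompt = "Write natural two-host podcast dialogue. Follow the style guide."
outline_json = '{"title": "...", "sections": ["..."]}'

message = client.messages.create(
    model="claude-sonnet-4-5-20250929",
    max_tokens=8000,
    system=system_prompt,
    messages=[{
        "role": "user",
        "content": f"Write the full dialogue for this outline:\n{outline_json}",
    }],
)
script = message.content[0].text
```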

### Task 3: Audio Generation → Gemini 2.5 Flash TTS

Finally, the script needs to become audio. Two voices, proper pacing, emotional expression.

Gemini 2.5 Flash TTS offers 30 distinct voices with multi-speaker support. You can have Charon (informative, male) and Kore (firm, female) in the same audio file, naturally switching between speakers.

The quality is good enough that users often ask if it’s real voice actors.

```python
# One voice per host; the `speaker` labels must match the host names
# used in the transcript (e.g. "Alex:" / "Maya:" in English).
speaker_configs = [
    types.SpeakerVoiceConfig(
        speaker=name,
        voice_config=types.VoiceConfig(
            prebuilt_voice_config=types.PrebuiltVoiceConfig(voice_name=voice)
        ),
    )
    for name, voice in [("Alex", "Charon"), ("Maya", "Kore")]
]

# Multi-speaker synthesis with Gemini TTS
response = client.models.generate_content(
    model="gemini-2.5-flash-preview-tts",
    contents=transcript,
    config=types.GenerateContentConfig(
        response_modalities=["AUDIO"],
        speech_config=types.SpeechConfig(
            multi_speaker_voice_config=types.MultiSpeakerVoiceConfig(
                speaker_voice_configs=speaker_configs
            )
        )
    )
)
```

## The Pipeline

So the full flow is:

```
User Input
    ↓
[Gemini 3 Flash] Research + Structured Outline (~60 sec)
    ↓
User Review & Feedback
    ↓
[Claude Sonnet 4.5] Full Dialogue Script (~2-3 min)
    ↓
User Review & Edit
    ↓
[Gemini 2.5 Flash TTS] Multi-speaker Audio (~3-5 min)
    ↓
Finished Podcast
```

Each model does what it’s best at. The total generation time is about 8-10 minutes for a 15-30 minute podcast.

## One Feature That Surprised Me

Users asked for something I hadn’t considered: generating podcasts in a different language than their UI.

A Vietnamese user might be comfortable navigating the app in Vietnamese but want to create an English podcast for an international audience. Or a Spanish-speaking marketer might want to test content in Japanese.

So I added a podcast language selector. Your UI can be in English while you generate a podcast in Mandarin. The hosts will have Chinese names (明辉 and 雅琴), the dialogue will follow Chinese conversational patterns, and the audio will use appropriate voices.

It’s a small feature, but it dramatically expands what users can do.
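
Under the hood this just means the pipeline keys off a separate field. A hypothetical sketch of the request shape (field and function names are illustrative):

```python
from dataclasses import dataclass

@dataclass
class PodcastRequest:
    topic: str
    ui_locale: str = "en"         # language the app chrome is rendered in
    podcast_language: str = "en"  # language the hosts actually speak

def resolve_hosts(req: PodcastRequest) -> tuple[str, str]:
    # Host names follow the podcast language, never the UI locale
    hosts = {
        "en": ("Alex", "Maya"), "vi": ("Minh", "Lan"), "ja": ("太郎", "花子"),
        "ko": ("민준", "수진"), "es": ("Carlos", "María"), "zh": ("明辉", "雅琴"),
    }
    return hosts[req.podcast_language]
```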

## Parallel Agents: The Force Multiplier

The real speed came from parallel agents. The checklist notes this explicitly:

> “Run parallel agents for backend files – Backend files are smaller and can be processed in parallel (language_utils.py, content_moderation.py, gemini_voice_instructions.py, locales/{locale}.json).”

When I said “Add Korean support,” Claude Code would spawn multiple agents:

- One agent updating the frontend translation file
- Another adding Korean cases to `language_utils.py`
- Another updating `gemini_voice_instructions.py`
- Another adding Korean to `content_moderation.py`

All running simultaneously. When they finished, Claude Code merged the changes and ran tests.

This is why the checklist distinguishes “Required” files from “Optional” files—so Claude Code knows what to parallelize versus what can wait.

For the frontend translation file (~2,200 keys), the checklist specifies a chunking strategy:

- Chunk 1: `common`, `nav`, `bottomNav` (~110 keys)
- Chunk 2: `homepage`, `auth` (~180 keys)
- Chunk 3: `dashboard`, `create` (~330 keys)
- And so on…

This prevents token limit issues and lets Claude Code translate namespace by namespace, reviewing each against the style guide.
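
A sketch of that chunking, assuming `messages/{locale}.json` is organized by namespace. The helper itself is hypothetical; the namespace groupings come from the checklist:

```python
import json

# Namespace groupings from the checklist; extend with the remaining chunks
CHUNKS = [
    ["common", "nav", "bottomNav"],  # ~110 keys
    ["homepage", "auth"],            # ~180 keys
    ["dashboard", "create"],         # ~330 keys
]

def iter_translation_chunks(path: str = "messages/en.json"):
    with open(path, encoding="utf-8") as f:
        messages = json.load(f)
    for namespaces in CHUNKS:
        # Each yielded chunk is small enough for one agent pass:
        # translate, then review against the style guide.
        yield {ns: messages[ns] for ns in namespaces if ns in messages}
```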

The scope:

- 5 languages added
- ~2,500 translation strings total per language
- 10+ AI prompt templates per language
- 1 demo podcast generated per language
- Multiple parallel agents per language

The key insight: I didn’t write code. I didn’t write plans. I gave direction, reviewed output, and let Claude Code handle the rest—often in parallel.

## What’s Next

Together with English, the six languages cover roughly 2 billion potential users. That’s meaningful reach.

But I’m already thinking about what’s missing:

1. Arabic – Right-to-left layout support is complex

2. Hindi – Huge market, different script

3. Portuguese – Brazil is a massive podcast market

4. French – European expansion

5. Tagalog – Strong Filipino creator community

Each one requires the same level of care. Native localization, not translation. Cultural prompts, not copied templates.

The good news: the planning documents are reusable. Adding the next language should be even faster.

## Try It Yourself

DIALØGUE now supports:

- English – The original
- Vietnamese – Từ Ý Tưởng → Đến Podcast
- Japanese – アイデアを、カタチに
- Korean – 아이디어를, 현실로
- Spanish – De la Idea, al Podcast
- Chinese – 从想法,到播客

Create a podcast in your language →

The UI will follow your chosen language preference. And remember: you can generate podcasts in any of these languages regardless of your UI setting.

New users get 2 free podcasts to test it out.

## The Takeaways

For builders using AI coding assistants:

1. Direct, don’t write. I didn’t write the planning docs—I described what I wanted and Claude Code produced them. Then it followed its own plans.

2. Structure enables scale. Style guides, checklists, playbooks—these aren’t just for humans anymore. Structure your requirements so an AI agent can execute systematically.

3. Match the model to the task. For DIALØGUE:
   - Gemini 3 Flash for grounded, structured research
   - Claude Sonnet 4.5 for natural dialogue
   - Gemini 2.5 Flash TTS for multi-speaker audio

4. Localize, don’t translate. Your users can tell the difference. Encode cultural knowledge into your prompts and let the AI apply it consistently.
