DIALØGUE: Building an AI-Powered Podcast Generator from Scratch (And Learning a Ton Along the Way)

The Spark: Why Build a Podcast Generator?

Well, here’s the thing – I love podcasts. As someone who spends way too much time in traffic (hello, fellow commuters!), I’ve always wondered: what if I could generate a podcast on any topic I’m curious about? Not just a boring AI voice reading Wikipedia, but an actual engaging conversation between AI hosts.

Plus, let’s be honest, after building several smaller projects and writing about my coding journey, I wanted to tackle something bigger. Something that would push me to learn new technologies and maybe, just maybe, create something useful for others. 😛

What Is DIALØGUE?

DIALØGUE is an early-stage application (alpha phase) that generates professional podcasts using AI. You give it a topic – anything from “Fed decisions and market impact in 2025” to “Understanding quantum computing for beginners” – and it creates a complete 20+ minute podcast with multiple AI voices having an actual conversation.

Here’s what makes it different from just having ChatGPT read you an article:

  • Interactive outline review: This is the game-changer – before any research or writing happens, you get to review the proposed outline and shape it exactly how you want
  • Research-based content: Uses a search engine to research facts and current information
  • Natural dialogue: Claude Sonnet 4 writes conversational scripts
  • Multiple voices: High-quality neural voices with different host personalities

The whole process takes about 10 minutes from topic to finished audio file. Not bad for something that would take humans hours or days to produce!

The Tech Stack: A Serverless Adventure (That Evolved)

Alright, let’s get into the technical details (my favorite part!). Here’s what’s powering DIALØGUE:

Frontend

Next.js 15 + React 19: Because I wanted to use the latest and greatest

TypeScript: After getting burned by runtime errors one too many times

Tailwind CSS: Makes styling so much easier for someone who’s not a design wizard

Supabase JS Client: For auth and real-time updates (this was a game-changer)

Backend (Current – GCP)

Cloud Run: 10+ containerized Python microservices with automatic scaling

Cloud Workflows: Orchestrates pre-feedback (outline) and post-feedback (generation) workflows

Cloud Storage: Audio file storage with CDN delivery

API Gateway: Single entry point with CORS and authentication

Supabase: PostgreSQL database with Row Level Security and Edge Functions

Note: Originally built on AWS Lambda/Step Functions, but migrated to GCP in July 2025 for better performance and a 92% cost reduction in audio generation.

AI Services

Claude Sonnet 4: Script generation with temperature 0 for JSON reliability (direct Anthropic API)

Perplexity AI: Research and fact-checking for each segment

OpenAI TTS: High-quality neural voices for natural conversation

Content Moderation: Anthropic’s built-in safety checks
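Temperature 0 makes malformed JSON much less likely, but it still pays to validate whatever comes back before handing it to the audio step. Here is a minimal sketch of that idea in Python. The script schema (a list of turns with `speaker` and `text` keys) and the `call_model` callable are assumptions for illustration; the post does not show the real format or the actual Anthropic API call.

```python
import json
from typing import Callable

# Hypothetical schema: each dialogue turn needs these keys.
REQUIRED_KEYS = {"speaker", "text"}


def parse_script(raw: str) -> list:
    """Parse model output and check it is a non-empty list of dialogue turns."""
    data = json.loads(raw)  # malformed JSON raises json.JSONDecodeError (a ValueError)
    if not isinstance(data, list) or not data:
        raise ValueError("expected a non-empty JSON array of dialogue turns")
    for i, turn in enumerate(data):
        if not isinstance(turn, dict) or not REQUIRED_KEYS.issubset(turn):
            raise ValueError(f"turn {i} is missing one of {sorted(REQUIRED_KEYS)}")
    return data


def generate_script(call_model: Callable[[], str], max_attempts: int = 3) -> list:
    """Call the model until its output validates, up to max_attempts times."""
    last_error = None
    for _ in range(max_attempts):
        try:
            return parse_script(call_model())
        except ValueError as err:
            last_error = err  # remember why the attempt failed, then retry
    raise RuntimeError(f"no valid script after {max_attempts} attempts") from last_error
```

In a real service, `call_model` would wrap the API request; keeping it as a plain callable makes the retry logic easy to test without network access.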

Key Features and the User Journey

Here’s how it works from a user’s perspective:

1. Enter a topic: Simple text input, nothing fancy

2. AI generates an outline: Takes about 1 minute – you’ll see the proposed structure and segments

3. Review and shape your podcast: This is where DIALØGUE really shines! You can:

– Redirect the focus (“Make it more beginner-friendly”)

– Add missing context (“Include the recent 2025 developments”)

– Remove or modify segments (“Skip the technical jargon in segment 3”)

– Completely change direction if the AI misunderstood your intent

4. Generate the full podcast: Once you approve the outline, generation takes ~6-10 minutes

5. Download and enjoy: MP3 file ready for your commute

Behind the scenes, it’s doing a lot more:

– Breaking the topic into segments

– Waiting for your approval before resource-intensive operations (no wasted credits on unwanted content!)

– Researching each segment with specific queries

– Writing natural dialogue between two AI hosts

– Handling errors gracefully (and refunding credits if things go wrong)

– Real-time progress updates so you know what’s happening
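The split between the cheap outline phase and the expensive generation phase can be sketched roughly like this. It is an illustration only: the real system runs these phases as Cloud Workflows across separate services, and every name here (`Job`, `draft_outline`, `apply_feedback`, `generate`) is hypothetical, with placeholder strings standing in for the LLM, research, and TTS calls.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Job:
    topic: str
    outline: list = field(default_factory=list)
    approved: bool = False
    log: list = field(default_factory=list)  # stands in for real-time progress updates


def draft_outline(job: Job) -> None:
    """Pre-feedback phase: cheap, runs before the user has approved anything."""
    # Placeholder segments; a real service would ask an LLM for these.
    job.outline = [f"{job.topic}: intro", f"{job.topic}: deep dive", f"{job.topic}: wrap-up"]
    job.log.append("outline ready")


def apply_feedback(job: Job, revised_outline: Optional[list] = None) -> None:
    """The user either edits the outline or accepts it as-is; both count as approval."""
    if revised_outline is not None:
        job.outline = revised_outline
    job.approved = True
    job.log.append("outline approved")


def generate(job: Job) -> list:
    """Post-feedback phase: research, script, and audio for each approved segment."""
    if not job.approved:
        raise PermissionError("outline must be approved before generation starts")
    results = []
    for segment in job.outline:
        job.log.append(f"researching: {segment}")
        job.log.append(f"writing dialogue: {segment}")
        results.append(f"audio for {segment}")  # placeholder for a TTS call
    job.log.append("done")
    return results
```

The point of the `approved` check is that nothing costly can run by accident: calling `generate` on an unapproved job fails immediately instead of spending credits.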

The Good, The Challenging, and The “Oh No” Moments

The Good

The outline review feature: Users love being able to shape their podcast before generation starts. It’s like having a conversation with your AI producer!

10x performance improvement by switching to direct Supabase queries (450ms → 45ms)

Instant user signup: Fixed the 3-minute delay bug with atomic Edge Functions (now < 500ms)

Automatic credit refunds when generation fails via database triggers

Real-time updates that actually work (thanks, Supabase!)

92% cost reduction in audio generation after GCP migration

Clean database-first architecture after removing Lambda legacy code
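To give a feel for how a trigger-based refund can work, here is a small self-contained sketch. It uses SQLite from Python's standard library rather than Supabase's PostgreSQL, and the table and column names are made up for the example, so treat it as an illustration of the pattern and not the production schema.

```python
import sqlite3

SCHEMA = """
CREATE TABLE users (id INTEGER PRIMARY KEY, credits INTEGER NOT NULL);
CREATE TABLE podcasts (
    id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(id),
    status TEXT NOT NULL DEFAULT 'pending'
);

-- Refund exactly once: only on the transition into 'failed'.
CREATE TRIGGER refund_on_failure
AFTER UPDATE OF status ON podcasts
WHEN NEW.status = 'failed' AND OLD.status <> 'failed'
BEGIN
    UPDATE users SET credits = credits + 1 WHERE id = NEW.user_id;
END;
"""


def demo() -> int:
    """Spend a credit, fail the podcast twice, and return the final balance."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO users (id, credits) VALUES (1, 2)")
    # Starting a podcast spends one credit.
    conn.execute("UPDATE users SET credits = credits - 1 WHERE id = 1")
    conn.execute("INSERT INTO podcasts (id, user_id) VALUES (10, 1)")
    # Generation fails: the trigger puts the credit back.
    conn.execute("UPDATE podcasts SET status = 'failed' WHERE id = 10")
    # A repeated failure write must not refund a second time.
    conn.execute("UPDATE podcasts SET status = 'failed' WHERE id = 10")
    (credits,) = conn.execute("SELECT credits FROM users WHERE id = 1").fetchone()
    conn.close()
    return credits
```

Putting the refund in the database means no service has to remember to issue it: any code path that marks a podcast as failed gets the refund for free, and the `OLD.status <> 'failed'` guard keeps it from firing twice.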

The Challenging (Now Solved!)

AWS Lambda layer hell: Import errors, 250MB size limits (solved by GCP migration)

JWT security migration: Upgraded from HS256 to ES256 (ECDSA on the P-256 curve) while maintaining backward compatibility

AI temperature settings: Claude at 0.7 was generating invalid JSON 30% of the time (fixed with temperature 0)

WebSocket memory leaks: React components were leaking 50MB/hour (fixed with RealtimeManager)

Database race conditions: New users waited 3 minutes due to replication lag (fixed with atomic operations)

Credit system complexity: Simplified from dual credits to single type
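The post does not show how RealtimeManager works, so the following is only a guess at the general pattern: share one underlying channel per key, and close it when the last listener goes away, so unmounted components cannot leave connections dangling. The real fix lives in the React frontend; this sketch is in Python purely to show the bookkeeping, and `open_channel` is a hypothetical callable that opens a connection and returns a function to close it.

```python
from typing import Callable, Dict

Listener = Callable[[dict], None]


class RealtimeManager:
    """Shares one channel per key and closes it when the last listener leaves."""

    def __init__(self, open_channel: Callable[[str], Callable[[], None]]):
        self._open_channel = open_channel
        self._listeners: Dict[str, Dict[object, Listener]] = {}
        self._closers: Dict[str, Callable[[], None]] = {}

    def subscribe(self, key: str, listener: Listener) -> Callable[[], None]:
        if key not in self._listeners:
            # First listener for this key: open the underlying channel once.
            self._listeners[key] = {}
            self._closers[key] = self._open_channel(key)
        token = object()  # unique per subscription, even if a callback is reused
        self._listeners[key][token] = listener

        def unsubscribe() -> None:
            listeners = self._listeners.get(key)
            if listeners is None or token not in listeners:
                return  # already unsubscribed; calling twice is harmless
            del listeners[token]
            if not listeners:
                # Last listener gone: release the channel instead of leaking it.
                del self._listeners[key]
                self._closers.pop(key)()

        return unsubscribe

    def dispatch(self, key: str, event: dict) -> None:
        # Copy the listeners so one can unsubscribe while handling an event.
        for listener in list(self._listeners.get(key, {}).values()):
            listener(event)

    @property
    def open_channels(self) -> int:
        return len(self._closers)
```

In a React component, the function returned by `subscribe` is the kind of thing you would return from an effect's cleanup, so every mount is paired with exactly one teardown.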

The “Oh No” Moments

– That time I accidentally stored critical workflow data in the wrong place

– When I realized mysterious browser errors were from my own code exhausting resources

– Discovering security vulnerabilities during a routine audit (all fixed now!)

What I Learned (Spoiler: A Lot)

This project pushed me way outside my comfort zone, and I learned tons:

1. User control is crucial: The outline review feature wasn’t in my original design, but it became the most important feature. Letting users shape content before generation starts saves time, credits, and frustration

2. Start simple, migrate when needed: We began with AWS Lambda but hit complexity walls – the migration to Cloud Run solved everything

3. Direct database queries can be faster: My 10x performance improvement came from ditching unnecessary API layers

4. AI costs add up: Running multiple AI services for a single podcast requires careful cost management

5. User experience matters: Adding progress indicators and time estimates made a huge difference

6. Security is never “done”: Regular audits revealed issues I never would have thought of

7. Infrastructure as Code has gotchas: SAM’s quirks taught me a lot (like SSMParameterReadPolicy adding extra slashes!)

8. Cloud migrations can be surprisingly fast: With AI pair programming, we migrated from AWS to GCP in just one day!

Current Status and What’s Next

DIALØGUE is now live! It’s in alpha phase with:

– 2 free credits for new users

– Credit packs available:

  – Starter: $4.99 for 4 podcasts

  – Pro: $9.99 for 9 podcasts

  – Bulk: $19.99 for 18 podcasts

– No refunds during alpha (except automatic refunds for technical failures)

Want to Try It?

I’d love for you to give it a try! Head over to podcast.chandlernguyen.com and create your first AI podcast. The first 2 are free, so you’ve got nothing to lose.

Fair warning: it’s still in alpha, so things might break. But hey, that’s part of the fun, right? If you do run into issues, there’s a feedback feature built right into the app (only for logged-in users – had to add that after some spam issues).

Final Thoughts

Building DIALØGUE has been one of the most challenging and rewarding projects I’ve tackled. It combined everything I’ve been learning – from AWS Lambda functions to React components to AI prompt engineering – and even led to an unexpected cloud migration journey.

The most surprising discovery? That outline review step I mentioned earlier. Initially, I thought users would just want to input a topic and get a podcast. But in testing, I realized that giving users control over the direction before the main generation process starts makes all the difference. It transforms the tool from a black box into a collaborative AI assistant.

Is it perfect? Nope. Is it useful? I think so! At the very least, it’s been an incredible learning journey, and I’m excited to see where it goes from here.

What would you create a podcast about? I’m genuinely curious – drop me a message or try it out yourself. Who knows, with the ability to shape and guide the content, your AI-generated podcast might be exactly what you’re looking for. 😛
