DIALØGUE: How I Created an AI Podcast Generator with Human-in-the-Loop Design

The Spark: Why Build a Podcast Generator?

Well, here’s the thing – I love podcasts. As someone who spends way too much time in traffic (hello, fellow commuters!), I’ve always wondered: what if I could generate a podcast on any topic I’m curious about? Not just a boring AI voice reading Wikipedia, but an actual engaging conversation between AI hosts.

Plus, let’s be honest, after building several smaller projects and writing about my coding journey, I wanted to tackle something bigger. Something that would push me to learn new technologies and maybe, just maybe, create something useful for others. 😛

What Is DIALØGUE?

DIALØGUE is an early-stage application (alpha phase) that generates professional podcasts using AI. You give it a topic – anything from “Fed decisions and market impact in 2025” to “Understanding quantum computing for beginners” – and it creates a complete 20+ minute podcast with multiple AI voices having an actual conversation.

Here’s what makes it different from just having ChatGPT read you an article:

– Interactive outline review: This is the game-changer – before any research or writing happens, you get to review the proposed outline and shape it exactly how you want
– Research-based content: Uses a search engine (Perplexity AI) to research facts and current information
– Natural dialogue: Claude Sonnet 4 writes conversational scripts
– Multiple voices: High-quality neural voices with different host personalities

The whole process takes about 10 minutes from topic to finished audio file. Not bad for something that would take humans hours or days to produce!

The Tech Stack: A Serverless Adventure (That Evolved)

Alright, let’s get into the technical details (my favorite part!). Here’s what’s powering DIALØGUE:

Frontend

– Next.js 15 + React 19: Because I wanted to use the latest and greatest

– TypeScript: After getting burned by runtime errors one too many times

– Tailwind CSS: Makes styling so much easier for someone who’s not a design wizard

– Supabase JS Client: For auth and real-time updates (this was a game-changer)

Backend (Current – GCP)

– Cloud Run: 10+ containerized Python microservices with automatic scaling

– Cloud Workflows: Orchestrates pre-feedback (outline) and post-feedback (generation) workflows

– Cloud Storage: Audio file storage with CDN delivery

– API Gateway: Single entry point with CORS and authentication

– Supabase: PostgreSQL database with Row Level Security and Edge Functions

*Note: Originally built on AWS Lambda/Step Functions, but migrated to GCP in July 2025 for better performance and a 92% cost reduction in audio generation.*

AI Services

– Claude Sonnet 4: Script generation with temperature 0 for JSON reliability (direct Anthropic API)

– Perplexity AI: Research and fact-checking for each segment

– OpenAI TTS: High-quality neural voices for natural conversation

– Content Moderation: Anthropic’s built-in safety checks
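
Temperature 0 makes malformed output much rarer, but a script pipeline still benefits from checking what comes back. Here is a minimal sketch of that idea, not the code DIALØGUE actually runs: the `speaker`/`text` schema, the function names, and the retry count are all assumptions for illustration, and the model call is passed in as a plain callable so that no real API is implied.

```python
import json
from typing import Callable

# Hypothetical schema: each dialogue turn names a speaker and carries text.
REQUIRED_KEYS = {"speaker", "text"}


def parse_script(raw: str) -> list[dict]:
    """Parse model output and check it has the shape the pipeline expects."""
    data = json.loads(raw)  # raises json.JSONDecodeError, a ValueError subclass
    if not isinstance(data, list) or not data:
        raise ValueError("script must be a non-empty JSON array")
    for index, turn in enumerate(data):
        if not isinstance(turn, dict) or not REQUIRED_KEYS.issubset(turn):
            raise ValueError(f"turn {index} is missing one of {sorted(REQUIRED_KEYS)}")
    return data


def generate_script(generate: Callable[[], str], max_attempts: int = 3) -> list[dict]:
    """Call `generate` until it returns a valid script or attempts run out."""
    last_error = None
    for _ in range(max_attempts):
        try:
            return parse_script(generate())
        except ValueError as err:  # covers both bad JSON and bad structure
            last_error = err
    raise RuntimeError(f"no valid script after {max_attempts} attempts") from last_error
```

In this sketch `generate` would wrap whatever client call produces the script text. Keeping validation separate from the call means the same check works regardless of which model or temperature is behind it.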

Key Features and the User Journey

Here’s how it works from a user’s perspective:

1. Enter a topic: Simple text input, nothing fancy

2. AI generates an outline: Takes about 1 minute – you’ll see the proposed structure and segments

3. Review and shape your podcast: This is where DIALØGUE really shines! You can:

– Redirect the focus (“Make it more beginner-friendly”)

– Add missing context (“Include the recent 2025 developments”)

– Remove or modify segments (“Skip the technical jargon in segment 3”)

– Completely change direction if the AI misunderstood your intent

4. Generate the full podcast: Once you approve the outline, generation takes ~6-10 minutes

5. Download and enjoy: MP3 file ready for your commute

Behind the scenes, it’s doing a lot more:

– Breaking the topic into segments

– Waiting for your approval before resource-intensive operations (no wasted credits on unwanted content!)

– Researching each segment with specific queries

– Writing natural dialogue between two AI hosts

– Handling errors gracefully (and refunding credits if things go wrong)

– Real-time progress updates so you know what’s happening
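
The approval gate in that list can be sketched as a tiny state machine. This is an illustration under stated assumptions rather than the real Cloud Workflows definition: the stage names, the `PodcastJob` class, and the placeholder outline and script strings are all hypothetical. The point is only that the expensive step refuses to run until the outline has been approved.

```python
from dataclasses import dataclass, field
from enum import Enum


class Stage(Enum):
    AWAITING_REVIEW = "awaiting_review"
    APPROVED = "approved"
    COMPLETE = "complete"


@dataclass
class PodcastJob:
    topic: str
    outline: list[str]
    stage: Stage = Stage.AWAITING_REVIEW
    feedback: list[str] = field(default_factory=list)


def draft_outline(topic: str) -> PodcastJob:
    """Cheap pre-feedback step: propose segments, then stop for review."""
    segments = [f"Introduction to {topic}", f"Key ideas in {topic}", "Wrap-up"]
    return PodcastJob(topic=topic, outline=segments)


def revise(job: PodcastJob, note: str, new_outline: list[str]) -> None:
    """Record the user's feedback and swap in the revised outline."""
    if job.stage is not Stage.AWAITING_REVIEW:
        raise RuntimeError("the outline can only change while awaiting review")
    job.feedback.append(note)
    job.outline = list(new_outline)


def approve(job: PodcastJob) -> None:
    """Mark the outline as accepted so generation is allowed to start."""
    if job.stage is not Stage.AWAITING_REVIEW:
        raise RuntimeError("only an outline awaiting review can be approved")
    job.stage = Stage.APPROVED


def generate(job: PodcastJob) -> list[str]:
    """Expensive post-feedback step: refuses to run without approval."""
    if job.stage is not Stage.APPROVED:
        raise PermissionError("approve the outline before generating")
    scripts = [f"[script for segment: {title}]" for title in job.outline]
    job.stage = Stage.COMPLETE
    return scripts
```

A caller would run `draft_outline`, loop on `revise` for as long as the user has notes, then call `approve` followed by `generate`. Calling `generate` any earlier raises, which is the "no wasted credits" property in code form.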

The Good, The Challenging, and The “Oh No” Moments

The Good

– The outline review feature: Users love being able to shape their podcast before generation starts. It’s like having a conversation with your AI producer!

– 10x performance improvement by switching to direct Supabase queries (450ms → 45ms)

– Instant user signup: Fixed the 3-minute delay bug with atomic Edge Functions (now < 500ms)

– Automatic credit refunds when generation fails via database triggers

– Real-time updates that actually work (thanks, Supabase!)

– 92% cost reduction in audio generation after GCP migration

– Clean database-first architecture after removing Lambda legacy code
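
To make the trigger-based refund concrete, here is a small self-contained sketch. It uses Python's built-in `sqlite3` as a stand-in for the Supabase PostgreSQL database, and the table names, column names, and status values are assumptions, not the real schema. What it shows is the pattern: the refund lives in the database, so it fires no matter which service marks a podcast as failed.

```python
import sqlite3

# Hypothetical schema; SQLite stands in for PostgreSQL here.
SCHEMA = """
CREATE TABLE users (id INTEGER PRIMARY KEY, credits INTEGER NOT NULL);
CREATE TABLE podcasts (
    id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(id),
    status TEXT NOT NULL DEFAULT 'generating'
);
-- Refund exactly once, on the transition into 'failed'.
CREATE TRIGGER refund_on_failure
AFTER UPDATE OF status ON podcasts
WHEN NEW.status = 'failed' AND OLD.status <> 'failed'
BEGIN
    UPDATE users SET credits = credits + 1 WHERE id = NEW.user_id;
END;
"""


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def start_podcast(conn: sqlite3.Connection, user_id: int) -> int:
    """Spend one credit and create the podcast row in one transaction."""
    with conn:  # commits on success, rolls back on any exception
        spent = conn.execute(
            "UPDATE users SET credits = credits - 1 WHERE id = ? AND credits > 0",
            (user_id,),
        )
        if spent.rowcount != 1:
            raise ValueError("no credits left")
        created = conn.execute("INSERT INTO podcasts (user_id) VALUES (?)", (user_id,))
        return created.lastrowid


def credits(conn: sqlite3.Connection, user_id: int) -> int:
    row = conn.execute("SELECT credits FROM users WHERE id = ?", (user_id,)).fetchone()
    return row[0]
```

The `OLD.status <> 'failed'` guard matters: without it, a retry that writes `failed` a second time would hand out a second refund.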

The Challenging (Now Solved!)

– AWS Lambda layer hell: Import errors, 250MB size limits (solved by GCP migration)

– JWT security migration: Upgraded from HS256 (shared secret) to ES256 (ECDSA on the P-256 curve) while maintaining backward compatibility

– AI temperature settings: Claude at 0.7 was generating invalid JSON 30% of the time (fixed with temperature 0)

– WebSocket memory leaks: React components were leaking 50MB/hour (fixed with RealtimeManager)

– Database race conditions: New users waited 3 minutes due to replication lag (fixed with atomic operations)

– Credit system complexity: Simplified from dual credits to single type
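
The "atomic operations" fix for signup is easiest to see in a sketch. The one below is an assumption-heavy illustration, not the actual Edge Function: it uses `sqlite3` in place of Supabase, and the `profiles` and `credits` tables are hypothetical. The idea is that the profile and its free credits are written in one transaction, so a new account is either fully set up or not there at all.

```python
import sqlite3
from typing import Optional

FREE_CREDITS = 2  # the post says new users get 2 free credits


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # Hypothetical tables for illustration only.
    conn.executescript(
        """
        CREATE TABLE profiles (user_id TEXT PRIMARY KEY, email TEXT NOT NULL UNIQUE);
        CREATE TABLE credits (user_id TEXT PRIMARY KEY, balance INTEGER NOT NULL);
        """
    )
    return conn


def sign_up(conn: sqlite3.Connection, user_id: str, email: str) -> None:
    """Create the profile and grant free credits as a single unit of work."""
    with conn:  # commits if both inserts succeed, rolls back if either fails
        conn.execute(
            "INSERT INTO profiles (user_id, email) VALUES (?, ?)", (user_id, email)
        )
        conn.execute(
            "INSERT INTO credits (user_id, balance) VALUES (?, ?)",
            (user_id, FREE_CREDITS),
        )


def balance(conn: sqlite3.Connection, user_id: str) -> Optional[int]:
    row = conn.execute(
        "SELECT balance FROM credits WHERE user_id = ?", (user_id,)
    ).fetchone()
    return None if row is None else row[0]
```

If the second insert fails, the first is rolled back with it, so nothing downstream ever has to wait for a half-created account to catch up.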

The “Oh No” Moments

– That time I accidentally stored critical workflow data in the wrong place

– When I realized mysterious browser errors were from my own code exhausting resources

– Discovering security vulnerabilities during a routine audit (all fixed now!)

What I Learned (Spoiler: A Lot)

This project pushed me way outside my comfort zone, and I learned tons:

1. User control is crucial: The outline review feature wasn’t in my original design, but it became the most important feature. Letting users shape content before generation starts saves time, credits, and frustration

2. Start simple, migrate when needed: I began with AWS Lambda but hit complexity walls (layer import errors, the 250MB size limit) – the migration to Cloud Run removed them

3. Direct database queries can be faster: My 10x performance improvement came from ditching unnecessary API layers

4. AI costs add up: Running multiple AI services for a single podcast requires careful cost management

5. User experience matters: Adding progress indicators and time estimates made a huge difference

6. Security is never “done”: Regular audits revealed issues I never would have thought of

7. Infrastructure as Code has gotchas: SAM’s quirks taught me a lot (like SSMParameterReadPolicy adding extra slashes!)

8. Cloud migrations can be surprisingly fast: With AI pair programming, we migrated from AWS to GCP in just one day!

Current Status and What’s Next

DIALØGUE is now live! It’s in alpha phase with:

– 2 free credits for new users

– Credit packs available:

  – Starter: $4.99 for 4 podcasts

  – Pro: $9.99 for 9 podcasts

  – Bulk: $19.99 for 18 podcasts

– No refunds during alpha (except automatic refunds for technical failures)
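
For a quick comparison of the packs, the per-podcast price is just the pack price divided by the podcast count. A small sketch using the prices listed above (the `PACKS` dictionary and the function name are only for illustration):

```python
# Pack name -> (price in USD, number of podcasts), as listed above.
PACKS = {
    "Starter": (4.99, 4),
    "Pro": (9.99, 9),
    "Bulk": (19.99, 18),
}


def price_per_podcast(price: float, count: int) -> float:
    """Return the cost of one podcast, rounded to the nearest cent."""
    return round(price / count, 2)


for name, (price, count) in PACKS.items():
    print(f"{name}: ${price_per_podcast(price, count):.2f} per podcast")
# prints:
# Starter: $1.25 per podcast
# Pro: $1.11 per podcast
# Bulk: $1.11 per podcast
```

Rounded to the cent, Pro and Bulk come out at the same per-podcast rate, so the difference between them is how many podcasts you buy at once.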

Want to Try It?

I’d love for you to give it a try! Head over to podcast.chandlernguyen.com and create your first AI podcast. The first 2 are free, so you’ve got nothing to lose.

Fair warning: it’s still in alpha, so things might break. But hey, that’s part of the fun, right? If you do run into issues, there’s a feedback feature built right into the app (only for logged-in users – had to add that after some spam issues).

Final Thoughts

Building DIALØGUE has been one of the most challenging and rewarding projects I’ve tackled. It combined everything I’ve been learning – from AWS Lambda functions to React components to AI prompt engineering – and even led to an unexpected cloud migration journey.

The most surprising discovery? That outline review step I mentioned earlier. Initially, I thought users would just want to input a topic and get a podcast. But in testing, I realized that giving users control over the direction before the main generation process starts makes all the difference. It transforms the tool from a black box into a collaborative AI assistant.

Is it perfect? Nope. Is it useful? I think so! At the very least, it’s been an incredible learning journey, and I’m excited to see where it goes from here.

What would you create a podcast about? I’m genuinely curious – drop me a message or try it out yourself. Who knows, with the ability to shape and guide the content, your AI-generated podcast might be exactly what you’re looking for. 😛

Want the technical deep-dive? Follow the full journey:

Engineering lessons learned building DIALØGUE: My journey from advertising to engineering, and why complexity is the enemy
One AI Parameter Change Cost Me $54/Month: How a single temperature setting during AWS → GCP migration caused major inefficiencies
