I Built Multi-Tenancy on Day 2. On Day 67, I Rebuilt It

October 27, 2025, 11:47 PM. I’m running what I thought would be a routine security audit on STRAŦUM. Everything’s been working fine for weeks—SMEs have their data, agencies have theirs, multi-tenancy is solid.

Coffee’s cold. The audit script is churning through logs. Then I see it: Agencies were writing to SME tables.

Not through a bug. Not through a security hole. Through the architecture itself.

Two months ago, I made the decision to build multi-tenant architecture from Day 2. Bold move for a solo founder with one working AI agent. I added `org_id` to every table, wrote RLS policies, built separate routing for SMEs and Agencies. It was working—SMEs had their campaigns, agencies had their clients, data was flowing to the right places.

Or so I thought.

I sat there for probably 20 minutes just staring at the schema. How did I miss this? I’d spent weeks building multi-tenant architecture, writing 83 RLS policies, testing with both SME and Agency accounts. Everything *worked*. But “working” and “correct” aren’t the same thing.

This is the kind of bug that makes you question whether you should be building software at all. Because it’s not a typo. It’s not a missed edge case. It’s architectural naivety.

I’d made the classic mistake: I assumed `org_id` filtering was enough for multi-tenant isolation. It wasn’t.

This is the story of discovering that true multi-tenant isolation requires more than just adding `org_id` to every table—and the 33 migrations over 48 hours that finally solved it.

> **Note**: SQL examples in this post use genericized schema and table names (`tenant_b`, `workspace_entities`, `entity_data`) for security. The concepts remain the same regardless of your specific naming conventions.

The Problem: Not All Tenants Are Equal

Here’s what I initially built:

```sql
-- Brand guidelines table (shared by SMEs and Agencies)
CREATE TABLE brand_guidelines (
  id UUID PRIMARY KEY,
  org_id UUID REFERENCES organizations(id),
  name TEXT,
  guidelines JSONB,
  created_at TIMESTAMPTZ DEFAULT NOW()
);

-- RLS policy (seems safe)
CREATE POLICY brand_guidelines_org_isolation ON brand_guidelines
  FOR ALL TO authenticated
  USING (org_id = get_user_org_id());
```

This works perfectly for SMEs. Each organization has its own brand guidelines. Row-Level Security ensures they can’t see each other’s data. (At least, I thought it did. More on that in a second.)

But Agencies are different.

Agencies don’t just have one set of brand guidelines. They have one per client:

– Client A’s brand guidelines (vibrant color palette, bold typography, innovation-focused messaging)

– Client B’s brand guidelines (muted color palette, minimal design, quality-focused positioning)

Same agency, different clients, completely different brands.

The naive solution (what I built first):

```sql
-- Add entity_id to the shared table
ALTER TABLE brand_guidelines ADD COLUMN entity_id UUID;

-- Update RLS policy
CREATE POLICY brand_guidelines_isolation ON brand_guidelines
  FOR ALL TO authenticated
  USING (
    organization_id = get_user_org_id() AND
    (entity_id IS NULL OR entity_id = get_user_entity_id())
  );
```

Problem: This created a table with two different data models:

```
SME row:    organization_id='org-123',  entity_id=NULL,      guidelines={...}
Agency row: organization_id='org-456',  entity_id='entity-a', guidelines={...}
Agency row: organization_id='org-456',  entity_id='entity-b', guidelines={...}
```

Queries became a mess with complex `NULL` handling, and every feature needed “if SME, else Agency” logic in application code.

And yes, I wrote all of these before realizing they were symptoms of a deeper problem. Weeks of work, all pointing to the same conclusion: I’d architected myself into a corner.

Every feature needed custom logic: “If SME, do this. If Agency, do that.”

Worse, the architecture made incorrect assumptions:

– Agencies writing `entity_id=NULL` would pollute SME data

– SMEs couldn’t have sub-entities even if they wanted sub-accounts

– Schema became “Swiss cheese” with nullable columns

This wasn’t multi-tenant architecture. This was a single table trying to serve two different data models.

The Revelation: Different Tenants Need Different Schemas

By late October, I realized the truth: SMEs and Agencies don’t share the same data model.

SME data model:

```
organization → campaigns → agent_outputs
```

Agency data model:

```
organization → workspace_entities → campaigns → agent_outputs
                   ↓
            entity_data (e.g., brand guidelines, personas)
```

Agencies have an entire layer (workspace entities) that SMEs don’t have. They also have entity-specific intelligence that shouldn’t exist in the SME world.

The solution: Separate database schemas.

```sql
-- SME tables (public schema)
public.brand_guidelines
public.campaigns
public.outputs

-- Agency tables (tenant_b schema)
tenant_b.workspace_entities
tenant_b.entity_data  -- Includes brand guidelines, personas, etc.
tenant_b.campaigns
tenant_b.outputs
```

Now SMEs and Agencies have completely different tables. No shared schema. No nullable `entity_id` pollution. No “if SME, else Agency” logic.

Why This Matters: The Business Case for Schema Routing

Before diving into the technical implementation, let’s talk about why this architectural decision matters beyond “it’s cleaner code.”

Future-Proofing for Growth (Maybe)

Schema routing isn’t just about solving today’s problem. It’s about keeping doors open for opportunities I can’t even predict yet.

I’m still in private alpha with 15 users. I don’t have enterprise customers. I haven’t talked to a GDPR lawyer. But here’s what schema routing *could* enable if STRAŦUM grows:

International Expansion:

– If we expand to EU: Separate schemas could enable data residency (EU client data in `eu_agency` schema on EU servers)

– Right to deletion becomes simpler: Query one schema, not filter through mixed tables

– Audit trails: “Show me all Client X data” = one schema query

Compliance Conversations:

– When someone eventually asks “How do you guarantee data isolation?”

– With `org_id` filtering: “We use Row-Level Security policies” (vague, hard to verify)

– With schema routing: “Each client’s data lives in a separate database schema” (concrete, auditable)

– I don’t know if this matters yet. But it *could* matter if we get those conversations.

The Honest Truth:

I’m not building for HIPAA or SOC 2 compliance right now. I’m building for SMEs and small agencies who need better marketing strategy.

But schema routing means if someone asks “Can you handle healthcare clients?” or “Do you support data residency?” someday, the answer is “yes, let me show you the architecture” instead of “let me rebuild everything first.”

The Downsides (Being Honest)

Schema routing isn’t all upside. Here’s what it actually costs:

Development Complexity:

– Every WRITE operation needs a router function

– Every READ operation needs a security view

– Testing requires both SME and Agency paths

– With Claude Code: 2 days of intense work (Oct 27-29, 2025) in the evenings

– Without AI tools: Would’ve been weeks

Migration Risk:

– 33 sequential migrations = 33 opportunities for typos

– One wrong `ALTER TABLE` = production data corruption

– Had to run each migration 3X on staging before touching prod

– The paranoia was real

Query Performance Overhead:

– Views with `UNION ALL` = slightly slower reads

– Router functions = extra function call on writes

– RLS + views = more complex query plans

– (In practice: I haven’t noticed slowdowns yet, but I also only have 15 alpha users)

Operational Complexity:

– Schema migrations now affect 2+ schemas (public + agency)

– Database backups need schema-aware restore

– Monitoring queries need to check multiple schemas

– This will bite me eventually, I just don’t know when

Why I Made the Trade-Off Anyway

The option value might be enormous. Or it might not matter at all.

Schema routing keeps doors open that I’m not even sure I want to walk through:

White-label partnerships: Could give a partner their own schema, rebrand the UI

Reseller opportunities: Agencies could resell with provable data isolation

Different pricing tiers: “Premium” customers could get dedicated schemas

Geographic expansion: EU schema, US schema, APAC schema – same codebase

Here’s the thing: I’m in private alpha. I don’t know if any of these will matter. Maybe I’ll never get a white-label request. Maybe geographic expansion is years away. Maybe the whole business pivots and none of this is relevant.

But here’s what I do know: with schema routing, these options exist. With `org_id` filtering, most of them would require a complete rewrite.

That’s the bet I made: Spend 2 extra days now (with Claude Code) to keep options open later.

Is it the right bet? Ask me in a year.

The Architecture: Schema Routing

Pattern 1: Schema-Specific Tables

Some tables only exist for one tenant type:

```sql
-- Specialized tenant schema
CREATE SCHEMA tenant_b;

-- Workspace entities (specific to this tenant type)
CREATE TABLE tenant_b.workspace_entities (
  id UUID PRIMARY KEY,
  organization_id UUID,
  name TEXT,
  metadata JSONB
);

-- Entity-specific data
CREATE TABLE tenant_b.entity_data (
  id UUID PRIMARY KEY,
  organization_id UUID,
  entity_id UUID REFERENCES tenant_b.workspace_entities(id),
  data_type TEXT,
  content JSONB
);
```

SMEs never touch these tables. They don’t exist in `public` schema.

Pattern 2: Database Router Functions

How do you write to the correct schema? **Router functions**.

Here’s the concept (simplified):

```sql
CREATE FUNCTION save_resource_routed(params)
RETURNS JSONB
LANGUAGE plpgsql
SECURITY DEFINER  -- Run with elevated privileges
AS $$
BEGIN
  -- Step 1: Detect organization type
  SELECT type INTO org_type FROM organizations WHERE id = p_org_id;

  -- Step 2: Route to correct schema based on type
  IF org_type = 'TENANT_B' THEN
    INSERT INTO tenant_b.entity_data (...) VALUES (...);
  ELSE
    INSERT INTO public.brand_guidelines (...) VALUES (...);
  END IF;

  RETURN result;
END;
$$;
```

How it works:

1. Detect org type: Query organizations table to determine tenant type

2. Route to correct schema: Write to appropriate schema based on type

3. Return result: Include which schema was used for debugging

Application code (same for all tenant types):

```typescript
// Just call the router function - no tenant-specific logic
const result = await supabase.rpc('save_resource_routed', {
  p_org_id: orgId,
  p_entity_id: entityId,  // null for simple tenants
  p_data: { ... }
});
```

No if/else in application code. The database does the routing.

Writing my first router function took 4 hours. Debugging why it wasn’t working? Another 6 hours. The problem? I’d forgotten to grant EXECUTE permissions. Classic solo founder energy: architectural brilliance, permission oversights.

Pattern 3: Security-Invoker Views for Reads

Writing uses router functions. Reading uses **views**.

```sql
-- Unified view combining both schemas
CREATE VIEW resources_unified
WITH (security_invoker = on)  -- Respects RLS policies
AS
  SELECT id, organization_id, NULL AS entity_id, data, 'public' AS source
  FROM public.brand_guidelines
UNION ALL
  SELECT id, organization_id, entity_id, content AS data, 'tenant_b' AS source
  FROM tenant_b.entity_data
  WHERE data_type = 'brand_guidelines';
```

Application code (unified reads):

```typescript
// Read resources (works for all tenant types)
const { data } = await supabase
  .from('resources_unified')
  .select('*')
  .eq('organization_id', orgId);

// RLS policies filter correctly regardless of source schema
```

Key detail: `WITH (security_invoker = on)` ensures RLS policies are enforced. Without this, views bypass RLS (security disaster).

The Migration: 33 Migrations in 48 Hours

Adding schema routing wasn’t a single migration. It was a journey.

You know what’s fun? Writing 33 database migrations in a row while knowing that if even ONE has a typo, you’ll corrupt production data. Actually, “fun” isn’t the word. “Terrifying” is more accurate. I ran each migration on staging three times before touching production.

October 27-29, 2025: 33 sequential migrations for complete schema routing.

The migration phases:

1. Create specialized schema – Set up `tenant_b` schema with proper permissions

2. Create schema-specific tables – Mirror necessary tables in new schema

3. Build router functions – One for each resource type that needs routing

4. Create security views – Unified views with `UNION ALL` for reads

5. Update RLS policies – Ensure both schemas have proper isolation

6. Data migration – Move existing data to correct schemas

7. Application updates – Switch from direct queries to router functions/views

Total effort: 33 migrations, 2 days with Claude Code, 100% worth it.

The Results: True Multi-Tenant Isolation

Before (Shared Tables with org_id)

Data model:

```sql
public.brand_guidelines (organization_id, entity_id, guidelines)
```

Problems:

– ❌ Nullable `entity_id` for one tenant type (data model confusion)

– ❌ Complex queries with `NULL` handling

– ❌ Application logic: `if (tenantTypeA) { … } else { … }`

– ❌ Risk of cross-contamination

After (Schema Routing)

Data model:

```sql
public.brand_guidelines (organization_id, guidelines)    -- Tenant Type A
tenant_b.entity_data (organization_id, entity_id, data)  -- Tenant Type B
```

Benefits:

– ✅ Clean data models (no nullable foreign keys)

– ✅ Simple queries without complex `NULL` handling

– ✅ No application if/else (database handles routing)

– ✅ Impossible for cross-schema contamination (physically separate)

Security Improvements

Before: Risk of cross-contamination with nullable columns and shared tables

After: Router functions automatically direct writes to correct schema based on organization type. Physical schema separation makes cross-contamination impossible.

Isolation level: Database-enforced separation. Not application-level checks.

When to Use Schema Routing vs org_id Filtering

Not every multi-tenant app needs schema routing. Use this decision tree:

Use `org_id` Filtering (Simpler) If:

All tenants have the same data model (e.g., todos app)

No hierarchical tenancy (no sub-entities within organizations)

Simple queries (`WHERE org_id = X` works everywhere)

B2C or small B2B (no enterprise sales, no compliance requirements)

MVP speed matters (get to market in weeks, not months)

Business reasoning: You’re validating product-market fit, not building for Fortune 500 compliance requirements. Ship fast, refactor later if you get enterprise traction.

Example Project management tool where every organization manages their own projects.

```sql
CREATE TABLE projects (
  id UUID PRIMARY KEY,
  org_id UUID,  -- Simple filtering
  name TEXT
);
```

Use Schema Routing (More Complex) If:

Different tenant types need different data models (Type A vs Type B vs Type C)

Hierarchical tenancy (organizations → workspace_entities → sub-entities)

Enterprise sales on roadmap (Fortune 500, healthcare, finance, government)

Regulatory compliance required (GDPR, HIPAA, SOC 2, FedRAMP)

White-label or reseller potential (partners need complete data isolation)

International expansion planned (data residency requirements)

Business reasoning: If you’re targeting enterprise customers, “data isolation” becomes a checkbox on security questionnaires. Schema routing lets you answer confidently. Row-level filtering leaves you hedging.

Example: Platform where some tenants have hierarchical workspace structures that fundamentally differ from direct customers.

Cost/Benefit (My Experience):

– Schema routing costs: 2 days with Claude Code (would’ve been weeks without AI assistance)

– Potential upside: Cleaner architecture, compliance readiness, partnership options

– Actual upside: Unknown – I’m still in private alpha

– Break-even: If schema routing opens even one door I couldn’t otherwise walk through, it pays for itself

The real question: Are you optimizing for speed-to-market or optionality? Both are valid. I chose optionality.

Alternative Isolation Strategies

Schema routing isn’t the only approach. Here’s the spectrum:

Level 1: Separate Databases (Highest Isolation)

“`

database_tenant_1

database_tenant_2

database_tenant_3

“`

Pros:

– ✅ Complete physical isolation

– ✅ Per-tenant backups

– ✅ Independent scaling

– ✅ Regulatory compliance (data residency)

Cons:

– ❌ High operational complexity (manage N databases)

– ❌ Expensive (database instance per tenant)

– ❌ Cross-tenant queries impossible

– ❌ Schema migrations across all databases

Use case: Enterprise SaaS with regulatory requirements, high-value customers ($10k+/month).

Level 2: Separate Schemas (Strong Isolation)

```
database
├── schema_tenant_1
├── schema_tenant_2
└── public (shared tables)
```

Pros:

– ✅ Strong logical isolation

– ✅ Shared infrastructure (one database)

– ✅ Schema-level permissions

– ✅ Different data models per tenant type

Cons:

– ❌ More complex than row-level

– ❌ Requires router functions

– ❌ Migration complexity (N schemas)

Use case: B2B SaaS with different customer tiers (STRAŦUM’s approach).

Level 3: Row-Level Filtering with RLS (Moderate Isolation)

```
database
└── public
    └── table (with org_id column)
```

Pros:

– ✅ Simple to implement

– ✅ Easy migrations (one schema)

– ✅ Cross-tenant analytics possible

– ✅ Postgres RLS enforces isolation

Cons:

– ❌ All tenants share same data model

– ❌ RLS performance overhead

– ❌ Risk of RLS misconfigurations

Use case: B2B SaaS with uniform data models (project management, CRM).

Implementation Checklist: Schema Routing

If you’re implementing schema routing, use this checklist:

Phase 1: Schema Design

– [ ] Create tenant type discriminator (`organizations.type`)

– [ ] Design schema-specific tables (what belongs where?)

– [ ] Create specialized schema: `CREATE SCHEMA tenant_b;`

– [ ] Mirror necessary tables in specialized schema

– [ ] Document which tables live in which schema

Phase 2: Router Functions

– [ ] Write router function for each resource type

– [ ] Use `SECURITY DEFINER` for elevated permissions

– [ ] Set `search_path = public, tenant_b` for multi-schema access

– [ ] Handle tenant type detection: `SELECT type FROM organizations`

– [ ] Return schema info for debugging

– [ ] Grant `EXECUTE` to authenticated role

Phase 3: Security Views

– [ ] Create unified views for reads (`UNION ALL` across schemas)

– [ ] Use `WITH (security_invoker = on)` for RLS enforcement

– [ ] Add `source_schema` column for debugging

– [ ] Test that RLS policies work on views

– [ ] Grant `SELECT` to authenticated role

Phase 4: Application Integration

– [ ] Update writes to use router functions: `supabase.rpc(‘save_resource_routed’, …)`

– [ ] Update reads to use views: `supabase.from(‘resource_unified’).select()`

– [ ] Remove application-level if/else logic

– [ ] Test tenant type A flow (writes to `public`)

– [ ] Test tenant type B flow (writes to `tenant_b`)

– [ ] Verify data isolation (Entity A ≠ Entity B)

Phase 5: Migration & Testing

– [ ] Write migration scripts (30+ for full coverage)

– [ ] Test on staging with real-like data

– [ ] Run security audit (cross-schema leakage?)

– [ ] Load test (RLS + views performance)

– [ ] Monitor in production (slow queries?)

Lessons Learned

1. org_id Is Necessary, Not Sufficient

Adding `org_id` to every table gives you row-level filtering. But if different tenant types need different data models, you need schema routing.

Learning: “Multi-tenant” isn’t binary. There are levels of isolation. I picked a higher level than I strictly needed for my current 15 users. Time will tell if that was smart or just extra work.

2. Application Logic → Database Logic

Every `if (tenantType === ‘TYPE_B’)` in your application is a code smell. Move tenant-aware logic to the database with router functions.

Learning: Database functions are harder to write but potentially easier to audit. If I ever get enterprise customers asking “prove your data isolation,” I can point to stored procedures. But right now, that’s hypothetical.

3. Views + RLS = Unified Reads

Reading from multiple schemas is complex. Views + `security_invoker = on` give you unified reads with proper isolation.

Learning: Views create abstraction layers that might make compliance easier someday. Or they might just add complexity I didn’t need. We’ll see.

4. Security Definer Functions Are Powerful

`SECURITY DEFINER` lets functions run with elevated privileges while still respecting RLS policies. Essential for router functions.

5. Migrations Are Worth It (Probably)

33 migrations for schema routing felt like a lot. But the result? Clean architecture, true isolation, and zero cross-tenant bugs.

Learning: Technical debt has a price. I chose to pay it early when I have 15 alpha users instead of waiting until I have paying customers. Was that the right call? I’ll know if I ever have 100 customers to worry about.

6. Architectural Flaws Hurt More Than Code Bugs

Finding a null pointer exception at 11:47 PM? Annoying. Finding out your entire multi-tenant architecture is fundamentally broken? That’s the kind of discovery that makes you lie awake at night.

But here’s what I learned: architectural mistakes are fixable. They’re expensive, yes. Time-consuming, absolutely. But I went from “one tenant type can accidentally contaminate another’s data” to “database-enforced isolation that’s impossible to bypass.”

Fix it early, fix it right, and you’ll sleep better.

7. Architecture Might Be Strategy (Or Maybe Just Over-Engineering)

The schema routing decision wasn’t just about “clean code.” It was about keeping future options open—white-label partnerships, enterprise sales, international expansion.

But here’s the honest truth: I’m in private alpha with 15 users. I don’t have enterprise customers knocking. I haven’t gotten a single GDPR question. The partnerships I’m imagining might never materialize.

I thought I was making a technical decision. Maybe I was making a business strategy decision. Or maybe I was just over-engineering because I find database architecture interesting.

Building multi-tenant SaaS with complex isolation needs? STRAŦUM uses schema routing to serve different tenant types with true data isolation. Request alpha access at https://stratum.chandlernguyen.com/request-invitation

*Still learning that “multi-tenant” has many levels of isolation. Still debugging RLS policies at midnight. Still questioning my Day 2 architecture decisions (but less so now). More database adventures at https://www.chandlernguyen.com/ .

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.