$ man apollo-wiki/supabase-warehouse
Integrationsadvanced
Apollo to Supabase Data Warehouse
Schema design, upsert logic, and account-contact split for Apollo data
☾
Why Supabase
Apollo data needs to live somewhere durable. CSV exports get lost. Spreadsheets do not scale. CRMs have import limits and field constraints. Supabase gives you a free Postgres database with a REST API, real-time subscriptions, and row-level security. Your Apollo sourcing scripts write directly to Supabase. Your enrichment scripts read from and write back to Supabase. Your CRM sync reads from Supabase. It is the central data warehouse that connects everything. And the free tier handles 500MB of data and 2GB of bandwidth - more than enough for most GTM pipelines.
PATTERN
The Account-Contact Split
Two tables.
accounts: domain (primary key), company_name, employee_count, industry, annual_revenue, tech_stack, icp_score, enriched_at. contacts: apollo_id (primary key), email, first_name, last_name, title, seniority, domain (foreign key to accounts), persona_tag, sourced_at, enriched_at. The account-contact split prevents duplicate company enrichment. 50 contacts at Microsoft = 1 account row enriched once, not 50 redundant company lookups. This is the same foundational pattern from Clay, applied to your own database.CODE
Upsert Logic
Every Apollo sourcing run should upsert, not insert. On the accounts table, upsert on domain - if the domain exists, update the fields that changed. On the contacts table, upsert on apollo_id (or email if apollo_id is missing). This prevents duplicates across multiple sourcing runs. In Supabase, use
supabase.from('contacts').upsert(data, { onConflict: 'apollo_id' }). In raw SQL: INSERT INTO contacts (...) VALUES (...) ON CONFLICT (apollo_id) DO UPDATE SET title = EXCLUDED.title, enriched_at = EXCLUDED.enriched_at. Fields like sourced_at should NOT be overwritten on upsert - you want to preserve when the contact was first sourced.PRO TIP
Query Patterns
Once your data is in Supabase, you can run pipeline analytics that CRMs make difficult. Examples: "How many VP-level contacts have we sourced at companies with 50-200 employees in the last 30 days?"
SELECT COUNT(*) FROM contacts c JOIN accounts a ON c.domain = a.domain WHERE c.seniority = 'vp' AND a.employee_count BETWEEN 50 AND 200 AND c.sourced_at > NOW() - INTERVAL '30 days'. Or: "What is our match rate by persona?" Group by persona_tag, count total vs enriched. These queries take seconds. In a CRM, you would be building custom reports and waiting.related entries