$ man data-lake
Data Lake
A persistent store of every enrichment result, qualification score, and engagement signal your GTM team has ever generated. Keyed by company domain. Queried before enrichment to avoid re-paying for data you already have.
most GTM teams treat enrichment as a per-campaign expense. they enrich 500 leads, run the campaign, archive the table, and start fresh next quarter. if 200 of those leads overlap, they just paid for the same data twice. a data lake stores every enrichment result with a timestamp. before running a new campaign, you query the lake first and only enrich the gaps. I've seen this cut enrichment costs by 40-60% for teams with overlapping target lists.

beyond cost savings, a data lake builds institutional knowledge. you can see how a company's tech stack, headcount, and hiring signals changed over six months. that context makes outbound more relevant than any single-point enrichment.
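the "query the lake first, only enrich the gaps" step is just a set difference between your new target list and the domains already stored. a minimal sketch, assuming a single enrichment_results table keyed by domain (the table and column names here are illustrative, not a fixed schema):

```python
import sqlite3

# toy lake with two previously enriched domains (example data)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE enrichment_results (domain TEXT, enriched_at TEXT)")
conn.executemany(
    "INSERT INTO enrichment_results VALUES (?, ?)",
    [("acme.com", "2024-01-10"), ("globex.com", "2024-02-01")],
)

# new campaign target list overlaps with the lake on two domains
target_list = {"acme.com", "globex.com", "initech.com"}
already_enriched = {
    row[0] for row in conn.execute("SELECT domain FROM enrichment_results")
}

# only the gaps trigger a paid enrichment call
gaps = target_list - already_enriched
print(sorted(gaps))  # ['initech.com']
```

with a 500-lead list and 200 overlapping domains, this is the difference between paying for 500 enrichments and paying for 300.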
I build a simple data lake as the foundation of every GTM engagement. PostgreSQL or SQLite with three core tables: companies (keyed by domain), contacts (keyed by email), and enrichment_results (timestamped). every enrichment pipeline checks the lake first. if the company was enriched within 90 days, use the cached data. only call Clay, Apollo, or Exa for stale or missing records.

the lake also feeds analytics - which companies appeared in multiple campaigns, which contacts engaged across channels, which enrichment providers returned the best data. that is the kind of analysis you cannot do when every campaign is a throwaway CSV.
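the check-the-lake-first pattern with a 90-day freshness window can be sketched like this. a minimal version, assuming the enrichment_results table described above stores an ISO timestamp per row; fetch_from_provider stands in for a real Clay, Apollo, or Exa call:

```python
import sqlite3
from datetime import datetime, timedelta, timezone

FRESHNESS = timedelta(days=90)  # cache window from the text

def get_enrichment(conn, domain, fetch_from_provider):
    # most recent lake record for this domain, if any
    row = conn.execute(
        "SELECT data, enriched_at FROM enrichment_results "
        "WHERE domain = ? ORDER BY enriched_at DESC LIMIT 1",
        (domain,),
    ).fetchone()
    now = datetime.now(timezone.utc)
    if row and now - datetime.fromisoformat(row[1]) < FRESHNESS:
        return row[0]  # cache hit: no provider spend
    data = fetch_from_provider(domain)  # stale or missing: pay once
    conn.execute(
        "INSERT INTO enrichment_results (domain, data, enriched_at) "
        "VALUES (?, ?, ?)",
        (domain, data, now.isoformat()),
    )
    return data

# demo: second lookup of the same domain never hits the provider
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE enrichment_results (domain TEXT, data TEXT, enriched_at TEXT)"
)
calls = []
def fake_provider(domain):  # stand-in for a paid API
    calls.append(domain)
    return f"enriched:{domain}"

get_enrichment(conn, "acme.com", fake_provider)  # miss: provider called
get_enrichment(conn, "acme.com", fake_provider)  # fresh hit: cached
print(len(calls))  # 1
```

in PostgreSQL the insert would typically become an upsert (INSERT ... ON CONFLICT) keyed on domain, but the control flow is the same: read the lake, compare timestamps, and only reach for a provider when the record is stale or absent.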