$ man deduplication

GTM · Automation & Scripts

Deduplication

Checking for and removing duplicate records before processing. If a company was already enriched in a previous run, skip it. If a contact appears twice with slightly different names, merge them.
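Matching "slightly different names" takes some kind of fuzzy comparison. Below is a minimal sketch using Python's standard-library `difflib`. This is a swapped-in technique, not how any particular tool does it. The suffix list and the 0.9 threshold are hypothetical choices you would tune on your own data.

```python
import re
from difflib import SequenceMatcher

# Hypothetical list of legal suffixes to ignore when comparing company names.
LEGAL_SUFFIXES = {"corporation", "corp", "inc", "llc", "ltd"}


def normalize_name(name: str) -> str:
    """Lowercase, replace punctuation with spaces, and drop legal suffixes."""
    words = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    return " ".join(w for w in words if w not in LEGAL_SUFFIXES)


def is_probable_duplicate(a: str, b: str, threshold: float = 0.9) -> bool:
    """Treat two names as the same entity if their normalized forms are close enough."""
    na, nb = normalize_name(a), normalize_name(b)
    if na == nb:
        return True
    # ratio() is 2*matches / total length, so 1.0 means identical strings.
    return SequenceMatcher(None, na, nb).ratio() >= threshold
```

`"Microsoft"` vs `"Microsoft Corporation"` matches because both normalize to `microsoft`. `"Microsoft"` vs `"MSFT"` scores well under the threshold. String similarity can't tell that a ticker is an alias, so that case needs a lookup table or a smarter matcher.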


why it matters

duplicate records waste API credits, create confusion in CRMs, and inflate your pipeline numbers. if you run an enrichment script twice without dedup, you get duplicate rows. if you import contacts to HubSpot without dedup, you get duplicate contact objects. if you send emails to duplicates, the same person gets two identical messages and is more likely to mark you as spam. deduplication is a gate that should exist at every handoff point in the pipeline.
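the HubSpot case is the simplest gate to sketch. pick a key, drop repeats inside the batch, and send anything that already exists downstream to an update instead of a create. this is a minimal sketch. it assumes records are dicts with an `email` field and that `existing_emails` (hypothetical) is a set of already-normalized emails pulled from the downstream system. it makes no CRM API calls.

```python
from typing import Iterable


def normalize_email(email: str) -> str:
    """Trim whitespace and lowercase so 'Jane@Acme.com ' matches 'jane@acme.com'."""
    # Assumption: emails are compared case-insensitively, as most mail systems do in practice.
    return email.strip().lower()


def dedup_gate(
    records: Iterable[dict], existing_emails: set[str]
) -> tuple[list[dict], list[dict]]:
    """Split a batch into (to_create, to_update), dropping repeats within the batch."""
    seen: set[str] = set()
    to_create: list[dict] = []
    to_update: list[dict] = []
    for record in records:
        key = normalize_email(record.get("email", ""))
        if not key or key in seen:
            continue  # skip blank emails and duplicates inside this batch
        seen.add(key)
        if key in existing_emails:
            to_update.append(record)  # already downstream: update, don't create
        else:
            to_create.append(record)
    return to_create, to_update
```

records with a blank email are dropped here. in a real pipeline you might route them to a review list instead.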

how I use it

I build dedup into every script. at the start of a batch run, I load the existing output CSV and build a set of already-processed domains or emails. before each API call, I check: is this domain already in the set? if yes, skip. if no, process it, write the result to the CSV, and add it to the set. this means I can re-run scripts safely. if a run dies after finishing record 40 of 73, I restart it and it picks up at record 41 automatically, because records 1 through 40 are already in the output file.

for CRM dedup, Clay has Sculptor (AI-powered fuzzy matching) that catches "Microsoft" vs "Microsoft Corporation" vs "MSFT." for contact-level dedup, I use email as the primary key: if the email already exists in HubSpot, update the record instead of creating a new one.
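a minimal sketch of the resumable pattern. it assumes a CSV with `domain` and `result` columns and a caller-supplied `enrich` function that stands in for the real API call (both hypothetical). the key detail is writing and flushing each row as soon as it finishes. that way, the skip set rebuilt on the next run matches exactly what completed.

```python
import csv
import os
from typing import Callable


def run_batch(domains: list[str], output_path: str, enrich: Callable[[str], str]) -> int:
    """Enrich each domain at most once across runs; return how many were processed this run."""
    # Build the skip set from whatever earlier runs already wrote.
    done: set[str] = set()
    if os.path.exists(output_path):
        with open(output_path, newline="") as f:
            done = {row["domain"] for row in csv.DictReader(f)}

    needs_header = not os.path.exists(output_path) or os.path.getsize(output_path) == 0
    processed = 0
    with open(output_path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["domain", "result"])
        if needs_header:
            writer.writeheader()
        for raw in domains:
            domain = raw.strip().lower()  # normalize so "ACME.com " and "acme.com" match
            if domain in done:
                continue  # already enriched in an earlier run, or earlier in this one
            result = enrich(domain)  # the paid API call only happens for new domains
            writer.writerow({"domain": domain, "result": result})
            f.flush()  # hand each row to the OS right away so a crash loses at most the record in flight
            done.add(domain)
            processed += 1
    return processed
```

if `enrich` raises partway through, the failed domain never reaches the CSV. re-running with the full input list retries it and skips everything that already finished.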


related terms
Batch Processing · Sculptor · Validation · Enrichment Pipeline
ShawnOS.ai|theGTMOS.ai|theContentOS.ai