$ man clay-wiki/claygent-prompts
Core Concepts · intermediate
AI and Claygent Prompt Engineering
Get Claygent to stop hallucinating and start producing usable data
What Claygent Actually Is
Claygent is Clay's AI browsing agent. You give it a prompt and input fields, and it browses the web, reads pages, and returns structured data. It's not a static knowledge lookup — it actively visits URLs, reads content, and extracts information. This makes it powerful for research tasks ("go to this company's website and find their pricing model") but also means it can hallucinate, time out, and fail in ways that static enrichment doesn't.
PATTERN
The Validation System
Never trust Claygent output without validation. The pattern: (1) Run Claygent on a 5-row sample first. (2) Manually verify every field against the source — did it actually find this on the website, or did it make it up? (3) If it hallucinates on 1 out of 5, it'll hallucinate on 100 out of 500. Fix the prompt before scaling. (4) Add a confidence field to your output schema — force Claygent to self-report confidence (high/medium/low). (5) Filter downstream enrichment to only run on high-confidence results.
The 5-row test is non-negotiable. I've seen people run Claygent on 2,000 rows, find out later that 30% of the data was fabricated, and have to redo everything. Five rows. Verify manually. Then scale.
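The confidence-gating step (4 and 5 above) can be sketched as a simple filter. This is an illustrative sketch, not Clay's API: the row shape and the `confidence` field name are whatever you defined in your output schema.

```python
# Sketch of confidence gating: only rows Claygent self-rated "high"
# move on to downstream enrichment. Field names are illustrative.

def high_confidence_only(rows):
    """Keep only rows marked high-confidence for downstream enrichment."""
    return [r for r in rows if r.get("confidence") == "high"]

sample = [
    {"company": "Acme", "pricing_model": "usage-based", "confidence": "high"},
    {"company": "Globex", "pricing_model": "unknown", "confidence": "low"},
]
print(high_confidence_only(sample))  # only the Acme row survives
```

The point is that the gate is cheap and deterministic: low-confidence rows cost you nothing further, and you review them manually instead of enriching them.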
PATTERN
Prompt Structure That Works
Every Claygent prompt should follow this structure:
1. Role context — "You are a B2B research analyst tasked with..."
2. Specific task — exactly what to find, from where (be specific: "/about page", "/pricing page", not just "the website")
3. Input fields — listed at the bottom of the prompt, wrapped in curly braces
4. Output schema — exact JSON structure you expect back
5. Constraints — "Return ONLY valid JSON. No explanations. If you cannot find the information, return null for that field."
6. Edge case handling — what to return when data is missing, ambiguous, or contradictory
The more specific your prompt, the less Claygent improvises. Improvisation is where hallucinations happen.
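The six-part structure above can be assembled into a template. This is a hedged example, not a canonical Clay prompt: the field names in curly braces (`{company_name}`, `{domain}`) and the schema fields are placeholders for whatever columns your table actually has.

```python
# A minimal prompt template following the six-part structure.
# Input placeholders and schema fields are illustrative.

PROMPT = """\
You are a B2B research analyst tasked with identifying pricing models.

Task: Visit https://{domain}/pricing and determine the company's pricing model.

Output schema (return ONLY valid JSON, no explanations):
{{"pricing_model": "...", "has_free_tier": true/false, "confidence": "high/medium/low"}}

If the page returns a 404 or the information is not found, return null for that field.
If pricing is ambiguous or contradictory, set confidence to "low".

Inputs:
Company: {company_name}
Domain: {domain}
"""

print(PROMPT.format(company_name="Acme", domain="acme.example"))
```

Note that all six parts are present in a dozen lines: role, specific task with a specific page, output schema, constraints, edge-case handling, and inputs at the bottom.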
PRO TIP
Model Selection Framework
Clay offers different AI models. The choice matters: GPT-4 — most capable, highest accuracy, highest credit cost. Use for complex research prompts that require reasoning (ICP qualification, competitor analysis). GPT-3.5 — faster, cheaper, less accurate. Good for simple extraction tasks ("pull the company's employee count from their LinkedIn page"). Claude — strong at structured output and following formatting instructions. Good when you need reliable JSON schema compliance.
Match the model to the task. Don't use GPT-4 credits on a task that GPT-3.5 handles fine. Don't use GPT-3.5 on a task that requires nuanced reasoning. The credit difference is 3-5x.
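The matching logic is simple enough to encode. A sketch, with an illustrative task taxonomy; the model identifiers stand in for whatever options your Clay workspace exposes.

```python
# Illustrative model-to-task routing. Task categories and model names
# are assumptions for the example, not Clay's actual option list.

def pick_model(task_type: str) -> str:
    if task_type in {"icp_qualification", "competitor_analysis"}:
        return "gpt-4"    # complex reasoning: worth the credit premium
    if task_type in {"strict_json_extraction"}:
        return "claude"   # strongest at schema compliance
    return "gpt-3.5"      # simple extraction: cheap and fast

print(pick_model("icp_qualification"))  # gpt-4
print(pick_model("employee_count"))     # gpt-3.5
```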
ANTI-PATTERN
Preventing Hallucinations
Claygent hallucinates most when: (1) The prompt is vague — "tell me about this company" invites fabrication. Be specific about what fields to extract and from which pages. (2) The target page doesn't exist — if you tell it to check /pricing and there is no pricing page, it'll guess. Add fallback logic: "If the page returns a 404 or the information is not found, return null." (3) The page is too large — token limits cause truncation, and Claygent fills in the gaps with imagination. Target specific pages, not entire sites. Use FireCrawl to get clean markdown first if needed. (4) No output schema — without a defined JSON schema, Claygent returns prose that's hard to parse and easy to embellish. Always define the exact output structure.
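Point 4 pairs naturally with a post-hoc check: parse what came back and reject anything that isn't the JSON you asked for. A sketch under the assumption that your schema has the three illustrative fields used earlier; adjust `EXPECTED_KEYS` to your actual schema.

```python
import json

# Post-hoc schema check: reject prose and schema drift before the row
# enters your table. EXPECTED_KEYS is illustrative.

EXPECTED_KEYS = {"pricing_model", "has_free_tier", "confidence"}

def parse_claygent(raw: str):
    """Return the parsed dict, or None if output isn't the JSON we asked for."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None                 # prose instead of JSON: reject
    if set(data) != EXPECTED_KEYS:
        return None                 # extra or missing keys: reject
    return data

good = '{"pricing_model": null, "has_free_tier": null, "confidence": "low"}'
bad = "The company appears to offer tiered pricing across three plans."
print(parse_claygent(good))
print(parse_claygent(bad))  # None
```

Note that the `good` example is all-null with low confidence: a correctly behaving prompt returns that for a missing pricing page, and the parser accepts it, because null is data and prose is not.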
PATTERN
When to Use Claygent vs Formulas vs HTTP
Claygent: use when you need to browse a web page and extract unstructured information. Research, positioning analysis, product categorization from website content.
Formulas: use when you can transform data you already have. Name merging, title normalization, MX classification, scoring bins. Zero credits. Instant. No hallucination risk.
HTTP columns: use when you need structured API data. MX records, DNS lookups, time APIs, SemRush data, custom endpoints. Reliable, structured, no AI interpretation.
The priority order: formula first (free, reliable) → HTTP column (structured, low-cost) → Claygent (powerful but expensive and fragile). Only reach for Claygent when the first two can't do the job.
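The priority order reads as a short decision function. The category names here are made up for the example; the ordering is the point.

```python
# Illustrative encoding of the priority order: cheapest reliable
# method first, Claygent last. Category names are assumptions.

def choose_method(need: str) -> str:
    if need == "transform_existing_data":
        return "formula"       # free, deterministic, instant
    if need == "structured_api_data":
        return "http_column"   # reliable, low-cost, no AI interpretation
    return "claygent"          # unstructured web research: last resort

print(choose_method("transform_existing_data"))  # formula
print(choose_method("browse_and_extract"))       # claygent
```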