$ man how-to/cron-jobs-for-scraping


Cron Jobs for Job Scraping

Python scripts as cron jobs for job board scraping and signal detection


Why Job Boards Are Signals

A company posting "Head of RevOps" is a signal. They are building a revenue operations function. A company posting "SDR Manager" is a signal. They are scaling outbound. A company posting "Data Engineer" with "Clay" in the requirements is a signal. They are building GTM data infrastructure.

Job postings are public intent data. The company is telling the world exactly what they are building and what skills they need. If your product or service aligns with what they are hiring for, that is a warm signal.

The problem: job boards have thousands of new postings daily. Manual monitoring does not scale. Cron jobs do.
PATTERN

The Scraping Pipeline

Step 1: Python script that queries job board APIs or scrapes listings. Target boards relevant to your ICP: LinkedIn Jobs API, Indeed, Greenhouse board pages, Lever board pages. Filter by keywords that indicate buying intent for your product.

Step 2: Parse and normalize the results. Extract company name, role title, posting date, and key requirements. Store in a structured format - JSON or SQLite.

Step 3: Enrich. Match company names against your existing Attio records. Check if they are already in your pipeline. If new, run them through Apollo for firmographic data.

Step 4: Score and route. Companies posting 3+ relevant roles in 30 days get a higher score than one-off postings. Route high-scoring signals to your ABM target list.
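The filtering, normalization, and scoring steps above can be sketched in a few functions. A minimal sketch: the keyword set, field names, and 30-day window are illustrative assumptions, not a fixed schema - tune them to your ICP.

```python
from collections import Counter
from datetime import date, timedelta

# Keywords that signal buying intent for your product -- illustrative, tune to your ICP.
INTENT_KEYWORDS = {"revops", "revenue operations", "sdr", "gtm", "clay"}

def is_relevant(posting: dict) -> bool:
    """Step 1 filter: keyword match over title and requirements."""
    text = (posting["title"] + " " + posting.get("requirements", "")).lower()
    return any(kw in text for kw in INTENT_KEYWORDS)

def normalize(raw: dict, company: str) -> dict:
    """Step 2: flatten a raw listing into the fields the pipeline stores."""
    return {
        "company": company,
        "title": raw["title"],
        "posted_at": raw["posted_at"],  # ISO 8601 date, e.g. "2025-01-15"
        "requirements": raw.get("requirements", ""),
    }

def score_companies(postings: list[dict], window_days: int = 30) -> dict[str, int]:
    """Step 4: count relevant roles per company inside the window.
    3+ relevant postings in 30 days marks a hot signal."""
    cutoff = date.today() - timedelta(days=window_days)
    counts = Counter(
        p["company"]
        for p in postings
        if is_relevant(p) and date.fromisoformat(p["posted_at"][:10]) >= cutoff
    )
    return dict(counts)
```

The fetch itself is board-specific (Greenhouse and Lever expose public board endpoints; LinkedIn and Indeed require API access), so it is left out here - everything downstream of the raw listings is board-agnostic.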
CODE

launchd Scheduling

On macOS, cron is deprecated in favor of launchd. Create a plist file in ~/Library/LaunchAgents/ that runs your Python script on a schedule. The plist defines: which script to run (ProgramArguments), when to run it (StartCalendarInterval or StartInterval), where to log output (StandardOutPath, StandardErrorPath), and whether to run at load (RunAtLoad).

A typical setup: run the scraper every 6 hours. That catches new postings without hammering the source. The script writes results to data/signals/job-postings.json. A separate daily job reads that file, deduplicates, enriches new entries, and pushes qualified signals to Attio.

Keep the scraping and the enrichment as separate jobs. If the scraper fails, you do not lose yesterday's enrichment results. If enrichment fails, you do not lose today's scraping results. Decoupled pipelines are resilient pipelines.
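A minimal plist sketch for the every-6-hours scraper. The label, script path, and log paths are placeholder assumptions - substitute your own.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
  "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>com.example.jobscraper</string>
    <key>ProgramArguments</key>
    <array>
        <string>/usr/bin/python3</string>
        <string>/Users/you/scripts/scrape_jobs.py</string>
    </array>
    <!-- Every 6 hours, expressed in seconds -->
    <key>StartInterval</key>
    <integer>21600</integer>
    <key>RunAtLoad</key>
    <true/>
    <key>StandardOutPath</key>
    <string>/Users/you/logs/jobscraper.out.log</string>
    <key>StandardErrorPath</key>
    <string>/Users/you/logs/jobscraper.err.log</string>
</dict>
</plist>
```

Load it with `launchctl load ~/Library/LaunchAgents/com.example.jobscraper.plist` (on current macOS versions, `launchctl bootstrap gui/$(id -u) <plist>` is the preferred form). The daily enrichment job gets its own plist with a StartCalendarInterval instead of a StartInterval.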
PRO TIP

Feeding Signals into ABM

Raw job postings are noise. Enriched and scored job postings are signals. The enrichment step transforms "Acme Corp posted a Head of RevOps role" into "Acme Corp (Series B, 150 employees, using Salesforce and Outreach, $12M ARR) is building a RevOps function. They posted 3 GTM roles in the last 2 weeks. No existing relationship in Attio." That enriched signal feeds directly into your ABM targeting. Build a personalized landing page referencing their RevOps buildout. Draft outreach that connects your solution to their specific hiring pattern. The job posting gave you the opening. The enrichment gave you the context. The ABM pipeline turns both into a conversation.
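The transformation described above - firmographics plus hiring pattern into one routable signal line - can be sketched as a single rendering step. The field names here (stage, headcount, stack) are illustrative assumptions, not Apollo's or Attio's actual schema.

```python
def enrich_signal(company: dict, relevant_postings: list[dict], in_crm: bool) -> str:
    """Render enrichment output into the one-line signal the ABM list consumes.
    `company` carries firmographic fields; `relevant_postings` is the filtered
    hiring pattern; `in_crm` is the result of the Attio lookup."""
    stack = " and ".join(company["stack"])
    relationship = (
        "Existing relationship in Attio." if in_crm
        else "No existing relationship in Attio."
    )
    return (
        f"{company['name']} ({company['stage']}, {company['headcount']} employees, "
        f"using {stack}) posted {len(relevant_postings)} relevant GTM roles. "
        f"{relationship}"
    )
```

Whether you route the signal as a string, a CRM note, or a structured record is a design choice; the point is that every field in it was earned by the enrichment step, not the scrape.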
