HubSpot Label Enrichment with n8n, SerpAPI, and Claude
Stop manually researching artist labels. Let the workflow do it.
Music PR manages hundreds of artist contacts in HubSpot. Each contact is an artist or a band. The Company Name column, which is supposed to hold the artist’s music label, was empty or wrong across most of the database. Researching every artist by hand was burning hours every week, and the field still drifted as the roster changed.
The question wasn’t “how do we update HubSpot.” The question was: what if a workflow did the research for us, with enough safety that it never poisoned the data?
The shape of the workflow
End to end, it’s a linear pipeline with one inner loop and one outer pacing wait. From a single “Run Enrichment” trigger:
- Get List Memberships from HubSpot - which artist contacts are in scope this run.
- Aggregate Member IDs then Batch Read Contacts - pull the full contact records via the HubSpot Private App API in a single batch instead of N round trips.
- Prepare Contacts - normalize the records into a clean list the loop can chew on.
- Loop Over Contacts - for each artist:
  - Google AI Mode search via SerpAPI on the artist’s email, name, and known metadata. AI Mode returns a synthesized answer with sources.
  - AI Agent with a structured output parser. Primary model: claude-sonnet. Fallback: claude-haiku.
  - Update Contact via HubSpot HTTP PATCH - write the resolved label into the company field.
  - Wait 12 seconds - per-contact throttle so SerpAPI and Anthropic rate limits stay happy.
- 3-minute pacing wait between batches at the outer loop level. Safety margin for long-running enrichments without tripping HubSpot daily quotas.
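The batch read and the write-back steps above can be sketched as plain request payloads. This is a minimal sketch, not the workflow's actual node configuration: it assumes HubSpot's CRM v3 contacts endpoints, a cap of 100 IDs per batch read call, and that the label lives in the standard company property. The helper names, the list of requested properties, and the sample IDs are hypothetical.

```python
from typing import Iterator

HUBSPOT_BASE = "https://api.hubapi.com"
BATCH_LIMIT = 100  # assumed per-call cap for the batch read endpoint


def chunk(ids: list[str], size: int = BATCH_LIMIT) -> Iterator[list[str]]:
    """Split member IDs into batches so each read is one round trip."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def batch_read_request(ids: list[str]) -> dict:
    """Describe one batch read call (the property list is an assumption)."""
    return {
        "method": "POST",
        "url": f"{HUBSPOT_BASE}/crm/v3/objects/contacts/batch/read",
        "json": {
            "properties": ["email", "firstname", "lastname", "company"],
            "inputs": [{"id": contact_id} for contact_id in ids],
        },
    }


def update_company_request(contact_id: str, label: str) -> dict:
    """Describe the PATCH that writes the resolved label to the company field."""
    return {
        "method": "PATCH",
        "url": f"{HUBSPOT_BASE}/crm/v3/objects/contacts/{contact_id}",
        "json": {"properties": {"company": label}},
    }


# Hypothetical member IDs, standing in for the list membership result.
member_ids = [str(n) for n in range(250)]
batches = list(chunk(member_ids))
```

The functions only build request descriptions, so the sketch runs without credentials. In n8n the same shapes would go into the HTTP Request nodes, with the Private App token supplied as a bearer header.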
Why two Claude models
Sonnet is the primary because it reasons better over ambiguous search results. Music labels are messy - artists drop, switch, get acquired, sign side-deals. A weaker model confidently writes “Independent” when the truth is “Run On Records via a one-off licensing deal,” which is technically wrong but plausible enough to slip through review.
Haiku is the fallback specifically for when Sonnet rate-limits or errors. Haiku is faster and cheaper, so a temporary degraded mode is still a working pipeline. The trade is documented in the workflow: if you see a run with mostly Haiku rows in the cost log, you know the upstream had a hiccup and you can re-run those rows later with Sonnet for verification.
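Outside n8n, the primary-then-fallback behavior could look roughly like the following. It is a sketch under assumptions: call_model is a hypothetical stand-in for whatever actually invokes the model, and the broad exception handler is a simplification that a real build would narrow to rate-limit and API errors.

```python
from typing import Callable

PRIMARY_MODEL = "claude-sonnet"
FALLBACK_MODEL = "claude-haiku"

# Hypothetical signature: (model name, prompt) -> raw answer text.
ModelCall = Callable[[str, str], str]


def resolve_with_fallback(prompt: str, call_model: ModelCall) -> tuple[str, str]:
    """Return (answer, model_used), trying the primary model first."""
    try:
        return call_model(PRIMARY_MODEL, prompt), PRIMARY_MODEL
    except Exception:
        # Simplification: any primary failure triggers the degraded mode.
        # If the fallback also fails, its error propagates to the caller.
        return call_model(FALLBACK_MODEL, prompt), FALLBACK_MODEL


def flaky_primary(model: str, prompt: str) -> str:
    """Stub that simulates the primary model being rate-limited."""
    if model == PRIMARY_MODEL:
        raise RuntimeError("rate limited")
    return "None"


answer, model_used = resolve_with_fallback("Which label?", flaky_primary)
```

Returning the model name alongside the answer is what makes the later re-run possible: rows answered by the fallback can be filtered out of the cost log and sent back through the primary model.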
The anti-contamination rule
The structured output parser has a hard rule: when the search results are ambiguous or the agent isn’t confident, write "None" instead of guessing.
Better to leave a field empty than poison the database with a wrong label that someone later treats as ground truth.
This is the rule I’ve come back to in every enrichment build: the cost of a false-confident answer is much higher than the cost of a blank. A blank is honest about what we don’t know. A wrong label is a quiet lie that other workflows will start building on.
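As a guard in code, the rule might look like this. The sketch assumes the agent's structured output carries a label and a numeric confidence; that confidence field, the 0.8 threshold, and the list of non-answers are all hypothetical, since the actual parser schema isn't shown here.

```python
from typing import Optional

NO_LABEL = "None"
MIN_CONFIDENCE = 0.8  # hypothetical threshold
NON_ANSWERS = {"none", "unknown", "n/a"}  # hypothetical non-answer strings


def safe_label(label: Optional[str], confidence: float,
               threshold: float = MIN_CONFIDENCE) -> str:
    """Return the label only when it is present and confident, else "None"."""
    cleaned = (label or "").strip()
    if not cleaned or cleaned.lower() in NON_ANSWERS:
        return NO_LABEL
    if confidence < threshold:
        # Prefer an honest blank over a plausible guess.
        return NO_LABEL
    return cleaned


confident = safe_label("Run On Records", 0.95)
shaky = safe_label("Independent", 0.40)
```

A caller could then skip the HubSpot PATCH whenever the result is "None", so an uncertain run never overwrites whatever the field already holds.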
Cost tracking from day one
Every enrichment appends a row to a Google Sheet via the “Append or update row in sheet” node. The row carries: contact ID, input tokens, output tokens, model used, computed dollar cost, and the resolved label (or None).
I built this in from run one, not as an afterthought. Reason: without cost visibility, you find out three months in that the workflow cost more than the manual research would have, and the argument for keeping it gets harder. With visibility, you have receipts. You can argue: per-contact enrichment costs $0.03 of Claude tokens and replaces 4 minutes of human research at any rate you want to assign that time.
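The per-row cost is simple arithmetic over the token counts. In this sketch the per-million-token prices and the sample token counts are placeholders chosen for illustration, not real pricing or measured usage; the row keys mirror the columns described above, but their exact names are assumptions.

```python
# Placeholder USD prices per million tokens. Check current pricing before use.
PRICES_PER_MTOK = {
    "claude-sonnet": {"input": 3.00, "output": 15.00},
    "claude-haiku": {"input": 1.00, "output": 5.00},
}


def token_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one call: tokens times price, scaled from per-million."""
    price = PRICES_PER_MTOK[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return round(total / 1_000_000, 6)


def build_log_row(contact_id: str, model: str, input_tokens: int,
                  output_tokens: int, label: str) -> dict:
    """Assemble one cost-log row with the fields the sheet carries."""
    return {
        "contact_id": contact_id,
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "model": model,
        "cost_usd": token_cost(model, input_tokens, output_tokens),
        "label": label,
    }


# Illustrative numbers only: 8,000 input and 400 output tokens.
row = build_log_row("42", "claude-sonnet", 8000, 400, "None")
```

Logging the model name next to the cost keeps the two questions separable: how much a run cost, and how much of it ran in degraded mode.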
Status, honestly
Architecture v2 is wired in n8n. All nodes are connected. The Anthropic Chat Model nodes (Sonnet and Haiku), the SerpAPI Google AI Mode search, the HubSpot Batch Read and PATCH calls, the cost log to Google Sheets - all present, all linked.
What’s in flight: prompt-engineering iterations on the AI agent (this is where the anti-contamination behavior gets sharpened in practice), and the full-catalog rollout for Music PR.
This isn’t a shipped-to-production post. It’s an architecture-and-shape post. When the rollout completes with real numbers - how many labels resolved, how many None rows, dollar cost across the catalog - I’ll write the production-numbers follow-up.
The lesson worth repeating
The workflow IS the documentation. Every node is named for what it does in plain English. The Sticky Note in the canvas explains the anti-contamination rule and the dual-model strategy to the next engineer who opens the file. The cost log is the audit trail.
When this kind of work is built right, the next person opening the canvas understands it without reading a Confluence page that was last updated six months ago. That’s not a side effect of good engineering. That’s the point.