Lists with no documented lawful basis for processing. EU or UK contacts without prior consent, Quebec leads under Law 25, California consumer data with no CCPA pathway: the skill will enrich whatever it is pointed at, which is exactly why an operator has to check first. The "Do NOT invoke" block in SKILL.md covers this; do not comment it out.
vs. Apollo's native sequence personalization. Apollo generates openers from its own enrichment data. It is faster to stand up, but the openers are transparently templated, scored against Apollo's ICP heuristics (not yours), and there is no audit trail of which facts drove a given draft. This skill takes longer to set up and costs more per row, but each opener cites a dated URL a rep can verify in one click, and the rubric is a file you control yourself.
vs. Clearbit + Outreach Smart Variables. Smart Variables fed by Clearbit produce factual mail merges ("your industry is ${X}"), not openers; reps still have to write real sentences around the variables. If your reps already write those sentences, that setup is cheaper than this skill; if they do not, rep time outweighs token cost and it ends up more expensive overall.
vs. manual opener writing. A senior SDR takes 4-7 minutes to write a high-quality cold opener for one account. At a fully loaded $60/hour, that is roughly $5 of rep cost per opener; the skill tops out around $0.025 per opener. But a rep also does account-level thinking while writing, and the skill does not. For most teams the right split is to use the skill for top-of-funnel volume (everything below the tier-1 account list) and keep rep-written openers for named accounts.
vs. status quo (no enrichment, generic openers). Generic openers see reply rates of roughly 1-2% in public benchmarks; lightly personalized openers tied to a recent signal see 4-8%. The skill targets the latter. It is only worth running if you are willing to maintain the rubric and style guide; without them, its output is not materially better than the status quo.
---
name: lead-enrichment
description: Enrich a Clay table row with company summary, ICP fit score, and a personalized cold-email opener. Use this skill from Clay AI columns or from a Claude Code session driving the Anthropic Batch API over an exported lead list. Returns structured fields plus a draft opener for rep review — never auto-sends.
---
# Lead enrichment
## When to invoke
Whenever a Clay table or exported CSV of leads needs three derived fields in one pass: a 2-sentence company summary, an ICP fit score with one-line reasoning, and a sub-50-word cold-email opener that cites a real signal from the company's public footprint. Take a domain (and optionally a LinkedIn URL or a free-text ICP rubric) as input and produce a structured JSON-shaped Markdown record per row.
Do NOT invoke this skill for:
- Auto-sending email. The opener is a draft for rep review. Wiring the output directly into a sequence-send step bypasses the guardrail this skill exists to enforce.
- Enriching leads that have not consented to processing. If the source list was scraped from a region with prior-consent rules (EU/UK GDPR, Quebec Law 25, etc.) without a documented lawful basis, stop and surface the concern to the operator. This skill will not silently process unconsented data.
- Account-level research briefs for discovery calls — use the `account-research` skill instead. This one optimizes for batch volume and cost-per-row, not depth.
- Public-company financial scoring or any decision that needs licensed data (S&P, Pitchbook). This skill reads public web only.
## Inputs
- Required: `domain` — the company's primary domain (e.g. `acme.com`). Must resolve and serve HTML; parked or dead domains are returned with `status: unreachable` and skipped.
- Required at config time: a Clay table reference (table ID + column map) OR a CSV path if running outside Clay. The skill assumes the table has at least `domain`; `first_name`, `title`, and `linkedin_url` improve opener quality if present.
- Required at config time: `references/icp-rubric.md` populated with your real rubric. Without it, `icp_fit_score` is omitted (not guessed).
- Optional: `references/opener-style-guide.md` — your team's voice, banned phrases, max length override.
- Optional: `references/source-quality-matrix.md` — vendor preference order when multiple enrichment columns upstream of this skill disagree.
- Optional: `recent_news` — pre-fetched by an upstream Clay column. When present, the opener step uses it instead of re-fetching.
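As an illustration only (the skill does not document a config schema, so every name below is hypothetical), a CSV-mode run configuration wiring these inputs together might look like:

```python
# Hypothetical run config for a CSV-mode batch. Only the domain column is
# a hard per-row requirement; the rest improve quality when present.
run_config = {
    "csv_path": "leads.csv",  # or a Clay table ID + column map instead
    "column_map": {
        "domain": "company_domain",   # required
        "first_name": "first",        # optional, improves opener quality
        "title": "job_title",         # optional
        "linkedin_url": "linkedin",   # optional
    },
    "references": {
        "icp_rubric": "references/icp-rubric.md",                  # required
        "opener_style_guide": "references/opener-style-guide.md",  # optional
        "source_quality_matrix": "references/source-quality-matrix.md",
    },
    "opener_threshold": 6,  # default from the Method section; configurable per run
}
```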
## Reference files
Always read the following from `references/` before generating any row. They encode the operator's positioning and quality bar — without them output is generic.
- `references/icp-rubric.md` — the rubric the score uses. Replace template content with your actual ICP definition before first run.
- `references/opener-style-guide.md` — voice, length cap, banned phrases (e.g. "I noticed", "love what you're doing", any superlative).
- `references/source-quality-matrix.md` — preference order for resolving conflicts between upstream enrichment vendors (Apollo vs. Clearbit vs. ZoomInfo vs. Clay's native People/Company API).
## Method
Run these four sub-tasks in order per row. Steps 1-3 produce the structured fields; step 4 only runs when the ICP score clears the configured threshold.
### 1. Resolve and fetch
Hit `https://{domain}` with a 10-second timeout and one redirect hop. On non-2xx, parked-domain heuristics, or empty body, mark `status: unreachable` and skip steps 2-4. Do not retry indefinitely — parked domains are a meaningful share of any scraped list and the skill should fail fast on them rather than burn API spend.
If the homepage resolves, additionally fetch `/about`, `/company`, and `/customers` (best-effort, ignore 404s). Concatenate cleaned text up to ~8K tokens; truncate from the end of the longest section, not the front, to preserve the homepage hero copy.
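The parked-domain check and the truncation rule can be sketched as follows. Both helpers are assumptions about implementation, not a spec: the parked-marker list is illustrative, and ~8K tokens is approximated as ~32K characters.

```python
def looks_parked(html: str) -> bool:
    """Crude parked-domain heuristic: near-empty body or registrar
    boilerplate. The marker list is illustrative, not exhaustive."""
    markers = ("domain is for sale", "parked free", "buy this domain")
    text = html.lower()
    return len(text.strip()) < 200 or any(m in text for m in markers)


def concat_and_truncate(sections: dict[str, str], budget_chars: int = 32000) -> str:
    """Concatenate cleaned page sections, trimming from the END of the
    longest section until the budget fits, so the homepage hero copy at
    the front is preserved. ~8K tokens ≈ ~32K chars is an assumption."""
    total = sum(len(t) for t in sections.values())
    while total > budget_chars:
        longest = max(sections, key=lambda k: len(sections[k]))
        overshoot = total - budget_chars
        keep = max(0, len(sections[longest]) - overshoot)
        sections[longest] = sections[longest][:keep]
        total = sum(len(t) for t in sections.values())
    return "\n\n".join(sections.values())
```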
### 2. Extract structured snapshot
In one Claude call, extract:
- `industry` — single noun phrase, derived from copy not guessed
- `size_signal` — one of `solo | smb | mid | ent`, justified by a cited copy fragment (employee count claim, customer-logo density, funding mention) or `unknown`
- `value_prop` — one sentence in the company's own framing
- `recent_signal` — at most one verifiable fact (a launch, hire, funding round, customer win) with a URL anchor. If none found in the fetched pages, return `null` rather than inventing one.
Engineering choice: extraction is a single structured Claude call, not three. Each additional round-trip multiplies cost-per-row at 100K-row scale; the extraction prompt is small enough to stay reliable in one pass.
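A sketch of the single-call shape (the prompt wording and schema dict are assumptions; the actual prompt ships with the skill). The point is that all four fields ride one prompt, so the builder is the only per-row variable cost:

```python
import json

# Field descriptions mirror the extraction spec above; the exact wording
# of the shipped prompt is an assumption here.
SNAPSHOT_SCHEMA = {
    "industry": "single noun phrase, derived from copy, not guessed",
    "size_signal": "one of solo | smb | mid | ent | unknown, with a cited fragment",
    "value_prop": "one sentence in the company's own framing",
    "recent_signal": "at most one verifiable dated fact with a URL, or null",
}


def build_extraction_prompt(page_text: str) -> str:
    """One prompt, one Claude call: extracting all four fields together
    keeps round-trips (and cost-per-row at 100K-row scale) flat."""
    return (
        "Extract the following fields from the page text as JSON. "
        "Return null for any field not evidenced in the text.\n"
        f"Fields: {json.dumps(SNAPSHOT_SCHEMA, indent=2)}\n\n"
        f"Page text:\n{page_text}"
    )
```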
### 3. Score against ICP rubric
Load `references/icp-rubric.md`. Score the snapshot 1-10 with a one-line justification that names which rubric dimension drove the score. If the rubric file still contains template placeholders (`{...}` literals), return `score: null` and `reason: "rubric not configured"` rather than scoring against nothing.
Engineering choice: ICP scoring runs before opener generation, not after. Rationale: opener generation is the most expensive step (longer output, more careful prompt). Skipping it for sub-threshold rows is where the cost savings live. A common misconfiguration is to generate openers for every row "in case the rep wants to override" — that defeats the filter.
### 4. Generate opener (only if score >= threshold)
Default threshold: 6/10. Configurable per run.
Inputs to the opener step: `value_prop`, `recent_signal` (if non-null), the lead's `first_name` and `title` if present, and the style guide.
Hard rules baked into the opener prompt:
- Max 50 words.
- Must reference exactly one specific thing from `recent_signal` or `value_prop`. If both are null, return `opener: null` rather than writing flattery.
- No superlatives ("amazing", "incredible", "love").
- No invented company claims. The opener may only reference facts present in the snapshot. The prompt includes the snapshot inline and explicitly forbids facts not in it.
- No question close that pretends to know an internal pain ("how are you handling X?" without evidence X is happening).
The opener is always returned as a draft with a `confidence` field (0-1) reflecting how strong the cited signal was. Reps see the confidence; low-confidence openers are surfaced for rewrite, not auto-sent.
## Output format
One record per row, emitted as a fenced JSON block inside Markdown so Clay's AI column can parse it and a human reading the log can scan it.
````markdown
### acme.com
```json
{
  "domain": "acme.com",
  "status": "ok",
  "industry": "Workforce management software",
  "size_signal": "mid",
  "size_signal_evidence": "About page cites '350+ employees across 4 offices'",
  "value_prop": "Schedules and pays hourly workforces for restaurants and retail.",
  "recent_signal": {
    "summary": "Launched a payroll module in March 2026.",
    "url": "https://acme.com/blog/payroll-launch"
  },
  "icp_fit_score": 8,
  "icp_fit_reason": "Mid-market, hourly-workforce vertical — direct match on rubric line 2.",
  "opener_draft": "Saw the March payroll launch — folding pay into the same surface as scheduling is the move most ops leaders ask us about. Worth a 20-min on whether the multi-state tax piece is hitting the same wall others have?",
  "opener_confidence": 0.78,
  "enriched_at": "2026-03-28T00:00:00Z",
  "sources": [
    "https://acme.com/",
    "https://acme.com/about",
    "https://acme.com/blog/payroll-launch"
  ]
}
```
````
For unreachable rows the record collapses to `{ "domain": "...", "status": "unreachable" }` with no other fields — no placeholder strings.
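A sketch of the record emitter (field names follow the example above; the heading-plus-fence framing is an assumption about what Clay's AI column parses cleanly):

```python
import json


def emit_record(row: dict) -> str:
    """Render one row as a fenced JSON block under a domain heading.
    Unreachable rows collapse to the two-field form: no placeholders,
    no stale partial fields."""
    if row.get("status") == "unreachable":
        payload = {"domain": row["domain"], "status": "unreachable"}
    else:
        payload = row
    body = json.dumps(payload, indent=2, ensure_ascii=False)
    return f"### {row['domain']}\n```json\n{body}\n```"
```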
## Watch-outs
- **Source-quality drift across vendors.** When a Clay table has both Apollo and Clearbit enrichment columns upstream and they disagree on headcount or industry, this skill's snapshot can flip-flop run to run. Guard: `references/source-quality-matrix.md` declares a preference order; the snapshot step cites which vendor (or homepage-derived value) it used per field, so drift is auditable.
- **Opener inventing claims that aren't in the data.** Without strict prompting, openers drift toward confident-sounding fabrications ("congrats on your Series C" when there was no Series C). Guard: the opener prompt receives the snapshot inline and has an explicit "facts not in the snapshot are forbidden" rule; `recent_signal` carries a URL so a reviewer can verify in one click; openers with `opener_confidence` under 0.5 are flagged for rewrite, not sent.
- **Cost-per-row escalation if the ICP filter is loose.** A rubric that scores most rows 7+ defeats the threshold gate; opener generation runs on every row and per-row cost rises 3-4x. Guard: the skill logs a `score_distribution` summary at the end of each batch (`{ "1-3": N, "4-6": N, "7-10": N }`); if more than 60% of rows land 7+ across a 1K-row sample, the skill prints a warning to tighten the rubric before the next batch.
- **Rate limits on homepage fetches.** Hitting one root domain per row is fine; hitting `/about` and `/customers` triples the request count and some hosts throttle. Guard: per-host concurrency capped at 2, per-host minimum delay 250ms, and 429s are honored with a single back-off retry before the row is marked `status: unreachable`.
- **Stale enrichment.** A row enriched 90 days ago is not the same signal as one enriched today; reps treating old `recent_signal` values as fresh write embarrassing openers. Guard: every record carries an `enriched_at` timestamp, and the Clay column is configured to re-run when `enriched_at` is older than 30 days.
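The batch-end `score_distribution` summary from the cost watch-out can be sketched directly (bucket boundaries follow the log format above; the warning text is illustrative):

```python
from collections import Counter


def score_distribution(scores: list[int]) -> dict:
    """Bucket ICP scores into the batch-end summary and flag a loose
    rubric: more than 60% of rows landing 7+ defeats the threshold gate
    that keeps opener generation (the expensive step) off most rows."""
    buckets = Counter()
    for s in scores:
        if s <= 3:
            buckets["1-3"] += 1
        elif s <= 6:
            buckets["4-6"] += 1
        else:
            buckets["7-10"] += 1
    summary = {"1-3": buckets["1-3"], "4-6": buckets["4-6"], "7-10": buckets["7-10"]}
    if scores and summary["7-10"] / len(scores) > 0.60:
        summary["warning"] = "rubric too loose: tighten before the next batch"
    return summary
```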
# ICP rubric — TEMPLATE
> Replace every `{...}` placeholder with your team's real values before
> the first run. The lead-enrichment skill checks for unsubstituted
> placeholders and refuses to score against a template — it returns
> `score: null, reason: "rubric not configured"` instead of guessing.
## Firmographics
| Dimension | In-ICP | Stretch | Out |
|---|---|---|---|
| Industry | {list, e.g. "B2B SaaS, fintech, vertical SaaS"} | {adjacent industries} | {hard out, e.g. "consumer, agencies, edu"} |
| Headcount | {range, e.g. "200-2000"} | {range, e.g. "100-200, 2000-5000"} | {below/above, e.g. "<100, >5000"} |
| Revenue | {range, e.g. "$25M-$500M ARR"} | {range} | {below/above} |
| Geo | {regions, e.g. "US, Canada, UK"} | {regions, e.g. "EU, ANZ"} | {regions, e.g. "China, Russia, sanctioned"} |
| Funding stage | {stages, e.g. "Series B-D, public"} | {stages} | {stages, e.g. "pre-seed, bootstrapped <$5M"} |
## Technographics
Tools whose presence on the prospect's stack signals fit (we win when they have these because the integration story is short or the pain is acute):
- {Tool 1 — e.g. "Salesforce + Outreach"}
- {Tool 2 — e.g. "dbt + Snowflake"}
- {Tool 3 — e.g. "Segment"}
Tools whose presence signals misfit (we lose to them, or their presence implies a build-not-buy culture):
- {Tool A — e.g. "Hightouch (we lose to)"}
- {Tool B — e.g. "in-house reverse-ETL (build-not-buy)"}
## Pain ranking
When the skill produces an ICP score, it weights against this priority order. Higher = more important to surface first.
1. {Pain category — e.g. "outbound data quality blocking pipeline"} — weight 5
2. {Pain category — e.g. "rep ramp time eating quota"} — weight 4
3. {Pain category — e.g. "RevOps reporting backlog"} — weight 3
4. {Pain category — e.g. "tooling sprawl / consolidation push"} — weight 2
5. {Pain category — e.g. "compliance / data-residency"} — weight 1
## Score interpretation
The skill returns `1-10`. Suggested interpretation (override per team):
- **9-10** — direct ICP, multiple positive signals, no disqualifiers. Send.
- **7-8** — strong fit on firmographics, signal is plausible. Send after rep glance.
- **5-6** — adjacent. Default threshold cuts here. Opener may not generate.
- **3-4** — stretch fit, weak signal. Skip unless rep has specific reason.
- **1-2** — out of ICP. Suppress.
## Disqualifiers
Single signals that drop a row out regardless of other fit. The skill flags these with `disqualifier: "..."` in the output and forces score to 1.
- {Disqualifier 1 — e.g. "uses {Competitor} as system of record"}
- {Disqualifier 2 — e.g. "regulated industry without our SOC 2 + HIPAA"}
- {Disqualifier 3 — e.g. "headcount under 50 — wrong motion"}
## Last edited
{YYYY-MM-DD} — bump on every material change. The skill prepends a "rubric may be stale" note when this date is more than 90 days old.
# Opener style guide — TEMPLATE
> Replace the placeholders with your team's actual voice rules. The
> lead-enrichment skill loads this file on every opener generation.
> Treat it as the single source of truth — do not duplicate rules into
> the prompt itself, where they will drift.
## Voice
One sentence on register: {e.g. "direct, plain, no jargon, no exclamation marks, no questions in the close unless the question is specific"}.
One sentence on stance: {e.g. "we're peers solving the same problem, not vendors selling at"}.
## Length
- Hard cap: 50 words. The skill enforces this and truncates with an ellipsis warning if the model overshoots.
- Soft target: 30-40 words. Three sentences max.
- No greeting line ("Hi {name},"). The sequence step adds the greeting; baking one into the opener double-greets.
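The cap enforcement described above could be as simple as the following sketch (word-splitting on whitespace is an assumption; the skill's actual tokenization may differ):

```python
def enforce_word_cap(opener: str, cap: int = 50) -> tuple[str, bool]:
    """Truncate an overshooting opener at the word cap and flag it.
    The ellipsis marks the cut so a rep sees the draft was trimmed,
    not written that way."""
    words = opener.split()
    if len(words) <= cap:
        return opener, False
    return " ".join(words[:cap]) + "…", True
```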
## What every opener must contain
1. A specific reference to one thing from the snapshot — preferably `recent_signal`. If there is no recent signal, the opener references the `value_prop` and labels itself low-confidence.
2. A reason the reference matters to the operator's persona. Not "I noticed X" — "X is the move ops leaders ask us about".
3. A single, specific close. Either a 20-min ask or a question that would be answerable only by someone inside the company.
## Banned phrases
The prompt rejects any opener containing these. Add team-specific additions over time.
- "I hope this email finds you well"
- "I noticed"
- "I came across your"
- "love what you're doing"
- "quick question"
- "circling back" (in a first touch)
- Any superlative — "amazing", "incredible", "best-in-class", "leading"
- Any em-dash trio that reads as an LLM tic — "X — Y — Z"
- Compliments on growth/funding without a cited source
## Banned structures
- The "I was just on your website" opener. Always reads false.
- The "we work with companies like {logo}" name-drop in sentence one.
- The "are you the right person?" close. Wastes the reply.
## Confidence signal
The skill returns `opener_confidence: 0.0-1.0`:
- **0.8+** — opener references a dated `recent_signal` with a URL and the rep's persona is in the `references/icp-rubric.md` buying committee.
- **0.5-0.8** — opener references `value_prop` only, or `recent_signal` is older than 60 days.
- **<0.5** — opener is generic; rep should rewrite or skip the row.
Threshold for auto-queueing into a sequence: configure per-team. Default recommendation: **never auto-queue**. The opener is a draft.
## Worked example
Snapshot: industry "scheduling for restaurants", recent_signal "March payroll launch", value_prop "schedules and pays hourly workforces".
Opener (35 words, confidence 0.78):
> Saw the March payroll launch — folding pay into the same surface as
> scheduling is the move most ops leaders ask us about. Worth a 20-min
> on whether the multi-state tax piece is hitting the same wall others
> have?
Why it scores 0.78 not 1.0: the close assumes a pain ("multi-state tax piece is hitting the same wall") that the snapshot does not directly evidence. A 1.0 opener would tie the close to a fact in the snapshot.
## Last edited
{YYYY-MM-DD}
# Source quality matrix — TEMPLATE
> Replace placeholders with the vendors actually wired into your Clay
> table. The lead-enrichment skill consults this matrix when two
> upstream sources disagree on the same field, and cites which source
> won in the per-row output for audit.
## Why this exists
Clay tables routinely stack 2-3 enrichment vendors per field — Apollo on industry, Clearbit on headcount, ZoomInfo on funding. They disagree often: Apollo will say a 200-person company is 50, Clearbit will assign the wrong industry, ZoomInfo will lag a recent acquisition by a quarter.
Without a declared preference order, the skill's snapshot step flip-flops between vendors run to run, ICP scores oscillate, and operators cannot reproduce the same row twice.
## Field-by-field preference order
Per field, list vendors top-to-bottom in trust order. The skill takes the highest-ranked vendor that returned a non-empty value. The homepage-derived value (extracted by the skill itself in step 1) is always the tiebreaker if all vendors disagree with each other.
### Industry / sector
1. {e.g. "Homepage About-page extraction (skill step 1)"}
2. {e.g. "Clearbit Reveal"}
3. {e.g. "Apollo Company"}
4. {e.g. "ZoomInfo Industry"}
### Headcount
1. {e.g. "LinkedIn employee count via Clay's LinkedIn Profile column"}
2. {e.g. "Apollo employee_count"}
3. {e.g. "Clearbit metrics.employees"}
4. Skill homepage extraction (last resort — copy claims often inflated)
### Revenue / ARR
1. {e.g. "ZoomInfo revenue (paid only)"}
2. {e.g. "Clearbit metrics.annualRevenue (estimated, low confidence)"}
3. Omit (do not extract from homepage; vanity metrics are not revenue)
### Funding stage / last round
1. {e.g. "Crunchbase via Clay's native column"}
2. {e.g. "Press-release fetch in skill step 1, only if dated within 90d"}
3. Omit
### Recent signal (launches, hires, news)
1. Skill homepage / blog fetch in step 1 — must carry a dated URL
2. {e.g. "Clay's Built-With column for tech-stack changes"}
3. {e.g. "Google News column, capped to last 60 days"}
Important: a vendor returning "we don't know" is treated as missing, not as a contradicting value. The next vendor in the list is consulted.
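The resolution rule reduces to a ranked first-non-missing lookup. A sketch (the `MISSING` sentinel set is an assumption about how vendors encode "we don't know"; the homepage tiebreaker described above is omitted for brevity):

```python
# Values a vendor can return that count as "missing", not as a
# contradicting value. Sentinel strings here are assumptions.
MISSING = {None, "", "we don't know", "unknown"}


def resolve_field(preference: list[str], values: dict):
    """Take the highest-ranked vendor that returned a real value.
    Vendors answering 'we don't know' are skipped and the next vendor
    in the trust order is consulted. Returns (vendor, value), or
    (None, None) if every vendor came up empty."""
    for vendor in preference:
        v = values.get(vendor)
        if v not in MISSING:
            return vendor, v
    return None, None
```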
## Conflict logging
When two vendors return different values for the same field, the skill adds a `conflicts` block to the per-row output:
```json
"conflicts": [
{ "field": "headcount", "values": { "linkedin": 380, "apollo": 120 }, "chose": "linkedin" }
]
```
Operators should sample 10-20 conflict rows weekly. If one vendor is consistently wrong, demote it in this matrix. The matrix is the control surface, not the prompt.
## Vendor cost ceiling
Each vendor has a per-row cost (Clay credits or external API spend). Set a soft cap here — if a row's combined enrichment cost exceeds the cap, skip the lowest-ranked vendor and accept lower confidence.
- Soft cap per row: {e.g. "12 Clay credits"}
- Hard cap per row: {e.g. "20 Clay credits — abort row"}
The skill respects these caps and emits `cost_ceiling_hit: true` when the hard cap fires, so the batch summary surfaces the count.
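One plausible shape for the cap logic, sketched under the assumption that vendors are walked in trust order and each carries a known per-row cost (the exact skip/abort semantics the skill implements may differ):

```python
def plan_vendors(preference: list[str], costs: dict[str, float],
                 soft_cap: float, hard_cap: float) -> dict:
    """Accumulate per-row enrichment cost across vendors in trust order.
    Past the soft cap, lower-ranked vendors are skipped (lower confidence
    accepted); a vendor that would bust the hard cap aborts the row and
    sets cost_ceiling_hit so the batch summary can count it."""
    chosen, total = [], 0.0
    for vendor in preference:
        c = costs.get(vendor, 0.0)
        if total + c > hard_cap:
            return {"vendors": chosen, "cost": total, "cost_ceiling_hit": True}
        if total + c > soft_cap:
            continue  # skip this lower-ranked vendor
        chosen.append(vendor)
        total += c
    return {"vendors": chosen, "cost": total, "cost_ceiling_hit": False}
```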
## Last edited
{YYYY-MM-DD}