claude-skill

Reference-check synthesis with Claude

Difficulty

Intermediate

Setup time

30min

For

recruiter · talent-acquisition · hiring-manager

Recruiting & TA

Stack

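A Claude skill that takes a recruiter's reference-call notes (raw transcripts or recording summaries), the candidate's resume, and the role rubric, and produces a structured reference report. The report contains an assessment of each rubric dimension with verbatim quotes, contradictions between references, areas the references did not cover (so the recruiter knows what to ask the next reference), and per-dimension confidence bands. It never makes a hire/no-hire recommendation. It replaces about 90 minutes of recruiter write-up with a 15-minute review-and-edit loop while keeping the reference data auditable.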

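When to use

You have completed two or more reference calls and have transcripts (Fathom, Gong call recordings, or detailed notes) or call summaries.
The role has a written rubric (the same one used in structured interviews), so the synthesis can be dimension-aware.
You want reference claims to be auditable later. Every claim in the report must trace to a verbatim quote from the call notes, together with the reference's name and the call timestamp.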

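When NOT to use

Generating a hire/no-hire recommendation. The skill produces a structured assessment with per-dimension confidence. The hiring decision sits with the hiring manager and the interview debrief. Wiring the skill's output directly into the decision raises the same automated-decision-making concerns as auto-rejection at screening.
Replacing the reference call itself. The skill processes notes; it does not interview references. Auto-emailing references a form (an "AI-generated reference questionnaire") produces low-quality data and erodes the reference's willingness to speak candidly on future calls.
Recording calls without consent. Most US states require only one-party consent when the recruiter records, but some states (California, Illinois, Florida, Maryland, Massachusetts, Michigan, Montana, New Hampshire, Pennsylvania, Washington) require two-party consent. In the EU, GDPR applies, and a recorded call needs an explicit lawful basis. The skill processes notes regardless of how they were obtained, but it does not authorize recording.
Backchannel references the candidate did not approve. The consent posture, workflow, and legal risk are different.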

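Setup

Drop in the bundle. Place apps/web/public/artifacts/reference-check-summary-skill/SKILL.md in your Claude Code skills directory.
Reuse the role rubric. The skill reads the same rubric file you use for screening and structured interviews. If the team has no shared rubric, the interview question bank prompt pack is a prerequisite.
Set up the consent record. The skill writes a consent_check field per reference (Was the call recorded? Did the candidate approve the reference? Did the reference agree to the notes being processed?). If any answer is no or unknown, the report carries a consent warning header.
Do a dry run on a closed requisition. Process the references for a candidate hired last quarter. Compare the skill's report with your own write-up from that time. If the skill weights dimensions differently from the team, adjust the rubric anchors.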

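How the skill works

Five steps. Order matters. Consent and rubric grounding come before synthesis, because synthesis without consent or rubric grounding is just a re-narration of the calls.

Validate consent. Check consent_check for each reference. If consent is missing or unknown, add a warning header to the report ("Consent for reference R2 is not recorded. Verify before sharing the report.") and continue. Do not block: the recruiter may have obtained consent verbally and forgotten to log it.
Ground in the rubric. Read the role rubric. The synthesis dimensions are the rubric's dimensions, not generic ones ("communication", "leadership"). If the rubric has skill_match, level_fit, ownership_signal, and team_collaboration, those become the report headings.
Per-dimension synthesis. For each rubric dimension, extract every quote from the call notes that bears on that dimension. Group by reference. Tag each quote with a strength (strong-positive, weak-positive, neutral, weak-negative, strong-negative). Quotes are verbatim from the notes. Paraphrasing is not allowed, because it destroys the auditability the skill provides.
Surface contradictions and gaps. Identify dimensions where two references give diverging assessments (one strong-positive, the other weak-negative) and show the contradiction explicitly. Identify dimensions no reference covered (no quotes) and list them as gaps, so the recruiter knows what to ask the next reference, or what the rubric ranking step has to lean on instead.
Per-dimension confidence band, no overall recommendation. Return a confidence band for each dimension: high (multiple references agree with strong-positive or strong-negative quotes), medium (mixed but converging), low (single reference or a contradiction), not assessed. No overall hire/no-hire score is returned. The decision sits with the hiring manager.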

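Cost reality

Per candidate report (typically 2 to 4 references, 60 to 90 minutes of call time in total, 4,000 to 8,000 words of notes), on Claude Sonnet 4.6:

LLM tokens: typically 12,000 to 20,000 input (notes + rubric + skill instructions) and 2,000 to 4,000 output (the structured report). At Sonnet 4.6 list price, roughly $0.10 to $0.18 per candidate. For a team running 20 reference cycles a month, the model cost is $2 to $4.
Recruiter time: the big saving. Writing a structured reference report by hand takes 60 to 90 minutes per candidate. Reviewing the skill's report, adjusting tone, and adding context takes 15 to 25 minutes. The biggest time saving is the contradictions section, which recruiters often miss on a first read of their own notes.
Setup time: 30 minutes once, for the rubric integration and the consent-check format. Each role's rubric is reused, so additional per-role setup is zero.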
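
The token arithmetic above can be reproduced with a few lines of Python. This is a rough sketch only: the function name `estimate_cost_usd` is hypothetical, and the per-million-token rates are placeholders chosen for illustration, not figures taken from this page or from a price list. Substitute the rates you are actually billed.

```python
def estimate_cost_usd(input_tokens: int, output_tokens: int,
                      usd_per_m_input: float, usd_per_m_output: float) -> float:
    """Model cost in USD for one report, given per-million-token rates."""
    total = input_tokens * usd_per_m_input + output_tokens * usd_per_m_output
    return total / 1_000_000


# Placeholder rates (an assumption for illustration; replace with your own).
RATE_IN, RATE_OUT = 3.0, 15.0

# Token ranges come from the paragraph above.
low = estimate_cost_usd(12_000, 2_000, RATE_IN, RATE_OUT)
high = estimate_cost_usd(20_000, 4_000, RATE_IN, RATE_OUT)

print(f"per candidate: ${low:.3f} to ${high:.3f}")
print(f"20 cycles a month: ${20 * low:.2f} to ${20 * high:.2f}")
```

With these placeholder rates the estimate comes out somewhat below the range quoted above, so treat both as rough and re-run the calculation with your own rates before budgeting.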

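Success metrics

Track two numbers:

Hiring-manager satisfaction with the report: a 1 to 5 score the hiring manager gives after the debrief on whether the report surfaced the right dimensions and did not bury contradictions. Aim for 4 or higher with a calibrated rubric.
Reference cycle time: wall-clock time from "last reference completed" to "hiring manager receives the report". This should drop from 1 to 2 days to under 2 hours.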

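Compared with alternatives

vs hand-written reports. For the highest-stakes hires (executive, board-facing), where the recruiter's narrative voice is the deliverable, hand-writing is the right choice. The skill pays back its setup cost on the 80% of hires where what the team needs is a structured artifact.
vs ATS-native reference automation (Greenhouse Reference Check, Crosschq, SkillSurvey). These products handle reference collection (questionnaire format via email). Choose them if you prefer asynchronous, questionnaire-style references. Choose this skill if the team prefers live calls and the bottleneck is the synthesis work afterwards. The two are complementary: the skill also works on questionnaire output.
vs ChatGPT-style "summarize these reference notes". Generic chat returns readable paragraphs that bury the contradictions. The skill is structurally different: it forces per-dimension grouping, requires verbatim quotes, and refuses to write an overall recommendation.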

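Watch-outs

Hindsight bias toward high-confidence references. Guard: the report structure forces per-dimension grouping rather than a reference-led narrative. This makes it harder for one strongly opinionated reference to dominate the overall impression.
Hallucinated quotes. Guard: the skill is constrained to verbatim extraction. Quotes that do not appear verbatim in the call notes are forbidden, and the prompt explicitly instructs the model to leave the entry empty rather than paraphrase when no verbatim quote is available.
Over-weighting a single reference. Guard: contradictions are shown explicitly with both quotes side by side. The report's confidence-band logic downgrades dimensions where references disagree to low, preventing a confident misreading.
Implicit hire recommendation via ordering. Guard: the report orders dimensions by the rubric, not by reference enthusiasm. Strong-positive quotes do not float to the top; they stay in the dimension they belong to.
Consent and recording risk. Guard: the per-reference consent-check fields are required input. A warning header appears when consent is missing. The skill processes notes regardless of recording status, but it does not relieve the recruiter of basic consent obligations.
Bias inherited from the underlying rubric. Guard: if the rubric has dimensions that fail a fairness check (unanchored "culture fit", scoring by school rank), the synthesis inherits that bias. First validate the rubric against the role's candidate pool with the diversity slate audit skill.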

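Stack

The skill bundle lives at apps/web/public/artifacts/reference-check-summary-skill/:

SKILL.md: the skill definition
references/1-report-format.md: the literal output template (per-dimension headings, confidence-band scale, contradiction block)
references/2-consent-checklist.md: the consent-check schema and warning-header rules

Tools the workflow assumes: Claude (the model). Optional: Fathom or Gong for call recording; Ashby for candidate records. For the parallel interview-debrief workflow, see the interview debrief summary skill.

Related concepts: structured interviews, quality of hire, candidate experience.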



---
name: reference-check-summary
description: Take reference-call notes (transcript or summary) plus the role rubric, and produce a structured per-dimension reference report with verbatim quotes, contradictions surfaced, and per-dimension confidence bands. Never authors an overall hire/no-hire recommendation — the decision sits with the hiring manager.
---

# Reference-check synthesis

## When to invoke

Use this skill when a recruiter has completed two or more reference calls and has notes (transcript, recorded call summary, or detailed manual notes) plus the role rubric. Take the notes plus rubric as input and return a structured Markdown report.

Do NOT invoke this skill for:

- **Generating a hire/no-hire recommendation.** This skill produces structured assessment with confidence per dimension. The hire decision sits with the hiring manager and the interview debrief.
- **Replacing the reference call itself.** This skill processes notes; it does not interview references. AI-generated reference questionnaires erode the reference's willingness to speak candidly.
- **Recording calls without consent.** The skill processes notes regardless of recording status, but does not authorize recording. Two-party-consent jurisdictions and EU GDPR have explicit lawful-basis requirements.
- **Backchannel references the candidate did not approve.** Different consent posture, different workflow.

## Inputs

- Required: `notes_dir` — path to a directory of per-reference Markdown files. Each file: `R1.md`, `R2.md`, etc., with the reference's name, role, relationship, call date, and notes.
- Required: `rubric` — path to the role rubric file. The rubric's dimensions become the report's headings.
- Required: `consent_log` — path to a per-reference consent record (see `references/2-consent-checklist.md`).
- Optional: `candidate_resume` — path to the resume. Used to ground statements like "the reference confirmed the deal mentioned on the resume" rather than re-narrating the resume.

## Reference files

Always read these from `references/`:

- `references/1-report-format.md` — the literal output format. Per-dimension headings come from the rubric, not from this file.
- `references/2-consent-checklist.md` — the consent-check schema and the warning-header rules.

## Method

Five steps, in order.

### 1. Validate consent

Open `consent_log`. For each reference, check five fields: `candidate_authorized` (the candidate gave the recruiter permission to call this person), `recorded` (whether the call was recorded), `recording_consent` (the reference consented to recording; relevant only if the call was recorded), `notes_processing_consent` (the reference was told the notes might be processed by AI), `jurisdiction` (which state / country the reference was in during the call).

If any field is missing or `unknown`, do NOT halt: emit a warning header at the top of the report and continue. The recruiter may have collected consent verbally and forgotten to log it; the warning surfaces the gap for them to verify before sharing the report. An explicit `false` on `candidate_authorized` or `notes_processing_consent` is different: skip that reference, as described under the halt conditions in `references/2-consent-checklist.md`.

If `recorded: true`, `recording_consent: false`, and `jurisdiction` is in `[CA, IL, FL, MD, MA, MI, MT, NH, PA, WA]` or any EU country, the warning header upgrades to a halt for that reference: "Two-party consent jurisdiction; recording without consent is illegal. The skill will not process the notes from this reference. Verify consent and re-run with `consent_log` updated, or omit this reference."

### 2. Ground in the rubric

Read the rubric. The synthesis dimensions ARE the rubric dimensions, not generic ones. If the rubric has `skill_match`, `level_fit`, `ownership_signal`, `team_collaboration`, those are the report's section headings.

If the rubric has dimensions that fail a fairness check (school-tier scoring, "culture fit" without anchors, employment-gap penalties), surface them but proceed — the rubric is upstream of this skill, and the right fix is at the rubric layer, not by silently dropping dimensions here.

### 3. Per-dimension synthesis

For each rubric dimension, read every reference's notes and extract every quote that bears on the dimension. A quote is a verbatim string from the notes; paraphrasing is not allowed. If you cannot extract a verbatim quote for a reference's view on a dimension, the cell stays empty and the dimension's confidence band reflects the gap.
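
The verbatim rule can also be checked mechanically after extraction. The sketch below is one possible post-check, not part of the skill as shipped; the helper names `is_verbatim` and `drop_unverifiable` are hypothetical. It uses strict substring matching, so a quote with altered whitespace or punctuation is rejected.

```python
def is_verbatim(quote: str, notes: str) -> bool:
    """True only if the quote appears character-for-character in the notes."""
    return bool(quote) and quote in notes


def drop_unverifiable(quotes: list[str], notes: str) -> list[str]:
    """Keep only the quotes that can be found verbatim in the notes."""
    return [q for q in quotes if is_verbatim(q, notes)]


# Illustrative notes text (made up for this example).
notes = 'On conflict: "kinda harsh" at times, but usually right.'
candidates = ["kinda harsh", "somewhat harsh"]
print(drop_unverifiable(candidates, notes))
```

Strict matching is deliberately unforgiving: a rejected quote leaves the cell empty, which is the behavior this step asks for.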

Tag each quote with strength on a 5-level scale:

- `strong-positive` — explicit named outcome, clear ownership, the reference stakes their credibility on it.
- `weak-positive` — observed positive behavior but no named outcome or scope.
- `neutral` — descriptive without judgment.
- `weak-negative` — observed gap or hesitation, qualified.
- `strong-negative` — explicit disqualifying behavior named, with scope.

### 4. Surface contradictions and gaps

For each dimension, compare the per-reference assessments. If two references diverge by ≥2 levels (e.g. one `strong-positive`, one `weak-negative`), surface the contradiction explicitly with both quotes side by side. Do NOT average or smooth — the contradiction IS the signal.

Then identify gaps: dimensions that no reference covered. List them in a "Coverage gaps" section. The recruiter uses this to decide what to ask the next reference, or what the rubric ranking step has to lean on instead.
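
The divergence rule and the gap list can be sketched as follows. This assumes the five strengths map onto integers from +2 to -2 and, for simplicity, that each reference contributes one strength per dimension; both are assumptions of this example, as are the names `LEVELS`, `find_contradictions`, and `coverage_gaps`.

```python
from itertools import combinations

# Assumed numeric mapping of the 5-level scale.
LEVELS = {
    "strong-positive": 2,
    "weak-positive": 1,
    "neutral": 0,
    "weak-negative": -1,
    "strong-negative": -2,
}


def find_contradictions(tags: dict[str, str], min_gap: int = 2) -> list[tuple[str, str, int]]:
    """tags maps reference id -> strength for ONE dimension.

    Returns (ref_a, ref_b, gap) for every pair diverging by >= min_gap levels.
    """
    found = []
    for (a, sa), (b, sb) in combinations(sorted(tags.items()), 2):
        gap = abs(LEVELS[sa] - LEVELS[sb])
        if gap >= min_gap:
            found.append((a, b, gap))
    return found


def coverage_gaps(rubric_dimensions: list[str], tagged: dict[str, dict[str, str]]) -> list[str]:
    """Dimensions with no quotes from any reference, kept in rubric order."""
    return [d for d in rubric_dimensions if not tagged.get(d)]


print(find_contradictions({"R1": "strong-positive", "R2": "weak-negative"}))
# [('R1', 'R2', 3)]
```

Nothing is averaged: the function only reports which pairs diverge and by how much, leaving both quotes to be shown side by side.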

### 5. Confidence band per dimension

For each dimension, return a confidence band:

- `high` — multiple references converge with strong-positive or strong-negative quotes.
- `medium` — references mostly converge, weak-positive / weak-negative quotes, no contradictions.
- `low` — single reference, contradiction surfaced, or only weak-strength quotes.
- `not assessed` — no reference covered the dimension.

Do NOT return an overall hire/no-hire score. The report ends after the last dimension's confidence band.
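
The band definitions leave some room for interpretation, for example when every quote is weak but the references agree. The sketch below shows one possible reading, chosen so that it reproduces the bands in the sample report in `references/1-report-format.md`. The numeric mapping and the function name `confidence_band` are assumptions of this example, and it simplifies to one strength per reference.

```python
# Assumed numeric mapping of the 5-level scale.
LEVELS = {
    "strong-positive": 2,
    "weak-positive": 1,
    "neutral": 0,
    "weak-negative": -1,
    "strong-negative": -2,
}


def confidence_band(tags: dict[str, str]) -> str:
    """tags maps reference id -> strength for one dimension."""
    if not tags:
        return "not assessed"
    scores = [LEVELS[s] for s in tags.values()]
    if len(scores) == 1:
        return "low"          # single reference
    if max(scores) - min(scores) >= 2:
        return "low"          # contradiction surfaced
    if all(abs(s) < 2 for s in scores):
        return "low"          # only weak or neutral quotes
    if all(s == 2 for s in scores) or all(s == -2 for s in scores):
        return "high"         # references converge on strong quotes
    return "medium"           # mostly converging, mixed strength


print(confidence_band({"R1": "strong-positive", "R2": "weak-positive"}))
# medium
```

If your team reads "medium" differently (for instance, two agreeing weak-positive quotes), change the third check rather than the ordering of the others.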

## Output format

See `references/1-report-format.md` for the literal template. The shape is:

```
# Reference report — {Candidate name} — {Role}

[CONSENT WARNING HEADER if any reference's consent is missing]

## References

| ID | Name | Role | Relationship | Call date |
|---|---|---|---|---|
| R1 | ... | ... | ... | ... |

## Per-dimension synthesis

### {Dimension 1 from rubric}

**Confidence: {band}**

| Reference | Strength | Quote |
|---|---|---|
| R1 | strong-positive | "..." |
| R2 | weak-positive | "..." |

[CONTRADICTION block if R1 and R2 diverge ≥2 levels]

### {Dimension 2 from rubric} ...

## Coverage gaps

Dimensions no reference addressed:
- {dimension X} — recruiter to ask R3 or rely on rubric ranking step.

## Provenance

- Rubric: `{path}` — SHA `{short}`
- Notes: `{notes_dir}` — N references processed
- Generated: `{ISO timestamp}`
```
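
For teams that post-process or validate reports in code, one dimension section of the shape above could be rendered like this. It is a minimal sketch; `render_dimension` is a hypothetical helper, not something the bundle ships.

```python
def render_dimension(name: str, band: str, rows: list[tuple[str, str, str]]) -> str:
    """Render one per-dimension section; rows are (reference id, strength, quote)."""
    lines = [
        f"### {name}",
        "",
        f"**Confidence: {band}**",
        "",
        "| Reference | Strength | Quote |",
        "|---|---|---|",
    ]
    for ref_id, strength, quote in rows:
        # Escape pipes so a quote cannot break the table; the rendered
        # text still shows the quote exactly as it was spoken.
        safe = quote.replace("|", "\\|")
        lines.append(f'| {ref_id} | {strength} | "{safe}" |')
    return "\n".join(lines)


print(render_dimension("skill_match", "high", [("R1", "strong-positive", "...")]))
```

Rows are emitted in the order given, so the caller controls ordering and nothing is sorted by strength.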

## Watch-outs

- **Hallucinated quotes.** *Guard:* the prompt forbids paraphrasing; quotes must appear verbatim in the input notes. If you cannot find a verbatim quote for a reference's view on a dimension, the cell is empty and the confidence band drops.
- **Hindsight bias.** *Guard:* the report is structured per-dimension, not per-reference. A strongly opinionated reference cannot dominate the narrative because the report doesn't have a narrative — it has a table per dimension.
- **Implicit recommendation via ordering.** *Guard:* dimensions are ordered by rubric, not by reference enthusiasm. Strong-positive quotes do not float to the top.
- **Consent gaps.** *Guard:* warning header on missing consent; halt on illegal recording in two-party jurisdictions.
- **Bias inheritance from rubric.** *Guard:* surfaced but not silently dropped — the right fix is at the rubric layer, upstream of this skill.

# Reference report format

This is the literal output template the skill writes. Every report follows this shape so downstream consumers (hiring manager, recruiting coordinator, audit reviewer) read predictable structure.

## Template

```markdown
# Reference report — {Candidate name} — {Role title}

Generated: {ISO timestamp} · Rubric SHA: {short hash} · Skill version: 1.0

{CONSENT WARNING HEADER — present only if any reference has missing consent — see consent-checklist.md}

## References

| ID | Name | Role | Relationship to candidate | Call date | Duration |
|---|---|---|---|---|---|
| R1 | Jamie Liu | VP Eng, Acme Fintech | Direct manager (2y) | 2026-04-28 | 45m |
| R2 | Sam Park | Senior IC peer, Acme Fintech | Cross-team collaborator (1y) | 2026-04-30 | 30m |

## Per-dimension synthesis

### Skill match — production Go and distributed-systems experience

**Confidence: high**

| Reference | Strength | Quote |
|---|---|---|
| R1 | strong-positive | "Owned the entire payments routing rewrite in Go — moved from synchronous to event-driven, took our P99 from 800ms to 180ms over Q3." |
| R2 | strong-positive | "When we needed someone to actually understand the consensus layer in our state machine, Jamie was the only person who could explain why the failover semantics were broken." |

### Level fit — Senior IC scope, cross-team influence

**Confidence: medium**

| Reference | Strength | Quote |
|---|---|---|
| R1 | strong-positive | "Was effectively the tech lead on the routing team — running the design reviews, mentoring two juniors." |
| R2 | weak-positive | "Came over to our team for the integration work — drove the meetings but it was a smaller scope, just three of us." |

*Note: confidence is medium because R2's scope was a single integration; R1's scope was a multi-quarter team-leadership signal. The strong-positive on team-lead scope only comes from R1.*

### Team collaboration — handles disagreement well

**Confidence: low**

| Reference | Strength | Quote |
|---|---|---|
| R1 | strong-positive | "Pushed back on a design I'd already approved, with data — turned out he was right and we caught a P0 before it shipped." |
| R2 | weak-negative | "Sometimes the pushback comes across as harsh in the moment — I had to mediate once between Jamie and one of our front-end folks." |

**⚠️ Contradiction surfaced.** R1 and R2 diverge by 3 levels on this dimension. R1's framing is that the pushback is principled and outcome-positive; R2's framing is that the delivery has interpersonal cost. Recruiter to surface this in the hiring-manager debrief.

### Ownership signal — sees work through to outcome

**Confidence: high**

| Reference | Strength | Quote |
|---|---|---|
| R1 | strong-positive | "Stayed on the routing project through the post-launch operational phase — wasn't the kind of engineer who hands off after launch." |
| R2 | strong-positive | "When the integration work hit a snag with our auth team, Jamie went and unblocked it himself rather than escalating." |

## Coverage gaps

Dimensions the references did not address (no verbatim quote found):

- **Response to ambiguity** — neither reference described a situation where the candidate had to act under unclear requirements. Recruiter to ask R3, or rely on the structured-interview step that probes this.
- **Customer-facing scope** — no quotes on the candidate's interaction with customers or with non-technical stakeholders. If the role requires customer-facing work, this gap matters.

## Provenance

- Rubric: `data/rubrics/senior-backend-engineer.json` — SHA `a3f2b1c4d5e6f7a8`
- Notes: `data/references/jamie-liu/` — 2 references processed
- Consent log: `data/references/jamie-liu/consent.json`
- Generated by: `reference-check-summary` skill v1.0 on Claude Sonnet 4.6
- Generated at: 2026-05-03T14:00:00Z
```

## Notes on the template

- **No overall hire/no-hire recommendation.** The report ends after the last per-dimension table and the coverage-gaps section. The decision sits with the hiring manager.
- **Dimension order matches the rubric.** The skill does NOT reorder by reference enthusiasm or by confidence band. The rubric's ordering reflects the team's prioritization; the report respects that.
- **Quotes are verbatim.** No paraphrasing, no smoothing. If a reference said "kinda harsh" the report says "kinda harsh," not "somewhat harsh."
- **Contradictions surface inline.** A separate "contradictions" section at the end is harder to read than inline notes per dimension.
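
The Provenance block records a short SHA for the rubric. The template does not say which hash algorithm is used, so the sketch below assumes SHA-256 truncated to 16 hex characters (the length of the sample value); the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def short_sha(data: bytes, length: int = 16) -> str:
    """First `length` hex characters of the SHA-256 digest (assumed algorithm)."""
    return hashlib.sha256(data).hexdigest()[:length]


def rubric_sha(path: str) -> str:
    """Short hash of the rubric file's exact bytes."""
    return short_sha(Path(path).read_bytes())


print(short_sha(b""))
# e3b0c44298fc1c14
```

Hashing the file's raw bytes means any edit to the rubric, including whitespace, produces a different value, which is what makes the hash useful for audit.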

# Consent checklist for reference processing

The reference-check-summary skill requires a per-reference consent log as input. This file documents the schema, the warning-header rules, and the halt conditions.

## Per-reference consent record

For each reference, the consent log contains:

```json
{
"reference_id": "R1",
"candidate_authorized": true,
"recording_consent": true,
"notes_processing_consent": true,
"jurisdiction": "US-NY",
"recorded": true,
"consent_collected_at": "2026-04-28T14:00:00Z",
"consent_collected_by": "recruiter-email@firm.com"
}
```

### Field definitions

- `candidate_authorized` — the candidate told the recruiter "you can call this person." Without this, the reference call should not have happened. Halt if any reference's value is `false`.
- `recording_consent` — if the call was recorded, the reference consented to recording. The skill needs this only if `recorded: true`.
- `notes_processing_consent` — the reference was told that the notes from the call may be processed by AI to generate a structured report. This is the explicit consent for the skill's processing path under GDPR Art. 6 lawful-basis requirements.
- `jurisdiction` — the state or country the reference was physically in during the call. This determines recording-consent law.
- `recorded` — whether the call was recorded.

## Warning-header rules

If any reference's consent record is missing or has `unknown`/`null` values, the report's top-of-page warning header reads:

```
⚠️ CONSENT WARNING

The following references have incomplete consent records:
- R2: notes_processing_consent is unknown.
- R3: candidate_authorized is unknown.

Verify consent before sharing this report. The skill processed the
notes regardless of the gap; the warning surfaces the gap for the
recruiter to confirm with the candidate and reference.
```

The warning is informational. The skill continues to the report. The recruiter is responsible for either confirming the missing consent (and updating the log for next time) or omitting the affected reference from the shared report.
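
A minimal sketch of how the header lines could be derived from the consent log, assuming each record is a dict shaped like the JSON above and that an absent key, `null`, or the string `"unknown"` all count as incomplete. The helper names are hypothetical, and the closing sentence of the header is shortened here.

```python
CONSENT_FIELDS = (
    "candidate_authorized",
    "recording_consent",
    "notes_processing_consent",
    "jurisdiction",
)


def incomplete_fields(record: dict) -> list[str]:
    """Consent fields that are absent, null, or 'unknown' for one reference."""
    missing = []
    for field in CONSENT_FIELDS:
        # recording_consent only matters when the call was recorded.
        if field == "recording_consent" and record.get("recorded") is False:
            continue
        if record.get(field) in (None, "unknown"):
            missing.append(field)
    return missing


def warning_header(records: list[dict]) -> str:
    """Return the warning header text, or '' when every record is complete."""
    lines = [
        f"- {rec['reference_id']}: {field} is unknown."
        for rec in records
        for field in incomplete_fields(rec)
    ]
    if not lines:
        return ""
    return "\n".join([
        "⚠️ CONSENT WARNING",
        "",
        "The following references have incomplete consent records:",
        *lines,
        "",
        "Verify consent before sharing this report.",
    ])
```

An explicit `false` is deliberately not treated as incomplete here; it is a definite answer and belongs to the halt rules, not the warning.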

## Halt conditions

Halt processing for a reference (skip it, do not include in the report) when:

1. **`candidate_authorized: false`** — the reference call should not have happened. Including the reference in the report would compound the underlying consent failure. Surface to the recruiter as a gap to address.

2. **`recorded: true` AND `recording_consent: false` AND `jurisdiction` is in a two-party-consent jurisdiction.** Two-party-consent jurisdictions (CA, IL, FL, MD, MA, MI, MT, NH, PA, WA in the US, plus all EU countries under GDPR) make recording without consent illegal. Processing the recorded notes compounds the violation. The skill refuses to process the reference and surfaces the issue to the recruiter.

```
HALT: R2 was recorded in CA without consent. Recording is illegal
in CA without two-party consent. The skill will not process this
reference's notes. Either delete the recording and re-interview the
reference (with consent this time), or omit the reference from the
report.
```

3. **`notes_processing_consent: false`** — the reference explicitly declined to have notes processed by AI. The skill respects that. The reference's notes can still inform the recruiter's own write-up, but they are not run through the skill.
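
The three halt conditions can be expressed as a single check per record. This is a sketch under stated assumptions: US jurisdictions are written as `US-XX` as in the sample record, the set of EU country codes is supplied by the caller because this file does not define their format, and the names `halt_reason` and `is_two_party` are hypothetical.

```python
from typing import Optional

# State list copied from halt condition 2 above.
TWO_PARTY_US = {"CA", "IL", "FL", "MD", "MA", "MI", "MT", "NH", "PA", "WA"}


def is_two_party(jurisdiction: str, eu_countries: frozenset = frozenset()) -> bool:
    """True if the jurisdiction is on the two-party-consent list."""
    if jurisdiction.startswith("US-"):
        return jurisdiction[3:] in TWO_PARTY_US
    return jurisdiction in eu_countries


def halt_reason(record: dict, eu_countries: frozenset = frozenset()) -> Optional[str]:
    """Return why this reference must be skipped, or None if it may be processed."""
    if record.get("candidate_authorized") is False:
        return "candidate_authorized is false"
    if (
        record.get("recorded") is True
        and record.get("recording_consent") is False
        and is_two_party(record.get("jurisdiction") or "", eu_countries)
    ):
        return "recorded without consent in a two-party-consent jurisdiction"
    if record.get("notes_processing_consent") is False:
        return "notes_processing_consent is false"
    return None
```

Only an explicit `false` triggers a halt; `unknown` or missing values fall through to the warning-header path described earlier.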

## Why this matters

GDPR Art. 6 requires a lawful basis for processing personal data. A reference's notes ARE personal data (the reference's, and the candidate's). The lawful basis for AI processing is most commonly explicit consent or legitimate interest with a balancing test. In either case, the reference must have been informed.

NYC LL 144 and the EU AI Act focus on the candidate side, but reference data falls in the same processing pipeline. A defensible recruiting AI posture handles consent on both sides.

The skill cannot enforce that the recruiter actually collected consent. What it can enforce is that the consent is logged before processing, and that missing or contradictory consent surfaces to the recruiter rather than getting buried.

## What goes in the consent log when you didn't collect consent properly

The honest answer: omit the reference from this skill's processing. Use your own write-up. The skill's auditability comes from the consent record being trustworthy; populating it with `unknown` to make the skill run defeats the purpose.

Update your reference-call intake script to collect the fields above as part of the call opening. The marginal time cost is 30 seconds per call.