
Reference-check synthesis with Claude

Difficulty: intermediate · Setup time: 30 min · For: recruiter, talent-acquisition, hiring-manager (Recruiting & TA)

A Claude Skill that takes a recruiter’s reference-call notes (raw transcript or recorded summary), the candidate’s resume, and the role rubric, and produces a structured reference report: a per-dimension assessment with verbatim quotes, contradictions between references, areas the references didn’t cover (so the recruiter knows what to ask the next reference), and an overall confidence band. It never produces a hire/no-hire recommendation. The skill replaces the recruiter’s 90-minute write-up with a 15-minute review-and-edit loop while preserving the auditability of the reference data.

When to use

  • You completed two or more reference calls and have transcripts (from Fathom or Gong call recordings), detailed notes, or call summaries.
  • The role has a written rubric (the same one used in structured interviewing) so the synthesis can be dimension-aware.
  • You want the references’ claims auditable later — every assertion in the report must trace to a verbatim quote from the call notes, with the reference’s name and the call timestamp.

When NOT to use

  • Generating a hire/no-hire recommendation. The skill produces a structured assessment with confidence per dimension. The hire decision sits with the hiring manager and the interview debrief. Wiring the skill output to a decision triggers the same automated-decision-making concerns as auto-rejection in screening.
  • Replacing the reference call itself. The skill processes notes; it does not interview references. Auto-emailing references with a form (“AI-generated reference questionnaire”) produces low-quality data and erodes the reference’s willingness to speak candidly on future calls.
  • Recording calls without consent. Most US states require only one-party consent for the recruiter to record; a minority (CA, IL, FL, MD, MA, MI, MT, NH, PA, WA) require all-party consent. In the EU, GDPR applies: recorded calls need an explicit lawful basis. The skill processes notes regardless of how they were captured; it does not authorize recording.
  • Backchannel references the candidate didn’t approve. Different consent posture, different workflow, different legal exposure.

Setup

  1. Drop the bundle. Place apps/web/public/artifacts/reference-check-summary-skill/SKILL.md into your Claude Code skills directory.
  2. Reuse the role rubric. The skill reads the same rubric file used for screening and structured interviews. If your team doesn’t have a shared rubric, the interview question bank pack is the prerequisite.
  3. Configure the consent record. The skill writes a consent_check field per reference (was the call recorded? did the candidate authorize the reference? did the reference consent to processing of the notes?). If any answer is no or unknown, the report is flagged with a consent-warning header.
  4. Dry-run on a closed hire. Process the references for a candidate hired last quarter. Compare the skill’s report to your own contemporaneous write-up. Tune the rubric anchors if the skill weighs dimensions differently than the team did.
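
The consent record in step 3 can be sketched as a small check. This is an illustrative shape only: the actual schema lives in references/2-consent-checklist.md, and the question names beyond consent_check itself are hypothetical stand-ins for the three questions the doc lists.

```python
# Hypothetical shape of the per-reference consent_check record.
# Question names are illustrative, not the skill's actual schema.

def consent_warnings(references: list[dict]) -> list[str]:
    """Return a warning line per reference whose consent is not clearly 'yes'."""
    warnings = []
    for ref in references:
        check = ref.get("consent_check", {})
        for question in ("call_recorded_with_consent",
                         "candidate_authorized_reference",
                         "notes_processing_consented"):
            if check.get(question) not in ("yes", True):
                warnings.append(
                    f"Consent not recorded for reference {ref['id']} "
                    f"({question}) — verify before sharing report"
                )
    return warnings

refs = [
    {"id": "R1", "consent_check": {"call_recorded_with_consent": "yes",
                                   "candidate_authorized_reference": "yes",
                                   "notes_processing_consented": "yes"}},
    {"id": "R2", "consent_check": {"call_recorded_with_consent": "unknown",
                                   "candidate_authorized_reference": "yes",
                                   "notes_processing_consented": "yes"}},
]
print(consent_warnings(refs))
```

Note that a "no" or "unknown" answer produces a warning header rather than a hard failure, matching the skill's non-blocking behavior.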

What the skill actually does

Five steps, in a deliberate order: consent validation and rubric grounding happen before synthesis, because without them the output is just a re-narration of the calls.

  1. Validate consent. Check consent_check per reference. Missing or unknown consent → emit a warning header on the report (“Consent not recorded for reference R2 — verify before sharing report”) and continue. Do not block; the recruiter may know consent was given verbally and forgot to log it.
  2. Ground in the rubric. Read the role rubric. The synthesis dimensions are the rubric dimensions, not generic ones (“communication,” “leadership”). If the rubric has skill_match, level_fit, ownership_signal, team_collaboration, those are the report’s headings.
  3. Per-dimension synthesis. For each rubric dimension, extract every quote from the call notes that bears on the dimension. Group by reference. Tag each quote with strength (strong-positive, weak-positive, neutral, weak-negative, strong-negative). Quotes are verbatim from the notes; paraphrasing is not allowed because it strips the auditability the skill exists to provide.
  4. Surface contradictions and gaps. Identify dimensions where two references diverge (one strong-positive, another weak-negative) and surface the contradiction explicitly. Identify dimensions the references didn’t cover (no quote found) and surface those as gaps so the recruiter knows what to ask the next reference, or what the rubric ranking step has to lean on instead.
  5. Confidence band per dimension, no overall recommendation. For each dimension, return a confidence band: high (multiple references converge with strong-positive or strong-negative), medium (mixed but convergent), low (single reference or contradiction), not assessed. Do not return an overall hire/no-hire score. The decision sits with the hiring manager.
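
The confidence-band rule in step 5 can be sketched as a function over the (reference, strength) tags produced in step 3. The band names come from the description above; the exact thresholds are an illustrative reading, not the skill’s own code.

```python
# Per-dimension confidence band from (reference_id, strength) evidence.
# Band names follow the skill's spec; thresholds are an assumed reading.

POSITIVE = {"strong-positive", "weak-positive"}
NEGATIVE = {"strong-negative", "weak-negative"}
STRONG = {"strong-positive", "strong-negative"}

def confidence_band(evidence: list[tuple[str, str]]) -> str:
    if not evidence:
        return "not assessed"   # no quote bears on this dimension
    refs = {ref for ref, _ in evidence}
    strengths = [s for _, s in evidence]
    has_pos = any(s in POSITIVE for s in strengths)
    has_neg = any(s in NEGATIVE for s in strengths)
    if has_pos and has_neg:
        return "low"            # references contradict each other
    if len(refs) == 1:
        return "low"            # single reference
    if all(s in STRONG for s in strengths):
        return "high"           # multiple references, strong and convergent
    return "medium"             # convergent but mixed strength
```

A contradiction and a single reference both collapse to low, which is the property the contradictions-and-gaps step relies on.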

Cost reality

Per-candidate report (typically 2-4 references, 60-90 minutes of total call time, 4-8k words of notes), on Claude Sonnet 4.6:

  • LLM tokens — typically 12-20k input (notes + rubric + skill instructions) and 2-4k output (structured report). At Sonnet 4.6 list pricing, roughly $0.10-0.18 per candidate. A team running 20 reference cycles per month spends $2-4 in model cost.
  • Recruiter time — the win. Hand-writing a structured reference report is 60-90 minutes per candidate. Reviewing the skill’s report and editing tone or adding context is 15-25 minutes. The bigger time saver is on the contradictions section, which a recruiter often misses on a first pass through their own notes.
  • Setup time — 30 minutes once for the rubric integration and consent-check format. Each role’s rubric is reused, so the marginal setup per role is zero.
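
The token figures above reduce to simple arithmetic. The per-million-token rates below are placeholders, not a quoted price list; substitute current Sonnet list pricing to reproduce the range quoted above.

```python
# Back-of-envelope per-candidate model cost. The rate constants are
# illustrative placeholders; substitute current list pricing.
INPUT_PER_M = 3.00    # $ per 1M input tokens (assumed)
OUTPUT_PER_M = 15.00  # $ per 1M output tokens (assumed)

def report_cost(input_tokens: int, output_tokens: int) -> float:
    return (input_tokens / 1e6 * INPUT_PER_M
            + output_tokens / 1e6 * OUTPUT_PER_M)

low = report_cost(12_000, 2_000)   # light report: 12k in, 2k out
high = report_cost(20_000, 4_000)  # heavy report: 20k in, 4k out
print(f"${low:.3f} to ${high:.3f} per candidate")
```

At 20 reference cycles per month, even the heavy end stays in single-digit dollars, which is why recruiter time dominates the cost picture.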

Success metric

Track two numbers:

  • Hiring-manager satisfaction with the report — a 1-5 score the hiring manager gives after the debrief, on whether the report surfaced the right dimensions and didn’t bury the contradictions. Should sit at 4+ on a calibrated rubric.
  • Reference-cycle time — wall-clock from “last reference completed” to “hiring manager has the report.” Should drop from 1-2 days to under 2 hours.

vs alternatives

  • vs hand-written report. Hand-written is the right call for the highest-stakes hires (executive, board-facing) where the recruiter’s narrative voice is the deliverable. The skill earns its setup cost on the 80% of hires where the structured artifact is what the team needs.
  • vs ATS-native reference automation (Greenhouse Reference Check, Crosschq, SkillSurvey). Those products own the reference collection (questionnaire-style references via email). Pick them if your firm prefers async questionnaire references. Pick this skill if your team prefers live calls and the bottleneck is the synthesis afterward. The two are complementary; the skill works on questionnaire output too.
  • vs ChatGPT-style “summarize these reference notes.” Generic chat returns a paragraph that reads well and buries the contradictions. The skill is structurally different: it forces per-dimension grouping, requires verbatim quotes, and refuses to author an overall recommendation.

Watch-outs

  • Anchoring on one high-confidence reference. Guard: the report’s structure forces per-dimension grouping rather than reference-led narrative, which makes it harder for one strongly opinionated reference to dominate the read.
  • Hallucinated quotes. Guard: the skill is constrained to verbatim extraction. Quotes that don’t appear in the call notes verbatim are forbidden; the prompt explicitly directs the model to omit a dimension if no quote can be cited rather than paraphrase.
  • Over-weighting one reference. Guard: contradictions are surfaced explicitly, with both quotes side by side. The report’s confidence-band logic downgrades to low on dimensions where references diverge, which prevents a confident-but-mistaken read.
  • Implicit hire recommendation through ordering. Guard: the report orders dimensions by the rubric, not by the reference’s enthusiasm. Strong-positive quotes do not float to the top; they land in the dimension they belong to.
  • Consent and recording exposure. Guard: the consent-check field per reference is required input; missing consent triggers a warning header. The skill processes notes regardless of recording status, but it does not absolve the recruiter of the underlying consent obligation.
  • Bias in the underlying rubric carrying through. Guard: if the rubric has dimensions that fail a fairness check (“culture fit” without anchors, school-tier scoring), the synthesis inherits the bias. Run the rubric through the diversity slate auditor for the role’s pool first.
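
The hallucinated-quotes guard above amounts to a membership check: every quoted span must occur in the call notes. A minimal sketch follows, with whitespace normalization as the only allowance (transcripts differ in line breaks); the case folding is an assumption about how strict the match should be.

```python
import re

def is_verbatim(quote: str, notes: str) -> bool:
    """True if quote appears in notes, up to whitespace and case."""
    norm = lambda s: re.sub(r"\s+", " ", s).strip().lower()
    return norm(quote) in norm(notes)

notes = "She owned the migration end to end,\nincluding the rollback plan."
print(is_verbatim("owned the migration end to end", notes))   # True
print(is_verbatim("single-handedly saved the launch", notes)) # False
```

A quote that fails this check is dropped and its dimension marked as having no citable evidence, per the skill’s omit-rather-than-paraphrase rule.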

Stack

The skill bundle lives at apps/web/public/artifacts/reference-check-summary-skill/ and contains:

  • SKILL.md — the skill definition
  • references/1-report-format.md — the literal output template (per-dimension headings, confidence-band scale, contradictions section)
  • references/2-consent-checklist.md — the consent-check schema and warning-header rules

Tools the workflow assumes you use: Claude (the model). Optional: Fathom or Gong for call recording; Ashby for the candidate record. For the parallel interview-debrief workflow, see the interview debrief summary skill.

Related concepts: structured interviewing, quality of hire, candidate experience.
