claude-skill

Claudeによる構造化面接ループビルダー

Difficulty

中級

Setup time

30min

For

recruiter · hiring-manager · talent-acquisition

Recruiting & TA

Stack

職務記述書、ロールのレベル、必須コンピテンシー、キャリブレーションされた強みを持つ各面接官を含む面接官プールを受け取り、完全なループ設計（ステージの進行、アンカー説明付きのステージ別ルーブリック、次元別の行動質問、各選定理由付きの面接官アサインメントテーブル）を生成するClaude スキルです。ATSに何かを設定する前に、採用マネージャーのレビューゲートで停止します。「候補者がスクリーンに入ってからループを決めよう」という状況を、30分の設計パスで運用上の規律を生む形に置き換えます。

使う場面

承認済みの職務記述書、確定したレベル、このロールで採用と不採用を分ける必須コンピテンシーのリストがある。
レベル帯ごとのスコアレベル別アンカー説明を持つ構造化面接ルーブリックライブラリがある。apps/web/public/artifacts/interview-loop-builder-skill/references/1-competency-library.mdのコンピテンシーテンプレートがその形式を示しています。記入できない場合、このスキルが参照できるルーブリックはまだありません。
コンピテンシーごと・レベル帯ごとにキャリブレーションされた強みが記録された面接官プールがある。バンドル内のreferences/2-interviewer-strengths.mdのマトリクスを参照してください。
採用マネージャーがAshbyまたはGreenhouseでループを設定する前にループをレビューする。スキルはファイルを書き込んで停止します。ATSへのプッシュは行いません。

使わない場面

自動スケジューリング。 このスキルはループを設計します。面接のスケジュール設定、カレンダーのマッチング、候補者向け予約リンクの送信は行いません。それはGoodtime、Ashby Scheduling、またはGreenhouse Schedulingの役割です。設計とスケジューリングを1つのスキルに結合することは、独立して失敗すべき2つの失敗モードを結合することになります。
採用マネージャーとのルーブリック設計の代替。 スキルはコンピテンシーライブラリから引き出したスコアレベル別アンカー説明を出力しますが、ライブラリ自体（IC5においてシステム設計の5はどのように見えるか）は採用マネージャーとファンクションヘッドが所有します。ライブラリが空またはテンプレートのみの場合、スキルは拒否し、キャリブレーションされたシグナルのないファンクションのルーブリックアンカーを発明するのではなくTODOを提示します。
ロール固有のキャリブレーションのない汎用テンプレートループ。 入力にレベル、必須コンピテンシー、または対象面接官プールが明記されていない場合、スキルは拒否します。「行動面接、技術面接、システム設計、リーダーシップ」という汎用ラベルの4ステージループは構造化されているように見えますが、そうではありません。ロールの優先事項に関係なく、すべての候補者が同じ質問を受けることになり、構造化の意味が失われます。
定義された複雑さの閾値を下回るロール。 2週間の契約ロールに4ステージのループは不要です。ロールが契約社員、時給、または6ヶ月未満の予想期間の場合、スキルは警告を表示し1ステージのスクリーンを提案します。
行動面接トレーニングの代替。 スキルが出力する質問は状況/行動/結果の形式に従いますが、面接官は一貫してスコアリングするためにキャリブレーションのトレーニングが依然として必要です。スキルは足場であり、トレーニングは前提条件です。

セットアップ

バンドルを配置する。 apps/web/public/artifacts/interview-loop-builder-skill/SKILL.mdをClaudeのスキルディレクトリ（またはclaude.aiのカスタムスキル）に配置します。スキルはdesign_loopという1つの呼び出し可能な関数を公開します。
コンピテンシーライブラリを記入する。 references/1-competency-library.mdをチームリポジトリにコピーします。すべてのプレースホルダーを実際のコンピテンシー、定義、カバーするレベル帯、スコアレベル別アンカー説明に置き換えます。ライブラリがテンプレートのみの場合、スキルは実行を拒否します。
面接官強みマトリクスを記入する。 references/2-interviewer-strengths.mdをコピーします。対象面接官、所属チーム、各コンピテンシーについてスコアリングのキャリブレーションがされているレベル帯を列挙します。「最終面接」列は6ヶ月のアイドル後に再キャリブレーションを促すトリガーです。
ロールごとに入力を設定する。 特定のロールに対して、職務記述書のパス、レベル、3〜6件のコンピテンシーIDの配列、および記入済みの面接官強みマトリクスのパスを渡します。スキルはloop.mdとscorecards/以下のステージ別スコアカードスキャフォールドを出力します。
クローズドループでドライランする。 直近四半期に手動で設計したロールで実行します。スキルのステージマッピングと面接官アサインメントを手動設計と比較します。乖離がある場合、スキルのプロンプトではなく通常コンピテンシーライブラリまたは面接官マトリクスを調整する必要があります。

スキルの実際の動作

6ステップ、順番通りに実行します。この順番は重要です。決定論的な検証とマッピングがLLMによるルーブリックアンカーと質問の生成の前に行われ、最後の候補者体験パスが各ステージを個別に割り当てる際に見えない過負荷を検出するために組み立てられたループ全体を再読します。

入力を検証する。 各コンピテンシーIDがライブラリに存在すること、面接官プールがロールのレベルで各必須コンピテンシーに対してキャリブレーションされた担当者を少なくとも1名持つこと、レベルがライブラリのカバー帯内に収まること。いずれかのチェックが失敗した場合、明示的なTODOとともに停止します。ICのみのライブラリでDirectorループを設計すると、誇張されたルーブリックが生成されます。このステップがそれを防ぎます。
コンピテンシーをステージにマッピングする。 採用担当者スクリーンはフィットと基本を評価します（ルーブリックは適用しない）。HMスクリーンはトップ1〜2の差別化コンピテンシーを取り上げます。オンサイトループは可能な限り残りを1ステージに1つずつ展開します。1インタビュー1コンピテンシーのルールは意見が強く反映されています。60分のインタビューに2つのコンピテンシーを詰め込むと、両方でより浅いシグナルしか得られず、現場でルーブリックを適用することが難しくなります。
ステージ別ルーブリックを生成する。 スクリーン後の各ステージについて、コンピテンシーライブラリから候補者のレベル帯のアンカー説明を引き出します。次元ごとに状況/行動/結果の形式で3〜5件の行動質問を生成し、各質問に対して提案されるプロービングフォローアップを1件付けます。デフォルトで仮定的な「どうするか」質問は除外されます。証拠に基づいた経験より流暢な推測を評価することになるためです。
理由付きで面接官をアサインする。 スクリーン後の各ステージについて、プールから1〜3名の面接官を提案します。キャリブレーションのフィット（必須要件）、負荷（同じループで1名の面接官が複数のステージに入らない）、視点の多様性（プールが許す場合は採用チーム外の面接官を少なくとも1名）でマッチします。各アサインメントには明示的な理由文字列が付いています。
候補者体験パス。 組み立てられたループを再読します。ICで5時間、リーダーシップで7時間を超える合計アクティブ面接時間 → フラグを立てテイクホームを提案します。6名を超える異なる面接官 → ループ疲弊のフラグを立てます。同じコンピテンシーを2つのステージが調査している → 冗長なシグナルのフラグを立てます。アコモデーションなしのクロスタイムゾーンステージ → TODOを提示します。
採用マネージャーレビューゲート。 loop.mdとscorecards/<NN>-<stage-id>.mdを書き込みます。停止します。スキルは「ATSに公開する」アクションを定義しません。HMがファイルを開いて編集し、AshbyまたはGreenhouseで自らループを設定します。

リテラル出力フォーマットとスコアカードスキャフォールドのレイアウトはバンドル内のreferences/3-loop-output-format.mdにあります。フォーマットが固定されているのは、下流のコンシューマー（スコアカードを読む面接官、スコアを集計するデブリーフファシリテーター）が予測可能な構造を必要とするためです。

コスト

Claude Sonnet 4.5でのループ設計1回あたり：

LLMトークン — 通常30〜60kの入力トークン（職務記述書＋コンピテンシーライブラリ＋面接官マトリクス＋スキル指示）と10〜20kの出力トークン（ループ＋アンカーと質問を含む3〜5件のスコアカードスキャフォールド）。Sonnet 4.5ではループ設計1回あたり約$0.20〜0.40です。四半期に8件採用するファンクションはこのスキルのモデルコストで$5未満です。
採用担当者と採用マネージャーの時間 — 節約はここにあります。キャリブレーションされたルーブリック引き出しを含む手動ループ設計は、設計コールでHMと採用担当者合わせて90〜120分、質問とアサインメントの文書化にさらに30〜60分かかります。スキルはそれを生成されたループの30分レビューパスに圧縮します。ロールあたり、上位ICまたはマネージャーの時間が約90分節約されます。
セットアップ時間 — コンピテンシーライブラリと面接官マトリクスが記入されていれば、ロールあたり30分です。ライブラリとマトリクスが前提条件であり、新規の場合はコンピテンシー帯ごとにキャリブレーションセッションが必要です。これは構造化面接への投資であり、このスキルのセットアップではありません。
複合効果 — 構造化ループはあらゆる過去20年の公表研究において、場当たり的なループより優れた採用品質を生み出します。スキルの勝利は「構造化を例外ではなくデフォルトにする」ことにあり、ロールごとの設計オーバーヘッドを取り除くことで実現します。

成功指標

ATSでロールごと四半期ごとに3つの数値を追跡します。

ループ設計のリードタイム — 「ロール承認」から「ATSでループ設定完了」までの時間（時間単位）。スキルをループに組み込んだ後は大幅に短縮されるはずです。短縮されない場合、ボトルネックは設計ではなくHMレビューです。ロールキックオフシーケンスの早い段階でループを提示してください。
ルーブリックに対する評価者間合意 — コンピテンシー次元ごとに、面接官の独立したスコアが1ポイント以内に収まる頻度。キャリブレーションされたコンピテンシーでは80%以上に到達すべきです。それ以下の場合、調整すべきはスキルではなくコンピテンシーライブラリのアンカー説明です。
12ヶ月時点の採用品質 — ループが移動させることを目的とした長期的な指標。同じロールファミリーにおいて、スキルで設計されたループと場当たり的なループで採用されたコホートを比較します。スキル設計コホートがアウトパフォームしない場合、構造化を放棄するのではなく、コンピテンシー対ステージのマッピングを再検討してください。

代替手段との比較

vs Ashbyの構造化面接テンプレート — Ashbyは設定済みループ、スコアカードレンダリング、デブリーフを1つのプロダクトで管理します。管理されたUXが必要で、チームがATSで作業するならAshbyのテンプレートを選んでください。ルーブリックアンカー、面接官強みマトリクス、コンピテンシー対ステージのマッピングを自分のリポジトリでバージョン管理し、コンピテンシーライブラリが進化するにつれて設計ステップを交換可能にしたいならこのスキルを選んでください。スキルの出力はAshbyのループ設定への入力であり、代替ではありません。
vs 汎用テンプレートループ — すべてのATSはデフォルトの4ステージテンプレート（「電話スクリーン、HMスクリーン、技術面接、オンサイトパネル」）を提供します。一見構造化されているように見えますが、そうではありません。同じテンプレートがバックエンドIC4とCSマネージャーM2に、各ロールで採用と不採用を実際に分けるコンピテンシーに関係なく同じ汎用質問と一緒に適用されます。スキルは設計がロール固有のキャリブレーションに基づいているため、2回目のロールから30分のセットアップコストを回収できます。
vs 採用マネージャーによるDIYループ設計 — 上位のHMはゼロから優れたループを90〜120分で設計できます。ただし通常はそうしません。締め切りのプレッシャーの下では、フィットに関係なく前回実行したループを再利用する傾向があります。スキルの勝利は「ピーク時の経験豊富なHMより優れた設計ができる」ことではなく、「すべてのロールとすべての週にわたって、経験豊富なHMと同等の設計を一貫してできる」ことです。その一貫性が複合効果です。
vs 全く構造化しない — 構造化面接に関する公表メタ分析は、仕事のパフォーマンスに対する予測妥当性において構造化面接は非構造化面接の約2倍であることを示しています。現状が非構造化であれば、問うべきはスキルではなく構造化の採用です。スキルは、構造化をすべてのロールで実際に実装できるほど安くする方法です。

注意点

同じ人物がどこにでもアサインされる面接官の過負荷。 対策： スキルのアサインメントステップは「同じループの複数のステージに面接官を入れない」というハードルールを強制します。アサインメントテーブルはステージごとにバックアップ面接官を提示するため、プライマリが利用不可能な場合でも採用担当者は代替案を持つことができ、2つのステージでプライマリを再利用する必要がありません。
ステージ間の冗長なシグナル。 対策： 候補者体験パスは組み立てられたループを再読し、複数のステージで調査されているコンピテンシーにフラグを立てます。ループ出力の最上部のコンピテンシー対ステージのマッピングテーブルにより、採用マネージャーのレビュー時に冗長性が可視化されます。
候補者体験の軽視。 対策： 候補者体験パスはループの最下部の1文ではなく、スキルの独立した名前付きステップです。合計時間のキャップ（IC 5時間、リーダーシップ 7時間）、面接官の上限（6名）、ループを肥大化させるコンピテンシーに対するテイクホームの提案、タイムゾーン対応のTODOを強制します。このパスがなければ、「もう1つの30分の会話」が各ステージを個別に割り当てる際に見えない形で積み重なります。
単一ループ内のキャリブレーションドリフト。 対策： ステージごとに出力されるルーブリックブロックには、自由記述の「1〜5で評価」ではなくコンピテンシーライブラリから引き出されたスコアレベルごとのアンカー説明が含まれます。同じ候補者が同じループで4名の異なる面接官にスコアリングされる際にキャリブレーションを保つのはアンカーです。曖昧なルーブリック → 曖昧なスコア → すべての次元を逸話で再議論するデブリーフ。
採用マネージャーによる設計のゴム印。 対策： スキルはレビューゲートで停止してファイルに書き込みます。「ATSに公開する」アクションはありません。HMはループを設定する前にファイルを開いて編集する必要があります。この摩擦は意図的です。HMが読まずに承認し始めると、ループの内容がロールの優先事項から離れていき、スキルが節約した時間を稼げなくなります。
面接官キャリブレーションの陳腐化。 対策： 面接官マトリクスには「最終面接」列があります。6ヶ月を超えたセルは再度アサインされる前に再キャリブレーションを促します。面接インテリジェンスによって有用なシグナルを生み出していない質問が明らかになった場合、コンピテンシーライブラリのアンカーを更新すれば、次回の実行時にスキルが新しいアンカーを出力します。

スタック

スキルバンドルはapps/web/public/artifacts/interview-loop-builder-skill/にあり、以下を含みます：

SKILL.md — フロントマター、起動ルール、入力、手順、ガードとペアになった注意点を含むスキル定義
references/1-competency-library.md — スコアレベルごと・レベル帯ごとのアンカー説明を含むコンピテンシー分類。実行前にファンクションごとに記入する
references/2-interviewer-strengths.md — コンピテンシーごとのキャリブレーション済みカバレッジを含む対象面接官プールマトリクス。実行前にチームごとに記入する
references/3-loop-output-format.md — スコアカードスキャフォールドのレイアウトを含むスキルが出力するリテラルMarkdownフォーマット

ワークフローが前提とするツール：Claude（モデル）、AshbyまたはGreenhouse（HMが設計済みループを設定するATS）、BrightHireまたはMetaview（コンピテンシーライブラリのアンカー調整にフィードバックするシグナルを提供する面接インテリジェンス）。上流の職務記述書ライターおよび下流の面接デブリーフサマリーと直接連携します。

GitHubでこのページを編集

Files in this artifact

Download all (.zip)

---
name: interview-loop-builder
description: Take a job description, level, must-have competencies, and an interviewer pool with strengths, and produce a complete interview loop design — stage progression, per-stage rubric, behavioral questions per dimension, and interviewer assignments paired with the rubric dimension each one is scoring. Stops at a hiring-manager review gate before the loop is configured in the ATS.
---

# Interview loop builder

## When to invoke

Use this skill when a recruiter or hiring manager has a confirmed opening, an approved JD, and needs the structured interview loop designed before the first candidate moves into the screen stage. Take the JD, the role's level, the must-have competencies, and the eligible interviewer pool with each interviewer's calibrated strengths, and produce a Markdown loop design plus per-stage scorecard scaffolds.

Do NOT invoke this skill for:

- **Auto-scheduling.** This skill designs the loop; it does not schedule. Calendar coordination, interviewer-availability matching, and candidate-facing booking links are Ashby Scheduling, Greenhouse Scheduling, or Goodtime's job. Mixing design and scheduling in one skill couples two failure modes that should fail independently.
- **Replacing the rubric design with the hiring manager.** The skill maps competencies to a rubric *dimension* and writes anchor descriptions per score level, but the actual rubric per role family is owned by the hiring manager and the head of the function. If `references/1-competency-library.md` is empty or all-template, the skill refuses and surfaces a TODO rather than inventing a rubric for a function it has no calibrated signal on.
- **Generic templated loops without role-specific calibration.** If the inputs do not name the level, the must-have competencies, or the interviewer pool, the skill refuses. A four-stage loop with generic "behavioral", "technical", "system design", "leadership" labels passes for structured but is not — every candidate gets the same questions regardless of role priorities, which defeats the point.
- **Roles below a defined complexity threshold.** A two-week contractor role does not need a four-stage loop. The skill warns and suggests a one-stage screen if the role is contract, hourly, or under 6 months expected tenure.

## Inputs

- Required: `job_description` — path to a Markdown file (typically the output of the JD-writer skill, or a manually authored JD).
- Required: `level` — one of `IC1`, `IC2`, `IC3`, `IC4`, `IC5`, `IC6`, `M1`, `M2`, `M3`, `Director`, `VP`. Drives loop length and the competency-to-stage mapping.
- Required: `must_have_competencies` — array of 3-6 competency IDs drawn from `references/1-competency-library.md`. The skill maps each to a stage and refuses if more than 6 are provided (loop length blows out, signal per interview drops).
- Required: `interviewer_pool` — path to `references/2-interviewer-strengths.md` filled in for the role's function, listing each eligible interviewer with their calibrated strengths (which competencies they have been calibrated to score on what level bands).
- Optional: `loop_length_max` — hard cap on number of post-screen stages, default 4. Above 5, the skill warns about candidate-experience cost.
- Optional: `time_zone` — interviewer/candidate timezone hint used in the candidate-experience pass to flag any cross-timezone stages that need an explicit accommodation.

## Reference files

Always read these from `references/` before generating the loop. They contain the user's competency taxonomy, the interviewer pool, and the output format. Without them, the loop is a template, not a design.

- `references/1-competency-library.md` — the team's calibrated competency taxonomy with rubric dimensions and behavioral-anchor descriptions per score level. Replace the template with your real library before running.
- `references/2-interviewer-strengths.md` — the eligible interviewer pool with each interviewer's calibrated strengths. The skill matches competencies to interviewers using this matrix.
- `references/3-loop-output-format.md` — the literal Markdown format the skill emits, including the per-stage rubric block, the interviewer-assignment table with rationale, and the candidate-experience summary.

## Method

Run these six steps in order. Do not parallelize — later steps depend on context from earlier steps, and the candidate-experience pass at the end deliberately re-reads the assembled loop to catch overload that is invisible while assigning each stage in isolation.

### 1. Validate inputs

Open `job_description`, `references/1-competency-library.md`, and `references/2-interviewer-strengths.md`. Confirm:

- Each ID in `must_have_competencies` exists in the competency library. Unknown IDs → stop and surface them.
- The interviewer pool contains at least one interviewer calibrated on each must-have competency at the role's level. If a competency has no calibrated interviewer, surface a TODO ("hire / calibrate an interviewer for {competency} at {level} before designing this loop") and stop.
- The level falls inside the range the competency library has anchor descriptions for. Designing a Director loop with an IC-only library produces inflated rubrics; refuse.

### 2. Map competencies to stages

For each must-have competency, decide the stage where it is best evaluated. The mapping is opinionated:

- **Recruiter screen** evaluates fit, interest, comp alignment, and basic must-have-skill confirmation. Never a competency dimension on the rubric — recruiters do not score the rubric.
- **Hiring-manager screen** evaluates the top 1-2 competencies that most differentiate hire / no-hire on this role. The HM is the highest-signal interviewer; spending HM time on lower-priority competencies wastes calibration.
- **On-site loop** spreads remaining competencies one-per-interview where possible. One competency per interview is the engineering choice — bundling two competencies into one 60-minute interview produces shallower signal on both, and it makes the rubric harder for the interviewer to apply in the moment.
- **Working-session / take-home** (optional, only for IC4+ technical roles) evaluates competencies that need extended time or written artefact (system design, written communication, code review depth).

The output of this step is a `competency → stage` table with the rationale for each placement.

### 3. Design the per-stage rubric

For each post-screen stage, generate the rubric block. Each rubric dimension corresponds to one competency from step 2. For each dimension:

- Pull the anchor descriptions from `references/1-competency-library.md` for the candidate's level band.
- Generate 3-5 behavioral questions that probe the dimension. Each question must reference the situation / behavior / outcome shape; hypothetical "what would you do if…" questions are excluded by default because they reward articulate guessing over evidenced experience.
- Generate one suggested probing follow-up per question for the interviewer to use when the candidate's first answer is shallow.

The choice to include anchors and follow-ups in the output (rather than just the questions) is what separates a usable scorecard from "here are some questions, score it from 1 to 5". Without anchors, calibration drifts within a single loop.

### 4. Assign interviewers with rationale

For each post-screen stage, propose 1-3 interviewer candidates from the eligible pool. Match by:

- Calibration fit — the interviewer is calibrated on this competency at this level band. Hard requirement.
- Load balance — no interviewer is assigned to more than one stage in the same loop. Prevents the "only one person actually evaluates this candidate" failure mode where a single interviewer dominates the debrief.
- Diversity of perspective — where the eligible pool allows, propose at least one interviewer from outside the hiring team to reduce consensus bias in the debrief.

Output: an assignment table with the rationale per assignment ("Jamie calibrated at IC4 on systems-design; Priya outside the hiring team, last interviewed at this level 6 weeks ago").

### 5. Candidate-experience pass

Re-read the assembled loop and check, in order:

- Total interview time across all stages. Above 5 hours active for an IC role or 7 hours for a leadership role, flag and suggest moving one competency to a take-home.
- Number of distinct interviewers the candidate meets. Above 6, flag ("loop fatigue").
- Stages that require the candidate to repeat the same story. If two stages probe the same competency, surface as redundant.
- Cross-timezone stages without an accommodation note. Surface a TODO for the recruiter.

The choice to do this as a separate step rather than during stage design is deliberate: while assigning each stage in isolation it is easy to add "one more 30-minute conversation"; only by re-reading the full assembled loop does the candidate-side cost become legible.

### 6. Hiring-manager review gate

Stop. Write the loop design to `loop.md` per the format in `references/3-loop-output-format.md`. Write each stage's scorecard scaffold to `scorecards/<stage>.md`. Do not push anything to Ashby, Greenhouse, or Lever. Do not mark the role as "loop ready" in the ATS. Surface the path to both files and exit.

The hiring manager's job from here: read the loop, validate the competency-to-stage mapping reflects the actual role priorities, edit the questions, and configure the loop in the ATS. The skill does not re-enter the loop until the hiring manager confirms changes for a v2 design.

## Output format

```markdown
# Interview loop — {Role title} ({level})

Generated: {ISO timestamp} · Competencies: {n} · Stages: {n} · Total active time: {hours}

## Competency → stage mapping

| Competency | Stage | Rationale |
|---|---|---|
| Systems design | On-site Interview B | Highest differentiator at IC5; needs 60min for depth |
| Stakeholder influence | HM screen | Top hire/no-hire signal for this role |
| ... | ... | ... |

## Stage 1: Recruiter screen (30 min)

- Goal: confirm fit, interest, must-have-skill basics, comp alignment
- Key questions: ...
- Disqualifying signals: ...
- Not on rubric (screen, not scored)

## Stage 2: Hiring-manager screen (45 min)

- Goal: depth on {top 1-2 competencies}
- Rubric dimensions: {dim 1}, {dim 2}
- Behavioral questions:
1. {Question} — Probe: {follow-up}
2. ...
- Scorecard scaffold: `scorecards/02-hm-screen.md`

## Stage 3: On-site Interview A — {Competency} (60 min)

- Rubric dimension: {dim}
- Anchor descriptions ({level band}):
- 5 — {anchor}
- 4 — {anchor}
- 3 — {anchor}
- 2 — {anchor}
- 1 — {anchor}
- Behavioral questions:
1. {Question} — Probe: {follow-up}
2. ...
- Scorecard scaffold: `scorecards/03-onsite-a.md`

## Suggested interviewer assignments

| Stage | Primary | Backup | Rationale |
|---|---|---|---|
| HM screen | {HM name} | — | hiring manager |
| Onsite A — Systems design | Jamie L. | Priya R. | Jamie calibrated IC5 systems-design; Priya outside hiring team |
| ... | ... | ... | ... |

## Stage N: Debrief

- Format: independent scoring submitted before discussion
- Decision criteria: {explicit thresholds — e.g. "no rubric dimension < 3, aggregate >= 16"}

## Candidate-experience pass

- Total active interview time: {hours}
- Distinct interviewers: {n}
- Cross-timezone stages: {none | list with accommodation TODO}
- Redundant signal flagged: {none | list}
- Take-home recommendation: {none | move {competency} to take-home}

## Open TODOs for hiring manager

- ...
```

## Watch-outs

- **Interviewer overload from the same person being assigned everywhere.** *Guard:* step 4 enforces "no interviewer in more than one stage of the same loop" as a hard rule. The assignment table surfaces backup interviewers per stage so the recruiter has a fallback when the primary is unavailable, rather than re-using the primary in two stages.
- **Redundant signal across stages.** *Guard:* the candidate-experience pass in step 5 re-reads the loop and flags any competency probed in more than one stage. The competency-to-stage table in the output makes redundancy visible to the hiring manager in review.
- **Candidate experience neglected.** *Guard:* the candidate-experience pass in step 5 is a separate, named step rather than a sentence at the bottom of the loop. It enforces total-time caps, distinct-interviewer caps, take-home suggestions for competencies that bloat the loop, and timezone accommodation TODOs.
- **Calibration drift inside a single loop.** *Guard:* the rubric block emitted in step 3 includes anchor descriptions per score level pulled from the competency library, not free-text "rate 1 to 5". Anchors are the thing that holds calibration when the same candidate is scored by four different interviewers.
- **Hiring manager rubber-stamps the design.** *Guard:* skill stops at the review gate in step 6 and writes to files. There is no "publish to ATS" action defined anywhere in this skill. The HM has to open the file and edit it before configuring the loop.
- **Generic loops where role specificity matters.** *Guard:* step 1 refuses to run if `must_have_competencies` is empty or if the interviewer pool is missing calibrated coverage. The skill never falls back to a "default loop" for the function.

# Competency library — TEMPLATE

> Replace this template's contents with your team's actual competency
> library. The interview-loop-builder skill reads this file on every
> run; without your real library, the loop output is generic and the
> rubric anchors are uncalibrated.

## How to use

Each competency has:

- A short ID (used by `must_have_competencies` in the skill input)
- A one-sentence definition
- The level bands it has anchor descriptions for
- Anchor descriptions per score level (1-5) per level band

The skill maps each competency to a stage and emits the anchors for the candidate's level band into the per-stage rubric.

## Coverage matrix

| Competency ID | Definition | Bands covered |
|---|---|---|
| systems-design | Designs systems that meet current requirements while preserving headroom for known future needs. | IC3, IC4, IC5, IC6 |
| stakeholder-influence | Builds shared understanding and commitment across stakeholders without formal authority. | IC4, IC5, IC6, M1, M2, M3 |
| technical-depth | Reasons from first principles in the candidate's primary technical domain. | IC1, IC2, IC3, IC4, IC5, IC6 |
| ownership | Drives ambiguous problems to resolution, including the parts not in the original scope. | IC2, IC3, IC4, IC5, IC6, M1, M2 |
| communication-written | Writes documents that move decisions forward without a meeting. | IC3, IC4, IC5, IC6, M1, M2, M3 |
| people-leadership | Hires, develops, and retains a high-performing team. | M1, M2, M3, Director, VP |
| strategic-thinking | Identifies the right problem to solve before optimizing the solution. | IC5, IC6, M2, M3, Director, VP |

## Per-competency anchors

### systems-design (IC4 band)

- **5** — Designs the system end-to-end including failure modes, upgrade paths, and observability. Names trade-offs explicitly with reasoning per axis (latency, cost, complexity, blast radius).
- **4** — Designs the happy path and the most likely failure modes. Names trade-offs but reasoning is uneven across axes.
- **3** — Designs the happy path. Identifies failure modes when prompted. Trade-offs implicit, surfaced under follow-up questioning.
- **2** — Designs a workable system but misses obvious failure modes or scaling cliffs. Trade-offs not articulated.
- **1** — Designs a system that does not meet stated requirements, or cannot articulate why a design is structured as proposed.

> Replace the IC4 anchors above with your real anchors. Add anchor
> blocks for IC3, IC5, and IC6 in the same shape.

### stakeholder-influence (M2 band)

- **5** — Names the specific stakeholders, their incentives, the surface area of the disagreement, and the sequence of conversations that produced commitment. Outcome was a documented decision change.
- **4** — Names the stakeholders and incentives, walks through conversations, but the outcome description is fuzzy.
- **3** — Walks through one or two conversations. Stakeholder incentives are inferred under prompting.
- **2** — Describes the disagreement and the resolution but cannot describe the work between them.
- **1** — Describes a meeting outcome with no underlying stakeholder work.

> Replace the M2 anchors above with your real anchors. Add anchor
> blocks for IC4, IC5, IC6, M1, M3 in the same shape.

## Calibration discipline

When you add or change anchors:

- Run a calibration session with at least 3 interviewers scoring the same recorded interview. If their scores diverge by more than 1 point, the anchors are not calibrated yet.
- Date each change. The skill does not version anchors automatically; if you change an anchor mid-loop, the consistency of the loop is on you.
- Retire anchors that the calibration set cannot reproduce. Vague anchors are worse than no anchors — they create the illusion of structure.

## Last edited

{YYYY-MM-DD} — update on every material change.

# Interviewer strengths matrix — TEMPLATE

> Replace this template's contents with your team's actual interviewer
> pool and calibrated strengths. The interview-loop-builder skill
> reads this file on every run; without it, the assignment table is
> guessed rather than matched.

## How to use

Each row is one eligible interviewer. Columns are the competency IDs from `1-competency-library.md`. Each cell is the level band(s) the interviewer is calibrated to score that competency on. Empty cell = not calibrated; the skill will not assign.

The skill reads this matrix in step 4 (interviewer assignment) and matches by:

1. Calibration fit (interviewer is calibrated on the competency at the candidate's level band).
2. Load — at most one stage per loop per interviewer.
3. Diversity — at least one interviewer from outside the hiring team when the eligible pool allows.

## Pool

| Interviewer | Team | systems-design | stakeholder-influence | technical-depth | ownership | communication-written | people-leadership | strategic-thinking | Last interview (date) |
|---|---|---|---|---|---|---|---|---|---|
| Jamie L. | Platform | IC4, IC5 | — | IC3, IC4 | IC4 | — | — | — | 2026-04-19 |
| Priya R. | Data | — | IC4, IC5, M1 | IC4 | IC4, IC5 | IC4, IC5 | — | — | 2026-04-22 |
| Marcus T. | Product | — | IC5, M1, M2 | — | IC5 | M1, M2 | M1, M2 | M2 | 2026-04-12 |
| Aiko S. | Eng leadership | IC5, IC6 | M1, M2, M3 | IC5 | IC5, M1 | M2, M3 | M1, M2, M3 | M2, M3, Director | 2026-04-26 |

> Replace the rows above with your real interviewer pool. The columns
> are the competency IDs you defined in `1-competency-library.md`.

## Calibration discipline

When you add an interviewer or extend an interviewer's coverage:

- They sit shadow on at least 2 interviews at the new level band, then reverse-shadow (they score, the calibrated interviewer audits) on at least 2 more.
- Both calibrated interviewer and new interviewer sign off in the matrix before the cell is added.
- Retire a calibration band if the interviewer has not used it in 6 months. The "Last interview" column is the trigger to re-calibrate.

## Outside-the-hiring-team rule

The skill prefers at least one interviewer outside the hiring team per loop, to reduce consensus bias in debriefs. To make this work:

- Tag each interviewer's `Team` accurately.
- Ensure the eligible pool for each role family includes at least 2 outside-team interviewers per must-have competency at the role's level. If it does not, the skill will assign in-team only and surface a TODO ("expand cross-team calibration for {competency}").

## Last edited

{YYYY-MM-DD} — update on every calibration change.

# Loop output format — TEMPLATE

> The interview-loop-builder skill emits its output in the format
> below. This file documents the format so the team can adjust before
> running the skill at scale; the skill reads this file as the literal
> template, not a guideline.

## File layout

The skill writes:

- `loop.md` — the loop design, in the format below.
- `scorecards/<NN>-<stage-id>.md` — one scorecard scaffold per post-screen stage, with the rubric block prefilled and an empty scoring section.

## loop.md format

```markdown
# Interview loop — {Role title} ({level})

Generated: {ISO timestamp}
Competencies: {n}
Stages: {n}
Total active interview time: {hours}h
Distinct interviewers: {n}

## Inputs summary

- JD: {path}
- Level: {level}
- Must-have competencies: {comma-separated IDs}
- Interviewer pool: {path to filled interviewer-strengths matrix}
- Loop length cap: {n}
- Time zone hint: {tz | not provided}

## Competency → stage mapping

| Competency | Stage | Rationale |
|---|---|---|

## Stage 1: Recruiter screen (30 min)

Goal, key questions, disqualifying signals. Not on rubric.

## Stage 2: Hiring-manager screen (45 min)

Goal, rubric dimensions, behavioral questions with probes, scorecard
scaffold link.

## Stage 3..N: On-site interviews (60 min each)

For each stage:

- Rubric dimension (single competency)
- Anchor descriptions for the candidate's level band (5 lines, one
  per score level, pulled from competency library)
- Behavioral questions with probes (3-5)
- Scorecard scaffold link

## Suggested interviewer assignments

| Stage | Primary | Backup | Rationale |
|---|---|---|---|

## Stage N+1: Debrief

Format (independent scoring before discussion), decision criteria
(explicit thresholds), debrief facilitator.

## Candidate-experience pass

- Total active interview time
- Distinct interviewers
- Cross-timezone stages with accommodation TODOs
- Redundant signal flagged
- Take-home recommendation if loop is over the time cap

## Open TODOs for hiring manager

- ...
```

## scorecards/<NN>-<stage-id>.md format

```markdown
# Scorecard — {Stage} ({Competency})

**Candidate:** {name}
**Interviewer:** {name}
**Date:** {YYYY-MM-DD}
**Level:** {level}

## Rubric dimension: {Competency}

Anchor descriptions ({level band}):
- 5 — {anchor}
- 4 — {anchor}
- 3 — {anchor}
- 2 — {anchor}
- 1 — {anchor}

## Behavioral questions

1. {Question}
   - Probe: {follow-up}
   - Notes:
2. ...

## Scoring

- Score: ___ / 5
- Evidence (cite candidate's words for the score):
- Hire / no-hire on this dimension: ___
- One-line rationale:

## Submit

Independent scoring submitted to debrief before discussion.
```

## What the format enforces

- Every rubric dimension has anchor descriptions inline. No "rate 1 to 5" without anchors.
- Every behavioral question has a probe. Interviewers do not have to invent follow-ups in the moment.
- Every interviewer assignment has a rationale. The hiring manager can audit why a person was assigned without re-running the skill.
- The candidate-experience pass is a section, not a sentence.
- Open TODOs are explicit. The skill never silently leaves a gap.

## Last edited

{YYYY-MM-DD}