claude-skill

Diversity slate audit with Claude

Difficulty

Advanced

Setup time

45min

For

recruiter · sourcer · talent-acquisition · dei-leader

Recruiting & TA

Stack

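A Claude skill that audits a candidate slate (the interview lineup a recruiter intends to run, the full sourced pool, or the applicant pool) against the relevant labor-market reference pool for the role, surfaces composition gaps, and emits a structured audit record. It does this without running statistical inference on individual candidates and without recommending which candidates to add or remove. The output is decision support for recruiters and DEI leads, not an automated decision system.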

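When to use

You are cutting a slate from a sourced pool before sending it to the hiring manager and want to check whether the slate's composition reflects the relevant labor-market pool for the role.
You are closing the quarter and need an audit aggregated across roles for a DEI program review.
You are preparing a NYC Local Law 144 bias-audit submission and need an internal pre-check of slate composition before the formal independent audit.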

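When not to use

Identifying individual candidates' protected-class membership. The skill processes aggregated self-reported demographic data only. It refuses to infer demographics from names, photos, schools, or any other candidate-level signal.
Auto-rejecting candidates to "rebalance" a slate. Rejecting candidates to hit a composition number is reverse discrimination and carries the same legal risk as the original imbalance. The skill surfaces gaps; the fix belongs upstream (source channels, search queries, JD language), not at the slate-cut step.
Composition data candidates did not consent to share. Self-ID data has its own consent flow, captured by the company's ATS under candidate authorization (Ashby, Greenhouse, and Lever all expose this). The skill processes, in aggregate, only data that candidates agreed to share.
Single-role slates with fewer than 5 candidates. The smaller the slate, the less the audit signal means. The skill warns about size below 5 candidates and refuses to compute composition statistics below 3.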

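Setup

Place the bundle. Put apps/web/public/artifacts/diversity-slate-auditor-skill/SKILL.md in your Claude Code skills directory.
Configure the reference-pool source. The skill needs a reference pool to compare against, typically BLS occupational employment data (free, public), supplemented with industry-specific data where available. The reference-pool selector in references/1-reference-pools.md documents which BLS table maps to which role family.
Connect the ATS export. Ashby and Greenhouse both expose self-ID exports through their APIs (Ashby /candidate.list with self-ID columns; the Greenhouse applications endpoint with EEOC fields). The skill reads the export; it does not call the ATS directly. This separation lets data minimization happen at export time, so the skill does not need to see raw candidate records.
Set the slate-size guardrails. Defaults: warn below 5 candidates, refuse below 3. Adjust per role family if your team's typical slate size differs.
Dry-run on a closed slate. Audit the slate for a role that closed last quarter. Compare the skill's gap analysis with the DEI lead's read of the same slate. The skill surfaces composition deltas; whether those deltas matter is a judgment the skill does not make.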

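What the skill actually does

Six steps. The skill is structured to keep inference at the aggregate level, not the candidate level, and it surfaces gaps without recommending interventions, because the right intervention depends on the source of the gap and does not belong at the slate-cut step.

Load the slate (the candidates you intend to interview, the sourced pool, or the applicant pool, depending on what the recruiter wants audited). The skill expects an aggregate-level export: per-candidate self-ID may be read, but only to compute aggregates, and no per-candidate analysis is output.
Load the reference pool for the role family. BLS occupational employment data is the default; the role-family-to-BLS-table mapping lives in references/1-reference-pools.md. The recruiter can substitute an industry-specific reference pool (for example, the Stack Overflow Developer Survey for software engineering).
Compute composition deltas between the slate and the reference pool. For each demographic dimension where the slate has self-ID data (gender, race/ethnicity by EEOC category, veteran status, disability status; only the dimensions the company collects), compute the slate percentage and the reference-pool percentage, then the absolute (percentage-point) delta.
Surface gaps per dimension with confidence bands. A 5pp delta on a 50-person slate means more than the same delta on an 8-person slate. The confidence band reflects slate size and the specificity of the reference pool.
Surface upstream gap candidates. For each surfaced delta, list 3 to 5 likely upstream causes the recruiter can investigate: source-channel mix, search-query language (the Boolean search builder's fairness pre-flight catches some of this), JD language, and the hiring manager's language at the screen. No ranking and no recommendations; the skill lists candidates for the recruiter and DEI lead to investigate.
Emit the audit record. A signed JSONL line containing the slate composition, the reference pool used, the computed deltas, and the skill version. No PII. The audit record is what makes a NYC LL 144 submission or an internal DEI review defensible.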

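Cost reality

Cost per slate audit on Claude Sonnet 4.6:

LLM tokens: 5-10k input (slate aggregates, reference-pool table, and skill instructions) and 2-3k output (per-dimension gap analysis plus upstream gap candidates). Roughly $0.05-0.10 per audit.
Reference-pool data: BLS data is free. The Stack Overflow Developer Survey is free. Industry-specific datasets vary; the BLS-only path costs $0.
Recruiter / DEI lead time: the real win. Composition audits usually get skipped because they are tedious; the skill makes the audit the default rather than an extra cost. Expect 5-10 minutes per slate to read the audit, plus 20-40 minutes per quarter investigating upstream gap candidates.
Setup time: 45 minutes, once, for the reference-pool mapping and the ATS export connection.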
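
The per-audit token cost can be reproduced with a few lines of arithmetic. This is a rough sketch: the per-million-token prices below are assumed placeholders, not figures from this page, so substitute the currently published rates for the model you run.

```python
# Assumed placeholder prices in USD per million tokens (not from the source page).
INPUT_USD_PER_MTOK = 3.00
OUTPUT_USD_PER_MTOK = 15.00


def audit_cost_usd(input_tokens, output_tokens):
    """Token cost of one audit at the assumed prices above."""
    total = input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK
    return total / 1_000_000


low = audit_cost_usd(5_000, 2_000)     # small slate, short report
high = audit_cost_usd(10_000, 3_000)   # larger tables, longer report
print(f"${low:.3f} to ${high:.3f} per audit")
```

At these assumed prices the token ranges quoted above land in the same region as the rough per-audit figure; output tokens account for most of the cost.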

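Success metrics

Track three things monthly, not per slate:

Composition-delta drift over time: is the slate-versus-reference-pool gap narrowing for tracked roles? If not, the upstream interventions are not working.
Change in source-channel mix: when an audit surfaces a source-channel gap candidate, does the channel mix actually change the following quarter? If sourcing keeps recommending the same channels, the audit's upstream findings are not reaching sourcing.
NYC LL 144 / internal DEI audit gap: when the formal annual bias audit runs, do its findings match what the per-slate audits surfaced during the year? If the formal audit surfaces gaps the slate audits missed, the reference-pool mapping or the set of tracked dimensions is incomplete.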

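Compared with the alternatives

Versus ATS-native diversity dashboards (Greenhouse Inclusion, Ashby's diversity reports). ATS-native dashboards show composition, but they do not compute reference-pool deltas or surface upstream gap candidates. Choose ATS-native if you only need reporting; choose the skill if you need per-slate decision support.
Versus Crosschq Diversity / SeekOut DEI / Eightfold's diversity layer. These are deeper products with their own reference pools and analytics layers. Choose them if your budget supports a platform play and you want a managed product. Choose the skill if you want the audit logic in your repository, a reference-pool mapping you control, and a portable audit record.
Versus hand-computed composition statistics. Hand computation is fine for an annual DEI review but slips at slate cadence; nobody computes these by hand for every slate. The skill makes the audit cheap enough to run on every slate.
Versus no audit. By default, this is legal exposure under NYC LL 144 (which requires an annual bias audit for AI tools used in hiring in New York City). The skill is the cheapest defensible posture.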

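Watch-outs

Reverse discrimination from "rebalancing." Guard: the skill never recommends adding or removing individual candidates. Adjusting a slate by removing candidates to hit a composition number is reverse discrimination and creates the same legal risk as the original imbalance. The audit surfaces; the fix is upstream.
Inferring demographics from candidate signals. Guard: the skill processes only self-ID data that candidates consented to share. It refuses to infer race/ethnicity from names, gender from pronouns, age from graduation year, or anything else at the candidate level. The reference pools used for comparison are aggregate statistics, not candidate-level features.
Small-slate noise. Guard: a slate size below 5 produces a warning header on the audit; below 3, the skill refuses to compute composition statistics.
Stale reference pools. Guard: the reference-pool mapping in references/1-reference-pools.md carries a last_verified date per source. Sources older than 18 months trigger a warning to refresh the mapping.
Audit-record tampering. Guard: the audit record is append-only JSONL with the skill version embedded. Modification breaks the file's signature chain. Retain audit records for at least as long as the company retains hiring records (typically 2-7 years).
DEI data leakage risk. Guard: the audit record contains aggregates and deltas, and no per-candidate fields. The skill refuses to write per-candidate self-ID data into the audit record.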

Stack

The skill bundle lives at apps/web/public/artifacts/diversity-slate-auditor-skill/ and contains:

SKILL.md: the skill definition
references/1-reference-pools.md: the role-family-to-reference-pool mapping (BLS, Stack Overflow Developer Survey, and others)
references/2-audit-record-format.md: the literal output format of the JSONL audit record

Tools the workflow assumes: Claude (the model) and Ashby or Greenhouse (the ATS, for self-ID exports). For a parallel source-channel audit, use the Boolean search builder; its fairness pre-flight catches some upstream gap causes.

Files in this artifact

---
name: diversity-slate-auditor
description: Audit a candidate slate's composition against a reference labor-market pool, surface per-dimension gaps with confidence bands, list upstream gap candidates for the recruiter to investigate, and emit an audit record. Never makes per-candidate inferences; never recommends adding or removing individual candidates from a slate.
---

# Diversity slate auditor

## When to invoke

Use this skill when a recruiter or DEI lead has a candidate slate (interview lineup, sourced pool, application pool) and wants the slate's composition audited against the role's reference labor-market pool. Take an aggregate-level slate export plus a reference-pool mapping as input and return a structured audit report plus an append-only JSONL audit record.

Do NOT invoke this skill for:

- **Identifying individual candidates' protected-class membership.** This skill processes self-reported aggregate data only. It refuses to infer demographics from name, photo, school, or any candidate-level signal.
- **Auto-rejecting candidates to "rebalance" a slate.** The skill surfaces gaps; it never recommends adding or dropping individual candidates. Rebalancing by candidate-level removal is reverse discrimination.
- **Composition data candidates have not consented to share.** Self-ID flows in Ashby/Greenhouse/Lever capture explicit consent. The skill processes only consented data.
- **Slates of <3 candidates.** Composition statistics are not meaningful at that size.

## Inputs

- Required: `slate_export` — path to a per-role aggregate export from the ATS. The export should contain self-ID counts per dimension at the slate level, NOT per-candidate rows. Example: `{ "gender": {"woman": 4, "man": 7, "non_binary": 1, "decline_to_state": 2}, "race_ethnicity": {...}, ... }`. If the export is per-candidate, the skill aggregates first and discards the per-row data before any analysis.
- Required: `role_family` — string identifying the role (e.g. `senior-software-engineer`, `account-executive`). Used to look up the reference pool in `references/1-reference-pools.md`.
- Optional: `reference_pool_override` — path to a custom reference-pool file (e.g. industry-specific data). If absent, defaults to BLS for the mapped occupation.
- Optional: `slate_label` — free-text label for the audit record (e.g. `Q2-2026-senior-eng-onsite-slate`).

## Reference files

- `references/1-reference-pools.md` — role-family-to-reference-pool mapping with sources, dates, and the BLS occupation codes.
- `references/2-audit-record-format.md` — the literal JSONL schema for the audit record.

## Method

Six steps.

### 1. Load the slate

Open `slate_export`. If the export is per-candidate, aggregate immediately and discard the per-row data — DO NOT pass per-candidate self-ID through any subsequent step.

If the slate has <3 candidates, halt: "Slate too small for audit. Composition statistics on <3 candidates are not meaningful and risk identifying individuals."

If the slate has 3-4 candidates, emit a warning header on the audit but continue: "Small slate — composition deltas have wide confidence bands."
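
A minimal sketch of this load step, assuming a hypothetical per-candidate export shaped as a list of dicts keyed by dimension. The function name, the return shape, and the choice to count a missing answer as `decline_to_state` are illustrative assumptions, not part of the skill bundle.

```python
from collections import Counter

MIN_SLATE_SIZE = 3   # below this the audit halts
WARN_SLATE_SIZE = 5  # below this the audit carries a warning header


def load_slate(rows, dimensions):
    """Aggregate per-candidate self-ID rows into per-dimension counts.

    Only the aggregate counts are returned; the per-candidate rows are
    not carried into any later step.
    """
    n = len(rows)
    if n < MIN_SLATE_SIZE:
        raise ValueError(
            "Slate too small for audit. Composition statistics on <3 "
            "candidates are not meaningful and risk identifying individuals."
        )
    counts = {
        # Assumption: a missing answer is counted as decline_to_state.
        dim: dict(Counter(row.get(dim, "decline_to_state") for row in rows))
        for dim in dimensions
    }
    warning = "small_slate_warning" if n < WARN_SLATE_SIZE else "ok"
    return {"slate_size": n, "counts": counts, "slate_size_warning": warning}
```

Raising an exception below three candidates mirrors the halt described above: nothing downstream runs and no record is written.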

### 2. Load the reference pool

Read `references/1-reference-pools.md` and map `role_family` to the appropriate BLS occupation code (or other source). Load the reference pool's per-dimension percentages.

If the reference pool's `last_verified` date is older than 18 months, emit a freshness warning on the audit. Continue.

If `reference_pool_override` is provided, use that file instead and skip the BLS mapping.
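
The 18-month freshness rule can be sketched as a date comparison. This is one possible reading, in which "older than 18 months" means strictly after the calendar date 18 months past `last_verified`; the helper names are hypothetical.

```python
import calendar
from datetime import date

FRESHNESS_LIMIT_MONTHS = 18


def add_months(d, months):
    """Shift a date by whole months, clamping the day to the month's length."""
    index = d.year * 12 + (d.month - 1) + months
    year, month_index = divmod(index, 12)
    month = month_index + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def freshness_warning(last_verified, today):
    """Return `over_18_months` once the entry is past the limit, else `ok`."""
    limit = add_months(last_verified, FRESHNESS_LIMIT_MONTHS)
    return "over_18_months" if today > limit else "ok"
```

For an entry verified on 2026-01-15, the warning first appears on 2027-07-16 under this reading.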

### 3. Compute composition deltas

For each dimension where both the slate AND the reference pool have data:

- Slate percentage = slate_count / slate_total
- Reference percentage = reference value
- Delta = slate_pct - reference_pct (signed; negative = under-representation in slate)

Round to 1 decimal place. Do NOT compute statistical-significance scores at the per-dimension level — slate sizes are too small for the inferential framing to mean anything.
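
As a sketch of the arithmetic, the function below computes the three values for one dimension. The function name and row shape are illustrative. The denominator is assumed to include every response, including `decline_to_state`, which is consistent with the 28.6% and 50.0% figures in the sample report for a 14-person slate.

```python
def compute_deltas(slate_counts, reference_pcts):
    """Slate %, reference %, and signed delta (pp) per category, 1 decimal."""
    total = sum(slate_counts.values())
    rows = []
    for category, reference_pct in reference_pcts.items():
        if category not in slate_counts:
            continue  # compare only categories present on both sides
        slate_pct = 100 * slate_counts[category] / total
        rows.append({
            "category": category,
            "slate_pct": round(slate_pct, 1),
            "reference_pct": reference_pct,
            # Negative means under-representation in the slate.
            "delta_pp": round(slate_pct - reference_pct, 1),
        })
    return rows


gender = {"woman": 4, "man": 7, "non_binary": 1, "decline_to_state": 2}
reference = {"woman": 21.8, "man": 76.5}
for row in compute_deltas(gender, reference):
    print(row)
```

With the example counts from the Inputs section and the reference values from the sample report, this yields +6.8pp for women and -26.5pp for men.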

### 4. Surface gaps with confidence bands

For each dimension with `|delta| >= 5pp`, emit a "gap" entry with:

- Direction (under or over)
- Magnitude (in percentage points)
- Confidence band based on slate size:
  - `n >= 30` → `medium-high` confidence
  - `10 <= n < 30` → `medium` confidence
  - `5 <= n < 10` → `low` confidence
  - `3 <= n < 5` → `informational only`

Do NOT label gaps as "concerning" or "fine." That judgment is the DEI lead's, not the skill's.
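
The threshold and band rules above can be expressed as a small lookup. This sketch assumes delta rows shaped as dicts with `category` and `delta_pp` keys; the function names are hypothetical.

```python
GAP_THRESHOLD_PP = 5.0


def confidence_band(n):
    """Map slate size to the confidence band used in step 4."""
    if n >= 30:
        return "medium-high"
    if n >= 10:
        return "medium"
    if n >= 5:
        return "low"
    if n >= 3:
        return "informational only"
    raise ValueError("slates of fewer than 3 candidates are not audited")


def surface_gaps(delta_rows, slate_size):
    """Keep rows whose |delta| meets the threshold; attach direction and band."""
    band = confidence_band(slate_size)
    gaps = []
    for row in delta_rows:
        if abs(row["delta_pp"]) >= GAP_THRESHOLD_PP:
            gaps.append({
                "category": row["category"],
                "direction": "under" if row["delta_pp"] < 0 else "over",
                "magnitude_pp": abs(row["delta_pp"]),
                "confidence": band,
            })
    return gaps
```

The output carries direction, magnitude, and band only. It attaches no "concerning" or "fine" label, in line with the rule above.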

### 5. Surface upstream gap candidates

For each dimension with a gap, list 3-5 likely upstream causes the recruiter and DEI lead can investigate:

- **Sourcing channel mix**: which channels did the slate come from? Channels have their own composition skews; LinkedIn, Stack Overflow Jobs, and employee referrals each surface a different candidate mix.
- **Search query language**: does the [Boolean search builder](/en/workflows/boolean-search-builder-claude-skill/) fairness pre-flight surface anything when run against the role intake?
- **JD language**: masculine-coded language ("rockstar," "ninja," "competitive") has a measurable effect on application-stage composition. The JD audit is a separate workflow.
- **Hiring-manager screen language**: what questions did the screen include? Did any function as a proxy filter?
- **Application drop-off**: at which stage did the under-represented group drop off most? If at sourcing, the channel mix is the likely cause; if at screen, the screen rubric is.

DO NOT rank these. The right intervention varies by gap source. Listing them is decision support.

### 6. Emit audit record

Append one JSONL line to `audit/<YYYY-MM>.jsonl` matching the schema in `references/2-audit-record-format.md`. The record contains:

- `audit_id` (uuid), `timestamp`, `slate_label`, `role_family`
- `slate_size`, `dimensions_audited`, per-dimension `slate_pct` / `reference_pct` / `delta` / `confidence`
- `reference_pool_source`, `reference_pool_last_verified`
- `skill_version`, `model`

NO PII. NO per-candidate fields. The audit record is what makes a NYC LL 144 submission or annual DEI review defensible; it must be immune to candidate re-identification.
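
A minimal sketch of the append step, assuming the aggregate fields have already been assembled into a dict. The function name and the `audit_dir` parameter are hypothetical, and the sketch does not implement any signing.

```python
import json
import uuid
from datetime import datetime, timezone
from pathlib import Path


def append_audit_record(audit_dir, record):
    """Append one audit record as a single JSONL line; return the file path."""
    now = datetime.now(timezone.utc)
    line = {
        "audit_id": str(uuid.uuid4()),
        "timestamp": now.isoformat(),  # when the audit was generated
        **record,
    }
    path = Path(audit_dir) / f"{now:%Y-%m}.jsonl"
    path.parent.mkdir(parents=True, exist_ok=True)
    # Mode "a" only ever adds to the end of the file; earlier lines stay as written.
    with path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(line, sort_keys=True) + "\n")
    return path
```

Each call adds exactly one line, so a month's file can be read back line by line without parsing the whole file as a single JSON document.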

## Output format

```markdown
# Slate audit — {slate_label}

Audited: {ISO timestamp} · Role family: {role_family} · Slate size: {n}

{SMALL-SLATE WARNING if 3-4 candidates}
{REFERENCE-POOL FRESHNESS WARNING if >18 months old}

## Reference pool

- Source: {BLS table / Stack Overflow Developer Survey 2024 / etc.}
- Last verified: {date}

## Composition deltas

| Dimension | Slate % | Reference % | Delta | Confidence |
|---|---|---|---|---|
| Gender — woman | 28.6% | 21.8% | +6.8pp | medium |
| Gender — man | 50.0% | 76.5% | -26.5pp | medium |
| Race — Asian | 35.7% | 19.3% | +16.4pp | medium |
| Race — Black | 0.0% | 8.5% | -8.5pp | medium |
| Race — Hispanic/Latino | 7.1% | 7.6% | -0.5pp | medium |
...

## Gaps surfaced (|delta| >= 5pp)

### Race — Black: under-represented by 8.5pp (medium confidence)

Upstream gap candidates to investigate:
- Sourcing channel mix — what share of the slate came from referral vs. inbound vs. cold sourcing? Referral pools tend to mirror existing team composition.
- Search query language — run the role intake through the Boolean search builder's fairness pre-flight.
- Application drop-off — at which funnel stage is the gap widest?
- Outreach response rate — does outreach response by demographic show the gap originating in candidate engagement vs. sourcing reach?
- JD language — does the JD use language that has measured composition impact on application stage?

### Race — Asian: over-represented by 16.4pp (medium confidence)
{same shape}

## Audit record

Appended to `audit/2026-05.jsonl` — record id `{uuid}`.
```

## Watch-outs

- **Reverse discrimination from "rebalancing."** *Guard:* skill never recommends per-candidate adds/removes. Output is composition deltas + upstream gap candidates only.
- **Per-candidate inference.** *Guard:* skill processes aggregate data only; per-candidate exports are aggregated and discarded immediately on load.
- **Small-slate noise.** *Guard:* refuses at <3, warns at 3-4, low confidence at 5-9.
- **Stale reference pools.** *Guard:* freshness warning at >18 months on the source.
- **Audit-record retention.** *Guard:* records are append-only JSONL with skill version embedded. Recruiters / DEI leads handle retention per firm hiring-record policy (typically 2-7 years).

# Reference-pool mapping

The diversity slate auditor compares slate composition to a reference labor-market pool. This file maps each role family to the appropriate reference source.

The default source is the BLS Current Population Survey (CPS) occupational demographic tables (free, US-only, updated annually). Industry-specific overrides are listed where stronger sources exist.

## Format

Each entry has:

- `role_family` — the string the recruiter passes to the skill
- `bls_occupation_code` — the BLS SOC (Standard Occupational Classification) code
- `bls_table_url` — the canonical BLS table URL for the occupation's demographic breakdown
- `last_verified` — when this entry was confirmed against the BLS source
- `recommended_override` — a stronger source where one exists
- `notes` — caveats specific to this role family

## Mappings

### Software engineering

```yaml
role_family: senior-software-engineer
bls_occupation_code: "15-1252" # Software Developers
bls_table_url: https://www.bls.gov/cps/cpsaat11.htm
last_verified: 2026-01-15
recommended_override: stack-overflow-developer-survey
notes: |
  BLS lumps all software developer levels together. For senior+ roles,
  the Stack Overflow Developer Survey breaks down by years of experience
  and tends to surface a different demographic mix at 10+ years vs. all
  developers. For roles requiring 8+ years experience, the SO override
  is more representative.
```

```yaml
role_family: junior-software-engineer
bls_occupation_code: "15-1252"
bls_table_url: https://www.bls.gov/cps/cpsaat11.htm
last_verified: 2026-01-15
recommended_override: null
notes: |
  Junior roles draw heavily from CS programs. The CRA Taulbee Survey
  has CS-bachelor's demographics that may be a better fit for new-grad
  hiring slates.
```

```yaml
role_family: engineering-manager
bls_occupation_code: "11-9041" # Architectural and Engineering Managers
bls_table_url: https://www.bls.gov/cps/cpsaat11.htm
last_verified: 2026-01-15
recommended_override: null
notes: |
  Management roles have substantially different demographic distributions
  from IC roles. Use this code (not the IC code) for EM/Director slates.
```

### Sales

```yaml
role_family: account-executive
bls_occupation_code: "41-3091" # Sales Representatives, Services
bls_table_url: https://www.bls.gov/cps/cpsaat11.htm
last_verified: 2026-01-15
recommended_override: null
notes: |
  Tech-AE roles and SaaS-AE roles tend to have different demographics
  from the broader services-sales population the BLS code covers.
  Industry-specific data is hard to come by; treat the BLS reference
  as a floor.
```

```yaml
role_family: sales-development
bls_occupation_code: "41-3091"
bls_table_url: https://www.bls.gov/cps/cpsaat11.htm
last_verified: 2026-01-15
recommended_override: null
notes: |
  SDR roles are entry-level; the BLS code includes career sales reps,
  which skews older. Adjust expectations for early-career composition.
```

### Customer success

```yaml
role_family: customer-success-manager
bls_occupation_code: "13-1151" # Training and Development Specialists
bls_table_url: https://www.bls.gov/cps/cpsaat11.htm
last_verified: 2026-01-15
recommended_override: null
notes: |
  No clean BLS code for CSM. The training-and-development code is the
  closest occupational analog by job content; the customer-service-rep
  code is too entry-level. Treat with caveat.
```

### Recruiting / HR

```yaml
role_family: recruiter
bls_occupation_code: "13-1071" # Human Resources Specialists
bls_table_url: https://www.bls.gov/cps/cpsaat11.htm
last_verified: 2026-01-15
recommended_override: null
notes: null
```

### Marketing

```yaml
role_family: marketing-manager
bls_occupation_code: "11-2021" # Marketing Managers
bls_table_url: https://www.bls.gov/cps/cpsaat11.htm
last_verified: 2026-01-15
recommended_override: null
notes: null
```

### Data / analytics

```yaml
role_family: data-scientist
bls_occupation_code: "15-2051" # Data Scientists
bls_table_url: https://www.bls.gov/cps/cpsaat11.htm
last_verified: 2026-01-15
recommended_override: null
notes: |
  Data scientist is a relatively new BLS code (added 2021). The
  demographic data is thinner than for established occupations.
```

```yaml
role_family: data-analyst
bls_occupation_code: "15-2098" # Data Analysts
bls_table_url: https://www.bls.gov/cps/cpsaat11.htm
last_verified: 2026-01-15
recommended_override: null
notes: null
```

### Legal

```yaml
role_family: in-house-counsel
bls_occupation_code: "23-1011" # Lawyers
bls_table_url: https://www.bls.gov/cps/cpsaat11.htm
last_verified: 2026-01-15
recommended_override: aba-profile-of-the-legal-profession
notes: |
  ABA's annual Profile of the Legal Profession has more granular
  partnership/in-house/government breakdowns than BLS. For in-house
  roles specifically, the ABA override is more representative.
```

## Adding a role family

To add a new role family:

1. Find the BLS SOC code that best matches the role's actual job content (not the marketing title).
2. Confirm the BLS demographic table for that occupation has the dimensions you need.
3. Add the entry to this file with `last_verified` set to today.
4. If a stronger industry-specific source exists (industry survey, professional association data), note it under `recommended_override`.

## Refresh cadence

BLS publishes Current Population Survey demographic tables annually. This file should be re-verified every 12 months. Sources older than 18 months trigger a freshness warning in the auditor's output.

# Audit-record JSONL schema

The diversity slate auditor appends one JSONL line per audit to `audit/<YYYY-MM>.jsonl`. This file documents the schema. The format is fixed because external readers (NYC LL 144 audit submission, internal DEI program review, legal discovery) need to parse the records reliably.

## Schema

```json
{
  "audit_id": "uuid-v4",
  "timestamp": "ISO-8601 UTC",
  "skill_version": "1.0",
  "model": "claude-sonnet-4-6",
  "slate_label": "free-text identifier",
  "role_family": "string from references/1-reference-pools.md",
  "slate_size": "integer",
  "slate_size_warning": "ok | small_slate_warning | informational_only",
  "reference_pool": {
    "source": "BLS-15-1252 | stack-overflow-developer-survey-2024 | ...",
    "last_verified": "ISO-8601 date",
    "freshness_warning": "ok | over_18_months"
  },
  "dimensions": [
    {
      "dimension": "gender",
      "category": "woman",
      "slate_pct": 28.6,
      "reference_pct": 21.8,
      "delta_pp": 6.8,
      "confidence": "informational only | low | medium | medium-high"
    },
    {
      "dimension": "race_ethnicity",
      "category": "Black",
      "slate_pct": 0.0,
      "reference_pct": 8.5,
      "delta_pp": -8.5,
      "confidence": "informational only | low | medium | medium-high"
    }
  ],
  "gaps_surfaced": [
    {
      "dimension": "race_ethnicity",
      "category": "Black",
      "direction": "under",
      "magnitude_pp": 8.5,
      "confidence": "medium",
      "upstream_candidates": [
        "sourcing-channel-mix",
        "search-query-language",
        "application-drop-off",
        "outreach-response-rate",
        "jd-language"
      ]
    }
  ]
}
```

## Field-by-field

- `audit_id` — uuid v4. Stable for the audit's lifetime; allows downstream systems to deduplicate.
- `timestamp` — ISO-8601 UTC of when the audit was generated, NOT when the slate was assembled.
- `skill_version` — version of this skill (semver). Allows downstream readers to handle schema evolution.
- `model` — exact model ID used (e.g. `claude-sonnet-4-6`). Required for NYC LL 144 reproducibility — the audit must identify the model that processed the data.
- `slate_label` — free-text label. Recruiter chooses; suggested format `<quarter>-<role-family>-<stage>` (e.g. `Q2-2026-senior-eng-onsite-slate`).
- `role_family` — must match a key in `references/1-reference-pools.md`. Required for the reference-pool validation chain.
- `slate_size` — integer count of the slate.
- `slate_size_warning`: `ok` if `n >= 5`, `small_slate_warning` if `3 <= n < 5`. The value `informational_only` is defined for `n < 3` but does not appear in written records, because the auditor halts at load time for `n < 3`, before computing deltas or writing any record.
- `reference_pool` — object. `source` is the named source string. `last_verified` is when the role-to-pool mapping was last confirmed against the source. `freshness_warning` is `over_18_months` if the source's `last_verified` is older than 18 months.
- `dimensions` — array of per-dimension/category records. Every dimension/category pair the slate has data for AND the reference pool has data for. Pairs missing from either side are silently skipped (the audit does not assert about dimensions it cannot compare).
- `gaps_surfaced` — array of dimensions with `|delta_pp| >= 5`. Empty array if no gaps cross the threshold. Each gap entry includes the upstream-candidate keys for the recruiter / DEI lead to investigate; the upstream candidates are NOT recommendations but a list of investigation surfaces.
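
A downstream reader might sanity-check each parsed record before aggregating it. The sketch below is one hypothetical consumer-side check, not part of the skill: it confirms the top-level keys are present and that every surfaced gap is backed by a `dimensions` entry that crosses the 5pp threshold with a matching direction and magnitude.

```python
REQUIRED_KEYS = {
    "audit_id", "timestamp", "skill_version", "model", "slate_label",
    "role_family", "slate_size", "slate_size_warning", "reference_pool",
    "dimensions", "gaps_surfaced",
}
GAP_THRESHOLD_PP = 5


def check_record(record):
    """Return a list of problems found in one parsed audit record."""
    problems = [f"missing key: {key}" for key in sorted(REQUIRED_KEYS - set(record))]
    if problems:
        return problems
    deltas = {
        (d["dimension"], d["category"]): d["delta_pp"] for d in record["dimensions"]
    }
    for gap in record["gaps_surfaced"]:
        key = (gap["dimension"], gap["category"])
        delta = deltas.get(key)
        if delta is None:
            problems.append(f"{key}: no matching dimensions entry")
        elif abs(delta) < GAP_THRESHOLD_PP:
            problems.append(f"{key}: |delta_pp| below the 5pp threshold")
        elif gap["direction"] != ("under" if delta < 0 else "over"):
            problems.append(f"{key}: direction disagrees with delta sign")
        elif gap["magnitude_pp"] != abs(delta):
            problems.append(f"{key}: magnitude_pp disagrees with delta_pp")
    return problems
```

An empty list means the record is internally consistent on these points; it says nothing about whether the underlying reference pool was the right one.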

## What the schema deliberately does NOT include

- **Per-candidate fields.** No candidate IDs, no per-candidate self-ID, no per-candidate scores. The skill's design point is aggregate-only inference; the audit record reflects that.
- **Statistical-significance scores.** Slate sizes are too small for inferential framing to mean anything, and surfacing a p-value invites the wrong kind of reading. The confidence band (`informational only | low | medium | medium-high`) is a coarser, more honest summary.
- **Recommendations.** The skill surfaces gaps and lists upstream candidates. It does not say "you should hire more X" or "the slate is unbalanced" — those judgments are the DEI lead's, and the skill's role is decision support, not decision automation.
- **Identifying information about the recruiter or DEI lead.** The audit record is about the slate, not about who ran the audit. Operator identity belongs in the audit log of the system that called the skill (your ATS, your scheduling tool), not in the skill's own record.

## Retention

The audit records should be retained for at least as long as the firm retains hiring records — typically 2-7 years for affirmative-action-program firms (under 41 CFR 60-1.12), longer in some EU jurisdictions. NYC LL 144 requires the bias-audit results be made publicly available; the per-slate audit records support the annual aggregation that goes public.

The skill writes append-only JSONL with the skill version embedded. Modification breaks readability of the file; prefer correction-via-superseding-record (write a new audit with `slate_label` referencing the original) over editing.

## Reading the records

Downstream readers (the firm's annual DEI report, the NYC LL 144 submission, an external auditor) parse the JSONL by line. The schema is forward-compatible: new optional fields can be added in future skill versions; consumers that don't recognize new fields ignore them.

For the annual aggregation, group by `role_family` and quarter, then for each `(role_family, quarter)` compute:

- Mean delta per dimension/category over all slates
- Total gaps surfaced and per-gap counts
- Trend in delta over the past four quarters

That aggregation lives outside this skill — it's a separate report. The audit records exist so that aggregation is possible.
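
As an illustration of what such a separate report might do, the sketch below groups parsed lines by `(role_family, quarter)` and computes the mean delta and gap counts. It assumes the quarter is derived from the audit `timestamp`, which is an assumption since the schema has no dedicated quarter field. The function names are hypothetical, and the four-quarter trend is left out.

```python
import json
from collections import defaultdict
from statistics import mean


def quarter_of(timestamp):
    """Map an ISO-8601 timestamp such as '2026-05-04T12:00:00+00:00' to '2026-Q2'."""
    year, month = int(timestamp[:4]), int(timestamp[5:7])
    return f"{year}-Q{(month - 1) // 3 + 1}"


def aggregate(jsonl_lines):
    """Return (mean delta, gap count) keyed by role family, quarter, dimension, category."""
    deltas = defaultdict(list)
    gap_counts = defaultdict(int)
    for line in jsonl_lines:
        if not line.strip():
            continue  # tolerate blank lines
        record = json.loads(line)
        group = (record["role_family"], quarter_of(record["timestamp"]))
        for d in record["dimensions"]:
            deltas[group + (d["dimension"], d["category"])].append(d["delta_pp"])
        for g in record["gaps_surfaced"]:
            gap_counts[group + (g["dimension"], g["category"])] += 1
    mean_delta = {key: round(mean(values), 1) for key, values in deltas.items()}
    return mean_delta, dict(gap_counts)
```

Because unknown fields are simply never read, this reader keeps working when later skill versions add optional fields.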