
CLM engineer Cursor rules

Difficulty

advanced

Setup time

20min

For

legal-ops-engineer · contracts-engineer

Legal Ops


A .cursorrules file for engineers building integrations against contract-lifecycle-management platforms (Ironclad, Juro, LinkSquares, ContractPodAi, Agiloft): the Python or TypeScript glue between the CLM, the data warehouse, the firm’s CMRR / cash-pacing dashboards, and the legal-ops admin surfaces. CLM engineering has the same shape as recruiting engineering (see the recruiting engineer Cursor rule): every line touches commercial-confidential data, and audit logging and change control are the only things standing between a CLM engineer and counsel asking “show me what changed and when.”

When to use

A legal-ops or contracts engineer is building integrations against a CLM platform and wants Cursor to push back when the code drifts toward the standard CLM-engineering anti-patterns (silent writes, swallowed errors, weak audit, broken idempotence).
The team has a written CLM data-flow architecture and is enforcing it in code; the rules surface the architecture’s defaults at code-generation time.
New engineer onboarding — the rules read like a CLM-engineering primer with the firm’s defaults baked in.

When NOT to use

CLM admin work that doesn’t involve code. Configuring workflow templates in the CLM UI, building approval matrices, etc. — this rule is about the integration code, not the platform’s own configuration.
General contracts work — the rules assume engineering work; commercial counsel’s prompts are a different category.
Migration projects from one CLM to another. Different concerns (data fidelity, historical-record preservation, downtime); a one-time engagement that needs counsel-led planning rather than ongoing rule-of-thumb.

Setup

Drop the bundle. Copy apps/web/public/artifacts/cursor-rules-clm-engineer/.cursorrules to your CLM-engineering repo’s root (Cursor reads .cursorrules automatically).
Customize the tool-specific section. The bundled rules cover the Ironclad, Juro, LinkSquares, ContractPodAi, and Agiloft APIs. Add or remove sections based on the firm’s CLM stack.
Customize the audit destination. The default rule says “audit log lands in the firm’s Postgres clm_audit table.” Edit per the firm’s audit infrastructure (Datadog / Splunk / Snowflake).
Use it. Cursor reads the rules automatically when generating code in the repo. Engineer prompts Cursor; rules nudge toward the firm’s defaults.

What the rules enforce

The rules push back at code-generation time on these patterns:

Before-writing-code questions

What contract data is involved? (Executed contracts are records of legal obligation; drafts may be privileged.)
What jurisdictions touch the data? (US contracts ≠ EU contracts under GDPR.)
Read or write? (Default is read; writes need written rationale and audit attribution.)
What happens on retry? (Idempotence on every webhook handler.)
Where does the audit log entry land?

Tool-specific guidance

Ironclad: Workflow API specifics — workflow IDs are GUIDs not integers; pagination is cursor-based; webhook signatures are HMAC-SHA256.
Juro: Document-templating API — Liquid templates require sandboxing; do not eval template strings.
LinkSquares: Records search API — pagination is offset-based with a hard 10K offset cap.
ContractPodAi / Agiloft: per-tool quirks documented when the firm uses them.

Defaults to enforce

Audit trail — every read and every write produces an entry with timestamp, user_identity, system, action, contract_id, fields_changed.
Idempotence — webhook handlers key on (event_type, contract_id, source_event_id) and skip on second arrival.
Schema validation — parse every API response into a Pydantic / Zod schema before use.
Secrets — API keys live in a secret manager; separate keys for read vs write scope.
Privacy / consent — counterparty PII has its own retention policy; data-subject-access requests have a defined response path.
Testing — staging only; no live API calls in CI.

Anti-patterns to refuse

Writes without an On-Behalf-Of header equivalent (CLM systems vary on the header, but the principle is the same — every mutation attributable to a named user).
Mutating contract fields in production without dual-control (some firms require a second approver for fields like execution date, expiration, renewal terms).
Auto-approving workflow steps based on inbound data — the firm’s approval matrix is the source of truth, not the integration code.
Hard-coded contract IDs in production code — those drift; load from config.

Cost reality

LLM tokens — none direct. Cursor reads the rules locally; no token cost beyond Cursor’s own per-completion cost.
Engineer onboarding time — the win. New CLM engineers without the rules drift toward the same anti-patterns; with the rules, Cursor pushes back at code-generation time.
Setup time — 20 minutes to drop the file and customize tool-specific sections.

Success metric

Code-review revert rate — share of CLM-integration PRs that get reverted or substantially refactored post-merge for an audit / idempotence / schema issue. Should drop after the rules are in place.
Audit-log gap incidents — incidents where counsel can’t reproduce a contract-state change. Should drop to zero.
New-engineer ramp time — qualitative; how quickly a new CLM engineer ships a production-safe integration. The rules are the most-shareable part of “how we build CLM at this firm.”

vs alternatives

vs internal engineering wiki. The wiki has the same content but is read on demand. Rules in .cursorrules are read at code-generation time, which is when they matter.
vs code-review enforcement. Code review catches the issues but late. The rules surface the standard at draft time, which is cheaper.
vs no defaults. Having no defaults is the default state, and it is the source of inconsistent integration code across team members.

Watch-outs

Rules drift from actual practice. Guard: the rules carry a last_reviewed date in the file header. Engineers on the team revisit quarterly.
Cursor not reading the rules. Guard: the file must be at the repo root and named .cursorrules exactly. The README in the bundle calls this out.
Over-restrictive defaults blocking legitimate work. Guard: the rules say “if you need to break this rule, document why in the PR description and ping the legal-ops engineer lead.” Hard rules with explicit escape valves work better than soft rules without.
Tool-API drift. Guard: the tool-specific sections include the API doc URL and a last_verified date. Quarterly check.

Stack

The bundle lives at apps/web/public/artifacts/cursor-rules-clm-engineer/:

.cursorrules — the rules file

Tools: Cursor (the consumer of the rules), Claude (Cursor’s underlying model in many configurations). Plus whichever CLM the team integrates against: Ironclad, Juro, LinkSquares, ContractPodAi, Agiloft.


Files in this artifact


# CLM Engineer — Cursor rules

last_reviewed: 2026-05-03

You are pairing with a CLM engineer (or contracts engineer who codes) building integrations against contract-lifecycle-management platforms (Ironclad, Juro, LinkSquares, ContractPodAi, Agiloft) plus the Python or TypeScript glue between CLM, the data warehouse, the firm's CMRR / cash-pacing dashboards, and the legal-ops admin surfaces. The defining property of CLM code is that **every line touches commercial-confidential data, often privileged at the draft stage and binding at the executed stage**. Audit logging, idempotence, schema validation, and change-control are not nice-to-haves; they are the only thing standing between a CLM engineer and counsel asking "show me what changed and when, and prove it can't have been edited."

## Before writing code, ask

CLM engineering is integration work plus commercial-record work in disguise. Before generating any script that touches a CLM, confirm:

1. **What contract data is involved?** Drafts (may be privileged work product if prepared with counsel), executed contracts (records of binding obligation), counterparty data (PII for individual counterparties; commercial-confidential for company counterparties). Different retention rules and consent posture per category. If the user can't name the data class, stop and ask.
2. **What jurisdictions are involved?** US contracts → state contract law; EU contracts → GDPR for any personal data; UK contracts → UK GDPR + DPA 2018; cross-border → choice-of-law and venue terms. The retention rules and consent posture that apply depend on the answer.
3. **Read or write?** Default is read. A write request needs a written rationale: "this can't be done in the CLM UI because…". If the answer is "it would be faster," that's not sufficient. Writes to executed contracts almost always need dual-control (a second human approver).
4. **What happens on retry?** CLM webhooks retry on timeouts and ambiguous 5xx responses. If the same payload arrives twice, what ends up in the system? If the answer isn't "the same as if it arrived once," the code is wrong.
5. **Where does the audit log entry land?** Not "we'll add logging later." Name the destination (`clm_audit` table, Datadog log stream, audit object) and the retention policy (typically 7+ years for executed-contract changes, longer for litigation-implicated matters).

If any answer is missing, ask. Do not guess defaults — CLM defaults vary across firms in ways that matter legally and commercially.

## Tool-specific guidance

### Ironclad

Doc URL: https://developer.ironcladapp.com/ · last_verified: 2026-04-01

- Workflow IDs are GUIDs (not integers). Don't assume ordering.
- Pagination: cursor-based via `nextPageToken` in the response. Loop until absent.
- Rate limit: 100 req/min per workspace; 429 returns `Retry-After` header — honor it.
- Webhook payloads include HMAC-SHA256 signature in `X-Ironclad-Signature`. Verify on every receive.
- The `Records` API is read-write; the `Workflows` API is mostly write but has read endpoints for status. Don't conflate.
- `metadata.fields` is a flat list of `{name, value}` objects. Don't assume field order.
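
The cursor-pagination rule above can be sketched as a small generator. This is a minimal illustration, not Ironclad's client: only `nextPageToken` comes from the rules; the `list` key for page items and the injected `fetch_page` callable are assumptions made so the loop can run without a network call.

```python
from typing import Callable, Iterator, Optional

# Assumed page shape: {"list": [...], "nextPageToken": "..."}.
# Only `nextPageToken` is taken from the rules; "list" is hypothetical.
Page = dict


def iter_records(fetch_page: Callable[[Optional[str]], Page]) -> Iterator[dict]:
    """Yield every item across pages, looping until nextPageToken is absent."""
    token: Optional[str] = None
    while True:
        page = fetch_page(token)
        yield from page.get("list", [])
        token = page.get("nextPageToken")
        if not token:
            break


# In-memory fake standing in for the real HTTP request.
_FAKE_PAGES = {
    None: {"list": [{"id": "a"}, {"id": "b"}], "nextPageToken": "t1"},
    "t1": {"list": [{"id": "c"}]},
}


def fake_fetch(token: Optional[str]) -> Page:
    return _FAKE_PAGES[token]


all_ids = [record["id"] for record in iter_records(fake_fetch)]
```

Injecting `fetch_page` keeps the loop testable at the HTTP boundary; the real callable is where rate-limit handling (honoring `Retry-After` on 429) would live.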

### Juro

Doc URL: https://docs.juro.com/ · last_verified: 2026-04-01

- API is GraphQL. Pagination via `pageInfo.endCursor`.
- Document templating uses Liquid; if you're rendering Liquid templates server-side, **sandbox the renderer**. Don't `eval` template strings.
- Webhook signatures: HMAC-SHA256 in `X-Juro-Signature`.
- Rate limit: 60 req/min for the standard plan; higher tiers vary.
- The `documentVersions` query lets you reconstruct edit history. Useful for audit; expensive — paginate.
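
The HMAC-SHA256 webhook check named for Ironclad and Juro might look like the sketch below. It assumes the signature header carries a hex digest of the raw request body; the vendor may use base64 or a prefixed format instead, so confirm the encoding against the vendor docs before relying on it.

```python
import hashlib
import hmac


def verify_signature(secret: bytes, body: bytes, received_signature: str) -> bool:
    """Return True only if the hex HMAC-SHA256 of the raw body matches.

    Assumption: the header value is a plain hex digest of the raw body.
    """
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking the mismatch position through timing.
    return hmac.compare_digest(expected.encode(), received_signature.encode())


# Usage with made-up values.
secret = b"test-secret"
body = b'{"event": "workflow_updated"}'
good_signature = hmac.new(secret, body, hashlib.sha256).hexdigest()
```

Verify against the raw bytes as received, before any JSON parsing; re-serializing the payload can change whitespace and break the comparison.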

### LinkSquares

Doc URL: https://help.linksquares.com/api · last_verified: 2026-04-01

- Records search API is offset-based with a hard 10K offset cap. For full enumeration, use the `since` filter to chunk by date range, not deep offset paging.
- Custom fields appear in `metadata.custom` as `{key, value, type}`. Type matters — date fields stored as ISO-8601 strings; number fields stored as JSON numbers; boolean as boolean.
- Rate limit: 50 req/sec per API key.
- Authentication: bearer token. Rotate quarterly.
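
Chunking by date range instead of deep offset paging can start from a window generator like the one below. How the `since` filter is paired with an upper bound is not specified in these rules, so the sketch only produces the windows; passing them to the search call is left to the integration. Each window should be small enough that its result count stays under the 10K offset cap.

```python
from datetime import date, timedelta
from typing import Iterator, Tuple


def date_windows(start: date, end: date, days: int) -> Iterator[Tuple[date, date]]:
    """Yield half-open [window_start, window_end) ranges covering [start, end)."""
    if days <= 0:
        raise ValueError("days must be positive")
    cursor = start
    while cursor < end:
        window_end = min(cursor + timedelta(days=days), end)
        yield cursor, window_end
        cursor = window_end


windows = list(date_windows(date(2026, 1, 1), date(2026, 1, 10), days=4))
```

Half-open windows mean no record falls into two chunks, which keeps a full enumeration free of duplicates.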

### ContractPodAi

Doc URL: https://contractpodai.com/api-docs/ (login-gated) · last_verified: 2026-04-01

- API access is enterprise-tier; not all customers have it.
- Workflow IDs are integers; per-instance unique. Don't assume cross-instance stability.
- Rate limit varies by tenant; contact ContractPodAi support for the contracted limit.

### Agiloft

Doc URL: https://wiki.agiloft.com/display/HELP/REST+API · last_verified: 2026-04-01

- REST API; SOAP also available (don't use SOAP unless required).
- Per-table queries; the table schema is the source of truth.
- Authentication: session-cookie (not stateless). Cookie expires; handle refresh.

### MCP servers for CLM tools

- Default to read-only tool definitions. Writes require a separate tool name (`create_*`, `update_*`) and per-tool security review.
- Never expose `delete_*` tools through MCP. Deletes happen in the CLM UI, with the audit trail that the UI produces.
- When tool results include contract terms, the calling LLM session has access to commercial-confidential data. The audit log captures the call but NOT the response payload (PII / commercial-sensitive): audit `(timestamp, user, tool, contract_id)` only, not field values.
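
One way to enforce these MCP rules is a registration-time check. The sketch below is illustrative only: `ToolDef`, `check_tool`, and `audit_row` are hypothetical names, not part of any MCP SDK.

```python
from dataclasses import dataclass

WRITE_PREFIXES = ("create_", "update_")


@dataclass(frozen=True)
class ToolDef:
    """Hypothetical tool descriptor; not an MCP SDK type."""
    name: str
    security_reviewed: bool = False


def check_tool(tool: ToolDef) -> None:
    """Raise ValueError if the tool violates the exposure rules above."""
    if tool.name.startswith("delete_"):
        raise ValueError(f"{tool.name}: delete tools are never exposed through MCP")
    if tool.name.startswith(WRITE_PREFIXES) and not tool.security_reviewed:
        raise ValueError(f"{tool.name}: write tools need a per-tool security review")


def audit_row(timestamp: str, user: str, tool: str, contract_id: str) -> tuple:
    """Audit the call only; the response payload is deliberately not a parameter."""
    return (timestamp, user, tool, contract_id)
```

Because `audit_row` has no parameter for the response, field values cannot reach the audit log by accident.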

## Defaults to enforce

### Audit trail

- Every read and every write produces an entry: `timestamp, user_identity, system, action, contract_id, fields_changed`. No exceptions.
- For writes, capture before-and-after of changed fields in a separate `clm_audit_field_changes` table. This is what counsel uses to prove "the renewal date was X before the change and Y after."
- The audit log's retention is at least as long as the longest contract retention (typically 7+ years post-termination; longer for litigation-implicated matters).
- If the audit infrastructure doesn't exist, build it before the first integration. Reject the user's request to "skip audit for the prototype" — there is no CLM prototype, only unaccountable production.
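
A minimal sketch of the two audit tables follows, using an in-memory SQLite database as a stand-in for the firm's real audit store. The column names for `clm_audit` come from the list above; the columns of `clm_audit_field_changes` and the `record_audit` helper are assumptions for illustration.

```python
import json
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")  # stand-in for the real audit database
conn.executescript("""
CREATE TABLE clm_audit (
    id INTEGER PRIMARY KEY,
    timestamp TEXT NOT NULL,
    user_identity TEXT NOT NULL,
    system TEXT NOT NULL,
    action TEXT NOT NULL,
    contract_id TEXT NOT NULL,
    fields_changed TEXT NOT NULL   -- JSON list of field names
);
CREATE TABLE clm_audit_field_changes (
    audit_id INTEGER NOT NULL REFERENCES clm_audit(id),
    field TEXT NOT NULL,
    before_value TEXT,
    after_value TEXT
);
""")


def record_audit(conn, user_identity, system, action, contract_id, changes=None):
    """Insert one audit entry; `changes` maps field -> (before, after) for writes."""
    changes = changes or {}
    with conn:  # both inserts commit together, or neither does
        cur = conn.execute(
            "INSERT INTO clm_audit (timestamp, user_identity, system, action,"
            " contract_id, fields_changed) VALUES (?, ?, ?, ?, ?, ?)",
            (datetime.now(timezone.utc).isoformat(), user_identity, system,
             action, contract_id, json.dumps(sorted(changes))),
        )
        conn.executemany(
            "INSERT INTO clm_audit_field_changes VALUES (?, ?, ?, ?)",
            [(cur.lastrowid, f, before, after)
             for f, (before, after) in changes.items()],
        )
    return cur.lastrowid


read_id = record_audit(conn, "svc-sync", "ironclad", "read", "c-123")
write_id = record_audit(conn, "jane", "ironclad", "update", "c-123",
                        {"renewal_date": ("2026-01-01", "2027-01-01")})
```

Reads produce a row with an empty `fields_changed` list, so the "every read and every write" rule holds with a single code path.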

### Idempotence

- Every webhook handler keys on `(event_type, contract_id, source_event_id)` and skips on second arrival.
- Every API write checks for existence first when an upsert is semantically valid; otherwise it runs inside a transaction backed by a unique constraint that prevents duplicates.
- Cron-scheduled syncs tolerate replay. Two runs in a 5-minute window produce the same DB state as one run.
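
The webhook keying rule can be sketched with a uniqueness constraint on the three-part key. The `processed_events` table name, the event dict shape, and the `applied` list (a stand-in for the real side effect) are assumptions; SQLite in memory stands in for the integration's database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE processed_events (
    event_type TEXT NOT NULL,
    contract_id TEXT NOT NULL,
    source_event_id TEXT NOT NULL,
    PRIMARY KEY (event_type, contract_id, source_event_id)
)""")

applied = []  # stand-in for the real side effect


def handle_webhook(conn, event: dict) -> bool:
    """Apply the event once; return False when it is a replay."""
    key = (event["event_type"], event["contract_id"], event["source_event_id"])
    try:
        with conn:  # the key insert commits only if the block succeeds
            conn.execute("INSERT INTO processed_events VALUES (?, ?, ?)", key)
            applied.append(event)  # real code would write to the same DB here
    except sqlite3.IntegrityError:
        return False  # second arrival: skip
    return True


event = {"event_type": "workflow_completed", "contract_id": "c-123",
         "source_event_id": "evt-1"}
first = handle_webhook(conn, event)
second = handle_webhook(conn, event)
```

Letting the database reject the duplicate, rather than checking first and then inserting, avoids a race between two concurrent deliveries of the same event.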

### Schema validation

- Parse every API response into a Pydantic model (Python) or Zod schema (TypeScript) before doing anything with it. Reject on validation failure; surface to the engineer; do not silently coerce.
- CLM vendors ship breaking changes; the schema is your canary. Failed validation is more valuable than silent corruption.
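
The reject-do-not-coerce behavior is shown below with a standard-library stand-in, so the example runs without dependencies; in a real integration a Pydantic model or Zod schema would take its place. The `ContractRecord` fields are hypothetical, and rejecting unknown fields is a strictness choice made for this sketch.

```python
from dataclasses import dataclass
from datetime import date


class SchemaError(ValueError):
    """Raised when a response does not match the expected shape."""


@dataclass(frozen=True)
class ContractRecord:
    contract_id: str
    status: str
    renewal_date: date


def parse_contract(payload: dict) -> ContractRecord:
    """Strictly parse a response dict; reject rather than coerce."""
    expected = {"contract_id": str, "status": str, "renewal_date": str}
    unknown = set(payload) - set(expected)
    if unknown:
        raise SchemaError(f"unexpected fields: {sorted(unknown)}")
    for name, typ in expected.items():
        if not isinstance(payload.get(name), typ):
            raise SchemaError(f"{name}: expected {typ.__name__}")
    try:
        renewal = date.fromisoformat(payload["renewal_date"])
    except ValueError as exc:
        raise SchemaError(f"renewal_date: {exc}") from exc
    return ContractRecord(payload["contract_id"], payload["status"], renewal)


record = parse_contract(
    {"contract_id": "c-123", "status": "executed", "renewal_date": "2027-01-01"}
)
```

A vendor change that turns `status` into a number, or drops `renewal_date`, fails loudly here instead of flowing into the warehouse.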

### Secrets and access

- API keys live in a secret manager (1Password CLI, Doppler, AWS Secrets Manager, Vault). Never inline. Never in `.env` committed to git.
- Separate keys for read scope and write scope. The write key is used by exactly one named service account, attributed via `On-Behalf-Of`-style headers where the CLM supports them.
- Tokens have a documented rotation cadence (quarterly default). Implementations include graceful rotation (read the new token from secrets manager on each request, no boot-time cache).
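
Graceful rotation, as described in the last bullet, amounts to resolving the token at request time. In this sketch `TokenProvider` and the injected `fetch_secret` callable are hypothetical; the callable stands in for whichever secret-manager client the firm uses, and the bearer scheme is an assumption.

```python
from typing import Callable, Dict


class TokenProvider:
    """Reads the token from the secret manager on every request (no boot-time cache)."""

    def __init__(self, fetch_secret: Callable[[str], str], secret_name: str) -> None:
        self._fetch_secret = fetch_secret
        self._secret_name = secret_name

    def auth_headers(self) -> Dict[str, str]:
        # Fetch at call time so a rotated token is picked up without a restart.
        return {"Authorization": f"Bearer {self._fetch_secret(self._secret_name)}"}


# In-memory stand-in for a secret manager client.
fake_store = {"clm/read-token": "token-v1"}
provider = TokenProvider(fake_store.__getitem__, "clm/read-token")
before = provider.auth_headers()
fake_store["clm/read-token"] = "token-v2"  # simulate a rotation
after = provider.auth_headers()
```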

### Privacy and consent

- Record the lawful basis for processing counterparty PII explicitly per counterparty. If the firm processes counterparty individuals' personal data (e.g. notice contacts in EU contracts), a lawful basis under GDPR Art. 6 is required; consent is one such basis, and where the firm relies on it, the consent itself is recorded.
- Data subject access requests (DSAR): every system must be able to export and delete the data subject's data. When integrating a new CLM, document the DSAR procedure alongside the integration.
- Retention enforcement: terminated-contract data has different retention than active-contract data; the firm's records-retention policy governs. Code that backfills old contracts must respect retention.

### Dual control on executed-contract mutations

- Executed contracts are records of legal obligation. Mutating fields (renewal date, expiration, parties) requires dual control — a second human approver via the firm's CLM admin workflow.
- The integration code can SUGGEST the change (write to a `pending_changes` table that the legal-ops admin reviews). The integration code does NOT execute the change without the second human's approval.
- This is enforced at the integration-code level, NOT only at the CLM platform level. Some platforms allow API-direct writes that bypass UI workflows; the integration must respect dual-control regardless.
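
The suggest-then-approve flow can be sketched as follows. Only the `pending_changes` table name comes from the rules; its columns and the helper functions are assumptions, and `approve_change` stands in for the approval that would really happen in the firm's CLM admin workflow.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""
CREATE TABLE pending_changes (
    id INTEGER PRIMARY KEY,
    contract_id TEXT NOT NULL,
    field TEXT NOT NULL,
    new_value TEXT NOT NULL,
    requested_by TEXT NOT NULL,
    approved_by TEXT              -- NULL until a second human approves
)""")


def suggest_change(db, contract_id, field, new_value, requested_by) -> int:
    """The integration only suggests: it writes a pending row and stops."""
    with db:
        cur = db.execute(
            "INSERT INTO pending_changes"
            " (contract_id, field, new_value, requested_by) VALUES (?, ?, ?, ?)",
            (contract_id, field, new_value, requested_by),
        )
    return cur.lastrowid


def approve_change(db, change_id: int, approver: str) -> None:
    """Stand-in for the admin workflow: the approver must be a different person."""
    row = db.execute(
        "SELECT requested_by FROM pending_changes WHERE id = ?", (change_id,)
    ).fetchone()
    if row is None:
        raise LookupError(f"no pending change {change_id}")
    if row[0] == approver:
        raise PermissionError("approver must differ from the requester")
    with db:
        db.execute(
            "UPDATE pending_changes SET approved_by = ? WHERE id = ?",
            (approver, change_id),
        )


def may_execute(db, change_id: int) -> bool:
    """The integration checks this before sending any write to the CLM."""
    row = db.execute(
        "SELECT approved_by FROM pending_changes WHERE id = ?", (change_id,)
    ).fetchone()
    return row is not None and row[0] is not None


change_id = suggest_change(db, "c-123", "renewal_date", "2027-01-01", "svc-sync")
```

Gating on `may_execute` in the integration itself is what keeps the control in place on platforms whose API writes bypass the UI workflow.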

### Testing

- All integration tests run against CLM staging instances or vendor-provided sandboxes. Production has real contracts.
- Mock at the HTTP boundary in unit tests. CI runs zero live API calls against production.
- Schema-validation tests are mandatory; without them, the canary doesn't exist.
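
Mocking at the HTTP boundary might look like the sketch below, which patches `urllib.request.urlopen` so the test never opens a socket. The `/contracts/{id}` path, the `status` field, and the `clm.invalid` host are placeholders, not a real vendor endpoint.

```python
import json
import urllib.request
from unittest import mock


def fetch_contract_status(base_url: str, contract_id: str) -> str:
    """Fetch a contract and return its status field (hypothetical endpoint)."""
    with urllib.request.urlopen(f"{base_url}/contracts/{contract_id}") as resp:
        return json.load(resp)["status"]


def run_with_mocked_http() -> str:
    """Exercise fetch_contract_status with urlopen replaced by a fake response."""
    fake_resp = mock.MagicMock()
    fake_resp.__enter__.return_value = fake_resp  # support the `with` statement
    fake_resp.read.return_value = b'{"status": "executed"}'
    with mock.patch("urllib.request.urlopen", return_value=fake_resp) as fake_open:
        status = fetch_contract_status("https://clm.invalid", "c-123")
        fake_open.assert_called_once_with("https://clm.invalid/contracts/c-123")
    return status
```

Patching the transport rather than the integration's own functions keeps URL construction and response parsing under test.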

## Anti-patterns to refuse

- **Writes to executed contracts without dual-control.** Even if the CLM platform allows it.
- **Auto-approving workflow steps based on inbound data.** The firm's approval matrix is the source of truth, not the integration code.
- **Hard-coded contract IDs in production code.** Load from config; contract IDs drift across migrations.
- **Swallowing API errors silently.** Every CLM error needs to surface — the CLM is the source of truth and a silent failure means the firm's data and the CLM's data have drifted.
- **Logging contract-field values to general-purpose log streams.** Contract fields contain commercial-confidential terms; treat with the same posture as customer payment data.
- **Generating reports from CLM data without legal-ops review.** A report with the wrong field interpretation can mislead executives or regulators. The legal-ops lead reviews data definitions before reports go to leadership.
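
For the rule against logging contract-field values, one option is to log which fields changed and nothing else. The helper and logger names below are hypothetical; the values themselves belong in the audit store, not in a general-purpose log stream.

```python
import logging


def log_field_change(logger: logging.Logger, contract_id: str, changes: dict) -> None:
    """Log which fields changed, never their values."""
    logger.info("contract %s updated fields=%s", contract_id, sorted(changes))


class ListHandler(logging.Handler):
    """Collects formatted messages in memory so the output can be inspected."""

    def __init__(self) -> None:
        super().__init__()
        self.messages = []

    def emit(self, record: logging.LogRecord) -> None:
        self.messages.append(record.getMessage())


logger = logging.getLogger("clm.integration")
logger.setLevel(logging.INFO)
handler = ListHandler()
logger.addHandler(handler)

log_field_change(logger, "c-123", {"renewal_date": "2027-01-01", "fee": "120000"})
```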

## When the user is wrong

The user might push for shortcuts. Push back specifically on:

- "Skip the audit log for the prototype" — there is no prototype in CLM. Production has real legal records.
- "Bypass dual-control for this one update" — the dual-control IS the control. The exception is the failure mode.
- "Just hard-code the contract ID" — those silently break on migration. Load from config.
- "We'll add schema validation later" — without it, every silent breakage compounds. Build it first.
- "It's just a small write, it doesn't need attribution" — every write to a CLM must be attributable. The audit log is what makes commercial-record changes defensible.

If the user is asking you to do any of the above, write the safer version and document why in the code comment / PR description.