n8n-flow

Evidence collection orchestration for ediscovery with n8n

Difficulty

advanced

Setup time

180min

For

legal-ops · ediscovery-lead · in-house-counsel

Legal Ops

An n8n flow that orchestrates the collection phase of ediscovery (the EDRM “Collection” stage). It pulls the custodian list from the firm’s HRIS, generates per-custodian collection requests against the firm’s data sources (Google Workspace, Microsoft 365, Slack, file shares, custom SaaS), tracks collection completion and chain-of-custody, and dispatches the collected data to the Relativity workspace (or Everlaw / Logikcull) for processing. Every step writes to an immutable audit log that counsel uses to defend collection adequacy. The flow replaces the legal-ops admin’s manual spreadsheet-and-screenshot collection with a deterministic process.

When to use

Firms with regular ediscovery — typically those with active litigation portfolios where collection is happening multiple times per year.
Custodian count per matter is large enough that manual collection is operationally infeasible (typically >5 custodians per matter).
The firm has IT-engineering capacity to wire the connector layer (Google Workspace Vault, M365 eDiscovery, Slack Discovery API, etc.). The flow is the orchestration; the connectors are per-system.
Counsel signs off on collection scope per custodian; the flow executes against the approved scope.

When NOT to use

Single-custodian collections — manual is fine; the flow’s setup cost (180 minutes plus connector wiring) is not recouped.
Replacing chain-of-custody documentation expertise. The flow generates audit records; the ediscovery lead validates that the records meet the jurisdiction’s chain-of-custody standard. Different jurisdictions have different requirements.
Auto-defining collection scope. Counsel defines scope per the matter; the flow executes against the scope, doesn’t author it.
First-of-firm matters without an established collection-procedure baseline. The flow encodes a procedure; if there’s no procedure to encode, define it first.

Setup

Import the flow. Drop apps/web/public/artifacts/evidence-collection-ediscovery-n8n/evidence-collection-ediscovery-n8n.json into your n8n instance.
Wire credentials. Per source: Google Workspace (Vault API; service account with delegated authority), Microsoft 365 (Compliance Center API; per-tenant app registration), Slack (Discovery API — only available on Enterprise Grid), HRIS (custodian source). Plus Relativity / Everlaw / Logikcull (the e-discovery platform) and Postgres (audit log).
Author the per-source collection-scope template. Per data source, document: what scopes are collectible (date range, search terms, custodian-specific filters), what the per-source rate limits are, what the expected output format is.
Configure the chain-of-custody template. Per matter and per custodian: who collected (service account name + human reviewer), when, what was collected, hash of the collection at completion. Template in _README.md.
Set up the e-discovery platform integration. Relativity Processing API or equivalent for Everlaw / Logikcull. The flow uploads to a per-matter workspace; processing pipeline (deduplication, OCR, etc.) runs in the platform.
Dry-run on a closed matter. Replay collection for a matter completed last quarter. Confirm the collected volume matches what was originally produced and that the chain-of-custody records match what counsel certified.

What the flow does

Eight nodes. Per-custodian-per-source orchestration, with chain-of-custody at every step.

Collection Request Trigger — webhook from the legal-ops platform when counsel marks collection scope approved.
Load Custodian + Scope — pulls custodian list + per-custodian per-source scope from the matter’s collection plan.
Per-Source Dispatch — fans out one branch per data source per custodian. The flow’s most complex part — each source has its own API and its own rate-limit constraints.
Source: Google Workspace Vault — Vault matter created (or reused), hold issued, search executed against custodian’s Gmail / Drive / Calendar within scope, results exported.
Source: M365 Compliance — Content search executed against custodian’s mailbox / OneDrive / Teams within scope, results exported via the Compliance Center.
Source: Slack Discovery — Slack Enterprise Grid Discovery API; per-custodian per-channel export within scope.
Hash + Chain-of-Custody Append — each per-source export is hashed (SHA-256), and a chain-of-custody record is appended to the audit table: {matter_id, collection_id, custodian_id, source, plan_sha, scope_summary, collected_at, collected_by_service_account, hash, file_count, byte_count}.
Upload to E-Discovery Platform — push exports to the per-matter Relativity workspace; trigger processing job; record platform-side load ID in the audit log for traceability.

Cost reality

Connector / source-platform costs — Google Vault, M365 E5 with Advanced eDiscovery, and Slack Enterprise Grid all carry per-seat costs. The flow doesn’t reduce those; it helps the firm get more use out of them.
n8n executions — long-running (large exports take hours); use n8n’s queue mode for production.
E-discovery platform processing cost — Relativity / Everlaw / Logikcull all charge per-GB-processed; the flow doesn’t change that math.
Legal-ops admin time — the win. Manual orchestration of a 10-custodian collection across 4 sources is days of work; the flow runs in hours unattended.
Setup time — 180 minutes for the flow itself + significant per-source connector wiring (the connectors are the bulk of the actual setup).

Success metric

Time-from-counsel-approval to collection-complete — should drop from days/weeks (manual) to hours (flow), modulo source-platform export-job duration.
Chain-of-custody completeness — should be 100% per matter. Any gap is a defensibility risk.
Volume drift — flow’s collected volume vs counsel’s expected scope. Within 10% is normal (filter calibration); >25% triggers re-scope review.
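
The volume-drift thresholds can be expressed as a small check. The following is a minimal sketch, not part of the bundled flow: `classifyVolumeDrift` is a hypothetical helper, and the `investigate` label for the band between 10% and 25% is an assumption, since only the two outer bands are named here.

```javascript
// Classify how far the collected volume is from counsel's expected volume.
// Thresholds follow the success metric: within 10% is normal, above 25%
// triggers a re-scope review. The middle band label is an assumption.
function classifyVolumeDrift(collectedBytes, expectedBytes) {
  if (!(expectedBytes > 0)) {
    throw new RangeError('expectedBytes must be a positive number');
  }
  const drift = Math.abs(collectedBytes - expectedBytes) / expectedBytes;
  let verdict = 'investigate';
  if (drift <= 0.10) verdict = 'normal';
  else if (drift > 0.25) verdict = 'rescope-review';
  // Report the drift as a percentage rounded to one decimal place.
  return { driftPct: Math.round(drift * 1000) / 10, verdict };
}
```

Drift is measured in both directions, so a collection that comes back 30% smaller than expected is flagged the same way as one that comes back 30% larger.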

vs alternatives

vs e-discovery platform’s native collection modules (Relativity Collect, Everlaw Collections). Pick those if your team lives in the platform and the platform’s connectors cover your sources. The flow is for custom-source matters or matters spanning more sources than any single platform covers natively.
vs commercial collection-orchestration tools (Reveal Brainspace, OpenText EnCase, Cellebrite, Onna). Pick those for the highest-end matters with forensic-grade requirements. The flow is the lightweight middle ground for routine corporate ediscovery.
vs manual collection. Workable at small scale; doesn’t scale to multi-custodian matters.

Watch-outs

Chain-of-custody integrity. Guard: every per-source export is hashed at collection time and again before upload to the e-discovery platform. Hash mismatches halt the upload and alert the e-discovery lead.
Scope creep on automated collection. Guard: the flow’s scope is read from the counsel-approved collection plan; widening scope mid-run requires plan amendment, not a flow tweak. The audit log captures the plan SHA per run.
Source-platform rate-limit exhaustion. Guard: per-source rate limiters in the flow’s per-source nodes. Slack Discovery API in particular has aggressive rate limits — the flow paces accordingly.
Privilege exposure during collection. Guard: collection captures everything in scope; privilege review happens downstream in the e-discovery platform (the privilege review batch skill is the next stage). The flow does NOT pre-filter privileged content — that’s a downstream decision.
Custodian privacy concerns. Guard: the flow operates against the systems the custodian uses for work; personal accounts (personal Gmail, personal Slack) are out of scope unless counsel explicitly named them. The collection plan documents the boundary.
Cross-jurisdiction data-localization. Guard: data about EU-resident custodians may fall under GDPR restrictions on transfers outside the EU/EEA; the flow’s per-custodian scope flags EU-resident custodians for data-handling review before export to a non-EU e-discovery workspace.
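
The first guard (hash at collection time, hash again before upload) could look roughly like the sketch below. It assumes the export bytes are available as a Buffer and that the audit record carries the hex SHA-256 in `hash`; `assertHashUnchanged` is a hypothetical name, and throwing stands in for "halt the upload and alert the e-discovery lead".

```javascript
const crypto = require('node:crypto');

// Re-hash the export immediately before upload and compare the digest with
// the one recorded in the chain-of-custody row at collection time.
function assertHashUnchanged(auditRecord, exportBytes) {
  const actual = crypto.createHash('sha256').update(exportBytes).digest('hex');
  if (actual !== auditRecord.hash) {
    const err = new Error(`hash mismatch for ${auditRecord.collection_id}`);
    err.expected = auditRecord.hash; // digest recorded at collection time
    err.actual = actual;             // digest of the bytes about to be uploaded
    throw err;
  }
  return actual;
}
```

Attaching both digests to the error lets the alert show exactly what was recorded and what was found.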

Stack

The bundle lives at apps/web/public/artifacts/evidence-collection-ediscovery-n8n/:

evidence-collection-ediscovery-n8n.json — the flow export (skeleton — actual per-source connectors are firm-specific)
_README.md — credentials, audit-table schema, per-source connector notes, chain-of-custody template

Tools: n8n, Relativity (or Everlaw / Logikcull), Slack (notification only). Source-platform connectors: Google Workspace Vault, Microsoft 365 Compliance, Slack Discovery, custom SaaS per the firm’s stack.

Files in this artifact

# Evidence collection for ediscovery — n8n flow (skeleton)

Orchestrates the EDRM "Collection" stage: per-custodian per-source dispatch against Google Workspace Vault, M365 Compliance, Slack Discovery, and custom SaaS sources. Hashes every export, writes chain-of-custody to an immutable audit table, uploads to the e-discovery platform.

**This is a skeleton flow.** The bundled n8n JSON shows the structure (request → load plan → dispatch per source → audit) and includes a working Google Vault saved-query node as an exemplar. Production deployment requires the firm's ediscovery engineer to:

1. Complete the per-source nodes (Google Vault has create-query → start-export → poll-export → fetch-blob; bundled flow shows only create-query).
2. Wire the M365 Compliance and Slack Discovery branches (skeleton has placeholders).
3. Replace the placeholder hash in `Hash + Chain-of-Custody` with actual export-bytes hashing.
4. Add the upload-to-Relativity / Everlaw / Logikcull node at the end.
5. Add per-source rate limiters.

The flow's value is in the structure (audit shape, dispatch pattern, chain-of-custody discipline) — the per-source connector code is firm-specific.
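
Step 3 (replacing the placeholder hash) might be approached along these lines. This is a minimal Node.js sketch, assuming the export has already been downloaded to local disk; `hashExportFile` is a hypothetical helper and is not part of the bundled flow.

```javascript
const crypto = require('node:crypto');
const fs = require('node:fs');

// Stream the file through SHA-256 so a multi-gigabyte export is never held
// in memory all at once. Returns the hex digest and the number of bytes read.
async function hashExportFile(filePath) {
  const hasher = crypto.createHash('sha256');
  let byteCount = 0;
  for await (const chunk of fs.createReadStream(filePath)) {
    hasher.update(chunk);
    byteCount += chunk.length;
  }
  return { hash: hasher.digest('hex'), byte_count: byteCount };
}
```

The returned `hash` and `byte_count` map directly onto the columns of the same name in `collection_audit`.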

## Database tables

```sql
-- Counsel-approved collection plan. One row per (custodian, source) pair.
CREATE TABLE collection_plans (
    collection_plan_id   TEXT NOT NULL,
    plan_sha             TEXT NOT NULL,
    matter_id            TEXT NOT NULL,
    custodian_id         TEXT NOT NULL,
    source               TEXT NOT NULL,
    scope_json           JSONB NOT NULL,
    status               TEXT NOT NULL CHECK (status IN ('draft','approved','executed','superseded')),
    approved_by          TEXT,
    approved_at          TIMESTAMPTZ,
    PRIMARY KEY (collection_plan_id, custodian_id, source)
);

-- Chain-of-custody, append-only.
CREATE TABLE collection_audit (
    audit_id                          BIGSERIAL PRIMARY KEY,
    matter_id                         TEXT NOT NULL,
    collection_id                     TEXT NOT NULL,
    custodian_id                      TEXT NOT NULL,
    source                            TEXT NOT NULL,
    plan_sha                          TEXT NOT NULL,
    collected_at                      TIMESTAMPTZ NOT NULL,
    collected_by_service_account      TEXT NOT NULL,
    hash                              TEXT NOT NULL,
    file_count                        INTEGER NOT NULL,
    byte_count                        BIGINT NOT NULL,
    scope_summary                     TEXT,
    upload_load_id                    TEXT,  -- e-discovery platform load ID, written when upload completes
    upload_completed_at               TIMESTAMPTZ
);

CREATE INDEX collection_audit_matter_idx ON collection_audit (matter_id, collected_at);

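-- Immutability: the application role gets INSERT and SELECT only. PUBLIC holds
-- no table privileges by default, so the REVOKE is a safeguard; the table owner
-- and superusers can still modify rows, so keep the app role separate from both.
REVOKE UPDATE, DELETE, TRUNCATE ON collection_audit FROM PUBLIC;
GRANT INSERT, SELECT ON collection_audit TO <ediscovery_app_role>;
-- BIGSERIAL draws audit_id from a sequence; INSERT fails without USAGE on it.
GRANT USAGE ON SEQUENCE collection_audit_audit_id_seq TO <ediscovery_app_role>;
-- upload_load_id and upload_completed_at can be set later through a
-- SECURITY DEFINER function that enforces "only when previously NULL".
-- Implement that function if you need to record platform-side load IDs
-- after collection; the app role itself never receives UPDATE.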
```
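
How `plan_sha` is derived is not specified here. One possible approach, shown purely as an assumption, is to hash a canonical serialization of the approved plan rows so that key order and row order do not change the digest. `canonicalize` and `computePlanSha` are hypothetical helpers.

```javascript
const crypto = require('node:crypto');

// Serialize a JSON-compatible value with object keys sorted, so two plans
// with the same content always produce the same string.
function canonicalize(value) {
  if (Array.isArray(value)) return `[${value.map(canonicalize).join(',')}]`;
  if (value !== null && typeof value === 'object') {
    const body = Object.keys(value)
      .sort()
      .map((k) => `${JSON.stringify(k)}:${canonicalize(value[k])}`)
      .join(',');
    return `{${body}}`;
  }
  return JSON.stringify(value);
}

// planRows: the rows of one plan, each { custodian_id, source, scope_json }.
function computePlanSha(planRows) {
  const key = (r) => `${r.custodian_id}|${r.source}`;
  // Sort rows by (custodian, source) so row order does not affect the digest.
  const sorted = [...planRows].sort((a, b) => (key(a) < key(b) ? -1 : key(a) > key(b) ? 1 : 0));
  return crypto.createHash('sha256').update(canonicalize(sorted)).digest('hex');
}
```

Under this scheme any amendment to a custodian's scope yields a new `plan_sha`, which is what lets an audit row prove which version of the plan a run executed against.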

## Per-source connector notes

### Google Workspace Vault

API doc: https://developers.google.com/vault/

- Service account with delegated authority to access user data.
- Create-query → start-export → poll-export-status → fetch-blob sequence. Exports are async; polling can take minutes to hours.
- Vault matter must exist; the flow can create-or-reuse.
- Hold should be in place at the matter level before query (separate workflow — see [litigation hold orchestration](../litigation-hold-orchestration-n8n/)).
- Rate limits: per-project quotas. Vault tends to be export-job-bound rather than rate-limit-bound.
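
The poll step of that sequence might be sketched as follows. `getStatus` is a hypothetical callback that returns the export's current status string; wiring it to the Vault exports endpoint is left to the connector. The status names `IN_PROGRESS`, `COMPLETED`, and `FAILED` are used on the assumption that they match what the API reports, so verify them against the current Vault reference.

```javascript
function defaultSleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Poll an asynchronous export until it reaches a terminal state.
// Defaults: one poll every 30 s, giving up after 240 polls (about 2 hours).
async function pollExport(getStatus, { intervalMs = 30000, maxAttempts = 240, sleep = defaultSleep } = {}) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const status = await getStatus();
    if (status === 'COMPLETED') return { status, attempts: attempt };
    if (status === 'FAILED') throw new Error(`export failed after ${attempt} polls`);
    await sleep(intervalMs); // any other status: wait, then ask again
  }
  throw new Error(`export still running after ${maxAttempts} polls`);
}
```

Inside n8n the same loop is often built from a Wait node and an IF node rather than a long-running Code node, which keeps the execution resumable across restarts.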

### Microsoft 365 Compliance

API doc: https://learn.microsoft.com/en-us/microsoft-365/compliance/

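- Per-tenant app registration with eDiscovery permissions (in Microsoft Graph these are `eDiscovery.Read.All` and `eDiscovery.ReadWrite.All`; confirm the exact set your tenant requires).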
- Content search → run-search → start-export → download-export sequence.
- Advanced eDiscovery (eDiscovery Premium) is an E5 add-on — confirm tenant licensing.
- Rate limits: per-tenant; varies by SKU.

### Slack Discovery

API doc: https://api.slack.com/enterprise/discovery (Enterprise Grid only)

- Discovery API only available on Slack Enterprise Grid.
- Per-channel and per-user export endpoints. The Discovery API is rate-limited aggressively (single-digit req/sec for most endpoints).
- Output is JSON-line message records; preserve files via separate file-export endpoint.
- Pagination is cursor-based; loop until empty.
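
The cursor loop with pacing could be sketched as below. `fetchPage(cursor)` is a hypothetical callback returning `{ items, nextCursor }`; those field names are placeholders rather than the Discovery API's actual response shape, and the one-second default interval is an assumption to tune against the real limits.

```javascript
function defaultSleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Drain a cursor-paginated endpoint, pausing between requests so the
// connector stays under the source's rate limit.
async function collectAllPages(fetchPage, { minIntervalMs = 1000, sleep = defaultSleep } = {}) {
  const all = [];
  let cursor;
  do {
    const page = await fetchPage(cursor);
    for (const item of page.items) all.push(item);
    cursor = page.nextCursor || undefined; // empty cursor means last page
    if (cursor) await sleep(minIntervalMs);
  } while (cursor);
  return all;
}
```

For large channels it is usually preferable to write each page to disk as it arrives rather than accumulate everything in one array; the in-memory version keeps the sketch short.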

### Custom SaaS

For internal tools or smaller SaaS that the team uses:

- Document the source's export shape and chain-of-custody implications.
- Build a connector node that writes to the same per-source pattern as the bundled examples.
- Hash the export at fetch time, append to audit table.

## Chain-of-custody record format

Each `collection_audit` row is the chain-of-custody record. Counsel demonstrates collection adequacy via these records:

```
Matter: M-2026-0042
Collection: coll-20260503-abc123
Custodian: jane-doe@firm.com
Source: google-vault
Collected at: 2026-05-03T14:00:00Z
Service account: ediscovery-bot@firm
Hash (SHA-256): a3f2b1c4...
File count: 1,247
Byte count: 4,231,789,022
Scope: { "email": "jane-doe@firm.com", "start_time": "2024-01-01", "end_time": "2026-04-30", "terms": "(\"Acme deal\" OR \"Project X\") AND -from:counsel@firm" }
Upload to e-discovery: load-2026-05-03-abc123 (Relativity workspace 'M-2026-0042')
```
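
Rendering an audit row into that layout is mechanical. The sketch below assumes a row object keyed by the `collection_audit` column names; `formatCustodyRecord` is a hypothetical helper, and it prints only the fields stored in the table (the workspace name in the last line of the example is not a column).

```javascript
// Render one collection_audit row in the plain-text layout shown above.
function formatCustodyRecord(row) {
  // BIGINT columns often arrive as strings from Postgres drivers, so coerce
  // before adding thousands separators.
  const withCommas = (value) => Number(value).toLocaleString('en-US');
  const lines = [
    `Matter: ${row.matter_id}`,
    `Collection: ${row.collection_id}`,
    `Custodian: ${row.custodian_id}`,
    `Source: ${row.source}`,
    `Collected at: ${row.collected_at}`,
    `Service account: ${row.collected_by_service_account}`,
    `Hash (SHA-256): ${row.hash}`,
    `File count: ${withCommas(row.file_count)}`,
    `Byte count: ${withCommas(row.byte_count)}`,
    `Scope: ${row.scope_summary ?? '(none recorded)'}`,
  ];
  // The upload line is only present once the platform load ID is written.
  if (row.upload_load_id) lines.push(`Upload to e-discovery: ${row.upload_load_id}`);
  return lines.join('\n');
}
```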

For court submissions, the chain-of-custody records typically need to be produced in a more formal format: a paralegal exports the audit records and formats them to the jurisdiction's requirements. The flow's records are the source data.

## Credentials

- `PLACEHOLDER_PLAN_DB_CRED_ID` — read access to `collection_plans`.
- `PLACEHOLDER_AUDIT_DB_CRED_ID` — write access to `collection_audit`.
- `PLACEHOLDER_GOOGLE_VAULT_CRED_ID` — service account with delegated authority.
- `PLACEHOLDER_M365_CRED_ID` — per-tenant app registration with Compliance Center scopes.
- `PLACEHOLDER_SLACK_DISCOVERY_CRED_ID` — Slack org-admin token with `discovery:read` scope.
- `PLACEHOLDER_RELATIVITY_CRED_ID` — Relativity REST API credentials (or Everlaw / Logikcull equivalent).

## Dry-run procedure

1. Provision tables on a non-production DB.
2. Wire credentials to staging endpoints (test Google project, test M365 tenant, test Slack workspace).
3. Replay a closed matter's collection plan against staging sources (with anonymized custodian data).
4. Verify chain-of-custody records and platform-side load IDs.
5. Switch to production credentials only after a full successful dry-run.
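
Step 4 can be partly automated by checking that every approved (custodian, source) pair has a matching audit row. A minimal sketch, assuming both tables have been read into arrays of row objects; `findCustodyGaps` is a hypothetical helper.

```javascript
// Return the plan rows that have no audit row for the same custodian,
// source, and plan_sha. An empty result means coverage is complete.
function findCustodyGaps(planRows, auditRows) {
  const key = (r) => `${r.custodian_id}|${r.source}|${r.plan_sha}`;
  const covered = new Set(auditRows.map(key));
  return planRows.filter((r) => !covered.has(key(r)));
}
```

Including `plan_sha` in the key means an audit row written against a superseded plan version does not count as coverage for the current one.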

## Known limits / production-readiness gaps

This is a skeleton. Before production:

1. Per-source export polling — Google Vault and M365 Compliance exports are async; the flow needs a poll-and-resume pattern (not bundled).
2. Per-source export-blob fetching — once the export is ready, the flow needs to download the blob and hash it (skeleton uses placeholder hash).
3. M365 Compliance branch — entirely skeleton; needs Content Search + Search Result Export wiring.
4. Slack Discovery branch — entirely skeleton; needs cursor-based per-channel paging.
5. E-discovery platform upload — not bundled; per-platform Relativity / Everlaw / Logikcull connector required.
6. Per-source rate limiting — the per-source nodes need rate limiters in production.
7. Error recovery — failed-export retry / replay logic not bundled.

This skeleton's value is in the orchestration shape and the audit / chain-of-custody discipline; the connector layer is the firm's ediscovery engineering work.
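
For gap 7, a retry wrapper with exponential backoff is one common starting point. The sketch below is an assumption about how a firm might handle transient failures, not the flow's method; `withRetry` and its defaults are hypothetical.

```javascript
function defaultSleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Run an async task, retrying on failure with delays of baseDelayMs,
// 2x, 4x, and so on. Rethrows the last error once attempts are exhausted.
async function withRetry(task, { maxAttempts = 4, baseDelayMs = 1000, sleep = defaultSleep } = {}) {
  let lastError;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await task(attempt);
    } catch (err) {
      lastError = err;
      if (attempt < maxAttempts) await sleep(baseDelayMs * 2 ** (attempt - 1));
    }
  }
  throw lastError;
}
```

Retries should wrap only idempotent steps (status polls, blob downloads). Re-running a step that appends to `collection_audit` would create duplicate chain-of-custody rows.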

{
  "name": "Evidence collection ediscovery (skeleton)",
  "nodes": [
    {
      "parameters": {
        "httpMethod": "POST",
        "path": "collection-request",
        "responseMode": "lastNode",
        "options": { "rawBody": false }
      },
      "id": "7a7a7a7a-0001-0000-0000-000000000001",
      "name": "Collection Request",
      "type": "n8n-nodes-base.webhook",
      "typeVersion": 2,
      "position": [240, 400],
      "webhookId": "collection-request",
      "notesInFlow": true,
      "notes": "Webhook from legal-ops platform: {matter_id, collection_plan_id}. The collection plan is the counsel-approved scope; this flow executes against it, doesn't author it."
    },
    {
      "parameters": {
        "operation": "executeQuery",
        "query": "WITH plan AS (\n  SELECT collection_plan_id, plan_sha, custodian_id, source, scope_json\n  FROM collection_plans\n  WHERE collection_plan_id = $1 AND status = 'approved'\n)\nSELECT * FROM plan;",
        "options": { "queryReplacement": "={{ $json.body.collection_plan_id }}" }
      },
      "id": "7a7a7a7a-0001-0000-0000-000000000002",
      "name": "Load Collection Plan",
      "type": "n8n-nodes-base.postgres",
      "typeVersion": 2.4,
      "position": [460, 400],
      "credentials": {
        "postgres": { "id": "PLACEHOLDER_PLAN_DB_CRED_ID", "name": "Postgres — collection plans" }
      }
    },
    {
      "parameters": {
        "jsCode": "// For each (custodian, source) pair, prepare a per-source dispatch payload.\n// The flow's per-source nodes receive these payloads.\nconst rows = $input.all().map(r => r.json);\n// The Webhook node nests the POST payload under `body`; fall back to the top\n// level so the node also works with test items that carry the fields directly.\nconst request = $('Collection Request').first().json;\nconst trigger = request.body ?? request;\n\nif (rows.length === 0) {\n  return [{ json: { status: 'halted', reason: 'no_approved_plan_rows', collection_plan_id: trigger.collection_plan_id } }];\n}\n\nconst out = rows.map(row => ({\n  json: {\n    matter_id: trigger.matter_id,\n    collection_plan_id: trigger.collection_plan_id,\n    plan_sha: row.plan_sha,\n    custodian_id: row.custodian_id,\n    source: row.source,\n    scope: typeof row.scope_json === 'string' ? JSON.parse(row.scope_json) : row.scope_json,\n    requested_at: new Date().toISOString(),\n    collection_id: `coll-${Date.now()}-${Math.random().toString(36).slice(2, 8)}`,\n  }\n}));\n\nreturn out;"
      },
      "id": "7a7a7a7a-0001-0000-0000-000000000003",
      "name": "Per-Source Dispatch",
      "type": "n8n-nodes-base.code",
      "typeVersion": 2,
      "position": [680, 400],
      "notesInFlow": true,
      "notes": "Fans out one item per (custodian, source) pair. The downstream switch routes by source."
    },
    {
      "parameters": {
        "rules": {
          "values": [
            {
              "conditions": {
                "options": { "caseSensitive": true },
                "conditions": [
                  { "leftValue": "={{ $json.source }}", "rightValue": "google-vault", "operator": { "type": "string", "operation": "equals" } }
                ],
                "combinator": "and"
              },
              "outputKey": "google"
            },
            {
              "conditions": {
                "options": { "caseSensitive": true },
                "conditions": [
                  { "leftValue": "={{ $json.source }}", "rightValue": "m365-compliance", "operator": { "type": "string", "operation": "equals" } }
                ],
                "combinator": "and"
              },
              "outputKey": "m365"
            },
            {
              "conditions": {
                "options": { "caseSensitive": true },
                "conditions": [
                  { "leftValue": "={{ $json.source }}", "rightValue": "slack-discovery", "operator": { "type": "string", "operation": "equals" } }
                ],
                "combinator": "and"
              },
              "outputKey": "slack"
            }
          ]
        },
        "options": { "fallbackOutput": "extra" }
      },
      "id": "7a7a7a7a-0001-0000-0000-000000000004",
      "name": "Source Switch",
      "type": "n8n-nodes-base.switch",
      "typeVersion": 3,
      "position": [900, 400]
    },
    {
      "parameters": {
        "method": "POST",
        "url": "=https://vault.googleapis.com/v1/matters/{{ $env.GOOGLE_VAULT_MATTER_ID }}/savedQueries",
        "authentication": "predefinedCredentialType",
        "nodeCredentialType": "googleApi",
        "sendHeaders": true,
        "headerParameters": {
          "parameters": [
            { "name": "Content-Type", "value": "application/json" }
          ]
        },
        "sendBody": true,
        "specifyBody": "json",
        "jsonBody": "={{ JSON.stringify({ displayName: $json.collection_id, query: { corpus: 'MAIL', dataScope: 'ALL_DATA', searchMethod: 'ACCOUNT', accountInfo: { emails: [$json.scope.email] }, mailOptions: { excludeDrafts: false }, startTime: $json.scope.start_time, endTime: $json.scope.end_time, terms: $json.scope.terms } }) }}",
        "options": {
          "response": { "response": { "responseFormat": "json", "neverError": false } },
          "timeout": 60000
        }
      },
      "id": "7a7a7a7a-0001-0000-0000-000000000005",
      "name": "Google Vault: Saved Query",
      "type": "n8n-nodes-base.httpRequest",
      "typeVersion": 4.2,
      "position": [1120, 280],
      "credentials": {
        "googleApi": { "id": "PLACEHOLDER_GOOGLE_VAULT_CRED_ID", "name": "Google Vault service account" }
      },
      "notesInFlow": true,
      "notes": "Creates a saved query in the matter; an export job is the next step (separate API call). Real production flow needs the full create-query → poll-export sequence; skeleton shown."
    },
    {
      "parameters": {
        "jsCode": "// Compute SHA-256 of each export and build its chain-of-custody record.\n// Skeleton: the production flow includes the actual export-fetch step.\n// Self-hosted n8n must allow the module: NODE_FUNCTION_ALLOW_BUILTIN=crypto.\nconst crypto = require('crypto');\n\n// Emit one record per incoming item so that several custodians routed to the\n// same branch each get their own audit row.\nreturn $input.all().map((item, i) => {\n  const input = item.json;\n  const dispatch = $('Per-Source Dispatch').itemMatching(i).json;\n\n  // In production: fetch the actual export bytes here, hash them.\n  // Skeleton uses a deterministic placeholder so the audit record shape is correct.\n  const placeholderHash = crypto.createHash('sha256').update(`${dispatch.collection_id}-${dispatch.source}`).digest('hex');\n\n  return {\n    json: {\n      matter_id: dispatch.matter_id,\n      collection_id: dispatch.collection_id,\n      custodian_id: dispatch.custodian_id,\n      source: dispatch.source,\n      plan_sha: dispatch.plan_sha,\n      collected_at: new Date().toISOString(),\n      collected_by_service_account: $env.COLLECTION_SERVICE_ACCOUNT || 'ediscovery-bot@firm',\n      hash: placeholderHash,\n      file_count: input.fileCount || 0,\n      byte_count: input.byteCount || 0,\n      scope_summary: JSON.stringify(dispatch.scope).slice(0, 500),\n      skeleton_warning: 'This skeleton flow does not fetch and hash actual export bytes. Production: replace with fetch + bytewise hash.',\n    }\n  };\n});"
      },
      "id": "7a7a7a7a-0001-0000-0000-000000000006",
      "name": "Hash + Chain-of-Custody",
      "type": "n8n-nodes-base.code",
      "typeVersion": 2,
      "position": [1340, 400]
    },
    {
      "parameters": {
        "operation": "insert",
        "schema": "public",
        "table": "collection_audit",
        "columns": "matter_id, collection_id, custodian_id, source, plan_sha, collected_at, collected_by_service_account, hash, file_count, byte_count, scope_summary",
        "additionalFields": {}
      },
      "id": "7a7a7a7a-0001-0000-0000-000000000007",
      "name": "Audit: Collection Complete",
      "type": "n8n-nodes-base.postgres",
      "typeVersion": 1,
      "position": [1560, 400],
      "credentials": {
        "postgres": {
          "id": "PLACEHOLDER_AUDIT_DB_CRED_ID",
          "name": "Postgres — chain-of-custody (append-only)"
        }
      }
    }
  ],
  "connections": {
    "Collection Request": { "main": [[{ "node": "Load Collection Plan", "type": "main", "index": 0 }]] },
    "Load Collection Plan": { "main": [[{ "node": "Per-Source Dispatch", "type": "main", "index": 0 }]] },
    "Per-Source Dispatch": { "main": [[{ "node": "Source Switch", "type": "main", "index": 0 }]] },
    "Source Switch": {
      "main": [
        [{ "node": "Google Vault: Saved Query", "type": "main", "index": 0 }],
        [],
        []
      ]
    },
    "Google Vault: Saved Query": { "main": [[{ "node": "Hash + Chain-of-Custody", "type": "main", "index": 0 }]] },
    "Hash + Chain-of-Custody": { "main": [[{ "node": "Audit: Collection Complete", "type": "main", "index": 0 }]] }
  },
  "settings": {
    "executionOrder": "v1",
    "timezone": "America/New_York",
    "saveExecutionProgress": true,
    "saveManualExecutions": true
  },
  "active": false,
  "versionId": "1"
}