Churn prediction is the practice of scoring each customer on how likely they are to cancel or downgrade before they actually do it, so a CSM can intervene while there is still time to change the outcome. It turns retention from a reactive function (responding after the cancellation email arrives) into a proactive one (working the at-risk account 60 to 90 days out).
It is not the same as a health score, and it is not the same as churn rate. A health score is a composite snapshot of account state; churn rate is a backward-looking GRR/NRR input that tells you what already happened. Churn prediction is a forward-looking probability: “this account has a 38% chance of not renewing in the next 90 days.” A health score can be an input to that probability, but the two are different objects.
The leading indicators that actually move the model
A churn model is only as good as its features. The signals that carry the most weight, roughly in order:
- Product usage decay. The single strongest leading indicator. Not absolute usage — the trend. A login count that drops 40% quarter-over-quarter predicts churn far better than a low-but-stable one. Track weekly active users per account, depth of feature adoption, and seats provisioned vs. seats active.
- Champion departure. When your economic buyer or power user leaves the company, renewal risk spikes. Detect it from email bounces, title changes on LinkedIn, or a sudden drop in that contact’s activity.
- Support signal. Rising ticket volume, falling CSAT, repeated escalations, or — counterintuitively — a drop to zero (the account stopped trying).
- Engagement with CS. Missed QBRs, declining email open rates, slow responses, no-shows on calls.
- Commercial signals. Late payments, downgrade requests, procurement asking for month-to-month terms, contraction at the line-item level.
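- Onboarding miss. Accounts that never reach first value churn at several times the rate of accounts that do, so time to value (TTV) belongs in the model. For accounts still inside their first 90 days, before there is enough history to measure usage decay, onboarding progress is the most predictive input you have.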
A model that leans only on usage will miss the champion-departure and commercial classes entirely, which is why purely product-telemetry-driven scores under-predict in enterprise.
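A minimal sketch of the usage-decay and seat-utilization features, assuming per-account quarterly aggregates are already available. The `UsageSnapshot` type and its field names are hypothetical, and treating a zero baseline as flat is a judgment call you may want to change. It shows why the trend, not the level, is the feature: an account that falls from 50 to 30 weekly active users scores a 40% drop, while a low-but-stable account scores zero.

```python
from dataclasses import dataclass


@dataclass
class UsageSnapshot:
    """Hypothetical per-account aggregates for one quarter."""
    weekly_active_users: float  # average WAU across the quarter
    seats_provisioned: int
    seats_active: int


def usage_decay(prev: UsageSnapshot, curr: UsageSnapshot) -> float:
    """Quarter-over-quarter change in WAU as a fraction (-0.4 means a 40% drop)."""
    if prev.weekly_active_users == 0:
        return 0.0  # no baseline to decay from; treated as flat (a judgment call)
    return (curr.weekly_active_users - prev.weekly_active_users) / prev.weekly_active_users


def seat_utilization(snap: UsageSnapshot) -> float:
    """Share of provisioned seats that are actually active."""
    if snap.seats_provisioned == 0:
        return 0.0
    return snap.seats_active / snap.seats_provisioned


# Illustrative accounts, not real data.
declining = (UsageSnapshot(50, 60, 48), UsageSnapshot(30, 60, 28))
low_but_stable = (UsageSnapshot(8, 10, 6), UsageSnapshot(8, 10, 6))

print(usage_decay(*declining))       # -0.4: the account to worry about
print(usage_decay(*low_but_stable))  # 0.0: low, but not decaying
print(seat_utilization(declining[1]))
```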
Scoring models, from cheapest to most defensible
- Rules / threshold model. Hand-written rules: “usage down >30% AND a missed QBR AND under 90 days to renewal → at-risk.” Transparent, explainable to the CSM, cheap to build, easy to game. Where most teams should start.
- Weighted scorecard. Assign points per signal, sum, band into green/yellow/red. This is what most health-score features in Gainsight, ChurnZero, and Vitally ship out of the box. Better than nothing; the weights are usually guessed, not fitted.
- Supervised ML (logistic regression, gradient boosting). Train on labeled historical churn. This is where real lift comes from: the model fits the weights instead of you guessing them, and tree-based methods such as gradient boosting also pick up interactions between signals. Requires a clean labeled dataset: at minimum a few hundred churn events, with features captured as of the prediction point rather than at the time of cancellation (or you leak the label).
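The first two approaches are simple enough to sketch directly. This is an illustrative example, not any vendor's implementation: `AccountSignals`, the `WEIGHTS` values, and the band cutoffs are all hypothetical, and the weights are deliberately guessed rather than fitted, which is the scorecard's core weakness.

```python
from dataclasses import dataclass


@dataclass
class AccountSignals:
    """Hypothetical per-account inputs; field names are illustrative."""
    usage_change_qoq: float  # -0.35 means usage fell 35% quarter-over-quarter
    missed_qbr: bool
    days_to_renewal: int
    champion_left: bool
    late_payment: bool


def rules_at_risk(a: AccountSignals) -> bool:
    """Threshold rule: usage down >30% AND a missed QBR AND under 90 days to renewal."""
    return a.usage_change_qoq < -0.30 and a.missed_qbr and a.days_to_renewal < 90


# Risk points per signal. Guessed, not fitted: the scorecard's core weakness.
WEIGHTS = {"usage_drop": 40, "missed_qbr": 15, "champion_left": 30, "late_payment": 15}


def scorecard_band(a: AccountSignals) -> str:
    """Sum risk points for each signal present, then band into green/yellow/red."""
    points = 0
    if a.usage_change_qoq < -0.30:
        points += WEIGHTS["usage_drop"]
    if a.missed_qbr:
        points += WEIGHTS["missed_qbr"]
    if a.champion_left:
        points += WEIGHTS["champion_left"]
    if a.late_payment:
        points += WEIGHTS["late_payment"]
    if points >= 50:
        return "red"
    if points >= 25:
        return "yellow"
    return "green"


decaying = AccountSignals(-0.35, missed_qbr=True, days_to_renewal=75,
                          champion_left=False, late_payment=False)
champion_gone = AccountSignals(-0.05, missed_qbr=False, days_to_renewal=200,
                               champion_left=True, late_payment=False)

print(rules_at_risk(decaying), scorecard_band(decaying))            # True red
print(rules_at_risk(champion_gone), scorecard_band(champion_gone))  # False yellow
```

The champion-departure account slips past the threshold rule entirely because its usage is flat, yet it still lands in yellow on the scorecard. That is the class a usage-only rule misses.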
Evaluate with precision/recall and a confusion matrix, not “accuracy.” On an 8% annual churn base rate, a model that predicts “nobody churns” is 92% accurate and completely useless. What you care about is: of the accounts the model flagged red, how many actually churned (precision), and of the accounts that churned, how many did the model flag in time (recall).
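A short sketch of that evaluation, using made-up labels for 100 accounts at the 8% base rate above. The hypothetical model flags 10 accounts, 6 of which churn. It is only two points more accurate than the naive “nobody churns” model, but precision and recall show which of the two is actually useful.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn); 1 means churned (truth) or flagged red (prediction)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def evaluate(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / len(y_true),
        # Of the accounts flagged red, how many actually churned.
        # Undefined when nothing is flagged; reported as 0.0 here.
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        # Of the accounts that churned, how many were flagged in time.
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# 100 illustrative accounts at an 8% churn base rate: the first 8 churned.
y_true = [1] * 8 + [0] * 92

naive = [0] * 100  # "nobody churns"
# Hypothetical model: catches 6 of 8 churners, raises 4 false alarms.
model = [1] * 6 + [0] * 2 + [1] * 4 + [0] * 88

print(evaluate(y_true, naive))  # accuracy 0.92, precision 0.0, recall 0.0
print(evaluate(y_true, model))  # accuracy 0.94, precision 0.6, recall 0.75
```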
Where AI helps — and where it overpromises
Where it genuinely helps: ML beats hand-tuned scorecards when you have enough labeled history, because it finds non-obvious interactions (low usage is fine for an account that always logs in monthly to export a report; it is a five-alarm fire for one that used to be daily). LLMs are good at the unstructured layer scorecards ignore — summarizing the sentiment trend across a year of support tickets and emails, or flagging “the champion sounds checked out” from call transcripts. Use the LLM to enrich features, not to be the classifier.
Where it overpromises: Three failure modes recur. First, the cold-start problem — a model needs labeled churn to learn from, and a Seed-stage company with 40 customers and 3 churn events has nothing to train on. Buying an “AI churn prediction” feature there is theater; use rules. Second, base-rate confusion sold as accuracy — vendors quote “90% accurate” against a low churn base where the naive model is already 92%. Always ask for precision and recall on red flags. Third, prediction without prescription — a probability that nobody acts on is a dashboard decoration. The model has to feed a playbook (auto-create a save task, trigger an exec outreach, escalate to the renewal manager), or it changes nothing.
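One way to close the loop is to route every score into an owned, dated action. The sketch below is hypothetical throughout: the probability thresholds, the ARR cutoff for exec outreach, the SLA days, and the `cs-manager` fallback owner are placeholders to tune, not recommendations.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class Account:
    name: str
    churn_probability: float  # model output, e.g. 0.38 = 38% chance of not renewing in 90 days
    arr: float                # annual recurring revenue
    csm: Optional[str]        # named owner, if one is assigned


@dataclass
class SaveTask:
    account: str
    action: str
    owner: str
    due: date  # the "respond within X days" SLA, as a concrete date


def route(acct: Account, today: date, fallback_owner: str = "cs-manager") -> Optional[SaveTask]:
    """Map a churn probability to an owned, dated playbook action (hypothetical thresholds)."""
    owner = acct.csm or fallback_owner  # nothing flagged goes unowned
    if acct.churn_probability >= 0.5 and acct.arr >= 100_000:
        return SaveTask(acct.name, "exec outreach", owner, today + timedelta(days=3))
    if acct.churn_probability >= 0.5:
        return SaveTask(acct.name, "escalate to renewal manager", owner, today + timedelta(days=5))
    if acct.churn_probability >= 0.3:
        return SaveTask(acct.name, "CSM save task", owner, today + timedelta(days=10))
    return None  # below threshold: keep monitoring, no task


print(route(Account("Acme", 0.38, 250_000, csm=None), date(2024, 1, 15)))
```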
Common pitfalls
- Label leakage. Training on features captured at cancellation (usage already at zero, support tickets already closed) instead of at the prediction horizon. The model looks brilliant offline and fails live. Guard: snapshot features as of 90 days before the churn event, never the day of.
- Acting too late. A 30-day prediction window is too short to save an enterprise renewal — the decision was made months ago. Predict at 60-90 days for enterprise, where the save motion has runway.
- One model for all segments. SMB self-serve churn (price, low usage) and enterprise churn (champion loss, exec misalignment) have different drivers. A single model blends them into mush. Segment first, then model.
- Scoring without ownership. A red flag with no named CSM and no SLA to act on it dies in the dashboard. Pair every red account with an owner and a “respond within X days” rule.
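A minimal point-in-time snapshot that enforces the label-leakage guard, assuming a hypothetical per-account event log of `(event_date, kind)` tuples. Only events strictly before the cutoff (churn date minus the 90-day horizon) are visible, so end-of-life activity such as the late ticket below never reaches the feature set. The 30-day lookback window and the feature names are illustrative choices.

```python
from datetime import date, timedelta

HORIZON = timedelta(days=90)  # prediction horizon from the guard above


def snapshot_features(events, churn_date: date, horizon: timedelta = HORIZON) -> dict:
    """Build features only from events strictly before churn_date - horizon.

    `events` is a hypothetical list of (event_date, kind) tuples such as
    (date(2024, 1, 10), "login") or (date(2024, 1, 25), "ticket_opened").
    """
    cutoff = churn_date - horizon
    visible = [(d, kind) for d, kind in events if d < cutoff]  # nothing after the cutoff leaks in
    window_start = cutoff - timedelta(days=30)
    recent = [kind for d, kind in visible if d >= window_start]
    return {
        "as_of": cutoff,
        "logins_last_30d": recent.count("login"),
        "tickets_last_30d": recent.count("ticket_opened"),
    }


events = [
    (date(2024, 1, 10), "login"),
    (date(2024, 1, 20), "login"),
    (date(2024, 1, 25), "ticket_opened"),
    (date(2024, 4, 25), "ticket_opened"),  # end-of-life noise inside the horizon
]
print(snapshot_features(events, churn_date=date(2024, 5, 1)))
# {'as_of': datetime.date(2024, 2, 1), 'logins_last_30d': 2, 'tickets_last_30d': 1}
```

For accounts that renewed, one reasonable choice is to anchor the cutoff on the renewal date instead, so positive and negative training examples are built the same way.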
Related
- Customer health score — the composite that often feeds the model
- Customer churn — the outcome you are predicting
- Churn rate calculation — the backward-looking measure
- NRR vs GRR — where retention shows up financially
- Customer success metrics — the wider metric set