Machine-Learning

The Government Already Has the Data

15 July 2025 · 15 mins

Fraud-Analytics PPP-Fraud Healthcare-Fraud DOJ-Enforcement Data-Fusion-Center PACE

Medicare claims, tax returns, PPP and EIDL applications — the government increasingly holds the structured transaction data that lets fraud enforcement start with a query, not a tip.

From Kaggle to MCP: Open-Source Medicare Fraud Detection

20 December 2025 · 14 mins

Fraud-Analytics Healthcare-Fraud CMS Medicare Open-Source GitHub

The PPP fraud pipeline worked because the SBA released unusually inspectable data. Medicare's public data is fragmented, de-identified, and missing the features detection needs. Here's what exists on GitHub, where it falls short, and what CMS would need to release to make outside healthcare-fraud analysis more practical.

The Backtest: What Excluded Medicare Providers Look Like Before They Get Caught

20 May 2026 · 27 mins

Fraud-Analytics Healthcare-Fraud CMS Medicare LEIE Anomaly-Detection

The previous post described a Medicare fraud backtest nobody had built. Here are the results: 289 excluded providers across 41 states, matched to pre-exclusion billing data and compared against 3.39 million peers. Thirteen of fifteen features showed statistically significant differences, and the same behavioral fingerprint shows up in never-excluded providers who have independent enforcement histories.

Building a Medicare Fraud Backtest in One Claude Code Session

25 May 2026 · 19 mins

Fraud-Analytics Healthcare-Fraud CMS Medicare Claude-Code Agentic-AI

A walkthrough of building a Medicare fraud backtest overnight in Claude Code: from a plain-English spec to 289 matched providers across 41 states, a fraud-similarity model with AUC 0.79, and a manual public-record check of high-scoring peers. It also covers the three times the pipeline failed, the data-duplication bug, and the engineering decisions that shaped the final design.
