Joey Organisciak
May 12, 2025

(Part 2 of the 4-post series “Cutting Through AI Hype”)
Setting the Scene
Legal teams are fielding more AI demos than CLE invites. Every vendor claims its model can “revolutionize” contracts, research, or intake, often in a single click. Yet behind the sizzle lurk compliance headaches, hidden service fees, and, occasionally, court sanctions. (Remember the Mata v. Avianca fiasco, where ChatGPT-fabricated citations cost two lawyers $5,000 in fines? (Seyfarth Shaw))
The U.S. Federal Trade Commission’s 2024 crackdown on DoNotPay, which fined the self-styled “robot lawyer” $193,000 for deceptive claims, underscores that regulators are watching the hype as closely as the hype-makers. (The Verge)
With so much noise, how can firms, corporate counsel, and investors separate real innovation from vaporware? Start with four red-flag tests.
The Four Red Flags (and How to Test for Them)
| # | Red Flag | What It Looks Like | Quick Test | Why It Matters |
|---|---|---|---|---|
| 1 | “One-Click Miracles” | Demos promise a perfect brief or contract with no human review | Ask to run the tool on your own matter set and inspect the output line by line | Generative AI still hallucinates, and lawyers remain liable; Mata shows the cost (Seyfarth Shaw) |
| 2 | Opaque Black Box | The vendor can’t explain how results are generated | Demand source links, confidence scores, or audit logs | 40% of leaders cite explainability as a top AI risk, yet only 17% are mitigating it (McKinsey & Company) |
| 3 | Hidden Pro-Services Dependency | The “AI” works only after weeks of vendor training and paid tuning | Ask who retrains the model when a new document type arrives, and how long it takes | Gartner predicts at least 30% of generative-AI projects will be abandoned after proof of concept by end of 2025 (Informatica) |
| 4 | Problem–Solution Mismatch | A shiny clause extractor when your real pain is intake backlog | Map tool features to an actual KPI (cycle time, WIP hours, error rate) | Misaligned tools become shelfware; the FTC fined DoNotPay for claims that ignored real-world legal standards (The Verge) |
Five Due-Diligence Questions to Stop Hype in Its Tracks
1. Purpose & Provenance – Which specific workflow does the AI improve, and what data was it trained on?
2. Explainability – Can my lawyers click through to see the source language that drove each result?
3. Self-Service vs. Service Dependency – How many vendor hours were needed for your last client’s rollout?
4. Security & Privacy – Does any client data train a shared model or leave our jurisdiction?
5. Scalability – If we add a new matter type next quarter, do we need to buy another implementation SOW?
Document the answers and make them part of your procurement scorecard. If a vendor ducks or fumbles, you’ve likely found a red flag.
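If your team prefers something machine-readable over a spreadsheet, here is a minimal sketch of how those five answers could be recorded and screened in code. The vendor name, the 0–3 scoring scale, and the “below 2 is a red flag” threshold are illustrative assumptions, not a prescribed standard:

```python
from dataclasses import dataclass, field

# The five due-diligence questions from this post.
QUESTIONS = [
    "Purpose & Provenance",
    "Explainability",
    "Self-Service vs. Service Dependency",
    "Security & Privacy",
    "Scalability",
]

@dataclass
class VendorScorecard:
    vendor: str
    # Hypothetical scale: score each answer 0 (ducked) to 3 (clear, verifiable).
    answers: dict = field(default_factory=dict)

    def red_flags(self):
        """Return every question the vendor ducked or answered weakly."""
        return [q for q in QUESTIONS if self.answers.get(q, 0) < 2]

# Example: a vendor with a strong explainability story but an opaque services model.
card = VendorScorecard(
    vendor="Acme Legal AI",  # hypothetical vendor
    answers={
        "Purpose & Provenance": 3,
        "Explainability": 3,
        "Self-Service vs. Service Dependency": 1,  # weeks of paid tuning required
        "Security & Privacy": 2,
        "Scalability": 0,  # question ducked in the demo
    },
)
print(card.red_flags())  # ['Self-Service vs. Service Dependency', 'Scalability']
```

The tooling is beside the point; a shared spreadsheet with the same columns and the same threshold works just as well, as long as every vendor is scored against the same questions.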
Case Compass: Our Take on Avoiding the Trap
At Case Compass, we engineered our smart-intake automation to answer every question above:
Transparent by design – every AI-classified intake shows the exact language and weighting behind its decision.
Self-managed – ops teams tweak questionnaires, data extractions, and routing rules in minutes—zero billable-hour retraining.
KPI-driven – success is measured in reduced paralegal touch time and faster lead qualification, not vague “AI insights.”
That discipline keeps us (and our clients) out of the hype cycle and focused on verifiable ROI.
Key Takeaways
Hype loves a vacuum. Fill it with pointed, evidence-based questions.
Regulators are catching up. The FTC’s DoNotPay fine shows AI puffery now carries real penalties. (The Verge)
Explainability isn’t optional. Without a clear audit trail, you’re buying liability, not leverage. (McKinsey & Company)
Successful AI is workflow-native and user-controlled. That’s the benchmark—accept nothing less.
In Part 3, we’ll provide a step-by-step framework and scorecard template you can use to evaluate any legal-AI offering in 30 minutes or less. Stay tuned!