Joey Organisciak
May 12, 2025

(Part 2 of the 4-post series “Cutting Through AI Hype”)
Setting the Scene
Legal teams are fielding more AI demos than CLE invites. Every vendor claims its model can “revolutionize” contracts, research, or intake, often in a single click. Yet behind the sizzle lurk compliance headaches, hidden service fees, and, occasionally, court sanctions. (Remember the Mata v. Avianca fiasco, where ChatGPT-fabricated citations cost two lawyers $5,000 in fines? (Seyfarth Shaw))
The U.S. Federal Trade Commission’s 2024 crackdown on DoNotPay, which fined the self-styled “robot lawyer” $193,000 for deceptive claims, underscores that regulators are watching the hype as closely as the hype-makers. (The Verge)
With so much noise, how can firms, corporate counsel, and investors separate real innovation from vaporware? Start with four red-flag tests.
The Four Red Flags (and How to Test for Them)
| # | Red Flag | What It Looks Like | Quick Test | Why It Matters |
|---|---|---|---|---|
| 1 | “One-Click Miracles” | Demos promise a perfect brief or contract with no human review | Ask to run the tool on your own matter set and inspect the output line by line | Generative AI still hallucinates, and lawyers remain liable; Mata shows the cost (Seyfarth Shaw) |
| 2 | Opaque Black Box | The vendor can’t explain how results are generated | Demand source links, confidence scores, or audit logs | 40% of leaders cite explainability as a top AI risk, yet only 17% are mitigating it (McKinsey & Company) |
| 3 | Hidden Pro-Services Dependency | The “AI” works only after weeks of vendor training and paid tuning | Ask who retrains the model when a new document type arrives, and how long it takes | Gartner predicts at least 30% of generative-AI projects will be abandoned after proof of concept by end of 2025 (Informatica) |
| 4 | Problem–Solution Mismatch | A shiny clause extractor when your real pain is intake backlog | Map tool features to an actual KPI (cycle time, WIP hours, error rate) | Misaligned tools become shelfware; the FTC fined DoNotPay for claims that ignored real-world legal standards (The Verge) |
Five Due-Diligence Questions to Stop Hype in Its Tracks
1. Purpose & Provenance – Which specific workflow does the AI improve, and what data was it trained on?
2. Explainability – Can my lawyers click through to see the source language that drove each result?
3. Self-Service vs. Service Dependency – How many vendor hours were needed for your last client’s rollout?
4. Security & Privacy – Does any client data train a shared model or leave our jurisdiction?
5. Scalability – If we add a new matter type next quarter, do we need to buy another implementation SOW?
Document the answers and make them part of your procurement scorecard. If a vendor ducks or fumbles, you’ve likely found a red flag.
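If your team prefers something machine-readable over a spreadsheet, here is a minimal sketch of how those five answers could be recorded and screened in code. The vendor name, the 0–3 scoring scale, and the “below 2 is a red flag” threshold are illustrative assumptions, not a prescribed standard:

```python
from dataclasses import dataclass, field

# The five due-diligence questions from this post.
QUESTIONS = [
    "Purpose & Provenance",
    "Explainability",
    "Self-Service vs. Service Dependency",
    "Security & Privacy",
    "Scalability",
]

@dataclass
class VendorScorecard:
    vendor: str
    # Hypothetical scale: score each answer 0 (ducked) to 3 (clear, verifiable).
    answers: dict = field(default_factory=dict)

    def red_flags(self):
        """Return every question the vendor ducked or answered weakly."""
        return [q for q in QUESTIONS if self.answers.get(q, 0) < 2]

# Example: a vendor with a strong explainability story but an opaque services model.
card = VendorScorecard(
    vendor="Acme Legal AI",  # hypothetical vendor
    answers={
        "Purpose & Provenance": 3,
        "Explainability": 3,
        "Self-Service vs. Service Dependency": 1,  # weeks of paid tuning required
        "Security & Privacy": 2,
        "Scalability": 0,  # question ducked in the demo
    },
)
print(card.red_flags())  # ['Self-Service vs. Service Dependency', 'Scalability']
```

The tooling is beside the point; a shared spreadsheet with the same columns and the same threshold works just as well, as long as every vendor is scored against the same questions.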
Case Compass: Our Take on Avoiding the Trap
At Case Compass, we engineered our smart-intake automation to answer every question above:
Transparent by design – every AI-classified intake shows the exact language and weighting behind its decision.
Self-managed – ops teams tweak questionnaires, data extractions, and routing rules in minutes—zero billable-hour retraining.
KPI-driven – success is measured in reduced paralegal touch time and faster lead qualification, not vague “AI insights.”
That discipline keeps us (and our clients) out of the hype cycle and focused on verifiable ROI.
Key Takeaways
Hype loves a vacuum. Fill it with pointed, evidence-based questions.
Regulators are catching up. The FTC’s DoNotPay fine shows AI puffery now carries real penalties. (The Verge)
Explainability isn’t optional. Without a clear audit trail, you’re buying liability, not leverage. (McKinsey & Company)
Successful AI is workflow-native and user-controlled. That’s the benchmark—accept nothing less.
In Part 3, we’ll provide a step-by-step framework and scorecard template you can use to evaluate any legal-AI offering in 30 minutes or less. Stay tuned!