How to spot AI snake oil: the 7 red flags in a vendor demo

Seven red flags to watch for when an AI vendor is selling you something: how to catch the demos that look impressive and then fall apart in production.

A good AI demo is stagecraft. The vendor controls the inputs. The vendor controls the questions. The output looks magical. You sign the contract. Two weeks in, the magic is gone and you're paying $24K/year for a fancy autocomplete.

Here are the seven red flags that tell you you're watching theater, plus the questions that cut through.

Red flag 1: The demo only uses scripted inputs

What it looks like: the vendor types one of three pre-prepared questions and the AI responds beautifully. The questions are real-feeling but rehearsed.

Why it's a problem: AI behaves dramatically differently on inputs the vendor didn't optimize for. The polished demo prompt was tested 50 times before you saw it. Your actual production workload will look nothing like it.

The question to ask: "Can I type my own question into this demo right now?" If the answer is "we have a dedicated trial environment for that," the live demo is theater.

Red flag 2: Accuracy claims with no failure rate

What it looks like: "Our AI is 98% accurate." "Our agent handles 95% of queries correctly."

Why it's a problem: 98% sounds great until you do the math. At 1,000 queries/day, 2% is 20 wrong answers daily. If you're sending those to customers, that's 20 wrong answers in front of 20 customers. The 2% matters.

Worse: vendors usually quote accuracy on their internal benchmark, which they designed. On your real workload, accuracy is often 10-20 points lower.
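
The arithmetic above is easy to rerun for your own volume. Here is a minimal sketch in Python. The function name is ours, and it assumes errors scale linearly with query volume, which is a simplification:

```python
def expected_wrong_per_day(queries_per_day: int, accuracy: float) -> float:
    """Expected wrong answers per day at a claimed accuracy (0.0 to 1.0)."""
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be between 0.0 and 1.0")
    return queries_per_day * (1.0 - accuracy)


if __name__ == "__main__":
    # The two claims quoted above, at the 1,000 queries/day volume used in the text.
    for claimed in (0.98, 0.95):
        wrong = expected_wrong_per_day(1000, claimed)
        print(f"{claimed:.0%} claimed accuracy: about {wrong:.0f} wrong answers per day")
```

Plug in the vendor's claimed figure first, then a figure 10 to 20 points lower, and compare the two daily counts before the call.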

The question to ask: "What's your accuracy on a held-out test set you didn't design? And what happens to the 2% it gets wrong: does it know it's wrong, or does it confidently output the wrong answer?"
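
If the vendor lets you run your own questions, you can score the answers yourself. The sketch below is one way to do it, not a standard method. The `Result` record and its two fields are hypothetical: you grade each answer against your own reference answer and note whether the tool flagged low confidence or declined to answer.

```python
from dataclasses import dataclass


@dataclass
class Result:
    """One held-out question, graded by you (hypothetical record shape)."""
    correct: bool  # did the answer match your reference answer?
    flagged: bool  # did the tool signal low confidence or decline to answer?


def summarize(results: list[Result]) -> dict[str, float]:
    """Return accuracy plus the split of wrong answers into flagged vs. confident."""
    total = len(results)
    if total == 0:
        raise ValueError("need at least one graded result")
    wrong = [r for r in results if not r.correct]
    confident_wrong = sum(1 for r in wrong if not r.flagged)
    return {
        "accuracy": (total - len(wrong)) / total,
        "flagged_wrong_rate": (len(wrong) - confident_wrong) / total,
        "confident_wrong_rate": confident_wrong / total,
    }


if __name__ == "__main__":
    # Made-up grades for illustration only: 8 right, 1 wrong but flagged, 1 wrong and confident.
    graded = [Result(True, False)] * 8 + [Result(False, True), Result(False, False)]
    for name, value in summarize(graded).items():
        print(f"{name}: {value:.0%}")
```

The number to watch is `confident_wrong_rate`: those are the wrong answers that reach a customer with nothing to warn anyone.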

Red flag 3: They won't name the underlying model

What it looks like: "We use proprietary AI." "Our intelligence engine combines multiple models." "We've built a custom AI for your industry."

Why it's a problem: in 2026, most "AI" tools are wrappers around the same four model families (Claude, GPT-4, Gemini, sometimes a fine-tuned Llama). Vendors who won't name the model are usually hiding either (a) that they're using GPT-4 and marking up the API cost by 10x, or (b) that they're using a weaker open-source model and don't want you to know.

The question to ask: "What underlying model do you call? Claude? GPT-4? Gemini?" The answer should be specific.

Red flag 4: "Eliminates hallucinations" or similar absolutist claims

What it looks like: "Our AI doesn't hallucinate." "Zero false outputs guaranteed." "Hallucination-free architecture."

Why it's a problem: no AI tool eliminates hallucinations entirely in 2026. They reduce them. Anyone claiming elimination is either misleading you or doesn't understand the technology they're selling.

The question to ask: "What's your hallucination rate, and what's your protocol for when one slips through?" A real answer talks about review gates, confidence scoring, and escalation paths. A theater answer says "ours doesn't hallucinate."
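
To make "review gates, confidence scoring, and escalation paths" concrete, here is a minimal sketch of what such a protocol might look like. It assumes the product exposes a confidence score between 0 and 1 for each output. The function name and both thresholds are illustrative, not taken from any vendor.

```python
def route(confidence: float, auto_send_at: float = 0.90, review_at: float = 0.60) -> str:
    """Decide what happens to one AI output based on its confidence score.

    Thresholds are illustrative assumptions, not recommended values.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0.0 and 1.0")
    if confidence >= auto_send_at:
        return "send"          # high confidence: goes out without review
    if confidence >= review_at:
        return "human_review"  # review gate: a person approves or edits first
    return "escalate"          # low confidence: handed off to a human owner


if __name__ == "__main__":
    for score in (0.95, 0.70, 0.20):
        print(score, "->", route(score))
```

A high score is not proof of a correct answer, so a vendor with a real protocol should also be able to tell you how often the "send" path turns out to be wrong.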

Red flag 5: Pricing is "Contact us"

What it looks like: pricing page says "schedule a demo to discuss pricing."

Why it's a problem: vendors hide pricing when (a) they price-discriminate based on what they think you can pay, or (b) the price is high enough that publishing it would lose them top-of-funnel volume. Both are bad for B2B buyers.

If you can't get a number on the first call, you're going to spend three calls negotiating. Then you'll get the price, which will start with a digit you weren't expecting.

The question to ask: "What's the price for a 12-person business with 5,000 monthly queries?" If they can't give you a real number on the first call, walk away or pad your timeline.
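
Once you have a number, it helps to convert it to a cost per query so quotes from different vendors are comparable. This is a rough sketch: the function name is ours, and the example pairs the $24K/year figure from the intro with the 5,000 monthly queries above purely for illustration.

```python
def cost_per_query(annual_price: float, monthly_queries: int) -> float:
    """Effective price per query for a flat annual contract."""
    if monthly_queries <= 0:
        raise ValueError("monthly_queries must be positive")
    return annual_price / (monthly_queries * 12)


if __name__ == "__main__":
    # Illustrative pairing of two figures from this post, not a real quote.
    print(f"${cost_per_query(24_000, 5_000):.2f} per query")
```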

Red flag 6: They emphasize the AI's reasoning, not the workflow it completes

What it looks like: "Watch this, the AI is thinking through the problem step by step."

Why it's a problem: in production, you don't care that the AI is reasoning. You care that the right thing happens at the end. Vendors who showcase the reasoning are usually selling a chatbot that requires you to drive every step. Real agent products showcase end-to-end task completion.

The question to ask: "What does the production output look like? What does the user actually receive?" If the demo focuses on the AI's chain of thought rather than the finished output, the product is closer to "smart Google" than to "agent that does the work."

Red flag 7: No sample of real customer output

What it looks like: glowing customer logos on the homepage. Quotes from "Sarah, VP at Acme Corp." No actual examples of what the AI produces for real customers.

Why it's a problem: vendors who can't show real output are usually hiding that the output is mediocre, that they don't have real customers yet, or that what they call "AI" is actually rule-based with one AI step bolted on.

The question to ask: "Can I see the actual output from three real customer accounts, redacted as needed?" Real vendors with real customers can do this. Theater vendors stall.

What good actually looks like

Three signs you're talking to a real product, not a pitch deck:

1. They specify what model they call. "We use Claude Sonnet via the Anthropic API." "We use Azure OpenAI." Real specificity.

2. They quote accuracy with caveats. "On our test set we hit 92%. On a held-out set we hit 87%. Your real-world accuracy will land in that range, plus or minus 5 points." That's an honest vendor.

3. They show real production output. Sample emails their agents have sent. Sample summaries their agents have generated. Real numbers from real customer accounts. Redacted, but specific.

If a vendor passes those three tests, the product is probably real. If they fail two of the three, you're watching theater.

What this means for you

Vendor evaluation in 2026 is mostly about cutting through marketing. Four questions from this post (what model, what accuracy, what price, what output) carry about 80% of the signal you need. Run them on every AI tool you're considering.

The next post in this series covers AI safety: the four rules that actually matter for a growing team, separate from the safety theater that fills most vendor security pages.

Get started

Want a real number for your specific situation?

A 30-minute audit call walks through your workflows and produces a fixed price for the two or three things worth automating first.
