If you're a technical lead or operations director looking for a machine learning consultancy in the UK, you're spoiled for choice and starved for signal. Every agency has a GPT integration on their homepage. Half of them have rebranded from "digital transformation" to "AI consultancy" in the last 18 months. The pitch decks all look the same.
The questions below aren't gotchas. They're the kind of thing any competent engineering team should be able to answer directly. If a consultancy deflects, gets vague, or pivots immediately to a demo, that's information.
The checklist
01 / How do you evaluate your AI outputs?
This is the first and most important question. Any serious AI consultancy - whether in Manchester or elsewhere in the UK - should have an answer that includes automated evals running against a representative dataset. "We test it before we hand it over" is not an eval framework. Push for specifics: what metrics, what dataset, what pass rate before a release ships? If they can't describe their eval methodology, they're not testing rigorously - they're guessing and hoping.
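To make "automated evals against a representative dataset" concrete, here is a minimal sketch of what that looks like. Everything here is illustrative: `model_answer` stands in for whatever system is under test, and the dataset and 90% threshold are placeholders for whatever you agree with the consultancy.

```python
# Minimal eval harness sketch. Dataset, metric (exact match), and the
# 0.9 pass-rate threshold are all illustrative assumptions.

def model_answer(question: str) -> str:
    # Placeholder for the real model/system call.
    return {"What is 2+2?": "4", "Capital of France?": "Paris"}.get(question, "")

# Representative dataset: (input, expected output) pairs.
EVAL_SET = [
    ("What is 2+2?", "4"),
    ("Capital of France?", "Paris"),
]

def run_evals(threshold: float = 0.9) -> float:
    passed = sum(model_answer(q) == expected for q, expected in EVAL_SET)
    pass_rate = passed / len(EVAL_SET)
    # A release ships only if the pass rate clears the agreed threshold.
    assert pass_rate >= threshold, f"pass rate {pass_rate:.0%} below {threshold:.0%}"
    return pass_rate
```

A consultancy with a real eval framework can show you the equivalent of `EVAL_SET` and the threshold for your project; "we test it before we hand it over" cannot be expressed this way, which is the point.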
02 / What does your system architecture look like for a project like ours?
You want to hear about components: where does data enter, how is it preprocessed, what model or models are involved, how are outputs validated before they act on anything, where are failure paths handled. If the answer is "it's an n8n flow that calls the OpenAI API," that's a workflow automation, not an AI system. Both have their place, but you should know which one you're buying.
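The component boundaries described above can be sketched in a few lines. The function bodies are stand-ins, but the shape is the thing to listen for: validation sits between the model and anything the output acts on, and failure has an explicit path.

```python
# Sketch of the components named above: ingest -> preprocess -> model
# -> validate -> act, with an explicit failure path. All bodies are
# illustrative stand-ins, not a real implementation.

def preprocess(raw: str) -> str:
    return raw.strip().lower()

def call_model(text: str) -> str:
    # Stand-in for the model call (one model here; could be several).
    return f"summary of: {text}"

def validate(output: str) -> bool:
    # Outputs are checked before they act on anything downstream.
    return output.startswith("summary of:") and len(output) < 500

def handle_failure(raw: str) -> str:
    # Failure path: escalate rather than act on an unvalidated output.
    return "ESCALATED_TO_HUMAN"

def pipeline(raw: str) -> str:
    output = call_model(preprocess(raw))
    return output if validate(output) else handle_failure(raw)
```

An "n8n flow that calls the OpenAI API" typically has the first two steps and neither of the last two.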
03 / Who owns the stack after handover?
This one separates consultancies that build with you from ones that build for you. A good machine learning consultancy designs for handover: your team can read, modify, and maintain the code without ongoing dependency on the agency. Red flags include proprietary tooling with no migration path, an insistence that you'll "need support retainers" to keep the system running, and code delivered without documentation or tests. The goal of a good engagement is that you need us less at the end than at the start.
04 / Where does our data go, and who can see it?
More on this in our compliance post, but even in a preliminary conversation: are they sending your data to US-hosted APIs? Is your data opted in to model training anywhere in the chain (most major providers have this off by default on paid tiers, but it should be confirmed in writing, not assumed)? For any regulated industry - legal, financial services, healthcare - the answer to "where does our data go" needs to be specific and documented, not approximate.
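One way to keep the answer "specific and documented" is to write the data-handling policy down as checkable configuration. The field names below ("region", "training_opt_out", and so on) are hypothetical, not any real provider's API; the point is that each question above becomes a field with a definite value.

```python
# Hypothetical data-handling policy as checkable config. Field names
# and values are illustrative assumptions, not a real provider's API.

DATA_POLICY = {
    "region": "eu-west",              # where requests are processed
    "training_opt_out": True,         # confirmed in writing, not assumed
    "retention_days": 30,             # provider-side log retention
    "subprocessors_documented": True, # full chain of who sees the data
}

def check_data_policy(policy: dict) -> list[str]:
    """Return a list of issues; empty list means the policy passes."""
    issues = []
    if not policy["region"].startswith("eu"):
        issues.append("data leaves EU/UK jurisdiction")
    if not policy["training_opt_out"]:
        issues.append("data may be used for model training")
    if not policy["subprocessors_documented"]:
        issues.append("subprocessor chain undocumented")
    return issues
```

A consultancy that can fill in this table for your project has an answer; one that cannot is approximating.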
05 / Can you show me a failure in one of your previous systems?
This is the most revealing question on the list. Any team that's shipped AI into production has seen it fail. The good ones can tell you exactly how, what they learned, and how they changed the system or the eval set as a result. If they can only show you demos that work, they've either not shipped much or they're selectively presenting. Both are problems.
06 / What does your observability setup look like in production?
You need to be able to see what your AI system is doing. That means logging at the model level, not just the infrastructure level. Ask specifically about: prompt tracing, output sampling, latency by component, and how low-confidence outputs are surfaced. If their answer is "we use standard application monitoring," ask what that captures for the AI-specific parts. Usually: not much.
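"Logging at the model level" can be as simple as one structured record per model call. This sketch assumes a `model_fn` that returns an output plus a confidence score; the record structure and the 0.7 confidence floor are illustrative.

```python
# Sketch of model-level logging: one JSON record per call capturing
# prompt, output, latency, and a low-confidence flag. The record
# structure and confidence floor are illustrative assumptions.
import json
import logging
import time

logger = logging.getLogger("model_trace")

def traced_call(prompt: str, model_fn, confidence_floor: float = 0.7) -> str:
    start = time.perf_counter()
    output, confidence = model_fn(prompt)  # model_fn is a stand-in
    record = {
        "prompt": prompt,   # prompt tracing
        "output": output,   # output sampling
        "latency_ms": round((time.perf_counter() - start) * 1000, 1),
        "low_confidence": confidence < confidence_floor,  # surfaced, not buried
    }
    logger.info(json.dumps(record))
    return output
```

Standard application monitoring would give you the latency line and nothing else; the other three fields are what "AI-specific observability" means in practice.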
07 / How do you handle model version changes?
Foundation models are updated regularly, and those updates can change behaviour in ways that break your system without any code change on your side. A competent AI consultancy has prompt versioning, model pinning where necessary, and an eval suite that runs against the live model so you catch regressions before your users do. If they're not pinning model versions or running post-deployment evals, you're one OpenAI rollout away from a silent failure.
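Pinning plus post-deployment evals looks roughly like this. The model identifier and threshold are made up; the structural point is that the version string is explicit and date-stamped, and the eval suite runs against the live model rather than only at build time.

```python
# Sketch of model pinning plus a post-deploy regression check. The
# model id, dataset, and 0.95 threshold are illustrative assumptions.

PINNED_MODEL = "provider-model-2024-06-01"  # explicit date-stamped version,
                                            # never an unversioned "latest" alias

EVAL_SET = [("What is 2+2?", "4")]

def post_deploy_regression(call_model, threshold: float = 0.95) -> bool:
    # Run against the live, pinned model so provider-side behaviour
    # changes are caught before your users hit them.
    passed = sum(call_model(PINNED_MODEL, q) == expected
                 for q, expected in EVAL_SET)
    return passed / len(EVAL_SET) >= threshold
```

If the consultancy's deploy pipeline has no step resembling `post_deploy_regression`, a provider-side model update reaches your users unchecked.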
The Manchester context
The North West has a strong bench of agencies, and some of them are genuinely good. The problem isn't a shortage of capability - it's a lack of shared vocabulary for what "done" means in an AI project. The questions above give you a framework for that conversation. An agency that finds them reasonable is probably worth talking to further. One that treats them as adversarial is telling you something about how the engagement will go.
// quick reference: green vs red signals
green                         | red
evals with metrics            | "we'll test it before launch"
documented handover           | proprietary platform lock-in
model version pinning         | "latest model" (no version)
GDPR-specific answers         | vague on data residency
can describe a failure        | only shows working demos
One more thing worth saying: price is not the signal you think it is. The cheapest AI consultancy in Manchester is often the one building an n8n flow and calling it an AI system. The most expensive one might be a large agency with impressive client logos and a team of account managers who'll pass your project to a junior. Neither price point tells you whether they can build something that works reliably in production.
Ask the questions. The answers tell you what you need to know.