AI Broke Technical Interviews. Here's How We're Fixing Them.

You are reviewing a candidate's submission.
The README is polished. The code is modular. The tests pass. Then you look closer: the average calculation divides by the wrong variable. The Dockerfile mounts node_modules at runtime. The API returns full database records to the frontend. The README mentions a Dockerfile.test that doesn't exist.
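To make this concrete, here is a hedged sketch of the kind of flaw that survives an unreviewed submission. The names and structure are hypothetical, but the two bugs mirror the ones above: an average divided by the wrong variable, and an endpoint that returns full database records to the frontend.

```typescript
// Hypothetical excerpt from an AI-scaffolded submission (illustrative names).
// The shape is clean and the surrounding tests pass; the flaws are in the details.

interface User {
  id: string;
  email: string;
  passwordHash: string; // internal field that should never reach the client
  scores: number[];
}

// Flaw 1: divides by the number of users instead of the number of scores,
// so the "average" is silently wrong whenever there is more than one user.
function averageScore(user: User, allUsers: User[]): number {
  const total = user.scores.reduce((sum, s) => sum + s, 0);
  return total / allUsers.length; // should be user.scores.length
}

// Flaw 2: returns the full record to the frontend instead of a trimmed
// response object, leaking passwordHash along with everything else.
function toApiResponse(user: User) {
  return user; // should map to { id, email } only
}
```

Neither bug trips a type checker or a superficial test suite; both are obvious the moment a reviewer actually reads the code.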
This is the new normal. AI can scaffold a passing take-home in 15 minutes. Your interview process was not designed for this.
The problem isn't that candidates are using AI
For 20 years, technical interviews had a predictable shape. LeetCode screens tested pattern recognition. Take-homes revealed how someone approached a problem from scratch — the structure they chose, the edge cases they caught, the quality of the README they wrote unprompted.
That ecosystem still exists. But it's no longer measuring what it was designed to measure.
The problem isn't that candidates are using AI. The problem is that they're not reviewing what the AI produces. And the deeper problem is that your evaluation framework can't tell the difference between:
- A candidate who used AI as a tool and reviewed everything carefully
- A candidate who pasted prompts and submitted whatever came back
- A candidate who didn't use AI at all
When all three submit similar-looking code, what exactly are you grading?
The doubts you're now carrying
If you're running technical interviews, you're probably wrestling with some version of these:
| Doubt | Underlying Problem |
|---|---|
| Is the assessment still valid? | If two candidates submit similar code — one who spent three hours thinking and one who spent fifteen minutes prompting — and the code quality is comparable, what exactly is being graded? |
| How much depth is there? | A candidate who can explain every design decision, defend the trade-offs, and spot the flaw the interviewer planted is demonstrating something real. A candidate who cannot explain why the code is structured the way it is has demonstrated nothing. |
| Did they even check this? | A bug that survives because the code was never run reveals blind trust in the AI; a bug that requires domain expertise to catch reveals the limits of the candidate's knowledge. |
| How much of this translates to actual work? | A candidate who can steer an AI to produce a passing submission may or may not be able to do the same thing six months into a role, on a codebase they did not scaffold themselves, with requirements that change mid-sprint. |
| How do I grade this fairly? | Candidates who used AI and reviewed carefully end up indistinguishable from candidates who coded carefully without it. Candidates who used AI carelessly are easy to identify. The middle is murky. |
These questions have no clean answers under the old model.
The open-book exam insight
Here's the mental model that makes this solvable:
Open-book exams divide a class into two groups. The first hears "open book" and thinks: I should know where everything is, practice problems, understand the material well enough to apply it under pressure. The second hears "open book" and thinks: I don't need to prepare at all.
The exam result is usually unambiguous. Open-book tests are harder than closed-book ones — because the questions require judgment, not recall.
AI-assisted interviewing has the same structure.
In an AI-assisted environment, syntax and boilerplate are commodities. If a candidate can pass your take-home by pasting the prompt into a model, the test was measuring the tool, not the candidate.
The goal is not to catch people using AI. It's to design evaluation conditions where AI fluency is necessary but not sufficient — where the candidate who has both judgment and tools is distinguishable from the candidate who has only tools.
AI readiness is tool leverage plus judgment, not tool dependence.
What to measure instead
The shift is from "what code did they produce" to "how did they produce it, and do they own it."
Here's what that looks like at different levels:
| Level | Key Question | Evidence of Competence |
|---|---|---|
| Junior | Do they understand and validate what the AI produced? | They read, run, and test the output, and can explain what it does and why it's structured that way. |
| Mid-level | Did they design the architecture before generating the code? | They plan the structure first, keep decisions consistent across the codebase, and know where to override the AI. |
| Senior | Are they making strategic trade-offs or delegating them to the model? | They can articulate the trade-offs made and alternatives considered; the architecture reflects their thinking, not the AI's defaults. |
The five competencies that matter
When AI is part of the interview, you need to evaluate five things alongside traditional technical skills:
| Competency | Junior | Mid-level | Senior |
|---|---|---|---|
| Task Framing | Articulate what they need before prompting, not just "solve this" | Plan the structure before generating code | Decompose the problem into a coherent sequence of tasks |
| Prompting and Steerability | Iterate when the first output is wrong | Steer with intent rather than brute-force retrying | Know when to reset versus when to refine |
| Output Evaluation | Read and test what the AI produced | Catch logical errors and edge cases | Evaluate security, performance, and maintainability |
| Design Continuity | Files import correctly, functions connect, no orphaned code | Consistency in decisions across the codebase | Architecture reflects their thinking, not the AI's defaults |
| Decision Ownership | Explain what the code does and why it's structured that way | Identify where to override or redirect the AI | Articulate trade-offs made and alternatives considered |
The choice you’re making
AI has changed what technical interviews measure. It hasn't changed the underlying need: to identify who can reason, design, debug, and own decisions under constraint.
You have three options:
1. Ban AI and pretend it doesn't exist. This selects for candidates who can pass your specific format, not candidates who can work in the real world.
2. Allow AI and keep evaluating the same way. This selects for whoever has the best prompt-writing skills, not the best engineering judgment.
3. Redesign your evaluation to measure what matters when AI is available. This selects for candidates who have both the foundation and the fluency.
The third option is harder. It requires rethinking your rubrics, retraining your interviewers, and building evaluation infrastructure that can measure these five competencies systematically.
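As one illustration of what that infrastructure could look like, here is a minimal sketch of a rubric record covering the five competencies, assuming a simple 1-4 scale and hypothetical field names (this is not Fairground's actual schema).

```typescript
// Hypothetical rubric shape for scoring the five competencies (illustrative only).

type Competency =
  | "taskFraming"
  | "promptingAndSteerability"
  | "outputEvaluation"
  | "designContinuity"
  | "decisionOwnership";

interface CompetencyScore {
  score: 1 | 2 | 3 | 4; // e.g. 1 = absent, 4 = consistently demonstrated
  evidence: string;     // interviewer note tied to a concrete moment in the session
}

interface InterviewRubric {
  candidateId: string;
  targetLevel: "junior" | "mid" | "senior";
  competencies: Record<Competency, CompetencyScore>;
  traditionalSignals: CompetencyScore; // coding, debugging, communication, etc.
}

// Scoring the competencies separately from traditional signals keeps
// AI fluency from blurring into engineering judgment.
const example: InterviewRubric = {
  candidateId: "cand-042",
  targetLevel: "mid",
  competencies: {
    taskFraming: { score: 3, evidence: "Wrote out constraints before the first prompt." },
    promptingAndSteerability: { score: 4, evidence: "Reset the session after a bad scaffold instead of patching it." },
    outputEvaluation: { score: 2, evidence: "Missed the wrong divisor in the average calculation." },
    designContinuity: { score: 3, evidence: "Kept one error-handling pattern across modules." },
    decisionOwnership: { score: 2, evidence: "Could not justify the chosen data model over alternatives." },
  },
  traditionalSignals: { score: 3, evidence: "Solid debugging under time pressure." },
};
```

The specific fields matter less than the shape: each competency gets its own score and its own piece of evidence, so "used AI well" and "owned the decisions" are never collapsed into a single number.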
That's why we built Fairground
Fairground is designed around this framework. Every interview evaluates the five competencies alongside traditional technical skills. Candidates code with AI enabled. Interviewers get structured rubrics that separate AI fluency from AI dependency. The scoring layer tells you who has both the foundation and the tools.
If you want to hire engineers who can work in the world as it is becoming, you need assessments where AI helps, but judgment decides.

Get started with Fairground in just a few minutes.
Plug and Play. Works well with your existing ATS.
100 Free Credits


