Your Custom Question Bank Is Compromised

You spent months building a custom question bank. Domain-specific problems. Carefully calibrated difficulty. Questions that actually predicted on-the-job performance.

And then everyone started cheating. Not some candidates. Everyone.

A candidate can paste your take-home into Claude and get a plausible answer back in minutes. Or they can use something like Cluely, an overlay tool that went viral promising to beat any live coding screen invisibly. The founder got expelled from Columbia over it. He still raised $5.3M. Someone who would have failed your bar six months ago can now imitate the output of someone who would have passed. HackerRank's data says 83% of candidates would cheat if they thought they wouldn't get caught (HackerRank, 2025).

But none of that is actually the problem. The problem is that your format can't tell the difference between someone who used AI well and someone who used it badly.

The custom question paradox

Custom questions existed for a good reason. Generic assessments never captured what mattered for your specific team. Good teams wrote their own questions, tuned them, threw out the false positives, and kept the ones that correlated with real performance.

But AI changed the attack surface in an almost ironic way. The better your custom question is, the easier it is for an LLM to solve. Good interview problems are well-scoped: clear inputs, clear constraints, a plausible path to completion. Current coding models handle exactly that kind of problem well. So the same qualities that made your question a good predictor also made it easy to outsource. The candidate aces the interview with AI, then flops in prod. The output was right. The process that produced it was not.

The take-home is dead, but the alternative is worse

A lot of teams responded the obvious way: kill the take-home. If you can't trust the output anymore, stop pretending you can.

But that move created its own problem. All the evaluation load shifted back to phone screens, panel loops, and onsites. Teams run 42% more interviews per hire than they did in 2021 (EGI, 2025-2026). More interviews, more senior time burned, still too slow for the candidates you actually want.

There is a reason Frank Dilo's thread asking "What does a modern interview process even look like now? AI can solve take-homes" got 93K views. Everyone is asking the same question. Google went the other direction entirely: back to in-person interviews to make cheating harder. I think that fixes the wrong problem. But it shows how deep the anxiety runs.

Take-homes lost integrity. Human-only loops don't scale. You have two options. Try to block AI and play whack-a-mole. Or allow AI and score whether the candidate uses it with actual judgment. The second path tests the actual job.

The harness engineer answer: measure process, not output

Your custom question only measured output. That was the real failure point.

A harness engineer isn't defined by whether they touched Claude or Cursor. They are defined by how they work through a problem with those tools in the loop. They decide what to delegate and what to keep manual. They inspect generated code before trusting it. They reject bad output. They notice when the model is confidently wrong.

If a candidate uses AI and gets to a good answer, great. Show me how they got there. If they blindly accepted a flawed implementation, that is a red flag regardless of the final output quality. If they spotted a subtle bug, defended a tradeoff, and explained why they overrode the model, that is the signal you actually wanted all along.

Don't catch AI use. Score it.

What we built at Fairground

Fairground is an interview platform that captures how engineers work with AI, not just what they submit. Keep your custom questions. Change what you measure about the answers.

The coding screener gives candidates a full IDE and AI tools. You upload your custom question. They work through it in an environment that looks like actual engineering work. We capture every prompt, every iteration, every validation decision. That is what lets you score AI judgment instead of grading final output. It runs 24/7, handles concurrent candidates, and doesn't require a senior engineer on a call.
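
If it helps to picture what "process signal" means in practice, here is a rough sketch of the kind of event stream an interview harness could record, and the kind of metric you could derive from it. The field names and the metric are illustrative assumptions for this post, not Fairground's actual schema.

```ts
// Illustrative only: a hypothetical event log for one candidate session.
// None of these names are Fairground's real data model.
type InterviewEvent =
  | { kind: "prompt"; at: string; text: string }                 // candidate asks the AI
  | { kind: "completion"; at: string; accepted: boolean }        // AI output, kept or rejected
  | { kind: "edit"; at: string; file: string; summary: string }  // manual change after review
  | { kind: "test_run"; at: string; passed: boolean };           // validation step

interface SessionLog {
  candidateId: string;
  questionId: string; // your own custom question, unchanged
  events: InterviewEvent[];
}

// A crude process metric: how often did the candidate test, edit, or reject
// AI output before moving on, rather than accepting it blindly?
function reviewRate(log: SessionLog): number {
  let completions = 0;
  let reviews = 0;
  for (const e of log.events) {
    if (e.kind === "completion") {
      completions++;
      if (!e.accepted) reviews++; // rejecting bad output is itself a signal
    } else if (e.kind === "test_run" || e.kind === "edit") {
      reviews++; // validation or manual follow-up after generation
    }
  }
  if (completions === 0) return 1; // nothing was delegated, nothing to review
  return Math.min(1, reviews / completions);
}
```

Even a toy metric like this makes the point: the final diff tells you almost nothing, while the sequence of prompts, rejections, and test runs tells you how the candidate actually works.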

Some signal is conversational though, not coding. The tech screener handles what coding alone can't capture: architecture tradeoffs, system design, failure modes. It runs a structured technical conversation around your rubric and your role, collecting evidence before you spend scarce interviewer time.

At some point you still want a human in the room. Canvas is for the live rounds. Your interviewers control the conversation in a collaborative environment with code editor, terminal, whiteboard, drawing, screenshare, and video in one place. AI tools stay available so you evaluate in the same reality the candidate would work in after joining.

Structured scorecards tie it together: evidence on how candidates actually used AI at each stage, with confidence scoring attached. Not a magic number, but an answer to the question the panel actually cares about: how much confidence should they have in what they observed?
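
As a rough illustration of that idea (again, an assumption for this post, not the real scorecard format), one entry might carry a rubric dimension, the evidence behind it, and an explicit confidence level rather than a single composite score:

```ts
// Hypothetical scorecard entry; the dimensions and scales are made up here.
interface ScorecardEntry {
  dimension: "ai_judgment" | "debugging" | "design_tradeoffs";
  evidence: string[];                    // e.g. "rejected a flawed generated diff after a failing test"
  rating: 1 | 2 | 3 | 4;
  confidence: "low" | "medium" | "high"; // how well this stage actually exercised the dimension
}
```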

Your question bank isn't broken. You just built half the system.

The questions are still good. The measurement around them has not kept up with how engineers actually work now. Keep the questions. Update what you score. Moore at a16z says the "ChatGPT effect" that hit education is now hitting hiring. Mass AI-generated applications, mass AI-assisted interviews, and no way to tell who actually knows what they are doing. Unless you score the process.

Start free. 100 credits. No credit card. No sales call. Upload your custom questions and see what the process signal looks like.

Get started with Fairground in just a few minutes.

Plug and Play. Works well with your existing ATS.

100 Free Credits