Puzzle screens are easy to administer and hard to defend
The classic technical screen asks a candidate to solve a compressed algorithm problem while an interviewer watches. It is fast, standardized, and familiar. It also removes almost everything that makes modern engineering valuable: product context, existing code, messy constraints, tests, docs, AI assistance, and the ability to make a small decision and explain it.
A founder does not need to know whether a backend candidate can perform in a whiteboard ritual. They need to know whether that candidate can open a repo, understand the current shape of the system, make a judgment call, and leave the codebase healthier than they found it.
Use a real slice, not a fake marathon
The best work sample is small enough to finish in 60 to 90 minutes and real enough to expose judgment. Give the candidate a repo slice, a ticket brief, the failing behavior, and a short note about what matters most. Keep the task narrow: add one validation path, fix one broken workflow, write one test around a known bug, or refactor one rough edge without changing behavior.
The point is not to simulate a full sprint. The point is to see how quickly someone builds a useful mental model, where they ask for context, how they choose between a local fix and a broader cleanup, and whether their final explanation matches the code they actually wrote.
AI should be visible, not banned
If your engineers use Copilot, Cursor, Claude, ChatGPT, or internal agents on the job, banning those tools in the interview produces a false signal. It rewards candidates who are good at pretending they work alone and punishes candidates who have already learned the operating mode your team needs.
The better rule is simple: AI tools are allowed, but the AI transcript and the candidate's decisions are part of the assessment. A strong candidate uses assistance to move faster while staying accountable for the diff. A weak candidate accepts suggestions they cannot explain, misses edge cases, or lets the assistant drag them away from the actual requirement.
Score the judgment, not the performance
A practical rubric should separate completion from judgment. Did the candidate identify the right boundary? Did they preserve existing behavior? Did they write or adjust tests where risk justified it? Did they communicate assumptions? Did they know when to stop?
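One way to keep completion and judgment apart is to record them in separate fields of a scorecard, so a finished task cannot hide weak judgment and an unfinished one cannot erase strong judgment. The sketch below is a minimal illustration, not a prescribed format: the criterion names paraphrase the questions above, and the 0 to 3 scale, the `Scorecard` class, and its method names are all assumptions made for the example.

```python
from dataclasses import dataclass, field

# Judgment criteria paraphrased from the rubric questions above.
JUDGMENT_CRITERIA = (
    "identified_right_boundary",
    "preserved_existing_behavior",
    "tested_where_risk_justified",
    "communicated_assumptions",
    "knew_when_to_stop",
)

MAX_SCORE = 3  # hypothetical 0-3 scale; pick whatever your team can calibrate


@dataclass
class Scorecard:
    """Hypothetical scorecard: completion is tracked apart from judgment."""

    candidate: str
    completed: bool  # did the change work end to end?
    judgment: dict[str, int] = field(default_factory=dict)

    def rate(self, criterion: str, score: int) -> None:
        """Record one judgment score, rejecting unknown criteria or bad values."""
        if criterion not in JUDGMENT_CRITERIA:
            raise ValueError(f"unknown criterion: {criterion}")
        if not 0 <= score <= MAX_SCORE:
            raise ValueError(f"score must be between 0 and {MAX_SCORE}")
        self.judgment[criterion] = score

    def missing(self) -> list[str]:
        """Criteria the interviewer has not scored yet."""
        return [c for c in JUDGMENT_CRITERIA if c not in self.judgment]

    def judgment_average(self) -> float:
        """Average judgment score; completion is deliberately not included."""
        if self.missing():
            raise ValueError(f"unscored criteria: {self.missing()}")
        return sum(self.judgment.values()) / len(JUDGMENT_CRITERIA)


# Usage: a candidate who ran out of time can still show strong judgment.
card = Scorecard(candidate="example-candidate", completed=False)
for name in JUDGMENT_CRITERIA:
    card.rate(name, 2)
card.rate("communicated_assumptions", 3)  # overwrite one score after debrief
print(card.completed, card.judgment_average())
```

Reporting the two values side by side, rather than folding them into one number, makes it easier for interviewers to calibrate against each other on the judgment questions specifically.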
That rubric also gives candidates a fairer experience. They are no longer guessing what the interviewer secretly values. They can see the job-like target: ship a useful change, explain the tradeoffs, and show how they think with the tools they would use after joining.
Use this as your next interview redesign prompt
If your current loop still depends on puzzle screens, Diego can help convert one role into a real-work AI interview with a repo slice, issue brief, and scorecard your team can calibrate in a week.
