Black-box AI refers to any system that produces an output (a score, a ranking, a pass or fail) without showing how it got there. The decision comes out, but the reasoning stays hidden: you see the result, not the logic behind it.
In hiring, this shows up as a candidate score with no explanation attached, or a shortlist you can't fully account for. The tool has made a judgment, but the recruiter can't see what it was based on.
The short answer: if you can't explain a hiring decision, you can't legally defend it.
When a recruiter screens a candidate manually, they can walk through their reasoning. With a black-box AI tool, that reasoning is unavailable to the recruiter, to the hiring manager, and to the candidate who wants to understand why they didn't progress.
That gap creates three concrete risks:
Legal exposure. Under GDPR, candidates have the right to request information about automated decisions that affect them. The EU AI Act goes further: AI tools used to screen or rank candidates are classified as high-risk systems, and that classification carries specific obligations. They include human oversight, bias testing, and the ability to explain how the system works. A tool that can't explain its outputs doesn't meet that bar.
Bias you can't detect. Black-box models often learn from historical hiring data, and historical hiring data reflects historical biases. If a model has quietly learned to prefer candidates from certain universities, locations, or backgrounds, you won't know until the pattern shows up in your outcomes. By then, you may have screened out hundreds of qualified candidates, with no audit trail to trace the pattern back to its cause.
Recruiter distrust. When a tool produces results that recruiters can't interpret, one of two things happens: they override it constantly, which defeats the purpose, or they follow it without question, which removes human judgment from the process entirely. Neither is a good outcome.
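The second risk above, bias you can't detect, is easier to catch early if you monitor outcomes yourself rather than waiting for a pattern to surface. Here's a minimal sketch of one way to do that: compare pass rates across groups and flag any group whose rate falls well below the highest one. The group labels, the sample data, and the 0.8 threshold (a common rule of thumb, often called the four-fifths rule) are assumptions for illustration, not a legal test and not something any specific tool provides.

```python
from collections import Counter


def selection_rates(outcomes):
    """outcomes: iterable of (group, passed) pairs. Returns {group: pass rate}."""
    totals = Counter()
    passed = Counter()
    for group, ok in outcomes:
        totals[group] += 1
        if ok:
            passed[group] += 1
    return {g: passed[g] / totals[g] for g in totals}


def impact_ratios(rates):
    """Divide each group's pass rate by the highest group's pass rate."""
    best = max(rates.values())
    if best == 0:
        # Nobody passed, so there is no disparity to measure.
        return {g: 1.0 for g in rates}
    return {g: r / best for g, r in rates.items()}


def flag_groups(outcomes, threshold=0.8):
    """Return the groups whose impact ratio is below the (assumed) threshold."""
    ratios = impact_ratios(selection_rates(outcomes))
    return sorted(g for g, r in ratios.items() if r < threshold)


# Hypothetical screening outcomes: (group label, passed screening?)
sample = (
    [("A", True)] * 40 + [("A", False)] * 60
    + [("B", True)] * 20 + [("B", False)] * 80
)
flagged = flag_groups(sample)
```

A check like this only tells you that a disparity exists. Finding out why still depends on being able to trace each score back to what drove it, which is exactly what a black-box tool can't give you.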
The clearest sign is a score without a reason. If your AI screening tool tells you a candidate is a 72% match (or flags them as unsuitable) and can't tell you which specific responses or criteria drove that assessment, that's a black-box system.
Other indicators: scoring that changes for the same candidate across different sessions, outputs described as "proprietary" with no further explanation, and no ability to tie a score back to a specific candidate response.
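The first of those indicators, scores that change for the same candidate, is something you can test for directly. The sketch below assumes you can call the tool's scoring step as a function (here the hypothetical `score_fn`) and that it returns JSON-serializable output; both stand-in scorers are invented for the example.

```python
import itertools
import json


def is_repeatable(score_fn, answers, runs=5):
    """Score identical input several times and check the outputs all match."""
    results = {json.dumps(score_fn(answers), sort_keys=True) for _ in range(runs)}
    return len(results) == 1


# Hypothetical stand-in: a scorer that depends only on its input.
def stable_scorer(answers):
    answered = sum(1 for value in answers.values() if value)
    return {"score": 100 * answered // len(answers)}


# Hypothetical stand-in: a scorer whose output drifts between calls.
_calls = itertools.count()


def drifting_scorer(answers):
    return {"score": 70 + next(_calls) % 3}


answers = {"q1": "yes", "q2": "", "q3": "5+ years"}
stable_ok = is_repeatable(stable_scorer, answers)
drifting_ok = is_repeatable(drifting_scorer, answers)
```

Passing a handful of repeated runs doesn't prove a system is deterministic, but failing them is a clear signal that it isn't.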
There's a subtler version of this problem worth knowing about. Some AI tools, particularly those built on large language models (LLMs), will generate an explanation for a decision, but that explanation is produced after the fact. The model scores the candidate first, then constructs a plausible-sounding rationale. The explanation isn't the actual reasoning; it's a reconstruction of it, and that distinction matters. If the explanation isn't derived directly from the scoring logic, it can't be audited, challenged, or relied on in a compliance context. A convincing explanation is not the same thing as a traceable one.
The opposite of black-box AI is deterministic, explainable AI, also called glass-box AI. In a deterministic system, the same input always produces the same output. Every score ties directly to a specific candidate response, assessed against criteria the recruiter defined. There's nothing hidden, and there's nothing to drift.
This matters for compliance, but it also matters for the day-to-day work of recruiting. When you can see exactly why a candidate scored the way they did, you can review it, challenge it, and make a better decision. That's what human oversight actually looks like: not a checkbox, but a genuine ability to interrogate the output.
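To make the idea concrete, here's a minimal sketch of what deterministic, traceable scoring can look like. It isn't any vendor's actual implementation; the `Criterion` class, the question IDs, and the point values are all hypothetical. The point is that the score is computed from recruiter-defined criteria, and the explanation is a by-product of that same computation rather than something written afterwards.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    question_id: str
    accepted: frozenset  # answers the recruiter counts as meeting the criterion
    points: int


def score_candidate(answers, criteria):
    """Return (total, trace). Each trace entry ties points to one response."""
    trace = []
    for c in criteria:
        response = answers.get(c.question_id)  # None if the question was skipped
        awarded = c.points if response in c.accepted else 0
        trace.append({
            "question_id": c.question_id,
            "response": response,
            "awarded": awarded,
            "max": c.points,
        })
    total = sum(entry["awarded"] for entry in trace)
    return total, trace


# Hypothetical criteria defined by the recruiter.
criteria = [
    Criterion("shift_work", frozenset({"yes"}), 40),
    Criterion("license", frozenset({"B", "C"}), 30),
    Criterion("experience_years", frozenset({"3-5", "5+"}), 30),
]
answers = {"shift_work": "yes", "license": "none", "experience_years": "5+"}

total, trace = score_candidate(answers, criteria)
```

Because the function has no randomness and no hidden state, the same answers always give the same total, and the trace shows which response earned or lost each point. That trace is the thing a recruiter, or a regulator, can actually review.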
Hubert uses deterministic scoring throughout its screening process. Same candidate, same answers, same score, every time. Every result ties to a specific response, which means every decision is auditable by the recruiter, by the hiring manager, and if needed, by a regulator.
If you're evaluating tools or reviewing what you already use, these are the questions that matter:
Can you show me exactly what drove this candidate's score? If the answer is vague, that's your answer.
What model is this built on? If the answer is a general-purpose LLM, ask how they guarantee consistent, repeatable scoring and what happens when the model is updated. A regulator won't accept "the LLM decided" as an explanation for a rejected candidate.
Does the same input always produce the same output? Probabilistic models can score the same candidate differently across sessions, and that's not auditable or legally defensible.
What's your EU AI Act compliance status? Full high-risk obligations take effect soon, and your vendor should be able to show you their documentation now, not scramble for it later.
Black-box AI in hiring isn't just a technical inconvenience. It's a legal risk, a fairness risk, and a trust problem for everyone involved: candidates, recruiters, and the organizations that use these tools.
The good news is that explainability isn't a trade-off against speed or quality. A well-built screening system can process candidates at scale, consistently, and still tell you exactly why each candidate received the score they did. That's the standard worth holding vendors to.