Think of a black box AI as a machine you can feed information into, but cannot look inside. You give it a candidate's interview answers and it gives you a score. But how it got from one to the other? That part is completely hidden, often even from the company that built the tool.
Many AI hiring tools work this way, particularly those built on large language models (LLMs), the same technology behind popular AI chat tools such as ChatGPT and Claude. They are good at processing large volumes of responses quickly and generating summaries that sound confident and well-reasoned. The problem is that the scoring itself is not based on clear, fixed rules. It is based on statistical patterns, which means it can shift, sometimes without anyone realizing.
Research has shown that LLM-based tools produce meaningfully different scores for identical candidate inputs depending on how and when they are evaluated. Not because the candidate changed. Not because the criteria changed. Just because that is how the model works. For a recruiter, that is a serious problem: if the same answer can get a different result depending on when it was processed, the score is not really measuring the candidate at all.
Glass box AI, also called explainable AI, works the other way around. The scoring logic is visible, documented, and consistent. When a recruiter looks at a candidate's score, they can see exactly which criteria were assessed, how much weight each one carried, and why that candidate landed where they did.
This matters for a simple reason: if you cannot explain a hiring decision, you cannot stand behind it. A rejected candidate can ask why, and so can a manager. Under the EU AI Act, which classifies AI used in recruitment as high-risk, you may be legally required to answer. "The algorithm decided" will not be good enough. Glass box AI is built to meet that standard. Black box AI, by design, is not.
Here is something worth knowing: almost every AI company in the HR space now claims their product is "explainable" or "transparent." It has become a marketing term more than a technical one.
The important question to ask is not whether an explanation exists, but whether it is accurate. Some AI tools generate a reason for a score after the score has already been assigned, in the same way a student might write a justification for an answer they guessed. The explanation sounds logical, but it does not actually reflect how the decision was made.
Genuinely explainable AI shows you the real logic: the same rules the system used to produce the score in the first place. This is only possible when the scoring model is consistent, meaning the same answers always produce the same score, without variation.
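One way to make "consistent" concrete is to score the same answer many times and check that the result never changes. The sketch below is a minimal illustration, not any vendor's actual method: `is_consistent`, `rule_based_score`, and `drifting_score` are hypothetical names, the keyword rule is invented for the example, and the drifting scorer is a simple stand-in for a tool whose output varies between calls rather than a real model.

```python
from itertools import cycle
from typing import Callable


def is_consistent(score_fn: Callable[[str], float], answer: str, runs: int = 20) -> bool:
    """Return True if score_fn gives the same score for the same answer on every run."""
    scores = {score_fn(answer) for _ in range(runs)}
    return len(scores) == 1


def rule_based_score(answer: str) -> float:
    # Hypothetical fixed rule: one point per keyword mentioned, capped at 3.
    keywords = ("deadline", "customer", "team")
    text = answer.lower()
    return float(min(3, sum(1 for k in keywords if k in text)))


# Stand-in for a scorer whose output shifts between calls (not a real model).
_drift = cycle([-1.0, 0.0, 1.0])


def drifting_score(answer: str) -> float:
    return rule_based_score(answer) + next(_drift)


answer = "I met the deadline by working closely with my team."
print(is_consistent(rule_based_score, answer))  # fixed rules: same score every time
print(is_consistent(drifting_score, answer))    # varying scorer: scores differ
```

A check like this could be run against any scoring tool that accepts repeated submissions of the same input. If identical answers produce different scores, any explanation attached to one of those scores cannot be the whole story.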
Hubert uses two separate layers in its interview process. The conversation itself – the questions, the follow-ups, the natural back-and-forth – is designed to feel warm and human, because candidates deserve that experience. But the scoring, where shortlisting recommendations are actually made, uses deterministic AI models that apply identical logic to every candidate, every time, with no variation. That is what makes genuine glass box AI possible.
Every candidate response is assessed against specific criteria, with explicit weights. Recruiters can see exactly how those weights played out for any individual candidate. There is no black box.
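As a rough illustration of what criteria with explicit weights can look like, here is a minimal sketch of a weighted score that returns its own breakdown. It is not Hubert's implementation: the criteria names, the weights, the 0 to 10 rating scale, and the `score_candidate` function are all assumptions made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    name: str
    weight_pct: int  # share of the total score, in percent


# Hypothetical criteria and weights, for illustration only. They sum to 100.
CRITERIA = (
    Criterion("reliability", 50),
    Criterion("communication", 30),
    Criterion("experience", 20),
)


def score_candidate(ratings: dict[str, int]) -> tuple[float, list[tuple[str, int, int, int]]]:
    """Score a candidate from per-criterion ratings (assumed 0 to 10).

    Returns the total and a breakdown of
    (criterion, rating, weight in percent, weighted points).
    """
    breakdown = []
    for criterion in CRITERIA:
        rating = ratings[criterion.name]
        points = rating * criterion.weight_pct
        breakdown.append((criterion.name, rating, criterion.weight_pct, points))
    total = sum(row[3] for row in breakdown) / 100
    return total, breakdown


total, breakdown = score_candidate(
    {"reliability": 8, "communication": 6, "experience": 4}
)
for name, rating, weight_pct, points in breakdown:
    print(f"{name}: rating {rating} x weight {weight_pct}% = {points / 100}")
print(f"total: {total}")  # 6.6
```

The point of the design is that the breakdown is produced by the same loop that produces the total, so the explanation cannot drift away from the score it explains.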
In practice, this consistency produces better results. Hemfrid, a home services company that processes over 15,000 applications per year, achieved 90% accuracy in predicting successful hires through Hubert's structured screening. ManpowerGroup reduced recruiter screening time by 67% while keeping candidate experience scores at 9/10. Consistent, auditable criteria produce sharper shortlists. That is not a coincidence.
At some point, your hiring process will be questioned. A candidate will push back on a decision, a legal team will ask for documentation, or an internal stakeholder will want to understand how AI was used.
When that moment comes, the answer needs to be specific: here is what we assessed, here is how it was weighted, and here is why this candidate scored the way they did. That answer is only available if you are using glass box AI. With a black box, the honest answer is: we do not fully know.
For hiring teams that take fairness, accuracy, and accountability seriously, that is not good enough.
What is the difference between glass box and black box AI? Black box AI produces a result without showing how it got there. Glass box AI makes its decision logic visible, so recruiters can see exactly why a candidate received a particular score.
Why does black box AI create risk in hiring? If an AI tool cannot explain its decisions, you cannot defend them to a rejected candidate, to a manager, or under employment law. The EU AI Act classifies AI used in recruitment as high-risk, requiring transparency and human oversight. Black box tools are structurally unable to meet that standard.
What does it mean for AI scoring to be consistent? Consistent scoring means the same candidate answers always produce the same score, with no variation based on when or how the system was queried. Without consistency, a score does not reliably measure the candidate. With it, shortlisting becomes genuinely fair and auditable.
How does Hubert use glass box AI? Hubert's scoring uses fixed, documented rules. Each competency is assessed against specific behavioral criteria with defined weights, trained on over 100,000 expert human evaluations. The explanation a recruiter sees reflects the actual scoring logic, not a summary generated after the fact.