What causes bias in AI interviews and how to spot it

2026-06-04

Fredrik Törn

For talent acquisition (TA) leaders managing high-volume hiring, choosing an AI recruitment tool is now a compliance decision as much as an efficiency one. Bias in AI interviews is not hypothetical; it is an active risk that can expose your organization to legal challenge, regulatory scrutiny, and real harm to candidates. Knowing where it enters the process and how to detect it is a baseline competency for any TA leader evaluating these tools.

Where does algorithmic bias enter AI interview screening?

Bias does not appear at a single point. It can enter through multiple layers of the AI pipeline, from the data used to train models to the design of the interview itself.

Can biased training data corrupt an AI assessment model?

Yes. Every AI assessment model learns from historical data. If that data reflects past hiring decisions made by humans who favored certain demographics, educational backgrounds, or communication styles, the model risks replicating those patterns. Candidates who don't match a historical profile that was itself the product of bias may be systematically disadvantaged before a single interview question is asked.

Signal to look for: Ask vendors how their training data was sourced and reviewed. Were answers assessed without knowledge of the candidate's demographics? Were calibration sessions conducted to remove unreliable evaluations? If a vendor cannot answer these questions in detail, bias mitigation was probably not a design priority.

Does inconsistency in AI scoring create unfair hiring practices?

Inconsistency is, by definition, unfair. Research published in 2025 indicates that LLMs used for candidate assessment produce meaningfully different rankings across demographic groups for otherwise identical candidates (Seshadri et al., 2025) and diverge significantly from human expert judgment depending on contextual conditions (Varshney & Ganuthula, 2025). If two candidates give the same response and receive different scores depending on when or how the model was queried, the process is arbitrary, and arbitrary hiring decisions are unfair hiring practices. Inconsistency is also a missed opportunity for the business: a strong candidate who is unfairly scored by the LLM may never reach the shortlist or later steps in the recruitment process.

Signal to look for: Ask vendors whether their scoring models are deterministic, meaning the same input always produces the same output. If the vendor's response involves language about "probabilistic outputs" or "model variation," that inconsistency risk has not been addressed.
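The determinism question can also be checked empirically during a trial. The following is a minimal Python sketch, assuming you can call a vendor's scoring function repeatedly with the same answer; `is_deterministic`, `keyword_score`, and `drifting_score` are hypothetical names invented for this illustration and do not describe any vendor's API.

```python
import itertools
from typing import Callable


def is_deterministic(score_fn: Callable[[str], float], answer: str, runs: int = 20) -> bool:
    """Return True if score_fn gives one identical score across repeated calls."""
    distinct_scores = {score_fn(answer) for _ in range(runs)}
    return len(distinct_scores) == 1


# Hypothetical stand-in scorers, for illustration only.
def keyword_score(answer: str) -> float:
    # Depends only on the input text, so repeated calls agree.
    keywords = ("customer", "deadline", "team")
    return float(sum(word in answer.lower() for word in keywords))


_noise = itertools.cycle([0.0, 0.5])


def drifting_score(answer: str) -> float:
    # Simulates run-to-run variation by adding a value that changes per call.
    return keyword_score(answer) + next(_noise)


answer = "I worked with the team to hit the customer deadline."
print(is_deterministic(keyword_score, answer))   # True
print(is_deterministic(drifting_score, answer))  # False
```

A repeated-call check like this can reveal variation, but passing it does not prove that a model is deterministic under all conditions, so it complements the vendor's answer rather than replacing it.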

What is the problem with AI explanations that don't reflect actual scoring logic?

The problem is that a post-hoc explanation cannot be used to defend a hiring decision in an audit. In LLM-based systems, the explanation shown to recruiters is often generated after the score has been assigned, so it is a plausible-sounding rationale rather than the actual scoring logic. This means bias embedded in the scoring process can be obscured by a surface-level narrative that appears reasonable.

Signal to look for: Ask vendors whether the explanation shown to recruiters is mechanically tied to the scoring process, or generated separately after the fact. Vendors should be able to describe this distinction clearly.
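To make the distinction concrete, here is a minimal Python sketch of an explanation that is mechanically tied to the score: both are derived from the same per-criterion records, so the explanation cannot drift from the scoring logic. The data structure, field names, and example ratings are assumptions made for illustration, not a description of how any particular product works.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class CriterionResult:
    criterion: str  # competency being rated
    rating: int     # position on a rating scale, e.g. 1 to 5
    anchor: str     # scale description matching that rating


def total_score(results: List[CriterionResult]) -> int:
    # The score is the sum of the per-criterion ratings.
    return sum(r.rating for r in results)


def explain(results: List[CriterionResult]) -> List[str]:
    # Built from the same records as total_score, with no separately generated text.
    return [f"{r.criterion}: {r.rating} ({r.anchor})" for r in results]


# Hypothetical ratings, for illustration only.
results = [
    CriterionResult("Problem solving", 4, "Identifies root cause and tests a fix"),
    CriterionResult("Collaboration", 3, "Shares information when asked"),
]
print(total_score(results))  # 7
for line in explain(results):
    print(line)
```

A post-hoc explanation, by contrast, would be produced by a separate step that sees only the final score and the answer, with no guarantee that it reflects the criteria that drove the score.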

Can variable interview design introduce bias even with a good scoring model?

Yes. If different candidates for the same role are asked materially different questions, the comparison is not valid regardless of how the scoring model performs. Structural unfairness in interview design will produce biased shortlists even from a well-calibrated algorithm. In addition, the more dynamic the interview template, the closer the interview comes to an unstructured one, and research shows that unstructured interviews predict future job performance less well than structured ones (e.g., Sackett et al., 2022; Schmidt & Hunter, 1998). A more dynamic and variable interview design therefore risks both bias and lower assessment quality.

Signal to look for: Verify that every candidate receives the same structured interview: same questions, same competency framework, same scoring criteria. Also confirm that interview content has been reviewed for questions that may systematically disadvantage candidates based on cultural background or communication style.
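If a vendor can export which questions each candidate was asked, the "same questions" check can be automated. The sketch below is a minimal Python example under that assumption; the function name, the log format, and the candidate and question identifiers are all hypothetical.

```python
from collections import Counter
from typing import Dict, List


def find_divergent_interviews(interviews: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Return candidates whose question list differs from the most common one.

    Comparison is order-sensitive: the same questions in a different order
    count as a different interview.
    """
    if not interviews:
        return {}
    counts = Counter(tuple(questions) for questions in interviews.values())
    reference, _ = counts.most_common(1)[0]
    return {
        candidate: questions
        for candidate, questions in interviews.items()
        if tuple(questions) != reference
    }


# Hypothetical interview log for one role, for illustration only.
log = {
    "cand-1": ["Q1", "Q2", "Q3"],
    "cand-2": ["Q1", "Q2", "Q3"],
    "cand-3": ["Q1", "Q2", "Q7"],
}
print(find_divergent_interviews(log))  # {'cand-3': ['Q1', 'Q2', 'Q7']}
```

This only confirms that the questions match. Whether the questions themselves disadvantage some candidates still requires a content review.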

How should TA leaders evaluate AI recruitment tools for bias risk?

The following questions should be part of any vendor evaluation.

On training methodology: Were models trained against expert human evaluations, or optimized against historical hiring outcomes? What behavioral frameworks (for example, Behaviorally-Anchored Rating Scales) structured the evaluation process?

On scoring consistency: Are assessment models deterministic? Are models versioned and locked for the duration of a hiring cycle so every candidate is assessed under identical conditions?

On explainability: Is the explanation shown to recruiters faithful to the scoring logic, or generated separately or post-hoc? Can the vendor trace a specific score back to specific scoring criteria without narrative interpretation added after the fact?

On bias testing: Has the model been tested for adverse impact across protected groups? How frequently are bias audits conducted post-deployment? What precautions have been taken to mitigate model drift?

On human oversight: Does the system make automated rejection decisions, or does the final decision remain with the recruiter? Under the EU AI Act, AI in recruitment is classified as high-risk. High-risk systems require effective human oversight.
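For the bias-testing question above, it helps to know what a basic adverse impact check looks like. One widely used screening heuristic is the four-fifths rule from US employment selection guidelines: a group whose selection rate falls below 80% of the highest group's rate is flagged for closer review. This article does not name a specific test, so the choice of this heuristic, the 0.8 threshold, the function names, and the sample data in this Python sketch are all assumptions.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def selection_rates(outcomes: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Share of candidates in each group who advanced past screening."""
    totals: Counter = Counter()
    advanced: Counter = Counter()
    for group, passed in outcomes:
        totals[group] += 1
        if passed:
            advanced[group] += 1
    return {group: advanced[group] / totals[group] for group in totals}


def impact_ratios(rates: Dict[str, float]) -> Dict[str, float]:
    """Each group's selection rate divided by the highest group's rate."""
    highest = max(rates.values())
    if highest == 0:
        raise ValueError("no group has any advancing candidates")
    return {group: rate / highest for group, rate in rates.items()}


def flagged_groups(ratios: Dict[str, float], threshold: float = 0.8) -> List[str]:
    """Groups whose impact ratio falls below the threshold (assumed 0.8)."""
    return sorted(group for group, ratio in ratios.items() if ratio < threshold)


# Hypothetical screening outcomes: 40 of 100 advance in group A, 24 of 100 in group B.
outcomes = (
    [("A", True)] * 40 + [("A", False)] * 60
    + [("B", True)] * 24 + [("B", False)] * 76
)
rates = selection_rates(outcomes)
ratios = impact_ratios(rates)
print(flagged_groups(ratios))  # ['B']
```

A flag from this heuristic is a prompt for investigation, not a legal finding. Small samples in particular call for statistical testing alongside the ratio.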

What does fair AI interview screening look like in practice?

Fair AI interview screening is automation designed around structured interviewing science, with bias mitigation built into every layer: training data, model architecture, scoring consistency, explainability, and post-deployment monitoring. The practical markers are consistent across vendors who take this seriously:

Every candidate receives the same structured, competency-based interview
Scoring is deterministic: same input, same output, full explainability
Explanations are faithful to scoring logic (not generated narratives)
Bias testing is repeated post-deployment
The final hiring decision stays with the recruiter

At Hubert, these principles are the design architecture. Hubert's assessment models are deterministic and proprietary, use Behaviorally-Anchored Rating Scales, and have been validated against 1,000,000+ expert human evaluations. The explanation a recruiter sees is the same logic that generated the score, not a narrative produced after the fact. The result is a shortlist recruiters can stand behind: faster, fairer, and legally defensible by design.

References

Sackett, P. R., Zhang, C., Berry, C. M., & Lievens, F. (2022). Revisiting meta-analytic estimates of validity in personnel selection: Addressing systematic overcorrection for restriction of range. Journal of Applied Psychology, 107(11), 2040.

Schmidt, F. L., & Hunter, J. E. (1998). The validity and utility of selection methods in personnel psychology: Practical and theoretical implications of 85 years of research findings. Psychological Bulletin, 124(2), 262.

