When to use LLMs in hiring and when not to

2026-07-03

Josephine Daly

I recently read a comment on LinkedIn that said LLMs are here to stay in hiring. I read it with what we in Sweden call skräck-blandad förtjusning: delight mixed with terror. Because the comment is completely right, and that is both the good news and the problem. So here is my long-form answer.

Are LLMs here to stay in hiring?

Yes, in certain areas, LLMs are here to stay in hiring. Large language models are the most capable language technology ever built. They draft, summarize, translate, and converse at a level that would have seemed like science fiction five years ago. Recruitment is a language-heavy profession: job ads, outreach, interview questions, feedback, coordination. Of course this technology belongs in hiring.

The delight is real. However, the terror I mentioned before is about something more specific: not whether LLMs belong in hiring, but where. Because the industry is currently putting them everywhere, including the one place they should never be.

What are LLMs genuinely good at in hiring?

LLMs excel at generating and understanding language, which covers a surprising amount of a recruiter's week. Used well, they are a genuine gift to the profession:

Drafting: job ads, outreach messages, screening questions, and feedback templates, produced in seconds and edited by a human.
Conversation: engaging candidates in a natural, warm dialogue, in their own language, at any hour. This is where LLM technology shines brightest; a good conversational layer makes an automated interview feel human rather than like a form.
Structuring: turning a messy job description into a clean set of competencies, or an unstructured conversation into organized notes.
Translation and accessibility: meeting candidates in 30+ languages without hiring 30 translators.

Notice what these tasks have in common. In every one of them, the LLM produces something a human reviews, or shapes an experience without deciding anyone's outcome. If the output is slightly different tomorrow, nothing bad happens. Variation is a feature; it is what makes the conversation feel alive.

Where do LLMs fail in hiring?

LLMs fail at judgment, and candidate scoring is judgment. An LLM generates output by predicting likely words, not by applying fixed criteria, so the same interview answer can receive a different score from one run to the next.

This is not a hypothetical concern; you can even try it yourself. One engineer recently ran his own CV through a popular open-source LLM-based screening tool 100 times and got scores ranging from 66 to 99 out of 100. At a realistic cutoff, the identical candidate passed or failed depending on nothing but the run.
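A repeat-run check like that can be sketched in a few lines. This is a minimal illustration, not the engineer's actual script: the name `check_repeatability`, the cutoff of 80, and the five example scores are all assumptions made up for the sketch (the scores simply sit inside the reported 66 to 99 range). In a real check, `scores` would hold the results of sending one identical CV or answer through the scoring tool many times.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RepeatabilityReport:
    lowest: int
    highest: int
    pass_rate: float     # share of runs at or above the cutoff
    outcome_flips: bool  # True if the same input both passed and failed


def check_repeatability(scores: list[int], cutoff: int) -> RepeatabilityReport:
    """Summarize the scores from repeated runs on one identical input."""
    if not scores:
        raise ValueError("need at least one score")
    passes = sum(1 for s in scores if s >= cutoff)
    return RepeatabilityReport(
        lowest=min(scores),
        highest=max(scores),
        pass_rate=passes / len(scores),
        outcome_flips=0 < passes < len(scores),
    )


# Illustrative values only (not measured data); cutoff of 80 is an assumption.
example_scores = [66, 72, 81, 88, 99]
report = check_repeatability(example_scores, cutoff=80)
print(report)
```

If `outcome_flips` is true for even one candidate, the pass/fail result depends on the run rather than on the candidate.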

And there is a second failure hiding inside the first. When you ask an LLM why it gave a score, it generates a confident, plausible-sounding explanation after the fact. That explanation is a story, not the scoring logic. So an LLM-scored process is inconsistent and unauditable at the same time: you cannot predict the result, and you cannot verify the reasoning.

So what is the rule?

The rule is simple: use LLMs for language, never for judgment. Or put another way: an LLM can touch everything in your hiring process except a candidate's outcome.

Ask one question about any AI task in your funnel: if this output were slightly different tomorrow, would a candidate be treated unfairly? If the answer is no (a job ad drafted two ways, a conversation phrased differently), an LLM is a fine choice, probably the best choice. If the answer is yes (a score, a ranking, a pass/fail, a shortlist position), the task demands consistency, and consistency is precisely what probabilistic models cannot promise. Assessment needs deterministic models: same input, same output, full explainability, with every criterion and weighting fixed in advance.
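The one-question test can be written down as a simple audit of a hiring funnel. The sketch below is only an illustration under stated assumptions: the `Task` and `Engine` types, the `find_violations` helper, and the funnel contents are hypothetical, with task names borrowed from the examples in the text. The answer to the question is recorded by a human as `affects_outcome`; the code does nothing more than flag outcome-affecting tasks that are currently handled by an LLM.

```python
from dataclasses import dataclass
from enum import Enum


class Engine(Enum):
    LLM = "llm"
    DETERMINISTIC = "deterministic"


@dataclass(frozen=True)
class Task:
    name: str
    # Human-supplied answer to the question: if this output were slightly
    # different tomorrow, would a candidate be treated unfairly?
    affects_outcome: bool
    engine: Engine  # what currently performs the task


def find_violations(funnel: list[Task]) -> list[str]:
    """Return the names of outcome-affecting tasks that are handled by an LLM."""
    return [
        task.name
        for task in funnel
        if task.affects_outcome and task.engine is Engine.LLM
    ]


# Hypothetical funnel for illustration.
funnel = [
    Task("draft job ad", affects_outcome=False, engine=Engine.LLM),
    Task("phrase interview questions", affects_outcome=False, engine=Engine.LLM),
    Task("score interview answers", affects_outcome=True, engine=Engine.LLM),
    Task("rank shortlist", affects_outcome=True, engine=Engine.DETERMINISTIC),
]

violations = find_violations(funnel)
print(violations)  # ['score interview answers']
```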

This is not a fringe position; it is where regulation is heading. The EU AI Act classifies AI in recruitment as high-risk, requiring transparency, accuracy, and human oversight. A scoring system that cannot repeat its own results makes those obligations very hard to meet.

What does this look like in practice?

In practice, it looks like two AI layers doing two different jobs, which is exactly how we built Hubert. The conversational layer uses AI to give every candidate a warm, natural interview, in chat or voice, whenever suits them. The assessment layer uses deterministic proprietary models to score every response against fixed, job-relevant criteria, so identical answers always earn identical scores and every score comes with a faithful explanation. And the final decision always belongs to a human recruiter.
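To make "identical answers always earn identical scores" concrete, here is a minimal sketch of deterministic scoring with a built-in explanation. It is not Hubert's proprietary model, and everything in it is an assumption for illustration: the criteria names, the weights, the 0 to 10 rating scale, and the `score` function. How a response is turned into per-criterion ratings is out of scope here; the sketch only shows that once criteria and weights are fixed in advance, the score and its explanation are the same arithmetic.

```python
# Hypothetical criteria and weights, fixed before any candidate is scored.
# Weights are whole numbers that sum to 10, so totals land on a 0-100 scale.
WEIGHTS = {
    "relevant_experience": 5,
    "shift_availability": 3,
    "language_level": 2,
}


def score(ratings: dict[str, int]) -> tuple[int, list[str]]:
    """Score one candidate from per-criterion ratings (integers 0-10).

    Returns the total and the exact arithmetic that produced it.
    """
    if set(ratings) != set(WEIGHTS):
        raise ValueError("ratings must cover exactly the fixed criteria")
    total = 0
    explanation = []
    for criterion, weight in WEIGHTS.items():
        rating = ratings[criterion]
        if not 0 <= rating <= 10:
            raise ValueError(f"{criterion}: rating must be between 0 and 10")
        points = weight * rating
        total += points
        explanation.append(f"{criterion}: {rating} x {weight} = {points}")
    return total, explanation


# Made-up ratings for one candidate.
ratings = {"relevant_experience": 8, "shift_availability": 10, "language_level": 5}
first = score(ratings)
second = score(ratings)
print(first[0])         # 80
print(first == second)  # True
```

Because the explanation is built from the same numbers as the total, it cannot drift away from the scoring logic the way a generated after-the-fact rationale can.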

Candidates get the best of what LLM-era technology offers: a conversation that feels human, with a 9/10 average satisfaction score. Recruiters get the thing LLMs cannot provide: shortlists that are consistent, explainable, and legally defensible.

So was the LinkedIn comment right?

The comment was right, and the framing was wrong. LLMs are here to stay in hiring; the interesting question was never whether, but where. Hiring is one of the few decisions that shape the entire direction of a person's life. The technology that drafts your job ad does not need to be accountable. The technology that decides whether a nurse, a driver, or a graduate gets their shot absolutely does.

So yes: skräck-blandad förtjusning. Delight at what this technology gives candidates and recruiters. Terror at watching it get pointed at the one task it cannot do. The teams that understand the difference will build hiring processes that are faster and fairer at the same time. The teams that do not will be explaining their scoring variance to a regulator, or worse, to a candidate who deserved better.
