
Hiring Algorithms Raise Questions of Validity and Bias

Tuesday, September 10, 2019  

Rebecca Koenig | usnews.com

July 3, 2019

 


Inscrutable online resume portals.

Trick interview questions with no right answers.

Recruiters who ghost applicants after months of intense communication.

Plenty of hiring practices are as unscientific as they are frustrating.

So it's easy to understand why both employers and employees are eager to try new methods for selecting job candidates. To that end, technology companies are developing hiring algorithms that purport to harness the power of big data and artificial intelligence to perfectly match talented workers with open positions.

These tools come in many forms. They interpret the results of games intended to predict job performance. They assess recorded video interviews, scanning faces and voices for cues allegedly related to personality traits. They scrape social media posts for clues about culture fit and latent skills.

There's only one problem: It's not clear they work.

A good hiring test meets three criteria, says Fred Oswald, an industrial-organizational psychologist with expertise in personnel selection and analytics and a professor in the department of psychological sciences at Rice University. It's reliable, meaning a job applicant who takes the test multiple times receives similar scores. It's valid, in that it predicts outcomes relevant to job performance. And it treats job applicants fairly regardless of gender, race, age and other characteristics protected by federal law.
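To make those three criteria concrete, here is a minimal sketch in Python using simulated data. The variable names, cutoff and group labels are illustrative assumptions, not any vendor's actual method; the four-fifths comparison at the end is one common screen drawn from U.S. adverse-impact analysis.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 200

    # Hypothetical scores from two administrations of the same hiring test.
    score_t1 = rng.normal(50, 10, n)
    score_t2 = score_t1 + rng.normal(0, 3, n)   # retest, with some noise

    # Hypothetical later job-performance ratings for the same applicants.
    performance = 0.4 * score_t1 + rng.normal(0, 8, n)

    # Reliability: do repeated administrations yield similar scores?
    reliability = np.corrcoef(score_t1, score_t2)[0, 1]

    # Validity: does the test predict a criterion external to the test itself?
    validity = np.corrcoef(score_t1, performance)[0, 1]

    # Fairness (one common screen): compare selection rates across groups
    # using the four-fifths rule from U.S. adverse-impact analysis.
    group = rng.choice(["A", "B"], n)
    selected = score_t1 > np.percentile(score_t1, 70)
    rate_a = selected[group == "A"].mean()
    rate_b = selected[group == "B"].mean()
    impact_ratio = min(rate_a, rate_b) / max(rate_a, rate_b)

    print(f"reliability={reliability:.2f}  validity={validity:.2f}  "
          f"impact ratio={impact_ratio:.2f} (flag if below 0.80)")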

So far, technology vendors have offered researchers little solid evidence that their hiring algorithms satisfy these criteria, Oswald says, yet that hasn't stopped employers from adopting these tools. That means workers may encounter high-tech hiring systems not much savvier than random selection.

In the following interview, Oswald assesses the science of hiring algorithms and proposes questions employers and workers should ask about their value. This interview has been condensed and edited for clarity.

Let's start with some definitions. What are hiring algorithms? What makes them artificially intelligent?

Big data and artificial intelligence can be pretty broad terms when you look at the vendor market. Even conceptually, researchers might think of it broadly. There's the method of data collection, there's the data themselves, and then there are the algorithms. I would put all of that into the AI space.

Adaptive testing is part of AI, whether it's a game or face recognition that tries to read emotion. The responses you give inform what the next questions are going to be.

Another area is the algorithms applied to the data. A game or face recognition technology might collect vast amounts of data, far more than any traditional test of personality or job knowledge. There's some promise that the rich source of data will also provide some incremental insights.

Can you provide an example of how these tools work – or fail to work?

I'll give an extreme example. You're applying in health care and you've been trained as a nurse. You're asked to play a game as a measure of teamwork. But you also have on your resume information about your internship and experiences that have involved teamwork, and you also have technical knowledge through your education.

How do you weight all this information to predict outcomes?

If you have an internship that lasted a year versus a game that lasted 10 minutes, and I'm trying to assess teamwork, there might be some trade-offs. I would have to think carefully about how much to weight that game information when it comes to making hiring decisions.

Maybe it's not enough to do well at the game and not enough to say you had the internship. Maybe you have to have both. Maybe these algorithms would help figure that fact out.
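One way to read "maybe you have to have both" is as an interaction effect. The sketch below, built entirely on simulated numbers, fits a simple least-squares model in Python that weights a short game score, an internship indicator and their combination; the data and coefficients are invented for illustration, not taken from any real assessment.

    import numpy as np

    rng = np.random.default_rng(1)
    n = 500
    game = rng.normal(0, 1, n)          # 10-minute game-based teamwork score
    internship = rng.integers(0, 2, n)  # 1 = had a yearlong teamwork internship

    # Suppose the true outcome rewards the combination more than either alone.
    teamwork = (0.2 * game + 0.3 * internship
                + 0.8 * game * internship + rng.normal(0, 1, n))

    # Fit a linear model with an interaction term via least squares.
    X = np.column_stack([np.ones(n), game, internship, game * internship])
    coef, *_ = np.linalg.lstsq(X, teamwork, rcond=None)
    print("weights [intercept, game, internship, game*internship]:",
          np.round(coef, 2))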

That's where validity comes in. You have to have data external to the test as a litmus test about whether the test works or not. Sometimes we get hung up on internal scores, like, "I got a high score on this game, I must be a good employee." But that doesn't guarantee anything except that you're good at this game at this one point in time.

What's the appeal hiring algorithms have for employers?

There's the idea that these algorithms are sophisticated. We've seen AI beat humans soundly at chess and Jeopardy. We've been influenced by the "Moneyball" phenomenon. There's the idea that predictive models can offer promise.

Even in traditional testing arenas, there's always been the compelling case of a test that is going to be easier to administer and give you accurate decisions.

Probably the most important, at least from the organizational perspective, is the ability to make some quick decisions about talent. In traditional testing, you might have to spend time to score the measures or send them out somewhere to get scored. If you can automate that to some extent, that would be a benefit to organizations. The idea that I can get talent quickly and I can use this sophisticated technology to get that talent is compelling.

If the traditional model has high validity but the job applicant predicted to have high performance is gone by the time you reach them because your system is so slow, maybe you'd be OK with lower validity that leads to faster decisions.

AI is changing the job of human resources as much as it's being used to select employees. Maybe a company doesn't have the resources to engage a full HR staff. So they turn to these automated systems that, yes, may not be perfect, but the alternative is worse.

Do hiring algorithms have benefits for workers?

Tests are those things in school that you avoided, and no one likes tests when they apply for a job. When you avoided tests in school, you might have snuck home to play games. Everyone loves games.

So maybe the idea is there's some applicant engagement. Maybe it's less painful to provide data to the hiring organization, or you think of the organization in a positive light because you were in this engaging experience.

Is there evidence that these tools work?

Having spent a reasonable amount of time examining what these AI talent assessment tools offer across companies, I find very little information about reliability and validity. I find more about fairness. There are claims, at least, about the fairness of the tools. But I find very little data about whether these tools predict outcomes.

If I were an organization or even if I were a job applicant, I might ask whether these tools are predictive of organizational outcomes.

Psychologists are becoming more involved, but still more research and science need to inform these technologies. It doesn't have to be research done at labs in universities. It can involve data from companies.

This leads to proprietary issues, though. Companies may have validity data for their AI technologies, and it's possible they're not sharing it. Their tools might predict outcomes, but that evidence is being held as proprietary.

Are companies' claims about fairness convincing?

We know there is algorithmic bias that reflects some patterns in the world we don't want to repeat in our hiring practices. At least we want to evaluate and understand why those patterns exist and figure out what to do about it.
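As a rough illustration of how an algorithm can repeat a pattern we don't want, the following Python example trains a simple least-squares screen on simulated historical hiring labels that favored one group independent of skill. Every number here is fabricated; the point is only that the learned weights pick up the group effect.

    import numpy as np

    rng = np.random.default_rng(2)
    n = 2000
    skill = rng.normal(0, 1, n)
    group_a = rng.integers(0, 2, n)   # 1 = historically favored group

    # Historical "hired" labels: skill mattered, but so did group membership.
    hired = (skill + 1.0 * group_a + rng.normal(0, 1, n)) > 0.5

    # Fit a simple least-squares screen on [skill, group] and inspect weights.
    X = np.column_stack([np.ones(n), skill, group_a])
    w, *_ = np.linalg.lstsq(X, hired.astype(float), rcond=None)
    print("learned weights [intercept, skill, group]:", np.round(w, 2))
    # A nonzero group weight means the model is repeating the historical pattern.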

Traditional tests are often standardized. Everyone gets the same questions. When we depart from that, we worry about potential fairness issues. Basically, these games, internet scraping and audio/visual recordings are unstandardized tests.

What if somebody has done a lot of great things but didn't tell you about them on the internet? If you're going to scrape the internet, you're not going to find this out.

What if you're visually impaired and you're being asked to play a game, and visual acuity is not relevant to the job? Those are fairness issues related to reliability issues.

According to a research paper by law firm Littler Mendelson, with some new hiring tools, "the algorithm itself is measuring and tracking behavior that has no direct relationship to job performance," such as which websites job candidates visit in their free time or what they post on social media. Even if these behaviors really correlate with who will be a good worker, we may not understand why that's the case. Does it matter whether we have theories about why these tools might work?

Just because you find a relationship doesn't mean that relationship tells you what you think it means or is necessarily informative moving forward.

If the website was for golf clubs, and the reason it was predictive was that there's a good old boys club that gets together and that drives performance – maybe it's a network where they're supporting each other to others' detriment – it's not the kind of structure you want in your organization.

But if the website was about technical knowledge and these applicants were losing sleep getting on the web and learning more about medical technology, and they're in health care jobs, that's really something you want to go with.

"Theory" sounds so impressive and abstract, but sometimes it boils down to straightforward thinking about defining the problem carefully. What are we trying to measure? That's a reliability issue. What are we trying to predict? That's a validity issue. What is job performance? What is employee satisfaction? What is teamwork?

They're hard questions. In some ways, it's harder for companies to really think about what they want to predict than to build these complex algorithms.

In this strong economy, workers have many job options. To attract employees, should companies focus on pay and benefits instead of complicated hiring algorithms?

Selection is part of a larger system. Other factors clearly matter. You need good training, good managers, good recruiting in the first place to get the applicants you want applying for your job.

What is going to bring the right applicants to your door? Pay incentives and an organizational culture that is positive. That can mean a variety of things. Some applicants are going to want autonomy or variety in their job, or they're going to want some social component, a teamwork culture.

Hiring organizations might be looking to AI saying, "We've had all these problems; it must be a selection problem. Let's use AI to solve our problems."

There are a variety of solutions to a problem. You can select a different person; you can motivate a person; you can ask somebody to leave. When a company is calling me because I'm a selection person, and they say, "We have all these problems in our organization," one of my first questions is, "Well, is it really a selection problem?"

We've talked about the pitfalls of AI hiring tools. Do you think they have any promise?

I do have excitement about AI. I am not a total curmudgeon here. There are some exciting possibilities for what can be done in the talent management setting, whether you're doing selection or improving performance.

These technologies always change. We're not passive recipients of these systems; we actively create them.

What developments would give you more hope about the future of hiring algorithms?

We need to think seriously about what is being measured, not merely whether it's an immersive and interesting experience. We've spent a century developing ability and knowledge tests as well as personality tests. We should expect to spend a serious amount of time developing and refining what these games try to measure as well, whether that is ability or personality, or virtual reality games that mirror work-related situations.

Even in our research community in organizational psychology, we are still getting involved in big data because we're behind the curve in terms of what is being developed in applied statistics and computer science. If we keep working hard, we can benefit from the experience that's been gained in those areas. But in turn, I think applied statisticians and computer scientists need to work with organizational psychologists in terms of getting good data, not just big data.

How do we structure our games and our tools to make sure we're getting the reliability, validity and fairness that we're all hoping for and we're all presumably investing in when we're paying for these technologies? More research is needed, as professors like to say.

