Saturday, June 26, 2010

June '10 IOP Part II: "Test bias"

In Part I of this two-part post, I described the first focal article in the June 2010 issue of Industrial and Organizational Psychology (IOP), devoted to emotional intelligence. In this post, I will describe the second focal article, which focuses on test bias.

The focal article is written by Meade & Tonidandel and has at its core one primary argument. In their words:

"...the most commonly used technique for evaluating whether a test is biased, the regression-based approach outlined by Cleary (1968), is itself flawed and fails to tell us what we really want to know."

The Cleary method of detecting test bias involves regressing a criterion (e.g., job performance) on test scores, a dichotomous demographic grouping variable such as sex or race, and the interaction between the two. If the regression lines differ significantly by group (e.g., different slopes, different intercepts), this suggests the test may be "biased". This situation is called differential prediction or predictive bias. The authors contrast this with "internal methods" of examining tests for bias, such as differential item and test functioning and confirmatory factor analysis, which do not rely upon any external criterion data.
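For readers who want to see the mechanics, here is a minimal sketch of a Cleary-style moderated regression. It is my own illustration rather than anything from the article: the function name `cleary_regression`, the 0/1 coding of the group variable, and the simulated data are all assumptions, and it reports t statistics only (turning those into p-values would need a t distribution from something like SciPy).

```python
import numpy as np

def cleary_regression(test, group, criterion):
    """Fit criterion = b0 + b1*test + b2*group + b3*(test*group) by OLS.

    Returns {term: (coefficient, t statistic)}. The 'group' term reflects an
    intercept difference and 'test_x_group' reflects a slope difference.
    """
    test = np.asarray(test, dtype=float)
    group = np.asarray(group, dtype=float)  # assumed to be a 0/1 dummy code
    y = np.asarray(criterion, dtype=float)

    # Design matrix: intercept, test score, group dummy, interaction
    X = np.column_stack([np.ones_like(test), test, group, test * group])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)

    # Conventional OLS standard errors from the residual variance
    resid = y - X @ beta
    dof = len(y) - X.shape[1]
    sigma2 = resid @ resid / dof
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))

    names = ["intercept", "test", "group", "test_x_group"]
    return {n: (b, b / s) for n, b, s in zip(names, beta, se)}

# Simulated (hypothetical) data: same slope in both groups, lower intercept in group 1
rng = np.random.default_rng(0)
n = 2000
group = rng.integers(0, 2, size=n)
test = rng.normal(0, 1, size=n)
criterion = 0.5 * test - 0.3 * group + rng.normal(0, 1, size=n)

result = cleary_regression(test, group, criterion)
for name, (b, t) in result.items():
    print(f"{name:>13}: b = {b:+.3f}, t = {t:+.2f}")
```

In this simulated case, a large t statistic on `group` alongside a small one on `test_x_group` would be read as an intercept difference without a slope difference. As the post goes on to explain, that pattern alone does not say whether the test itself is the cause.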

Meade & Tonidandel state that while the Cleary method has become the de facto method for evaluating bias, there are several significant flaws with the approach. Most critically, the presence of differential prediction does not necessarily mean the test itself is the cause (which we would call measurement bias). Other potential causes are:

- bias in the criterion
- reliability of the test
- omitted variables (i.e., important predictors are left out of the regression)

Another important limitation of the Cleary method is the susceptibility of slope difference tests to low power--this can result in findings of no slope differences due to small samples rather than a true absence. In addition, because the intercept test has an inflated Type I error rate, one is likely to conclude that intercept differences are present when none truly exist.
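The low-power point is easy to see with a small simulation. This is a rough sketch under my own assumptions, not the authors' analysis: the function `slope_test_power` is hypothetical, the groups are equal-sized, the true slope difference of 0.2 is arbitrary, and significance is judged with the large-sample cutoff |t| > 1.96 instead of an exact t critical value.

```python
import numpy as np

def slope_test_power(n, slope_diff, reps=500, seed=0):
    """Share of simulated samples in which the test-by-group interaction
    is flagged as significant (|t| > 1.96, a normal approximation)."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(reps):
        group = np.tile([0.0, 1.0], n // 2)  # equal-sized groups
        test = rng.normal(size=group.size)
        # A true slope difference exists between the two groups
        y = 0.5 * test + slope_diff * test * group + rng.normal(size=group.size)

        X = np.column_stack([np.ones_like(test), test, group, test * group])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta
        sigma2 = resid @ resid / (group.size - X.shape[1])
        se = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[3, 3])

        hits += abs(beta[3] / se) > 1.96
    return hits / reps

power_small = slope_test_power(n=100, slope_diff=0.2)
power_large = slope_test_power(n=2000, slope_diff=0.2)
print(f"n = 100:  detected in {power_small:.0%} of samples")
print(f"n = 2000: detected in {power_large:.0%} of samples")
```

The same true slope difference is detected far less often in the small sample, which is why a non-significant slope test in a small validation study is weak evidence that the slopes are really equal.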

Because of these limitations, the authors recommend the following general steps when investigating a test:

1. Conduct internal analyses examining differential functioning of items and the test.
2. Examine both test and criterion scores for significant group mean differences before conducting regression analyses.
3. Compute d effect size estimates for group mean differences for both the test and the criterion.
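Step 3 is simple to do by hand. Here is a minimal sketch of the d calculation using the pooled standard deviation (one common convention; the article may use a different denominator). The function name `cohens_d` and the scores are made up for illustration.

```python
from math import sqrt
from statistics import mean, variance

def cohens_d(group_a, group_b):
    """Standardized mean difference (group_a minus group_b) using the pooled SD."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)

# Hypothetical scores for two groups on the test and on the criterion
test_a, test_b = [52, 55, 60, 58, 50], [48, 51, 57, 53, 46]
perf_a, perf_b = [3.1, 3.4, 3.9, 3.6, 3.0], [3.0, 3.5, 3.8, 3.7, 2.9]

print(f"d on the test:      {cohens_d(test_a, test_b):.2f}")
print(f"d on the criterion: {cohens_d(perf_a, perf_b):.2f}")
```

Putting the two d values side by side is the point of the exercise: a sizable gap on the test with little or no gap on the criterion (or the reverse) is a cue to look more closely before interpreting any regression results.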

The authors present several scenarios where tests may be "biased" as conceived in the traditional sense but may or may not be "fair"--an important distinction. For example, one group may have higher performance scores, but there is no group difference in test scores. Use of the predictor may result in one group being hired at greater or lesser rates than they "should be", but our judgment of the fairness of this requires consideration of organizational and societal goals (e.g., affirmative action, maximum efficiency) rather than simply an analysis of the tests.

The authors close with several overall recommendations:

1. Stop using the term test bias (too many interpretations, confounds different concepts).
2. Always examine both measurement bias and differential prediction.
3. Do not assume a test is unusable if differential prediction is indicated.

There are several commentary articles that follow the focal one. The authors of these pieces make several points, ranging from criticisms of techniques and specific statements to suggestions for further analyzing the issue. Some of the better comments/questions include:

- How likely is it that the average practitioner is aware of these issues and further is able to conduct these analyses? (a point the authors agree with in part)

- The approaches advocated mainly work with multi-item tests; things get more complicated when we're using several tests or an overall rating.

- It may not be helpful to create a "recipe" of recommendations (as listed above); rather we should acknowledge that each selection scenario is a different context.

- We are still ignoring the (probably more important) issue of why a test may or may not be "biased". An important consideration is the nature of group membership in demographic categories (including multiple group memberships).

Meade & Tonidandel provide a response to the commentaries and acknowledge several valid points raised but end this article with the same proposition that they started the focal article with:

"For 30 years, the Cleary model has served as the dominant guidelines for analyses of test bias in our field. We believe that these should be revisited."

Saturday, June 19, 2010

June '10 IOP Part I: Emotional Intelligence

The June 2010 issue of Industrial and Organizational Psychology has two very good focal articles with, as always, several thought-provoking commentaries following each. This post will review the first article and I'll cover the next in the following post.

The first focal article, by Cary Cherniss, attempts to provide some clarity to the debate over emotional intelligence (EI). In it, Cherniss makes several key arguments, including:

- While there is disagreement over the best way to measure EI, there is considerable agreement over its definition. Cherniss adopts Mayer, et al.'s definition of "the ability to perceive and express emotions, assimilate emotion in thought, understand and reason with emotion, and regulate emotion in the self and others."

- EI can be distinguished from emotional and social competence (ESC), which explicitly links to superior performance.

- Among the major extant models, the Mayer-Salovey-Caruso model (MSCEIT) represents EI, while the others (Bar-On; Boyatzis & Goleman; Petrides, et al.) primarily consist of ESC aspects. Importantly, this does not make the Mayer, et al. model "superior", just more easily classified as an ability measure.

- There are significant problems with all of the major measures of EI/ESC, such as convergent/discriminant validity and inflation (depending on the measure). According to Cherniss, "it is difficult at this point to reach any firm conclusions--pro or con--about the quality of the most popular tests of EI and ESC."

- Emerging measures, such as those involving multiple ratings (e.g., Genos EI), appear promising, although they may be more complex and expensive than performance tests or self-report inventories.

- Other, even more creative, measures hold even more promise. This includes video and simulation tests. This point was echoed in several commentaries. I've posted a lot about the promise of computer-based simulation testing, and EI may be one of the areas where this type of measurement holds particular promise.

- Context is key. I think this is one of Cherniss' most important points: the importance of EI likely depends greatly on the situation--i.e., the job and the "emotional labor" required to perform successfully. This point was also echoed in several commentaries. EI may be particularly important for jobs that require a lot of social interaction and influence, depend on team performance, or involve a high level of stress.

- Several studies have shown modest correlations between EI/ESC measures and job performance, but there are issues with many of them (e.g., student samples, questionable criteria). Newer meta-analyses (e.g., Joseph & Newman, 2010) suggest "mixed-model EI" measures may hold more promise in terms of adding incremental validity.

The commentaries provide input on a whole host of points which would be difficult to summarize here (I will say I enjoyed Kaplan et al.'s the most). Needless to say, there is still an enormous amount of disagreement regarding how EI is conceptualized, how it is measured, and how important it is overall. Then again, as Cherniss and others point out, EI as a concept is in its infancy and this type of debate is both healthy and expected.

Perhaps most importantly, users of tests should exhibit particular caution when choosing to use a purported measure of EI as the scientific community has not reached anything close to a consensus on the appropriate measurement method. Of course pinpointing the exact moment when that occurs will be--as with all measures--a challenge.

Wednesday, June 09, 2010

What we can learn from the baseball draft

Longtime Major League Baseball commentator Peter Gammons was recently interviewed on National Public Radio. What does baseball have to do with recruitment and selection? Quite a bit, actually, but in this case the conversation was even more relevant because he was discussing the accuracy of the baseball draft.

I think you will agree with me after reading/listening to his observations that the draft has several lessons for all kinds of employers:

- More information is better. As scouts have been able to gather and crunch more data about prospects, the accuracy of predicting how a pick will fare in the big leagues has increased. We know from assessment research that more measures are better (up to a point) in predicting performance.

- The type of information matters. Scouts used to focus on relatively narrow measures such as running speed. Today they consider a whole host of factors. Similarly, modern assessment professionals consider a wide range of measures appropriate to the job.

- Personality matters. While lots of data about skills is important in prediction, personality/psychological factors also play a big role in determining success. Personality has also been one of the hottest topics in employment testing over the last 20-30 years.

- It's rare for a single candidate to shine head and shoulders over the rest. Making a final selection is usually a challenge.

- Assessment is imperfect. Even with all the information in the world, a host of other factors influence whether someone will be successful, including which team the person is on, their role, and how they interact with other teammates. This also means "low scoring" candidates can--and do--turn into superstars (Albert Pujols of the St. Louis Cardinals was a 13th-round draft pick).

- Ability to learn is important. We know from assessment research that cognitive ability shows the highest correlation with subsequent job performance (esp. for complex jobs), and many have suggested that it is the learning ability component of cognitive ability that matters most.

- Good recruitment and assessment requires resources. Organizations that take talent management seriously are willing to put their money where their mouth is and devote resources to sourcers, recruiters, and thorough assessment procedures.

By the way, 2009's first overall draft pick, Stephen Strasburg, had an excellent debut last night for the Washington Nationals, striking out 14.

Wednesday, June 02, 2010

May '10 J.A.P.

Summer journal madness continues with the May issue of the Journal of Applied Psychology. It's a diverse issue; check it out:

- Taras et al. conducted a very large meta-analysis of the association between Hofstede's cultural value dimensions (e.g., power distance, masculinity, individualism) and a wide variety of individual outcomes. One interesting finding is the stronger relationship between these values and emotions (organizational commitment, OCBs, etc.) compared to job performance.

- Are high performers more likely to stay or leave? In a study of over 12,000 employees in the insurance industry over a 3-year period, Nyberg found the answer was: it depends. Specifically, it depends on the labor market and pay growth.

- Think g (cognitive ability) is just related to job performance? In an (albeit small) study by Judge, et al., it turns out it was also related to physical and economic well-being. Maybe their next study will address my personal hypothesis: g is related to choice of car.

- A study by Lievens, et al. (in press version here) found with a sample of 192 incumbents from 64 occupations that 25% of the variance in competency ratings (like you might find in a job analysis) was due to the nature of the rater's job, such as level of complexity. Not surprisingly, the greatest consensus was reached for jobs that involved a lot of equipment or contact with the public.

- Self-efficacy (i.e., confidence) has been proposed as an important predictor of job performance. In a study by Schmidt & DeShon, the authors found that this relationship depends on the ambiguity present in the situation--in situations high in ambiguity, self-efficacy was negatively related to job performance; in situations low in ambiguity, the opposite was true.

- Finally, for anyone citing Ilies, et al.'s 2009 study of the relationship between personality and OCB, there have been a couple corrections.