Wednesday, 29 January 2014

Year in Review: Attraction and assessment

Last year, as ever, we covered research on how we get people into jobs and how they perform in them. To kick us off, here are a few fascinating findings unearthed by our colleague Christian Jarrett from the Research Digest.

Firstly, it's hard to spot liars. In a study asking participants to watch videos of genuine and bogus accounts of previous jobs, their ability to tell one from the other was barely better than chance. But the headline finding was that many participants were hardened interviewers, yet they performed no better than those who had never conducted an interview. Interview experience may still help with validation: a dynamic interview offers techniques to probe and explore, which may provide more critical perspectives. But in terms of 'reading the signs', veterans do no better.

How do we recruit high performers in a competitive field? Increasingly, it seems, organisations are going to greater lengths to stand out from the crowd - see this media account of the Cicada 3301 mystery if you want an extreme example to occupy your afternoon. And research suggests the basic concept is solid: holding everything else constant, a less typical way of reaching out to your applicant base, such as a postcard rather than an email, may produce better results - in a recent experiment, giving Google a response rate of 5% rather than 1%.

Before you get around to assessing your applicants, it's important to ensure you get suitable people to apply in the first place. A big part of this is candidate quality, but recent research argues that quantity may matter more than we think, especially if we are worried about cheating. Mathematical models suggest that even if cheating is rife, were you to test enough people - and so be free to be more selective, taking the top 20% rather than the top 50% - you could end up selecting higher-calibre candidates than if you stuck with a cheat-proof but low-volume process.

What are modern recruitment methods actually assessing? Industry best practice involves identifying criteria that matter to the job, and then trying to obtain a distinct measure of each. But a body of research suggests candidates who do well are often coasting on a meta-ability, the 'ability to identify criteria' (ATIC): how well you can figure out what is expected of you in a situation. Research this year suggests we may need to accept that ATIC is useful for performing the job, just as it is for getting the job, by allowing you to discern the course of action likely to satisfy others or fulfil the unspoken expectations of managers, customers or stakeholders. This raises hard questions about how we should design selection processes: high-ATIC candidates can't show their stuff when assessment criteria are transparent and obvious to all, so might ambiguous jobs be better assessed using ambiguous processes? A provocative idea to chew on.

We assume extraverts sell more and that cognitive ability is always an asset in jobs. Yet both these taken-for-granted facts were held up for scrutiny this year. Evidence suggested that 'ambiverts', who sit between the extravert and introvert extremes, tend to do better in sales roles: in the study in question, earning $151 in revenue per hour versus $115 for the highly extraverted. Meanwhile, a body of research argues that high cognitive ability can actually be a liability for certain types of work, but a critical review disputes this, claiming that all else being equal, "the smarter you are, the better you will perform on just about any complex task."

Not every candidate can be successful, so it's useful to know who feels hard done by; after all, these people are your customers, partners, or prospective applicants of the future. Research suggests that candidates are likely to believe they were given a fair shake if their personality resembles one of two constellations: Resilient types or 'go with the flow' Bohemians. Those of an Overcontrolled disposition are more liable to feel victimised by unwelcome results.

Sometimes candidates are genuinely victimised. Evidence suggests that candidates with a non-native accent are less likely to be hired, on the pretext that the candidate doesn't appear politically savvy - a nebulous judgment hard to prove or disprove. Employers should ensure that checks and balances are in place to avoid such systematic prejudice squeezing talented individuals from the system.

So what to do, hirers of the world? Be realistic: it may be harder to eliminate cheating than to soften its effects. Your processes may not be measuring purely what you want, yet may still capture candidates with the capability to do the job. And rather than relying on interviewer superpowers, use checks, balances and appropriate weighting to make sure a bogus interview doesn't blow you away. Don't fall back on stereotypes: look harder at that quietly confident salesperson, or that impassioned presentation from the entrepreneur with an accent - and remember that cognitive ability remains important for job performance. Ultimately, to catch the best and brightest, it could be down to you to be creative in your recruitment methods.

Thursday, 1 August 2013

Who feels treated unfairly after taking an assessment?


Applicant reactions are the feelings people have about taking a given set of assessments in order to secure employment. We know that assessment design matters: applicants are happiest when given scope to show their capability through relevant challenges that don't demand inappropriate information.

Applicant factors matter too - obviously, passing or failing the assessment can colour perceptions, as can 'attributional style' - but until now there has been no consistent effect of applicant personality. Recent research takes a different angle, suggesting that perceptions of fairness depend on the applicant's personality type.

Finnish researchers Laura Honkaniemi and team suspected the problem with previous applicant-reaction research was its focus on correlations with personality traits - individual qualities such as extraversion, neuroticism and so on. This presumes that personality has its effect in a fairly linear way - 'more extraverted people will be happier in assessments' - rather than involving an interplay between different features. Honkaniemi's team, working with personality data from 258 applicants to the Finnish Fire and Rescue Services, used a person-oriented analytic technique to identify four personality types within the group, which I describe below.
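
For the curious, the flavour of this person-oriented approach can be sketched in a few lines. The paper's exact method isn't detailed here, so treat this as an illustrative stand-in: a Gaussian mixture model clustering standardised trait scores into four profiles, with made-up data and names.

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in data: 258 applicants x 14 PRF trait scores
rng = np.random.default_rng(0)
traits = rng.standard_normal((258, 14))

z = StandardScaler().fit_transform(traits)             # standardise each trait
model = GaussianMixture(n_components=4, random_state=0).fit(z)
types = model.predict(z)                               # each applicant's type, 0-3

# A 'type' is summarised by its mean trait profile, e.g. high conscientiousness
# plus low neuroticism would mark out the Resilient cluster
profiles = model.means_
```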

These individuals had all completed a set of physical assessments and then a psychological regime comprising an interview, cognitive tests, a group exercise and a personality assessment (the Finnish version of the PRF). The final research sample (40 participants opted out) rated their reactions, specifically to the use of psychological assessment, on items like 'I don't understand what the psychological tests have to do with the future job tasks', after the assessment but before the outcomes were known. The items covered three areas: face validity (did the assessments seem relevant to the job?), predictive validity (do I think they can predict job performance?) and fairness perceptions (do I think this is a fair way to do things?). Personality type showed no effect on the first two, but it did influence fairness perceptions, for both successful and unsuccessful candidates.

Two of the personality types saw the process as significantly fairer than a third. The first, typified by high conscientiousness, low neuroticism, and above-average agreeableness and extraversion, is commonly identified by this kind of personality typing and labelled the Resilient type. Another hasn't been previously reported; the researchers dubbed it the 'Bohemian' for its combination of low extraversion and low conscientiousness. In contrast, the Overcontrolled group gave significantly lower fairness ratings. This is another classic type, involving high neuroticism and low extraversion and agreeableness. Previous research has suggested this type is more likely to infer malevolence behind ambiguous behaviour, so their negative perceptions are consistent with that. Conversely, the Resilient profile, as the label implies, carries a strong tendency to adjust to circumstances and move forward, leaving its bearers less concerned with picking apart perceived wrongs. The authors speculate that the new Bohemian type may take a 'let all flowers bloom' approach, their impulsive, uncompetitive nature making judgement unlikely.

A few notes: firstly, these personality types are rarely 'pure' but reflect nuances of the larger sample. Here, the Undercontrolled had higher extraversion than the Resilients, the opposite of what is seen in other studies. Secondly, the personality types are more than the sum of their parts: all these effects were obtained while controlling for the effects of the 14 individual personality traits within the PRF.

Applicant reactions matter. They can influence test performance, sour opinions of the employer, and affect a new hire's self-perception. Understanding who may experience a process as more unfair could help employers offer targeted support and feedback that takes likely reactions into account.

Honkaniemi, L., Feldt, T., Metsäpelto, R.-L., & Tolvanen, A. (2013). Personality types and applicant reactions in real-life selection. International Journal of Selection and Assessment, 21(1), 32-45. DOI: 10.1111/ijsa.12015

Further reading:
Ployhart, R. E., & Harold, C. M. (2004). The Applicant Attribution-Reaction Theory (AART): An integrative theory of applicant attributional processing. International Journal of Selection and Assessment, 12, 84–98.

Monday, 8 April 2013

'Figuring out what they're after': a common thread between assessment performance and job performance?

A while back we shared a review of the 'ability to identify criteria' (ATIC), suggesting that differences in how people perform on a selection process like an interview are due partly to how good they are at figuring out what the process wants to hear. The article suggested this may not be entirely bad, as ATIC appears to have a role in job performance as well. Now the authors have published empirical work looking closer at the issue. Their data suggest that figuring out situational demands may have a very substantial hand in both selection and job performance, and may even be the major link between the two.

First author Anne Jansen and colleagues (principally from the University of Zürich) recruited 124 participants into a simulated assessment process, pitched as a way to give them experience of job selection. Participants were incentivised to do well - the top two candidates each day were financially rewarded - and had to pay a small fee to enter, encouraging motivated participation more in line with real selection experiences. Participants were informed of the job description ahead of time and, on assessment day, turned up in groups of 12 to undertake interviews, a cognitive test, presentations and group discussions, observed by multiple assessors (Occupational Psychology MSc students).

After each exercise, participants were asked to document their hunches about which dimensions it was trying to measure; these were compared with answers the assessors had given beforehand, with close matches earning higher situational demand (ATIC) scores. No such information was explicitly provided (otherwise ATIC becomes redundant), so participants had to rely on indirect cues: the job description, reading between the lines of the instructions, noticing what assessors seemed attuned to. In addition, each participant authorised the researchers to contact their real-world supervisors online for feedback on their actual job performance; in total, 107 supervisors responded.
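
The paper's scoring rubric isn't reproduced here, but the underlying logic - compare each participant's hunches against the dimensions assessors actually targeted - can be sketched roughly as follows. The exercises, dimensions and scoring rule are all illustrative assumptions.

```python
# A rough sketch of the ATIC scoring logic. Exercises, target dimensions
# and matching rule are illustrative assumptions, not the paper's rubric.

TARGETS = {
    "interview": {"persuasiveness", "planning"},
    "group_discussion": {"cooperation", "leadership"},
}

def atic_score(hunches):
    """Average, over exercises, of the share of targeted dimensions the
    participant correctly guessed (a crude stand-in for rater judgements)."""
    hits = [len(hunches.get(ex, set()) & dims) / len(dims)
            for ex, dims in TARGETS.items()]
    return sum(hits) / len(hits)

# One participant guesses one of two interview dimensions and both
# group-discussion dimensions: ATIC score (0.5 + 1.0) / 2 = 0.75
print(atic_score({"interview": {"persuasiveness"},
                  "group_discussion": {"leadership", "cooperation"}}))
```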

Overall assessment centre (AC) scores correlated with job performance at r = .21. Both AC scores and job performance also correlated with participants' ATIC scores: someone who was savvy in figuring out what the AC asked of them did better in the AC, and also did better in the workplace. Jansen's team constructed a statistical model in which cognitive ability fed ATIC, which itself strongly contributed to performance both on assessments and in the workplace. Once all of these factors were accounted for, assessment performance itself no longer predicted workplace performance. This suggests, at the least, that ATIC and the factors behind it substantially underpin how assessments manage to predict workplace performance.
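
A minimal sketch of the key statistical check here - does the AC-performance link survive once ATIC is partialled out? - using residualised correlations. The data and variable names are placeholders, and the authors' actual model was a fuller one.

```python
import numpy as np

def partial_corr(x, y, control):
    """Correlation between x and y after regressing out `control` from each
    (here: AC score vs. job performance, controlling for ATIC)."""
    def residuals(v):
        A = np.column_stack([np.ones_like(control), control])
        beta, *_ = np.linalg.lstsq(A, v, rcond=None)
        return v - A @ beta
    return np.corrcoef(residuals(x), residuals(y))[0, 1]

# Placeholder data with the qualitative pattern reported: ATIC drives both
rng = np.random.default_rng(1)
atic = rng.standard_normal(107)
ac_score = 0.6 * atic + rng.standard_normal(107)
job_perf = 0.5 * atic + rng.standard_normal(107)
print(np.corrcoef(ac_score, job_perf)[0, 1])     # raw correlation (~.2)
print(partial_corr(ac_score, job_perf, atic))    # shrinks towards zero
```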

One way to look at this is as the identification of yet another factor - alongside IQ, EI, resilience and practical intelligence - that researchers argue counts in the workplace. But actually, this line of research advocates a shift in perspective. It asks us to accept that performance doesn't just depend on the resources you bring to the job, but also on your perception of what the job is. This interactionist perspective is less concerned with raw capability and more with orientation. And it raises new considerations: in jobs where the orientation is clear-cut - four duties, get on with it - shouldn't we be minimising ATIC's role in selection? Whereas at the other extreme, could applicants for highly ambiguous jobs be tasked with finding their own way through the application process?

Jansen, A., Melchers, K. G., Lievens, F., Kleinmann, M., Brändli, M., Fraefel, L., & König, C. J. (2013). Situation assessment as an ignored factor in the behavioral consistency paradigm underlying the validity of personnel selection procedures. Journal of Applied Psychology, 98(2), 326-341. PMID: 23244223

Further reading: The original review is

Kleinmann, M., Ingold, P., Lievens, F., Jansen, A., Melchers, K., & König, C. (2011). A different look at why selection procedures work: The role of candidates' ability to identify criteria. Organizational Psychology Review, 1(2), 128-146. DOI: 10.1177/2041386610387000

Thursday, 31 January 2013

Do test cheats matter if you test enough people?


Over the past decade, the cheapness and convenience of online testing have seen its usage grow tremendously. Its critics point to the openings it creates for cheaters, who might take a test many times under different identities, conspire with past test-takers to identify answers, or even employ a proxy candidate with superior levels of the desired trait. Its defenders point to counter-tactics, from data forensics to follow-up tests taken in person. But statistical models from researchers Richard Landers and Paul Sackett suggest that in recruitment situations, the loss of validity due to online cheating can be recovered simply through the greater number of applicants able to take the test.

Landers and Sackett point out that test administrators normally intend to select a certain number of candidates through testing, such as the ten final interviewees. The accessibility of online testing could let you grow your candidate pool, say from 20 to 50. With these numbers, it's now possible to select those who scored better than 80% of the other candidates, rather than merely those in the top half. And if some of your candidates cheat, oomphing their scores to the 82nd percentile when they only deserve the 62nd, that's still a better calibre than the 50th-or-better you would have been prepared to accept from your smaller face-to-face pool.

Landers and Sackett moved from these first principles to modelling some realistic large data sets containing a range of true ability scores. They considered sets where cheating gave a small (.5 SD) or large (1 SD) boost to your test score; against this was another factor, how strongly your true ability influenced your likelihood of cheating, from no relationship (r = 0) to increasingly strong negative relationships (r = -.25 to -.75), modelling the idea that weaker performers are more likely to cheat. And finally, they varied the prevalence of cheating in increments from zero up to 100%.

The researchers ran simulations in each data set by picking a random subset - the 'candidate pool' - and selecting the half of the pool with better test scores. In the totally honest data sets, the mean genuine ability score of selected candidates was .24, but that value was lower for sets containing cheaters, as some individuals passed without deserving it. Landers and Sackett then added more candidates to each pool, allowing pickier selection, and reran the process to see what true abilities resulted. In many data sets the loss of validity due to cheating was easily compensated by growth of the applicant pool. For instance, if cheating has only a modest effect and is only mildly related to test ability (r = -.25), then doubling the applicant pool yields mean genuine scores of .24 even when 70% of candidates are cheating, and higher scores when the cheaters are fewer, such as .31 for 30% cheaters.
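
For a feel for the machinery, here's a minimal simulation in the same spirit. The test validity of .30 is my assumption, chosen because it reproduces the honest-pool baseline of .24 quoted above; the other defaults mirror the parameter ranges the authors varied, and exact outputs will depend on this parameterisation.

```python
import numpy as np

rng = np.random.default_rng(0)

def mean_true_ability(pool_size, n_selected, cheat_rate, cheat_boost=0.5,
                      ability_cheat_r=-0.25, validity=0.30, n_trials=5000):
    """Mean genuine ability of candidates selected on observed test scores.

    True ability is standard normal; the test correlates with it at
    `validity`; a fraction `cheat_rate` of candidates (skewed towards the
    less able when `ability_cheat_r` is negative) gain `cheat_boost` SDs
    on their observed score.
    """
    out = []
    for _ in range(n_trials):
        ability = rng.standard_normal(pool_size)
        test = (validity * ability
                + np.sqrt(1 - validity**2) * rng.standard_normal(pool_size))
        propensity = (ability_cheat_r * ability
                      + np.sqrt(1 - ability_cheat_r**2) * rng.standard_normal(pool_size))
        cheats = np.zeros(pool_size, dtype=bool)
        n_cheats = round(cheat_rate * pool_size)
        if n_cheats:
            cheats[np.argsort(propensity)[-n_cheats:]] = True  # likeliest cheaters
        observed = test + cheat_boost * cheats
        selected = np.argsort(observed)[-n_selected:]          # pick top scorers
        out.append(ability[selected].mean())
    return float(np.mean(out))

print(mean_true_ability(20, 10, cheat_rate=0.0))   # honest top-half baseline (~.24)
print(mean_true_ability(40, 10, cheat_rate=0.3))   # doubled pool, 30% cheaters
```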


Great... but wait. There are two important take-aways relating to fairness. It's true that if we're getting .31 averages instead of .24, our selected candidates should be more job-capable - even some of those who did cheat - and that's a win for whoever's hiring. But in the process we've rejected people who by rights deserved to go through. Essentially, this is a form of test error, and so not a uniquely terrible problem, but it's one we shouldn't become complacent about just because the numbers are in the organisation's favour.

Secondly, as anyone trained in psychometric use will be aware, tightening the selection ratio from the top 50% to the top 25% is no casual step. Best practice holds that without evidence, such as an in-house validity study, cut-offs on a single test should be capped at the 40th percentile, meaning you pass 60% of candidates. In particular, raising thresholds can have adverse impact on minority groups, for whom many tests still show score differentials (although these are closing over time). As minority groups tend to make up a small share of any given applicant pool, such differentials can easily squeeze the diversity out of the process before you even get a chance to sit down with candidates and see what they have to offer in a rounded fashion.

Nevertheless, this paper brings a fresh angle to the issue of test security.


Landers, R., & Sackett, P. (2012). Offsetting performance losses due to cheating in unproctored internet-based testing by increasing the applicant pool. International Journal of Selection and Assessment, 20(2), 220-228. DOI: 10.1111/j.1468-2389.2012.00594.x

Further reading:

Tippins, N. T. (2009). Internet alternatives to traditional proctored testing: Where are we now? Industrial and Organizational Psychology, 2, 2–10.

Friday, 16 November 2012

Can you be coached to better outcomes on a situational judgment test?

The situational judgment test (SJT), which asks respondents to choose their preferred course of action in a workplace scenario, has become a popular way of assessing fit to the attributes of a job or an organisational culture. It's used by governments, the military, police forces, and in educational selection such as the certification of GPs (medical General Practitioners). Like other popular techniques, it has spawned an industry that promises to help people pass. Can coaching enhance performance on such a test?

Filip Lievens and his team examined this in a real-world context - laboratory studies can lack the motivation to learn that drives coaching's benefits - in the form of August admissions to a Belgian medical school, where candidates take a battery of assessments including an SJT. A challenge is that candidates who seek coaching may differ from their counterparts in ways that could influence their eventual performance, independent of the coaching itself. Lievens' team addressed this through two routes. Firstly, they used a form of matching called propensity scoring, in which every coached candidate is matched against an uncoached one via scores derived from a range of individual factors, including demographic background, career aspirations, previous academic performance, and the tendency to prepare through other means such as practice tests. Secondly, the team only included candidates who had previously failed the assessments in July and had not engaged in any coaching before July; July SJT performance could thus act as a pre-test measure of how candidates did before coaching was introduced. From a larger sample, Lievens' team ended up with 356 matched candidates fitting these stringent criteria.
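
The matching step can be sketched in a few lines. This is a generic propensity-matching recipe - fit a model predicting who sought coaching, then pair each coached candidate with the nearest uncoached candidate on that score - not the authors' exact procedure, and the covariate matrix and function name are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def match_coached_to_uncoached(X, coached):
    """1:1 nearest-neighbour propensity match without replacement.

    X: (n, k) covariate matrix (e.g. demographics, prior results, other
    test preparation); coached: boolean array. Returns (coached, uncoached)
    index pairs. Assumes more uncoached than coached candidates.
    """
    propensity = LogisticRegression(max_iter=1000).fit(X, coached).predict_proba(X)[:, 1]
    pool = list(np.flatnonzero(~coached))
    pairs = []
    for i in np.flatnonzero(coached):
        j = min(pool, key=lambda k: abs(propensity[k] - propensity[i]))
        pairs.append((i, j))
        pool.remove(j)          # match without replacement
    return pairs
```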

Examining the August performance alone, it appeared that coaching did have an effect: coached candidates scored an average of 1.5 points higher than their matches, an effect size of around .3. But by comparing difference scores - how much candidates improved between July and August - the team found that coached candidates improved by 2.5 points more than the uncoached, an effect size of around .5. This is because the candidates who opted for coaching had, on average, been weaker performers the first time around - possibly one reason they invested in assistance. This effect size is fairly large - a boost of half a standard deviation - especially compared with those for coaching on cognitive tests, which fall between .1 and .15.
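
The difference-score comparison is simple arithmetic; here's a sketch of the effect-size calculation, with placeholder arrays of July and August scores.

```python
import numpy as np

def gain_effect_size(july_coached, aug_coached, july_uncoached, aug_uncoached):
    """Cohen's d on July-to-August gain scores: coached vs. matched uncoached."""
    gain_c = aug_coached - july_coached
    gain_u = aug_uncoached - july_uncoached
    pooled_sd = np.sqrt((gain_c.var(ddof=1) + gain_u.var(ddof=1)) / 2)
    return (gain_c.mean() - gain_u.mean()) / pooled_sd
```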

SJTs are popular with candidates, being intuitive and overtly job-relevant. Employers are also fans: SJTs are strongly predictive of relevant job performance, with incremental value over and above that supplied by ability tests, and have less adverse impact, with demographic groups typically showing small average differences in performance. But this evidence suggests that their results can be shifted by coaching. Does the coaching increase the underlying ability? It may do, but programmes tend to focus on 'teaching to the test' rather than broader ability, meaning results may be distorted. The researchers suggest this needs investigating, and that test developers explore different scoring techniques and broaden the attributes assessed by SJTs to make them harder to exploit.

Lievens, F., Buyse, T., Sackett, P., & Connelly, B. (2012). The effects of coaching on situational judgment tests in high-stakes selection. International Journal of Selection and Assessment, 20(3), 272-282. DOI: 10.1111/j.1468-2389.2012.00599.x

Friday, 13 January 2012

2012 resolution: make better selection decisions


A simple resolution, but how to go about it?


1. Review practices to align with your organisation's unique context. In aggregate, companies using 'best practice' approaches such as ability tests, structured interviews and monitoring recruitment sources do no better than those that don't use these methods. This tells us that it isn't about slavishly following the right formula, but about weighing what's been shown to work elsewhere against your understanding of your organisation's local context. So consider the recommendations below in this light.

2. Consider introducing well-designed, low-effort assessments. Research suggests that automated assessments such as tests of knowledge or situational judgement, when well designed, can do the job virtually as well as more intensive face-to-face assessment. Again, this will depend on your organisation and industry, but it may be worth exploring.

3. Develop a policy on checking out job applicants online. Recruiters can find it tempting to google applicants or peruse their social networking profiles, getting free, quickly accessible, and otherwise hidden information about them. But there are questions about its fairness, the risk of candidates feeling their privacy has been invaded, and the possibility that it leads to decisions that can't be defended. It's probably already going on in your organisation, so establish some ground rules for how you approach it.

4. Provide focused training to people who play roles in assessment simulations. In particular, evidence suggests focused training helps role players to introduce pre-determined prompts to nudge candidates into showing (or failing to show) critical behaviours; it appears that this may lead to more accurate ratings in some areas.

5. Be realistic about what you are actually measuring. Interview overall scores are strongly influenced by the picture gained in the early, rapport-building minutes. Happily, it seems this isn't simply bias, but reflects some genuine information being picked up - for instance, verbal ability and some personality factors. Why not recognise this, perhaps by assigning quick ratings after that initial period?

Meanwhile, and more alarmingly, some researchers suggest that assessment scores of all kinds are heavily influenced by a personal attribute called the 'ability to identify criteria' (ATIC). Again, ATIC does seem to be a good predictor of workplace success in itself, but in both these examples the point is the risk of assuming we are measuring one thing - e.g. the competency 'Leading for Success' - when in fact we are measuring another.

And finally....

When I decided to exit research and enter bleary-eyed into the Real World(TM), I was concerned that having a PhD might be a disadvantage. Things turned out OK for me, but it seems my fears were well-founded: recruiters do see overqualification as a potential reason not to employ someone. Yet there are a host of reasons why overqualified applicants may be a great addition to your organisation. So reconsider how you approach overqualified candidates.

Monday, 21 November 2011

Provoking behaviour: training roleplayers at assessment centres

Assessment days for evaluating work-relevant behaviours of applicants or job incumbents often draw on actors to perform as difficult team-members or curious clients in meeting simulations. A recent study has shown that these role-playing actors can be trained to effectively weave pre-written dialogue prompts into the improvised simulations. However, whether this helps measurement of participant behaviours is less clear.

The study authors, Eveline Schollaert and Filip Lievens, gave 19 role-players training which in one condition included explicit guidance on using behaviour-eliciting prompts during assessment exercises - for example, "Mention that you feel bad about it" to provoke behaviours relating to a dimension of interpersonal sensitivity. Such prompts are often provided in preparation material, but how much they are actually used was unknown. The authors wondered whether role-players could realistically increase their prompt usage through training, or whether this is too much to ask of an actor in the thick of a dynamic interaction.

At a subsequent assessment centre, the role-players interacted in simulations with 233 students from Ghent University. Role-players with prompt training incorporated four to five times more prompts than those without such training - an increase from about two prompts per exercise to 10-12.

More prompts ought to elicit more relevant behaviours, so the authors expected observers to get a better picture of true 'candidate' performance. But this isn't clear-cut. In the high-prompt condition, pairs of raters watching the same role-play agreed no more closely in their ratings, suggesting the behaviours remained just as obscured as without prompts. That said, some ratings corresponded better with other measures you would expect them to relate to - for instance, interpersonal sensitivity correlated more strongly with an Agreeableness personality score collected before the centre. But half of the predicted increases in correlation weren't observed.

Regarding the unsupported hypotheses, the authors wonder whether the rating assessors should also have been trained on prompt use, to encourage sensitivity to candidate reactions. I have additional concerns about the nature of the assessors - minimally trained masters students - being used to draw conclusions about a professionalised domain. Nonetheless, this rare examination of role-player impact on face-to-face assessments suggests training can generate more dimension-focused contributions, which in turn may yield measurements with more predictive power.

Schollaert, E., & Lievens, F. (2011). The use of role-player prompts in assessment center exercises. International Journal of Selection and Assessment, 19(2), 190-197. DOI: 10.1111/j.1468-2389.2011.00546.x

Friday, 16 September 2011

Can we get away with using lo-fi assessment to recruit for advanced positions?

In recruitment, the promise of comparable results for less effort is understandably tempting. It's the promise made when costly assessments are offset with alternative measures that use pencils, screens and standardised questions instead of expert assessors. However, with some sources suggesting a bad hire can cost twice or more that position's annual salary, the stakes are high. A new study kicks some assessment tyres to see whether that bargain is actually a banger.

Researchers Filip Lievens and Fiona Patterson looked at recruitment into advanced roles, which typically seek the skills and knowledge to hit the ground running. They took their sample of 196 successful candidates from the UK selection process for General Practitioners (GPs) in medicine. By this point a candidate has completed two years of basic training and up to six years of prior education, so the employer is after someone ready to go, not a future 'bright star'. Lievens and Patterson were specifically interested in how much assessment fidelity matters: the extent to which the assessment task and context mirror those of the actual job.

Three types of assessment were involved, all designed by experienced doctors with assistance from assessment psychologists. Written tests assessed declarative knowledge through diagnostic dilemmas such as "a 75-year-old man, who is a heavy smoker, with a blood pressure of 170/105, complains of floaters in the left eye". Assessment centre (AC) simulations, meanwhile, probed skills and behaviours in open-ended, live situations such as an emulated patient consultation; these tend to be more powerful predictors of job performance, but are costly.

The third was the situational judgement test (SJT), a pencil and paper assessment where candidates select actions in response to situations, such as a senior colleague making a non-ideal prescription. SJTs are considered by many to be “low-fidelity simulations”, losing their open-endedness and embodied qualities, but hanging on to the what-would-you-do-if? focus. The authors were interested in whether its predictive power would be in the same class as the AC simulations, or mirror the more modest validity of its pencil and paper counterpart.

The data showed that all the assessments were useful predictors of job performance, as measured by supervisors after a year in the role. Both types of simulation - AC and SJT - provided additional insight over and above that given by the rather disembodied knowledge test, each explaining about a further 6% of the variance. Compared with each other, though, the simulations were difficult to tell apart, with no significant difference in how well they predicted performance.
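
For readers who like to see the arithmetic, the 'further 6% of variance' is an incremental R² from hierarchical regression: fit job performance on the knowledge test alone, then add a simulation score and compare. A minimal sketch with placeholder score arrays:

```python
import numpy as np
import statsmodels.api as sm

def incremental_r2(knowledge, simulation, performance):
    """Extra variance in job performance explained when a simulation score
    (AC or SJT) is added to a knowledge-test score."""
    base = sm.OLS(performance, sm.add_constant(knowledge)).fit()
    full = sm.OLS(performance,
                  sm.add_constant(np.column_stack([knowledge, simulation]))).fit()
    return full.rsquared - base.rsquared
```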

It should be noted that the AC simulations did capture some variance over and above the SJT, notably relating to non-cognitive aspects of job performance such as empathy - important because such areas are less trainable than clinical expertise. However, this extra insight was fairly modest, just a few percentage points of variance. More expensive AC assessments can provide additional value, but the study suggests that, at least in this recruitment domain, you can get away with a loss of fidelity if the assessments are appropriately designed.

Lievens, F., & Patterson, F. (2011). The validity and incremental validity of knowledge tests, low-fidelity simulations, and high-fidelity simulations for predicting job performance in advanced-level high-stakes selection. Journal of Applied Psychology, 96(5), 927-940. DOI: 10.1037/a0023496

Monday, 5 September 2011

How much should we trust job applicant ratings of their own emotional intelligence?

Self-rating is a popular way to measure emotional intelligence (EI) in the workplace. Under lab conditions, it's been shown that these ratings vary depending on what your (imaginary) objective is: giving a 'true' picture, or successfully winning a job. A new study translates this lab finding to the workplace, finding that job applicants really do rate themselves higher on EI than counterparts already working in the organisation.

The study compared scores for 109 job applicants with those of 239 incumbent volunteers, matched by department and managerial level. They rated themselves on four classic components of EI: self emotion appraisal, others' emotion appraisal, use of emotion, and regulation of emotion. Applicants significantly outscored incumbents in all areas, on average rating themselves more than a standard deviation higher. The areas of greatest divergence were use of emotions and regulation of emotions, which have much in common with the Big Five traits of conscientiousness and emotional stability - traits we know job applicants tend to inflate.

On all but one of the components, applicant scores were significantly more bunched together than incumbent scores, which could be read as further evidence that they were manufactured, with candidates homing in on scores that were solidly good, avoiding suspiciously high or unhelpfully low values.
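
Both observations - the inflated means and the bunched distributions - come down to two standard statistics. Here's a sketch with placeholder score arrays; the use of Levene's test for the variance comparison is my assumption, not necessarily the paper's.

```python
import numpy as np
from scipy import stats

def compare_groups(applicants, incumbents):
    """Standardised mean difference (Cohen's d) plus a variance-equality
    check between applicant and incumbent self-ratings on one EI component."""
    pooled_sd = np.sqrt((applicants.var(ddof=1) + incumbents.var(ddof=1)) / 2)
    d = (applicants.mean() - incumbents.mean()) / pooled_sd
    stat, p = stats.levene(applicants, incumbents)  # are applicants more 'bunched'?
    return d, stat, p
```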

The study is important because in other areas of research, score discrepancies produced in the lab by different explicit instructions don't always surface in the real world, suggesting the overt nature of lab conditions can exaggerate or even manufacture differences. Yet here the effect appeared again. If we do want to rely on self-report to assess EI, we should recognise that this inflation may take place, and that relying on the normative data accompanying these tests may lead to unrealistically high appraisals of candidates.


Lievens, F., Klehe, U., & Libbrecht, N. (2011). Applicant versus employee scores on self-report emotional intelligence measures. Journal of Personnel Psychology, 10(2), 89-95. DOI: 10.1027/1866-5888/a000036

Monday, 29 August 2011

Are job selection methods actually measuring 'ability to identify criteria'?



While we know that modern selection procedures such as ability tests and structured interviews successfully predict job performance, it's much less clear how they pull off those predictions. The occupational psychology process - and thus our belief system of how things work - is essentially: a) identify what the job needs; b) distil this into measurable dimensions; c) assess performance on those dimensions. But a recent review article by Martin Kleinmann and colleagues suggests that in some cases we may largely be assessing something else: the 'ability to identify criteria'.

The review unpacks a field of research that recognises that people aren't passive when being assessed. Candidates try to squirrel out what they are being asked to do, or even who they are being asked to be, and funnel their energies towards that. When the situation is ambiguous - a so-called 'weak' situation - those better at squirrelling, those with high 'ability to identify criteria' (ATIC), will put on the right performance, and those who are worse will put on Peer Gynt for the panto crowd.

Some people are better than others at guessing what an assessment is measuring, so in itself ATIC is a real phenomenon. And the research shows that higher ATIC scores are associated with higher overall assessment performance, and better scores specifically on the dimensions candidates correctly guess. ATIC clearly has a 'figuring-out' element, so we might suspect its effects are an artefact of a strong association with cognitive ability, itself associated with better performance in many types of assessment. But if anything the evidence works the other way: ATIC has an effect over and above cognitive ability, and it seems possible that cognitive ability boosts assessment scores mainly through its contribution to the ATIC effect.

In a recent study, ATIC, assessment performance and candidate job performance were examined within a single selection scenario. Remarkably, it found that job performance correlated better with ATIC than with the assessment scores themselves. In fact, the relationship between assessment scores and job performance became non-significant after controlling for ATIC. This raises the provocative possibility that the main reason assessments are useful is as a window onto ATIC, which the authors consider "the cognitive component of social competence in selection situations". After all, many modern jobs, particularly managerial ones, depend on figuring out what a social situation demands of you.

So what to make of this, especially if you are an assessment practitioner? We must be realistic about what we are really assessing, which in no small part is 'figuring out the rules of the game'. If you're unhappy about that, there's a simple way to wipe out the ATIC effect: make the assessed dimensions transparent, turning the weak situation into a strong, unambiguous one. Losing the contamination of ATIC gives more accurate measures of the individual dimensions you decided were important. But your overall prediction of job performance will be weaker, because you've lost the ATIC factor, which does genuinely seem to matter. And while no one is suggesting it's all that matters in the job, it may be the aspect of work that assessments are best positioned to pick up.

Kleinmann, M., Ingold, P., Lievens, F., Jansen, A., Melchers, K., & König, C. (2011). A different look at why selection procedures work: The role of candidates' ability to identify criteria. Organizational Psychology Review, 1(2), 128-146. DOI: 10.1177/2041386610387000