A guide to the validity and reliability of psychometrics
With more and more businesses opting to incorporate a psychometric test in their recruitment process, it’s becoming increasingly important to ensure that the assessment you’re using is scientifically robust. At the end of the day, the whole point of including psychometrics is to make better and quicker hiring decisions — if they’re not accurate, they’re a waste of time and money!
To find out which assessments are accurate, psychologists use measures like validity and reliability. Together, they serve as indicators of a test’s quality, so it’s essential that you understand both terms and ask a potential vendor for details: an assessment with high validity and reliability is one that you can safely use to help determine which candidate should be hired. But what do reliability and validity actually mean?
What are reliability and validity?
Any scientific tool is only as useful as it is accurate, and that accuracy has to be measured. Reliability and validity are the main measures of a psychometric test’s quality: if you don’t know how reliable and valid an assessment is before purchasing it, you may end up with inaccurate or inconsistent results. But what are validity and reliability, and what’s the difference between them?
Validity
Simply put, psychometric validity is the degree to which the assessment measures what it claims to measure. It’s the answer to the question ‘to what extent do the results of this test indicate what they purport to measure?’.
Reliability
Psychometric reliability is the ability of the assessment to produce consistent results. In other words, a reliable assessment will give the same person similar results regardless of the time, setting, or circumstances.
What’s the difference between validity and reliability?
Although connected, validity and reliability indicate two separate types of accuracy. While validity requires reliability, an assessment can be reliable but not valid. For example, someone taking the test might score very high for ‘extraversion’ every time. This means the result can be reproduced (indicating the test is reliable), but if the candidate is not actually extraverted, the test is not valid.
If you’d like a more concrete example, imagine an armed police officer at a gun range. Their goal is to hit the target (validity) every time (reliability). If they hit the heart every single time, they are a fantastic marksperson: a shooter who is both valid and reliable. However, if they consistently hit the shoulder, they are reliable but not valid. Alternatively, if they hit the heart only with every 5th attempt, they are neither valid nor reliable.
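For readers who like numbers, the gun range picture can be turned into a tiny sketch. It assumes each shot is recorded as an (x, y) offset from the heart, and it uses two made-up summary figures: `bias` (how far the average shot lands from the target, standing in for validity) and `spread` (how tightly the shots cluster, standing in for reliability). The function name and the shot data are hypothetical and only illustrate the analogy.

```python
from math import hypot
from statistics import mean


def describe_shots(shots, target=(0.0, 0.0)):
    """Return (bias, spread) for a list of (x, y) hit positions.

    bias   = distance from the average hit to the target (low bias ~ valid)
    spread = average distance of each hit from the average hit (low spread ~ reliable)
    """
    centre_x = mean(x for x, _ in shots)
    centre_y = mean(y for _, y in shots)
    bias = hypot(centre_x - target[0], centre_y - target[1])
    spread = mean(hypot(x - centre_x, y - centre_y) for x, y in shots)
    return bias, spread


# Hypothetical shots: one tight group on the heart, one tight group on the shoulder.
heart_group = [(0.1, -0.1), (-0.1, 0.1), (0.0, 0.1), (0.0, -0.1)]
shoulder_group = [(5.1, 3.9), (4.9, 4.1), (5.0, 4.1), (5.0, 3.9)]

heart_bias, heart_spread = describe_shots(heart_group)
shoulder_bias, shoulder_spread = describe_shots(shoulder_group)
```

Both groups have the same small spread, so both shooters are equally reliable, but only the first has a bias close to zero, so only the first is valid.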
Why are psychometric reliability and validity important?
It’s not an exaggeration to say that, without reliability and validity, psychometric tests are pointless. The whole purpose of using an assessment is to inform your decision-making process with hard evidence that is objective and scientific. If the assessment you’ve picked is not highly valid and reliable, you are simply not getting objective and scientific data. This is why it’s so important both to understand the terminology and to request information from a vendor before you make your purchase.
How to tell if a psychometric assessment is valid?
As you can see, validity is extremely important for psychometric testing. An assessment that has low validity will cause more harm than good, as it will change your perception of a candidate in a way that does not correspond to their real self, which defeats the purpose of introducing psychometrics into your hiring process in the first place.
What factors influence psychometric validity?
In psychometrics, validity means ensuring that the results accurately reflect a person’s traits, behaviours, and cognitive abilities (depending on the type of assessment). The questions and words that are used to describe certain qualities, for example, could have an important impact on the validity of psychometric tests.
One of the biggest pitfalls when it comes to validity in psychometrics is the sample of participants used during the design stage. Age, gender, language, and culture (the list goes on) are all factors in the way we understand, analyse, and output information. Focusing on one group of people with similar characteristics is a simple way to make the assessment not only biased but also invalid for everyone else, as it only applies to one population. Instead, psychologists must opt for a large and heterogeneous sample to show that their assessment is valid across time, space, and cultures.
How is psychometric validity measured?
Validity is measured using a variety of data points and insights about the relationship between the test and the traits or abilities it evaluates. There are a number of types of validity, each stressing a different type of accuracy and each measured with different data points. To show that an assessment is, in fact, valid, psychologists must show it is valid across several of these types, especially for more recent assessments and less established psychological constructs.
Types of psychometric validity
- Content validity: The degree to which the content (for example, items/questions or subsections) adequately reflects the relevant construct.
- Face validity: The degree to which a tool appears to be an adequate reflection of what it is supposed to measure. Of course, this is a very weak type of validity — it’s important, but not adequate just on its own.
- Construct validity: The degree to which scores are consistent with the hypothesis based on the abstract construct — is it actually measuring the theoretical component of the construct?
- Criterion/Criterion-based validity: The degree to which the results correspond to established benchmarks and gold standards.
- Discriminant validity: The degree to which constructs that should have no relationship are actually unrelated.
- Predictive validity: The degree to which the scores of one tool can be used to predict the future scores on a criterion measure, such as job performance.
- Cross-cultural validity: The degree to which a culturally-adapted tool is equivalent to the original tool.
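To make predictive validity a little more concrete: it is commonly summarised as a correlation between assessment scores and a later outcome, such as job performance. The sketch below is only an illustration. It assumes Pearson’s correlation as the statistic, and the function name `pearson_r` and all of the numbers are made up for this example rather than taken from any real assessment.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / sqrt(var_x * var_y)


# Hypothetical data: assessment scores at hiring time, and manager
# performance ratings for the same six people a year later.
assessment_scores = [52, 61, 70, 45, 80, 66]
performance_ratings = [3.1, 3.4, 4.0, 2.8, 4.5, 3.6]

validity_coefficient = pearson_r(assessment_scores, performance_ratings)
```

A coefficient close to 1 would suggest the scores track later performance closely, and a coefficient near 0 would suggest they don’t. With a sample this small the figure means very little, which is one reason to ask a vendor how large their validation sample was.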
How to tell if a psychometric assessment is reliable?
An unreliable assessment can cause serious problems: you can’t depend on a test that might give different results at different times. It’s incredibly important to ensure a psychometric test is reliable.
What factors influence psychometric reliability?
Making a reproducible test is no joke. Constructing the questions and instructions correctly, for example, is essential: if they’re too ambiguous or difficult, people might answer them differently at different times. The same goes for the scoring system, which needs to be clear, consistent, and based on evidence. Flaws like these, which stem from the design of the test, are called systematic errors.
However, there are also some subjective elements that can’t be easily controlled, such as the environment in which the test is taken, human error, or the test-taker not answering truthfully. These are referred to as unsystematic errors: issues that arise from one individual’s particular sitting of the test.
How is psychometric reliability measured?
You’d think testing for reliability would be a piece of cake: simply ask the same person to take the test multiple times. It even has a name: test-retest. The reality is, however, that whilst it is in fact the best way to measure reliability, it’s rarely used. This is because it’s pricey and risky. Firstly, psychologists would have to assess an individual at least twice, which costs money. But attrition is the real problem: the same person might not come back to be tested again, and if they don’t, it would be difficult for a scholar to know why they withdrew. Is it because of the test itself (for example, due to a low score)? If that’s the case, the people who do come back are no longer a representative sample, which distorts the reliability estimate.
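As a rough sketch of how a test-retest check could be run, the example below correlates the scores of everyone who completed both sittings and reports how many people dropped out in between. It assumes Pearson’s correlation as the reliability statistic; the function name `retest_reliability`, the candidate IDs, and the scores are all hypothetical.

```python
from math import sqrt


def retest_reliability(first_sitting, second_sitting):
    """Correlate scores for people who completed both sittings.

    Both arguments map a candidate ID to a score.
    Returns (correlation, attrition_rate).
    """
    returned = sorted(set(first_sitting) & set(second_sitting))
    if len(returned) < 2:
        raise ValueError("need at least two people who sat the test twice")
    xs = [first_sitting[c] for c in returned]
    ys = [second_sitting[c] for c in returned]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when all scores are equal")
    # Share of first-sitting candidates who never came back.
    attrition_rate = 1 - len(returned) / len(first_sitting)
    return cov / sqrt(var_x * var_y), attrition_rate


# Hypothetical scores: candidate "e" did not return for the second sitting.
first = {"a": 30, "b": 42, "c": 55, "d": 61, "e": 48}
second = {"a": 32, "b": 40, "c": 57, "d": 60}

retest_r, attrition = retest_reliability(first, second)
```

Note that the correlation only covers the people who came back. The attrition rate is reported alongside it because, as described above, a high drop-out rate makes the correlation harder to trust.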
There are some other ways to measure reliability, though.
Types of psychometric reliability
- Test-retest reliability: The same test is given to the same people at different times and the results are consistent, as mentioned above.
- Internal consistency reliability: Items within the test are evaluated for whether each one measures the same thing as the assessment as a whole, so the scores on each item should correlate with the overall score. This is the preferred method psychometricians use today.
- Parallel forms reliability: Two different versions of the test cover the same content with different items and produce consistent results.
- Inter-rater reliability: Two different raters score the assessment in the same manner and get similar results.
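Two of the reliability types above are often reported as a single number, and the sketch below shows one common choice for each: Cronbach’s alpha for internal consistency and Cohen’s kappa for inter-rater reliability. These are assumptions for illustration only, since a given vendor may use different statistics, and the function names and data are invented for this example.

```python
from collections import Counter
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of rows (one row of item scores per respondent)."""
    n_items = len(responses[0])
    if n_items < 2 or len(responses) < 2:
        raise ValueError("need at least 2 items and 2 respondents")
    items = list(zip(*responses))  # one tuple of scores per item
    item_variance_sum = sum(variance(item) for item in items)
    total_variance = variance([sum(row) for row in responses])
    return (n_items / (n_items - 1)) * (1 - item_variance_sum / total_variance)


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa: agreement between two raters, corrected for chance."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Agreement expected if both raters assigned labels independently.
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    return (observed - expected) / (1 - expected)


# Hypothetical data: 5 respondents answering 4 items on a 1-5 scale.
answers = [
    [4, 5, 4, 5],
    [2, 2, 3, 2],
    [3, 3, 3, 4],
    [5, 4, 5, 5],
    [1, 2, 1, 2],
]
alpha = cronbach_alpha(answers)

# Hypothetical data: two raters marking the same 8 written responses.
rater_1 = ["pass", "pass", "fail", "pass", "fail", "fail", "pass", "pass"]
rater_2 = ["pass", "pass", "fail", "fail", "fail", "fail", "pass", "pass"]
kappa = cohen_kappa(rater_1, rater_2)
```

For both statistics, values closer to 1 indicate more consistency. The kappa function does not handle the edge case where both raters give every response the same single label, as the chance-corrected agreement is undefined there.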
Are psychometric tests reliable and valid?
The short answer? Yes, psychometric tests can be highly reliable and valid. However, not all of them are, and that’s why it’s so important to understand what these terms mean and ask for information on how your vendor assesses these.
It’s important to note that validity and reliability are not black and white. They are measured on a spectrum: excellent, good, fair, poor. This is because, for an assessment to be considered highly valid and reliable, it requires a rigorous testing process from different angles. It is entirely possible for one assessment to receive different scores for reliability and validity in different studies.
The bottom line is that assessments that have been properly and scientifically studied can be as reliable and valid as many medical tests, and sometimes even more so! That said, while subjective factors can be minimised or contained, they never disappear entirely, so no assessment is 100% perfect. This is why, at Thrive, we recommend using a psychometric assessment alongside structured interviews.
Interested in a scientifically-robust, highly reliable and valid psychometric assessment for your candidates and staff? Book a demo with Thrive today.