Posted by Saundra Foderick
Testing programs often hear candidates say that the test they took “wasn’t fair.”
As you might expect, there is a tendency to brush off such complaints: programs work hard to ensure that their tests are fair.
(You can learn more about the major buckets of unfairness in testing in this article, and how exam programs commonly work to address that unfairness in this article.)
But what if the two views are based on slightly different expectations when it comes to fairness?
In the end, candidates, decision makers, and programs have similar goals.
Programs want to obtain information about—and attest to—each candidate's true skills and knowledge in a given area. Whether the program is conferring a degree, a certification, or a badge, it is giving a seal of approval to the candidates who pass its exam. So, the program needs to fairly assess each candidate's performance on the exam.
The program wants that seal of approval to be accepted and valued, hoping that successful candidates will display pride and even become evangelists for the program. The program also wants decision makers (from employers to department chairs) to place trust and value in the fairness of the exam’s results and use those results in making decisions about the candidates.
Similarly, candidates want to feel that they are fairly assessed and that the exam is worth their time and expense. They want to feel that success with the exam is something they can be proud of.
On the other hand, decision makers need to be able to trust that the test results a candidate has received are an accurate representation of that examinee’s skills. These test results guide them as they make important employment and educational decisions.
In both of these cases, fairness—and perceived fairness—matters. A well-designed test development program can help to ensure fairness at every step.
From job task analyses (JTAs) to blueprinting, item writing, standard-setting, publication, and beyond, there are many areas where fairness impacts a test, for better or for worse.
Many exam programs outsource their test development and item design efforts to professional services (like this award-winning test development team) to ensure their exams are fair and psychometrically sound from start to finish.
But there are ideas and techniques that can move any exam from good to great. Here are the categories to focus on if you want to move your exam from technically fair to both technically and experientially fair.
Start with the job task analysis. A while back, for example, I was asked to figure out why candidates were highly critical of an MBA program's flagship exam. The program had been supportive in terms of time and costs, and the environment for the initial JTA was collegial and interesting. The experts were true subject matter experts (SMEs), most holding doctorates in the subject area and C-suite leadership positions in their respective companies. Still, the candidates surveyed said that the exam was not fair because it did not cover what mattered in real-world situations. They were angry.
Where did the process go wrong? Interestingly, the problem was that an inexperienced person had selected the SMEs. Each SME had years of experience far beyond the master's level and was surrounded by highly skilled, competitive peers. Few of those SMEs could think back to what they had needed to know to succeed at the master's level. Being competitive, the SMEs often got sidetracked, adding objectives that covered esoteric, obscure, or nice-to-know trivia in order to display their own skills and knowledge.
The result? After sufficient complaints, the program brought in a skilled facilitator and started from square one. As you might imagine, though, even with an updated competency model leading to a relevant and up-to-date exam, the program's reputation suffered. When people feel that they have not been treated fairly, they tend to share that feeling with others, and the program's candidates did so, leading to distrust and suspicion.
Technical fairness and experiential fairness matter in item writing too. True, there is a world of difference between a poorly constructed item bank, where test-wise candidates can gain an unfair advantage and neurodiverse candidates are disadvantaged, and a bank created using standard best practices that prevent unfair advantages and harms. Still, there are additional steps that can increase both the perception of fairness and actual fairness.
Well-designed objectives promote fairness. Each objective should require a task that is truly necessary to display competence in the area being tested. And, taken together, the objectives should be sufficient to prove competency: not one objective more, and not one fewer, than is needed to fairly determine whether each candidate is competent in the required field or needs further education or experience. Candidates feel fairly treated when the requirements are clear and can be assessed objectively as fully met or failed.
Even test and item format can promote fairness. Program style guides that strictly enforce a common look and feel across a program's test items provide the familiarity that lets candidates concentrate on the task at hand instead of on the wording or format that delineates the task. Within style guides, programs can require item types that are shown to promote comprehension across a wide range of neurodiverse candidates. And the techniques that provide a fairer playing field for some neurodiverse candidates help neurotypical candidates as well.
For example, simple techniques such as reducing the reading load of any given item can ensure that weak readers and non-native speakers are not disadvantaged. Bullets and chunking reduce reading load and remove the distracting clutter that disadvantages some neurodiverse candidates. Using simple language and discouraging complex sentence structures can be equally helpful. In the same way, it helps to pay strict attention to terms that carry multiple connotations, and to avoid using several similar words for a single action or thing, since these choices add the kind of cognitive load that distracts candidates from the task at hand.
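One way to make reading load concrete is to screen item stems against a readability metric during item review. Below is a minimal Python sketch, assuming a crude Flesch-Kincaid grade-level estimate; the item IDs, stems, and grade ceiling are all illustrative, not part of any standard tooling.

```python
import re

def count_syllables(word: str) -> int:
    """Crude syllable estimate: each run of vowels counts as one syllable."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def fk_grade(text: str) -> float:
    """Flesch-Kincaid grade level for a block of text."""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    n_words = max(1, len(words))
    return 0.39 * (n_words / sentences) + 11.8 * (syllables / n_words) - 15.59

# Screen each stem against an illustrative reading-level ceiling.
item_stems = {
    "ITEM-001": "Pick the command that lists the files in a folder.",
    "ITEM-002": ("Notwithstanding the aforementioned configuration parameters, "
                 "ascertain which invocation enumerates directory contents."),
}
for item_id, stem in item_stems.items():
    grade = fk_grade(stem)
    if grade > 8.0:  # hypothetical ceiling; each program sets its own
        print(f"{item_id}: estimated grade level {grade:.1f}; consider simplifying")
```

A screen like this will not catch every comprehension problem, but it gives reviewers an objective starting point for the simplification work described above.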
Does fairness matter in standard setting? Of course. A test form with many easy items will certainly ensure that most competent candidates pass. But that same form will almost certainly deem at least a few incompetent candidates competent. Why should that matter? Each candidate who passes the exam without the needed skills and knowledge is a candidate who can harm a program's reputation or a client's project. These candidates can also cheapen the value of the exam in the eyes of decision makers and of those who worked hard to display true competency.
An exam with items that are too difficult is unfair as well. While the program may ensure that the unprepared are not deemed competent, some candidates with the needed skills and knowledge will be categorized as incompetent, possibly incurring the time and expense of additional study and retakes. Again, candidates talk about exams: an exam where almost everyone passes and an exam where almost everyone fails are both taken as signs of unfair testing practices. Those who are certified may feel less pride in their accomplishment, while those who fail will feel hard done by.
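To see why both extremes hurt, consider a toy simulation. The sketch below assumes a simple Rasch response model, a fixed raw cut score, and a pool where "competent" means ability above zero; every number is illustrative rather than a recommendation.

```python
import math
import random

random.seed(7)

def p_correct(ability: float, difficulty: float) -> float:
    """Rasch model: chance that a candidate answers one item correctly."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

def misclassification(form_difficulty: float, n_items: int = 50,
                      cut: int = 25, trials: int = 2000):
    """Estimate false-pass and false-fail rates for one form difficulty."""
    false_pass = false_fail = 0
    for _ in range(trials):
        ability = random.gauss(0.0, 1.0)      # simulated candidate pool
        score = sum(random.random() < p_correct(ability, form_difficulty)
                    for _ in range(n_items))
        if score >= cut and ability < 0.0:    # unprepared, but passed
            false_pass += 1
        elif score < cut and ability >= 0.0:  # competent, but failed
            false_fail += 1
    return false_pass / trials, false_fail / trials

for d in (-2.0, 0.0, 2.0):  # very easy, well targeted, very hard forms
    fp, ff = misclassification(d)
    print(f"form difficulty {d:+.1f}: false-pass {fp:.1%}, false-fail {ff:.1%}")
```

With the cut score held fixed, the easy form waves through candidates who are below the standard, the hard form fails candidates who are above it, and only the well-targeted form keeps both error rates low.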
Fairness is part of publication and delivery too. For example, presenting on-screen text in a serif font with high contrast lets the candidate focus on the required tasks instead of on the interface. The ability to control text size and some aspects of the interface can make the difference between an anxious candidate and one who can devote complete attention to the task required. Even something as simple as the ability to display or conceal a timer, or the number of items completed, can remove a barrier to concentration, and with it a barrier to success.
Similarly, single sign-on (SSO) and a reliable platform can reduce stress right before the start of the exam, removing a source of anxiety that might interfere with a candidate's ability to shine. Mobile-friendly delivery can provide fairer access for candidates who lack particular devices or who face tight schedules. Ensuring compatibility with standard screen readers, using web coding that works with common tablets, and permitting multiple types of input devices all help a wide range of learners display their skills and knowledge instead of wrestling with factors that are irrelevant to the subject being tested. Candidates who run into these sorts of barriers will not feel that they have been fairly assessed.
Of course, any discussion of fairness needs to consider how test theft and cheating affect a program's fairness and perceived fairness. If a candidate believes that a significant number of peers are cheating, that candidate will either try to level the playing field by cheating as well or be certain that the results are unfair. Even those who cheat are likely to feel unfairly treated if they think others are cheating. A well-designed program can deter cheating and test theft by reducing the exposure of any one item, whether through automatic item generation (AIG), a novel item type such as Discrete Option Multiple Choice (DOMC), coded items such as SmartItems, or a large, rotating item pool. Monitoring for exposure and having a clear, even-handed response plan in place can increase fairness as well.
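Exposure monitoring can start as something as simple as counting how often each item is seen. The Python sketch below assumes a delivery log of (candidate, item) pairs; the log contents, item IDs, and the 25% ceiling are all hypothetical.

```python
from collections import Counter

def exposure_report(delivery_log, max_rate=0.25):
    """Flag items administered to more than max_rate of all candidates.

    delivery_log: iterable of (candidate_id, item_id) pairs, one pair
    per item actually shown to a candidate.
    """
    log = list(delivery_log)
    n_candidates = len({cand for cand, _ in log})
    seen = Counter(item for _, item in log)
    return {item: count / n_candidates
            for item, count in seen.items()
            if count / n_candidates > max_rate}

# Hypothetical log: item i1 appears on every candidate's form.
log = [("c1", "i1"), ("c1", "i2"), ("c2", "i1"), ("c2", "i3"),
       ("c3", "i1"), ("c3", "i4"), ("c4", "i1"), ("c4", "i2")]
print(exposure_report(log))  # {'i1': 1.0, 'i2': 0.5} -> rotate or retire
```

Items that cross the ceiling become candidates for rotation, replacement, or one of the low-exposure designs mentioned above.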
Remember those candidates who talk about fairness? They also share their impressions of how fairly review processes work. If they see clear and fair consequences for those who cheat or steal exam content, they will share that as well, confident that misbehavior did not confer an unfair advantage and that the process produced a fair result.
To close the loop, truly excellent programs conduct regular form, item, and option audits, identifying items that do not fairly assess candidates' skills and knowledge and updating or replacing those that do not provide actionable information. As you might expect, a high pass rate does not guarantee fair results, and a low pass rate does not guarantee high standards. Worse, neither will be perceived as fair by those who take the exam. A performance audit identifies not only how many candidates miss a given item or choose a specific distractor, but also which candidates miss it. For example, if an item is answered correctly by many of those who fail the exam but seldom by high-scoring candidates, it may be that the more competent candidates recognize a complexity or ambiguity that lower-scoring candidates do not. When a program finds a poorly performing item and fixes the problem, the test becomes fairer for all.
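That pattern, low scorers succeeding where high scorers stumble, is exactly what a negative item-total correlation (the point-biserial) flags. Here is a minimal Python sketch; the response matrix is made up, and a real audit would use far larger samples and item-excluded totals.

```python
import math
import statistics

def point_biserial(item_scores, total_scores):
    """Correlation between a 0/1 item score and candidates' total scores.

    A negative value means low scorers beat high scorers on the item,
    a classic signal that the item or its answer key needs review.
    """
    n = len(item_scores)
    p = sum(item_scores) / n                  # proportion answering correctly
    sd = statistics.pstdev(total_scores)
    if sd == 0 or p in (0.0, 1.0):
        return 0.0
    mean_all = statistics.fmean(total_scores)
    mean_correct = statistics.fmean(
        t for i, t in zip(item_scores, total_scores) if i == 1)
    return (mean_correct - mean_all) / sd * math.sqrt(p / (1 - p))

# Made-up response matrix: rows are candidates, columns are items (1 = correct).
responses = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [0, 1, 1, 0],
    [0, 0, 1, 0],
]
totals = [sum(row) for row in responses]
for j in range(len(responses[0])):
    r = point_biserial([row[j] for row in responses], totals)
    note = "  <- review: low scorers outperform high scorers" if r < 0 else ""
    print(f"item {j}: r_pb = {r:+.2f}{note}")
```

An audit that sorts items by statistics like this turns "the exam felt unfair" into a concrete, fixable list of problem items.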
Fairness and perceived fairness matter to test takers, decision makers, and exam programs alike. From job task analyses (JTAs) to blueprinting, item writing, standard-setting, publication, and beyond, there are techniques that can move an exam from good to great. In order to ensure your exam is both technically and experientially fair, it is critical to focus on fairness throughout the entire exam development process.