Standardized Testing: How Tech Promotes True Standardization — Caveon

Written by David Foster, Ph.D. | June 29, 2021 at 8:06 PM

What Is Standardized Testing?

The goal of a standardized test is a noble one: to make everyone’s score valid and therefore truly comparable. In the past few decades, the way to do that has been to make sure that test takers see the same test content and have the same overall testing experience. The logic is that if everyone follows the same path, then test scores can be compared and important decisions about them can be fairly made.

How Exams Are Becoming Un-Standardized

Unfortunately, unintended influences on scores creep into (or even barge into) the process described above and un-standardize it. When this happens, the test can no longer be considered standardized, and the scores can no longer be compared fairly. To make matters worse, these influences often go undetected. While the variety of such influences is very broad, I’m going to limit my comments to two pervasive types of test fraud: cheating and test theft.

Cheaters and thieves are able to use new technologies to keep their activities covert while being successful at what they do. It is almost always the case that a testing program does not know that any particular test score was (or was not) affected by cheating. It is a most serious problem for our field when cheating happens and we are unable to trust our test scores.

Technology in Standardized Assessments: The Good and the Bad

Technology has often been called a two-edged sword, meaning it can be both helpful and harmful. Its benefits to the field of assessment are obvious—a program’s tests can be administered around the world and reach audiences never before possible, and exams can be rapidly scored to give quick feedback to the test taker, educators, and others. Additionally, technology through innovative item designs lets us measure skills in new ways and discover more about how students learn and how workers succeed. These are just a few examples of the hundreds of ways that technology is helpful.

With all the benefits of technology in the testing realm, at least one big dark side exists—technology also makes our exams and the scores they produce more vulnerable. Hidden cameras are used to record live test questions and share them on the internet, and then many test takers from around the world can easily pay for the answers to test questions, memorize them, and then use pre-knowledge to cheat on an exam—all in real time and without risk of getting caught. What an incredibly easy way to get a passing score, with little chance of punishment. Today, technology is used to create fake identification documents that look incredibly realistic. Those IDs are then used effectively by proxy test takers to impersonate actual test takers, again, with virtually no risk. The un-standardization of tests occurs under our noses on every meaningful exam.

What can we do about that un-standardization? Well, it turns out that we have options.

How Can Exams Become Truly Standardized?

Design and Create Exams that are Fraud-Proof

If we can’t detect cheating and test theft, then we need to set up testing conditions that prevent them (you can learn more about how to protect your exams here). This doesn’t mean hiring more proctors or further training the ones we already have; that path is a lost cause. Instead, we need to design and create a new breed of tests: tests that are fraud-proof. Theft can be eliminated, along with almost all forms of cheating. That is worth repeating: With technology, we can (today) stop virtually all forms of test fraud from influencing our test scores!

Utilize Unique Test Forms and Items for Each Test Taker

The tests that are designed, built, and administered today are sitting ducks for test fraud—how we build tests and how we administer tests today must be re-engineered. Test forms must be unique for each test taker, which means that items must also be unique. No more static, predictable tests with the same items (read more on this below). If possible—and it is—items and tests are used once per test taker and never again. I know this sounds unrealistic and impossible from traditional testing perspectives, but it is possible. In fact, it was envisioned and described by renowned testing experts.

For example, in the middle of the 20th century, Frederic Lord was able to imagine a large population or “universe” of thousands or millions of items completely representing a domain of knowledge or skill (you can learn more about Lord’s vision here and here). Lord described an ideal test as one where test forms, each unique to a single test taker, were composed of items randomly sampled from that universe. He called these forms randomly parallel tests (RPTs) and cited several psychometric and statistical advantages of such tests. It was clear to Lord that RPTs, impossible to make at the time, were the ideal way to test.
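To make Lord's idea concrete, here is a minimal sketch of drawing randomly parallel forms. It is an illustration under stated assumptions, not Lord's procedure or any vendor's implementation: the universe is a hypothetical list of 10,000 item IDs, the form length of 40 is arbitrary, and `draw_form` is a name invented for this example.

```python
import random

def draw_form(universe, form_length, rng):
    """Return one test form: a simple random sample of items, drawn without replacement."""
    return rng.sample(universe, form_length)

# Hypothetical universe of item IDs standing in for a content domain.
universe = [f"item-{i:05d}" for i in range(10_000)]

rng = random.Random(2021)  # seeded only so the example is repeatable

# Each test taker gets an independent draw from the same universe.
form_a = draw_form(universe, 40, rng)
form_b = draw_form(universe, 40, rng)

shared = set(form_a) & set(form_b)
print(f"form A: {len(form_a)} items, form B: {len(form_b)} items, shared: {len(shared)}")
```

Independent draws like these make overlap between two forms unlikely when the universe is large, but they do not rule it out. A program that wants items used once and never again would also need to retire each drawn item, or generate items on demand.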

Take Advantage of Randomly Parallel Tests (RPTs)

One advantage of randomly parallel tests that Lord didn’t cite back then, but which is obvious to us today, is the comprehensive prevention of test fraud. RPTs are, by definition, unpredictable, while still representing the universe of test content. That means simply that if a test taker snaps a picture of their screen during the test, that picture is immediately worthless. Since no one else will see that item on a future test, or any others from that particular test, theft is functionally impossible. The market for such useless information would dry up overnight. It follows that using pre-knowledge and other forms of cheating would be ineffective as well.

Research the Technology That Will Best Help Your Program

With technologies in hand, such as Automated Item Generation (AIG), it is possible today to create the large population of items described by Lord. The process can be simplified even further by representing those universes of items with SmartItem technology. The SmartItem routinely renders representative one-and-done items and tests on the fly during test administration. Computerized adaptive testing (CAT) technology, with us for more than 30 years now, is another way to present unique content to each test taker. With a new vision in front of us, other creative approaches will be invented. With cheating out of the way, the goal of these technologies (test scores that truly represent a test taker’s ability across an entire domain or universe of competency) is closer to reality.
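As a rough illustration of the idea behind automated item generation, the sketch below renders a fresh multiple-choice item from a template each time it is called. Everything in it is an assumption made for the example: the template, the number ranges, the distractor rules, and the `generate_item` name. It is not how any AIG product or SmartItem technology is actually implemented.

```python
import random

# Hypothetical item template; the placeholders are filled in at delivery time.
TEMPLATE = "A train travels {speed} km/h for {hours} hours. How far does it go, in km?"

def generate_item(rng):
    """Render one item from the template, with its key and three distractors."""
    speed = rng.randrange(40, 130, 10)  # 40, 50, ..., 120
    hours = rng.randint(2, 9)
    key = speed * hours
    # Distractors model plausible mistakes: adding instead of multiplying,
    # and being off by one hour in either direction.
    distractors = [speed + hours, speed * (hours - 1), speed * (hours + 1)]
    options = [key] + distractors
    rng.shuffle(options)
    return {
        "stem": TEMPLATE.format(speed=speed, hours=hours),
        "options": options,
        "key": key,
    }

rng = random.Random()  # unseeded: each run renders a different item
item = generate_item(rng)
print(item["stem"])
print(item["options"])
```

Even this toy template can render 81 distinct stems (9 speeds times 8 durations). Real item models vary far more than two numbers, which is how a small number of templates can stand in for a very large universe of items.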

The Benefits of Randomization in Standardized Testing

Producing test scores that generalize across a target domain and obliterating almost every form of test fraud are two heretofore unrealized goals in the testing industry. Randomly parallel tests, however they are created, achieve at least those two results, and there are many other direct and indirect benefits as well.

One benefit of randomly parallel tests, according to Lord, is that they open the door to strong statistical tools, such as sampling theory, that are in common use in every other science. The items on today’s traditional tests are not sampled randomly from a large population of items, but are produced and selected for test forms through a process in which personal and systematic bias can affect the outcome.

As a field, the testing industry has begun to use randomization in our items (e.g., randomizing the order of options for selected-response item types) and in our tests (e.g., randomizing the order of items on the test) in order to prevent particular security threats. Random selection of appropriate content is simply the next step we can easily take.
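The two kinds of randomization already in use can be sketched in a few lines. This is a simplified illustration, not a description of any particular delivery engine: the item structure and the function names `shuffle_options` and `shuffle_form` are hypothetical. The one design point worth noting is that the key is stored by value rather than by position, so scoring still works after the options move.

```python
import random

def shuffle_options(item, rng):
    """Return a copy of the item with its options in random order."""
    options = item["options"][:]  # copy so the source item is left untouched
    rng.shuffle(options)
    return {**item, "options": options}

def shuffle_form(items, rng):
    """Randomize option order within each item, then item order on the form."""
    shuffled = [shuffle_options(item, rng) for item in items]
    rng.shuffle(shuffled)
    return shuffled

# Hypothetical three-item form; "key" holds the correct option's text.
form = [
    {"stem": "2 + 2 = ?", "options": ["3", "4", "5", "6"], "key": "4"},
    {"stem": "Capital of France?", "options": ["Paris", "Rome", "Madrid", "Berlin"], "key": "Paris"},
    {"stem": "Boiling point of water at sea level, in Celsius?", "options": ["90", "100", "110", "120"], "key": "100"},
]

delivered = shuffle_form(form, random.Random())
for item in delivered:
    print(item["stem"], item["options"])
```

Randomly selecting the content itself, rather than only reordering it, reuses the same machinery: a random number generator and a rule for what may be drawn.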

Doesn’t giving unique test forms to each test taker go against the tenets of standardization?

This is the big question. Does standardization require fixed item format and content, with static tests that change very little? Or can standardization still exist when items and tests change from person to person? In the 5th edition of his textbook, Essentials of Psychological Testing, Lee J. Cronbach talked about the effects of upcoming computer technology on testing, and specifically talked about standardization:

Because of its consistency, the computer carries standardization to an extreme; yet it can achieve standardized measurement while presenting different questions…to every test taker. (p. 46)

Anticipating the cheating circumstances of today, Cronbach describes a unique test for each test taker and how such testing makes cheating impossible. Speaking about a possible test for customs officers, he states:

For each examinee, the computer can arbitrarily alter the rates, rules or makeup of the shipment. Facing that kind of test, the candidate has no option but to learn to apply rate schedules. He gains nothing from finding out which rules a friend had to apply last week. (p. 47)

Cronbach calls these types of tests “standardized” because, in his words, “Any number of equivalent tests can be made up by selecting every set of items according to the same plan” (p. 46). If you are building test forms to the same plan or using the same procedure, you are creating a standardized test and have thousands or millions of unique test forms at the ready.
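One way to picture "the same plan" is a test blueprint that fixes how many items come from each content area, with the items themselves drawn at random. The sketch below assumes a hypothetical three-area blueprint and item pool; `BLUEPRINT`, `pool`, and `build_form` are names made up for the example, and a real plan would usually constrain more than content area.

```python
import random

# Hypothetical plan: number of items required from each content area.
BLUEPRINT = {"algebra": 3, "geometry": 2, "statistics": 1}

def build_form(pool, blueprint, rng):
    """Build one form by sampling the required number of items from each area."""
    form = []
    for area, count in blueprint.items():
        form.extend(rng.sample(pool[area], count))
    return form

# Hypothetical pool of 500 item IDs per content area.
pool = {area: [f"{area}-{i:03d}" for i in range(500)] for area in BLUEPRINT}

rng = random.Random()
form_1 = build_form(pool, BLUEPRINT, rng)
form_2 = build_form(pool, BLUEPRINT, rng)
print(form_1)
print(form_2)
```

The two forms will almost certainly contain different items, yet both satisfy the blueprint exactly. In Cronbach's sense, it is the plan that is standardized, not the list of items.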

Conclusion

Ironically, the static tests that we call “standardized” today have contributed to their un-standardization by encouraging many extraneous influences, including cheating. It’s time to convert to the type of standardized testing Cronbach described 30 years ago, and to use the randomly parallel tests that Lord proposed almost 70 years ago. With this standardization, cheating will go away, and many other hard-to-handle influences will become obsolete as well.
