Posted by Dennis Maynes
A misquote on a news site about additional security measures announced by the TEA (Texas Education Agency) for the TAKS (Texas Assessment of Knowledge and Skills) caused me to pause and reflect on using statistical evidence to “prove” that someone cheated on a test. The reporter wrote, "Among other security measures, … scramble field test questions on tests to provide proof if someone is copying someone else's answer sheet." (Italics added.)
Being well aware of the controversy surrounding the use of statistics alone to detect potential cheating, I immediately doubted the accuracy of the above statement. Actually, in 2007, TEA announced that “the Texas Education Agency today will immediately initiate the following: … analyze scrambled blocks of test questions to detect answer copying…” TEA then later clarified that the scrambling would only involve field test items. News outlets were quick to criticize the scrambling plan, but I applauded TEA’s intent to use statistics to detect potential cheating.
We naturally ask whether statistical evidence can be relied on to detect invalid test scores. Many authors have expressed the opinion that "statistical evidence must be corroborated by eye-witness accounts before making allegations of cheating."
In reality, statistical evidence should be used to assess the validity of a test score, and not to "prove" cheating. Statistics alone can never prove that cheating occurred, because cheating is a combination of behavior and intent.
What statistics can tell us is that there is sufficient evidence that a score is invalid and should not be trusted. Statistics can also tell us that the evidence for one hypothesis outweighs the evidence for another.
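One common way to express "the evidence for one hypothesis outweighs the evidence for another" is a likelihood ratio. The sketch below is only an illustration, not the method used by any particular testing program: it assumes a simple binomial model for the number of identical answers between two examinees, and the function names, the match probabilities (`p_normal`, `p_alternative`), and the counts are all hypothetical values chosen for the example.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k matches in n items, each matching with probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def likelihood_ratio(matches: int, items: int,
                     p_normal: float, p_alternative: float) -> float:
    """Ratio of the likelihood under the alternative hypothesis to the
    likelihood under normal test taking (assumed binomial model).

    A ratio above 1 means the data favor the alternative hypothesis;
    a ratio below 1 means they favor normal test taking.
    """
    return (binomial_pmf(matches, items, p_alternative)
            / binomial_pmf(matches, items, p_normal))


# Hypothetical data: 40 identical answers on a 50-item test.
# Both match probabilities are assumptions made up for this illustration.
lr = likelihood_ratio(matches=40, items=50, p_normal=0.5, p_alternative=0.9)
print(f"Likelihood ratio: {lr:.3g}")
```

A large ratio says only that the observed data are better explained by the alternative model than by normal test taking. It says nothing about behavior or intent, which is why it bears on score validity rather than on "proof" of cheating.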
For example, we may have sufficient evidence to conclude that it is more likely than not that a particular examinee accessed disclosed test content. Based on that foundation, I believe that corroboration of the statistical evidence is unnecessary if the statistics are reliable. But what is reliable statistical evidence?
In my opinion, reliable evidence must meet the following conditions:
Here's how that breaks down—statistical evidence is:
A fifth criterion applies before taking action on a suspected instance of cheating: the evidence must be strong. Statistical evidence is strong when the calculated probabilities are so small that we no longer believe the observed data are the result of normal test taking. Statistics can provide guidance on how strong is strong enough to take action, but ultimately the choice of a probability threshold (i.e., the required strength of the statistic) is a matter of policy that must be decided by the testing program administrator.
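To make the idea of a probability threshold concrete, here is a minimal sketch under stated assumptions. It computes the chance of seeing at least a given number of identical answers under a simple binomial model of normal test taking, then compares that probability to a threshold. The model, the match probability, the counts, and `POLICY_THRESHOLD` are all hypothetical; in practice the threshold is the policy decision described above, not something the code can determine.

```python
from math import comb


def upper_tail_probability(observed: int, n: int, p: float) -> float:
    """P(X >= observed) for X ~ Binomial(n, p), the assumed model of normal test taking."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k)
               for k in range(observed, n + 1))


def is_strong_enough(probability: float, policy_threshold: float) -> bool:
    """Evidence counts as strong only if the probability falls below the
    threshold set by the testing program (a policy choice, not a statistical one)."""
    return probability < policy_threshold


# Hypothetical threshold chosen only for this illustration.
POLICY_THRESHOLD = 1e-6

# Hypothetical data: 45 identical answers on 50 items, with an assumed
# 0.5 chance of a match on each item under normal test taking.
p_value = upper_tail_probability(observed=45, n=50, p=0.5)
print(f"Tail probability: {p_value:.3g}")
print("Strong enough to act on:", is_strong_enough(p_value, POLICY_THRESHOLD))
```

Keeping the threshold as a separate, named constant mirrors the division of labor in the text: the statistic supplies the probability, and the program administrator supplies the cutoff.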
In any statistical investigation, it is important to choose statistics that are designed for and well suited to the task at hand. For example, if the concern is that answer sheets are being modified, then erasure counts should be analyzed. Having analyzed over one hundred data sets for a wide variety of clients, including state Departments of Education, admissions tests, certification programs, and licensure exams, I can state unequivocally that pre-knowledge of item content is currently one of the most common means of cheating on tests. In the heyday of paper-and-pencil testing, answer copying was predominant. Select the statistics that fit the type of potential cheating you want to detect.
You can view our Ultimate Guide on Data Forensics to learn more. Until next time, may your tests remain secure.