A misquote from a news site concerning additional security announced by the TEA (Texas Education Agency) for the TAKS (Texas Assessment of Knowledge and Skills) caused me to pause and reflect on using statistical evidence to "prove" that someone cheated on a test. The reporter wrote, "Among other security measures, … scramble field test questions on tests to provide proof if someone is copying someone else's answer sheet." (Italics added.)
Being well aware of the controversy surrounding the use of statistics alone to detect potential cheating, I immediately doubted the accuracy of the above statement. In fact, in 2007, TEA announced that "the Texas Education Agency today will immediately initiate the following: … analyze scrambled blocks of test questions to detect answer copying…" TEA later clarified that the scrambling would involve only field test items. News outlets were quick to criticize the scrambling plan, but I applauded TEA's intent to use statistics to detect potential cheating.
Can Statistics Prove Cheating on Exams?
We naturally ask whether statistical evidence can be relied on to detect invalid test scores. Many authors have expressed the opinion that "statistical evidence must be corroborated by eye-witness accounts before making allegations of cheating."
In reality, statistical evidence should be used to assess the validity of a test score, and not to "prove" cheating. Statistics alone can never prove that cheating occurred, because cheating is a combination of behavior and intent.
What statistics can tell us is that there is sufficient evidence that a score is invalid and should not be trusted. Statistics can also tell us that the evidence for one hypothesis outweighs the evidence for another.
For example, we may have sufficient evidence to conclude that it is more likely than not that a particular examinee accessed disclosed test content. Based on that foundation, I believe that corroboration of the statistical evidence is unnecessary if the statistics are reliable. But what is reliable statistical evidence?
The Conditions of Reliable Statistical Evidence
Reliable Evidence Is Factual, Objective, Credible, and Defensible
In my opinion, reliable evidence must meet the following conditions:
- It must be factual
- It must be objective
- It must be credible
- It must be defensible
Here's how that breaks down. Statistical evidence is:
- Factual when it is based on test result data (an actual record of the test event),
- Objective when it provides a statistic with a probability statement,
- Credible when the statistics have been shown to work because the models accurately depict actual test taking, and
- Defensible when the underlying science withstands scrutiny.
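To make "a statistic with a probability statement" concrete, here is a minimal sketch. It is not a method described in this post and not a production answer-copying index. It assumes, purely for illustration, that two independent examinees give the same answer to each item with a fixed probability, so the number of matching answers follows a binomial distribution. The function name and every number below are hypothetical.

```python
from math import comb


def binomial_upper_tail(n_items: int, n_matches: int, p_match: float) -> float:
    """Return P(X >= n_matches) for X ~ Binomial(n_items, p_match).

    Assumption (illustrative only): each item matches independently with
    the same probability p_match when two examinees work on their own.
    """
    return sum(
        comb(n_items, k) * p_match**k * (1 - p_match) ** (n_items - k)
        for k in range(n_matches, n_items + 1)
    )


# Hypothetical case: 40 items, 35 identical answers, and an assumed
# 50% chance that independent examinees agree on any single item.
tail = binomial_upper_tail(n_items=40, n_matches=35, p_match=0.5)
print(f"P(at least 35 matches out of 40) = {tail:.3e}")
```

The statistic here is the match count, and the probability statement is the tail probability printed at the end. A real analysis would need a model of how examinees actually respond, which is where the credibility condition comes in.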
Reliable Evidence Must Be Strong
A fifth criterion applies before taking action on a suspected instance of cheating: the evidence must be strong. Statistical evidence is strong when the calculated probabilities are so small that we no longer believe the observed data are the result of normal test taking. Statistics can provide guidance for determining how strong is strong enough to take action, but ultimately the choice of a probability threshold (i.e., the required strength of the statistic) is a matter of policy that must be decided by the testing program administrator.
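One piece of guidance statistics can offer an administrator is a rough count of how many honest examinees a given threshold would flag by chance. The sketch below assumes one well-calibrated statistic per examinee, so that under normal test taking the chance of falling below the threshold is about equal to the threshold itself. The function names, the program size, and the candidate thresholds are all hypothetical.

```python
def expected_chance_flags(threshold: float, n_examinees: int) -> float:
    """Approximate number of examinees flagged purely by chance.

    Assumption: one calibrated probability per examinee, so the chance
    of a false flag for each honest examinee is roughly `threshold`.
    """
    return threshold * n_examinees


def is_strong(probability: float, threshold: float) -> bool:
    """Apply the policy: evidence counts as strong below the threshold."""
    return probability < threshold


N_EXAMINEES = 100_000  # hypothetical program size

for threshold in (1e-3, 1e-5, 1e-7):
    flags = expected_chance_flags(threshold, N_EXAMINEES)
    print(f"threshold {threshold:.0e}: about {flags:g} chance flags expected")
```

The arithmetic does not pick the threshold. It only shows the trade-off the administrator is accepting when setting one.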
The Statistics Are Well-Suited for the Task at Hand
In any statistical investigation, it is important to choose statistics that are designed for the task at hand. For example, if the concern is that answer sheets are being modified, then erasure counts should be analyzed. Having analyzed over one hundred data sets for a wide variety of clients, including state departments of education, admissions tests, certification programs, and licensure exams, I can unequivocally state that pre-knowledge of item content is currently one of the most common means of cheating on tests. In the heyday of paper-and-pencil testing, answer copying was predominant. Select statistics that match the type of potential cheating you want to detect.
You can view our Ultimate Guide on Data Forensics to learn more. Until next time, may your tests remain secure.