Beyond the Bubble Sheet: How Generative AI May Finally Liberate Assessment from Multiple-Choice Testing
Abstract
Multiple-choice questions (MCQs) have dominated educational assessment for decades, particularly in large-scale and high-stakes examinations. Their popularity rests on perceived efficiency, reliability, and objectivity. However, extensive research has highlighted serious limitations, including construct underrepresentation, susceptibility to guessing, shallow measurement of understanding, and negative washback on learning. This article argues that recent advances in generative artificial intelligence (GenAI), particularly large language models, create a realistic opportunity to move beyond MCQs toward open-ended, constructed-response assessment at scale. GenAI systems can preserve administrative efficiency by automating the evaluation of student-generated answers while improving validity and offering richer insight into student thinking. We discuss why MCQs remain deeply entrenched in higher education, the pedagogical and epistemic costs of this dependence, and how GenAI-enabled assessment is likely to transform both student preparation and the measurement of learning. The article concludes by describing the conditions under which GenAI-based assessment can be responsibly implemented and the open issues that remain.
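To make the idea of automated evaluation of constructed responses concrete, the following minimal Python sketch scores a student's open-ended answer against a rubric by prompting a language model and parsing a structured verdict. It is an illustration only: the function query_llm is a hypothetical stand-in for whatever model provider an institution adopts, and the rubric wording, JSON response format, and example question are assumptions introduced here rather than part of any system described in the article.

```python
import json

# Hypothetical stand-in for a real chat-completion call. It returns a canned
# JSON string so the sketch runs end to end; in practice this would be replaced
# by a call to the institution's chosen LLM provider.
def query_llm(prompt: str) -> str:
    return json.dumps({
        "score": 3,
        "max_score": 4,
        "rationale": "Identifies the mechanism but omits the limiting case.",
    })

def grade_response(question: str, rubric: str, student_answer: str) -> dict:
    """Ask the model to score one constructed response against a rubric."""
    prompt = (
        "You are grading a short constructed-response answer.\n"
        f"Question: {question}\n"
        f"Rubric: {rubric}\n"
        f"Student answer: {student_answer}\n"
        "Reply with JSON containing 'score', 'max_score', and a one-sentence 'rationale'."
    )
    raw = query_llm(prompt)
    try:
        verdict = json.loads(raw)
    except json.JSONDecodeError:
        # Malformed model output is routed to a human grader rather than guessed at.
        verdict = {"score": None, "max_score": None, "rationale": "needs human review"}
    return verdict

if __name__ == "__main__":
    result = grade_response(
        question="Explain why increasing the sample size narrows a confidence interval.",
        rubric="4 points: notes that the standard error shrinks with sample size and links this to interval width.",
        student_answer="Bigger samples reduce the standard error, so the interval around the mean gets tighter.",
    )
    print(result)
```

The design choice worth noting is the fallback branch: answers the model cannot score cleanly are flagged for human review rather than assigned a guessed grade, which is one concrete form the responsible-implementation conditions discussed in the article could take.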