Validity in evaluation: where is the argument-based approach heading?
DOI:
https://doi.org/10.51798/sijis.v5i3.792
Keywords:
psychometrics; validity; educational assessment; generative artificial intelligence
Abstract
This article examines how the concept of validity has evolved in the context of the integration of Generative Artificial Intelligence (GenAI) and ethical stances, and with it, informed decision-making. The approach draws on the history of concepts as laid out by Koselleck to analyze validity as a fundamental concept. The method is a literature review that analyzes historical and contemporary perspectives and arguments from influential authors such as Messick and Kane. This conceptual journey shows that validity is not a monolithic entity but a complex fabric of theoretical and practical threads, ranging from the internal logic of evaluations to the repercussions of their application in society. Validity is likewise recognized as a complex construct that cannot be reduced to a single aspect or characteristic of a test or evaluation, and validity is distinguished from validation. The literature distinguishes five historical periods that reflect paradigmatic changes in the understanding of validity: gestation, crystallization, fragmentation, reunification, and deconstruction, culminating in a period of diffusion. The most relevant conclusion is that validity is not static but dynamic, evolving with context and application. The article also emphasizes the need for continuous validation adapted to emerging challenges such as GenAI, with the goal of ensuring that evaluations are accurate and fair amid growing interest in quantum computing.
References
Acree, J., Hoeve, K.B., Weir, J.B. (2016). Approaching the validation of accountability systems. Unpublished paper and presentation. ERM 600: Validity and Validation, University of North Carolina at Greensboro.
Aloisi, C. (2023). The future of standardised assessment: Validity and trust in algorithms for assessment and scoring. European Journal of Education, 58, 98–110. https://doi.org/10.1111/ejed.12542
American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (2018). Estándares para pruebas educativas y psicológicas (M. Lieve, Trans.). American Educational Research Association.
Bachman, L. (2005). Building and supporting a case for test use. Language Assessment Quarterly, 2, 1–34.
Bachman, L., & Palmer, A. (2010). Language assessment in practice. Oxford University Press.
Bardin, L. (2011). Análisis de contenido (3ª ed.). Ediciones Akal.
Borsboom, D. (2009). Educational Measurement (4th ed.). Structural Equation Modeling: A Multidisciplinary Journal, 16(4), 702-711. https://doi.org/10.1080/10705510903206097
Bachman, L. (1990). Fundamental considerations in language testing. Oxford University Press.
Borsboom, D., Cramer, A., Kievit, R., Scholten, A., & Franic, S. (2009). The end of construct validity. In The concept of validity (pp. 135-170).
Borsboom, D., Mellenbergh, G. J., & van Heerden, J. (2004). The concept of validity. Psychological Review, 111(4), 1061–1071. https://doi.org/10.1037/0033-295X.111.4.1061
Brennan, R. (2001a). An essay on the history and future of reliability from the perspective of replications. Journal of Educational Measurement, 36, 295–317.
Briggs, D. C. (2004). Comment: Making an argument for design validity before interpretive validity. Measurement: Interdisciplinary Research and Perspectives, 2(3), 171–191.
Carrillo, B., Sánchez, M., & Leenen, I. (2020). El concepto moderno de validez y su uso en educación médica. Investigación en Educación Médica, 98-106. https://doi.org/10.22201/facmed.20075057e.2020.33.19216
Chapelle, C. (2012). Validity argument for language assessment: The framework is simple…. Language Testing, 29(1), 19-27. https://doi.org/10.1177/0265532211417211
Chapelle, C. (2021). Argument-Based Validation in Testing and Assessment. SAGE.
Chapelle, C., & Sauro, S. (2017). Introduction to the Handbook of Technology and Second Language Teaching and Learning. The Handbook of Technology and Second Language Teaching and Learning, 1–9. https://doi.org/10.1002/9781118914069
Chapelle, C., Enright, M., & Jamieson, J. (2008). Building a Validity Argument for the Test of English as a Foreign Language. Routledge.
Chapelle, C., Enright, M., & Jamieson, J. (2010). Does an Argument-Based Approach to validity make a difference? Educational Measurement: Issues and Practice, 29(1), 3-13.
Chomsky, N., Roberts, I., & Watumull, J. (2023, March 8). Noam Chomsky: The False Promise of ChatGPT. The New York Times. https://www.nytimes.com/2023/03/08/opinion/noam-chomsky-chatgpt-ai.html
Cizek, G. J., Kosh, A. E., & Toutkoushian, E. K. (2018). Gathering and Evaluating Validity Evidence: The Generalized Assessment Alignment Tool. Journal of Educational Measurement, 55(4), 477–512.
Cohen, L., Manion, L. & Morrison, K. (2007). Research methods in education. Routledge.
Cook, D., Brydges, R., Ginsburg, S., & Hatala, R. (2015). A contemporary approach to validity arguments: a practical guide to Kane’s framework. Medical Education, 49, 560-575. https://doi.org/10.1111/medu.12678
Cronbach, L. J. (1971). Test Validation. In R. Thorndike (Ed.), Educational Measurement (2nd ed., p. 443). American Council on Education.
Cronbach, L. J. (1982). Designing evaluations of educational and social programs. Jossey-Bass.
Cronbach, L. J. (1989). Construct validation after thirty years. In R. E. Linn (Ed.), Intelligence: Measurement, theory, and public policy (pp. 147–171). Urbana: University of Illinois Press.
Cronbach, L. J., & Meehl, P. E. (1955). Construct validity in psychological tests. Psychological Bulletin, 52, 281–302.
Cureton, E. E. (1951). Validity. In E. F. Lindquist (Ed.), Educational measurement (pp. 621–694). American Council on Education.
De Jong Gierveld, J. (1987). Developing and testing a model of loneliness. Journal of Personality and Social Psychology, 53(1), 119-128. https://doi.org/10.1037/0022-3514.53.1.119
De Jong Gierveld, J., & van Tilburg, T. G. (1992). Triangulatie in operationalization method. In G. J. N. Bruinsma & M. A. Zwanenburg (Eds.), Methodologie voor Bestuurskundigen: Stromingen en Methoden (pp. 273-298).
De Jong Gierveld, J., & van Tilburg, T. G. (2011). Manual of the Loneliness Scale 1999. Vrije Universiteit, Department of Social Research Methodology.
Delgado-Rico, E., Carretero-Dios, H., & Ruch, W. (2012). Content validity evidences in test development: An applied perspective. International Journal of Clinical and Health Psychology, 12(3), 449-459. https://www.redalyc.org/pdf/337/33723713006.pdf
Derrida, J. (1997). Una filosofía deconstructiva. Zona erógena, 35.
Arya, D., Clairmont, A., Katz, D., & Maul, A. (2020). Measuring Reading Strategy Use. Educational Assessment, 25(1), 5-30. https://doi.org/10.1080/10627197.2019.1702464
Embretson, S. (2007). Construct Validity: A Universal Validity System or Just Another Test Evaluation Procedure? Educational Researcher, 36, 449-455. https://doi.org/10.3102/0013189X07311600
Embretson, S. (2016). An Integrative Framework for Construct Validity. https://doi.org/10.1002/9781118956588.ch5
Embretson, S., & Gorin, J. (2001). Improving Construct Validity With Cognitive Psychology Principles. Journal of Educational Measurement, 38(4), 343–368. https://doi.org/10.1111/j.1745-3984.2001.tb01131.x
Johnson, E. S., Crawford, A., Moylan, L. A., & Zheng, Y. (2020). Validity of a Special Education Teacher Observation System. Educational Assessment, 25(1), 31-46. https://doi.org/10.1080/10627197.2019.1702461
Fabrigar, L. R., Wegener, D. T., & Petty, R. E. (2020). A Validity-Based Framework for Understanding Replication in Psychology. Personality and Social Psychology Review. https://doi.org/10.1177/1088868320931366
Fan, J. (2014). Chinese test takers’ attitudes towards the Versant English Test: a mixed-methods approach. Language Testing in Asia, 4(1). https://doi.org/10.1186/s40468-014-0006-9
Ferrara, S. (2007). Our field needs a framework to guide development of validity research agendas and identification of validity research questions and threats to validity. Measurement: Interdisciplinary Research and Perspectives, 5(2–3), 156–164.
Gafni, N. (2016). Comments on implementing validity theory. Assessment in Education: Principles, Policy & Practice. https://doi.org/10.1080/0969594X.2015.1111195
Gallent-Torres, C., Zapata-González, A., & Ortego-Hernando, J.L. (2023). El impacto de la inteligencia artificial generativa en educación superior: una mirada desde la ética y la integridad académica. RELIEVE, 29(2), art. M5. http://doi.org/10.30827/relieve.v29i2.29134
García-Medina, A., Martínez-Rizo, F., Cordero-Arroyo, G., & Caso-Niebla, J. (2017). Evolución del concepto de validez en la medición educativa. https://www.researchgate.net/publication/325346472_Evolucion_del_concepto_de_validez_en_la_medicion_educativa
Garfield, E. (1979). Citation Indexing—Its Theory and Application in Science, Technology, and Humanities. Wiley.
Haertel, E. (2013). How is testing supposed to improve schooling? Measurement: Interdisciplinary Research and Perspectives, 11(1-2), 1-18.
Hoeve, K. B. (2022). A validity framework for accountability: educational measurement and language testing. Language Testing in Asia, 12, 3. https://doi.org/10.1186/s40468-021-00153-2
Hornberger, M., Bewersdorff, A., & Nerdel, C. (2023). What do university students know about Artificial Intelligence? Development and validation of an AI literacy test. Computers and Education: Artificial Intelligence, 5, 100165. https://doi.org/10.1016/j.caeai.2023.100165
Jawhar, S., Al, M., Alhawsawi, S., & Alkushi, A. (2021). Validating English Language Entrance Test at a Saudi University for Health Sciences. Arab World English Journal (AWEJ), 12(2), 49-71. https://dx.doi.org/10.24093/awej/vol12no2.4
Kane, M. (2002). Inferences about Variance Components and Reliability-Generalizability Coefficients in the Absence of Random Sampling. Journal of Educational Measurement, 39 (2), 165-181.
Kane, M. (2006a). Content-Related Validity Evidence in Test Development. In S. M. Downing & T. M. Haladyna (Eds.), Handbook of test development (pp. 131–153). Lawrence Erlbaum Associates Publishers.
Kane, M. (2006b). Current Concerns in Validity Theory. Journal of Educational Measurement, 38(4), 319-342. https://doi.org/10.1111/j.1745-3984.2001.tb01130.x
Kane, M. (2011). Validating score interpretations and uses. Language Testing, 29(1), 3–17. https://doi.org/10.1177/0265532211417210
Kane, M. (2013a). Validating the Interpretations and Uses of Test Scores. Journal of Educational Measurement, 50(1), 1-73. https://doi.org/10.1111/jedm.12000
Kane, M. (2013b). The Argument-Based Approach to Validation. School Psychology Review, 42(4), 448-457.
Kane, M. T. (2001). Current concerns in validity theory. Journal of Educational Measurement, 38(4), 319-342. https://www.jstor.org/stable/1435453
Kerlinger, F., & Lee, H. (2001). Investigación del comportamiento: métodos de investigación en ciencias sociales. McGraw Hill.
Koretz, D. (2008). Measuring up. What educational testing really tells us. Harvard University Press.
Koselleck, R. (2000). Los estratos del tiempo: estudios sobre la historia. Paidós Ibérica.
LaFlair, G. T., & Staples, S. (2017). Using corpus linguistics to examine the extrapolation inference in the validity argument for a high-stakes speaking assessment. Language Testing, 34(4), 451–475. https://doi.org/10.1177/0265532217713951
Lavery, M., Bostic, J., Kruse, L., Krupa, E., & Carney, M. (2020). Argumentation Surrounding Argument‐Based Validation: A Systematic Review of Validation Methodology in Peer‐Reviewed Articles. Educational Measurement: Issues and Practice. https://doi.org/10.1111/emip.12378
Lindquist, E. F. (Ed.). (1951). Educational measurement. American Council on Education.
Lingard, L. Writing with ChatGPT: An Illustration of its Capacity, Limitations & Implications for Academic Writers. Perspectives on Medical Education, 12(1), 261–270. https://doi.org/10.5334/pme.1072
Lissitz, R. (2009). The concept of validity: Revisions, new directions, and applications. Information Age Publishing.
Markus, K., & Borsboom, D. (2013). Frontiers of Test Validity Theory. Measure, Causation and Meaning. Routledge.
Messick, S. (1989). Meaning and values in test validation: The science and ethics of assessment. Educational Researcher, 18(2), 5-11.
Mislevy, R., Steinberg, L., & Almond, R. (2003). On the structure of educational assessments. Measurement: Interdisciplinary Research and Perspectives, 1, 3–62.
Nasution, N. E. A. (2023). Using artificial intelligence to create biology multiple choice questions for higher education. Agricultural and Environmental Education, 2(1), em002. https://doi.org/10.29333/agrenvedu/13071
Newton, P., & Shaw, S. (2014). Validity in educational & psychological assessment. SAGE.
Newton, P. E., & Baird, J.-A. (2016). The great validity debate. Assessment in Education: Principles, Policy & Practice, 23(2), 173-177. https://doi.org/10.1080/0969594X.2016.1172871
Pedrosa, I., Suárez-Álvarez, J., & García-Cueto, E. (2013). Evidencias sobre la validez de contenido: avances teóricos y métodos para su estimación. Acción Psicológica, 10(2), 3-18. https://dx.doi.org/10.5944/ap.10.2.11820
Pellegrino, J. W., DiBello, L. V., & Goldman, S. R. (2016). A framework for conceptualizing and evaluating the validity of instructionally relevant assessments. Educational Psychologist, 51(1), 59–81. https://doi.org/10.1080/00461520.2016.1145550
Santamaría, F. (2012). De la analítica al (neo) pragmatismo. El giro de la filosofía anglosajona. Revista Colombiana de Humanidades, 80, 105-143. https://www.redalyc.org/pdf/5155/515551990007.pdf
Schilling, S. G. (2004). Conceptualizing the Validity Argument: An Alternative Approach. Measurement: Interdisciplinary Research and Perspectives, 2(3), 178–182.
Schmidt, T., & Strasser, T. (2022). Artificial Intelligence in Foreign Language Learning and Teaching. Anglistik, 33(1), 165–184. https://doi.org/10.33675/ANGL/2022/1/14
Shepard, L. (2016). Evaluating test validity: reprise and progress. Assessment in Education: Principles, Policy & Practice, 23(2), 268-280. https://doi.org/10.1080/0969594X.2016.1141168
Sijtsma, K. (2009). Correcting Fallacies in Validity, Reliability, and Classification. International Journal of Testing, 9, 167-194. https://doi.org/10.1080/15305050903106883
Sireci, S. (2013). Agreeing on validity arguments. Journal of Educational Measurement, 50(1), 99–104.
Sireci, S. G. (2007). On Validity Theory and Test Validation. Educational Researcher, 36(8), 477–481. https://doi.org/10.3102/0013189X07311609
Sireci, S. G. (2016). On the validity of useless tests. Assessment in Education: Principles, Policy & Practice, 23. https://doi.org/10.1080/0969594X.2015.1072084
Sireci, S., & Faulkner-Bond, M. (2014). Validity evidence based on test content. Psicothema, 26(1), 100–107.
Sireci, S., & Doğan, N. (2017). Interview with Stephen G. Sireci on Validity. Eğitimde ve Psikolojide Ölçme ve Değerlendirme, 8, 158-168.
Thorndike, R. M. (1997). Measurement and evaluation in psychology and education (6th ed.). Merrill Publishing Co/Prentice-Hall.
Toulmin, S. (1958). The uses of argument. Cambridge University Press.
Watson, P. (2002). Introducción: la evolución de las leyes del pensamiento. In Historia intelectual del siglo XX (pp. 11-15). Crítica.
Zumbo, B., & Chan, E. (Eds.). (2014). Validity and Validation in Social, Behavioral, and Health Sciences. Springer Cham.
License
Copyright (c) 2024 Karla Karina Ruiz Mendoza, Luis Horacio Pedroza Zúñiga, Alma Yadhira López García
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.