Alina von Davier is Vice President of ACTNext, the Research, Development, and Business Innovation division of ACT, Inc., as well as an Adjunct Professor at Fordham University.
Alina von Davier earned her PhD in mathematics from the Otto von Guericke University of Magdeburg, Germany, and her MS in mathematics from the University of Bucharest, Romania. At ACT, von Davier and her team of experts are responsible for developing prototypes of research-based solutions and creating a research agenda to support the next generation of learning and assessment systems (LAS). She pioneers the development and application of computational psychometrics and conducts research on blending machine learning algorithms with psychometric theory. Prior to her employment with ACT, von Davier was a Senior Research Director at Educational Testing Service (ETS), where she led the Computational Psychometrics Research Center. Previously, she led the Center for Psychometrics for International Tests, where she managed a large group of psychometricians and was responsible both for the psychometrics supporting the international tests TOEFL® and TOEIC® and for the scores reported to millions of test takers annually. Two of her volumes, a co-edited volume on Computerized Multistage Testing and an edited volume on test equating, Statistical Models for Test Equating, Scaling, and Linking, were selected, respectively, as the 2016 and 2013 winners of the AERA Division D Significant Contribution to Educational Measurement and Research Methodology award. In addition, she wrote or co-edited five other books and volumes on statistical and psychometric topics. Her current research interests involve developing and adapting methodologies in support of virtual and collaborative learning and assessment systems. Machine learning and data mining techniques, Bayesian inference methods, and stochastic processes are the key set of tools employed in her current research. She serves as an Associate Editor for Psychometrika and the Journal of Educational Measurement.
Prior to joining ETS, she worked in Germany at the Universities of Trier, Magdeburg, Kiel, and Jena, and at ZUMA in Mannheim, and in Romania at the Institute of Psychology of the Romanian Academy.
Performance tasks in virtual environments result in rich data about test takers' behavior. The more realistic the task, the more difficult it is to separate meaningful signals in these data from the noise and the artifacts. Several approaches considered in educational assessment have been inspired by work conducted in the data mining and natural language processing (NLP) communities, while preserving psychometric theory. This type of blended approach has been described as computational psychometrics (von Davier, 2015; 2017). In this presentation several approaches are described: a data model for creating a structure for the log files; a visualization method using networks; and a scoring method using NLP and clustering methods. The methods will be illustrated with data from a simulation task from NAEP and a collaborative game from ACT.
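To make the log-file-to-network idea concrete, here is a minimal sketch, not the presenter's actual method: the action labels, log sequences, and similarity threshold are all hypothetical. It represents each test taker's action log as bigram counts and links similar behavior profiles into network edges:

```python
from collections import Counter
from itertools import combinations
import math

def bigram_vector(actions):
    """Represent an action log as counts of consecutive action pairs."""
    return Counter(zip(actions, actions[1:]))

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = math.sqrt(sum(x * x for x in u.values())) * \
           math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

# Hypothetical log sequences for three test takers
logs = {
    "tt1": ["open", "read", "tool_a", "submit"],
    "tt2": ["open", "read", "tool_a", "submit"],
    "tt3": ["open", "tool_b", "tool_b", "submit"],
}
vectors = {tid: bigram_vector(seq) for tid, seq in logs.items()}

# Network edges: connect pairs whose behavior profiles exceed a threshold
edges = [(a, b) for a, b in combinations(sorted(vectors), 2)
         if cosine(vectors[a], vectors[b]) > 0.5]
```

The resulting edge list can be handed to any graph-drawing tool; in this toy run only the two test takers with identical action sequences are connected.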
André De Champlain is Director of the Psychometrics and Assessment Services department at the Medical Council of Canada (MCC).
André De Champlain's work includes the review of current scoring and standard setting methodologies for MCC examinations, as well as several research studies relating to psychometric, test development and other operational issues. He has served as Director for Innovations in Testing at the National Board of Osteopathic Medical Examiners and has spent nearly 15 years at the National Board of Medical Examiners as a research psychometrician. He holds a Ph.D. in Educational Assessment, Testing, and Measurement from the University of Ottawa.
Title: Implementing Automated Item Generation in a Large-Scale Medical Licensing Exam Program - Lessons Learned.
On-demand testing is commonplace in large-scale testing programs, as it affords candidates greater flexibility in scheduling as well as in selecting a testing location. However, this also imposes a number of challenges on programs, including higher item exposure rates. Robust item banks are therefore needed to support routine retirement and replenishment of items. The Medical Council of Canada (MCC) has been exploring a complementary item development process that might streamline costly traditional approaches while yielding the number of items necessary to support more frequent and flexible assessment. Specifically, the use of automated item generation (AIG), which uses computer technology to generate test items from cognitive models, has been studied for over five years. Cognitive models are representations of the knowledge and skills that are required to solve any given problem. In the context of a medical scenario, content experts might be asked to deconstruct the (clinical) reasoning process involved via clearly stated variables and related elements, which are then entered into a computer program that uses algorithms to generate multiple-choice questions (MCQs). The MCC has been piloting AIG items for over five years with its Qualifying Examination Part I (MCCQE I), a requirement for medical licensure in Canada. The goal of this presentation is to provide an overview of the lessons learned in the use of AIG with the MCCQE I. In addition to providing us with more items, AIG has been beneficial in:
- Yielding items of a quality level that is at least equal to that of traditionally written MCQs, based on psychometric criteria.
- Providing a framework for the systematic creation of plausible distractors, adding potential value for tailored diagnostic feedback.
- Contributing to an enhancement of our test development process.
We are hopeful that sharing of our experiences might not only help other testing organizations interested in adopting AIG, but also foster broader discussion.
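As a toy illustration of the generative step only, the variables and elements of a cognitive model can be crossed to render many item stems from one template. The variable names, values, and template below are invented, and production AIG systems add constraints so that only clinically plausible combinations are rendered:

```python
from itertools import product

# Hypothetical cognitive-model variables and their elements
model = {
    "age": ["25-year-old", "67-year-old"],
    "symptom": ["chest pain", "shortness of breath"],
}
stem_template = ("A {age} patient presents with {symptom}. "
                 "What is the most likely diagnosis?")

def generate_items(template, elements):
    """Render one stem per combination of element values."""
    keys = list(elements)
    return [template.format(**dict(zip(keys, combo)))
            for combo in product(*(elements[k] for k in keys))]

items = generate_items(stem_template, model)  # 2 x 2 = 4 distinct stems
```

Even this tiny model yields four distinct stems; real cognitive models with many variables and distractor rules produce hundreds of items from a single expert-authored model.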
Bruno D. Zumbo is Professor and Distinguished University Scholar, as well as the Paragon UBC Professor of Psychometrics and Measurement, at the University of British Columbia in Vancouver, Canada.
At UBC, Professor Zumbo also holds an appointment in the Institute of Applied Mathematics and is affiliated with the Department of Statistics. Over the last 28 years his program of research has come to have broad interdisciplinary impact and as such is well recognized in a variety of disciplines, including psychometrics, statistics, assessment and testing, educational research, language testing, and health and human development. Professor Zumbo’s research and teaching have been recognized with several international awards. Among his recognitions, he received the 2017 Killam Research Prize and the 2012 Killam Teaching Prize, was selected as a Fellow of the American Educational Research Association (AERA) in 2011, received the Research Fellow Award in 2010 from the International Society for Quality of Life Studies, and received the 2005 Samuel J. Messick Memorial Lecture Award.
Title: The Reports of DIF’s Death are Greatly Exaggerated; It is Like a Phoenix Rising from the Ashes
For nearly 20 years I have heard, on multiple occasions, about the (impending) demise of differential item functioning (DIF) research. I have been told that “DIF is dying”, “I have a bias against bias”, and that we should stop publishing reports of DIF methodology and results of DIF studies and instead focus our energies on more interesting and fruitful matters. Given that I have nonetheless continued my program of DIF research, I have felt like the American humorist, novelist and social critic Samuel Clemens (Mark Twain) who, legend has it, upon reading his published obituary quipped ‘rumors of my death are greatly exaggerated’. I believe that the prognosticators of DIF’s death were reflecting a frustration with the bland, ritualistic, descriptive use of DIF analyses. However, after each calling of the death of DIF, like the phoenix of Greek mythology, DIF research obtains new life and rebirth from the ashes of its predecessors, and today it is again finding a resurgence of interest and an expansion of its uses in assessment and testing research in a variety of settings. In this address, I will review some recent developments and applications of DIF research, including Third Generation DIF, mixed-methods DIF, explanation-focused validation studies, DIF informed by an ecological model of item responding, latent class DIF, and the use of DIF in response processes validation research, that show both its currency and hint at where it is going next.
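A workhorse of the descriptive, first-generation DIF screening the talk contrasts with newer approaches is the Mantel-Haenszel procedure. The sketch below computes the MH common odds ratio and the ETS delta-scale value from 2x2 tables stratified by matched test score; the counts are invented for illustration:

```python
import math

def mantel_haenszel_dif(strata):
    """Mantel-Haenszel DIF statistic from score-matched strata.

    Each stratum is (ref_correct, ref_incorrect, focal_correct,
    focal_incorrect). Returns the common odds ratio alpha and the ETS
    delta value -2.35 * ln(alpha); a negative delta indicates the item
    favours the reference group at matched ability.
    """
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    alpha = num / den
    return alpha, -2.35 * math.log(alpha)

# Hypothetical counts for one item across three score strata
strata = [(40, 10, 30, 20), (30, 20, 20, 30), (20, 30, 10, 40)]
alpha, delta = mantel_haenszel_dif(strata)
```

Here alpha > 1 (and delta < 0), flagging an item that is relatively easier for the reference group; it is exactly this kind of flag-and-report routine that explanation-focused DIF research seeks to move beyond.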
David Magis is Research Associate of the Fonds de la Recherche Scientifique – FNRS at the Department of Education, University of Liège, Belgium.
David Magis obtained an MSc in biostatistics (Hasselt University, Belgium) and a PhD in statistics (University of Liège, Belgium). His specialization is in statistical methods in psychometrics, with special interest in item response theory, differential item functioning, and computerized adaptive testing. His research interests include both theoretical and methodological development of psychometric models and methods, as well as open-source implementation and dissemination in R. He is associate editor of the British Journal of Mathematical and Statistical Psychology and the International Journal of Testing, and has published numerous research papers in psychometric journals. He is the main developer and maintainer of the packages catR and mstR for adaptive and multistage testing, among others. He was awarded the 2016 Psychometric Society Early Career Award for his contributions to open-source programming and adaptive testing.
Title: Adaptive testing: examples, simulations, and examples of simulations
This talk will cover the topic of adaptive testing in psychometrics, both at the item level (CAT) and at the module level (MST). The presentation is divided into three parts. First, a brief conceptual description of the CAT and MST frameworks is provided, with emphasis on the practical advantages and drawbacks of each. Second, it is explained why and how simulation studies can be very useful in adaptive testing, among other things to establish a correct and suitable plan for a real test administration. Third, examples of such simulation studies will be outlined, based on the R packages catR and mstR developed for that purpose.
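The logic of such a simulation study can be sketched in a few lines. The talk's examples use the R packages catR and mstR, so the following Python toy is only an illustrative stand-in with made-up item parameters: it simulates a maximum-information CAT under a 2PL model with EAP ability estimation on a grid:

```python
import math
import random

def p_correct(theta, a, b):
    """2PL item response function."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def information(theta, a, b):
    """Fisher information of a 2PL item at ability theta."""
    p = p_correct(theta, a, b)
    return a * a * p * (1.0 - p)

def eap(responses, grid=None):
    """Expected a posteriori ability estimate with a N(0, 1) prior."""
    grid = grid or [g / 10.0 for g in range(-40, 41)]
    weights = []
    for t in grid:
        w = math.exp(-t * t / 2.0)  # prior kernel
        for (a, b), u in responses:
            p = p_correct(t, a, b)
            w *= p if u else 1.0 - p
        weights.append(w)
    total = sum(weights)
    return sum(t * w for t, w in zip(grid, weights)) / total

def simulate_cat(bank, true_theta, n_items, rng):
    """Administer n_items, always picking the most informative unused item."""
    used, responses, theta = set(), [], 0.0
    for _ in range(n_items):
        j = max((k for k in range(len(bank)) if k not in used),
                key=lambda k: information(theta, *bank[k]))
        used.add(j)
        u = rng.random() < p_correct(true_theta, *bank[j])  # simulated answer
        responses.append((bank[j], u))
        theta = eap(responses)  # provisional ability estimate
    return theta

bank = [(1.2, (i - 25) / 10.0) for i in range(50)]  # (a, b), b in [-2.5, 2.4]
estimate = simulate_cat(bank, true_theta=1.0, n_items=20, rng=random.Random(7))
```

Replicating such runs over many simulees and design variants (test length, exposure control, module routing rules for MST) is what lets one vet an administration plan before any real test taker sees it.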
I am a Professor of Industrial Psychology at the University of Johannesburg where I am the Head of the Centre for Work Performance.
Previously I was Director of the Institute for Child and Adult Guidance at the University of Johannesburg and an Associate Professor of Psychology at Stellenbosch University. I am registered as a Counselling Psychologist with the Health Professions Council of South Africa. Specialties: psychological assessment, test construction and validation, multivariate statistics, item response modelling, and Rasch modelling.
Title: Challenges that psychological testing in the multicultural South African context presents
I address the challenges that psychological testing in the multicultural South African context presents to test developers and psychologists. In this respect I examine gains and losses that have been made over the last twenty years. The challenges include (a) a historically rooted distrust in psychological testing and assessment, (b) cross-cultural differences in the salience of individual attributes in conceptions about personhood, (c) testing in a language that for most people is not their first language, (d) massive inequality with respect to socio-economic status and access to resources, and (e) the high cost of testing. I show where testing has failed and where and how testing can contribute positively to creating greater equality and well-being. The merits of advancing indigenous conceptions of individual differences in testing are weighed against the merits of taking advantage of the established and developing conceptions made in North America and Europe. Examples of recent test development projects that attempt to address failures of the past and contribute toward an inclusive psychology are presented and discussed critically. Finally, I discuss guidelines for the development of tests in complex, multicultural, multilingual, and unequal societies.
Irini Moustaki is a professor of Social Statistics at the London School of Economics and Political Science.
Irini Moustaki received her Bachelor's degree in Statistics and Computer Science from the Athens University of Economics and Business and her MSc and PhD in Statistics from the LSE. Her research interests are in the areas of latent variable models and structural equation models. Her methodological work includes the treatment of missing data, longitudinal data, detection of outliers, goodness-of-fit tests, and advanced estimation methods. Furthermore, she has made methodological and applied contributions in the areas of comparative cross-national studies and epidemiological studies on rare diseases. She has co-authored two books on latent variable models and published extensively in journals such as JRSS A and C, Psychometrika, JASA, and Biostatistics. She received an honorary doctorate from the Faculty of Social Sciences, Uppsala University in 2014, and she has been an Honorary Professor in the Department of Psychological Studies at The Education University of Hong Kong since July 2015. She was elected Editor-in-Chief of the journal Psychometrika in November 2014. She is also on the editorial board of the journal Structural Equation Modeling and has served in the past on the editorial boards of the Journal of Educational and Psychological Measurement and the journal Computational Statistics and Data Analysis. In 2016 she was elected a member of the Society of Multivariate Experimental Psychology. She is a co-investigator on a recently awarded grant from the Economic and Social Research Council to work on methods for the analysis of longitudinal dyadic data, with an application to intergenerational exchanges of family support. She has given keynote talks at various conferences in the US and in Europe.
Title: The Contributions of Women in Psychometrics-Statistics: Past and Present
In this talk, I will present an overview of the history of Psychometrics in terms of methodological contributions, substantive applications in diverse fields, and relationships with other disciplines. I will aim to highlight some of the key contributions made by women and how those contributions have affected and shaped the discipline of Psychometrics as we know it today. The talk will also look into current gender differences in various fields of statistics, in academia and beyond.
John Hattie is Laureate Professor and Director of the Melbourne Education Research Institute at the University of Melbourne.
Professor Hattie is also Chair of the Australian Institute for Teaching and School Leadership and has previously held the positions of Editor of the International Journal of Testing and President of the International Test Commission. His research has addressed performance indicators, models of measurement, psychometrics, and research design and evaluation. His 2008 book Visible Learning: A Synthesis of Over 800 Meta-Analyses Relating to Achievement has made his research known to a wide public. He received his Ph.D. from the University of Toronto and his Master of Arts from the University of Otago.
Title: Visible Learning and Assessment
Based on the synthesis of meta-analyses relating to achievement and, more recently, to how we learn, this presentation outlines the role of assessment as feedback to teachers about their impact. It emphasizes the importance of score reporting and of seeing assessment as a vehicle to show teachers whom they had an impact on, about what, and to what magnitude.
Leslie Rutkowski joined the University of Oslo in September 2015 as Professor of Educational Measurement in the Centre for Educational Measurement.
Prior to her current appointment, Leslie was an Assistant Professor of Inquiry Methodology at Indiana University. In 2007, she earned her PhD in Educational Psychology with a specialization in Statistics and Educational Measurement from the University of Illinois at Urbana-Champaign under the direction of Professor Carolyn Anderson. Directly following her PhD, Leslie was appointed as a research associate for the International Association for the Evaluation of Educational Achievement (IEA). Her experience with the IEA inspired her work in the field of international large-scale assessment (LSA), where Leslie’s research emphasizes methodological and applied perspectives. Her interests include the impact of background questionnaire quality on achievement results, latent variable models for achievement estimation, and methods for comparing heterogeneous populations in international surveys. In addition to a number of peer-reviewed papers in applied measurement journals, Leslie recently published the edited volume Handbook of International Large-Scale Assessment (Rutkowski, von Davier, and Rutkowski, 2014) with Chapman & Hall. Leslie was also recently awarded a four-year Research Council of Norway FINNUT grant to investigate and develop methods that directly account for population heterogeneity in international surveys and assessments.
Title: Increased heterogeneity in international assessments and associated measurement challenges
In the 2018 cycle of PISA, 80 educational systems have committed to participation, with new additions that include Brunei Darussalam, a relatively wealthy newcomer to international assessments; the Philippines, which has not participated in an international study since TIMSS in 2003; and Belarus, a first-time participant with a per capita GDP on par with Thailand and South Africa. Such a heterogeneous collection of participating educational systems poses challenges in terms of deciding what should be measured and how to measure it in a comparable way. This challenge extends to the newest instantiation of PISA: PISA for Development (PISA-D), which emphasizes economically developing countries, including Cambodia, Zambia, and Senegal. To that end, the issue of cultural heterogeneity in international assessments serves as a backdrop against which I consider several methodological challenges to measuring such diverse populations. In addition, I highlight some recent operational advances in international assessment for dealing with cross-cultural differences and propose ways that ILSAs might continue to account for system-level heterogeneity in the measures and methods used to analyze them. In the context of PISA-D, I propose a psychometric framework that can be used operationally to help determine whether and when a PISA-D participant is suited for transition to the main PISA study. Finally, I describe several areas, albeit in need of in-depth research, where international surveys might consider modifications to current operational procedures to improve cross-cultural comparability.
Maryam Wagner, PhD, is an Assistant Professor in the Department of Medicine at McGill University, Montreal.
Maryam Wagner is a Core Faculty member at the Centre for Medical Education and a program lead for the Assessment and Evaluation Unit. Maryam completed her doctoral studies at the Ontario Institute for Studies in Education/University of Toronto focusing on students’ use of cognitively diagnostic feedback for advancing writing. Her research is broadly situated in educational assessment. She has participated in a variety of research investigations including tracking medical students’ learning progressions in technology-rich learning environments, a validation study of a language assessment framework, and the development and validation of a multimodal scenario-based assessment system. She was awarded a Social Sciences and Humanities Research Council of Canada Postdoctoral Fellowship which she pursued at Carleton University working on diagnosing and supporting engineering students’ writing. Additionally, Maryam has acquired expertise in a range of research methods and methodologies including mixed methods research design.
Title: Examining the Potential and Uses of Cognitive Diagnostic Assessment in Test Development and Validation
Cognitive diagnostic assessment (CDA) has drawn much attention in educational assessment in recent years (DiBello, Roussos, & Stout, 2007; Jang, 2005; 2008; Leighton & Gierl, 2007; Nichols, 1994; Nichols, Chipman, & Brennan, 1995). CDA is an approach to providing diagnostically rich information about individual learners' skill mastery using cognitive theories of learning alongside statistical modelling. CDA can be used for multiple purposes, including tailoring instruction, placement, and contributing to increasing students' metacognitive awareness of their learning in a domain (Butler & Winne, 1995; Jang, Dunlop, Park, & van der Boom, 2015). A further, and not widely explored, use of CDA is its potential for generating validity evidence throughout the process of test development (Jang, Wagner, & Dunlop, 2016), as well as after test implementation (Wagner et al., 2015). In this presentation, I illustrate these applications of CDA by drawing on examples of its use in multiple contexts. Specifically, I discuss the recent development of a computer-based French as a Second Language assessment system serving placement and diagnostic purposes at a Canadian university, wherein CDA principles were used in the test development process. I also discuss the application of CDA to a high-stakes reading test to empirically substantiate the knowledge and skill relationships with test items.
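The item-skill relationships central to CDA are typically encoded in a Q-matrix. As a minimal sketch of one common statistical model, not the specific approach used in the studies above, the DINA model below turns a learner's skill profile into expected item performance; the skills, items, and slip/guess values are hypothetical:

```python
# Q-matrix: rows are items, columns are skills (1 = the item requires the skill)
Q = [
    [1, 0, 0],  # item 1 requires skill A only
    [1, 1, 0],  # item 2 requires skills A and B
    [0, 1, 1],  # item 3 requires skills B and C
]

def dina_p_correct(alpha, q_row, slip=0.1, guess=0.2):
    """DINA model: P(correct) is 1 - slip if the learner has mastered every
    skill the item requires, and guess otherwise."""
    has_all_skills = all(a >= q for a, q in zip(alpha, q_row))
    return 1.0 - slip if has_all_skills else guess

# A learner who has mastered skills A and B but not C
alpha = [1, 1, 0]
probs = [dina_p_correct(alpha, row) for row in Q]
```

Fitting such a model in the other direction, inferring the skill profile alpha from observed responses, is what yields the diagnostically rich mastery feedback that CDA provides.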
Stephen Sireci is Professor in the Psychometrics Program and Director of the Center for Educational Assessment at the University of Massachusetts Amherst.
Professor Sireci's research interests include educational test development and evaluation, particularly issues of validity, cross-lingual assessment, standard setting, and computer-based testing. He is a Fellow of AERA, a Fellow of Division 5 of APA, Chair of the International Test Commission Award Committee, and past Co-Editor of the International Journal of Testing and the Journal of Applied Testing Technology. He received his Ph.D. in Psychology (Psychometrics) from Fordham University and his Master of Arts in Psychology from Loyola College.
Title: 21st-Century Validation Procedures for 21st-Century Tests
Descriptions of validity that imply it is an inherent property of a test are misguided because tests are designed for specific purposes. As the AERA et al. (2014) Standards for Educational and Psychological Testing point out, it is not a test that is validated, but rather the use of a test for a specific purpose. In this talk, I describe how test validation must focus on test use, and how not doing so can be defended only in a purely hypothetical setting. To position validity theory separately from test use suggests that tests can be validated solely by providing evidence that a test measures the construct it purports to measure. While such evidence is a fundamental requirement for validating the use of a test for a particular purpose, it is insufficient for supporting and defending test use. Describing validation independent of test use only makes sense if test scores are never used for any purpose. Hypothesizing how to "validate" such useless tests may be a fun academic exercise, but it has no value for those who work on developing and evaluating tests that are used in the real world to measure educational and psychological constructs.
Sara Ruto heads the secretariat for People’s Action for Learning (PAL) network.
The PAL network currently comprises civil society organizations conducting citizen-led assessments in 14 countries in Africa, Asia, and Latin America. The focus of the assessments is reading and numeracy. In addition, she manages an organisation known as ziziAfrique, which focuses on evidence-based interventions aimed at informing the quality of educational provision. Prior to serving in this position, Sara initiated the citizen-led process in Kenya in 2009 that currently operates as Uwezo, and thereafter managed the Uwezo East Africa learning assessment. She sits on several committees, including those of the Global Education Monitoring Report, the World Bank's SABER Technical Advisory Board, and the INCLUDE Knowledge Platform. Her current role as Chair of the Kenya Institute of Curriculum Development provides an opportunity to participate actively in the current education reform process in Kenya. She trained as a teacher at Kenyatta University in Kenya and earned her doctorate from Heidelberg University in Germany.
Title: Including the Excluded through Rethinking National Assessments: The Example of Citizen Led Assessments
As increasing numbers of children enroll in school and the world moves closer to achieving universal primary education, national assessments can be a powerful mechanism for testing whether the system works for all children. Assessments can identify problem areas in children’s learning trajectories as well as patterns with respect to specific subpopulations that may be struggling more than others. However, for many countries in the global south, the design of traditional large-scale learning assessments – whether national examinations or regional/international standardized tests – subverts these objectives from the very beginning. First, because they are designed as pen-and-paper assessments, they assume that the children taking these tests have the foundational skills necessary to respond adequately to test items. In reality, very large proportions of children in the global south do not acquire these foundational skills even after several years in school; traditional test formats ensure that this invisible problem continues to go unnoticed. Second, because standardized testing is done in schools, the important process of understanding what “learning” looks like and how to measure it is not communicated to important stakeholders in children’s lives – family and community members, many of whom have perhaps not themselves been to school. And third, although test items are often designed to generate a deep understanding of children’s learning, only a very small number of highly trained individuals in any given context are able to understand and interpret testing data, thus limiting its usefulness as a tool for catalyzing action to a handful of people.
To summarize, most standardized testing ignores the realities not only of the children in the global south, but equally the realities of the adults within and outside the school system who are in a position to use testing data to inform action on the ground. The citizen led assessment model (CLA), implemented by the 14 member countries of the People’s Action for Learning (PAL) network, is designed to address these realities. The presentation will delve into how this happens.