Ideas in Testing Research Seminar Schedule, November 2, 2018
Coffee & Networking (9:15 — 9:45)
Welcome and Introduction (9:45 — 10:00)
Computer Adaptive Testing (CAT) (10:00 — 11:00)
Comparison of Item Selection Criteria in Multidimensional Computer Adaptive Testing with the Graded Response Items — Scott Morris (IIT), Michael Bass (Northwestern), Matthew Lauritson (IIT), Sheng Zhang (IIT), & Richard Neapolitan (Northwestern)
abstract slides
Applicant reactions to AIG: A CAT AIG feasibility study — Alan Mead (Talent Algorithms Inc.), Sheng Zhang (IIT), & Daniel Stopka (IIT)
abstract slides
Abstract: A pilot study of respondent perceptions of flawed verbal analogy items automatically generated by software ("AIG items") was conducted to understand how examinees perceive flawed items and to estimate the psychometric properties of AIG items. A small sample of respondents (N=23) flagged items partly in line with expectations and partly in unexpected ways. Notably, respondents were far more likely to flag items with no key or an awkward analogy as flawed than items with other flaws, including items with two keys. The difficulty of the AIG items had an acceptable range and central tendency, even though the items were generated without controlling for difficulty.
Computer Adaptive Testing via Adaptive Table of Specification with a multistage consideration: A simulation study — Ye Ma (University of Iowa) & Johnny Denbleyker (Houghton Mifflin Harcourt)
abstract
Break (11:00 — 11:15)
Psychometrics I (11:15 — 12:15)
Evaluating Alpha/Beta/Gamma Change with Ordinal Confirmatory Factor Analysis — Sean Wright, Scott Morris, & Daniel Gandara (Illinois Institute of Technology)
abstract slides
Evaluation of R Packages with IRT 2PL Dichotomous Model — Shuya Zhang, Maxwell Hong, & Ying Cheng (Notre Dame)
abstract
A Pseudo Power Analysis for CTT Item Analysis — Alan Mead (Talent Algorithms Inc.)
abstract slides
Abstract: Presents a heuristic power analysis for detecting flawed items using corrected item-total correlations (CITCs) from classical test theory (CTT) item analysis. Samples as small as N=25 are shown to have excellent power for detecting large effect sizes (e.g., due to miskeyed items) and more modest power to detect items with zero population CITCs. Type I error rates were uncontrolled and often excessive.
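For readers less familiar with the statistic, the sketch below shows one common way to compute CITCs in Python: each item is correlated with the total score of the remaining items. The function name and the simulated responses are illustrative assumptions, not material from the talk.

    import numpy as np

    def corrected_item_total_correlations(responses):
        # Corrected item-total correlation (CITC): correlate each item with the
        # total score of the remaining items, so an item does not inflate the
        # correlation by being part of its own total.
        responses = np.asarray(responses, dtype=float)
        totals = responses.sum(axis=1)
        citcs = np.empty(responses.shape[1])
        for j in range(responses.shape[1]):
            rest = totals - responses[:, j]          # total score excluding item j
            citcs[j] = np.corrcoef(responses[:, j], rest)[0, 1]
        return citcs

    # Illustrative only: 25 simulated examinees answering 10 dichotomous items.
    rng = np.random.default_rng(0)
    theta = rng.normal(size=(25, 1))                 # simulated abilities
    b = rng.normal(size=10)                          # simulated item difficulties
    prob = 1.0 / (1.0 + np.exp(-(theta - b)))        # Rasch-style response probabilities
    data = (rng.random((25, 10)) < prob).astype(int)
    print(np.round(corrected_item_total_correlations(data), 2))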
Lunch (12:15 — 1:00)
Research Discussion (1:00 — 1:30)
An Agenda for Psychometric Research — Alan Mead (Talent Algorithms Inc.), Kirk Becker (Pearson), & Scott Morris (IIT)
Innovative assessment options (1:30 — 2:30)
A Review of Games Based Assessment — Reya Green & Kristina Bauer (Illinois Institute of Technology)
abstract slides
Abstract: Presents a review of the GBA literature organized by three dimensions (constructs assessed, research methodology, and elements of the game), highlighting the common practices and methods used (what is known) and summarizing key suggestions for future research (what is unknown). The review revealed that cognitive constructs were assessed most often, correlational designs were employed most frequently, and games varied in the number and type of game elements used.
Automating Job Analysis using Natural Language Processing — Cavan Gray (Pearson)
abstract
Automatic Item Generation: Methods, Applications, and Sample Statistics — Kirk Becker (Pearson)
abstract
Break (2:30 — 2:40)
Psychometrics II (2:40 — 3:40)
Semi-Supervised Learning for Criterion-Related Validity Studies — Alan Mead (Talent Algorithms Inc.) & Daniel Stopka (IIT)
abstract
Abstract: The criterion-related validity studies conducted by I/O psychologists are an example of the class of data science studies called "supervised learning." Recently, data scientists have investigated a hybrid class of methods called "semi-supervised learning," which blends supervised and unsupervised learning in order to make use of cases with missing criterion (or predictor) data. Unfortunately, our simulation of a typical criterion-related validity study suggested that semi-supervised learning improves neither accuracy nor efficiency.
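The abstract reports a negative simulation result; for context, the sketch below shows the generic shape of a semi-supervised setup using scikit-learn's self-training wrapper. The simulated data, the 80% missing-criterion rate, and the in-sample comparison are illustrative assumptions, not the design or method used in the study.

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.semi_supervised import SelfTrainingClassifier

    rng = np.random.default_rng(1)
    n, p = 500, 4
    X = rng.normal(size=(n, p))                                   # predictor scores
    y = (X @ np.array([0.5, 0.3, 0.2, 0.1])
         + rng.normal(size=n) > 0).astype(int)                    # dichotomized criterion

    # Pretend the criterion is missing for most cases (-1 marks unlabeled data).
    y_semi = y.copy()
    unlabeled = rng.random(n) < 0.8
    y_semi[unlabeled] = -1

    # Semi-supervised model: self-training around a supervised base learner.
    model = SelfTrainingClassifier(LogisticRegression())
    model.fit(X, y_semi)

    # Baseline: the same learner trained on the labeled cases only.
    labeled = ~unlabeled
    baseline = LogisticRegression().fit(X[labeled], y[labeled])

    # In-sample comparison, for illustration only.
    print("self-training accuracy:", model.score(X, y))
    print("labeled-only accuracy: ", baseline.score(X, y))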
Reviving Lord-McNemar's Estimated True Gain Score in the Modern World — Johnny Denbleyker (Houghton Mifflin Harcourt) & Ye Ma (University of Iowa)
abstract slides
Rating Scale Analysis Using Ideal Point Response Process — Georgi Petkov (Bowling Green State University)
abstract slides
Closing comments (3:40)
Questions about the seminar may be directed to Alan Mead, Scott Morris, or Kirk Becker. We hope you will join us.