A large-scale comparative study of paper-based and computer-based testing in mathematics, Years 1–6
Abstract
There is little doubt today that a sizeable percentage of educational assessment is computer-based (CB). However, when computer-based assessment replaces paper-and-pencil (PP) testing, a number of questions arise regarding equivalence. This paper compares results from PP and CB testing to identify domains and item formats where the two media may influence achievement. Mathematics tests comprising various item types and connected by anchor items were administered in PP and CB modes to six age groups from Years 1 to 6 in Hungarian schools (N=40 571 and 21 895, respectively). Online data collection was carried out on the eDia platform. The internal consistencies of the tests were good: Cronbach's α exceeded .86 in PP mode and .91 in CB mode. Strong correlations were found between the total scores on the two versions of the test, and they showed an increasing trend across grades, indicating that paper- and computer-based test performances become more similar with age (r = .70 in Grade 1; r = .92 in Grade 6). This paper argues that the media effect is related to the format, type, complexity, length and content of the test items; however, no single parameter could be identified that consistently produced a significant media effect and would therefore justify restricting the use of either medium. Children in the lower years scored higher on multiple-choice items in the CB environment, owing to higher test-taking motivation, which resulted in less missing data. Average scores on open-ended CB tasks proved to be lower for items requiring calculation, complex operations and/or higher-order thinking skills. Task length, specifically the need to scroll through texts to find an answer, affected learners in the lower age groups, but when tasks contained colourful pictures, test-takers achieved higher scores on CB than on PP. The results indicate that if a test contains items of various types, formats, contexts and levels of complexity, no deviation between the results obtained on the two media is to be expected. If a test contains similar items, e.g. only simple closed or open-ended tasks without any illustration, special attention should be paid to changes in performance depending on the medium.