Reanalyzing the DISMAS test: Reliability and content validity

Hynek Cígler, Jan Širůček, Pavel Traspe, Ivana Skalková


The goal of this paper is to reanalyze the Czech “Assessment of the structure of mathematical abilities” test (DISMAS; Traspe and Skalková, 2013), designed to assess problems in the development of mathematical abilities in children aged approximately 5–11 years. The test comprises 14 developmental scales plus total scores, for a total of 22 test scores with percentile norms. The study uses a normative sample (N = 878) and a clinical sample (N = 877) and pursues three objectives: (1) estimating the reliability of composite scores using stratified Cronbach's alpha, and assessing content validity and test fairness using (2) a series of confirmatory factor analyses and (3) differential item functioning (DIF) analysis. Reliability estimates that took the multidimensional structure of the composite scores into account led to a two-fold (for the total score, a three-fold) decrease in standard errors and to narrower confidence intervals. Structural models supported weak factorial invariance across grades 2 to 5, except for the Computing Automation subtest, whose relationship with overall mathematical ability strengthens with age. However, the factor structure differed for first graders and younger children, and no clear factor structure emerged in the clinical sample. The results also suggest that the Mathematical Ideas subtest can serve as a screening measure of the overall level of mathematical abilities. According to the DIF analyses, individual scales were not invariant across grades and samples, which may mean that lower scores reflect not only lower levels of mathematical ability but also qualitative differences. The paper offers further recommendations for using the test in common assessment situations.
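To illustrate why a stratified reliability estimate can shrink standard errors, the following minimal sketch computes stratified alpha in the form given by Cronbach, Schönemann, and McKie (1965) and the resulting standard error of measurement. The function names and all numeric values are hypothetical and chosen only for illustration; they are not DISMAS data and this is not necessarily the procedure used in the paper.

```python
import math


def stratified_alpha(subscale_variances, subscale_alphas, composite_variance):
    """Stratified alpha: 1 - sum(var_i * (1 - alpha_i)) / var_composite.

    Each subscale contributes only its own error variance, so a composite of
    heterogeneous but individually reliable subscales is not penalized for
    being multidimensional.
    """
    if len(subscale_variances) != len(subscale_alphas):
        raise ValueError("Need one alpha per subscale variance.")
    error_variance = sum(
        var * (1.0 - alpha)
        for var, alpha in zip(subscale_variances, subscale_alphas)
    )
    return 1.0 - error_variance / composite_variance


def standard_error_of_measurement(composite_variance, reliability):
    """Classical SEM: SD of the composite times sqrt(1 - reliability)."""
    return math.sqrt(composite_variance) * math.sqrt(1.0 - reliability)


# Hypothetical values (assumptions for illustration, not taken from the study).
variances = [4.0, 9.0, 16.0]      # subscale score variances
alphas = [0.80, 0.85, 0.90]       # subscale alphas
composite_variance = 60.0         # variance of the summed composite score

alpha_s = stratified_alpha(variances, alphas, composite_variance)
sem = standard_error_of_measurement(composite_variance, alpha_s)
print(round(alpha_s, 4), round(sem, 4))
```

Because the standard error depends on the square root of one minus the reliability, even a moderate gain in estimated reliability can narrow confidence intervals noticeably.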

(Fulltext in Czech)


DISMAS, CFA, confirmatory factor analysis, DIF, content analysis, reanalysis


Cígler, H., Jabůrek, M., Straka, O., & Portešová, Š. (n.d.). Test pro identifikaci nadaných dětí v matematice pro 3.–5. třídu. Brno: Masarykova univerzita.

Cígler, H., Jabůrek, M., & Širůček, J. (2014). Reanalyzing the DISMAS Test Data: Comparing IRT and CTT Based Estimates of the Error of Measurement. In The 9th Conference of The International Test Commission: Global and Local Challenges for Best Practices in Assessment, 2014 (pp. 211–212). San Sebastián: International Test Commission. Retrieved from grab CD Ong.pdf

Cronbach, L. J., Schönemann, P., & McKie, D. (1965). Alpha coefficients for stratified-parallel tests. Educational and Psychological Measurement, 25(2), 291–312.…446502500201

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16(3), 297–334.…7/BF02310555

Epskamp, S. (2014). semPlot: Path diagrams and visual analysis of various SEM packages’ output. R package version 1.0.1. Retrieved from

Furr, R. M., & Bacharach, V. R. (2014). Psychometrics: An Introduction. Los Angeles, CA: Sage.

Gadermann, A. M., Guhn, M., & Zumbo, B. D. (2012). Estimating ordinal reliability for Likert-type and ordinal item response data: A conceptual, empirical, and practical guide. Practical Assessment, Research & Evaluation, 17(3). Retrieved from…

Geary, D. C., Hoard, M. K., & Hamson, C. O. (1999). Numerical and arithmetical cognition: Patterns of functions and deficits in children at risk for a mathematical disability. Journal of Experimental Child Psychology, 74(3), 213–239.…cp.1999.2515

Hogan, T. P. (2013). Psychological Testing: A Practical Introduction. New York: Wiley.

Hönigová, S. (2014). Diagnostika struktury matematických schopností (DISMAS): Recenze metody. Testfórum, 3(4), 58–64.…/TF2014-4-29

IBM Corp. (2013). IBM SPSS Statistics for Windows, Version 22.0. Armonk, NY: IBM Corp.

Jensen, A. R., & Weng, L.-J. (1994). What is a good g? Intelligence, 18(3), 231–258.…16/0160-2896(94)90029–9

Linacre, J. M. (2015). Winsteps® Rasch measurement computer program. Beaverton:

Linacre, J. M. (2016). Winsteps® Rasch measurement computer program User's Guide. Beaverton, Oregon:

Linacre, J. M. (2017, January 5). Local dependencies in DIF analyses. Online discussion forum post. Available at…dif-analyses.

Lord, F. M., & Novick, M. R. (1968). Statistical theories of mental test scores. Reading, MA: Addison-Wesley.

Marko, M. (2016). Využitie a zneužitie Cronbachovej alfy pri hodnotení psychodiagnostických nástrojov. Testfórum, 5(7).…/TF2016-7-90

Masters, G. N. (1982). A Rasch model for partial credit scoring. Psychometrika, 47(2), 149–174.…7/BF02296272

R Core Team. (2015). R: A language and environment for statistical computing. Vienna: R Foundation for Statistical Computing. Retrieved from

Raykov, T., & Marcoulides, G. A. (2011). Introduction to psychometric theory. New York: Routledge.

Revelle, W., & Zinbarg, R. E. (2009). Coefficients Alpha, Beta, Omega, and the glb: Comments on Sijtsma. Psychometrika, 74(1), 145–154.…6-008-9102-z

Rosseel, Y. (2012). lavaan: An R Package for Structural Equation Modeling. Journal of Statistical Software, 48(2), 1–36. Retrieved from

Russell, R. L., & Ginsburg, H. P. (1984). Cognitive analysis of children’s mathematics difficulties. Cognition and Instruction, 1(2), 217–244.…690xci0102_3

Satorra, A., & Bentler, P. M. (1994). Corrections to test statistics and standard errors in covariance structure analysis. In A. von Eye & C. C. Clogg (Eds.), Latent variables analysis: Applications for developmental research (pp. 399–419). Thousand Oaks, CA: Sage.

Scott, N. W., Fayers, P. M., Aaronson, N. K., Bottomley, A., Graeff, A. de, Groenvold, M., … Sprangers, M. A. (2010, August 4). Differential item functioning (DIF) analyses of health-related quality of life instruments using logistic regression. Health and Quality of Life Outcomes. BioMed Central.…77-7525-8-81

semTools Contributors. (2015). Useful tools for structural equation modeling. R package version 0.4–9. Retrieved from…age=semTools

Sijtsma, K. (2009). On the Use, the Misuse, and the Very Limited Usefulness of Cronbach’s Alpha. Psychometrika, 74(1), 107–120.…6-008-9101-0

Traspe, P., & Skalková, I. (2013). DISMAS: Diagnostika struktury matematických schopností. Praha: Národní ústav pro vzdělávání.

Walker, C. M., & Beretvas, S. N. (2003). Comparing multidimensional and unidimensional proficiency classifications: multidimensional IRT as a diagnostic aid. Journal of Educational Measurement, 40(3), 255–275.…03.tb01107.x

Wang, M. W., & Stanley, J. C. (1970). Differential weighting: A review of methods and empirical studies. Review of Educational Research, 40(5), 663–705.

Wechsler, D. (2002). WISC-III – Wechslerova inteligenční škála pro děti. Translated by D. Krejčířová, P. Boschek, & J. Dan. Praha: Testcentrum.

Woodhouse, B., & Jackson, P. H. (1977). Lower bounds for the reliability of the total score on a test composed of non-homogeneous items: II: A search procedure to locate the greatest lower bound. Psychometrika, 42(4), 579–591.…7/BF02295980

Zumbo, B. D. (1999). A Handbook on the Theory and Methods of Differential Item Functioning (DIF): Logistic Regression Modeling as a Unitary Framework for Binary and Likert-type (Ordinal) Item Scores. Ottawa: Directorate of Human Resources Research and Evaluation, Department of National Defense.

Zumbo, B., Gadermann, A., & Zeisser, C. (2007). Ordinal versions of coefficients alpha and theta for Likert rating scales. Journal of Modern Applied Statistical Methods. Retrieved from…/vol6/iss1/4

Patefield, W. (1981). An efficient method of generating random r × c tables with given row and column totals. Journal of the Royal Statistical Society. Series C (Applied Statistics), 30(1), 91–97.
