OUCEA Annual Reports are published separately. Other publications are listed below.

2017 or in press

Elwood, J., Hopfenbeck, T.N. & Baird, J. (2017) Predictability in high-stakes examinations: students’ perspectives on a perennial assessment dilemma, Research Papers in Education, 32 (1), 1-17.

Hopfenbeck, T.N., Lenkeit, J., El Masri, Y., Cantrell, K., Ryan, J. & Baird, J. (2017) Lessons Learned from PISA: A Systematic Review of Peer-Reviewed Articles on the Programme for International Student Assessment, Scandinavian Journal of Educational Research [online].


Lenkeit, J. (2016) Review of National Reports on PIRLS. Oxford University Centre for Educational Assessment Report OUCEA/16/1.

Baird, J.-A., Caro, D.H. & Hopfenbeck, T.N. (2016) Student Perceptions of Predictability of Examination Requirements and Relationship with Outcomes in High-Stakes Tests in Ireland, Irish Educational Studies [online].

Baird, J. & Hopfenbeck, T.N. (2016) Curriculum in the Twenty-First Century and the Future of Examinations (Chapter 51), in: Wyse, D., Hayward, L. & Pandya, J. (Eds.) The Sage Handbook of Curriculum, Pedagogy and Assessment, Vol. 2.

Baird, J. & Gray, L. (2016) The meaning of curriculum-related examination standards in Scotland and England: a home–international comparison, Oxford Review of Education, 43 (2), 266-284.

Baird, J., Johnson, S., Hopfenbeck, T.N., Isaacs, T., Sprague, T., Stobart, G. & Yu, G. (2016) On the supranational spell of PISA in policy, Educational Research, 58 (2), 121-138. Special Issue: International Policy Borrowing and Evidence-based Educational Policy Making: Relationships and Tensions.

Caro, D.H. & Biecek, P. (in press) intsvy: An R Package for Analysing International Large-Scale Assessment Data, Journal of Statistical Software.

Caro, D.H., Lenkeit, J. & Kyriakides, L. (2016) Teaching strategies and differential effectiveness across learning contexts: Evidence from PISA 2012, Studies in Educational Evaluation, 49, 30-41.

Elliott, V., Baird, J., Hopfenbeck, T.N., Ingram, J., Thompson, I., Usher, N., Zantout, M., Richardson, J. & Coleman, R. (2016) A marked improvement? A review of the evidence on written marking. Education Endowment Foundation.

El Masri, Y.H., Baird, J. & Graesser, A. (2016) Language effects in international testing: the case of PISA 2006 science items, Assessment in Education: Principles, Policy & Practice [online].

El Masri, Y.H., Ferrara, S., Foltz, P.W. & Baird, J. (2016) Predicting item difficulty of science national curriculum tests: the case of key stage 2 assessments, The Curriculum Journal, 28 (1), 59-82.

Hopfenbeck, T.N. (2016) Å lykkes med elevvurdering (Succeeding with student assessment). Fagbokforlaget.

Hopfenbeck, T.N. & Kjaernsli, M. (2016) Students’ test motivation in PISA: the case of Norway, The Curriculum Journal, 27 (3), 406-422.

Newton, P.E. & Baird, J. (2016) The great validity debate (Editorial), Assessment in Education: Principles, Policy & Practice, 23 (2) – Special Issue on Validity.


Baird, J., Meadows, M., Leckie, G. & Caro, D. (2015) Rater accuracy and training group effects in Expert- and Supervisor-based monitoring systems, Assessment in Education: Principles, Policy & Practice [online].

Baird, J., Hopfenbeck, T.N., Elwood, J., Caro, D. & Ahmed, A. (2015) Predictability in the Irish Leaving Certificate, Report commissioned by the State Examinations Commission, Ireland.

Caro, D.H. (2015) Causal mediation in educational research: An illustration using international assessment data, Journal of Research on Educational Effectiveness, 8 (4), 577-597.

Elwood, J., Hopfenbeck, T.N. & Baird, J. (2015 online) Predictability in high-stakes examinations: students’ perspectives on a perennial assessment dilemma, Research Papers in Education.

Hopfenbeck, T.N. (2015) Lead Editors’ editorial introduction, Assessment in Education: Principles, Policy & Practice, 22 (2), 179-181. Special Issue: Sociocultural Theoretical Perspectives on Assessment: Exploring Links, Limitations and Emerging Considerations.

Hopfenbeck, T.N. (2015) Formative assessment, grading and teacher judgement in times of change (Editorial), Assessment in Education: Principles, Policy & Practice, 22 (3), 299-301.

Hopfenbeck, T.N. (2015) On test development and accuracy in self-assessment (Editorial), Assessment in Education: Principles, Policy & Practice, 22 (4), 393-396.

Hopfenbeck, T.N., Flórez Petour, M.T. & Tolo, A. (2015) Balancing tensions in educational policy reforms: large-scale implementation of Assessment for Learning in Norway, Assessment in Education: Principles, Policy & Practice, 22 (1), 44-60. Special Issue: Assessment for Learning: Lessons Learned from Large-Scale Evaluations of Implementations.

Hopfenbeck, T.N. & Stobart, G. (2015) Large-scale implementation of Assessment for Learning (Editorial), Assessment in Education: Principles, Policy & Practice, 22 (1), 1-2. Special Issue: Assessment for Learning: Lessons Learned from Large-Scale Evaluations of Implementations.

Lenkeit, J., Caro, D.H. & Strand, S. (2015) Tackling the remaining attainment gap between students with and without immigrant background: an investigation into the equivalence of SES constructs, Educational Research and Evaluation, 21 (1), 60-83.

Lenkeit, J., Chan, J., Hopfenbeck, T.N., & Baird, J.-A. (2015) A review of the representation of PIRLS related research in scientific journals, Educational Research Review, 16, 102-115.


Baird, J. (2014) Teachers’ views on assessment practices (Editorial), Assessment in Education: Principles, Policy & Practice, 21 (4), 361-364.

Baird, J., Hopfenbeck, T.N., Newton, P., Stobart, G. & Steen-Utheim, A.T. (2014) State of the Field Review: Assessment and Learning. Report for the Norwegian Knowledge Centre for Education.

Baird, J. (2014) Editorial, Assessment in Education: Principles, Policy & Practice, 21 (1), 1-3.

Baird, J. (2014) Assessment and Attitude (Editorial), Assessment in Education: Principles, Policy & Practice, 21 (2), 129-132.

Caro, D.H., Cortina, K.S., & Eccles, J. (2014) Socioeconomic Background, Education, and Labor Force Outcomes: Evidence from a Regional U.S. Sample, British Journal of Sociology of Education [online].

Hopfenbeck, T.N. (2014) Strategier for læring: Om selvregulering, vurdering og god undervisning (Strategies for Learning: self-regulation, assessment and good teaching). Oslo, Universitetsforlaget.

Hopfenbeck, T.N. (2014) Testing Times: Fra PISA til nasjonale prover. Intensjoner, ansvar og anvendelse. (Testing Times: From PISA to national tests. Intentions, accountability and applications.) Chapter 23, 401–419, in: J.H. Stray & L. Wittek, Pedagogikk, en grunnbok. Cappelen Damm Akademisk, ISBN: 978-82-02-41424-5.

Lenkeit, J., & Caro, D.H. (2014) Performance status and change – Measuring education system effectiveness with data from PISA 2000-2009, Educational Research and Evaluation, 20 (2), 146-174.

Mirazchiyski, P., Caro, D.H. & Sandoval-Hernandez, A. (2014) Youth Future Civic Participation in Europe: Differences between the East and the Rest, Social Indicators Research, 115 (3), 1031–1055.

Nyhamn, F. & Hopfenbeck, T.N. (2014) (Eds) From Political Decisions to Change in the Classroom: Successful Implementation of Education Policy, CIDREE Yearbook 2014.


Baird, J. (2013) The currency of assessments (Editorial), Assessment in Education: Principles, Policy & Practice, 20 (2), 147-149.

Baird, J. (2013) Judging students’ performances (Editorial), Assessment in Education: Principles, Policy & Practice, 20 (3), 247-249.

Baird, J., Ahmed, A., Hopfenbeck, T.N., Brown, C. & Elliott, V. (2013) Research evidence relating to proposals for reform of the GCSE. OUCEA Report.

Baird, J. & Black, P. (2013) (Eds) The reliability of public examinations, Research Papers in Education, 28 (1), 1-4, Special Issue.

Baird, J. & Black, P. (2013) Test theories, educational priorities and reliability of public examinations in England, Research Papers in Education, 28 (1), 5-21.

Baird, J., Hayes, M., Johnson, R., Johnson, S. & Lamprianou, I. (2013) Marker effects and examination reliability: A comparative exploration from the perspectives of generalizability theory, Rasch modelling and multilevel modelling. Report commissioned by the Office of Qualifications and Examinations Regulation. Ofqual/13/5261.

Caro, D.H., Sandoval-Hernandez, A., & Lüdtke, O. (2013) Cultural, Social and Economic Capital Constructs in International Assessments: An Evaluation Using Exploratory Structural Equation Modeling, School Effectiveness and School Improvement [online].

Elwood, J. & Baird, J. (2013) (Eds) Students: researching voice, aspirations and perspectives in the context of educational policy change in the 14–19 phase, London Review of Education, 11 (2), Special Issue.

Hopfenbeck, T.N. (2013) What did you learn in school today?, in: J. Hattie, T.S. Wille, M. Hermansen, T.N. Hopfenbeck, C. Madsen, P. Kirkegaard, H. Bjerresgaard, C.E. Weinstein, I. Bråten & R. Andreassen (Eds) Feedback og vurdering for laering, in Danish. ISBN: 978-87-7281-685-2

Hopfenbeck, T.N. (2013) Students’ voice, aspirations, and perspectives: international reflections and comparisons, London Review of Education, 11 (2), 179-183.

Hopfenbeck, T.N., Tolo, A., Florez, T. & El Masri, Y. (2013) Balancing Trust and Accountability? The Assessment for Learning Programme in Norway. Report for OECD.

Lenkeit, J. (2013) Effectiveness measures for cross-sectional studies: A comparison of value-added models and contextualised attainment models, School Effectiveness and School Improvement, 24 (1), 39-63.

Lillejord, S. & Hopfenbeck, T.N. (2013) Vurdering og læring i skolen, in: Lillejord, S., Manger, T. & Nordahl, T. (Eds) Livet i skolen 2: Grunnbok i pedagogikk og elevkunnskap. Lærerprofesjonalitet, 231-259.

Rose, J. & Baird, J. (2013) Aspirations and an austerity state: young people’s hopes and goals for the future, London Review of Education, 11 (2), 157-173. Special Issue: Students: researching voice, aspirations and perspectives in the context of educational policy change in the 14–19 phase.

Simpson, L. & Baird, J. (2013) Perceptions of trust in public examinations, Oxford Review of Education, 39 (1), 17-35.


Baird, J. (2012) Editorial, Assessment in Education: Principles, Policy & Practice, 19 (4), 389-391.

Baird, J. (2012) Do we need marking at all? (Editorial), Assessment in Education: Principles, Policy & Practice, 19 (3), 277-279.

Baird, J. (2012) Science and misfits (Editorial), Assessment in Education: Principles, Policy & Practice, 19 (2), 141-145.

Baird, J., Elwood, J. & Isaacs, T. (2012) Written evidence submitted to the Education Select Committee’s Inquiry into the administration of examinations for 15-19 year olds in England.

Baird, J., Pillinger, R. & Steele, F. (2012) Use of the LEMMA Online Learning Materials, Report prepared for the LEMMA (Learning Environment for Multilevel Modelling and Applications) node, University of Bristol, January.

Baird, J., Rose, J. & McWhirter, A. (2012) So tell me what you want: a comparison of FE college and other post-16 students’ aspirations, Research in Post-Compulsory Education, 17 (3), 293-310.

Caro, D.H. (2012) Evidencia causal en estudios educativos con bases de datos observables. [Causal evidence in educational studies with observational data]. In E. Vásquez (Ed) Inversión social: indicadores, bases de datos e iniciativas. [Social investment: indicators, data sets and initiatives]. Lima: Universidad del Pacífico.

Caro, D.H. & Cortés, D. (2012) Measuring family socioeconomic status: An illustration using data from PIRLS 2006, IERI Monograph Series: Issues and Methodologies in Large-Scale Assessments, 5, 9-33.

Caro, D.H. & Lenkeit, J. (2012) An analytical approach to study educational inequalities: 10 hypothesis tests in PIRLS 2006, International Journal of Research and Method in Education, 35 (1), 3-30.

Caro, D.H. & Mirazchiyski, P. (2012) Socioeconomic Gradients in Eastern European Countries: evidence from PIRLS 2006, European Educational Research Journal, 11 (1), 96-110.

Caro, D.H. & Schulz, W. (2012) Ten Hypotheses about Tolerance among Latin American Adolescents, Citizenship, Social and Economics Education, 11 (3), 213-234.

Daly, A., Baird, J., Chamberlain, S. & Meadows, M. (2012) Assessment reform: students’ and teachers’ responses to the introduction of stretch and challenge at A-level, Curriculum Journal, 23 (2), 173-187.

Eklof, H., Hopfenbeck, T.N. & Kjaernsli, M. (2012) Hva vet vi om elevers testmotivasjon? Erfaringer fra internasjonale og nasjonale undersokelser i Norge og Sverige (What do we know about students’ test motivation? Experiences of international and national tests in Norway and Sweden). Chapter 6, in: T.N. Hopfenbeck, M. Kjaernsli & R.V. Olsen (Eds) Kvalitet i norsk skole. Internasjonale og nasjonale undersokelser av laeringsutbytte og undervisning. (Quality in the Norwegian school. International and national tests of learning outcomes and teaching). Oslo, Universitetsforlaget ISBN 978-82-15-02004-4.

Hellekjaer, G.O. & Hopfenbeck, T.N. (2012) CLIL og lesing. En sammenligning av Vg3-elevers leseferdigheter og lesestrategibruk i 2002 og 2011. Report to the Norwegian Centre for Foreign Languages in Education, investigating students’ reading comprehension at the age of 18, comparing IB, CLIL and ordinary ESL students in Upper Secondary Schools.

Hopfenbeck, T.N. (2012) Strategier for laering. Om selvregulering og strategimaalinger i PISA (Strategies for learning. Self-regulating and strategy measuring in PISA). Chapter 5, in: T.N. Hopfenbeck, M. Kjaernsli & R.V. Olsen (Eds) – see above.

Hopfenbeck, T.N. (2012) The Role and Value of International Datasets and Comparisons in Education Research, Guest Editorial in International Datasets and Comparisons in Education, Research Intelligence, 119, British Educational Research Association, 7-8.

Hopfenbeck, T.N., Throndsen, I., Lie, S. & Dale, E.L. (2012) Assessment with distinctly defined criteria: A research study of a National project, Policy Futures in Education, 10 (4), 421-433.

Oancea, A. & Hopfenbeck, T.N. (2012) (Eds) International Datasets and Comparisons in Education, Research Intelligence, 119, British Educational Research Association.

Olsen, R.V., Hopfenbeck, T.N., Lillejord, S. & Roe, A. (2012) Elevenes læringssituasjon etter innføringen av ny reform, Acta Didactica Oslo. 1/2012.


Baird, J. (2011) Does the learning happen inside the black box? (Editorial), Assessment in Education: Principles, Policy & Practice, 18 (4), 343-345.

Baird, J. (2011) Why do people appeal Higher Education grades and what can it tell us about the meaning of standards? Assessment in Education: Principles, Policy & Practice, 18 (1), 1-4.

Baird, J., Béguin, A., Black, P., Pollitt, A. & Stanley, G. (2011) The Reliability Programme: Final Report of the Technical Advisory Group. Coventry: Ofqual/11/4825. Chapter 20, in: Q. He, & D. Opposs (Eds) Ofqual’s Reliability Compendium. Office of Qualifications and Examinations Regulation, Ofqual/12/5117. ISBN 978-0-85743-016-8.

Baird, J., Elwood, J., Duffy, G., Feiler, A., O’Boyle, A., Rose, J. & Stobart, G. (2011) 14-19 Centre Research Study: educational reforms in schools and colleges in England Annual Report. London: QCDA.

Baird, J., Isaacs, T., Johnson, S., Stobart, G., Yu, G., Sprague, T. & Daugherty, R. (2011) Policy Effects of PISA. Report commissioned by Pearson UK.

Hopfenbeck, T.N. (2011) Fostering self-regulated learners in a community of quality assessment practices, CADMO, (1), 7-21.

Hopfenbeck, T.N. & Maul, A. (2011) Examining Evidence for the Validity of PISA Learning Strategy Scales Based on Student Response Processes, International Journal of Testing, 11 (2), 95-121.

Leckie, G. & Baird, J. (2011) Rater Effects on Essay Scoring: A Multilevel Analysis of Severity Drift, Central Tendency, and Rater Experience, Journal of Educational Measurement, 48 (4), 399-418.


Baird, J. (2010) Construct‐irrelevant variance sometimes has consequential validity (Editorial), Assessment in Education: Principles, Policy & Practice, 17 (4), 339-343.

MacCann, R.G. & Stanley, G. (2010) Classification consistency when scores are converted to grades: examination marks versus moderated school assessments, Assessment in Education: Principles, Policy & Practice, 17 (3), 255-272.

MacCann, R.G. & Stanley, G. (2010) Extending participation in standard setting: an online judging proposal, Educational Assessment, Evaluation and Accountability, 22 (2), 139-157.

Stanley, G. & Lee, J.C. (2010) Future Educational Reform Policies and Measures, in: J.C. Lee & B. Caldwell (Eds) Changing Schools in an Era of Globalization. New York, Routledge.


MacCann, R.G. & Stanley, G. (2009) Item banking with embedded standards, Practical Assessment Research & Evaluation, 14 (17).

MacCann, R.G. (2009) Standard setting with dichotomous and constructed response items: some Rasch model approaches, Journal of Applied Measurement, 10 (4), 438-454.

Stanley, G. & MacCann, R.G. (2009) Incorporating industry specific training into school education: enrolment and performance trends in a senior secondary system, Journal of Vocational Education and Training, 61 (4), 459-466.

Stanley, G., MacCann, R., Gardner, J., Reynolds, L. & Wild, I. (2009) Review of Teacher Assessment: Evidence of What Works Best and Issues for Development. Report commissioned by QCA.


Stanley, G. (2008) National Numeracy Review Report. Canberra: Council of Australian Governments. ISBN 0642 77735 7.

Stanley, G. & Tognolini, J. (2008) Performance with respect to standards in public examinations, Proceedings of the 34th IAEA Conference, Cambridge, UK.