The Errors of Our Measurement

One of the challenges of studying psychology is that we do not have direct access to the contents of people’s minds. Unlike biological measurements such as blood pressure, we rarely have ‘objective’ measures of people’s mood or thinking skills that can be recorded reliably over several occasions. Instead, we often have to make educated guesses based on how people answer the questions we ask.


If you have ever been to your GP about anxiety, you may have been asked how often you have felt ‘nervous, anxious or on edge’ in the last week. This question is trying to measure your level of anxiety, but because we cannot reach into your mind and pull out your level of anxiety directly, it has to be inferred from your answer. And your answer is not a perfect, one-to-one reflection of your anxiety. Other factors, such as how you are feeling in that moment, how well you remember the events of the last week, or even how you interpret the question, can influence your score on the questionnaire. The influence of these factors is known as measurement error.

This does not mean that it is hopeless to try to measure anxiety. Studies often take into account factors thought to affect reported levels of anxiety, such as how anxious someone feels at the moment they answer. Also, despite what the name suggests, measurement error does not mean that someone has made a mistake while measuring (although this can sometimes be the case!). It refers to all the unmeasured, sometimes random, factors that affect the measurement of whatever you are interested in.
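One common way to formalise this idea, not spelled out in the talk as described here, is classical test theory. It treats an observed questionnaire score $X$ as a true score $T$ plus an error term $E$ that is unrelated to the true score, and defines a test’s reliability as the share of the variation in observed scores that comes from genuine differences in the true score:

$$X = T + E, \qquad \operatorname{Cov}(T, E) = 0$$

$$\text{reliability} = \frac{\sigma_T^2}{\sigma_X^2} = \frac{\sigma_T^2}{\sigma_T^2 + \sigma_E^2}$$

A reliability of 1 would mean no measurement error at all; as the error variance $\sigma_E^2$ grows, reliability falls towards 0.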

I hadn’t really thought much about measurement error before attending the National Centre for Research Methods (NCRM) masterclass at the University of Manchester last week. Harvey Goldstein, a distinguished Professor of Statistics at the University of Bristol and a speaker at the event, argued that measurement error does not get enough attention in research. If a questionnaire has a lot of measurement error, its scores are less reliable, and so are any findings built on them. Going back to the anxiety example, this means that someone may not get a consistent score on the anxiety scale from one occasion to the next, even if their level of anxiety stays the same. It is therefore important to model a test’s reliability and adjust for it in the analysis, which research often fails to do.
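To make the link between measurement error and inconsistent scores concrete, here is a minimal Python sketch. It assumes the classical test theory set-up (observed score = true score + independent error), gives the same hypothetical questionnaire twice to simulated people whose true anxiety does not change, and reports how strongly the two sets of scores agree. The function names and the reliability values are illustrative assumptions, not anything from the masterclass.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate_retest_r(reliability, n=50_000, seed=0):
    """Hypothetical simulation: the same questionnaire given twice to people
    whose true anxiety has not changed. Returns the correlation between
    the two occasions.

    Assumes X = T + E with Var(T) = 1 and a fresh, independent error each time.
    """
    rng = random.Random(seed)
    # Error SD chosen so that Var(T) / (Var(T) + Var(E)) equals `reliability`
    error_sd = math.sqrt((1 - reliability) / reliability)
    first, second = [], []
    for _ in range(n):
        true_anxiety = rng.gauss(0.0, 1.0)  # identical on both occasions
        first.append(true_anxiety + rng.gauss(0.0, error_sd))
        second.append(true_anxiety + rng.gauss(0.0, error_sd))
    return pearson_r(first, second)


for rel in (0.9, 0.7, 0.5):
    print(f"reliability {rel}: test-retest r = {simulate_retest_r(rel):.2f}")
```

Under these assumptions the correlation between the two occasions comes out close to the reliability itself, so a less reliable questionnaire gives visibly less consistent scores even though nobody’s anxiety has changed.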

Goldstein gave a cautionary tale to illustrate this point. A paper by Feinstein (2003) analysed test scores and socio-economic status (SES) for children between the ages of 2 and 6. The results suggested that children grouped as “high educational ability but low SES” at age 2 were scoring worse by age 6 than children grouped as “low ability but high SES”. This was taken to mean that deprivation and inequality blunt educational achievement, even in children who were capable to begin with. Goldstein mentioned that the study was presented to Parliament to illustrate the damaging effects of poverty on educational achievement, which shows how much power a single study can have to shape policy.

Graph from Feinstein (2003)

However, the study was flawed. It did not take into account the reliability of the ability test at age 2, even though that test provided the baseline for the follow-up analysis. Without that information, we cannot tell how much of a child’s score was due to their ability and how much was due to measurement error. This matters because a child placed in the ‘high ability’ group partly thanks to a lucky score at age 2 would be expected to score closer to the average for their SES group at the next test, even if nothing else changed; this is known as regression to the mean. Goldstein and French (2015) addressed these issues in a follow-up paper by comparing scores across time and factoring in reliability estimates. They found that high SES children scored better than low SES children across time, but the differences were less dramatic than those reported by Feinstein (2003), to a degree that depended on which reliability values were assumed. The original conclusions were not completely refuted: SES still matters for educational achievement, but it is less clear whether low SES holds back a child’s future progress. That latter claim was probably what got the study noticed by Parliament in the first place, but it may not reflect reality, which is what we want our policies to be based on.
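To see how an unreliable baseline can produce this kind of pattern on its own, here is a minimal Python simulation of regression to the mean. It is not Feinstein’s data or Goldstein and French’s model: the function name, the one-standard-deviation SES gap and the error sizes are hypothetical choices. True ability differs by SES but is fixed between ages 2 and 6, so any movement between the two ages comes from measurement error alone. Children are then grouped on their noisy age-2 score, as in the original analysis.

```python
import random


def simulate_baseline_grouping(n=100_000, ses_gap=1.0, age2_error_sd=1.0,
                               age6_error_sd=0.5, seed=1):
    """Hypothetical simulation: true ability differs by SES but never changes.

    Returns (age-2 mean, age-6 mean) for two groups formed on the noisy
    age-2 score: top-quartile low-SES children and bottom-quartile
    high-SES children.
    """
    rng = random.Random(seed)
    children = []
    for _ in range(n):
        high_ses = rng.random() < 0.5
        # Assumption: SES shifts the mean of true ability by +/- ses_gap / 2
        true_ability = rng.gauss(ses_gap / 2 if high_ses else -ses_gap / 2, 1.0)
        age2 = true_ability + rng.gauss(0.0, age2_error_sd)  # noisy baseline test
        age6 = true_ability + rng.gauss(0.0, age6_error_sd)  # ability unchanged
        children.append((high_ses, age2, age6))

    # Quartile cut-offs of the observed age-2 scores in the whole sample
    sorted_age2 = sorted(c[1] for c in children)
    lower_cut, upper_cut = sorted_age2[n // 4], sorted_age2[3 * n // 4]

    def means(group):
        return (sum(c[1] for c in group) / len(group),
                sum(c[2] for c in group) / len(group))

    able_low_ses = [c for c in children if not c[0] and c[1] >= upper_cut]
    less_able_high_ses = [c for c in children if c[0] and c[1] <= lower_cut]
    return means(able_low_ses), means(less_able_high_ses)


(a2, a6), (b2, b6) = simulate_baseline_grouping()
print(f"high ability, low SES: age 2 = {a2:.2f}, age 6 = {a6:.2f}")
print(f"low ability, high SES: age 2 = {b2:.2f}, age 6 = {b6:.2f}")
```

With a noisy age-2 test, the “high ability, low SES” group scores noticeably lower at age 6 and the “low ability, high SES” group noticeably higher, because each group drifts back towards its SES average, and the gap between them shrinks. Setting `age2_error_sd=0.0` (a perfectly reliable baseline) makes the effect disappear. Whether the two groups actually cross over depends on the assumed SES gap and reliability, which is why reliability estimates matter so much for interpreting this kind of graph.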

Goldstein’s words after presenting this ‘cautionary tale’ resonated with me: “If you are using quantitative methods in your research, you are ethically bound to use best practice. You shouldn’t play with tools without understanding them.” I might write this on a Post-it and stick it on my screen so that I am reminded of it whenever I am working on my PhD.



Feinstein, L. (2003). Inequality in the early cognitive development of British children in the 1970 cohort. Economica, 70(277), 73-97.

Goldstein, H., & French, R. (2015). Differential educational progress and measurement error. In Feinstein, L., Jerrim, J., Vignoles, A., Goldstein, H., French, R., Washbrook, E., Lee, R., & Lupton, R., Social class differences in early cognitive development debate. Longitudinal and Life Course Studies, 6, 331-376.
