Reliability
Specification: Reliability across all methods of investigation. Ways of assessing reliability: test‐retest and inter‐observer; improving reliability.
Reliability is a measure of consistency. For example, if you are using a tape measure, you expect to get the same results every time you measure a certain object. If the results are not consistent, then the measure is not reliable. In psychology, the expectations are the same; if researchers are using a questionnaire to measure levels of depression, they want to ensure that the measure is consistent between participants and over time.
Test re-test method
One very straightforward way of testing whether a tool is reliable is using the test‐retest method. Quite simply, the same person or group of people are asked to undertake the research measure, e.g. a questionnaire, on different occasions.
When using the test‐retest method, it is important to remember that the same group of participants is being studied twice, so researchers need to be aware of any potential demand characteristics. For example, if the same measure is given twice in one day, there is a strong chance that participants will be able to recall the responses they gave in the first test, and so psychologists could be testing their memory rather than the reliability of their measure. On the other hand, it is also important to make sure that there is not too much time between each test. For example, if psychologists are testing a measure of depression and question the participants a year apart, it is possible that they may have recovered in that time, and so they give completely different responses for that reason, rather than because the questionnaire is not reliable.
After the measure has been completed on two separate occasions, the two sets of scores are correlated. If the correlation is shown to be significant, the measure is deemed to have good reliability. A perfect correlation is 1, and the closer the score is to this, the stronger the reliability of the measure; a correlation of +0.8 or above is generally accepted as a good indication of reliability.
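As an illustration of the correlation step, the sketch below calculates a Pearson correlation coefficient between two administrations of the same questionnaire. The scores and the `pearson_r` helper are hypothetical, invented purely for illustration; in practice researchers would use statistical software.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

# Hypothetical questionnaire scores for five participants,
# tested twice, one month apart.
test1 = [12, 18, 25, 30, 22]
test2 = [14, 17, 27, 29, 21]

r = pearson_r(test1, test2)
print(round(r, 2))  # → 0.97, well above the +0.8 benchmark
```

Here the two sets of scores correlate strongly, so the measure would be judged to have good test‐retest reliability.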
Inter-observer reliability
Inter‐observer reliability refers to the extent to which two or more observers are observing and recording behaviour in a consistent way. This is a particularly useful way of ensuring reliability in situations where there is a risk of subjectivity. For example, if a psychologist was making a diagnosis for a mental health condition, it would be a good idea for someone else to also make a diagnosis to check that they are both in agreement.
In psychology studies where behavioural categories are being applied, inter‐observer reliability is also important to make sure that the categories are being used in the correct manner. Psychologists would observe the same situation or event separately, and then their observations (or scores) would be correlated to see whether they are suitably similar.
An example from the attachment topic, where operationalised behavioural categories were employed, is Ainsworth’s Strange Situation. During the controlled observation, her research team looked for instances of separation anxiety, proximity seeking, exploration and stranger anxiety across the eight episodes of the methodology. Ainsworth et al. (1978) found 94% agreement between observers, and when inter‐observer reliability is established to such a high degree, the findings are considered more meaningful.
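Percentage agreement, like Ainsworth’s 94% figure, is one simple way of quantifying inter‐observer reliability. The sketch below compares the category two observers recorded in each of ten observation intervals; the category labels and tallies are hypothetical, invented for illustration.

```python
def percent_agreement(obs_a, obs_b):
    """Percentage of intervals where both observers recorded the same category."""
    matches = sum(a == b for a, b in zip(obs_a, obs_b))
    return 100 * matches / len(obs_a)

# Hypothetical codings: the behaviour each observer recorded
# in ten observation intervals.
observer1 = ["proximity", "explore", "explore", "stranger", "separation",
             "explore", "proximity", "explore", "stranger", "explore"]
observer2 = ["proximity", "explore", "stranger", "stranger", "separation",
             "explore", "proximity", "explore", "stranger", "explore"]

print(percent_agreement(observer1, observer2))  # → 90.0
```

The two observers agree in nine of the ten intervals, giving 90% agreement; a level this high would suggest the behavioural categories are being applied consistently.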
If reliability is found to be poor, there are different ways in which it can be rectified depending on the type of measure being used.
Improving reliability: questionnaires
For questionnaires, it should be possible to identify which questions are having the biggest impact upon the reliability and adjust them as necessary. If it is deemed that they are important items that must remain in the questionnaire, then rewriting them in a manner that reduces the potential for misinterpretation may be enough. For example, if the item in question is an open question, it may be possible to change it into a closed question, reducing the range of possible responses and thereby limiting potential ambiguity.
Improving reliability: interviews
If reliability needs improving in an interview, there are several factors that can be adjusted. Firstly, ensuring that the same interviewer conducts all interviews will help reduce researcher bias: different interviewers may vary in the way they ask questions, which can then lead to different responses. Equally, some researchers may ask questions that are leading or open to interpretation. If the same interviewer cannot be used throughout the interviewing process, then training should be provided in order to limit the potential bias. Further to this, changing the interview from unstructured to structured will also limit researcher bias.
Improving reliability: experiments
In experiments, the level of control that the researcher has over variables is one way that reliability can be influenced. Laboratory experiments are often described as having high reliability due to the high level of control over the independent variable(s), which in turn makes them easier to replicate by following the standardised procedures. To improve reliability within experiments, researchers might take more control over extraneous variables, reducing the potential for them to become confounding variables.
Improving reliability: observations
Observations can lack objectivity, since they rely on the researcher’s interpretations of a situation. If behavioural categories are being used, it is important that the researcher applies them accurately and does not interpret them subjectively. One way to improve reliability in this instance would be to operationalise the behavioural categories. This means that the categories need to be clear and specific about what constitutes the behaviour in question. There should be no overlap between categories, leaving no need for personal interpretation of their meaning.
Possible exam questions
Define what is meant by the term reliability. (2 marks)
A music teacher was interested in studying whether there was a relationship between English language skill and musical aptitude. He decided to investigate this with Year 11 students in the school where he worked. He randomly chose 20 students, from the 200 in the year group, and gave each of them two tests. He used part of a GCSE exam paper to test their English language skill. The higher the mark achieved, the better their command of the English language was assumed to be. The teacher could not find a musical ability test that was suitable for the investigation, so he invented his own. He asked each of the 20 students to choose a song to sing for him a cappella, which he then rated on a scale of 1–5, where 1 was out of tune and 5 was in tune.
Explain how the music teacher could have checked the reliability of the English language skill test. (3 marks)
A psychologist used the observational method to look at behaviours indicative of attachment between primary caregivers and their infants. Pairs of observers watched a single child interact with the mother for a twenty‐minute period. They noted the number of times the child sought contact and used the parent as a safe base from which to explore. After seeing the first round of ratings from the observers, the psychologist became concerned about the quality of inter‐rater reliability.
What could the psychologist do to improve inter‐rater reliability before continuing with the observational research on attachment? (4 marks)
A psychologist was interested in studying student stress levels in their third year of their degree course. She asked an academic colleague for feedback on her method who reported concern that the psychologist had not checked the reliability and validity of the questionnaire used to measure the level of stress. Explain how the psychologist could check the reliability and the validity of the stress questionnaire. (5 marks)
Identify and explain two or more ways of improving reliability. (6 marks)