Reliability

Specification: Reliability across all methods of investigation. Ways of assessing reliability: test-retest and inter-observer; improving reliability.

Reliability is a measure of consistency. For example, if you are using a tape measure, you expect to get the same results every time you measure a certain object. If the results are not consistent, then the measure is not reliable. In psychology, the expectations are the same; if researchers are using a questionnaire to measure levels of depression, they want to ensure that the measure is consistent between participants and over time.

Test-retest method

One very straightforward way of testing whether a tool is reliable is the test-retest method. Quite simply, the same person or group of people is asked to complete the research measure, e.g. a questionnaire, on different occasions.

 

When using the test-retest method, it is important to remember that the same group of participants is being studied twice, so researchers need to be aware of any potential demand characteristics. For example, if the same measure is given twice in one day, there is a strong chance that participants will recall the responses they gave in the first test, and so psychologists could be testing their memory rather than the reliability of their measure. On the other hand, it is also important that not too much time passes between the two tests. For example, if psychologists are testing a measure of depression and question the participants a year apart, some participants may have recovered in that time, and so give completely different responses for that reason rather than because the questionnaire is unreliable.

 

After the measure has been completed on two separate occasions, the two sets of scores are correlated. If the correlation is significant and positive, the measure is deemed to have good reliability. A perfect positive correlation is +1, so the closer the coefficient is to this, the stronger the reliability of the measure; a correlation of over +0.8 is generally accepted as a good indication of reliability.
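
As a rough illustration of the correlation step, the sketch below calculates a correlation coefficient for two sets of scores and compares it with the +0.8 guideline. The scores are invented for the example, and the choice of Pearson's r is an assumption, since the text above does not name a particular correlation test.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of the products of each pair's deviations from the two means
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when all scores are identical")
    return cov / sqrt(var_x * var_y)


# Hypothetical questionnaire totals for the same six participants
first_test = [12, 18, 25, 9, 30, 21]
second_test = [13, 17, 27, 10, 28, 22]

r = pearson_r(first_test, second_test)
print(f"test-retest correlation: r = {r:.2f}")
print("good reliability" if r > 0.8 else "reliability needs improving")
```

Each position in the two lists must refer to the same participant, otherwise the coefficient says nothing about the consistency of the measure.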

Inter-observer reliability

Inter-observer reliability refers to the extent to which two or more observers are observing and recording behaviour in a consistent way. This is a particularly useful way of ensuring reliability in situations where there is a risk of subjectivity. For example, if a psychologist were making a diagnosis of a mental health condition, it would be a good idea for a second psychologist to make a diagnosis independently to check that they are both in agreement.

 

In psychology studies where behavioural categories are being applied, inter-observer reliability is also important to make sure that the categories are being used consistently. The observers watch the same situation or event separately, and their observations (or scores) are then correlated to see whether they are suitably similar.

 

An example from the attachment topic, where operationalised behavioural categories have been employed, is Ainsworth’s Strange Situation. During the controlled observation, her research team were looking for instances of separation anxiety, proximity seeking, exploration and stranger anxiety across the eight episodes of the procedure. Ainsworth et al. (1978) found 94% agreement between observers, and when inter-observer reliability is established to a high degree, such as this, the findings are considered more meaningful.
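
To show what a percentage agreement figure means in practice, here is a minimal sketch that compares the category two observers recorded for each observation interval. The recordings are invented and this is not a reconstruction of Ainsworth's own procedure; only the category names are taken from the paragraph above.

```python
def percentage_agreement(observer_a, observer_b):
    """Percentage of intervals in which both observers recorded the same category."""
    if len(observer_a) != len(observer_b) or not observer_a:
        raise ValueError("both observers must code the same, non-empty set of intervals")
    matches = sum(a == b for a, b in zip(observer_a, observer_b))
    return 100 * matches / len(observer_a)


# Hypothetical codes for ten observation intervals of the same session
observer_a = [
    "exploration", "exploration", "proximity seeking", "stranger anxiety",
    "separation anxiety", "separation anxiety", "proximity seeking",
    "exploration", "stranger anxiety", "proximity seeking",
]
observer_b = [
    "exploration", "exploration", "proximity seeking", "stranger anxiety",
    "separation anxiety", "proximity seeking", "proximity seeking",
    "exploration", "stranger anxiety", "proximity seeking",
]

agreement = percentage_agreement(observer_a, observer_b)
print(f"inter-observer agreement: {agreement:.0f}%")
```

Here the observers disagree on one interval out of ten, so the agreement is 90%.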

 

If reliability is found to be poor, there are different ways in which it can be rectified depending on the type of measure being used. 

Improving reliability: questionnaires

For questionnaires, it will be possible to identify which questions are having the biggest impact upon the reliability, and adjust or remove them as necessary. If they are important items that must remain in the questionnaire, then rewriting them in a manner that reduces the potential for them to be misinterpreted may be enough. For example, if the item in question is an open question, it may be possible to change it into a closed question, reducing the range of possible responses and thereby limiting potential ambiguity.
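
The text does not say how problem questions are identified. One possible approach, sketched here purely as an assumption, is to correlate each item's scores across the two testing occasions and flag any item that falls below the +0.8 guideline. The item names, the scores and the helper flag_inconsistent_items are all hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def flag_inconsistent_items(first, second, threshold=0.8):
    """Return the items whose test-retest correlation is below the threshold.

    first and second map each item name to the list of participants'
    scores on that item for the first and second occasion respectively.
    """
    return [item for item in first
            if pearson_r(first[item], second[item]) < threshold]


# Hypothetical 1-5 ratings from five participants on three items
first = {
    "Q1": [1, 2, 3, 4, 5],
    "Q2": [2, 3, 3, 5, 4],
    "Q3": [1, 4, 2, 5, 3],
}
second = {
    "Q1": [1, 2, 3, 5, 5],
    "Q2": [2, 3, 4, 5, 4],
    "Q3": [4, 1, 5, 2, 3],
}

print(flag_inconsistent_items(first, second))
```

In this invented data only Q3 is flagged, which would make it the candidate for rewording or for conversion to a closed question.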

Improving reliability: interviews

If reliability needs improving in an interview, there are several factors that can be adjusted. Firstly, ensuring that the same interviewer conducts all interviews will help reduce researcher bias, because different interviewers may vary in the way they ask questions, which can then lead to different responses. Equally, some researchers may ask questions that are leading or are open to interpretation. If the same interviewer cannot be used throughout the interviewing process, then training should be provided in order to limit the potential bias. Further to this, changing the interview from unstructured to structured will limit researcher bias, since every participant is then asked the same questions in the same order.

Improving reliability: experiments

In experiments, the level of control that the researcher has over variables is one way that reliability can be influenced. Laboratory experiments are often referred to as having high reliability due to the high level of control over the independent variable(s), which in turn makes them easier to replicate by following the standardised procedures. To improve reliability within experiments, researchers might try to take more control over extraneous variables, helping to reduce the potential for them to become confounding.

Improving reliability: observations

Observations can lack objectivity, since they rely on the researcher’s interpretation of a situation. If behavioural categories are being used, it is important that the researcher is applying them accurately and not being subjective in their interpretations. One way to improve reliability in this instance would be to operationalise the behavioural categories. This means that the categories need to be clear and specific about what constitutes the behaviour in question. There should be no overlap between categories, leaving no need for personal interpretation of their meaning.

Possible exam questions

Explain how the music teacher could have checked the reliability of the English language skill test. (3 marks)

What could the psychologist do to improve inter-rater reliability before continuing with the observational research on attachment? (4 marks)

Revision materials

Seneca learning


Online textbook