Statistical tests
Specification: When to use the following tests: Spearman’s rho, Pearson’s r, Wilcoxon, Mann‐Whitney, related t‐test, unrelated t‐test and Chi‐squared test.
Factors affecting the choice of statistical test
When choosing a statistical test, researchers must select an appropriate test and justify that choice; otherwise the statistical analysis may be called into question. In psychology, there are a number of considerations that researchers must take into account when deciding on an appropriate statistical test.
Difference or association
The first decision to make when choosing a statistical test of significance is whether the research hypothesis predicts a difference or a relationship. This must be identified first, because most statistical tests are designed specifically for one or the other and cannot simply be applied to any set of data.
A study that investigates a difference will typically have two conditions: one control condition and one experimental condition. For example, imagine a researcher is investigating the impact of revision classes on exam scores. Participants in the experimental condition may have been given three additional revision classes to attend, whilst those in the control condition were not given any additional support. The psychologist would be hoping to see that the average exam result in the experimental condition was significantly higher than that in the control condition; in other words, they are testing whether or not a difference between the two groups exists.
If a researcher wanted to establish an association/relationship, however, their investigation would look quite different. Using the same example of the impact of revision on exam performance, each student would be asked to state how many hours of revision they had completed in preparation for the exam, and this would be correlated with their final exam grade. The psychologist would therefore be investigating the relationship between the two co‐variables: the number of hours of revision completed and performance in the exam.
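The contrast between the two kinds of investigation shows up in the shape of the data. As a rough illustration only, the Python sketch below uses invented scores (they are not real results) for the revision example: a test of difference starts from one set of scores per condition, whereas a test of association starts from two co‐variables measured for each student. The function name pearson_r is a hypothetical helper, and the figures it produces are descriptive only; neither is a test of significance.

```python
from statistics import mean

# Hypothetical scores, invented purely for illustration.

# Difference: two conditions, one set of exam scores per condition.
revision_classes = [62, 70, 68, 75, 71]   # experimental condition
no_extra_support = [58, 64, 60, 69, 63]   # control condition
mean_difference = mean(revision_classes) - mean(no_extra_support)

# Association: one group, two co-variables measured for each student.
hours_revised = [2, 5, 1, 8, 4]
exam_score = [55, 66, 50, 78, 61]


def pearson_r(x, y):
    """Correlation coefficient from its definition (descriptive only)."""
    mx, my = mean(x), mean(y)
    # Sum of the products of each pair's deviations from the means.
    co_deviation = sum((a - mx) * (b - my) for a, b in zip(x, y))
    spread_x = sum((a - mx) ** 2 for a in x) ** 0.5
    spread_y = sum((b - my) ** 2 for b in y) ** 0.5
    return co_deviation / (spread_x * spread_y)


r = pearson_r(hours_revised, exam_score)
print(f"difference between condition means: {mean_difference:.1f}")
print(f"correlation between co-variables: {r:.2f}")
```

In both cases a statistical test is still needed to decide whether the difference or the correlation is significant.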
Experimental design
The second consideration when selecting an appropriate statistical test is the research design that was used.
Psychologists only need to consider the experimental design if they are looking for a difference, not an association. If they are looking for an association, they can move on to the level of measurement to help them decide which statistical test is most appropriate.
The experimental design will have been identified as one of the following three: independent groups, where each participant takes part in only one of the conditions; repeated measures, where each participant takes part in all of the conditions; or matched pairs, where each participant in one condition is matched with a participant in the other condition who is similar on a variable that is important to the investigation.
From the experimental design, the type of data (related or unrelated) can be decided. Related data refers to data in which the participants in each condition are related in some manner, which means the researcher has used either repeated measures or matched pairs. Unrelated data refers to data from a separate group of people in each condition of the study, and so indicates an independent groups design.
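The link between experimental design and type of data can be written as a simple lookup. This is a minimal Python sketch; the function name data_type and the exact spellings of the design labels are assumptions made for the example.

```python
def data_type(design: str) -> str:
    """Map an experimental design to the type of data it produces."""
    related_designs = {"repeated measures", "matched pairs"}
    unrelated_designs = {"independent groups"}

    key = design.strip().lower()
    if key in related_designs:
        return "related"
    if key in unrelated_designs:
        return "unrelated"
    raise ValueError(f"unknown experimental design: {design!r}")


print(data_type("Matched pairs"))       # related
print(data_type("independent groups"))  # unrelated
```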
Once a researcher knows whether they are looking for a difference or a relationship/association, which research design was used and what type of data they are working with, it is relatively straightforward to find out which statistical test to use.
Below is a table outlining which statistical test to use, based on these decisions.
Parametric and non-parametric tests
Whilst the table above provides an overview of the test that should be used, it is also important for researchers to consider whether the data they have is suitable for a parametric or a non‐parametric test.
In psychological research, it is preferable to use a parametric test where possible: parametric tests are more powerful than non‐parametric tests, meaning they are more likely to detect a significant effect where one exists, but they require the data to meet certain assumptions before use.
Firstly, data should be interval level, because parametric tests use the actual score, rather than ranked data.
Secondly, the data should be drawn from an underlying normal distribution, so we would expect the data itself to be normally distributed.
Thirdly, there should be homogeneity of variance: the variances in the two groups should not be significantly different from one another. One simple way of checking for homogeneity of variance is to compare the standard deviations for each condition. Because the participants are drawn from the same population, it is expected that each condition would be similarly dispersed, particularly if the conditions are related, thus giving homogeneity of variance.
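The comparison of standard deviations described above can be sketched in Python using the standard library. The function name spread_summary is hypothetical, and the code only reports the two standard deviations and the ratio of the larger variance to the smaller one; it is an informal check, not a formal test, and it does not say how large a ratio is too large.

```python
from statistics import stdev


def spread_summary(condition_a, condition_b):
    """Return both sample standard deviations and the variance ratio.

    The ratio is larger variance / smaller variance, so it is always
    at least 1; values close to 1 suggest similar dispersion.
    """
    sd_a = stdev(condition_a)
    sd_b = stdev(condition_b)
    larger, smaller = max(sd_a, sd_b), min(sd_a, sd_b)
    if smaller == 0:
        raise ValueError("a condition has no spread; ratio is undefined")
    # Variance is the standard deviation squared.
    variance_ratio = (larger / smaller) ** 2
    return sd_a, sd_b, variance_ratio


# Hypothetical scores, invented purely for illustration.
sd_a, sd_b, ratio = spread_summary([1, 2, 3], [2, 4, 6])
print(f"SD A = {sd_a:.2f}, SD B = {sd_b:.2f}, variance ratio = {ratio:.2f}")
```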
The parametric tests of difference that are required for the specification are the related t‐test and unrelated t‐test. If the interval level data does not meet the other two requirements for a parametric test, then either the Mann‐Whitney test (independent groups design) or the Wilcoxon test (repeated measures or matched pairs design) should be selected as an alternative. The parametric test of correlation that is required for the specification is Pearson’s r.
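Putting these decisions together, the choice of test can be sketched as a small Python function. This is an illustrative sketch rather than a definitive rule: the function and parameter names are hypothetical, the interval‐level branches follow the text above, and the nominal and ordinal branches (Chi‐squared, sign test, Spearman’s rho) are assumptions based on the convention usually taught alongside these tests rather than anything stated in this section.

```python
RELATED_DESIGNS = {"repeated measures", "matched pairs"}
UNRELATED_DESIGNS = {"independent groups"}
LEVELS = {"nominal", "ordinal", "interval"}


def choose_test(purpose, level, parametric_ok=False, design=None):
    """Suggest a statistical test from the decisions in this section.

    purpose: "difference" or "association"
    level: "nominal", "ordinal" or "interval" (level of measurement)
    parametric_ok: True only if the data are also normally distributed
        and show homogeneity of variance
    design: experimental design; only needed for a test of difference
    """
    if level not in LEVELS:
        raise ValueError(f"unknown level of measurement: {level!r}")
    # A parametric test needs interval data AND the other two assumptions.
    parametric = level == "interval" and parametric_ok

    if purpose == "association":
        if level == "nominal":
            return "Chi-squared"   # assumption: usual convention
        return "Pearson's r" if parametric else "Spearman's rho"

    if purpose != "difference":
        raise ValueError(f"unknown purpose: {purpose!r}")

    if design in UNRELATED_DESIGNS:
        if level == "nominal":
            return "Chi-squared"   # assumption: usual convention
        return "unrelated t-test" if parametric else "Mann-Whitney"
    if design in RELATED_DESIGNS:
        if level == "nominal":
            return "sign test"     # assumption: usual convention
        return "related t-test" if parametric else "Wilcoxon"
    raise ValueError(f"unknown experimental design: {design!r}")


print(choose_test("difference", "interval", True, "repeated measures"))
print(choose_test("difference", "interval", False, "independent groups"))
print(choose_test("association", "interval", True))
```

The three calls print related t-test, Mann-Whitney and Pearson's r respectively.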
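Exam Hint: The critical values table will tell you whether the calculated value needs to be less than or more than the critical value for the results to be significant. A simple rule of thumb is that for any test with a letter ‘r’ in the name (Spearman’s rho, Pearson’s r, related t‐test, unrelated t‐test and Chi‐squared), the calculated value must be equal to or higher than the critical value. For tests without an ‘r’ (Wilcoxon and Mann‐Whitney), it must be equal to or lower than the critical value.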
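The rule of thumb in the hint can be expressed as a short Python check. This is a sketch under stated assumptions: the function names are hypothetical, the rule is taken to mean that tests with an ‘r’ in the name need a calculated value equal to or higher than the critical value while the others need one equal to or lower, and the critical value itself must still be looked up in the correct table for the significance level, number of tails and sample size.

```python
KNOWN_TESTS = (
    "Spearman's rho", "Pearson's r", "related t-test", "unrelated t-test",
    "Chi-squared", "Wilcoxon", "Mann-Whitney", "sign test",
)


def needs_higher_value(test_name):
    """Apply the mnemonic: an 'r' in the name means 'higher'."""
    if test_name not in KNOWN_TESTS:
        raise ValueError(f"unknown test: {test_name!r}")
    return "r" in test_name.lower()


def is_significant(test_name, calculated, critical):
    """Compare a calculated value with a critical value from a table."""
    if needs_higher_value(test_name):
        # The sign of r or t is ignored here; for a directional hypothesis
        # the direction of the result must be checked separately.
        return abs(calculated) >= critical
    return calculated <= critical


for name in KNOWN_TESTS:
    direction = "higher" if needs_higher_value(name) else "lower"
    print(f"{name}: calculated value must be equal to or {direction}")
```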
Possible exam questions
A psychologist was interested in studying the effects of a severely calorie controlled diet on memory performance and she expected recall to be reduced. The psychologist’s hypothesis was that participants’ scores on a memory test would be lower after following a severely calorie controlled diet than after eating a non‐restricted diet. She gave the volunteer participants a memory test when they first arrived at her university research suite and a similar test at the end of a four‐week period and noted the memory scores on both memory tests.
State whether the hypothesis for this study is directional or non‐directional. (1 mark)
What is meant by the abbreviation df in inferential statistics? (1 mark)
A researcher has conducted an experiment using repeated measures to categorise people as either normal or abnormal. Identify the most appropriate test of significance which should be applied to this data. (1 mark)
Explain two factors that a researcher must consider when deciding to use the sign test. (2 marks)
Explain the difference between a one‐tailed test and a two‐tailed test. (2 marks)
A small group of psychologists wanted to see whether the use of diagrams in medical consultations would help patients recall medical information provided to them by a doctor.
In a laboratory experiment involving a role‐play between a patient and a doctor, volunteer participants were randomly allocated to one of two conditions.
Condition 1: a doctor used diagrams to present to participants a series of facts about a high sugar diet.
Condition 2: the same doctor presented the same series of facts about a high sugar diet to participants but without the use of diagrams.
At the end of the mock consultation, participants were tested on their recall of facts about a high sugar diet. Each participant was given a score out of five for the number of correct facts recalled.
Identify an appropriate statistical test that the psychologists could use to analyse the data and provide one reason why this test is most appropriate. (2 marks)
When would a Chi‐squared test of significance be appropriate to use? (3 marks)