Develop a proposed PSYCHOLOGY study that can make use of either the between-subjects (independent-sample) or within-subjects (paired sample) t-tests. Identify the grouping variable (and level of measurement) and the dependent variable. Specify the relevant population (sample). Be sure to indicate which t-test you would be using. Write a null and alternative hypothesis.

Complete the following readings from your textbook, *Essentials of Statistics for the Behavioral Sciences:*

- Chapter 10: The t-test for two Independent Samples
- Chapter 11: The t-test for two Related Samples

Reporting Statistics in Psychology

This document contains general guidelines for the reporting of statistics in psychology research. The details of statistical reporting vary slightly among different areas of science and also among different journals.

General Guidelines

Rounding Numbers

For numbers greater than 100, report to the nearest whole number (e.g., M = 6254). For numbers between 10 and 100, report to one decimal place (e.g., M = 23.4). For numbers between 0.10 and 10, report to two decimal places (e.g., M = 4.34, SD = 0.93). For numbers less than 0.10, report to three decimal places, or however many digits you need to have a non-zero number (e.g., M = 0.014, SEM = 0.0004).

For numbers…       Round to…                               SPSS       Report
Greater than 100   Whole number                            1034.963   1035
10 – 100           1 decimal place                         11.4378    11.4
0.10 – 10          2 decimal places                        4.3682     4.37
0.001 – 0.10       3 decimal places                        0.0352     0.035
Less than 0.001    As many digits as needed for non-zero   0.00038    0.0004

Do not report any decimal places if you are reporting something that can only be a whole number. For example, the number of participants in a study should be reported as N = 5, not N = 5.0.
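The rounding rules above can be sketched as a small Python helper. This is only an illustration, not part of the guidelines: the function name `round_for_report` is hypothetical, and the handling of values that fall exactly on a boundary (10 or 100) is an assumption, since the text does not say which band they belong to.

```python
import math


def round_for_report(x: float) -> str:
    """Format a number using the rounding bands from the table above.

    Assumption: a value exactly equal to 10 or 100 is placed in the
    higher band; the guidelines do not specify this.
    """
    magnitude = abs(x)
    if magnitude == 0:
        return "0"
    if magnitude >= 100:
        decimals = 0          # whole number
    elif magnitude >= 10:
        decimals = 1          # one decimal place
    elif magnitude >= 0.10:
        decimals = 2          # two decimal places
    elif magnitude >= 0.001:
        decimals = 3          # three decimal places
    else:
        # As many decimals as needed to reach the first non-zero digit.
        decimals = -math.floor(math.log10(magnitude))
    return f"{x:.{decimals}f}"


# The SPSS values from the table above.
for value in (1034.963, 11.4378, 4.3682, 0.0352, 0.00038):
    print(value, "->", round_for_report(value))
```

Counts such as the number of participants should bypass this helper and be reported as whole numbers.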

Report exact p-values (not p < .05), even for non-significant results. Round as above, unless SPSS gives a p-value of .000; then report p < .001. Two-tailed p-values are assumed. If you are reporting a one-tailed p-value, you must say so.

Omit the leading zero from p-values, correlation coefficients (r), partial eta-squared (ηp2), and other numbers that cannot ever be greater than 1.0 (e.g., p = .043, not p = 0.043).
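As a hedged illustration of the p-value rules above, the following Python sketch formats a p-value for reporting. The function name `format_p` is hypothetical, and the cut-off of 0.0005 is an assumption about when a three-decimal display such as SPSS's would show .000.

```python
def format_p(p: float) -> str:
    """Format a two-tailed p-value following the rules described above."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if p < 0.0005:
        # Assumption: this is the range a three-decimal display shows as .000.
        return "p < .001"
    # Two decimals for .10 and above, three decimals below that.
    decimals = 2 if p >= 0.10 else 3
    value = f"{p:.{decimals}f}"
    if value.startswith("0"):
        value = value[1:]     # omit the leading zero
    return f"p = {value}"


print(format_p(0.043))
print(format_p(0.75))
print(format_p(0.0002))
```

A one-tailed p-value would still need to be labelled as such in the surrounding sentence; this sketch does not handle that.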

Statistical Abbreviations

Abbreviations using Latin letters, such as mean (M) and standard deviation (SD), should be italicised, while abbreviations using Greek letters, such as partial eta-squared (ηp2), should not be italicised and can be written out in full if you cannot use Greek letters. There should be a space before and after equal signs. The abbreviations should only be used inside of parentheses; spell out the names otherwise.

Inferential statistics should generally be reported in the style of: “statistic(degrees of freedom) = value, p = value, effect size statistic = value”

Statistic                     Example
Mean and standard deviation   M = 3.45, SD = 1.21
Mann-Whitney                  U = 67.5, p = .034, r = .38
Wilcoxon signed-ranks         Z = 4.21, p < .001
Sign test                     Z = 3.47, p = .001
t-test                        t(19) = 2.45, p = .031, d = 0.54
ANOVA                         F(2, 1279) = 6.15, p = .002, ηp2 = .010
Pearson’s correlation         r(1282) = .13, p < .001

Descriptive Statistics

Means and standard deviations should be given either in the text or in a table, but not both.

The average age of participants was 25.5 years (SD = 7.94).

The age of participants ranged from 18 to 70 years (M = 25.5, SD = 7.94). Age was non-normally distributed, with skewness of 1.87 (SE = 0.05) and kurtosis of 3.93 (SE = 0.10).

Participants were 98 men and 132 women aged 17 to 25 years (men: M = 19.2, SD = 2.32; women: M = 19.6, SD = 2.54).

Non-parametric tests

Do not report means and standard deviations for non-parametric tests. Report the median and range in the text or in a table. The statistics U and Z should be capitalised and italicised. A measure of effect size, r, can be calculated by dividing Z by the square root of N (r = Z / √N).

Mann-Whitney Test (2 Independent Samples…)

A Mann-Whitney test indicated that self-rated attractiveness was greater for women who were not using oral contraceptives (Mdn = 5) than for women who were using oral contraceptives (Mdn = 4), U = 67.5, p = .034, r = .38.

Wilcoxon Signed-ranks Test (2 Related Samples…)

A Wilcoxon Signed-ranks test indicated that femininity was preferred more in female faces (Mdn = 0.85) than in male faces (Mdn = 0.65), Z = 4.21, p < .001, r = .76.
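The effect size formula r = Z / √N given above can be sketched in Python as follows. The function name is hypothetical, and the sample size N = 31 is an assumption made for illustration only, because the example does not state N.

```python
import math


def effect_size_r(z: float, n: int) -> float:
    """Effect size for a non-parametric test: r = Z / sqrt(N)."""
    if n <= 0:
        raise ValueError("n must be a positive number of observations")
    return z / math.sqrt(n)


# Z is taken from the Wilcoxon example; N = 31 is an assumed value.
r = effect_size_r(4.21, 31)
print(f"r = {r:.2f}")
```

With this assumed N, the result rounds to the r = .76 reported in the Wilcoxon example; remember to drop the leading zero when reporting r.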

Sign Test (2 Related Samples…)

A sign test indicated that femininity was preferred more in female faces than in male faces, Z = 3.47, p = .001.

T-tests

Report degrees of freedom in parentheses. The statistics t, p and Cohen’s d should be reported and italicised.

One-sample t-test

A one-sample t-test indicated that femininity preferences were greater than the chance level of 3.5 for female faces (M = 4.50, SD = 0.70), t(30) = 8.01, p < .001, d = 1.44, but not for male faces (M = 3.46, SD = 0.73), t(30) = -0.32, p = .75, d = 0.057.

The number of masculine faces chosen out of 20 possible was compared to the chance value of 10 using a one-sample t-test. Masculine faces were chosen more often than chance, t(76) = 4.35, p = .004, d = 0.35.

Paired-samples t-test

Report paired-samples t-tests in the same way as one-sample t-tests.

A paired-samples t-test indicated that scores were significantly higher for the pathogen subscale (M = 26.4, SD = 7.41) than for the sexual subscale (M = 18.0, SD = 9.49), t(721) = 23.3, p < .001, d = 0.87.

Scores on the pathogen subscale (M = 26.4, SD = 7.41) were higher than scores on the sexual subscale (M = 18.0, SD = 9.49), t(721) = 23.3, p < .001, d = 0.87. A one-tailed p-value is reported due to the strong prediction of this effect.
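The guidelines do not say how Cohen's d was obtained in these examples. One common convention for one-sample and paired-samples designs is d = t / √N, where N = df + 1; the Python sketch below assumes that convention (other definitions of d exist and can give different values). The function name is hypothetical.

```python
import math


def cohens_d_from_t(t: float, df: int) -> float:
    """Cohen's d for a one-sample or paired-samples t-test.

    Assumes the convention d = t / sqrt(N), with N = df + 1.
    """
    n = df + 1
    return t / math.sqrt(n)


# Values taken from the one-sample and paired-samples examples above.
print(f"d = {cohens_d_from_t(8.01, 30):.2f}")   # one-sample, female faces
print(f"d = {cohens_d_from_t(23.3, 721):.2f}")  # paired, pathogen vs sexual
```

Under this assumption the two calls round to 1.44 and 0.87, matching the effect sizes reported in those examples.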

Independent-samples t-test

An independent-samples t-test indicated that scores were significantly higher for women (M = 27.0, SD = 7.21) than for men (M = 24.2, SD = 7.69), t(734) = 4.30, p < .001, d = 0.35.

If Levene’s test for equality of variances is significant, report the statistics from the row labelled “Equal variances not assumed”, with the altered degrees of freedom rounded to the nearest whole number.

Scores on the pathogen subscale were higher for women (M = 27.0, SD = 7.21) than for men (M = 24.2, SD = 7.69), t(340) = 4.30, p < .001, d = 0.35. Levene’s test indicated unequal variances (F = 3.56, p = .043), so degrees of freedom were adjusted from 734 to 340.
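For readers curious where the altered degrees of freedom come from, the sketch below computes the Welch-Satterthwaite approximation, which is the usual adjustment when equal variances are not assumed. It is an illustration under stated assumptions: the function name is hypothetical, and the group sizes used in the call are invented, since the example above does not report them.

```python
def welch_df(sd1: float, n1: int, sd2: float, n2: int) -> float:
    """Welch-Satterthwaite degrees of freedom for two independent samples."""
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least two observations")
    v1 = sd1 ** 2 / n1   # squared standard error of the first mean
    v2 = sd2 ** 2 / n2   # squared standard error of the second mean
    return (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))


# SDs are from the example; the group sizes (400 and 336) are hypothetical.
df = welch_df(7.21, 400, 7.69, 336)
print(f"Adjusted df, rounded for reporting: {round(df)}")
```

When the two groups have equal sizes and equal standard deviations, the adjusted value equals the ordinary n1 + n2 - 2.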

ANOVAs

ANOVAs have two degrees of freedom to report. Report the between-groups df first and the within-groups df second, separated by a comma and a space (e.g., F(1, 237) = 3.45). The measure of effect size, partial eta-squared (ηp2), may be written out or abbreviated; it omits the leading zero and is not italicised.

One-way ANOVAs and Post-hocs

Analysis of variance showed a main effect of self-rated attractiveness (SRA) on preferences for femininity in female faces, F(2, 1279) = 6.15, p = .002, ηp2 = .010. Post-hoc analyses using Tukey’s HSD indicated that femininity preferences were lower for participants with low SRA than for participants with average SRA (p = .014) and high SRA (p = .004), but femininity preferences did not differ significantly between participants with average and high SRA (p = .82).
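Partial eta-squared can be recovered from a reported F and its two degrees of freedom through the identity ηp2 = (F × df1) / (F × df1 + df2), which follows from its definition as SS_effect / (SS_effect + SS_error). The Python sketch below is a minimal illustration; the function name is hypothetical.

```python
def partial_eta_squared(f: float, df_effect: int, df_error: int) -> float:
    """Partial eta-squared from an F ratio and its degrees of freedom."""
    if f < 0 or df_effect <= 0 or df_error <= 0:
        raise ValueError("F must be non-negative and both df positive")
    return (f * df_effect) / (f * df_effect + df_error)


# F(2, 1279) = 6.15 from the one-way ANOVA example above.
eta = partial_eta_squared(6.15, 2, 1279)
print(f"partial eta-squared = {eta:.3f}")
```

For the example above this rounds to .010 once the leading zero is dropped, in line with the reported value.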

2-way Factorial ANOVAs

A 3×2 ANOVA with self-rated attractiveness (low, average, high) and oral contraceptive use (true, false) as between-subjects factors revealed main effects of SRA, F(2, 1276) = 6.11, p = .002, ηp2 = .009, and oral contraceptive use, F(1, 1276) = 4.38, p = .037, ηp2 = .003. These main effects were not qualified by an interaction between SRA and oral contraceptive use, F(2, 1276) = 0.43, p = .65, ηp2 = .001.

3-way ANOVAs and Higher

Although some textbooks suggest that you report all main effects and interactions, even if not significant, this reduces the understandability of the results of a complex design (i.e. 3-way or higher). Report all significant effects and all predicted effects, even if not significant. If there are more than two non-significant effects that are irrelevant to your main hypotheses (e.g. you predicted an interaction among three factors, but did not predict any main effects or 2-way interactions), you can summarise them as in the example below.

A mixed-design ANOVA with sex of face (male, female) as a within-subjects factor and self-rated attractiveness (low, average, high) and oral contraceptive use (true, false) as between-subjects factors revealed a main effect of sex of face, F(1, 1276) = 1372, p < .001, ηp2 = .52. This was qualified by interactions between sex of face and SRA, F(2, 1276) = 6.90, p = .001, ηp2 = .011, and between sex of face and oral contraceptive use, F(1, 1276) = 5.02, p = .025, ηp2 = .004. The predicted interaction among sex of face, SRA and oral contraceptive use was not significant, F(2, 1276) = 0.06, p = .94, ηp2 < .001. All other main effects and interactions were non-significant and irrelevant to our hypotheses, all F ≤ 0.94, p ≥ .39, ηp2 ≤ .001.

Violations of Sphericity and Greenhouse-Geisser Corrections

ANOVAs are not robust to violations of sphericity, but can be easily corrected. For each within-subjects factor with more than two levels, check if Mauchly’s test is significant. If so, report chi-squared (χ2), degrees of freedom, p and epsilon (ε) as below and report the Greenhouse-Geisser corrected values for any effects involving this factor (rounded to the appropriate decimal place). SPSS will report a chi-squared of .000 and no p-value for within-subjects factors with only two levels; corrections are not needed.

Data were analysed using a mixed-design ANOVA with a within-subjects factor of subscale (pathogen, sexual, moral) and a between-subjects factor of sex (male, female). Mauchly’s test indicated that the assumption of sphericity had been violated (χ2(2) = 16.8, p < .001), therefore degrees of freedom were corrected using Greenhouse-Geisser estimates of sphericity (ε = 0.98). Main effects of subscale, F(1.91, 1351) = 378, p < .001, ηp2 = .35, and sex, F(1, 709) = 78.8, p < .001, ηp2 = .10, were qualified by an interaction between subscale and sex, F(1.91, 1351) = 30.4, p < .001, ηp2 = .041.

ANCOVA

An ANCOVA [between-subjects factor: sex (male, female); covariate: age] revealed no main effects of sex, F(1, 732) = 2.00, p = .16, ηp2 = .003, or age, F(1, 732) = 3.25, p = .072, ηp2 = .004, and no interaction between sex and age, F(1, 732) = 0.016, p = .90, ηp2 < .001.

The predicted main effect of sex was not significant, F(1, 732) = 2.00, p = .16, ηp2 = .003, nor was the predicted main effect of age, F(1, 732) = 3.25, p = .072, ηp2 = .004. The interaction between sex and age was also not significant, F(1, 732) = 0.016, p = .90, ηp2 < .001.

Correlations

Italicise r and p. Omit the leading zero from r.

Preferences for femininity in male and female faces were positively correlated, Pearson’s r(1282) = .13, p < .001.

Open Access

Correspondence

The bread and butter of statistical analysis “t-test”: Uses and misuses

Younis Skaik

doi: http://dx.doi.org/10.12669/pjms.316.8984

How to cite this: Skaik Y. The bread and butter of statistical analysis “t-test”: Uses and misuses. Pak J Med Sci 2015;31(6):1558-1559. doi: http://dx.doi.org/10.12669/pjms.316.8984

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Statistical tests are very important in biomedical research.¹ Several factors play a role in selecting the most appropriate statistical test.² The misuse or inaccurate use of a statistical test may steer the research in the wrong direction and hence lead to incorrect conclusions. Because it is probably the most commonly used statistical test, Student’s t-test is considered “the bread and butter” of statistical analysis. William Gosset’s test, “Student’s t-test”, is easy to use; however, it is also misused.³ There are three types of t-test, which are used for comparing either a single mean with a specified value or two population means (Table-I). Each t-test can be used under specific conditions and criteria.

Types of t-test:

1. One-Sample t-Test

It is used for comparing sample results with a known and specified value, sometimes a “gold standard”. The task of this test is to answer the question “Is the mean of the population from which the sample is taken different from the specified value?” For example, based on a random sample of 200 students, can we conclude that the average IQ score this year is lower than the average from 3 years ago?

In most studies, a sample size of at least 40 can guarantee that the sample mean is approximately normally distributed, and the one-sample t-test can then be safely applied.

Correspondence:

Dr. Younis Skaik, Department of Laboratory Medicine, Faculty of Applied Medical Sciences, Al-Azhar University-Gaza, Palestine. E-mail: [email protected]

* Received for Publication: September 14, 2015

* Accepted for Publication: October 15, 2015

2. Two-Sample t-test

It is used to know whether the unknown means of two populations are different from each other based on independent samples from each population. To apply this test, it is very important that the two samples are independent and unrelated to each other. The samples can be obtained from two separate populations, or from a single population that has been randomly divided into two groups, and each group subjected to one of two treatments. The test is only valid for comparing means from a quantitative variable.

3. Paired t-test

It is appropriate for data in which the two samples are paired in some way, such as the following examples.

3-1 Pairs consist of before and after measurements on a single group of subjects.

3-2 Two measurements on the same subject (e.g., right and left arm) are paired.

3-3 Subjects in one group (e.g., those receiving a treatment) are paired or matched on a one-to-one basis with subjects in a second group (e.g., control subjects).

Misuses of t-tests:

Please do not use t-tests in the following cases.

1. If the sample size is small (less than 15), the one-sample t-test should not be used if the data are clearly skewed or outliers are present. A nonparametric test can be performed instead.

2. If the sample size is moderate (at least 15), the one-sample t-test should not be used if there are severe outliers.

3. If the outcome measure is a categorical (nominal/discrete) variable, such as gender, the two-sample t-test should not be applied, even if the data have been numerically coded.

1558 Pak J Med Sci 2015 Vol. 31 No. 6 www.pjms.com.pk

Table-I: Types of Student’s t-test.

Type of t-test      Test description
One-sample t-test   To compare a single mean to a fixed number or gold standard
Two-sample t-test   To compare two population means based on independent samples from the two populations or groups
Paired t-test       To compare two means based on samples that are paired in some way

4. If a group of subjects receives one treatment, and then the same subjects later receive another treatment, this calls for a paired t-test and not a two-sample t-test.

5. If subjects receive a treatment, and then the results are compared to a known value (often a “gold standard”), this calls for a one-sample t-test and not a two-sample t-test.

6. If the study aims to compare three or more means, then it is better to use an analysis of variance to avoid the loss of control over the experiment-wise significance level.
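The selection rules described in this article can be summarised in a small decision helper. The Python sketch below is only an illustration: the function and parameter names are hypothetical, and it deliberately ignores the sample-size, outlier and measurement-level checks listed above, which would need to be made separately.

```python
def choose_mean_comparison(n_means: int, paired: bool = False) -> str:
    """Suggest a test for comparing means, following the rules in the text.

    n_means -- number of sample means being compared
    paired  -- True when the two sets of scores come from the same
               subjects or from one-to-one matched subjects
    """
    if n_means < 1:
        raise ValueError("n_means must be at least 1")
    if n_means == 1:
        # One sample mean compared with a fixed number or gold standard.
        return "one-sample t-test"
    if n_means == 2:
        return "paired t-test" if paired else "two-sample t-test"
    # Three or more means: analysis of variance keeps control over the
    # experiment-wise significance level.
    return "analysis of variance"


# Same subjects measured under two treatments.
print(choose_mean_comparison(2, paired=True))
# Two independent, randomly assigned groups.
print(choose_mean_comparison(2))
```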

REFERENCES

1. Scales CD Jr, Norris RD, Preminger GM, Vieweg J, Peterson BL, Dahm P. Evaluating the evidence: statistical methods in randomized controlled trials in the urological literature. J Urol. 2008;180:1463-1467.

2. Skaik Y. The panacea statistical toolbox of a biomedical peer reviewer. Pak J Med Sci. 2015;31:999-1001.

3. Wu S, Jin Z, Wei X, Gao Q, Lu J, Ma X. Misuse of statistical methods in 10 leading Chinese medical journals in 1998-2008. Scientific World J. 2011;11:2106-2014.
