A 2:1 in..?

I got my grades back from this year, and I’m very interested to look not at the scores themselves, but at their standard deviation. Out of 100 per cent, a standard deviation of 3.279 is very narrow, which suggests that my weighted mean is a fair representation of my scores.
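The arithmetic behind those two figures is simple. Here’s a minimal sketch in Python, using made-up module scores and credit weightings (my actual per-module grades and weights aren’t listed in this post, so the numbers below are purely illustrative):

```python
# Hypothetical module scores and credit weightings -- the real
# per-module grades and weights aren't reproduced in this post.
scores = [63, 64, 70, 65, 60, 68, 66, 64]
weights = [10, 10, 20, 10, 10, 20, 10, 10]  # credits per module

# Weighted mean: sum of (score * weight) divided by total weight.
weighted_mean = sum(s * w for s, w in zip(scores, weights)) / sum(weights)

# Population standard deviation of the raw scores.
mean = sum(scores) / len(scores)
variance = sum((s - mean) ** 2 for s in scores) / len(scores)
std_dev = variance ** 0.5

print(weighted_mean, std_dev)
```

Swap in the real scores and credit weights and the same few lines produce both figures.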
Grade Analysis

Key:
Blue: Subjects where I scored 63 & 64%
Green: The most fantastic module, where we made giant posters and presented them at a poster conference, then had graded online discussions about the content of those posters, followed by a multiple-choice exam.
Orange: A coursework assignment, followed by a final exam with multiple-choice questions and short essay questions.
Yellow: A completely unimaginative final exam, with multiple-choice questions and short essay questions.

And it falls into the 2:1 classification category. Good.

This summer, 70% of students across the nation (my sister among them) will graduate with a 2:1 degree or higher. Those who don’t will either be sent to an old poly to do a master’s degree, or to do a PGCE, or to work in the HR department at Tesco. Because we’ve created a stigma to go along with our degree classifications, that if you don’t get your 2:1 you’ve basically failed, the universities are doing everything they can to ensure that the 70% statistic holds.

My solid 65.25% could mean that the grading system is an accurate measure, confirmed by test-retest reliability across time. But what exactly is 65.25% a score of? Recall? How much of a neuropsychologist I am? Or a developmental psychologist? Or a psycholinguist? Or a researcher? Or a computer scientist?

Wait?! Computer scientist? That has very little to do with psychology. Yet my different modules are also quite unrelated to one another: they cover different paradigms, different topics, different research methods, some more exciting, some less. Even so, my grades are reliably steady. I reckon I could get a 2:1 in absolutely anything. I believe in me!

That’s why I’ve used colour coding in the table above, to try to connect the different teaching methods and spot trends in the data. And again, there’s a fairly random spread among the different levels of the independent variable. For the ‘coursework and typical final exam’ subjects, I got 70%, with subsequent scores regressing to the mean, and for the very ordinary ‘typical exam’ subjects I got my lowest score, as well as one in line with the mean. Most interestingly, the subject I enjoyed the most, Personality and Individual Differences*, was right on the mode and just one per cent above the median. Also very interesting is that the two research methods modules (RMIII & RMIV), both assessed identically, scored a consistent 60% both times, despite me thinking I was improving.

This tells us that these results do not conclusively show that class results are merely an accurate measure of memory under exam conditions. However, they could suggest that the system is designed to produce lots and lots of 2:1s, since across a whole array of different subjects, teaching methods and assessment methods, the results stay steady.

I don’t think that the current system allows for the genuine expression of students’ novel thought and intellectual talents at undergraduate level. There’s enough hoop-jumping and enough invalid measures to ensure that students who are at least obedient will get their ordinary degree, without demonstrating much beyond content memorisation.

*Personality and Individual Differences was assessed uniquely: class groups made posters, then took part in graded online discussions using social media.


Why is Reliability Important?

Last week I read an interesting post on validity, and this week I am going to talk about its brother, reliability. So, why is reliability important? It’s easy when learning about scientific research and writing to feel like we’re given a whole set of rules and standards to follow and stick to, extra bits to think about and boxes to tick, when what we really want to be doing is researching psychology.

Ask Joe Bloggs on the street what he thinks about reliability, and he’ll probably say something like: “Because if your results aren’t reliable then they’ll be wrong”. And he’s right. (Do you not agree?) But are these methods helping us? Or are they a psychological form of ‘political correctness gone too far’?

Reliability and validity go hand in hand to ensure the results of an experiment are trustworthy, realistic and correctly obtained. Validity is defined as the “extent to which a measure assesses what it is claimed to measure” (Howitt & Cramer, 2008, p. 261), whereas reliability concerns consistency across different times or circumstances. An experiment could produce results which are valid, and therefore correctly measured, and which could help us draw a conclusion, yet they might not be reliable.

Reliability tells us that if an experiment produces results supporting hypothesis A one week, and then another experiment, either with a different sample or with a similar (if not the same) sample at a different time, contradicts hypothesis A, the results aren’t very reliable, and there is therefore insufficient evidence to draw a conclusion.

Reliability in psychology is often measured using statistical methods. ‘Internal reliability’ refers to how consistently the items of a scale measure the concept in question. If a scale is reliable, then in theory any one item should give much the same result as any other item, or indeed as all the items together. Methods such as ‘split-half reliability’ are used, where the items of a test are split into two halves, each half is scored separately, and the Pearson correlation between the two half-scores is calculated. Other formulas, such as the ‘Spearman-Brown formula’ and ‘Guttman reliability’, are also used.
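To make that concrete, here’s a minimal sketch of split-half reliability in Python. The questionnaire, its items and all the response values below are made up for illustration. The half-test correlation is then stepped up with the Spearman-Brown formula, which estimates full-length reliability as 2r / (1 + r):

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical scale: each row is one participant's responses to 6 items.
responses = [
    [4, 5, 4, 5, 3, 4],
    [2, 1, 2, 2, 1, 2],
    [3, 3, 4, 3, 3, 3],
    [5, 4, 5, 5, 4, 5],
    [1, 2, 1, 1, 2, 1],
]

# Split-half: score the first half of the items and the second half separately.
first_half = [sum(r[:3]) for r in responses]
second_half = [sum(r[3:]) for r in responses]

r_half = pearson(first_half, second_half)

# Spearman-Brown correction: estimate the reliability of the full-length
# scale from the half-test correlation.
r_full = (2 * r_half) / (1 + r_half)
```

Note that the corrected value is always at least as high as the half-test correlation, because halving a test tends to understate the reliability of the whole thing.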

More practically, tests can be repeated, either as a simple repeat (‘test-retest reliability’) or in a different form (‘alternate-forms reliability’). However, this in turn can adversely affect the results, since the circumstances of participants may change, or memories of the first test can affect how participants handle the second. Alternate-forms reliability attempts to overcome the latter problem by using a slightly different test, which resolves the issue to some extent.
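Test-retest reliability itself is just a correlation between the two sittings. A minimal sketch, again with made-up scores for the same five hypothetical participants tested a few weeks apart:

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical total scores for the same five participants,
# tested twice a few weeks apart.
time1 = [24, 11, 18, 27, 9]
time2 = [25, 12, 17, 26, 11]

# Test-retest reliability: a value near 1 means the measure is
# stable over time; a low value means scores drift between sittings.
r_test_retest = pearson(time1, time2)
```

The practice effects and changed circumstances mentioned above are exactly what would drag this correlation down between the two sittings.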

Internal reliability still works hand in hand with these practical methods: for example, if after a repeat test the value calculated for internal reliability differs from that of the original test, we can suspect that the results may not be reliable.

While statistics can seem lifeless, dull and uninteresting, we can see here how mathematical formulas can compensate where practical work falls short, and vice versa. Results must, of course, be reliable, and here we have a selection of methods that, when used in conjunction with our scientific judgement, can help us ensure both the validity and the reliability of our research.