By Dennis Sherwood on Friday, 15 August 2025

Four features of this year’s exam results that might be a surprise…

This year's A level results were accompanied, as usual, by pictures of joyously jumping teenagers, and searching analyses of lots of numbers, such as why 28.4% of the grades awarded to boys were A and A*, but only 28.2% to girls. Much the same is likely to be reported about the GCSE grades when they are announced on Thursday.

All good, and important, stuff.

Some features, though, don't receive much publicity, so here are four that might cause you to raise an eyebrow…

1. Some GCSE grade distributions are not bell-shaped

You might expect grade distributions to be more-or-less bell-shaped, with most students awarded a grade towards the middle, and progressively fewer for each successive grade on each side.

For many years, this has been the case for all A level subjects, and, in the past, for all GCSE subjects graded A*, A, B… But since the introduction of numeric grades in 2017, GCSE grade distributions for some subjects have had two peaks, or even three, with one of those peaks almost always being at grade 3.

Figure 1 shows some examples for the summer 2024 exams in England - English Literature (bell-shaped), English Language (two peaks) and Spanish (three peaks):

Figure 1: Some summer 2024 GCSE grade distributions

I'm puzzled.

I don't know why more English Language students are awarded each of grades 3 and 5 than grade 4. It can't be because of re-sits, for the distribution for students aged 16 also has two peaks at grades 3 and 5; nor can it be a result of tiering, for there is only one tier for this subject. Nor can tiering explain the Spanish 'roller coaster'. GCSE Spanish does indeed have two tiers, which, in principle, could result in two peaks. But not three.

I find that peak at grade 3 to be particularly harsh, especially for GCSE English Language. If the grade 3/grade 4 boundary were just a percentage point or two lower, the population awarded grade 3 would decrease, whilst that awarded grade 4 would increase - so restoring a bell shape, as illustrated in Figure 2:

Figure 2: Author's simulations of the grade distributions for GCSE English Language with (left) grade boundaries resulting in a close match to the actual distribution (as in Figure 1, bottom left), and also (right) with the 3/4 grade boundary two percentage points lower.

Which raises the question as to why Ofqual positions that oh-so-important grade 3/grade 4 boundary in a location that does so much damage to so many young people.
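The boundary effect described above can be sketched in a few lines of Python. To be clear, this is a toy illustration, not Ofqual's method: the mark distribution and every boundary position below are invented for the purpose of the demonstration. It simply shows how a narrow grade 4 band produces a dip at grade 4, and how lowering the 3/4 boundary by two marks moves candidates from grade 3 into grade 4:

```python
# Toy illustration (not Ofqual's model): how shifting one grade boundary
# reshapes a grade distribution. All marks and boundaries are invented.
import random

random.seed(1)
# Hypothetical marks out of 100 for 10,000 candidates, roughly bell-shaped
marks = [min(100, max(0, round(random.gauss(50, 15)))) for _ in range(10_000)]

def grade(mark, boundary_3_4):
    """Map a mark to a numeric grade using invented boundaries.
    Only the grade 3/grade 4 boundary is varied."""
    boundaries = [(9, 80), (8, 72), (7, 64), (6, 56), (5, 48),
                  (4, boundary_3_4), (3, 30), (2, 20), (1, 10)]
    for g, b in boundaries:
        if mark >= b:
            return g
    return 0

def distribution(boundary_3_4):
    """Count how many candidates receive each grade."""
    counts = {g: 0 for g in range(10)}
    for m in marks:
        counts[grade(m, boundary_3_4)] += 1
    return counts

# Grade 3/4 boundary at 43, then two marks lower at 41: candidates on
# marks 41 and 42 move from grade 3 to grade 4, filling in the dip.
for b in (43, 41):
    d = distribution(b)
    print(b, {g: d[g] for g in (3, 4, 5)})
```

Because the only thing that changes is the position of one boundary, every candidate who moves does so from grade 3 to grade 4 - which is exactly the shift the simulation in Figure 2 depicts.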

So look out for non-bell-shaped GCSE grade distributions, especially in subjects such as English Language, History, French, Spanish and German.

And if you can offer an explanation as to why these distributions happen, please let me know!

2. There may be more marking errors than you think...

To me, three hallmarks of a trustworthy and fair exam system are that marking errors are rare, that grading errors are rarer still, and that any errors that do occur can be discovered and corrected.

Let's look at each of these, starting with the number of marking errors.

We have to wait until mid-December for the publication of information on this year's marking errors, but here are some of Ofqual's figures for the summer 2024 exams in England:

Table 1: Some statistics relating to the summer 2024 exams in England

Expressed as a percentage of the grades awarded, the number of marking errors detected and corrected following a grade challenge, 3.4% in total, appears modest enough.

But to compare the number of marking errors to the number of grades awarded is singularly inappropriate. Since a marking error can be detected only if the corresponding grade has been challenged, a far more meaningful comparison is against the number of grades challenged.

That tells a very different story: 63.0% for GCSE and 83.4% for AS and A level.

That does not mean that 63 in every 100 challenged GCSE grades have an underlying marking error, for many challenged scripts are error-free whilst others contain several errors. But as an overall indication of the incidence of marking errors, those numbers, to me, are meaningful. And huge.

Furthermore, as the table shows, only about 5% of GCSE grades are challenged, and 7% of AS and A level grades, implying that some 95% of GCSE grades, and 93% of AS and A level grades, are not challenged.

How many marking errors are lurking, undetected, simply because no one has looked?
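The arithmetic behind the choice of denominator is worth making explicit. The counts below are invented, chosen only to be consistent with the percentages quoted above (3.4% of grades awarded, roughly 5% of grades challenged); the point is that the very same number of detected errors looks tiny against one base and enormous against the other:

```python
# Illustrative arithmetic only: all counts are hypothetical, chosen to
# match the percentages quoted for the summer 2024 exams.
grades_awarded    = 1_000_000   # hypothetical total number of grades
grades_challenged = 54_000      # ~5.4% of grades awarded (hypothetical)
errors_found      = 34_000      # marking errors detected on challenge (hypothetical)

# Same numerator, two very different denominators
pct_of_awarded    = 100 * errors_found / grades_awarded
pct_of_challenged = 100 * errors_found / grades_challenged

print(f"{pct_of_awarded:.1f}% of grades awarded")      # looks modest
print(f"{pct_of_challenged:.1f}% of grades challenged")  # the meaningful base
```

Since an error can only be detected where someone has looked, the second figure is the one that tells us about the marking itself.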

3. ...and many more grading errors too...

Some marking errors result in a corrected mark such that the script's new total mark lies within the same grade as the originally-awarded grade. The originally-awarded grade is therefore confirmed, even though there was an underlying marking error that was discovered and corrected.

Other marking errors, however, result in a total mark lying within a different grade, so causing the originally-awarded grade to be changed. As can be seen from Table 1, in 2024, the number of grades changed, expressed as a percentage of grades awarded, was about 1%, as it has been for many years past. This can lead the unwary to believe that the remaining 99% of grades are correct - the unwary, it appears, including Pearson.

Such an inference is false. That's because a grade can be changed only if two conditions are fulfilled simultaneously - firstly, the grade must be challenged, and secondly, the review of marking must have discovered a marking error such that the revised, correct, total mark lies within a different grade. If a grade isn't challenged, it might be right, it might be wrong, we just don't know. The inference that 99% of grades are correct when only 1% of grades are changed therefore assumes that all the 95% of unchallenged grades are correct. You may make your own mind up as to the likelihood of that.

To me, a more meaningful comparison is to express the number of grades changed as a percentage of the number of grades challenged, and - as with marking errors - this gives a startling result: some 22% of grades challenged have their grades changed.

That about one challenge in every five "wins" is worth thinking about, especially when you bear in mind that about 95% of grades are not challenged.

If the sample of grades challenged is representative of the whole population, the fact that one in five challenged grades is changed suggests that maybe the same applies to the whole.
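That extrapolation can be made concrete with some back-of-the-envelope figures. Everything below is hypothetical, and it rests entirely on the "if" in the previous sentence - the assumption that challenged grades are representative, which, as discussed next, they may well not be:

```python
# A hedged extrapolation, not a measurement. All counts are invented;
# the rates come from the percentages quoted in the article (~5% of
# grades challenged, ~22% of challenges resulting in a grade change).
grades_awarded = 1_000_000                     # hypothetical total
challenged     = grades_awarded * 5 // 100     # ~5% of grades are challenged
changed        = challenged * 22 // 100        # ~22% of challenges 'win'
unchallenged   = grades_awarded - challenged

# IF the challenged sample were representative, the same 22% would
# apply to the grades no one has looked at.
undetected_estimate = unchallenged * 22 // 100

print(f"detected grade changes: {changed:,}")
print(f"estimated undetected wrong grades: {undetected_estimate:,}")
```

On these invented numbers, the detected grade changes are dwarfed by the estimate for the unchallenged majority - which is the force of the question that follows.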

There are of course several reasons why the grades that are challenged are not a representative sample, for example, the fact that challenges are clustered just below important grade boundaries. Another possible reason is that challenges are correlated with wealth - to challenge a grade costs money. That certainly biases the sample of students (and their parents) who raise a challenge, but I don't think it affects the outcome: whether a grade is right or wrong is not influenced by the affluence of the candidate.

So how many grading errors are out there, lurking, undetected, simply because the corresponding grade isn't challenged?

Probably a lot more than we might think. Or wish to think...

4. ...remembering that many grading errors can't be discovered

All that is complicated by the fact that many grading errors can't be discovered and corrected. That's because they are not recognised by Ofqual's current rules for appeals, which allow a re-mark only if a 'review of marking' discovers at least one marking error.

This excludes all grading errors that are attributable not to marking errors, but to legitimate differences in academic opinion between the 'assistant' examiner, who actually marked the script, and a senior examiner in the subject, whose mark, and hence grade, is, according to Ofqual, 'definitive' - the ultimate authority on what the right grade actually is.

As an example, suppose that, for a particular A level subject, grade B is defined as 'all marks from 61/100 to 65/100 inclusive' and grade A 'all marks from 66/100 to 70/100 inclusive'. Accordingly, a script marked 64, with no marking errors, is awarded grade B. Suppose further that had a senior examiner marked the script, the mark would have been 66, grade A.

If the candidate, whose certificate shows grade B, raises a challenge, a review of marking will discover no marking errors, for the good reason that there are none. The originally-awarded, 'non-definitive' (or, in plainer English, 'wrong') grade B is therefore confirmed. Had, however, the script been re-marked by a senior examiner, the wrong grade B would have been up-graded to the 'definitive', right, grade A.
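The worked example above can be written out directly. The grade boundaries are exactly as stated in the text (grade B: 61-65, grade A: 66-70), and the two marks are the article's hypothetical scenario; nothing here is real exam data:

```python
# The article's worked example: two legitimate marks, two different grades,
# and no marking error anywhere for a review of marking to find.
def grade(total_mark):
    """Map a total mark out of 100 to a grade, using the article's
    example boundaries for grades A and B only."""
    if 66 <= total_mark <= 70:
        return "A"
    if 61 <= total_mark <= 65:
        return "B"
    return "other"

assistant_examiner_mark = 64   # the mark actually awarded; no marking errors
senior_examiner_mark    = 66   # the 'definitive' mark, had a senior re-marked

print(grade(assistant_examiner_mark))   # the grade on the certificate
print(grade(senior_examiner_mark))      # the 'definitive' grade
```

Both marks are legitimate, so a review of marking finds nothing to correct: the 'non-definitive' grade B stands, and the 'definitive' grade A is unreachable under the current rules.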

The current rules for challenges deny, as Ofqual put it, "a second bite of the cherry".

I believe this to be especially unfair, for if the "first bite" is poisoned by a grade error, surely it is only right to be allowed to take a "second bite" containing the antidote, a re-mark by a senior examiner.

How many instances are there of the award of 'non-definitive' grades that do not have underlying marking errors, and so are currently undetectable and uncorrectable?

Probably, once again, many more than you might think.

Like maybe as many as 25% of all grades awarded.

No, that's not a number out of a hat, but the result of Ofqual's own research, as presented in their November 2018 report Marking Consistency Metrics - An update, and as discussed further here.

As I said at the start, none of this is likely to be in an official press release.

But it's all true.

This article was written by Dennis Sherwood, an independent management consultant, and author of 'Missing the Mark – Why so many school exam grades are wrong, and how to get results we can trust' (Canbury Press, 2022)
