
How AI Detection Scores Work in Student Coursework

GradeOrbit Team·Education Technology
7 min read

An AI detection score appears on your screen: 79%. The work is from a student who has never given you cause for concern before. You are not sure whether to escalate, have a conversation, or quietly set the result aside. This is the situation UK secondary teachers are navigating every week — and the score itself is rarely as clear-cut as it looks.

Understanding what AI detection likelihood scores actually measure — and what they do not — is essential before you use them to make any decisions about students. This guide explains how these scores work, when they can mislead, and how to build a fair, evidence-based response around them.

What a Likelihood Score Actually Measures

AI detection tools do not have access to a student's browser history or a log of every application they opened. What they actually do is analyse the statistical and linguistic properties of a piece of text and compare them against patterns associated with AI-generated writing. The result — expressed as a percentage from 0 to 100 — represents the degree of similarity to those patterns, not a definitive record of what happened.

A score of 0% suggests the text closely resembles natural human writing. A score of 100% suggests it closely resembles AI output. Everything in between is a spectrum, and anything in the 30% to 70% range should be treated with particular caution. These are not hard categories — they are probabilistic inferences drawn from computational analysis of writing style.
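
As a rough illustration of how those bands might be read, the sketch below maps a score to a cautious interpretation. The thresholds and wording here are hypothetical examples, not GradeOrbit's internal cut-offs, and no band is proof of anything on its own:

```python
def interpret_score(score: float) -> str:
    """Map a 0-100 likelihood score to a cautious interpretation band.

    Thresholds are illustrative only -- they are not GradeOrbit's
    internal cut-offs, and no band is conclusive on its own.
    """
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score < 30:
        return "closer to typical human writing; low concern"
    if score <= 70:
        return "ambiguous range; treat with particular caution"
    return "resembles AI output; corroborate before acting"

print(interpret_score(79))  # -> resembles AI output; corroborate before acting
```

Note that the middle band is deliberately wide: treating everything from 30% to 70% as ambiguous reflects the probabilistic nature of the score rather than any hard category boundary.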

This distinction matters enormously. A high score is evidence worth taking seriously. It is not proof. Teachers who treat it as proof risk causing real harm to students — and potentially acting outside their school's academic integrity policy.

Why Handwritten Work Can Still Trigger High Scores

You might assume that handwritten student work is automatically safe from AI detection flags. It is not — and understanding why matters when you are interpreting results from tools like GradeOrbit that scan physical papers.

When a student photographs handwritten work, GradeOrbit's AI transcribes the text before any detection analysis takes place. The detection model then analyses that transcribed text — it has no knowledge of whether the writing was originally done by hand. A student who drafted a response using ChatGPT or Claude, memorised it, and reproduced it in handwriting will produce text that the detection model analyses in the same way it would a typed submission.

This means handwriting does not act as a shield against detection. It also means, conversely, that a student who writes with unusual fluency and consistency by hand may trigger a higher score than a student who types a more stilted, disjointed response. The detection signal is always about the text — not the medium.

When High Scores Do Not Mean AI Was Used

False positives are a real and well-documented feature of AI detection tools. Before drawing any conclusions from a high score, it is worth knowing what kinds of student writing tend to produce them.

Highly proficient writers

Students who write with clarity, structural consistency, and academic fluency produce text that can statistically resemble AI output. Academic writing at its best shares characteristics with AI-generated prose: well-organised paragraphs, clear topic sentences, controlled vocabulary. A Year 13 student who has been coached intensively for A-Level English or who is a natural writer may score highly not because they used AI but because their writing is exceptionally good.

EAL students and translation

Students who speak English as an additional language sometimes draft responses in their first language and translate them, either manually or with a translation tool. The resulting English can have a formal, slightly over-structured quality that detection models flag. This is a sensitive category. Treating a high score on an EAL student's work as straightforward evidence of AI use — without further investigation — risks being both unfair and potentially discriminatory.

Heavily revised drafts

A student who has worked through multiple drafts, received detailed feedback from a teacher, and refined their writing over several weeks may produce a final submission that is smoother and more consistent than anything they produced at first attempt. Good teaching can make writing look more like AI output to a statistical model. That is not a failure of the student — it is evidence that your feedback worked.

Formal subject registers

Subjects that demand formal, structured writing — religious studies, philosophy, law, certain science assessments — require students to adopt a specific register. A student who has genuinely mastered that register and applied it consistently may produce writing that appears machine-like precisely because they have succeeded at the task.

When Low Scores Are Not a Clean Bill of Health

The reverse error matters equally. AI detection tools can be circumvented, sometimes deliberately and sometimes as a by-product of ordinary redrafting. A student who pastes AI output and edits it substantially, adding their own phrasing, examples, and restructured sentences, will often produce a lower score. The more editing they do, the less the text resembles raw output from ChatGPT or Claude. A student who uses AI to produce a first draft and then rewrites it heavily may score very low while still having relied on AI in a meaningful way.

A low score should not be read as confirmation that the work is authentic. Your professional knowledge of the student — what they typically produce, how they speak about ideas, what their previous work looks like — remains your most reliable indicator.

A Framework for Responding to Detection Results

Rather than treating a score as a trigger for action, a more robust approach considers the full context. Here is a practical sequence for working through a result that concerns you.

Compare against the student's previous work

Ask yourself whether this piece of work is consistent with what this student normally produces. If you have examples of their writing from class exercises, rough drafts, or timed in-class tasks, compare them. A sudden improvement in vocabulary range, argument structure, or analytical depth — especially if it occurs alongside a high detection score — is worth investigating. If the score is high but the work is entirely consistent with this student's usual standard, that context significantly changes how you should respond.

Read the text carefully for linguistic signals

AI-generated writing tends to exhibit certain patterns: evenly distributed sentence lengths, a lack of personal voice, over-reliance on hedging phrases, generic competence that lacks authentic personality. These are worth looking for as additional data points. They are not proof on their own, but they either reinforce or reduce your concern when combined with the detection score.
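
One of these signals, unusually even sentence lengths, is simple enough to quantify crudely. The sketch below computes the coefficient of variation of sentence lengths in a passage; it is a toy heuristic for illustration only, not GradeOrbit's detection model, and a low value on its own proves nothing:

```python
import re
from statistics import mean, pstdev

def sentence_length_uniformity(text: str) -> float:
    """Coefficient of variation of sentence lengths, in words.

    Lower values mean more uniform sentence lengths -- one of the weak
    signals discussed above. A toy heuristic, never proof of AI use.
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    if len(lengths) < 2:
        return 0.0  # not enough sentences to measure variation
    return pstdev(lengths) / mean(lengths)
```

Human writing typically mixes short and long sentences freely, so a genuinely low value across a long passage is worth a second look, but only alongside the other context this section describes.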

Consider the submission context

When was the work submitted? Has there been a pattern with this student of late submissions that are unusually polished? A high detection score combined with a last-minute submission from a student who typically produces weaker work represents a stronger case for further investigation than the same score from a confident, consistent writer who submitted early.

Have a conversation before taking any formal action

If multiple indicators point in the same direction, the most effective next step is a non-accusatory conversation with the student. Ask them to walk you through their argument, explain their evidence, or write a short paragraph on the same topic in class. A student who genuinely wrote the work will be able to engage with it. A student who submitted something they did not write will typically struggle to discuss ideas in any depth, or will give answers that do not match the sophistication of what they submitted.

Follow your school's academic integrity policy

Before any formal action, check your school's policy on AI use in assessed work. Many schools are still developing their approach, and there is wide variation in how AI assistance is classified. Document your evidence carefully, and involve a senior colleague if the case is complex. A detection score alone is not sufficient grounds for a formal sanction.

How GradeOrbit's Detection Tool Supports Professional Judgment

GradeOrbit includes a built-in AI Detection feature designed with classroom context in mind. You can submit student work as pasted text, an uploaded image, or a scanned physical document. The tool returns a likelihood score from 0 to 100%, a confidence label — Low, Medium, or High — indicating how certain the model is, the specific linguistic signals that contributed to the score, and a brief reasoning paragraph summarising the overall assessment.

Detection is available in two modes: a faster one-credit option for quick checks, and a more thorough three-credit option for cases where you want a deeper analysis. Your model preference is saved between sessions.

Student work is never stored on GradeOrbit's servers. Content is sent to the AI for analysis and immediately discarded. Before submitting, you can use GradeOrbit's built-in redaction tool to draw black boxes over student names and any other identifying information — students are processed anonymously throughout.

For a broader introduction to AI detection in schools, our guide on how to use AI detection in school fairly covers policy and procedure in more depth.

Try GradeOrbit's AI Detection Feature

AI detection scores are a useful piece of evidence — but only one piece among many. Used alongside your professional knowledge of the student, a careful reading of the text, and a direct conversation where needed, they can play a meaningful role in maintaining academic integrity without putting students at unfair risk.

GradeOrbit's detection tool is built directly into your dashboard, ready to use with any text, image, or scanned document. Try GradeOrbit today and see how it fits into your approach to academic integrity.
