
How AI Detection Scores Work for Teachers

GradeOrbit Team · Education Technology
7 min read

You run a piece of student work through GradeOrbit's AI detection tool and a number appears: 78%. The score sits there and the question immediately follows — what does that actually mean? Should you be concerned? Is it evidence of anything? And what do you do next?

AI detection likelihood scores are widely misunderstood, and the misunderstanding runs in both directions. Some teachers treat a high score as definitive proof that a student used ChatGPT or Claude. Others dismiss detection entirely because they've heard the tools are unreliable. The truth is more nuanced — and getting it right matters for both academic integrity and the welfare of the students in your care.

What a Likelihood Score Actually Measures

A likelihood score is a probabilistic assessment, not a surveillance report. The tool is not reading a log of what software the student used. It has no access to their browser history, their clipboard, or their conversation with an AI assistant. What it is doing is analysing the statistical patterns in the submitted text and comparing them against patterns that are characteristic of AI-generated output.

Put simply: AI models like ChatGPT and Claude produce text that has recognisable statistical fingerprints. They tend to select words in predictable ways, construct sentences with a particular rhythm, and arrange arguments in a characteristic sequence. Detection tools are trained to recognise these fingerprints and return a score reflecting how closely the submitted text resembles them.

A score of 0% means the text shows almost no features associated with AI output — it looks statistically human. A score of 100% means the text is almost entirely consistent with AI generation. Everything in between is a spectrum of probability, not certainty. A score of 78% means: this text shares many features with AI-generated writing. It does not mean the student used AI.
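To make the spectrum concrete, here is a minimal sketch of how a likelihood score might be read as interpretation bands. The thresholds and wording are illustrative assumptions for this article, not GradeOrbit's actual boundaries.

```python
def interpret_score(score: float) -> str:
    """Map a 0-100 likelihood score to an illustrative interpretation band.

    The band boundaries here are hypothetical examples, not GradeOrbit's
    real thresholds.
    """
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score < 20:
        return "few AI-like features; statistically human"
    if score < 50:
        return "some AI-like features; no cause for concern on its own"
    if score < 80:
        return "many AI-like features; compare with the student's usual work"
    return "highly consistent with AI generation; investigate in context"

# The 78% example above lands in the "many features" band, not "proof".
print(interpret_score(78))
```

Note that even the top band is phrased as a prompt to investigate, not a verdict: the score is a probability statement about the text, never a record of what the student did.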

How AI Patterns Are Detected

Detection models look for clusters of signals rather than any single feature. The signals most associated with AI-generated text include unusually consistent sentence length (humans naturally vary their rhythm in ways AI tends not to), over-use of stock hedging and filler phrases like "it is worth noting that" or "in the contemporary landscape", and a kind of generic competence that addresses every relevant point without any authentic personal voice or idiosyncratic phrasing.
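Two of those signals are simple enough to sketch in code. The toy function below measures sentence-length spread and counts stock phrases; the phrase list is a small illustrative sample, and real detectors combine far richer statistical features than this.

```python
import re
import statistics

# A few illustrative stock phrases; real detectors use much larger feature sets.
STOCK_PHRASES = [
    "it is worth noting that",
    "in the contemporary landscape",
    "plays a crucial role",
]

def signal_sketch(text: str) -> dict:
    """Compute two toy detection signals for a passage of text.

    Returns the standard deviation of sentence lengths (low spread suggests
    the unnaturally even rhythm associated with AI output) and a count of
    stock filler phrases.
    """
    sentences = [s for s in re.split(r"[.!?]+\s*", text) if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    spread = statistics.pstdev(lengths) if len(lengths) > 1 else 0.0
    lowered = text.lower()
    stock_count = sum(lowered.count(p) for p in STOCK_PHRASES)
    return {
        "sentence_length_stdev": round(spread, 1),
        "stock_phrase_count": stock_count,
    }
```

A human writer mixing short punchy sentences with long ones will show a high `sentence_length_stdev`; text where every sentence is roughly the same length, peppered with filler phrases, trips both signals at once. It is the cluster, not any single number, that matters.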

Tools trained on large datasets of known ChatGPT, Claude, and Gemini outputs can identify the statistical fingerprints of specific models. Different AI systems have slightly different writing tendencies, and well-calibrated detection models account for this. This is part of why accuracy varies between tools: a detection model trained on older AI outputs may miss patterns specific to more recent models, and vice versa.

GradeOrbit uses Google Gemini AI to power its detection analysis, which means the underlying model is continuously updated as AI writing patterns evolve. Rather than relying on a static database of known AI outputs, the analysis is done by a frontier AI that understands current generation patterns at a deep level.

Why High Scores Can Be Misleading

False positives — cases where genuinely human-written work scores highly — are well-documented and real. Understanding the most common causes will help you read results more carefully.

Highly Proficient Writers

Students who write with exceptional fluency, structural clarity, and formal register can produce text that looks statistically similar to AI output. Academic writing at its best shares many features with AI-generated text: clear topic sentences, logical sequencing, balanced argument construction. A Year 13 student who has been coached for A-Level or who is a naturally gifted writer may regularly score in the moderate-to-high range simply because they write very well.

EAL Students and Translation Workflows

Students who speak English as an additional language sometimes draft or plan in their first language before writing in English. The resulting prose can have a formal, slightly over-structured quality that detection tools interpret as AI-like. This is a particularly sensitive category. A high score on an EAL student's work without contextual investigation would be an unfair basis for any action.

Formal Academic Registers

Subjects that require students to write in strict formal registers — philosophy, law, religious studies — can produce text that reads as machine-like precisely because the student has successfully mastered the required style. A student who has absorbed the conventions of academic argumentation and is applying them consistently may score higher than one writing more casually, despite the former having worked harder and more honestly.

Why Low Scores Can Also Be Misleading

The reverse error is equally important. A low score should not be treated as a clean bill of health. Students who use AI to generate a first draft and then edit it substantially — changing vocabulary, restructuring sentences, inserting personal detail — can produce work that scores well below 50%. The more editing they do, the less the text resembles the AI's original output.

Some students are deliberately trying to evade detection. Others are doing something more ambiguous: using AI to help them plan or draft, then rewriting substantially. Both may score low. Your knowledge of the student and their typical work remains your most powerful tool — the detection score is one input, not the answer.

Using the Score Alongside Professional Judgement

The most effective approach treats the detection score as a single piece of evidence within a broader picture. Before acting on a high score, ask yourself: does this piece of work feel consistent with what this student normally produces? Do you have previous writing samples that show a different level of fluency or structural sophistication? Is the submission context suspicious — submitted at midnight after a deadline, dramatically better than timed class work?

If a high score coincides with several other indicators — a step change in quality, specific linguistic patterns in the text, an inconsistency between the submitted work and in-class writing — you have a convergent case that warrants a quiet conversation. If a high score is the only anomaly and the work is consistent with the student's usual output, the score alone is not a basis for any action.
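The convergence rule above can be sketched as a simple checklist combiner. The indicator names and the two-signal threshold are illustrative assumptions made for this example, not a formal policy.

```python
def warrants_conversation(high_score: bool, indicators: dict) -> bool:
    """Return True only when a high score converges with other evidence.

    `indicators` maps contextual observations (e.g. a step change in quality,
    a mismatch with in-class writing) to True/False. Both the indicator names
    and the threshold of two corroborating signals are illustrative.
    """
    corroborating = sum(indicators.values())
    # A high score alone is never enough; it must coincide with other signals.
    return high_score and corroborating >= 2

case = {
    "step_change_in_quality": True,
    "mismatch_with_timed_work": True,
    "suspicious_submission_context": False,
}
print(warrants_conversation(True, case))                      # converges
print(warrants_conversation(True, {k: False for k in case}))  # score alone
```

The point of the sketch is the shape of the logic, not the numbers: the score is one boolean among several, and it can never satisfy the rule by itself.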

If you do need to investigate further, the most productive step is asking the student to talk through their work in person. Someone who wrote the essay genuinely will be able to discuss it fluently. Someone who submitted AI-generated content will often struggle to explain their own ideas in any depth. This conversation is almost always more informative than any score.

GradeOrbit's 1-Credit vs 3-Credit Detection Models

GradeOrbit offers two detection modes to suit different situations. The 1-credit model runs a fast analysis using a highly capable detection system — suitable for quick checks, large batches, or situations where you want a rapid first impression of a class set. It is accurate and reliable for most everyday detection tasks.

The 3-credit model uses a more powerful underlying model that performs a deeper linguistic analysis. It is better suited to borderline cases — work where the 1-credit score falls in the ambiguous middle range, or where you want a more thorough breakdown of the specific signals contributing to the score. The 3-credit analysis also returns a more detailed reasoning paragraph, which can help you explain your thinking to a pastoral colleague or a line manager if a case needs to be escalated.

Your model preference is saved between sessions, so you do not need to reconfigure each time. If you regularly run quick checks across a class and occasionally want a deeper analysis on specific pieces, switching between modes takes one click.

As with all GradeOrbit features, student work submitted for AI detection is never stored on our servers. The content is processed and then discarded — no student text is retained after the session ends. We recommend redacting any identifying information before submitting work, using the built-in redaction tool available in the upload interface.

For a broader introduction to using detection in your classroom, our guide on AI detection for teachers covers the fundamentals and a practical framework for responsible use.

Try GradeOrbit's AI Detection Tool

AI detection scores are a useful addition to your academic integrity toolkit — not a replacement for professional judgement, but a meaningful data point when combined with your knowledge of the student and the context of the submission. GradeOrbit's built-in detection tool is available directly from your dashboard, ready to use with pasted text, uploaded images, or scanned documents.

Try GradeOrbit free today and see how the detection tool fits into your existing approach to academic integrity across your class or department.
