When ChatGPT Meets the Oxford Dictionary
Image credit: Unsplash

Recently, I’ve been collaborating with an old friend from my Bachelor’s programme in Chinese Language Education, who is now a language teacher in Hong Kong, to produce a podcast series on language teaching. Our discussions frequently gravitate toward the impact of Artificial Intelligence (AI) on education. This post explores a pressing question: in an era where “asking AI” yields instant answers, what remains of the irreplaceable authority of the “old brand”, the dictionary?
The “New Normal” in the Classroom: Is AI Always Right?
When we speak of AI in this context, we are specifically referring to Large Language Models (LLMs). LLMs, such as ChatGPT, DeepSeek, and Google Gemini, are AI systems trained on vast amounts of text data. These data allow the models to predict and generate human-like natural conversations. My friend shared that when primary school students encounter unfamiliar words today, their first instinct is to ask an LLM. However, when it comes to Classical Chinese (ancient Chinese), the answers provided by LLMs often fail to make sense within the specific context of a text. When my friend urged her students to consult a Classical Chinese dictionary, the children responded with a blank stare: “A dictionary? How do you even use that?”
This question left me in deep thought. Once upon a time, looking up radicals (the graphical components of Chinese characters), counting strokes, and cross-referencing definitions were essential rites of passage for every student. But in this age of LLMs, children have grown accustomed to “spoon-fed” answers. They are gradually losing the ability to retrieve and critically judge information.
To test the reliability of LLMs, I decided to compare ChatGPT’s definition of an obscure character, “豵” (pronounced zōng), with that of an authoritative dictionary.
I first asked ChatGPT 5.2 for the meaning of “豵”. It informed me that the character originally meant a “piglet” or “young pig,” even adding that modern speakers would simply say “piglet” (豬仔 zhū zǎi). It sounded perfectly reasonable.
However, when I turned to Han Dian (漢典), a respected online dictionary that integrates historical lexicons, the results were different. Han Dian provided a much more granular breakdown: according to the ancient text “Yu Pian” (玉篇), “豵” can mean a “boar” (male pig), while the Shuowen Jiezi (說文解字; the first Chinese dictionary to analyse character structure and word meaning, dating back to the 2nd century) specifies it as a “six-month-old piglet”. Unlike ChatGPT’s generalised answer, the authoritative source was precise about the gender and the exact age of the animal.
Interpreting Classical Chinese requires a deep understanding of the era, the text, and the pragmatics. If a student’s prompt to the LLM is too simple, or if the LLM’s training data hasn’t fully indexed the nuances of ancient texts, the results will inevitably be subpar.
This is what I call “probabilistic verisimilitude”. AI currently lacks a deep understanding of cultural context; it provides the “greatest common denominator” found in its training data, which often leads to “over-inference” based on modern linguistic intuition. For beginners, these “half-right” explanations are dangerous: they might lead students to believe language learning is merely about word substitution, ignoring the thousand-year-old cultural foundation behind words.
This discrepancy explains why teachers, at least among Chinese teachers in Hong Kong, are so devoted to dictionaries. In teacher training, the dictionaries recommended by professors represent a canon: a standardised transmission of knowledge. Without this pursuit of precision, language learning becomes as hit-or-miss as a probability forecast.
Persistence in the Lab: How is the “Standard Answer” Manufactured?
Turning to my own field, psycholinguistics, the situation is even more intriguing. In the lab, we also need a benchmark for our experiments, but our definition of “authority” differs significantly from that of a classroom teacher.
Psycholinguists generally recognise authority through large-scale corpora published by leading research labs. While a dictionary pursues quality of definition, a corpus prioritises quantity of texts (quality still matters, but scale is the distinguishing virtue). These datasets might be scraped from books, children’s literature, or even movie subtitles. For a researcher, a corpus doesn’t necessarily represent “correctness” but rather “authenticity”: it shows how people actually use language in daily life.
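To make this concrete, here is a minimal sketch of how word frequencies are counted from scraped text. The three-line “corpus” below is a toy stand-in; real psycholinguistic corpora contain millions of subtitle or book lines.

```python
from collections import Counter
import re

# Toy "corpus" standing in for scraped subtitle lines (illustrative only).
subtitle_lines = [
    "I saw a piglet at the farm.",
    "The piglet ran to its mother.",
    "We watched the farm animals all day.",
]

def word_frequencies(lines):
    """Count how often each word form appears across all lines."""
    tokens = []
    for line in lines:
        # Lowercase and keep only alphabetic word forms.
        tokens.extend(re.findall(r"[a-z']+", line.lower()))
    return Counter(tokens)

freq = word_frequencies(subtitle_lines)
print(freq["piglet"])  # 2
print(freq["the"])     # 3
```

Frequency norms like these underpin many “authenticity” claims: a word’s count across the corpus, not a lexicographer’s judgment, is what the researcher trusts.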
However, this quantity-based authority faces its own set of hurdles.
For example, when designing a vocabulary size test (a standardised tool used to estimate how much vocabulary a person knows), one challenge for researchers is manufacturing the “correct answer”. You may have seen these tests, where a participant must choose the most accurate definition from options A, B, C, and D. It is surprisingly hard to write a key and distractors that are precise without being ambiguous.
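A minimal sketch of what such a multiple-choice item might look like as data. The word, distractors, and key here are entirely hypothetical, not drawn from any published vocabulary size test:

```python
# Hypothetical test item: note how options B and D are deliberately
# plausible, which is exactly what makes item design difficult.
item = {
    "word": "piglet",
    "options": {
        "A": "a young pig",        # keyed answer
        "B": "a male pig",         # plausible distractor
        "C": "a pen for pigs",     # related but wrong
        "D": "a wild boar",        # plausible distractor
    },
    "key": "A",
}

def score_response(item, chosen):
    """Return 1 if the chosen option matches the keyed answer, else 0."""
    return int(chosen == item["key"])

print(score_response(item, "A"))  # 1
print(score_response(item, "B"))  # 0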
This is where researchers and teachers eventually meet. While we may not share the teachers’ specific devotion to a single dictionary, we both seek stability. If the corpora we rely on lack the clear boundaries of a dictionary, and we now add the interference of LLMs, which are prone to “hallucinating”, or making up, definitions, how can we ensure the reliability and validity of our tests? If the ruler we measure with is inaccurate, can we truly measure how language is learnt?
Suggestions for Teachers and Researchers: From “Finding Answers” to “Cross-Validation”
Teachers and psycholinguists may approach “authority” differently: teachers guard the academic lineage, while researchers pursue robustness of large-scale data. But essentially, we are both protecting the same thing: a deep understanding of linguistic structure.
In the age of AI, whether it is the teachers’ “standard” or the researchers’ “ideal”, both serve as a vital defense against the “dilution of knowledge”.
What does this mean for teachers and parents?
This presents a wonderful educational opportunity. When a child brings an AI-generated answer to us, we shouldn’t just look at whether the result is “right”. We should lead them through “cross-validation.” We can tell them:
The LLM is your “Brainstorming Buddy”: it can help you quickly retrieve ideas, simulate conversations, and expand your thinking. But…
The dictionary and authoritative corpora are your “Steering Wheel”: When you need to make a final judgment and ensure you aren’t “putting a hat on the wrong head” (a Chinese idiom for misattribution), you must return to the benchmark agreed upon by the academic community.
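As a rough illustration of this cross-validation habit, the sketch below compares a placeholder LLM gloss of 豵 against the dictionary senses quoted earlier. The strings are invented for the example, not real API output, and the substring check is deliberately naive:

```python
# Minimal sketch of "cross-validation": flag dictionary senses whose
# nuances an LLM's gloss does not mention. Glosses are placeholders.
llm_gloss = "piglet; young pig"

dictionary_senses = [
    "boar (male pig)",          # per Yu Pian
    "six-month-old piglet",     # per Shuowen Jiezi
]

def missing_nuances(llm_gloss, senses):
    """List dictionary senses not covered verbatim by the LLM gloss."""
    return [s for s in senses if s.lower() not in llm_gloss.lower()]

for sense in missing_nuances(llm_gloss, dictionary_senses):
    print("LLM answer omits:", sense)
```

The point is not the code but the habit it encodes: treat the LLM’s answer as a draft, then check it line by line against the benchmark the academic community agrees on.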
For a child, this isn’t just about learning a language; it’s about cultivating critical thinking where we refuse to blindly follow data probabilities.
Final Remarks
The discussion above is just the tip of the iceberg. There are so many topics regarding the battle between AI and authority that deserve deeper exploration in our upcoming podcast episodes, such as:
The art of “prompt engineering”: How to write systematic instructions that allow LLMs to accurately explain word meanings within a specific context.
The craft and limitations of dictionaries: A dictionary is more than a tool; it is the product of exegesis (訓詁, xùngǔ), the traditional academic discipline of explaining the meanings of ancient texts through rigorous philological analysis. This depth of craftsmanship is something AI cannot replace. However, we must also acknowledge dictionaries’ shortcomings, such as the difficulty of keeping pace with internet slang and neologisms.
Cross-cultural views on authority: Do different countries or linguistic communities define “authority” differently?
In the era of blurred standards, we are no longer just looking for a shortcut to an answer. The ability to discern truth from noise in the vast sea of information is one of the most important skills of all.

