Reading Between the Lines: How Linguistic Analysis Can Become a Tool for Dementia Diagnosis

Dementia describes a number of conditions characterised by significant cognitive decline, the most common of which is Alzheimer’s disease. Physiologically, Alzheimer’s is characterised by the gradual accumulation of two proteins: amyloid-beta, which forms plaques in the brain, and tau, which forms tangles inside neurons. There is currently no cure for any form of dementia, so early identification of cognitive decline can enable interventions to delay, or even prevent, some of the damage.

The first symptoms individuals with dementia may display are commonly increased episodes of confusion and impairments in memory or language. However, evidence suggests that the underlying pathology of the condition is likely to begin years, maybe even decades, before the onset of symptoms. Dementia is known to have an effect on writing, and consequently, research has explored whether a patient’s writing history could serve as a potential biomarker for the disease.

New research, published in the Open Access journal Brain Sciences, investigates the potential of word-level linguistic analysis as a method for the early detection of cognitive decline. The study examines the case of Sir Terry Pratchett, focusing on his literary output alongside his diagnosis of posterior cortical atrophy, a rare variant of Alzheimer’s disease, to determine whether changes in his writing appeared before clinical symptoms were recognised.

Best known for the Discworld series, comprising 41 novels published between 1983 and 2015, Pratchett was diagnosed with Alzheimer’s disease in 2007 while he was still actively writing and publishing. Analysing the Discworld novels therefore has the potential to provide insight into how neurodegenerative diseases may subtly affect language before a formal dementia diagnosis.

Dr Thom Wilcockson, corresponding author on the study, summarises its importance:

“Identifying dementia in the early stages is important as it may enable us to use interventions sooner before the brain is damaged beyond repair.”

Literary history as evidence of early cognitive change

Few authors make suitable case studies, but previous research has examined the works of writer Iris Murdoch, whose Alzheimer’s diagnosis was confirmed posthumously. Significant and consistent differences in lexical diversity were found between Murdoch’s final book and control books from earlier in her career.

A broader study also included the works of Agatha Christie, who is thought to have developed Alzheimer’s during her career, and P.D. James, who served as a control. James maintained stable linguistic diversity well into her late 80s, whereas both Murdoch and Christie showed signs of cognitive decline: a reduction in vocabulary diversity and, in turn, increased repetition of content words (nouns, verbs, adjectives, and adverbs).

Interestingly, deficits in Murdoch’s writing appeared in her late 40s and early 50s. This suggests that linguistic deficits can be detected many years before a formal diagnosis and that linguistic analysis shows promise in identifying whether an author has experienced cognitive decline.

Given Terry Pratchett’s extensive writing career, which continued following his dementia diagnosis, linguistic analysis of his novels may provide a deeper understanding of how early cognitive decline could present itself in a writer’s text.

Measuring diversity in text

In this study, the novels were analysed by measuring the type-token ratio for 4 major word classes: nouns, verbs, adjectives, and adverbs. The type-token ratio calculates the proportion of unique words within each class relative to the total number of words in that category to measure how varied the vocabulary is. A higher value indicates a richer and more diverse use of language.
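As a rough illustration of the calculation described above, the following Python sketch computes a type-token ratio for each word class. It is not the researchers’ code: the function names are hypothetical, the input is assumed to be already tagged with word classes (the article does not say which tagging tool was used), and the sample words are invented for the example.

```python
from collections import defaultdict


def type_token_ratio(tokens):
    """Unique words (types) divided by total words (tokens).

    Returns 0.0 for an empty list so callers need no special case.
    """
    tokens = [t.lower() for t in tokens]
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def ttr_by_class(tagged_tokens):
    """Compute one type-token ratio per word class.

    tagged_tokens: iterable of (word, word_class) pairs. The tagging
    step itself is assumed to have happened elsewhere.
    """
    by_class = defaultdict(list)
    for word, word_class in tagged_tokens:
        by_class[word_class].append(word)
    return {cls: type_token_ratio(words) for cls, words in by_class.items()}


# Invented, hand-tagged sample; not taken from any novel.
sample = [
    ("big", "adjective"), ("dog", "noun"), ("chased", "verb"),
    ("big", "adjective"), ("cat", "noun"), ("quickly", "adverb"),
    ("small", "adjective"), ("dog", "noun"),
]
ratios = ttr_by_class(sample)
for word_class, ratio in sorted(ratios.items()):
    print(f"{word_class}: {ratio:.2f}")
```

In this toy sample, two of the three adjectives are distinct, so the adjective ratio is 2/3; a writer who reused "big" more often would score lower.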

Overall lexical diversity was quantified for 33 out of 41 novels from the Discworld series. Of the 8 titles excluded from analysis, 1 was shorter than the other full-length novels, and the remaining 7 were aimed at younger readers. Focusing on a homogeneous selection of adult-marketed, full-length novels helped ensure that observed changes reflected cognitive status rather than intended shifts in audience or format.

To investigate the potential effects of cognitive decline, novels were grouped by publication date relative to Pratchett’s diagnosis, as either pre- or post-diagnosis. Finally, the raw data were analysed statistically to summarise the key linguistic features across the Discworld novels.

Patterns before and after official diagnosis

Statistical analysis revealed that adjective use was the strongest indicator of change across Terry Pratchett’s texts. A cut-off value was calculated for the adjective type-token ratio, representing the point at which writing shifts from typical variation to patterns associated with cognitive decline. Using this threshold, the researchers estimated that his cognitive decline may have begun around the time The Last Continent, the 22nd book in the Discworld series, was written. The book itself was published in May 1998, 9 years and 7 months before the formal dementia diagnosis.

Dr Thom Wilcockson states the following:

“Our analysis found that Sir Terry’s use of language did indeed change during his career. These results suggest that language may be one of the first signs of dementia, and Sir Terry’s books reveal a potential new approach for early diagnosis.”

Adjusting for length

The type-token ratio is strongly influenced by book length, as longer texts naturally tend to repeat words more often. To account for this, the researchers normalised their results using a moving-average type-token ratio, calculating vocabulary diversity within consecutive 100-word sections of the book. This approach provides a more stable and reliable measure of lexical variation across the Discworld series.
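To show why this adjustment matters, here is a minimal Python sketch of a moving-average type-token ratio. It is an illustration rather than the researchers’ implementation: the function name and the toy text are hypothetical, and it assumes overlapping windows that slide forward one word at a time, which is one common way of computing this measure. Splitting the text into non-overlapping 100-word chunks would be a reasonable alternative reading of the description above.

```python
def moving_average_ttr(tokens, window=100):
    """Average the type-token ratio over every window of `window` words.

    Assumption: windows overlap and advance one word at a time.
    Texts no longer than one window fall back to the plain ratio.
    """
    tokens = [t.lower() for t in tokens]
    if not tokens:
        return 0.0
    if len(tokens) <= window:
        return len(set(tokens)) / len(tokens)
    ratios = [
        len(set(tokens[i:i + window])) / window
        for i in range(len(tokens) - window + 1)
    ]
    return sum(ratios) / len(ratios)


def plain_ttr(tokens):
    """Whole-text type-token ratio, for comparison."""
    tokens = [t.lower() for t in tokens]
    return len(set(tokens)) / len(tokens) if tokens else 0.0


# Invented toy text: the same four words, once and then repeated 50 times.
short_text = ["the", "cat", "sat", "down"]
long_text = short_text * 50

print(plain_ttr(short_text), plain_ttr(long_text))
print(moving_average_ttr(short_text, window=4),
      moving_average_ttr(long_text, window=4))
```

The whole-text ratio falls from 1.0 to 0.02 simply because the toy text got longer, while the windowed measure stays at 1.0 for both, which is the length effect the adjustment is meant to remove.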

Even after controlling for word count in this way, all word classes continued to show significant differences in lexical diversity before and after the publication of The Last Continent. As lead author Dr Melody Pattison explains:

“We would normally expect less lexical diversity as texts get longer, but even after controlling for text length, our findings were still significant. The shifts in language were not something a reader would necessarily notice, but rather a subtle, progressive change.”

Limits of language-based dementia diagnosis

This study suggests that language analysis may be sensitive enough to detect preclinical cognitive changes, shifts so subtle that readers would never consciously notice them.

However, these findings are based on a single, unique case: Sir Terry Pratchett. While promising, this approach cannot yet be generalised to everyone with Alzheimer’s disease. The threshold values identified in this research were personalised to Pratchett’s baseline writing style and are not universal diagnostic markers.

More broadly, writing naturally evolves over time. Ageing influences both spoken and written language, and these changes vary from person to person. This means linguistic analysis must be interpreted carefully, as patterns observed in one individual may not apply to another.

Important insights for the future of medicine

Despite these limitations, the implications are compelling.

Language is something that we produce every day, through emails, messages, social media, and conversation. Advances in computational linguistics and machine learning are increasingly being explored in dementia diagnosis research. With validation across larger and more diverse samples, personalised linguistic monitoring could one day complement traditional cognitive assessments in dementia diagnosis.

Rather than replacing existing clinical diagnosis, lexical analysis may serve as an early warning system, flagging subtle, progressive shifts that might otherwise go unnoticed. Earlier detection could allow for earlier intervention, planning and support.

Importantly, this research reminds us that cognitive change does not begin at diagnosis; it unfolds gradually. If we look closely enough, language may provide an essential insight into cognitive deficits.

More research on dementia can be found in the Open Access journals Journal of Dementia and Alzheimer’s Disease and Brain Sciences, as well as across the full MDPI journal list.