How Cite Lens Uses AI to Flag Potentially Problematic Citations

Citations are fundamental to the process of communicating scientific findings and play a key role in strengthening the claims made by authors. However, because problematic citation practices exist, it is highly important to differentiate between references that contribute to an article and those that are out of scope or out of context.

MDPI’s in-house Technology Innovation team developed Cite Lens, a tool used to detect out-of-scope and out-of-context citations in scholarly articles. The tool aims to enhance editorial screening and discourage manipulative citation practices.

This article builds on a conference paper, written by MDPI’s Technology Innovation team, that explores the technical aspects of Cite Lens. It reflects MDPI’s commitment to transparency and integrity in academic publishing. Cite Lens relies on human-in-the-loop validation, meaning that a human makes the final decision, so it is important that the scholarly community understands how the tool functions.

Why citation screening is essential

Citations are essential to maintaining the integrity of the scholarly record. They must be an accurate reflection of how prior research informed and supported subsequent published articles. Their inclusion acknowledges others’ work, provides evidence, and frames the new research within existing knowledge.

However, checking the references of a manuscript and how they are used is a lengthy manual task. The strain this places on the editorial process is amplified by the growing volume of publications and of problematic citations.

One of the latest challenges is to detect irrelevant citations. These are references included in a scholarly work that do not contribute to its arguments or claims, and do not fit within the scope of the citing article.

Other problematic citation practices include:

  • Abnormal or excessive self-citation, which may be used for self-promotion or to inflate citation counts.
  • Citation padding, which refers to adding unnecessary or irrelevant citations to inflate the number of references.
  • Coercive citation, in which editors or reviewers pressure an author to add unnecessary or irrelevant citations.
  • Citation cartels, which are groups of researchers who cite each other to boost their respective impacts.

To preserve the integrity of the scholarly record, the editorial process must be able to ensure that citations are used accurately and ethically.

MDPI’s Technology Innovation team

MDPI’s in-house Technology Innovation team is committed to creating user-centric tools for both internal and external users. Its primary aim is to deliver applications that optimise workflows and support users. Tools are developed in line with MDPI’s core values of quality and transparency, both of which underpin research integrity across MDPI’s entire portfolio of more than 485 journals.

Milos Cuculovic, Head of Technology Innovation, explains:

“In 2018, we established the Technology Innovation team, composed of AI experts including data scientists, AI/ML engineers, and data engineers. The team’s main focus is to identify opportunities to improve internal processes, develop new tools for researchers, and enhance the quality of academic output.”

What is Cite Lens?

Cite Lens is part of MDPI’s Ethicality tool suite, which focuses on strengthening editorial processes and ensuring the quality of published manuscripts.

The tool provides a robust means of identifying potentially problematic citations, especially when used alongside other citation-checking methods. It was developed with support from editors and research integrity experts.

How does it work?

Cite Lens uses embedding models, which are AI systems that convert text into vector representations, to measure how semantically similar two pieces of text are.

To do this, Cite Lens derives a relevancy score using Specter, a natural language processing model built for academic text that assesses document-level relatedness. Specter maps both the article under review and the cited article to vectors. The closer the two vectors are, the more closely related the contents of the two documents.
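To make the idea of a relevancy score concrete, the sketch below computes cosine similarity between two embedding vectors. Cosine similarity is a common way of comparing embeddings. The vectors are made-up, four-dimensional stand-ins for model output; real Specter embeddings have 768 dimensions. Using cosine similarity as the relevancy score is an assumption for illustration, not a description of Cite Lens’s internal scoring.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors (range -1 to 1)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cannot compare a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical toy embeddings standing in for the output of an embedding model.
citing_article = [0.9, 0.1, 0.3, 0.0]
related_reference = [0.8, 0.2, 0.4, 0.1]
unrelated_reference = [0.0, 0.9, 0.0, 0.8]

# The related reference points in nearly the same direction, so it scores higher.
print(round(cosine_similarity(citing_article, related_reference), 3))
print(round(cosine_similarity(citing_article, unrelated_reference), 3))
```

Cosine similarity ignores vector length and compares only direction. This is why it is a popular choice for comparing documents of different lengths.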

Cite Lens bases comparisons on two key metrics:

  1. Article-reference similarity: how similar is the cited article to the citing article overall?
  2. Context-reference similarity: how similar is the cited article to the specific paragraph in which it is cited?
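The two metrics can be sketched as two similarity scores computed for each reference. One compares the reference with the embedding of the whole citing article. The other compares it with the embedding of the paragraph that contains the citation. The `Reference` record, its field names, and the toy vectors are hypothetical. The real tool may embed different parts of each document.

```python
import math
from dataclasses import dataclass


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


@dataclass
class Reference:
    """Hypothetical record: a cited work plus the paragraph that cites it."""
    label: str
    reference_vec: list[float]  # embedding of the cited article
    context_vec: list[float]    # embedding of the citing paragraph


def similarity_metrics(article_vec: list[float], ref: Reference) -> dict[str, float]:
    """Return both metrics for a single reference."""
    return {
        # 1. How similar is the cited article to the citing article overall?
        "article_reference": cosine_similarity(article_vec, ref.reference_vec),
        # 2. How similar is the cited article to the paragraph in which it is cited?
        "context_reference": cosine_similarity(ref.context_vec, ref.reference_vec),
    }


# Toy example with made-up three-dimensional vectors.
article = [0.9, 0.2, 0.1]
ref = Reference("[12]", reference_vec=[0.8, 0.3, 0.1], context_vec=[0.1, 0.9, 0.2])
print(similarity_metrics(article, ref))
```

In this toy case, the reference is close to the article as a whole but far from its citing paragraph. This is the kind of mismatch that comparing the two scores is meant to surface.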

By comparing both, the tool can detect cases where a reference is out of scope (irrelevant to the article) or out of context (featured in a paragraph where it does not belong).
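The following is a minimal sketch of how the two scores could be turned into flags, assuming a simple threshold rule. The threshold values, the `classify_reference` function, and the rule itself are hypothetical illustrations. They do not describe how Cite Lens actually combines the scores or where it draws the line.

```python
from enum import Enum


class CitationFlag(Enum):
    OK = "ok"
    OUT_OF_SCOPE = "out of scope"      # irrelevant to the article as a whole
    OUT_OF_CONTEXT = "out of context"  # relevant overall, but cited in a paragraph where it does not belong


# Hypothetical thresholds; a real system would calibrate these on labelled examples.
SCOPE_THRESHOLD = 0.5
CONTEXT_THRESHOLD = 0.5


def classify_reference(article_reference: float, context_reference: float) -> CitationFlag:
    """Map the two similarity scores to a flag for human review."""
    if article_reference < SCOPE_THRESHOLD:
        return CitationFlag.OUT_OF_SCOPE
    if context_reference < CONTEXT_THRESHOLD:
        return CitationFlag.OUT_OF_CONTEXT
    return CitationFlag.OK


print(classify_reference(0.30, 0.20))  # CitationFlag.OUT_OF_SCOPE
print(classify_reference(0.85, 0.25))  # CitationFlag.OUT_OF_CONTEXT
print(classify_reference(0.85, 0.70))  # CitationFlag.OK
```

The scope check runs first in this sketch. A reference that is unrelated to the whole article is usually also unrelated to its paragraph, and "out of scope" is the more informative label in that case.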

AI-assisted human review

The tool is not autonomous: it flags potentially problematic instances for a human expert to review. The people who address flagged instances may include internal editorial, research integrity, and ethics staff, as well as external users such as reviewers and academic editors. Human expertise and final validation are critical to MDPI’s mission to publish high-quality science.
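The human-in-the-loop design can be illustrated with a small data structure in which the tool only records a suggestion and a person records the decision. Everything here, including `FlaggedCitation`, `ReviewQueue`, and the verdict strings, is a hypothetical sketch and not MDPI’s implementation.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FlaggedCitation:
    """A citation the tool considers potentially problematic (hypothetical schema)."""
    reference_label: str
    reason: str                          # e.g. "out of scope" or "out of context"
    score: float                         # similarity score that triggered the flag
    human_verdict: Optional[str] = None  # set only by a human reviewer
    reviewer: Optional[str] = None


@dataclass
class ReviewQueue:
    items: list[FlaggedCitation] = field(default_factory=list)

    def add(self, flag: FlaggedCitation) -> None:
        # The tool only adds suggestions; it never removes or edits references itself.
        self.items.append(flag)

    def pending(self) -> list[FlaggedCitation]:
        """Flags still awaiting a human decision."""
        return [f for f in self.items if f.human_verdict is None]

    def record_decision(self, reference_label: str, verdict: str, reviewer: str) -> None:
        """Store the human reviewer's final decision for one pending flag."""
        if verdict not in {"confirmed", "dismissed"}:
            raise ValueError("verdict must be 'confirmed' or 'dismissed'")
        for f in self.items:
            if f.reference_label == reference_label and f.human_verdict is None:
                f.human_verdict = verdict
                f.reviewer = reviewer
                return
        raise KeyError(f"no pending flag for {reference_label}")


queue = ReviewQueue()
queue.add(FlaggedCitation("[7]", "out of scope", 0.21))
queue.add(FlaggedCitation("[15]", "out of context", 0.34))
queue.record_decision("[7]", "dismissed", reviewer="academic editor")
print([f.reference_label for f in queue.pending()])  # ['[15]']
```

Keeping the verdict as a separate field makes it explicit that a flag is only a prompt. The decision to keep or remove a citation stays with the reviewer.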

Overall, the growing number of scientists and publications, alongside the rise in problematic citation practices, creates the need for AI-powered checking tools. Cite Lens aids users during the editorial process by flagging potentially out-of-scope and out-of-context citations.

The tool helps ensure that content within MDPI’s journals is relevant and adheres to the highest ethical standards.

Benefits of Cite Lens

Because citations are such an essential aspect of scholarly publications, Cite Lens focuses on ensuring the quality and integrity of citations. This, in turn, improves editorial screening and the overall quality of publications.

Other benefits include:

  • Discouraging the use of problematic citation patterns.
  • Integrating into the workflow of reviewers and Editorial Board Members (EBMs), reminding them of the importance of ethical citation practice.
  • Saving time during the review process by efficiently flagging manuscripts with potentially problematic citations and guiding reviewers and EBMs to validate the flagged citations.
  • Reducing the number of out-of-scope and out-of-context references present in published papers.
  • Increasing the reliability of journal metrics that rely on citations.
  • Reducing the prevalence of citation cartels.

Overall, Cite Lens supports the human-centric editorial process across MDPI journals by helping to ensure that citations are used accurately and ethically in high-quality scientific outputs.

Future development

MDPI’s Technology Innovation team will continue to refine the tool so it can better support the editorial screening process. Furthermore, the tool will be integrated into SuSy, MDPI’s in-house online submission system.

Following this, the tool will be available for journals participating in JAMS, MDPI’s comprehensive Journal Article Management service. JAMS enables publishers of all sizes to streamline their entire publishing process. Support from Cite Lens will enable them to improve their editorial screening process and overall publication quality.

Across MDPI journals, Cite Lens will also be used to regularly scan published papers for retracted references, helping to maintain high-quality output across MDPI’s more than 485 journals.
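At its simplest, scanning published papers for retracted references means checking each reference’s identifier against a list of known retractions. The sketch below assumes that references carry DOIs and that a set of retracted DOIs is available from some external source. The DOIs shown are placeholders, and `normalise_doi` and `find_retracted` are hypothetical helpers. DOIs are compared case-insensitively because DOI names are case-insensitive.

```python
# Common prefixes that wrap a bare DOI (all lower-case for comparison).
DOI_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalise_doi(doi: str) -> str:
    """Strip whitespace and a common URL or 'doi:' prefix, then lower-case the DOI."""
    doi = doi.strip()
    for prefix in DOI_PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    return doi.lower()


def find_retracted(reference_dois: list[str], retracted_dois: set[str]) -> list[str]:
    """Return the references whose DOI is in the retraction set, keeping their original spelling."""
    retracted = {normalise_doi(d) for d in retracted_dois}
    return [d for d in reference_dois if normalise_doi(d) in retracted]


# Placeholder data; a real scan would load retraction records from an external database.
retracted_set = {"10.5555/placeholder-002"}
paper_references = [
    "10.5555/placeholder-001",
    "https://doi.org/10.5555/PLACEHOLDER-002",
    "doi:10.5555/placeholder-003",
]
print(find_retracted(paper_references, retracted_set))
# ['https://doi.org/10.5555/PLACEHOLDER-002']
```

Normalising both sides before comparison avoids missing a retraction just because one record stores the DOI as a URL or in different letter case.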

Cite Lens: quality, innovation, transparency

Citations are essential to the scholarly record. However, due to the rising number of publications and increase in problematic citation behaviour, the need for AI-powered checking tools has grown.

Cite Lens was developed to support the editorial screening process by guiding editors towards potentially problematic references in papers.

This reflects MDPI’s commitment to innovation and transparency, with the aim of advancing open science. It is the responsibility of publishers to disclose the tools they use and how they work.

If you want to learn more about Cite Lens and the technology behind it, read the conference paper written by the Technology Innovation team.