Optimising the paradigms of human AI collaborative clinical coding

Contact Person

Honghan Wu

Page Link

https://www.nature.com/articles/s41746-024-01363-7

Published on

December 2024

The Challenge

Clinical coding is the process of assigning standardised codes (e.g., ICD-10 codes for diagnoses or procedures) to an interaction with the health service, such as a visit to a GP or a hospital stay. Such ‘coded’ information is widely used for patient care, auditing and research. Clinical coding is resource-intensive: a group of specialised clinical coders must manually and systematically assign codes to multi-source, multi-modal raw medical records, based on standard coding classification systems that contain thousands of candidate codes. For example, the predominant coding classification system, ICD-10 (International Classification of Diseases, Tenth Revision), contains around 68,000 diagnosis codes. As a result, the whole coding process is expensive, time-consuming, and error-prone.

The Research

This paper proposes a novel Human-in-the-Loop (HITL) framework, CliniCoCo, for human–AI Collaborative Clinical Coding in real-world scenarios. CliniCoCo incorporates clinical coders’ feedback into the key stages of the Automated Clinical Coding (ACC) system, namely the data preprocessing, model training, and clinical decision-making stages, and takes into account the complex characteristics of medical records and clinical processes in Chinese hospitals. It is one of the first works to systematically design a HITL paradigm for the ACC task. The main contributions of this paper are summarised as follows.

To reduce workload and enhance annotation quality, this paper proposes a HITL-based collaborative strategy that employs a semi-automatic, iterative module to generate both large noisy-labelled and small clean-labelled datasets.
A 3-step contrastive learning method is introduced to improve ACC’s representation using datasets with varying noise levels, along with a kNN-based inference module that integrates expert knowledge for better prediction.
Multiple collaborative features (such as threshold tuning, heatmap visualisation, and reference retrieval) are designed to support clinical decision-making, and a HITL interface is developed to integrate these functions throughout CliniCoCo.
Extensive experiments on real-world EMR data from two Chinese hospitals demonstrate CliniCoCo’s effectiveness in clinical settings, further supported by quantitative analysis, pilot studies, and expert interviews.
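
To make the kNN-based inference and threshold tuning ideas above more concrete, here is a minimal sketch in Python. It is not the authors' implementation: this summary does not describe how CliniCoCo's kNN module or thresholds work internally, so the function names (`knn_code_scores`, `suggest_codes`), the cosine similarity measure, the similarity-weighted voting, and the toy embeddings are all assumptions made for illustration.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_code_scores(query, references, k=3):
    """Score codes by similarity-weighted votes from the k nearest references.

    `references` is a list of (embedding, set_of_codes) pairs, standing in for
    expert-verified records (hypothetical structure). Scores fall in [0, 1].
    """
    ranked = sorted(references, key=lambda r: cosine(query, r[0]), reverse=True)[:k]
    scores, total = {}, 0.0
    for embedding, codes in ranked:
        weight = max(cosine(query, embedding), 0.0)  # ignore negative similarity
        total += weight
        for code in codes:
            scores[code] = scores.get(code, 0.0) + weight
    if total == 0.0:
        return {}
    return {code: s / total for code, s in scores.items()}


def suggest_codes(scores, threshold=0.5):
    """Return the codes whose score reaches the coder-chosen threshold."""
    return sorted(code for code, s in scores.items() if s >= threshold)


# Toy data: 2-dimensional embeddings with illustrative ICD-10 codes.
references = [
    ([1.0, 0.0], {"I10", "E11.9"}),
    ([0.9, 0.1], {"I10"}),
    ([0.0, 1.0], {"J18.9"}),
]
scores = knn_code_scores([1.0, 0.05], references, k=2)
strict = suggest_codes(scores, threshold=0.6)   # fewer, higher-confidence codes
lenient = suggest_codes(scores, threshold=0.4)  # more candidates for review
```

In this sketch, raising the threshold trades recall for precision: a stricter setting shows the coder fewer suggestions, while a more lenient one surfaces more candidates to review.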

The Impact

With automatically optimised annotation workloads, the model achieves F1 scores of around 0.80–0.84. For EMR data in which 30% of the codes are mistaken, CliniCoCo can suggest halving the annotation workload from 3000 admissions with a negligible F1 decrease of 0.01. In human evaluations, compared with manual coding, CliniCoCo reduces coding time by 40% on average and significantly improves the correction rates for EMR mistakes (e.g., three times better on missing codes). Senior professional coders’ performance can be boosted from an F1 score of 0.72 to more than 0.93.
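
For readers less familiar with the metric, the F1 scores quoted above combine precision and recall into a single number. The standard definition is given below, where TP, FP and FN are the counts of correctly assigned, wrongly assigned, and missed codes. How the paper averages over codes (for example micro or macro averaging) is not stated in this summary, so this is the generic form only.

```latex
F_1 = \frac{2PR}{P + R}, \qquad
P = \frac{TP}{TP + FP}, \qquad
R = \frac{TP}{TP + FN}
```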