A Large Language Model Based Pipeline for Review of Systems Entity Recognition from Clinical Notes

Abstract

Objective: Develop a cost-effective, large language model (LLM)-based pipeline for automatically extracting Review of Systems (ROS) entities from clinical notes.

Materials and Methods: The pipeline first extracts ROS sections using SecTag, then applies few-shot LLM prompting to identify ROS entity spans, their positive/negative status, and their associated body systems. We implemented the pipeline using open-source LLMs (Mistral, Llama, Gemma) and ChatGPT, and evaluated it on 36 general medicine notes containing 341 annotated ROS entities.
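The few-shot extraction step described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the ROS section has already been isolated (for example by SecTag), it replaces the LLM with a stub function, and the names RosEntity, FEW_SHOT, build_prompt, extract_entities, and fake_llm, as well as the JSON output schema and the example body-system labels, are all hypothetical.

```python
import json
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class RosEntity:
    span: str    # text of the symptom mention
    status: str  # "positive" or "negative"
    system: str  # associated body system


# Hypothetical few-shot example; not taken from the paper.
FEW_SHOT = [
    (
        "Denies fever. Reports chest pain.",
        [
            {"span": "fever", "status": "negative", "system": "constitutional"},
            {"span": "chest pain", "status": "positive", "system": "cardiovascular"},
        ],
    ),
]


def build_prompt(ros_text: str) -> str:
    """Assemble an instruction, the few-shot examples, and the new ROS text."""
    parts = [
        "Extract Review of Systems entities as a JSON list of objects "
        "with keys span, status (positive/negative), and system."
    ]
    for text, entities in FEW_SHOT:
        parts.append(f"Text: {text}\nEntities: {json.dumps(entities)}")
    parts.append(f"Text: {ros_text}\nEntities:")
    return "\n\n".join(parts)


def extract_entities(ros_text: str, llm: Callable[[str], str]) -> List[RosEntity]:
    """Query the LLM and keep only well-formed entities grounded in the text."""
    raw = llm(build_prompt(ros_text))
    entities = []
    for item in json.loads(raw):
        span = item.get("span", "")
        status = item.get("status", "")
        # Drop entities whose span does not occur in the source text
        # or whose status is not one of the two allowed labels.
        if span and span.lower() in ros_text.lower() and status in ("positive", "negative"):
            entities.append(RosEntity(span, status, item.get("system", "")))
    return entities


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call; returns a canned JSON answer."""
    return json.dumps([
        {"span": "cough", "status": "negative", "system": "respiratory"},
        {"span": "nausea", "status": "positive", "system": "gastrointestinal"},
        {"span": "headache", "status": "positive", "system": "neurological"},
    ])


if __name__ == "__main__":
    for entity in extract_entities("No cough. Positive for nausea.", fake_llm):
        print(entity)
```

In this sketch the canned response deliberately includes a span ("headache") that is absent from the input, so the grounding check filters it out. In practice fake_llm would be swapped for a call to a locally hosted open-source model or to ChatGPT.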

Results: With ChatGPT, the pipeline achieved the lowest error rates in detecting ROS entity spans (28.2%) and their corresponding statuses/systems (14.5%). Open-source LLMs enable local, cost-efficient execution of the pipeline while delivering promising performance (span error rate: 30.5–36.7%; status/system error rate: 24.3–27.3%).

Discussion and Conclusion: Our pipeline offers a scalable and locally deployable solution to reduce ROS documentation burden. Open-source LLMs present a viable alternative to commercial models in resource-limited healthcare environments.

Keywords: review of systems, clinical note, natural language processing, large language model, open-source, LangChain pipeline.
