P3626 - Using Unsupervised Natural Language Processing to Automatically Identify Chronicity and Extent of Inflammation on Ileocolonoscopy From Pathology Reports
Nathaniel J. Spezia-Lindner, MD¹, Scott Berger, MD¹, Arash Maghsoudi, PhD¹, Javad Razjouyan, PhD¹, Manreet Kaur, MD²
¹Baylor College of Medicine, Houston, TX; ²Mayo Clinic Arizona, Scottsdale, AZ
Introduction: Ascertaining disease location in inflammatory bowel disease requires manual review of unstructured pathology reports, which vary in style and terminology. Natural Language Processing (NLP) technologies can extract structured data from free-text fields in the electronic medical record (EMR). We aimed to develop and validate an unsupervised, NLP-based algorithm that identifies the presence of ileal and/or colonic inflammation and differentiates acute from chronic inflammation in pathology reports within an integrated EMR system.
Methods: We developed an unsupervised, rule-based NLP algorithm using regular expressions to identify keywords corresponding to acute or chronic ‘ileitis’, ‘colitis’, ‘crypt architectural distortion’, and ‘granulomas’, together with a list of negation terms, within pathology reports. The algorithm’s output was compared against manual interpretation of the same pathology reports by authors NSL, SB, and MK. A subset of reports was reviewed by multiple authors to ensure adequate inter-observer agreement. Performance was reported as accuracy, sensitivity, precision, and F-measure.
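A minimal sketch of a rule-based regular-expression classifier with negation handling, in the spirit of the method described above. The abstract does not publish its keyword or negation lists, so every pattern below, the function name `classify_report`, and the sentence-level negation scope are hypothetical assumptions chosen for illustration, not the authors' implementation.

```python
import re

# Hypothetical finding patterns (assumption: the abstract names the findings
# but not the exact expressions). Up to two modifier words may sit between the
# chronicity term and the finding, e.g. "chronic active ileitis".
FINDING_PATTERNS = {
    "acute_colitis": re.compile(r"\bacute\s+(?:\w+\s+){0,2}colitis\b", re.IGNORECASE),
    "chronic_colitis": re.compile(r"\bchronic\s+(?:\w+\s+){0,2}colitis\b", re.IGNORECASE),
    "acute_ileitis": re.compile(r"\bacute\s+(?:\w+\s+){0,2}ileitis\b", re.IGNORECASE),
    "chronic_ileitis": re.compile(r"\bchronic\s+(?:\w+\s+){0,2}ileitis\b", re.IGNORECASE),
    "crypt_distortion": re.compile(r"\bcrypt\s+architectur(?:al|e)\s+distortion\b", re.IGNORECASE),
    "granulomas": re.compile(r"\bgranulomas?\b", re.IGNORECASE),
}

# Hypothetical negation cues, searched only in the text that precedes a match
# within the same sentence (a simplified NegEx-style rule).
NEGATION_PATTERN = re.compile(
    r"\b(?:no|not|negative for|without|absence of)\b", re.IGNORECASE
)


def classify_report(text: str) -> dict:
    """Return one boolean per finding: True if any non-negated mention occurs."""
    results = {name: False for name in FINDING_PATTERNS}
    # Split on sentence-like boundaries so a negation cannot leak into the next sentence.
    for sentence in re.split(r"[.;\n]+", text):
        for name, pattern in FINDING_PATTERNS.items():
            for match in pattern.finditer(sentence):
                prefix = sentence[: match.start()]
                if not NEGATION_PATTERN.search(prefix):
                    results[name] = True
    return results


if __name__ == "__main__":
    report = (
        "Terminal ileum, biopsy: chronic active ileitis. "
        "Colon, random biopsies: no granulomas identified."
    )
    print(classify_report(report))
```

On the sample report, `chronic_ileitis` is flagged while `granulomas` is suppressed by the preceding "no". Scoping negation to the whole preceding sentence is deliberately crude: a phrase such as "no colitis, chronic ileitis" would wrongly negate the ileitis mention, which is the kind of case where expert-curated rules or training data would help.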
Results: We queried 9508 pathology reports spanning a 36-month period and identified 649 reports with findings of acute or chronic inflammation on colonoscopy. Compared with manual review, the NLP algorithm achieved accuracies of 93.5% for acute colitis, 80.2% for chronic colitis, 96.4% for acute ileitis, 86.4% for chronic ileitis, and 98.7% for the presence of granulomas. Detailed performance across the variables studied is shown in the table below.
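The reported metrics compare, for each finding, the algorithm's label on every report with the manual-review label. A minimal sketch of that computation follows; the function name `performance` is hypothetical, and "F-measure" is assumed here to mean the balanced F1 score (the harmonic mean of precision and sensitivity), since the abstract does not state a weighting.

```python
from typing import Sequence


def performance(predicted: Sequence[bool], reference: Sequence[bool]) -> dict:
    """Compare algorithm labels with manual-review labels for one finding.

    Assumes "F-measure" denotes F1; metrics with an empty denominator are set to 0.0.
    """
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    pairs = list(zip(predicted, reference))
    tp = sum(1 for p, r in pairs if p and r)          # flagged by both
    fp = sum(1 for p, r in pairs if p and not r)      # flagged by algorithm only
    fn = sum(1 for p, r in pairs if r and not p)      # missed by algorithm
    tn = sum(1 for p, r in pairs if not p and not r)  # absent in both
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # a.k.a. recall
    precision = tp / (tp + fp) if tp + fp else 0.0
    f_measure = (
        2 * precision * sensitivity / (precision + sensitivity)
        if precision + sensitivity
        else 0.0
    )
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "precision": precision,
        "f_measure": f_measure,
    }


if __name__ == "__main__":
    algorithm = [True, True, False, False, True]
    manual = [True, False, False, True, True]
    print(performance(algorithm, manual))
```

With two true positives, one false positive, one false negative, and one true negative, the example yields an accuracy of 0.6 and a sensitivity, precision, and F1 of 2/3 each. Because accuracy also rewards true negatives, it can stay high for rare findings such as granulomas even when sensitivity is modest, which is why sensitivity, precision, and F-measure are reported alongside it.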
Discussion: Our unsupervised NLP approach identified the location and chronicity of inflammation in biopsy pathology reports with a high degree of accuracy. We expect the algorithm’s performance to improve further with training sets that incorporate expert input. Applying this algorithm could improve patient identification for research and clinical care across large EMR systems.
Disclosures:
Nathaniel Spezia-Lindner indicated no relevant financial relationships.
Scott Berger indicated no relevant financial relationships.
Arash Maghsoudi indicated no relevant financial relationships.
Javad Razjouyan indicated no relevant financial relationships.
Manreet Kaur indicated no relevant financial relationships.
Nathaniel J. Spezia-Lindner, MD¹, Scott Berger, MD¹, Arash Maghsoudi, PhD¹, Javad Razjouyan, PhD¹, Manreet Kaur, MD². P3626 - Using Unsupervised Natural Language Processing to Automatically Identify Chronicity and Extent of Inflammation on Ileocolonoscopy From Pathology Reports, ACG 2023 Annual Scientific Meeting Abstracts. Vancouver, BC, Canada: American College of Gastroenterology.