Natural Language Processing in Radiology

Medical imaging is not always unequivocally diagnostic, and thus radiology reporting often contains terminology that expresses degrees of certainty. Although noncommittal reporting ("hedging") is often criticized, providing differential diagnoses or expressing uncertainty in imaging interpretations and recommendations is necessary given the limitations of medical imaging. However, expressing uncertainty is not simple: radiologists must choose accurate terminology to communicate clearly the level of uncertainty in their observations and inferences, as well as their recommendations for additional care or follow-up. Unclear communication may result not only in additional tests or radiation, but also in delays in care and even litigation (2). Characterizing how, when, and why radiologists express uncertainty in reporting may contribute to standards, similar to those used in mammography, that improve consistency and clarity in communicating with clinicians. Natural language processing (NLP), the process of extracting and structuring natural language into formal computer representations, can handle the large volume of reports needed to study variations in uncertainty and recommendation reporting. An NLP system can also assist in studying practice patterns, for example by auto-generating feedback summaries for radiologists that compare initial knee x-ray dictations containing hedges or recommendations with the relevant follow-up x-ray, CT, or MRI report. Furthermore, a real-time NLP implementation could support over-reading, in which a report containing an uncertainty is flagged by the RIS and routed to a subspecialist colleague for a second read. Most importantly, such systems can be transparently deployed in high-volume practice settings.


Our aims are to: (1) develop a natural language processor to detect uncertainty and recommendations in radiology reports, (2) validate the NLP system, and (3) characterize uncertainty and recommendation reporting in example settings. We define uncertainty detection as the identification of uncertainty signals. An uncertainty signal is a pattern in the word sequence of the text, such as a word or phrase, that confers uncertainty on the observation or inferential concept it refers to. For example, in the sentence "findings could represent pneumonia", the modal verb (and uncertainty signal) "could" modifies the verb "represent", conferring inferential uncertainty on the subject, "findings". Uncertainty signals can be expressed as regular expressions. A regular expression provides a concise and flexible means of identifying text of interest during a search. For example, the regular expression "can\s?not" matches both "cannot" and "can not", providing a signal of uncertainty in statements such as "Cannot exclude pneumonia". Uncertainty signals can also be concept pairs (e.g., "inflam.+infect" matches sentences in which inflammation is followed by infection; the presence of both concepts signals the discussion of a differential diagnosis) and classification terms. For example, the BI-RADS 3 and 4 terminology used in mammography designates probable benign and probable malignant disease states, respectively. These properties of regular expressions reduce the number of phrase permutations that must be listed in the lexicon and improve algorithm speed. Because uncertainty can be expressed in many ways, we have designed our framework to be highly extensible so that it can accommodate the variety of uncertainty signals. We built an uncertainty signal database comprising a list of regular expressions used to detect uncertainty in text. The uncertainty signal database is stored as a text file and can be easily edited.
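To make the idea of a regular-expression signal database concrete, the following is a minimal sketch in Python, not the project's actual implementation. The one-pattern-per-line file layout, the "#" comment convention, the function names, and the "\bcould\b" pattern are assumptions made for illustration; only "can\s?not" and "inflam.+infect" come from the description above.

```python
import re

# Hypothetical contents of the signal database text file.
# Assumed layout: one regular expression per line, '#' starts a comment line.
SIGNAL_DB_TEXT = r"""
# patterns quoted in the text
can\s?not
inflam.+infect
# illustrative addition (assumption, not from the original database)
\bcould\b
"""


def load_signals(text):
    """Compile every non-empty, non-comment line into a case-insensitive regex."""
    patterns = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        patterns.append(re.compile(line, re.IGNORECASE))
    return patterns


def find_uncertainty_signals(sentence, signals):
    """Return the source patterns of all signals found in the sentence."""
    return [p.pattern for p in signals if p.search(sentence)]


def is_uncertain(sentence, signals):
    """A sentence is treated as uncertain if at least one signal matches."""
    return bool(find_uncertainty_signals(sentence, signals))


if __name__ == "__main__":
    signals = load_signals(SIGNAL_DB_TEXT)
    for s in ["Cannot exclude pneumonia.",
              "Findings could represent pneumonia.",
              "No acute cardiopulmonary abnormality."]:
        print(s, "->", find_uncertainty_signals(s, signals))
```

Because the signals live in plain text, adding a new hedge phrase in this sketch only requires adding a line. Note that a concept-pair pattern written as "inflam.+infect" is order-sensitive: it would not match a sentence that mentions infection before inflammation unless a second pattern were added for the reverse order.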


To test our methods, we analyzed 1,232 sentences from 684 unique reports. The NLP system classified the sentences as follows: 173 (14%) as uncertain, 87 (7.1%) as recommendation, and 972 (79%) as declarative. Precision, recall, and specificity for uncertainty detection by the NLP system were 97.6%, 97.1%, and 99.6%, respectively. The overall accuracy of uncertainty detection was 99.3%. Precision, recall, and specificity for recommendation detection were 90.9%, 92.0%, and 99.3%, respectively. The overall accuracy of recommendation detection was 98.8%. Figures 2 and 3 provide confusion matrices for uncertainty and recommendation detection, respectively.
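For readers less familiar with these measures, the sketch below shows how precision, recall, specificity, and accuracy are conventionally derived from the four cells of a binary confusion matrix. The function name is hypothetical, and the counts in the example are toy values chosen for easy arithmetic; they are not taken from Figures 2 or 3.

```python
def detection_metrics(tp, fp, fn, tn):
    """Standard binary-classification measures, returned as percentages.

    tp/fp/fn/tn are the true-positive, false-positive, false-negative,
    and true-negative sentence counts. Assumes no denominator is zero.
    """
    total = tp + fp + fn + tn
    return {
        "precision": 100.0 * tp / (tp + fp),    # flagged sentences that are truly positive
        "recall": 100.0 * tp / (tp + fn),       # truly positive sentences that were flagged
        "specificity": 100.0 * tn / (tn + fp),  # truly negative sentences left unflagged
        "accuracy": 100.0 * (tp + tn) / total,  # all correctly classified sentences
    }


# Toy counts for illustration only (assumption, not the study's data).
example = detection_metrics(tp=8, fp=2, fn=2, tn=88)

if __name__ == "__main__":
    for name, value in example.items():
        print(f"{name}: {value:.1f}%")
```

When most sentences are declarative, specificity and accuracy are dominated by the large true-negative count, so precision and recall are usually the more informative figures for the uncertain and recommendation classes.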