
Radbank Data Warehouse: from imaging to molecular phenotype

Radiologists frequently need to search for cases with particular imaging findings for teaching, or to identify patient cohorts with particular diseases for research studies. Searching for case material is often difficult and time-consuming, relying on chart reviews, searches of case logs and individual reports, or help from administrators of medical records or picture archiving and communication systems (PACS). In addition, coverage is limited by the source of information and varies with each hospital database and its purpose in the enterprise. Current PACS are rich sources of historical image data, but they generally lack the full scope of patient information needed for research and teaching, such as radiology reports, pathology reports, diagnostic codes, and laboratory data. Even PACS that contain radiology reports (RIS/PACS) generally cannot search for cases with particular findings or diagnoses, because they are production clinical systems that were not designed to meet the specific needs of radiology teaching and research.
We have built Radbank, a data warehouse specialized for integrating related data that are stored in different source databases with differing representations or storage formats. Radbank integrates radiology, pathology, and clinical data into a single searchable resource. This resource will enable research and teaching activities, and it can be extended in the future to link to other types of information, such as molecular data.


Architecture: Radbank was built with open source software tools on a Linux platform with a relational database. We created a text report parsing module that recognizes the structure of radiology reports and marks each report section with XML tags, making individual sections available for section-specific indexing and search; Radbank is unique in parsing all text reports in this way. A database schema was designed to link radiology and pathology reports and to enable users to retrieve cases using flexible queries. Radbank is kept current via an HL7 parsing module.
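
To illustrate the section-tagging idea, here is a minimal Python sketch, not the actual Radbank parser. It assumes that report sections begin with an upper-case heading followed by a colon; the heading list, the `tag_sections` function name, and the sample report are all hypothetical and are not taken from Radbank.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical section headings; the headings Radbank actually
# recognizes are not listed in this description.
SECTION_HEADINGS = [
    "CLINICAL HISTORY", "COMPARISON", "TECHNIQUE", "FINDINGS", "IMPRESSION",
]

# Match a known heading at the start of a line, followed by a colon.
HEADING_RE = re.compile(
    r"^\s*(" + "|".join(re.escape(h) for h in SECTION_HEADINGS) + r")\s*:\s*",
    re.IGNORECASE | re.MULTILINE,
)

def tag_sections(report_text):
    """Return a <report> element with one child element per recognized section.

    Text that appears before the first recognized heading is ignored.
    """
    root = ET.Element("report")
    matches = list(HEADING_RE.finditer(report_text))
    for i, m in enumerate(matches):
        # A section runs from the end of its heading to the next heading.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(report_text)
        body = report_text[m.end():end]
        tag = m.group(1).lower().replace(" ", "_")
        section = ET.SubElement(root, tag)
        section.text = " ".join(body.split())  # collapse whitespace
    return root

sample = """CLINICAL HISTORY: Vaginal bleeding.
FINDINGS: Echogenic material within the endometrial cavity.
IMPRESSION: Findings suggest retained products of conception.
"""

root = tag_sections(sample)
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

Once sections are tagged like this, a search can be restricted to, for example, only the impression of each report rather than its full text.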



Radbank Queries: All information is stored in a relational database, which can be queried using standard SQL. We have processed many complex queries, such as "find patients who showed pathologically-proven retained products of conception and who had an ultrasound within 6 weeks of the pathology specimen."
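
As a rough illustration of how the example query could be written in SQL, the sketch below uses Python's built-in `sqlite3` module with an in-memory database. The table names, column names, and sample rows are hypothetical and do not reflect the actual Radbank schema or its database engine; "within 6 weeks" is interpreted here as at most 42 days before or after the specimen date.

```python
import sqlite3

# Hypothetical two-table schema; the real Radbank schema is not shown here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE radiology_report (patient_id INTEGER, modality TEXT, exam_date TEXT);
CREATE TABLE pathology_report (patient_id INTEGER, diagnosis TEXT, specimen_date TEXT);
""")

# Made-up sample rows for illustration only.
conn.executemany("INSERT INTO radiology_report VALUES (?, ?, ?)", [
    (1, "US", "2005-03-01"),   # ultrasound 9 days before the specimen
    (2, "US", "2005-01-01"),   # ultrasound about 5 months before the specimen
    (3, "CT", "2005-03-05"),   # within 6 weeks, but not an ultrasound
])
conn.executemany("INSERT INTO pathology_report VALUES (?, ?, ?)", [
    (1, "retained products of conception", "2005-03-10"),
    (2, "retained products of conception", "2005-06-01"),
    (3, "retained products of conception", "2005-03-10"),
])

# Patients with the pathology diagnosis and an ultrasound within 42 days
# (before or after) of the pathology specimen date.
QUERY = """
SELECT DISTINCT p.patient_id
FROM pathology_report AS p
JOIN radiology_report AS r ON r.patient_id = p.patient_id
WHERE p.diagnosis LIKE '%retained products of conception%'
  AND r.modality = 'US'
  AND ABS(julianday(p.specimen_date) - julianday(r.exam_date)) <= 42
ORDER BY p.patient_id
"""

matching_patients = [row[0] for row in conn.execute(QUERY)]
print(matching_patients)
```

The key step is the join between the pathology and radiology tables on the patient identifier, which is what linking the two report types in a single schema makes possible.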


Recently, we built a related resource called RadTF, a searchable resource that contains a subset of the report data in Radbank and in which all data are de-identified. RadTF enables users to search reports for teaching purposes and is currently being deployed in the PACS.