Imaging Informatics: Anonymously Yours
By Dave Yeager
Radiology Today
Vol. 18 No. 8 P. 8
Researchers at Mount Sinai Health System in New York have a new way to conduct randomized, double-blinded studies. In March, the Mount Sinai Department of Radiology and the Mount Sinai Translational and Molecular Imaging Institute (TMII) launched the Imaging Research Warehouse (IRW), a database that combines the clinical imaging and digital health records of more than one million patients in a deidentified format. David Mendelson, MD, vice chair of radiology at Mount Sinai Health System, a professor of radiology at the Icahn School of Medicine at Mount Sinai, and one of the IRW's creators, says the effort is intended to fill a gap in radiology research.
"One of the criticisms of the radiology literature, and medical literature in general, is that there really has been a paucity of double-blinded, randomized controlled clinical trials," Mendelson says. "One of the powerful pieces of this is the association of the clinical data with the imaging exams themselves. That's something that I think distinguishes this from many other current efforts."
Unmasking to Deidentify
Mendelson says the idea for the IRW goes back about six years. Mount Sinai researchers were often asking for deidentified images for studies, and Mendelson wanted to put a mechanism in place to make that type of data easily accessible. He proposed building a mirrored PACS with pseudoanonymized patient data—new identities, new medical record numbers, substitutions for any information that can be used to identify patients—and a secure crosswalk table so clinicians can backtrack to a patient if it's medically necessary. The mirrored PACS would be tied to the Mount Sinai Data Warehouse, a clinical repository.
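As a rough illustration of the crosswalk idea, the sketch below assigns each real medical record number a random substitute and keeps a two-way lookup so an authorized clinician could backtrack. It is a minimal sketch under stated assumptions, not Mount Sinai's implementation: the `Crosswalk` class, the `deidentify` function, the `IRW-` prefix, and the record fields are all hypothetical, and a real crosswalk would sit in a secured, access-controlled store rather than in memory.

```python
import secrets


class Crosswalk:
    """Hypothetical in-memory crosswalk between real and substitute record numbers."""

    def __init__(self):
        self._real_to_pseudo = {}
        self._pseudo_to_real = {}

    def pseudonym_for(self, real_mrn):
        # Reuse an existing pseudonym so all of one patient's exams stay linked.
        if real_mrn not in self._real_to_pseudo:
            pseudo = "IRW-" + secrets.token_hex(8)  # "IRW-" prefix is an assumption
            while pseudo in self._pseudo_to_real:  # regenerate on an unlikely collision
                pseudo = "IRW-" + secrets.token_hex(8)
            self._real_to_pseudo[real_mrn] = pseudo
            self._pseudo_to_real[pseudo] = real_mrn
        return self._real_to_pseudo[real_mrn]

    def reidentify(self, pseudo_mrn):
        # Backtracking step; a real system would restrict and audit this call.
        return self._pseudo_to_real[pseudo_mrn]


def deidentify(record, crosswalk):
    """Return a copy of the record with identifying fields substituted."""
    clean = dict(record)
    clean["mrn"] = crosswalk.pseudonym_for(record["mrn"])
    clean["patient_name"] = "ANONYMOUS"
    return clean


crosswalk = Crosswalk()
first = deidentify({"mrn": "123", "patient_name": "DOE^JANE", "modality": "MR"}, crosswalk)
second = deidentify({"mrn": "123", "patient_name": "DOE^JANE", "modality": "CT"}, crosswalk)
```

Because the same real number always maps to the same substitute, both exams in the usage lines stay linked to one pseudonymous patient, while only the crosswalk can recover the original number.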
Mendelson began to seek funding five years ago, but it wasn't until Mount Sinai's status as a hub for the National Institutes of Health's Clinical and Translational Science Awards (CTSA) Program was up for renewal two years ago that he found the vehicle for the IRW; the IRW funding was written into the CTSA proposal. Mendelson and Zahi Fayad, PhD, endowed chair in medical imaging and bioengineering, professor of radiology and medicine (cardiology), director of the TMII at the Icahn School of Medicine at Mount Sinai, and one of the IRW's creators, then put together a team to build and expand the IRW. In January 2016, Mount Sinai began building a mirrored PACS to house the deidentified imaging data.
Mendelson evaluated several programs that can deidentify data, but many don't perform well with large volumes of imaging data. To solve this issue, Mount Sinai licensed a vendor-neutral archive (VNA) from Vital Images and used the VNA's deidentification engine. The next step was to find the places in the DICOM metadata where vendors may have stored protected health information (PHI). Mendelson says the academic literature documents 18 known fields where PHI is stored.
"That's the easy part because everybody knows where they are," Mendelson says. "The problem was there are many miscellaneous/custom/private tags in the DICOM metadata that people can use to store data, and vendors frequently use these blank fields that are available to put some information that might be pertinent to the exam. So, we spent a year taking all of the modalities at Mount Sinai, one by one, sending exams through this engine, and looking for places where you might find PHI buried."
The researchers found a significant amount of hidden PHI, which was catalogued and deidentified. In addition, Mendelson has almost completed his evaluation of tools that remove PHI burned into the pixel data, a necessity because some vendors burn patient data into the corners of images; as long as it's known where each modality embeds those data, algorithms can routinely remove them.
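The two checks described above, searching vendor-specific tags for hidden PHI and blanking known burned-in regions, might be sketched as follows. This is an illustrative sketch only, not the VNA's deidentification engine or the tools Mendelson evaluated. It relies on one fact from the DICOM standard, that private data elements use odd group numbers; the function names, the sample header values, the private tag numbers, and the corner region are all made up for the example.

```python
def is_private(tag):
    """DICOM reserves odd group numbers for private (vendor-defined) elements."""
    group, _element = tag
    return group % 2 == 1


def find_suspect_private_tags(header, known_identifiers):
    """Return private tags whose value contains any known identifier string."""
    needles = [s.lower() for s in known_identifiers if s]
    hits = []
    for tag, value in header.items():
        if is_private(tag) and any(n in str(value).lower() for n in needles):
            hits.append(tag)
    return sorted(hits)


def mask_burn_in(pixels, regions, fill=0):
    """Return a copy of a 2D pixel grid with each (top, left, height, width) region blanked."""
    masked = [row[:] for row in pixels]
    for top, left, height, width in regions:
        for r in range(top, min(top + height, len(masked))):
            for c in range(left, min(left + width, len(masked[r]))):
                masked[r][c] = fill
    return masked


# Hypothetical header: (group, element) -> value
header = {
    (0x0010, 0x0010): "DOE^JANE",              # Patient's Name (standard tag)
    (0x0010, 0x0020): "1234567",               # Patient ID (standard tag)
    (0x0018, 0x0060): "120",                   # KVP (standard tag)
    (0x0029, 0x1010): "ref DOE^JANE 1234567",  # made-up private tag hiding PHI
    (0x0029, 0x1020): "recon kernel B30f",     # made-up private tag, no PHI
}
suspects = find_suspect_private_tags(header, ["DOE^JANE", "1234567"])

image = [[9] * 4 for _ in range(4)]            # stand-in for an exam's pixel grid
cleaned = mask_burn_in(image, regions=[(0, 0, 2, 3)])  # blank a top-left corner
```

The scan skips standard tags on purpose, since those are the well-known fields handled separately; it flags only private tags that echo a known identifier. The masking step assumes the burned-in region for a given modality has already been catalogued.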
By the end of 2016, Mount Sinai was ready to make the IRW operational. The system went into production in March 2017. For the first two weeks, only brain MRs were fed into the system. Since then, all modalities except ultrasound have been incorporated, but Mendelson expects ultrasound to be added soon. The IRW is available at Mount Sinai's main campus but will eventually be available to all Mount Sinai users.
For Who? For What?
Mendelson envisions a wide range of applications for the IRW. For example, a researcher who wants to understand the correlation between lesions and radiation dose in low-dose chest CT can do so. Although the dose will be low in all cases, variations among machines made by different vendors can make quantifying dose difficult. The quantitative imaging information embedded in DICOM metadata, coupled with patient data, allows investigators to examine the question without clinical bias.
"Now, when you're doing this clinically, all you care about is low-dose CT and whether there is a nodule or not. But, if you're a research person who wants to make strides in further dose reduction, what you really need is the detailed data about how the exam was conducted: milliamperage, kilovoltage, slice thickness, all of those parameters which are fixed," Mendelson says. "I can give that to you now while blinding you to who the patient is, but you can look at a thousand chest CTs, and you'll know exactly what the technical parameters were, you'll know exactly what the radiation dose was, and you can begin to conduct research about how to lower dose and maintain lesion conspicuity, on that basis."
Mendelson says there are many use cases that will immediately benefit from the IRW. A significant one is genomics. The phenotypic information provided by imaging exams is needed to correlate genomic findings with how diseases manifest, he says.
Another important use case is machine learning and artificial intelligence (AI). The growing need to train AI algorithms makes large patient databases invaluable. Mendelson says there are still some details to work out, but the IRW will possibly be able to assist with these efforts.
"As much as we intended the IRW for basic, core research, there are all of these machine learning companies springing up, now—it started with IBM, when they purchased Merge a couple of years ago—and they all want as many images as they can get to train on," Mendelson says. "Their problem is getting those images, and there are a lot of issues to be solved about who can release images. Potentially, our library of deidentified images is very useful for this. We're going carefully, but we have people starting to do machine learning within Mount Sinai so that's one venue, and we would probably carefully entertain outside partners, going forward."
Mendelson emphasizes, however, that whatever the use case may be, the IRW can play a role.
"I think the initial notion [of researchers] was that just having images is enough," Mendelson says, "but images that have been annotated and curated with clinical and pathologic data offer the richest value for research and machine learning."
— Dave Yeager is the editor of Radiology Today.