News

NIH Clinical Center Releases 32,000-Image CT Data Set

The National Institutes of Health's (NIH) Clinical Center has made a large-scale data set of CT images publicly available to help the scientific community improve the accuracy of lesion detection. While most publicly available medical image data sets contain fewer than 1,000 lesions, this data set, named DeepLesion, contains more than 32,000 annotated lesions identified on CT images. The images, which have been thoroughly anonymized, come from 4,400 unique patients who are partners in research at the NIH. In 2017, the Clinical Center released anonymized chest X-ray images and their corresponding data.

After a patient's CT scan, the images are sent to a radiologist for interpretation. Radiologists at the Clinical Center then measure and mark clinically meaningful findings with an electronic bookmark tool. Much like a physical bookmark, the tool lets radiologists save their place and mark significant findings so they can return to them later. These bookmarks are detailed: they include arrows, lines, diameters, and text that indicate the exact location and size of a lesion, so experts can identify growth or new disease.

Scientists used these bookmarks, a rich source of retrospective medical data, to develop the DeepLesion data set. DeepLesion differs from most currently available lesion image data sets, which typically cover only one type of lesion. The data set is highly diverse: it contains many types of critical radiology findings from across the body, such as lung nodules, liver tumors, and enlarged lymph nodes, among others.

Conventional methods of collecting image labels, such as search engines, cannot be applied to medical images. Annotating medical images requires extensive clinical experience, but DeepLesion, built from bookmarks that radiologists already create in routine practice, could change that. The released data set is large enough to train a deep neural network, and it could enable the scientific community to build a large-scale universal lesion detector within a single unified framework.

Researchers hope the release of the data set will enable the following:

• development of a universal lesion detector that will help radiologists find all types of lesions. Such a detector could serve as an initial screening tool and pass its detection results to other specialist systems trained on specific types of lesions;
• data mining and study of the relationships between different types of lesions. In DeepLesion, multiple findings are often marked in a single CT exam, so researchers can analyze the relationships among them to make new discoveries; and
• more accurate and automatic measurement of all of the lesions a patient has, enabling whole-body assessment of cancer burden.

In the future, the Clinical Center hopes to keep improving the DeepLesion data set by collecting more data, thereby improving the accuracy of detectors trained on it. Universal lesion detection should become more reliable once researchers are able to leverage 3D and lesion-type information. It may also be possible to extend DeepLesion to other imaging modalities, such as MRI, and to combine data from multiple hospitals.

— Source: National Institutes of Health