Multi-Organ Abdominal CT Segmentation

Overview

Accurate organ maps are the foundation for early cancer detection.

This project segments 24 abdominal structures from a single CT volume. On top of that segmentation sits an early, experimental head that estimates per-organ cancer risk; it is a research direction, not a diagnostic. Finding early, localized disease depends first on knowing precisely where each organ sits, a problem studied by the Johns Hopkins BodyMaps group.

Method

Model and training.

SuPreM-initialized SwinUNETR

A 3D SwinUNETR backbone is initialized from the SuPreM pretrained weights and fine-tuned on AbdomenAtlas. Transfer learning does most of the work when labeled data is limited.

Organ-balanced sampling

Rare, thin structures such as vessels, the adrenal glands and the celiac trunk are sampled deliberately rather than left to chance. Balanced sampling raised mean Dice from 0.19 to 0.62 during training.
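The idea behind balanced sampling can be sketched in a few lines. This is a minimal illustration, not the project's sampler: the function name, the dictionary layout and the toy voxel counts are all assumptions. It picks the organ first, uniformly over classes, and only then picks a voxel of that organ as the patch center.

```python
import random
from collections import Counter

def balanced_patch_centers(label_voxels, n_patches, rng):
    """Pick patch centers so that every organ class is equally likely.

    label_voxels: dict mapping class name -> list of (z, y, x) coordinates
                  of that class in one training volume (hypothetical layout).
    """
    classes = [name for name, voxels in label_voxels.items() if voxels]
    centers = []
    for _ in range(n_patches):
        organ = rng.choice(classes)                  # uniform over classes, not voxels
        centers.append((organ, rng.choice(label_voxels[organ])))
    return centers

# Toy volume: the liver has 1000 labeled voxels, the celiac trunk only 5.
rng = random.Random(0)
voxels = {
    "liver": [(0, y, x) for y in range(25) for x in range(40)],
    "celiac_trunk": [(1, 0, x) for x in range(5)],
}
centers = balanced_patch_centers(voxels, n_patches=1000, rng=rng)
counts = Counter(organ for organ, _ in centers)
print(counts)
```

In this toy case, sampling in proportion to voxel count would center a patch on the celiac trunk about 0.5% of the time; choosing the class first brings that to roughly half.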

Experimental cancer risk head

An experimental per-organ risk head is trained with synthetic tumor augmentation. It has not been validated on real tumor scans and is a research direction, not a diagnostic. Monte Carlo dropout maps show where the model is least certain, pointing to regions a radiologist may want to review.
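As a rough illustration of how a Monte Carlo dropout map can be built, the sketch below averages several stochastic forward passes and scores each voxel by the entropy of the mean probability. The page does not say which uncertainty measure the project uses, so predictive entropy is an assumption here, and the pass values are hard-coded stand-ins for running a network with dropout left active at inference.

```python
import math

def predictive_entropy(passes):
    """Per-voxel binary entropy of the mean foreground probability.

    passes: list of T lists; each inner list holds one stochastic forward
            pass's foreground probabilities for the same voxels.
    """
    n_passes = len(passes)
    entropies = []
    for voxel_probs in zip(*passes):
        p = sum(voxel_probs) / n_passes
        p = min(max(p, 1e-7), 1 - 1e-7)              # keep log() finite
        entropies.append(-(p * math.log(p) + (1 - p) * math.log(1 - p)))
    return entropies

# Three hypothetical dropout passes over four voxels.
passes = [
    [0.98, 0.10, 0.70, 0.02],
    [0.97, 0.90, 0.65, 0.01],
    [0.99, 0.50, 0.60, 0.03],
]
uncertainty = predictive_entropy(passes)
most_uncertain = max(range(len(uncertainty)), key=uncertainty.__getitem__)
print(most_uncertain, [round(u, 3) for u in uncertainty])
```

The second voxel, where the passes disagree most, gets the highest score; voxels the passes agree on score close to zero.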

Full-volume inference

Gaussian-weighted sliding-window inference reconstructs a segmentation of the whole scan. Scores are measured on full volumes, the stricter approach used by benchmarks such as Touchstone.
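The blending step can be shown in one dimension. This is a simplified sketch under stated assumptions, not the project's inference code: the function names, the `sigma_scale` value and the toy `predict` function are illustrative, and a real pipeline works on 3D patches with a segmentation network. Each window's prediction is weighted by a Gaussian centered on the window, so values near a window edge count for less where windows overlap.

```python
import math

def gaussian_weights(window, sigma_scale=0.125):
    """Gaussian importance weights, highest at the window center."""
    center = (window - 1) / 2
    sigma = sigma_scale * window
    return [math.exp(-((i - center) ** 2) / (2 * sigma ** 2)) for i in range(window)]

def sliding_window_1d(signal, predict, window, stride):
    """Blend overlapping window predictions using Gaussian weights."""
    if len(signal) < window:
        raise ValueError("signal must be at least one window long")
    weights = gaussian_weights(window)
    total = [0.0] * len(signal)
    norm = [0.0] * len(signal)
    starts = list(range(0, len(signal) - window + 1, stride))
    if starts[-1] != len(signal) - window:
        starts.append(len(signal) - window)          # make sure the tail is covered
    for start in starts:
        patch_pred = predict(signal[start:start + window])
        for i, (value, w) in enumerate(zip(patch_pred, weights)):
            total[start + i] += w * value
            norm[start + i] += w
    return [t / n for t, n in zip(total, norm)]

# Toy "model": doubles every value in the patch it is given.
signal = [float(i) for i in range(11)]
blended = sliding_window_1d(signal, lambda patch: [2.0 * v for v in patch],
                            window=4, stride=3)
print(blended)
```

Because the toy model gives the same answer for a voxel whichever window it falls in, the blended output simply reproduces it; with a real network the windows disagree near their borders, and the weighting favors the window in which the voxel sits closest to the center.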

Model pipeline

Per-organ results

Segmentation quality, organ by organ.

Organ        Dice
Liver        0.95
Aorta        0.83
R. Lung      0.76
R. Kidney    0.75
Spleen       0.73
Pancreas     0.68
Stomach      0.62
Duodenum     0.52

The full 24-organ breakdown, including harder structures such as the celiac trunk, hepatic vessel and femurs, is available in the technical report.
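For reference, the Dice similarity coefficient for a predicted mask P and a ground-truth mask G is 2|P ∩ G| / (|P| + |G|). The sketch below computes it per label on flattened label lists and averages across labels. How the project treats labels that are absent from both prediction and ground truth is not stated; skipping them in the mean is an assumption made here, and the function names are illustrative.

```python
def dice_score(pred, truth, label):
    """Dice similarity coefficient for one label over flattened voxel lists."""
    pred_mask = [p == label for p in pred]
    truth_mask = [t == label for t in truth]
    intersection = sum(p and t for p, t in zip(pred_mask, truth_mask))
    denominator = sum(pred_mask) + sum(truth_mask)
    if denominator == 0:
        return float("nan")                  # label absent from both: undefined
    return 2.0 * intersection / denominator

def mean_dice(pred, truth, labels):
    """Average Dice over labels, skipping labels where it is undefined."""
    scores = [dice_score(pred, truth, label) for label in labels]
    valid = [s for s in scores if s == s]    # NaN is the only value not equal to itself
    return sum(valid) / len(valid) if valid else float("nan")

# Toy example: 0 is background, 1 and 2 are organs, 3 never appears.
truth = [0, 1, 1, 1, 2, 2, 0, 0]
pred = [0, 1, 1, 0, 2, 2, 2, 0]
print(dice_score(pred, truth, 1), dice_score(pred, truth, 2))
print(mean_dice(pred, truth, labels=[1, 2, 3]))
```

Label 1 is under-segmented by one voxel and label 2 over-segmented by one voxel; both are penalized equally because Dice counts missed and spurious voxels the same way.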

Results

Example results on held-out scans.

Figure: Predicted organ masks overlaid on axial, coronal and sagittal views of a held-out CT volume.

Figure: Bar chart of the per-organ Dice similarity coefficient on the evaluation set, measured on full reconstructed volumes.

Limitations

The model was trained on 199 of the 9,262 available cases, about 2% of the data. Full-volume mean Dice is 0.62, against 0.85 and higher reported at full scale. That gap most likely reflects the volume of training data rather than the design of the model, and is expected to narrow as the data scales. The cancer risk head is experimental and has not been validated on real tumor scans.

Built on

PyTorch, MONAI, SwinUNETR, Hugging Face, Weights & Biases, Kaggle

References

SuPreM. Li, Yuille, Zhou. How Well Do Supervised 3D Models Transfer to Medical Imaging Tasks? ICLR 2024. github.com/MrGiovanni/SuPreM

AbdomenAtlas. Li, Qu, Chen, Bassi, et al. A Large Scale, Detailed Annotated, Multi Center Dataset. Medical Image Analysis, 2024.

Touchstone. Bassi, Li, Tang, et al. Are We on the Right Way for Evaluating AI Algorithms for Medical Segmentation? NeurIPS 2024.

Research access

Available for research use.

The trained model and evaluation code can be shared for academic and research purposes. To request access or discuss a collaboration, reach out on LinkedIn.
