BMJ Health & Care Informatics; Jung In Park, Selen Bozkurt, Jong Won Park, Sunmin Lee; Published January 19, 2023; DOI: 10.1136/bmjhci-2022-100666



Survival machine learning (ML) has been suggested as a useful approach for forecasting future events, but there is growing concern that ML models can introduce racial disparities through the data used to train them. This study develops race/ethnicity-specific survival ML models for Hispanic and black women diagnosed with breast cancer and examines whether these models outperform general models trained on data from all races/ethnicities.


We used data from the US National Cancer Institute’s Surveillance, Epidemiology and End Results (SEER) programme registries. We developed Hispanic-specific and black-specific models and compared them with the general model using four methods: the Cox proportional-hazards model, gradient boosted trees, a survival tree and a survival support vector machine.
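This summary does not name the metric used to compare the models. One common choice for survival models is Harrell's concordance index (C-index), which measures how often a model ranks the patient with the shorter observed survival as higher risk. The following is a minimal sketch under that assumption, not the study's evaluation code; the function name and the toy follow-up times, event flags and risk scores are all hypothetical values chosen only to illustrate how a subgroup-specific and a general model might be scored on the same subgroup test set.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell's C-index: share of comparable pairs ranked correctly.

    times       -- observed follow-up times
    events      -- 1 if the event was observed, 0 if censored
    risk_scores -- higher score means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # simplification: pairs with tied times are skipped
        # a is the patient with the shorter follow-up time
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # shorter time is censored, so the pair is not comparable
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical subgroup test set (illustrative values, not study data).
times = [2, 4, 6, 8, 10]
events = [1, 1, 0, 1, 0]

# Hypothetical risk scores from a general and a subgroup-specific model.
general_scores = [0.9, 0.3, 0.5, 0.4, 0.1]
specific_scores = [0.9, 0.7, 0.5, 0.4, 0.1]

c_general = concordance_index(times, events, general_scores)
c_specific = concordance_index(times, events, specific_scores)
print(f"general: {c_general:.2f}, subgroup-specific: {c_specific:.2f}")
```

Scoring both models on the same held-out subgroup is what makes the comparison meaningful: any difference in C-index then reflects the training data rather than the test population.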


A total of 322 348 female patients diagnosed with breast cancer between 1 January 2000 and 31 December 2017 were identified. The race/ethnicity-specific models for Hispanic and black women consistently outperformed the general model when predicting outcomes for the corresponding race/ethnicity group.


Accurately predicting the survival outcome of a patient is critical in determining treatment options and providing appropriate cancer care. The high-performing models developed in this study can contribute to providing individualised oncology care and improving the survival outcome of black and Hispanic women.


Predicting the individualised survival outcome of breast cancer can provide the evidence necessary for determining treatment options and delivering high-quality, patient-centred cancer care to under-represented populations. In addition, race/ethnicity-specific ML models can mitigate representation bias and contribute to addressing health disparities.


artificial intelligence; health equity; informatics; machine learning