Title: Deep learning-based survival analysis of omics and clinicopathological data
Abstract:
Consider the problem of comparing the success of treatment A and treatment B in cancer patients. The response variable is Y, the patient's survival in days, and the clinicopathological variables are age, sex, gene expression, and so on. Such data are often censored: some patients survive past the end of the study, so their actual survival time is unknown. Because of censoring, one cannot use ordinary regression to find an association between the clinicopathological variables and the response variable, nor can one use a standard two-sample test (Wilcoxon or t-test) to compare patients treated with A and those treated with B. Ingenious algorithms were developed to answer such inferential questions. The three main methodologies of survival analysis are: (1) the Kaplan-Meier estimate, for a graphical comparison, e.g., do patients live longer with treatment A than with treatment B? (2) the log-rank test, a non-parametric two-sample comparison of censored data, which answers the question: is treatment A really better than treatment B, or is the difference just random variability? (3) the Cox proportional hazards model, which allows a full regression analysis of censored data. From 1975 until now, survival analysis has been a gold standard in medical statistics. The deep learning era has brought undeniable improvement for the image modality, e.g., in radiomics. Yet for omics data with p >> n, where p is the number of features and n is the number of patients, the improvement is questionable. What would one want from a deep learning variant of survival analysis? Do clinical protocols already exist? What should an algorithm designer keep in mind when proposing a new algorithm? The 2017–2024 period has been prolific in algorithms for deep learning-based survival analysis. We sought answers to the following three questions. (1) Is there already a new "gold standard" in clinical data analysis? (2) Does the deep learning component lead to notably improved performance? (3) Are there tangible benefits of deep learning-based survival analysis that are not directly attainable with non-deep methods?
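To make the first two classical methods named above concrete, the following is a minimal sketch in plain Python of the Kaplan-Meier estimate and the two-sample log-rank statistic. The function names and the toy data are hypothetical and chosen only for illustration; they are not taken from the talk. The p-value assumes the usual chi-square approximation with one degree of freedom, and a real analysis would normally rely on an established open-source survival library.

```python
import math
from collections import Counter


def kaplan_meier(times, events):
    """Kaplan-Meier estimate as a list of (t, S(t)) at each distinct event time.

    times  -- observed follow-up time per patient
    events -- 1 if death was observed, 0 if the patient was censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    curve, surv = [], 1.0
    for t in sorted(deaths):
        at_risk = sum(1 for u in times if u >= t)  # still under observation at t
        surv *= 1.0 - deaths[t] / at_risk
        curve.append((t, surv))
    return curve


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-sample log-rank test; returns (chi-square statistic, p-value)."""
    pooled = list(zip(times_a, events_a)) + list(zip(times_b, events_b))
    event_times = sorted({t for t, e in pooled if e})
    observed_minus_expected, variance = 0.0, 0.0
    for t in event_times:
        n_a = sum(1 for u in times_a if u >= t)
        n_b = sum(1 for u in times_b if u >= t)
        n = n_a + n_b
        d_a = sum(1 for u, e in zip(times_a, events_a) if e and u == t)
        d = sum(1 for u, e in pooled if e and u == t)
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:  # hypergeometric variance is undefined for a single subject
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0.0:
        raise ValueError("no informative event times")
    stat = observed_minus_expected ** 2 / variance
    p_value = math.erfc(math.sqrt(stat / 2.0))  # chi-square survival function, 1 df
    return stat, p_value


# Toy data, invented for illustration: survival in days, 0 marks a censored patient.
curve = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])
stat, p = logrank_test([1, 2], [1, 1], [3, 4], [1, 1])
```

Here `curve` holds the step heights that would be plotted for a graphical comparison, while `stat` and `p` quantify whether the difference between the two groups exceeds random variability.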
Audience Take Away Notes:
- Survival analysis is the "gold standard" of medical statistics. Non-experts will learn the three main methods that allow inference about patients' survival. Experts will learn about the advantages and nontrivial shortcomings of the deep learning paradigm in survival analysis.
- Teaching personnel can include the new material as an extension of classical survival analysis, both in theory and in labs, since all the methods mentioned have open-source libraries. Clinical bioinformaticians can adopt the new deep learning methods, whose uses have appeared this year in leading clinical journals not only as technical research but also as tools for routine analysis of clinical data.
- Survival analysis is included in the undergraduate curriculum for medical and technical students; however, it is also a developing research field. It is central to bioinformatics analysis, especially in cancer research.