Title: Information leakage: Types, remedies, and open problems
Abstract:
Information leakage threatens, and calls into question, the use of machine learning models in real-life clinical applications. In effect, information leakage resembles ordinary overfitting, yet it is more subtle and, even when detected, much harder to remove. Some recent research indicates that once overfitting is removed, deep neural networks perform systematically worse than linear regression models. This statement is not far from our own results in survival analysis. There are different types of leakage, and some are specific to deep neural networks; for example, the effects of pretraining have not been thoroughly studied. In this talk, I will review the current understanding of what information leakage is and what its subtypes are. The types and examples were largely defined within different applications of machine learning. The research question is: is there anything a clinical bioinformatician should learn from the current concerns and work done in chemoinformatics, political science, and other fields? Do our analysis protocols keep us safe, and where are the dangerous waters?