Clinical Data Science
Topics covered: overlap between epidemiology and data science, data exploration and visualization, data simplification and dimensionality reduction, creating understandable code. Basic knowledge of R is required.
Epidemiology and data science overlap considerably: both use quantitative methods to derive actionable clinical insight from patient data. The data scientist’s strength is mining data with a very large number of variables, known as ‘high-dimensional data’ (e.g. genomics, proteomics or imaging biomarkers), using computationally intensive techniques, whereas the epidemiologist’s strength is study design and robust statistical inference. In this clinical data science course, we will present the shared methods, contrast the different approaches, and then propose how epidemiology can be enhanced with techniques from the data scientist’s toolkit. After this module, you should be able to conduct data exploration and build interesting data visualizations. You should develop the skills to apply data simplification and dimensionality reduction so that robust statistical inferences can be made, and you should be able to use cluster analysis to detect groups of similar patients. Finally, you should be able to communicate, collaborate and inspire action with your future clients by creating readily understandable code and reproducible analyses, including version control. You will work with real-life datasets, using RStudio as the primary software for analysis, version control and reporting.
Assessment: group work, individual contribution to group work