Veridical Data Science

Tuesday, December 8, 2020, 4pm

Zoom

ASA_GA_Winter_Lecture_2020_1_Bin_Yu.pdf (250.18 KB)

Dr. Bin Yu

Departments of Statistics and EECS

UC Berkeley

Veridical data science extracts reliable and reproducible information from data, with an enriched technical language to communicate and evaluate empirical evidence in the context of human decisions and domain knowledge. Building and expanding on principles of statistics, machine learning, and the sciences, we propose the predictability, computability, and stability (PCS) framework for veridical data science. The framework comprises both a workflow and documentation, and aims to provide responsible, reliable, reproducible, and transparent results across the entire data science life cycle. Moreover, we propose the PDR desiderata for interpretable machine learning as part of veridical data science (with PDR standing for predictive accuracy, descriptive accuracy, and relevancy to a human audience and a particular domain problem). The PCS framework will be illustrated through the development of iterative random forests (iRF) for extracting predictable and stable non-linear interactions in genomics studies. Finally, a general DNN interpretation method based on contextual decomposition (CD) will be discussed, with applications to sentiment analysis and cosmological parameter estimation.

Video recording: https://www.youtube.com/watch?v=TW1H6-aDZ5Q
