KDD 2017 Tutorial

Safe Data Analytics: Theory, Algorithms, and Applications

Abstract

Data science is penetrating virtually every aspect of our society. However, data science algorithms and systems, including data acquisition and processing pipelines and analytical techniques, are becoming increasingly complex. Many of them are not transparent to the end user: for example, it is often unclear how the underlying models work and when they may fail. Many approaches, especially those applied to human subjects, may learn and reinforce pre-existing biases, leading, for example, to unfair treatment of minority groups in a population. To enable the widespread adoption of data science approaches, it is necessary to construct data analytics that operate safely and securely, in a controlled and transparent manner. However, current research in this area is very limited.

In this tutorial, we plan to cover three aspects of safe data analytics, namely, transparency, fairness and security. We present several real-world applications of safe data science to illustrate the importance of the topic. We review recent research efforts in data mining and machine learning to achieve safe data science based on different techniques and evaluation metrics. We conclude the tutorial by pointing out remaining challenges in current research and future directions.

Tutorial Slides

Part I: Introduction

Part II: Transparent and Interpretable Machine Learning

Part III: Fair Machine Learning

Presenters

Jun (Luke) Huan is a professor in the Department of Electrical Engineering and Computer Science, University of Kansas. He directs the Data Science and Computational Life Sciences Laboratory at the KU Information and Telecommunication Technology Center (ITTC). He also holds courtesy positions at the KU Bioinformatics Center and the KU Bioengineering Program, and a visiting professorship from GlaxoSmithKline plc. He is currently serving as a Program Director at the National Science Foundation (NSF), in the Division of Information and Intelligent Systems (IIS) within the Directorate for Computer & Information Science & Engineering (CISE).

Chao Lan will join the Department of Computer Science at the University of Wyoming as an assistant professor in late August 2017. He received his Ph.D. in computer science from the University of Kansas, advised by Professor Jun (Luke) Huan. He has broad research interests in machine learning and its applications. He is currently investigating fairness-aware machine learning; his earlier research topics include multi-view learning, matrix recovery, subspace learning and face recognition.

Xiaoli Li is currently a Ph.D. candidate in the Department of Electrical Engineering and Computer Science at the University of Kansas, advised by Professor Jun (Luke) Huan. Her research interests include transparent machine learning, Bayesian non-parametric methods, and multi-task, multi-view and multi-label learning.