Theoretical Foundations of Data Science

Description

This course introduces the core topics of modern research in data science and familiarizes PhD students with fundamental solutions to research problems in those areas. In particular, we introduce the fundamental principles of data system architecture; discuss massive data analysis; examine the management of very large data systems, including questions of adaptivity and self-tuning; present the fundamentals of data models and languages, especially in relation to semi-structured, multimedia, temporal, and spatial data; analyze the problems of privacy, security, and trust in data systems; and survey techniques for recognition, image analysis, and computer vision, statistical methods for learning, and representations for recognition and localization.

We investigate methods and algorithms for scientific data analysis, social network analysis, recommender systems, sequence mining, time series analysis, online advertising, text and web analysis, topic modeling, temporal and spatial data mining, graph and link mining, and rule and pattern mining. We introduce the concepts of dimensionality reduction and manifold learning, combinatorial optimization, relational and structured learning, classification and regression methods, semi-supervised learning, unsupervised learning (including anomaly detection and clustering), kernel methods, compressed sensing and sparse modeling, graphical models, Bayesian methods, deep learning, hyperparameter and model selection, Markov decision processes, reinforcement learning, dynamical systems and hidden Markov processes, and recurrent networks.

The course aims to give all students a shared understanding of the nature and direction of state-of-the-art work in their field, so that they acquire both depth and breadth of knowledge.