Goal of the Course
Data science deals with the huge flow of data generated in a variety of disciplines, ranging from the internet to biology, physics, and astronomy. For instance, high-energy physics experiments and astronomical observations can generate multiple terabytes of data per day, and the analysis of these "big data" sets requires novel, efficient methods. Data science is an interdisciplinary field that unifies statistics, data analysis, machine learning, and related methods in order to analyze and understand actual phenomena from data. Machine learning (ML) methods have recently played a central role in advancing data science, and Deep Learning (DL), a sub-field of ML, has driven remarkable progress in computer vision, speech recognition, machine translation, and robotics, among other fields. In physics, ML and DL can detect and classify astronomical objects, track particles in detector arrays, and predict the state of complex, nonlinear dynamical systems. The purpose of this undergraduate course is to provide a hands-on introduction to the core concepts and tools of data science, machine learning, and deep learning in a manner that is easily understood and intuitive to physicists. The course focuses on hands-on applications using physics-related datasets, presents the key mathematical concepts, and emphasizes the connections between ML and statistical physics. At the same time, students are introduced to modern computational tools and programming languages (Python, Jupyter notebooks, and modern ML/statistical packages). The course closes with an extended outlook on possible uses of machine learning for furthering our understanding of the physical world, as well as open problems in ML across science, technology, and industry to which physicists are well positioned to contribute.
Syllabus
Unit 1: Python and TensorFlow for Data Science
Week 1. Introduction. Mathematical prerequisites. Computation and Representation. Setting up the computational environment: Introduction to Python.
Week 2. Introduction to Python (II). Setting up the "computational narrative": How to use Jupyter Notebooks.
Week 3. Hands-on Python projects (I). Accessing data sources. Setting up Keras and TensorFlow. Introduction to TensorFlow primitives.
Week 4. Hands-on Python projects (II). Probability and Statistics in Data Science. Using Python and TensorFlow to learn fundamental statistical and probabilistic approaches to understand and gain insights from data.

Unit 2: Machine and Statistical Learning
Week 5. The Fundamentals of Machine Learning: The ML Landscape, how to structure an ML project, Data Sources, Classification and Prediction. Training Models I: Linear Regression, Gradient Descent, Polynomial Regression (a minimal illustrative sketch of this workflow appears after the syllabus).
Week 6. Training Models II: Regularized Linear Models (Ridge Regression, Lasso Regression, Elastic Net, Early Stopping). Logistic Regression (Estimating Probabilities, Training and Cost Function, Decision Boundaries, Softmax Regression).
Week 7. Support Vector Machines. Decision Trees. Ensemble Learning and Random Forests.
Week 8. Dimensionality Reduction. Principal Component Analysis.

Unit 3: Deep Learning
Week 9. Neural networks and deep learning. Deep learning primitives, architectures, and frameworks. Applications using TensorFlow.
Week 10. Fully connected deep networks. Training Deep Neural Nets (I).
Week 11. Training Deep Neural Nets (II). Convolutional Neural Networks. Recurrent Neural Networks.
Week 12. Autoencoders. Reinforcement Learning. The future of deep learning.
Week 13. Submission of the Final Project.
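As a flavour of the hands-on work in Units 1 and 2, the short sketch below fits a straight line to synthetic data using TensorFlow's low-level primitives and plain full-batch gradient descent, in the spirit of Week 3's TensorFlow primitives and Week 5's training of linear models. It is a minimal sketch, assuming TensorFlow 2.x and NumPy are installed; the synthetic dataset (a noisy line y = 3x + 2), the learning rate, and the number of steps are illustrative choices, not course material.

    # Minimal sketch: linear regression via gradient descent with TensorFlow primitives.
    # Assumes TensorFlow 2.x and NumPy; the data below is synthetic and illustrative only.
    import numpy as np
    import tensorflow as tf

    # Synthetic data: 200 points drawn from a noisy line y = 3x + 2.
    rng = np.random.default_rng(0)
    x = rng.uniform(-1.0, 1.0, size=(200, 1)).astype(np.float32)
    y = (3.0 * x + 2.0 + 0.1 * rng.standard_normal((200, 1))).astype(np.float32)

    # Model parameters (slope and intercept) as TensorFlow variables.
    w = tf.Variable(tf.random.normal([1, 1]))
    b = tf.Variable(tf.zeros([1]))

    learning_rate = 0.1  # illustrative choice
    for step in range(200):
        with tf.GradientTape() as tape:
            y_pred = tf.matmul(x, w) + b               # linear model
            loss = tf.reduce_mean(tf.square(y_pred - y))  # mean squared error
        # Gradient descent: move each parameter against the gradient of the loss.
        grad_w, grad_b = tape.gradient(loss, [w, b])
        w.assign_sub(learning_rate * grad_w)
        b.assign_sub(learning_rate * grad_b)

    print("fitted slope:", w.numpy().item(), "fitted intercept:", b.numpy().item())

Working directly with tf.Variable and tf.GradientTape, rather than a high-level Keras training loop, keeps the gradient-descent update visible, which is the point of introducing the primitives before the higher-level tools.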
Bibliography
A bibliography is available online; in addition:
1. For hands-on machine learning, neural networks, and deep learning: Hands-On Machine Learning with Scikit-Learn and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems, by Aurelien Geron (2017).
2. For a more thorough mathematical exposition (especially of the deep learning part): Deep Learning, by Goodfellow, Bengio, and Courville (2016).