Statistical Machine Learning

STAT5241 Sec01, Spring 2022

Fri 10:10am-12:40pm. ALL CLASSES WILL BE OFFERED ONLINE ONLY before Feb 1st.

COVID-19 CORONAVIRUS: The policies set forth in this course are subject to change as we try to determine how best to keep you safe from the COVID-19 coronavirus while we provide the education we promised you.

Instructor: Xiaofei Shi xs2427[at]columbia(dot)edu
TAs: Ling Chen lc3521[at]columbia(dot)edu
     Jaesung Son js4638[at]columbia(dot)edu
     Zhanhao Zhang zz2760[at]columbia(dot)edu
Office Hours: Held on Zoom, Meeting ID: 984 6250 0455
    Xiaofei Shi: Thursday 2:30 pm - 4:00 pm
    Ling Chen: Monday 8:00 pm - 9:30 pm
    Jaesung Son: Wednesday 2:00 pm - 3:30 pm

Course Description: The course will provide an introduction to machine learning and its core models and algorithms. The aim of the course is to provide students of statistics with detailed knowledge of how machine learning methods work and how statistical models can be brought to bear in computer systems---not only to analyze large data sets, but to let computers perform tasks that traditional methods of computer science are unable to address. Examples range from speech recognition and text analysis through bioinformatics and medical diagnosis. This course provides a first introduction to the statistical methods and mathematical concepts which make such technologies possible.
Course Prerequisites: Statistics, (Calculus based) Probability, Linear Regression Models, Linear Algebra.

Homework: There will be four homework assignments, approximately evenly spaced throughout the semester. The homework will be posted on CourseWork, which we will also use for submission and grading. We highly recommend using the discussion function on CourseWork for questions. Homework submitted after the deadline will not be considered, so please plan in advance. In the case of an emergency (sudden sickness, family problems, etc.), a reasonable extension will be granted, but we emphasize that this is reserved for true emergencies.

Evaluation: 40% for Homework average + 40% for Project + 20% for Final.
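
As a rough illustration of how the weights above combine, here is a minimal sketch. It assumes every component is scored on a 0-100 scale and that the homework assignments count equally toward the homework average; neither assumption is stated in this syllabus, and the function name course_grade is hypothetical.

```python
def course_grade(homework_scores, project, final):
    """Combine scores using the 40/40/20 weights from the Evaluation line.

    Assumptions (not stated in the syllabus): all scores are on a 0-100
    scale, and each homework counts equally toward the homework average.
    """
    if not homework_scores:
        raise ValueError("at least one homework score is required")
    homework_average = sum(homework_scores) / len(homework_scores)
    # Weights are kept as integer percentages to avoid rounding surprises.
    return (40 * homework_average + 40 * project + 20 * final) / 100


if __name__ == "__main__":
    # Hypothetical scores, for illustration only.
    print(course_grade([90, 80, 100, 70], project=85, final=75))
```

Under these assumptions, a weak score on one homework moves the total by at most 10 points, since each of the four assignments carries a quarter of the 40% homework share.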

Schedule

Date   Topic                                                            Note
01/21  Introduction; MLE and MAP                                        HW1 out
01/28  Modern Regression
02/04  Recitation 1
02/11  Decision Tree; K Nearest Neighbors                               HW2 out
02/18  Naive Bayes                                                      HW1 due
02/25  Support Vector Machine                                           HW3 out
03/04  Boosting, Ensemble Methods; PCA                                  HW2 due
03/11  Clustering; EM Algorithm                                         Project Part 1
03/18  Spring Break; No Class
03/25  Optimization Methods                                             HW4 out; HW3 due
04/01  Artificial Neural Networks
04/08  Introduction to Deep Learning and Convolutional Neural Networks  Project Milestone I due
04/15  Graphical Models and Sequence Models
04/22  Recurrent Neural Networks; Project Guide                         HW4 due

Logistics

Policy on Late Homework: Any homework submitted after the solution is posted will not be graded, and you will receive zero credit for that homework.

Policy on Collaboration: You are encouraged to work together on the homework. Discussing the homework problems with one another can be a valuable learning experience. However, it is a violation of the rules on academic integrity to copy another student's solution and submit it as your own. You should write up your solutions separately, not referring to a common document. Furthermore, you should not submit any work that you do not fully understand. You should be able to start with a clean sheet of paper and, without notes or assistance, write out the solution to any homework problem you submit. If you do that with every homework you submit, the similarity between your solutions and those of other students will not arouse suspicion. More importantly, you will be well prepared for the exams. You are not permitted to use homework solutions for this course from previous years or solutions you find from other sources, including the internet.

Take Care of Yourself: It is easy for me to say and hard for all of us, including me, to do, but taking care of your physical and mental health is essential, especially during the COVID-19 pandemic. Life is a marathon, and you need to pace yourself. Do your best to maintain a healthy lifestyle by eating well, exercising, avoiding drugs and alcohol, getting enough sleep and taking some time to relax. This will help you achieve your goals and cope with stress.
If you or anyone you know experiences extreme academic stress, difficult life events, or feelings of anxiety or depression, I strongly encourage you to seek support. Counseling and Psychological Services is here to help 24/7, and everything will be confidential: call 212-854-2878 or visit the Counseling and Psychological Services website.
In addition, consider reaching out to a friend, faculty member or family member you trust for help getting connected to support. Keep in mind that for serious psychological issues, the first counselor you meet with may not be the right one for you, but this does not mean you should give up on counseling. Keep looking for someone who can help you.
If you or someone you know is feeling suicidal or in danger of self-harm, call immediately, day or night:
    Counseling and Psychological Services: 212-854-2878
    If the situation is life threatening, call the police:
    • On campus: Columbia Police: 212-854-2797
    • Off campus: 911
If you have questions about this advice, your coursework, or anything else about which I might be helpful, please let me know.
The rubric and policies are designed with experience from the MSML and MSCF programs at CMU. Materials are based on lectures from Prof. Peter Orbanz and Prof. Cynthia Rush at Columbia University, 10-725 lectures at CMU from Prof. Ryan Tibshirani, and 10-701 lectures at CMU from Prof. Ziv Bar-Joseph, Prof. Pradeep Ravikumar and Prof. Aarti Singh.