Teaching staff:
- Instructor and office hours:
- Bo Wang, Mon 2-3 pm, Gather.Town
- TAs: Aida Ramezani (Head TA), Haonan Duan, Shihao Ma, Mustafa Ammous
Piazza: Students are encouraged to sign up for Piazza to join course discussions. If your question is about the course material and doesn’t give away any hints for the homework, please post it to Piazza so that the entire class can benefit from the answer.
Lecture and tutorial hours:
Section | Time | Location
---|---|---
Tutorial Lec0301 | Tues 1-2 pm, Thur 1-3 pm | ES B142
Prerequisites: Students should have taken courses on machine learning, linear algebra, and multivariate calculus. Further, it is recommended that students have some basic familiarity with statistical concepts. Finally, students must be proficient in reading and writing Python code. For a list of courses that serve as prerequisites for the undergraduate version of this course, see here.
Course Overview:
Deep learning is the branch of machine learning focused on training neural networks. Neural networks have proven to be powerful across a wide range of domains and tasks, including computer vision, natural language processing, speech recognition, and beyond. The success of these models is due in part to the fact that their performance tends to improve as they are trained on more data. Further, many advances over the past few decades have made it easier to attain good performance with neural networks. In this course, we will provide a thorough introduction to the field of deep learning. We will cover the basics of building and optimizing neural networks, as well as the specifics of different model architectures and training schemes. The course will cover portions of the “Dive into Deep Learning” textbook.
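To give a taste of what "building and optimizing neural networks" looks like in practice, here is a minimal sketch in PyTorch (the framework listed under Resources below). The data and architecture are toy placeholders, not course material:

```python
import torch
import torch.nn as nn

# Toy data: 100 examples with 4 features and binary labels (placeholder only).
X = torch.randn(100, 4)
y = torch.randint(0, 2, (100,)).float()

# A small multilayer perceptron, the kind of model covered early in the course.
model = nn.Sequential(
    nn.Linear(4, 16),
    nn.ReLU(),
    nn.Linear(16, 1),
)

loss_fn = nn.BCEWithLogitsLoss()  # logistic-regression-style loss
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

for epoch in range(100):
    optimizer.zero_grad()             # clear gradients from the previous step
    logits = model(X).squeeze(1)      # forward pass
    loss = loss_fn(logits, y)         # compute the training loss
    loss.backward()                   # backpropagation
    optimizer.step()                  # gradient-descent update
```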
Assignments:
Assignment | Handout | Due
---|---|---
Assignment 1 | pdf, starter code (make a copy in your own Drive) | Sept. 5 (out), due Sept. 26
Assignment 2 | | Sept. 26 (out), due Oct. 17
Assignment 3 | |
Assignment 4 | |
Course Project | |
Calendar:
The following schedule is tentative; the content of each lecture may change depending on pacing. All readings refer to corresponding sections in “Dive into Deep Learning”. Because the book is occasionally updated, the sections listed may become out-of-date. If a reading seems incongruous with the topic of the lecture, please let me know and I will check whether the sections have changed. Tutorials will more directly cover the background and tools needed for each homework assignment or, when preceding an exam, will consist of an exam review.
Lecture | Date | Topic | Slides | Suggested Readings | Homework
---|---|---|---|---|---
Lecture 1 | Sept 5 | Class introduction, linear & logistic regression | Slides | 2.1-2.7 (optional), 3.1-3.5, 4.1-4.5; Roger Grosse’s notes: Linear Regression, Linear Classifiers, Training a Classifier | H1 assigned
Lecture 2 | Sept 12 | Multilayer Perceptrons & Backpropagation | Slides | 3.6-3.7, 5.1-5.4, 5.6; Roger Grosse’s notes: Multilayer Perceptrons, Backpropagation |
Lecture 3 | Sept 19 | Optimization & Generalization | Slides | 12.1-12.6, 12.10; Roger Grosse’s notes: Automatic Differentiation, Distributed Representations, Optimization |
Lecture 4 | Sept 26 | Convolutional Neural Networks and Image Classification | Slides | 7.1-7.5; Roger Grosse’s notes: ConvNets, Image Classification. Related papers: Yann LeCun’s 1998 LeNet, AlexNet | H1 due, H2 assigned
Lecture 5 | Oct 03 | Batch/layer normalization, residual connections | Slides | 8.5-8.6; Roger Grosse’s notes: Generalization, Exploding/Vanishing Gradients. Related papers: Dropout, ResNet |
Lecture 6 | Oct 10 | Recurrent Neural Networks, sequence-to-sequence learning | Slides | 9.1-9.7, 10.1-10.8; Roger Grosse’s notes: RNNs, Exploding/Vanishing Gradients. Related papers: LSTM, ResNet, Neural machine translation |
Midterm exam | Oct 17 | | | | H2 due, H3 assigned
Lecture 8 | Oct 24 | Attention, Transformers, and Autoregressive Models | | 11.1-11.7, 15.8-15.10 |
Reading Week | Oct 31 | | | |
Lecture 9 | Nov 07 | Large language models | | 11.8-11.9 | H3 due, H4 assigned
Lecture 10 | Nov 14 | Diffusion models, vision-language models | | |
Lecture 11 | Nov 21 | Additional architecture grab bag: GNNs, autoencoders, U-Net, MoE | | |
Lecture 12 | Nov 28 | Deep learning engineering; fairness, accountability, transparency, and recent trends in deep learning | | 13.5-13.6, 4.7 | H4 due
Logistics:
Grading:
- Homework, 50 points: There will be 4 homework assignments. Homework will consist of some combination of math and coding. Each homework is worth 12.5 points.
- Midterm, 20 points: The midterm will take place on Oct 17 and will cover all topics discussed before the midterm.
- Final Project, 30 points: For the course project, you will implement a research idea related to the course material. Details will be released later.
Late work, collaboration rules, and the honor code:
Every student has a total of 7 grace days to extend coursework deadlines through the semester. Each grace day allows a 24-hour deadline extension without late penalty; that is, you may apply grace days to a late submission to remove its late penalty. The maximum extension you may request is limited by the number of grace days you have remaining. We will keep track of grace days on MarkUs. After the grace period, assignments will be accepted up to 3 days late, with 10% deducted for each day late, rounded up to the nearest day (for example, a submission 30 hours late with no grace days applied counts as 2 days late and loses 20%). After that, submissions will not be accepted and will receive a score of 0.
You are welcome to work together with other students on the homework. You are also welcome to use any resources you find (online tutorials, textbooks, papers, chatbots, etc.) to help you complete it. However, you must list on each assignment any collaboration or resources you used. If you hand in homework that involved collaboration and/or makes use of content you did not create and you do not disclose this, you will receive a 0 for that homework. Note also that you will likely be able to find some resource (be it another student, ChatGPT, or whatever) that can help you solve many of the homework problems. However, if you rely too heavily on such resources, you will likely not learn the material and will do poorly on the exams, during which such resources will not be available.
Resources:
Type | Name | Description
---|---|---
Related Textbooks | Deep Learning (Goodfellow et al., 2016) | The Deep Learning textbook is a resource intended to help students and practitioners enter the field of machine learning in general and deep learning in particular.
Related Textbooks | Information Theory, Inference, and Learning Algorithms (MacKay, 2003) | A good introductory textbook that combines information theory and machine learning.
General Framework | PyTorch | An open-source deep learning platform that provides a seamless path from research prototyping to production deployment.
Computation Platform | Colab | Colaboratory is a free Jupyter notebook environment that requires no setup and runs entirely in the cloud.
Computation Platform | GCE | Google Compute Engine delivers virtual machines running in Google’s data centers and worldwide fiber network.
Computation Platform | AWS EC2 | Amazon Elastic Compute Cloud (EC2) is a central part of Amazon’s cloud-computing platform, Amazon Web Services (AWS), allowing users to rent virtual computers on which to run their own applications.
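If you plan to use Colab for the assignments, a quick way to confirm that a GPU runtime is active is the short PyTorch check below. This is a generic sanity check, not an official course requirement; PyTorch comes preinstalled on Colab:

```python
import torch

# Report whether PyTorch can see a CUDA GPU.
# On Colab: Runtime -> Change runtime type -> select a GPU accelerator.
if torch.cuda.is_available():
    print("GPU available:", torch.cuda.get_device_name(0))
else:
    print("No GPU detected; computations will run on the CPU.")
```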