# Course Info

**Lecture Location:** Tu/Th 3-4:15pm in Klaus 2443

**Professor:**

- Jacob Abernethy
- Email: prof_at_gatech_dot_edu
- Office Hours: Monday, 11am-12pm (Klaus 2134)

**Teaching Assistants:**

- Benjamin Bray
- Email: benrbray_at_gatech_dot_edu
- Office Hours: Fridays, 12:30-1:30pm (tables between Klaus 2116 and 2124)

- Naveen Kodali
- Email: nkodali3_at_gatech_dot_edu
- Office Hours: Wednesdays, 3:00-4:00pm (starting from Sep 5, tables between Klaus 2116 and 2124)

Please fill out the COURSE SURVEY!

This is an advanced course on algorithms. That is quite a broad topic, and in particular this semester’s course will focus heavily on **algorithms for machine learning**. We will be especially interested in diving into the following topics:

- Numerical Analysis
- Convex Geometry & Optimization
- Probability & Statistical Inference

While students should have a strong back background in core algorithmic concepts, linear algebra, calculus, and probability, we will review many of these topics early in the course. Students will be required to write code in Python, and we will present much of the material in the course using Jupyter Notebooks.

## Hands-on Format

In most courses, students learn about material in class through lecture, and then they practice problem solving on their own by doing homework. In this course we will do the opposite! Students will be required to read material before each class period, and then arrive in class ready to dive into problem-solving. This way, students can familiarize themselves with basic definitions and examples at home, and benefit from one-on-one interaction with the course staff during lecture while working through more challenging aspects of the material. Lecture notes for each day will be posted online at least one week prior to each lecture (with the first week as an exception).

Why do it like this? The lecture format is an outdated way to teach mathematical material, especially for topics such as algorithms and machine learning where it is so easy to play with code and implement ideas. The lecture format also limits the professor’s ability to interact directly with students.

Each class period will have the following structure:

*(0mins)*Students arrive and organize themselves by sitting with their group*(5mins)*Students take a very short and simple quiz on assigned reading material*(55mins)*Problem time! Instructor presents a sequence of problems, and students are given 5-10 minutes per problem to try to solve each one with their group. Instructors will move around the classroom to engage with students and answer questions.*(15mins)*Class wraps with a brief description of the forthcoming material for next period, in a “mini lecture”

## Grading

The course grades will be determined as follows. Note that the in-class quizzes will be graded generously, and 50% of the credit will be given simply for showing up and answering the questions.

Homeworks | 35% |

Attendance + Quizzes | 15% |

Midterm Exam | 20% |

Final Exam | 30% |

Quizzes will be entered electronically, via a web form, so make sure you have a phone, laptop, or tablet with you in class! If you don’t have access to any of these, please let the instructor know. The grading scheme will be:

- 2 points for a correct answer
- 1 point for an incorrect answer
- 0 points for not taking the quiz

Your **five lowest quiz scores will be dropped**, which should be enough to account for quizzes missed due to planned or unplanned absences.

Students are allowed, and indeed encouraged, to work on homework with other students in the course. But when solutions are written up, this should be done alone and without the help of other students. Students are required to specify on their writeups which students that collaborated with. If we find solutions that appear even remotely to have been copied, these will be given a zero and the students will be notified.

## Reading

Readings will be assigned for you to complete *before each lecture*. All required reading will either be linked to here or posted to canvas. You are not required to purchase a textbook for this course, but you may find the following books helpful.

- Boyd & Vandengerbhe,
*Convex Optimization*(Free PDF) - Trefethen & Bau,
*Numerical Linear Algebra*

## Important Dates

*(Monday 10/8 – Tuesday 10/9)*Fall recess, no class*(Tuesday, 10/16)***In-class Midterm**(tentative)*(Wednesday, 11/21 – Sun, November 25)*Thanksgiving Holiday*(Tuesday, 12/4)*Last day of class*(Thursday, 12/6)***Final Exam**from 2:40-5:30PM

# Homework

**Homework #1**due Sunday, September 9 @ 11:59pm

# Calendar

# Schedule

(Tu 8/21/18) **Lecture #1: Introduction & Perceptron**

No Required Reading

*Additional Resources*

- CMU 15-859(B), Lecture #4, The Perceptron Algorithm by Avrim Blum
- Raul Rojas, Neural Networks: A Systematic Introduction

(Th 8/23/18) **Lecture #2: Review of Linear Algebra & Intro to Numpy** (Lecture Slides)

*Required Preparation before Class*

- Notes, CS 4540: Python Basics
- Brush up on linear algebra!

*Additional Resources*

- 3blue1brown, Essence of Linear Algebra video series
- UNSW 2501: Linear Algebra Notes on Canvas
- MIT OCW 18.06 Linear Algebra Lecture Videos

(Tu 8/28/18) **Lecture #3: Convex Geometry** (Lecture Slides)

*Required Preparation before Class*

- Read Boyd & Vandenberghe, Convex Optimization
- §2.1 Affine and Convex Sets
- §2.2 Important Examples (of Affine and Convex Sets)
- §2.5 Separating & Supporting Hyperplanes

*Additional Resources*

- Stanford EE364a, Lecture #2: Convex Sets (Slides, Video)
- Jeffe, UIUC Computational Geometry, “Lecture 1: Convex Hulls”
- Lauritzen,
*Lectures on Convex Sets*, Ch 3: Separation (nice proof of Farkas lemma) - ETH Zürich, Approximate Methods in Geometry, “Chapter 1: Some Basic Geometry”
- David L. Finn, Rose-Hullman MA 323, “Barycentric Coordinates & de Casteljau’s Algorithm”

(Th 8/30/18) **Lecture #4: Review of Multivariable Calculus** (Lecture Slides)

*Required Preparation before Class*

- Brush up on single and multivariable calculus!
- Sequences, series, and limits
- Chain rule, product rule, quotient rule
- Mean value theorem, intermediate value theorem
- Taylor series

*Additional Resources*

- Randal J. Barnes, “Matrix Differentiation”
- Parr & Howard 2018, “The Matrix Calculus You Need for Deep Learning”
- Petersen & Pedersen 2012, “The Matrix Cookbook”
- 3blue1brown YouTube Channel
- Essence of Calculus #4, “Visualizing the chain rule and product rule”
- Essence of Calculus #6, “Implicit Differentiation, what’s going on here?”
- “What they won’t teach you in Calculus”

(Tu 9/4/18) **Lecture #5: Convex Functions & Intro to Optimization** (Lecture Slides)

*Required Preparation before Class*

- Boyd & Vandenberghe, Convex Optimization
- §3.1 Basic Properties & Examples of Convex Functions
- Skip §3.1.2 Extended-Value Extensions

- §3.1 Basic Properties & Examples of Convex Functions

*Additional Resources*

- Stanford EE364a, Lecture #3: Convex Functions (Slides, Video)
- Boyd & Vandenberghe, Convex Optimization
- §2.3 Operations that Preserve Convex Sets
- §3.2 Operations that Preserve Convex Functions
- §2.5 Separating & Supporting Hyperplanes

(Th 9/6/18) **Lecture #6: Linear Programming Introduction** (Lecture Slides)

*Required Preparation before Class*

- Tim Roughgarden, Stanford CS261, Lecture 7: Linear Programming (Pages 1-5)
- Try to get used to the matrix notation for linear programs! Think geometrically!

(Tu 9/11/18) **Lecture #7: Linear Programming Duality** (Lecture Slides)

*Required Preparation before Class*

- Tim Roughgarden, Stanford CS261, Lecture 8: Linear Programming Duality I (Pages 1-6)

*Additional Resources*

- Tim Roughgarden, Stanford CS261, Lecture 9: Linear Programming Duality II
- Jim Burke, University of Washington MATH 407

(Th 9/13/18) **Lecture #8: Positive Definiteness and Gradient Descent Intro** (Lecture Slides)

*Required Preparation before Class*

- Jonathan Shewchuk 1994, “Painless Conjugate Gradient” (Pages 1-17)
- We (probably) won’t cover Conjugate-Gradient, but these notes are a great intro gradient descent.
- We’ll cover the Jacobi method in more detail later, so don’t worry too much about §5.2

(Tu 9/18/18) **Lecture #9: Gradient Descent for Convex Functions** (Lecture Slides)

*Required Preparation before Class*

- Jonathan Shewchuk 1994, “Painless Conjugate Gradient” (Pages 17-21)
- Proof of convergence for gradient descent on quadratic forms

- Try to understand the definition of strong convexity in Boyd & Vandenberghe, “Convex Optimization” §9.1.2

*Additional Resources*

- Moritz Hardt, UC Berkeley EE227C
- Elad Hazan, “Introduction to Online Convex Optimization”, Chapters 2 & 3

(Th 9/20/18) **Lecture #10: Stochastic, Accelerated, and Conditional Gradient Descent** (Lecture Slides)

*Required Preparation before Class*

- Elad Hazan, “Introduction to Online Convex Optimization”, §7.1-7.4
- Understand Frank Wolfe (a.k.a. conditional gradient) at a high-level
- Compare the structure of the convergence proof to that of gradient descent from Lecture 9

*Additional Resources*

- Fabian Pedrigosa, “Notes on the Frank Wolfe Algorithm”
- Moritz Hardt, UC Berkeley EE227C, “Lecture 5: Conditional Gradient Method”

(Tu 9/25/18) **Lecture #11: Applications of Gradient Descent** (Lecture Slides)

*Required Preparation before Class*

- No new reading for today! We’ll talk about Frank-Wolfe, so remind yourself about the reading from L10.

*Additional Resources*

- Gabriel Goh,
*Distill*, “Why Momentum Really Works” - Moritz Hardt, “The Zen of Gradient Descent”
- Peter Absil, “Optimization on Matrix Manifolds” slides (and book)

(Th 9/27/18) **Lecture #12: Second-order Methods & Fixed Point Iteration** (Lecture Slides)

*Required Preparation before Class*

- Sauer,
*Numerical Analysis*(posted to Canvas)- §1.1 The Bisection Method (Pages 25-29)
- §1.2 Fixed Point Iteration (Pages 30-40)
- §1.4 Newton’s Method (Pages 51-58)

- Review the following topics from multivariable calculus:
- Multivariate Taylor’s Theorem
- Mean / Intermediate Value Theorems

*Additional Resources*

- Newton’s Method
- Boyd & Vandenberghe, Convex Optimization, §9.5 Newton’s Method
- Chris Hauser, “Multivariate Newton’s Method and Quasi-Newton”
- Wei-Ta Chu 2014, “Multivariate Newton’s Method” (slides)
- Ryan Tibshirani 2015, “Newton’s Method” (slides)
- Sean Harrington, “Solving Logistic Regression with Newton’s Method” (blog post; we’ll cover logistic regression later)

- Newton Fractals
- Simon Tatham, “Fractals Derived from Newton-Raphson Iteration”
- Daniel Dreibelbis, “Newton Fractals”
- Paul Bourke, “An Introduction to Fractals”

- Geoffrey Hinton, Coursera NNML, “A Brief Overview of Hessian-Free Optimization”
- Nykamp DQ,
*Math Insight*, “Introduction to Taylor’s Theorem for Multivariable Functions” - Sauer,
*Numerical Analysis*§1.3 briefly covers of conditioning / sensitivity, but we won’t focus on these topics in class. For a slightly more advanced treatment, see Trefethen & Bau,*Numerical Linear Algebra*§13-14.

(Tu 10/2/18) **Lecture #13: Matrix Decompositions & SVD** (Lecture Slides)

*Required Preparation before Class*

- Trefethen & Bau,
*Numerical Linear Algebra*(posted to Canvas)- Lecture 4: The Singular Value Decomposition (Pages 25-31)
- Lecture 5: More on the SVD (Pages 32-37)

*Additional Resources*

(Th 10/4/18) **Lecture #14: Numerical Methods for Linear Systems** (Lecture Slides)

*Required Preparation before Class*

- Understand §5.1-5.3 of Shewchuk, “Painless Conjugate Gradient”
- Understand the Gershgorin Circle Theorem (Wikipedia’s proof isn’t too bad!)

*Additional Resources*

- Golub & van Loan,
*Matrix Computations*, “§11.2: The Classical Iterations” (posted to Canvas) - Volker John, “Ch 3: Classical Iterative Schemes”
- Schonlieb, “Numerical Analysis” Lectures 16, 17, 18
- Gilbert Strang,
*Mathematical Methods for Engineers*, §6.2: Iterative Methods

(Th 10/11/18) **Lecture #15: Numerical Methods for Computing Eigenvalues** (Lecture Slides)

*Required Preparation before Class*

- Trefethen & Bau,
*Numerical Linear Algebra*(posted to Canvas)- Lecture 27: Rayleigh Quotient, Inverse Iteration (Pages 202-210)

*Additional Resources*

- Trefethen & Bau,
*Numerical Linear Algebra*(posted to Canvas)- Lecture 25: Overview of Eigenvalue Algorithms (read if you want some more context about computing eigenvalues)
- Lecture 28: QR Algorithm without Shifts (we won’t cover this, but the algorithm is interesting!)

(Tu 10/16/18) **Lecture #16: Random Variables and Maximum Likelihood Estimation** (Lecture Slides)

*Required Preparation before Class*

- Maleki and Do, Stanford Lecture Notes,
*Review of Probability Theory*- Brush up on your probability theory, and make sure you understand the distributions
*Binomial, Bernoulli, Poisson, Exponential, and Gaussian*

- Brush up on your probability theory, and make sure you understand the distributions
- Zheng, Missouri State Lecture Notes,
*Maximum Likelihood Estimation*- Read through pages 1-7

(Th 10/25/18) **Lecture #17: Bayes Rule, Binomial, Regression, MAP Estimation** (Lecture Slides)

*Required Preparation before Class*

- Kevin Murphy, “Machine Learning: A Probabilistic Perspective” book (on Canvas)
- §2.2: Probability Review (esp §2.2.3)
- §2.3.1-2.3.2: Binomial, Bernoulli, Multinomial, and Multinoulli distributions
- §3.2: Bayesian Concept Learning
- §3.5: Naive Bayes Classifiers
- Make sure you understand Bayes rule well by working out examples.

(Tu 10/30/18) **Lecture is Cancelled**

(Th 11/1/18) **Lecture #18: Beta-Binomial model, Conjugate Prior, Naive Bayes** (Lecture Slides)

*Required Preparation before Class*

- Dan Navarro, Amy Perfors In Introduction to the Beta-Binomial Model

(Tu 11/6/18) **Lecture #19: Logistic Regression Model** (Lecture Slides)

*Required Preparation before Class*

- Charles Elkan Maximum Likelihood, Logistic Regression,
and Stochastic Gradient Training
- You only need to read through the first 7 pages, although you are welcome to continue!

(Th 11/8/18) **Lecture #20: Clustering** (Lecture Slides )

*Required Preparation before Class*

- Lior Rokach and Oded Maimon Clustering Methods
- Read Sections 1-4, and Section 5.2

(Tu 11/13/18) **Lecture #21: Expectation Maximization** (Lecture Slides - Soon)

*Required Preparation before Class*

- Benjamin Bray Expectation Maximization
- Read pages 1-7

(Th 11/15/18) **Lecture #22: EM and the Gaussian Mixture Model** (Lecture Slides - Soon)

*Required Preparation before Class*

- Benjamin Bray Expectation Maximization
- Read pages 8-12

(Tu 11/27/18) **Lecture #23: Exponential Weights Algorithm** (No slides – board lecture)

*Required Preparation before Class*

- Online Learning/Exponential Weights
- You can skip Section 16.2 since we have already discussed the perceptron algorithm

(Th 11/29/18) **Lecture #24: Boosting**

*Required Preparation before Class*

- The Boosting Approach to Machine Learning
- You only need to read pages 1-5

*Additional Reading*