CS 229: Machine Learning - Notes on Lecture #2

by Amit

In the second lecture, Prof. Andrew Ng starts talking about supervised learning methods. He begins with Linear Regression, in which the relationship between the input and the output is assumed to be linear, such as h_\theta(x)=\theta_0 + \theta_1 x. The parameters \theta_0 and \theta_1 need to be found, for which a couple of approaches are discussed:

  1. The first one involves minimizing the function J(\vec{\theta})=\frac{1}{2}\sum_{i=1}^m(h_\theta(x^{(i)})-y^{(i)})^2, where m is the number of training examples, x^{(i)} is the ith sample input, and y^{(i)} is the corresponding output. A couple of methods are discussed for this minimization task. The first is the Gradient Descent method, which is roughly:
    1. Start with some value of \vec{\theta} (say \vec{\theta} = \vec{0})
    2. Keep updating \vec{\theta} to reduce J(\vec{\theta}) as follows: {\theta}_j = {\theta}_j - \alpha \frac{\partial}{\partial {\theta}_j} J(\vec{\theta}) for each parameter \theta_j, where \alpha is the learning rate
    3. Stop when J(\vec{\theta}) has been reduced to a desired value.
  2. Stochastic gradient descent, a faster method for large data sets, is then described.

  3. The second approach to estimating \vec{\theta} uses linear algebra techniques to obtain a closed-form formula for the parameters \vec{\theta}.
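
The three methods above can be sketched in plain Python for the one-feature case h_\theta(x)=\theta_0 + \theta_1 x. This is only an illustrative sketch, not code from the lecture. The function names, the toy data, the learning rate, and the stopping rule (stop when J changes by less than a small tolerance) are all assumptions made for the example. For this J, the partial derivatives are \sum_i (h_\theta(x^{(i)})-y^{(i)}) for \theta_0 and \sum_i (h_\theta(x^{(i)})-y^{(i)}) x^{(i)} for \theta_1. The closed-form function uses the familiar one-feature least-squares formula, which is what the general linear-algebra solution reduces to when there is a single input.

```python
def h(theta, x):
    """Hypothesis h_theta(x) = theta_0 + theta_1 * x."""
    return theta[0] + theta[1] * x


def cost(theta, xs, ys):
    """J(theta) = 1/2 * sum over training examples of squared error."""
    return 0.5 * sum((h(theta, x) - y) ** 2 for x, y in zip(xs, ys))


def batch_gradient_descent(xs, ys, alpha=0.01, tol=1e-12, max_iters=100_000):
    """Each update uses the gradient summed over ALL training examples."""
    theta = [0.0, 0.0]  # start at the zero vector
    prev = cost(theta, xs, ys)
    for _ in range(max_iters):
        errors = [h(theta, x) - y for x, y in zip(xs, ys)]
        grad0 = sum(errors)                             # dJ/dtheta_0
        grad1 = sum(e * x for e, x in zip(errors, xs))  # dJ/dtheta_1
        # Update both parameters simultaneously.
        theta = [theta[0] - alpha * grad0, theta[1] - alpha * grad1]
        current = cost(theta, xs, ys)
        if abs(prev - current) < tol:  # assumed stopping rule
            break
        prev = current
    return theta


def stochastic_gradient_descent(xs, ys, alpha=0.01, epochs=2000):
    """Each update uses ONE training example, so a step is cheap."""
    theta = [0.0, 0.0]
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            error = h(theta, x) - y
            theta = [theta[0] - alpha * error, theta[1] - alpha * error * x]
    return theta


def closed_form(xs, ys):
    """Least-squares solution for one feature, with no iteration."""
    m = len(xs)
    mean_x = sum(xs) / m
    mean_y = sum(ys) / m
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    theta1 = sxy / sxx
    return [mean_y - theta1 * mean_x, theta1]


# Toy data generated from y = 1 + 2x (made up for this example).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

theta_gd = batch_gradient_descent(xs, ys)
theta_sgd = stochastic_gradient_descent(xs, ys)
theta_cf = closed_form(xs, ys)
print(theta_gd, theta_sgd, theta_cf)
```

On this noise-free toy data all three estimates should land close to \theta_0 = 1 and \theta_1 = 2. The learning rate matters: if \alpha is too large for the data, the gradient descent updates can diverge instead of reducing J.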

This is the video lecture:

As I mentioned in my notes on the first lecture, this is a good time to review the section notes on Linear Algebra. Lecture Notes 1 cover the first two lectures.

Looks like we will do a lot of regression in the next lecture. See you then!