
Artificial Neural Networks

Artificial Neural Networks (ANNs) are loosely modeled after the brain. They are composed of units (also called nodes), which are modeled after neurons, connected by weighted links.

What is an ANN?

Overall, an ANN consists of an input layer, one or more hidden layers, and an output layer.
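As a minimal sketch of this layered structure (the layer sizes, random weights, and sigmoid activation here are illustrative assumptions, not taken from the post), a forward pass can be written with NumPy:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical layer sizes: 3 input units -> 4 hidden units -> 2 output units
W1 = rng.standard_normal((4, 3))  # weights on links: input layer -> hidden layer
W2 = rng.standard_normal((2, 4))  # weights on links: hidden layer -> output layer

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x):
    hidden = sigmoid(W1 @ x)     # hidden-layer activations
    return sigmoid(W2 @ hidden)  # output-layer activations

y = forward(np.array([0.5, -1.0, 2.0]))
print(y.shape)  # (2,)
```

Each unit computes a weighted sum of its inputs followed by a nonlinearity; stacking layers gives the input → hidden → output structure described above.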


Matrix/vector manipulation

You should be comfortable with these rules. They will come in handy when you want to simplify an expression before differentiating.

  • (AB)^T = B^TA^T
  • (a^TBc)^T = c^TB^Ta
  • a^Tb = b^Ta (the result is a scalar, and the transpose of a scalar is itself)
  • (A + B)C = AC + BC (multiplication is distributive)
  • (a + b)^T C = a^TC + b^T C
  • AB \neq BA (multiplication is not commutative)
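These identities are easy to sanity-check numerically. A quick sketch with NumPy (the random matrices and shapes are chosen purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(42)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3))
C = rng.standard_normal((3, 2))
a = rng.standard_normal((3, 1))
b = rng.standard_normal((3, 1))
c = rng.standard_normal((3, 1))

assert np.allclose((A @ B).T, B.T @ A.T)            # (AB)^T = B^T A^T
assert np.allclose((a.T @ B @ c).T, c.T @ B.T @ a)  # (a^T B c)^T = c^T B^T a
assert np.allclose(a.T @ b, b.T @ a)                # a^T b = b^T a (a 1x1 scalar)
assert np.allclose((A + B) @ C, A @ C + B @ C)      # distributivity
assert np.allclose((a + b).T @ C, a.T @ C + b.T @ C)
assert not np.allclose(A @ B, B @ A)                # generically not commutative
print("all identities verified")
```

Note that the last check only shows non-commutativity for these particular random matrices; special pairs of matrices (e.g. A and the identity) do commute.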

Linear Function

Let’s say we decide to approximate y as a linear function of x:

h_{\theta}(x) = \theta_0 * x_0 + \theta_1 * x_1 + ... + \theta_n * x_n = \sum_{i=0}^n \theta_i * x_i = \begin{pmatrix} \theta_0 \\ \theta_1 \\ ... \\ \theta_n \end{pmatrix}^T * \begin{pmatrix} x_0 \\ x_1 \\ ... \\ x_n \end{pmatrix}

where the \theta_i’s are the parameters (also called weights) parameterizing the space of linear functions mapping from X to Y; \theta and x are both (n + 1) * 1 vectors.
For a collection of data points x and y, where x^j and y^j denote the j^{th} example, the linear function can be written as

H_{\theta}(X) = \begin{pmatrix} y^0 \\ y^1 \\ ... \\ y^m \end{pmatrix} = \begin{pmatrix} \theta_0 * x_{0}^{0} + \theta_1 * x_{1}^{0} + ... + \theta_n * x_{n}^{0} \\ \theta_0 * x_{0}^{1} + \theta_1 * x_{1}^{1} + ... + \theta_n * x_{n}^{1} \\ ... \\ \theta_0 * x_{0}^{m} + \theta_1 * x_{1}^{m} + ... + \theta_n * x_{n}^{m} \end{pmatrix} = \begin{pmatrix} \sum_{i=0}^n \theta_i * x_i^0 \\ \sum_{i=0}^n \theta_i * x_i^1 \\ ... \\ \sum_{i=0}^n \theta_i * x_i^m \end{pmatrix} = \begin{pmatrix} x_0^0, x_1^0, ..., x_n^0 \\ x_0^1, x_1^1, ..., x_n^1 \\ ... \\ x_0^m, x_1^m, ..., x_n^m \end{pmatrix} * \begin{pmatrix} \theta_0, \theta_1, ..., \theta_n \end{pmatrix}^T = X * \theta
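The vectorized form X * \theta can be checked against the per-example summation form directly. A small sketch with NumPy (the sizes and random data are illustrative; x_0 is the usual all-ones intercept column):

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 5, 3  # 5 examples, 3 features plus the x_0 = 1 intercept term

# Rows of X are examples; first column is the intercept feature x_0 = 1
X = np.hstack([np.ones((m, 1)), rng.standard_normal((m, n))])
theta = rng.standard_normal(n + 1)

# Vectorized prediction: H_theta(X) = X @ theta
H = X @ theta

# Element-wise check against h_theta(x^j) = sum_i theta_i * x_i^j
H_loop = np.array([sum(theta[i] * X[j, i] for i in range(n + 1))
                   for j in range(m)])
assert np.allclose(H, H_loop)
print(H.shape)  # (5,)
```

The single matrix product replaces the explicit double loop over examples and features, which is why the stacked form X * \theta is the one used in practice.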
