
Vector(Matrix) Derivatives

Matrix/vector manipulation

You should be comfortable with these rules. They come in handy when you want to simplify an expression before differentiating it.

  • $(AB)^T = B^TA^T$
  • $(a^TBc)^T = c^TB^Ta$
  • $a^Tb = b^Ta$ (the result is a scalar, and the transpose of a scalar is itself)
  • $(A + B)C = AC + BC$ (multiplication is distributive)
  • $(a + b)^TC = a^TC + b^TC$
  • $AB \neq BA$ (multiplication is not commutative)
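The transpose and non-commutativity rules above can be sanity-checked on small matrices. The sketch below uses plain Python lists; the helper names `transpose` and `matmul` and the example matrices are illustrative choices, not part of the original notes.

```python
def transpose(M):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(row) for row in zip(*M)]

def matmul(A, B):
    """Plain matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

# Small integer matrices chosen arbitrarily for illustration.
A = [[1, 2], [3, 4]]
B = [[0, 1], [5, -2]]

# (AB)^T should equal B^T A^T.
lhs = transpose(matmul(A, B))
rhs = matmul(transpose(B), transpose(A))
transpose_rule_holds = lhs == rhs

# AB and BA differ for this pair, illustrating non-commutativity.
commutes = matmul(A, B) == matmul(B, A)

print(transpose_rule_holds, commutes)
```

A single pair of matrices does not prove the identities, but it is a cheap way to catch a misremembered rule.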

Common Vector Derivatives

| Scalar derivative | Vector derivative |
| --- | --- |
| $f(x) \to \frac{df}{dx}$ | $f(X) \to \frac{df}{dX}$ |
| $bx \to b$ | $x^TB \to B$ |
| $bx \to b$ | $x^Tb \to b$ |
| $x^2 \to 2x$ | $x^Tx \to 2x$ |
| $bx^2 \to 2bx$ | $x^TBx \to (B + B^T)x$, which equals $2Bx$ when $B$ is symmetric |
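The quadratic-form entry can be checked with a finite-difference gradient. The sketch below assumes a deliberately non-symmetric $B$ to show that the general gradient of $x^TBx$ is $(B + B^T)x$, and that the shorter form $2Bx$ only matches when $B$ is symmetric. The function names and numbers are illustrative.

```python
def numeric_grad(f, x, eps=1e-6):
    """Central-difference approximation of the gradient of f at x."""
    grad = []
    for i in range(len(x)):
        x_plus, x_minus = list(x), list(x)
        x_plus[i] += eps
        x_minus[i] -= eps
        grad.append((f(x_plus) - f(x_minus)) / (2 * eps))
    return grad

def quad_form(x, B):
    """Scalar x^T B x."""
    n = len(x)
    return sum(x[i] * B[i][j] * x[j] for i in range(n) for j in range(n))

B = [[2.0, 1.0], [0.0, 3.0]]  # deliberately non-symmetric
x = [1.0, -2.0]

numeric = numeric_grad(lambda v: quad_form(v, B), x)

# General formula (B + B^T) x.
general = [sum((B[i][j] + B[j][i]) * x[j] for j in range(2)) for i in range(2)]

# Shortcut 2 B x, valid only for symmetric B.
symmetric_only = [2 * sum(B[i][j] * x[j] for j in range(2)) for i in range(2)]

print(numeric, general, symmetric_only)
```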

Common Derivative Rules

  • $\frac{dg(u)}{dx} = \frac{dg(u)}{du}\frac{du}{dx}$

  • $\frac{df(g(u))}{dx} = \frac{df(g)}{dg}\frac{dg(u)}{du}\frac{du}{dx}$

    • $u \to g \to f$, all vectors (each factor is then a Jacobian matrix)
  • $X \to Y = AX + B \to l = f(Y)$, $\frac{dl}{dX} = A^T\frac{dl}{dY}$ (“chain rule” for a linear function)

    • $X \to Y \to l$, with $l$ a scalar and $X$, $Y$ vectors or matrices
    • Proof: let $\frac{dl}{dX}$ be an $m \times 1$ vector, $\frac{dl}{dY}$ an $n \times 1$ vector, and $\frac{dY}{dX} = A$ an $n \times m$ matrix. Because the gradients are column vectors, the chain rule is written in terms of their transposes:

    $$\left(\frac{dl}{dX}\right)^T = \left(\frac{dl}{dY}\right)^T\frac{dY}{dX}$$

    Transposing both sides gives

    $$\frac{dl}{dX} = \left(\frac{dY}{dX}\right)^T\frac{dl}{dY} = A^T\frac{dl}{dY}$$

  • $X \to Y = XA + B \to l = f(Y)$, $\frac{dl}{dX} = \frac{dl}{dY}A^T$

    • $X \to Y \to l$, with $l$ a scalar and $X$, $Y$ vectors or matrices
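The linear chain rule $\frac{dl}{dX} = A^T\frac{dl}{dY}$ can be checked numerically for the vector case $Y = AX + B$. The sketch below assumes the scalar function $l = f(Y) = \sum_i Y_i^2$ purely for illustration (so $\frac{dl}{dY} = 2Y$). The helper names and values are not from the original notes.

```python
def matvec(A, x):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]

def loss(X, A, B):
    """l = f(Y) = sum of squares of Y, with Y = A X + B (assumed example f)."""
    Y = [y + b for y, b in zip(matvec(A, X), B)]
    return sum(y * y for y in Y)

A = [[1.0, 2.0], [0.0, -1.0], [3.0, 1.0]]  # n x m = 3 x 2
B = [0.5, -1.0, 2.0]
X = [1.0, 2.0]

# Analytic gradient: dl/dY = 2Y, then dl/dX = A^T (dl/dY).
Y = [y + b for y, b in zip(matvec(A, X), B)]
dl_dY = [2 * y for y in Y]
A_T = [list(col) for col in zip(*A)]
analytic = matvec(A_T, dl_dY)

# Central-difference gradient for comparison.
eps = 1e-6
numeric = []
for i in range(len(X)):
    X_plus, X_minus = list(X), list(X)
    X_plus[i] += eps
    X_minus[i] -= eps
    numeric.append((loss(X_plus, A, B) - loss(X_minus, A, B)) / (2 * eps))

print(analytic, numeric)
```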

Common Differential Rules

  • $d(X^T) = (dX)^T$
  • $d(X + Y) = dX + dY$
  • $d(XY) = (dX)Y + X(dY)$

From Differentials to Derivatives

  • For a vector $x$: $df = (\frac{df}{dx})^Tdx$
  • For a matrix $X$: $df = \mathrm{tr}((\frac{df}{dX})^TdX)$

Derivatives of Matrices, Vectors and Scalar Forms

First-order(Linear Products)

Let $f = x^Ta$, compute $\frac{df}{dx}$

Solution

$$df = d(x^T)a = (dx)^Ta = a^Tdx \to \frac{df}{dx} = a$$

We also have $\frac{d(a^Tx)}{dx} = a$


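Let $f = a^TXb$, compute $\frac{df}{dX}$

Solution

$df = a^T(dX)b$ is a scalar, so it equals its own trace, and $\mathrm{tr}(AB) = \mathrm{tr}(BA)$ lets us move $b$ to the front:

$$df = a^T(dX)b = \mathrm{tr}(ba^TdX) = \mathrm{tr}((ab^T)^TdX) \to \frac{df}{dX} = ab^T$$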
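The result $\frac{df}{dX} = ab^T$ can be verified entry by entry with finite differences. The sketch below is a minimal check with arbitrary example vectors; `bilinear` and `numeric_grad_matrix` are hypothetical helper names introduced here.

```python
def bilinear(a, X, b):
    """Scalar a^T X b for vectors a (length m), b (length n) and an m x n matrix X."""
    return sum(a[i] * X[i][j] * b[j] for i in range(len(a)) for j in range(len(b)))

def numeric_grad_matrix(f, X, eps=1e-6):
    """Central-difference gradient of a scalar f with respect to each entry of X."""
    grad = [[0.0] * len(X[0]) for _ in X]
    for i in range(len(X)):
        for j in range(len(X[0])):
            X_plus = [row[:] for row in X]
            X_minus = [row[:] for row in X]
            X_plus[i][j] += eps
            X_minus[i][j] -= eps
            grad[i][j] = (f(X_plus) - f(X_minus)) / (2 * eps)
    return grad

a = [1.0, -2.0]
b = [3.0, 0.5, 2.0]
X = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]  # 2 x 3

numeric = numeric_grad_matrix(lambda M: bilinear(a, M, b), X)

# Analytic gradient: the outer product a b^T, which has the same shape as X.
outer_ab = [[ai * bj for bj in b] for ai in a]

print(numeric)
print(outer_ab)
```

Because $f$ is linear in $X$, the gradient does not depend on the value of $X$ at which it is evaluated.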


Let $f = a^TX^Tb$, compute $\frac{df}{dX}$

Solution

Using $\mathrm{tr}(AB) = \mathrm{tr}(BA)$ on the scalar $b^T(dX)a$,

$$df = a^Td(X^T)b = a^T(dX)^Tb = ((dX)a)^Tb = b^T(dX)a = \mathrm{tr}(ab^TdX) = \mathrm{tr}((ba^T)^TdX) \to \frac{df}{dX} = ba^T$$

Second-order(Quadratic Products)

Let $f = b^TX^TXc$, compute $\frac{df}{dX}$

Solution

$$df = b^Td(X^T)Xc + b^TX^Td(X)c \tag{1}$$

Each term in (1) is a scalar, so it equals its own trace and its factors can be cyclically permuted:

$$b^Td(X^T)Xc = (Xc)^Td(X)b = \mathrm{tr}(b(Xc)^TdX) = \mathrm{tr}(bc^TX^TdX) = \mathrm{tr}((Xcb^T)^TdX) \tag{2}$$

$$b^TX^Td(X)c = \mathrm{tr}(cb^TX^TdX) = \mathrm{tr}((Xbc^T)^TdX) \tag{3}$$

Substitute (2) and (3) into (1):

$$df = \mathrm{tr}(((Xcb^T)^T + (Xbc^T)^T)dX) \to \frac{df}{dX} = X(cb^T + bc^T)$$
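A finite-difference check of $\frac{df}{dX} = X(cb^T + bc^T)$ is sketched below. The vectors, the $3 \times 2$ matrix, and the helper names are arbitrary illustrative choices.

```python
def matmul(A, B):
    """Plain matrix product of two lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*B)] for row in A]

def f(X, b, c):
    """Scalar b^T X^T X c, computed as the dot product of X b and X c."""
    Xb = [sum(r * v for r, v in zip(row, b)) for row in X]
    Xc = [sum(r * v for r, v in zip(row, c)) for row in X]
    return sum(p * q for p, q in zip(Xb, Xc))

b = [1.0, 2.0]
c = [-1.0, 3.0]
X = [[1.0, 0.0], [2.0, 1.0], [0.0, -1.0]]  # 3 x 2

# Analytic gradient X (c b^T + b c^T).
S = [[c[i] * b[j] + b[i] * c[j] for j in range(2)] for i in range(2)]
analytic = matmul(X, S)

# Central-difference gradient, one matrix entry at a time.
eps = 1e-6
numeric = [[0.0, 0.0] for _ in X]
for i in range(3):
    for j in range(2):
        X_plus = [row[:] for row in X]
        X_minus = [row[:] for row in X]
        X_plus[i][j] += eps
        X_minus[i][j] -= eps
        numeric[i][j] = (f(X_plus, b, c) - f(X_minus, b, c)) / (2 * eps)

print(analytic)
print(numeric)
```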


Let $f = (X\theta - y)^T(X\theta - y)$, compute $\frac{df}{d\theta}$

Solution 1

$$df = d((X\theta - y)^T)(X\theta - y) + (X\theta - y)^Td(X\theta - y) \tag{1}$$

From (1), we need to compute $d((X\theta - y)^T)$ and $d(X\theta - y)$

$$\begin{aligned} d((X\theta - y)^T) &= (d(X\theta - y))^T\\ &= (Xd\theta)^T \end{aligned} \tag{2}$$

$$d(X\theta - y) = Xd\theta \tag{3}$$

Substitute (2) and (3) into (1):

$$\begin{aligned} df &= (Xd\theta)^T(X\theta - y) + (X\theta - y)^TXd\theta \\ &= (X\theta - y)^TXd\theta + (X\theta - y)^TXd\theta \\ &= 2(X\theta - y)^TXd\theta \end{aligned}$$

From $df = (\frac{df}{d\theta})^Td\theta$,

$$\frac{df}{d\theta} = 2X^T(X\theta - y)$$

Solution 2

By the chain rule for gradients, with $\frac{d(X\theta - y)}{d\theta}$ the Jacobian of $X\theta - y$,

$$\frac{df}{d\theta} = \left(\frac{d(X\theta - y)}{d\theta}\right)^T\frac{df}{d(X\theta - y)} \tag{1}$$

Let $u = X\theta - y$, so that $f = u^Tu$. We need to compute $\frac{d(u^Tu)}{du}$ and $\frac{d(X\theta - y)}{d\theta}$

$$\begin{aligned} df &= d(u^T)u + u^Tdu\\ &= (du)^Tu + u^Tdu\\ &= u^Tdu + u^Tdu\\ &= 2u^Tdu \end{aligned}$$

$$\frac{df}{du} = 2u \tag{2}$$

$$d(X\theta - y) = Xd\theta$$

Since $dz = \frac{dz}{dx}dx$ when $z$ and $x$ are vectors,

$$\frac{d(X\theta - y)}{d\theta} = X \tag{3}$$

Hence, substituting (2) and (3) into (1),

$$\frac{df}{d\theta} = 2X^T(X\theta - y)$$
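Both solutions arrive at $2X^T(X\theta - y)$, which is easy to confirm numerically. The sketch below uses a tiny made-up design matrix and target vector; all names and values are illustrative assumptions.

```python
def residual(X, theta, y):
    """Return X theta - y as a list."""
    return [sum(r * t for r, t in zip(row, theta)) - yi for row, yi in zip(X, y)]

def f(theta, X, y):
    """Sum of squared residuals (X theta - y)^T (X theta - y)."""
    return sum(r * r for r in residual(X, theta, y))

X = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]  # 3 samples, 2 parameters
y = [1.0, 2.0, 2.0]
theta = [0.5, 0.5]

# Analytic gradient 2 X^T (X theta - y).
r = residual(X, theta, y)
analytic = [2 * sum(X[i][j] * r[i] for i in range(len(X))) for j in range(len(theta))]

# Central-difference gradient for comparison.
eps = 1e-6
numeric = []
for j in range(len(theta)):
    t_plus, t_minus = list(theta), list(theta)
    t_plus[j] += eps
    t_minus[j] -= eps
    numeric.append((f(t_plus, X, y) - f(t_minus, X, y)) / (2 * eps))

print(analytic, numeric)
```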


Let $J(\theta) = y\log(\mathrm{sigmoid}(\theta^TX)) + (1 - y)\log(1 - \mathrm{sigmoid}(\theta^TX))$, compute $\frac{dJ(\theta)}{d\theta}$

Solution

Let $u = \mathrm{sigmoid}(\theta^TX)$, $v = \theta^TX$,

$$\frac{dJ}{d\theta} = \frac{d(y\log(u) + (1 - y)\log(1 - u))}{du}\frac{du}{dv}\frac{dv}{d\theta}$$
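One way the three factors might be evaluated is sketched below. It assumes the standard definition $\mathrm{sigmoid}(v) = \frac{1}{1 + e^{-v}}$ and that $X$ denotes a single feature vector, neither of which is stated explicitly above.

$$\begin{aligned}
\frac{d(y\log(u) + (1 - y)\log(1 - u))}{du} &= \frac{y}{u} - \frac{1 - y}{1 - u} \\
\frac{du}{dv} &= u(1 - u) \\
\frac{dv}{d\theta} &= X \\
\frac{dJ}{d\theta} &= \left(\frac{y}{u} - \frac{1 - y}{1 - u}\right)u(1 - u)X = (y - u)X
\end{aligned}$$

The last step uses $y(1 - u) - (1 - y)u = y - u$.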