Matrix/vector manipulation
You should be comfortable with these rules. They come in handy when you want to simplify an expression before differentiating it.
- $(AB)^T = B^T A^T$
- $(a^T B c)^T = c^T B^T a$
- $a^T b = b^T a$ (the result is a scalar, and the transpose of a scalar is itself)
- $(A + B)C = AC + BC$ (multiplication is distributive)
- $(a + b)^T C = a^T C + b^T C$
- $AB \neq BA$ (multiplication is not commutative in general)
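The rules above can be spot-checked on small integer matrices. The following is a minimal sketch in plain Python; the helper names `transpose` and `matmul` and the sample values are illustrative assumptions, not part of the original notes.

```python
def transpose(M):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*M)]

def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*B)] for row in A]

# Illustrative values (assumed); column vectors are stored as 2 x 1 matrices.
A = [[1, 2], [3, 4]]
B = [[0, 1], [5, -2]]
a = [[1], [2]]
c = [[3], [-1]]

# (AB)^T = B^T A^T
reverse_rule_holds = transpose(matmul(A, B)) == matmul(transpose(B), transpose(A))

# (a^T B c)^T = c^T B^T a; both sides are 1 x 1 matrices
lhs = transpose(matmul(matmul(transpose(a), B), c))
rhs = matmul(matmul(transpose(c), transpose(B)), a)
scalar_rule_holds = lhs == rhs

# AB != BA for this particular pair
not_commutative = matmul(A, B) != matmul(B, A)

print(reverse_rule_holds, scalar_rule_holds, not_commutative)
```

Integer entries keep every comparison exact, so no tolerance is needed here.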
Common Vector Derivatives
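| Scalar derivative $f(x) \to \frac{df}{dx}$ | Vector derivative $f(x) \to \frac{df}{dx}$ ($x$ a vector) |
|---|---|
| $bx \to b$ | $x^T B \to B$ |
| $bx \to b$ | $x^T b \to b$ |
| $x^2 \to 2x$ | $x^T x \to 2x$ |
| $bx^2 \to 2bx$ | $x^T B x \to (B + B^T)x$, which equals $2Bx$ when $B$ is symmetric |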
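A central finite difference gives a quick check of the quadratic-form entry. For a general (non-symmetric) $B$ the gradient of $x^T B x$ is $(B + B^T)x$, which reduces to $2Bx$ only when $B$ is symmetric. The sketch below uses plain Python; `quad`, `numeric_grad`, and the sample values are assumptions made for illustration.

```python
def quad(x, B):
    """Evaluate the quadratic form x^T B x."""
    n = len(x)
    return sum(x[i] * B[i][j] * x[j] for i in range(n) for j in range(n))

def numeric_grad(f, x, h=1e-6):
    """Central-difference approximation of the gradient of f at x."""
    grad = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        grad.append((f(xp) - f(xm)) / (2 * h))
    return grad

# Deliberately non-symmetric B (assumed sample values).
B = [[2.0, 1.0], [-3.0, 0.5]]
x = [1.5, -0.5]

# Analytic gradient (B + B^T) x
analytic = [sum((B[i][j] + B[j][i]) * x[j] for j in range(2)) for i in range(2)]
# The symmetric-only shortcut 2 B x, kept for comparison
two_Bx = [2 * sum(B[i][j] * x[j] for j in range(2)) for i in range(2)]

numeric = numeric_grad(lambda v: quad(v, B), x)
max_err = max(abs(p - q) for p, q in zip(analytic, numeric))
print(analytic, two_Bx, max_err)
```

With this $B$ the two formulas disagree, and the finite difference sides with $(B + B^T)x$.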
Common Derivative Rules
- $\frac{dg(u)}{dx} = \frac{dg(u)}{du}\frac{du}{dx}$
- $\frac{df(g(u))}{dx} = \frac{df(g)}{dg}\frac{dg(u)}{du}\frac{du}{dx}$
  - $u \to g \to f$, all vectors
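- $X \to Y = AX + B \to l = f(Y)$, $\frac{dl}{dX} = A^T \frac{dl}{dY}$ ("chain rule" for a linear function)
  - $X \to Y \to l$, where $l$ is a scalar and $X$, $Y$ are vectors/matrices
  - Proof: let $\frac{dl}{dX}$ be an $m \times 1$ vector, $\frac{dl}{dY}$ an $n \times 1$ vector, and $\frac{dY}{dX}$ an $n \times m$ matrix. Writing the gradients as rows, the chain rule gives
    $$\left(\frac{dl}{dX}\right)^T = \left(\frac{dl}{dY}\right)^T \frac{dY}{dX}$$
    and transposing both sides,
    $$\frac{dl}{dX} = \left(\frac{dY}{dX}\right)^T \frac{dl}{dY}$$
    Since $\frac{dY}{dX} = A$ for $Y = AX + B$, this is $A^T \frac{dl}{dY}$.
- $X \to Y = XA + B \to l = f(Y)$, $\frac{dl}{dX} = \frac{dl}{dY} A^T$
  - $X \to Y \to l$, where $l$ is a scalar and $X$, $Y$ are vectors/matrices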
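The linear chain rule $\frac{dl}{dX} = A^T \frac{dl}{dY}$ can be checked numerically. This sketch assumes a hypothetical loss $l = f(Y) = \sum_i Y_i^2$ (so $\frac{dl}{dY} = 2Y$) and a vector-valued $X$; the helper names and sample values are also assumptions.

```python
def matvec(A, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(p * q for p, q in zip(row, x)) for row in A]

def transpose(A):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]

def forward(X, A, B):
    """Linear map Y = A X + B."""
    return [p + q for p, q in zip(matvec(A, X), B)]

def loss(X, A, B):
    """Hypothetical scalar loss l = sum(Y_i^2) with Y = A X + B."""
    return sum(v * v for v in forward(X, A, B))

def numeric_grad(f, x, h=1e-6):
    """Central-difference approximation of the gradient of f at x."""
    grad = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        grad.append((f(xp) - f(xm)) / (2 * h))
    return grad

A = [[1.0, 2.0, 0.0], [-1.0, 0.5, 3.0]]  # n x m = 2 x 3
B = [0.5, -1.0]
X = [1.0, -2.0, 0.5]

dl_dY = [2 * v for v in forward(X, A, B)]   # gradient of the assumed loss
dl_dX = matvec(transpose(A), dl_dY)          # A^T dl/dY
numeric = numeric_grad(lambda v: loss(v, A, B), X)
max_err = max(abs(p - q) for p, q in zip(dl_dX, numeric))
print(dl_dX, max_err)
```

Note how the shapes line up: $\frac{dl}{dY}$ has $n = 2$ entries and $A^T$ maps it back to the $m = 3$ entries of $X$.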
Common Differential Rules
- $d(X^T) = (dX)^T$
- $d(X + Y) = dX + dY$
- $d(XY) = (dX)Y + X(dY)$
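The product rule keeps only the first-order part of the change in $XY$. For finite perturbations the exact change differs from $(dX)Y + X(dY)$ by the second-order term $(dX)(dY)$, which the sketch below confirms with exact integer arithmetic. The helper names and the sample matrices are assumptions for illustration.

```python
def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*B)] for row in A]

def add(A, B):
    """Elementwise sum of two matrices."""
    return [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(A, B)]

def sub(A, B):
    """Elementwise difference of two matrices."""
    return [[p - q for p, q in zip(ra, rb)] for ra, rb in zip(A, B)]

X = [[1, 2], [3, 4]]
Y = [[0, 1], [-1, 2]]
dX = [[1, 0], [2, -1]]   # finite perturbations (assumed values)
dY = [[3, 1], [0, 2]]

exact_change = sub(matmul(add(X, dX), add(Y, dY)), matmul(X, Y))
first_order = add(matmul(dX, Y), matmul(X, dY))   # (dX) Y + X (dY)
remainder = sub(exact_change, first_order)
second_order = matmul(dX, dY)

print(remainder == second_order)
```

As the perturbations shrink, the remainder shrinks quadratically, which is why it is dropped in the differential.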
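From Differentials to Derivatives
- $df = \left(\frac{df}{dx}\right)^T dx$ for a vector $x$
- $df = \mathrm{tr}\left(\left(\frac{df}{dX}\right)^T dX\right)$ for a matrix $X$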
First-order (Linear Products)
Let $f = x^T a$, compute $\frac{df}{dx}$
Solution
$$df = d(x^T)a = (dx)^T a = a^T dx \to \frac{df}{dx} = a$$
We also have $\frac{d(a^T x)}{dx} = a$
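Let $f = a^T X b$, compute $\frac{df}{dX}$
Solution
A scalar equals its own trace, and the trace is invariant under cyclic permutation. Identifying $df = \mathrm{tr}\left(\left(\frac{df}{dX}\right)^T dX\right)$,
$$df = a^T\,dX\,b = \mathrm{tr}(a^T\,dX\,b) = \mathrm{tr}(b a^T\,dX) = \mathrm{tr}\big((a b^T)^T dX\big) \to \frac{df}{dX} = a b^T$$
Let $f = a^T X^T b$, compute $\frac{df}{dX}$
Solution
$$df = a^T d(X^T) b = a^T (dX)^T b = (dX\,a)^T b = b^T\,dX\,a = \mathrm{tr}(a b^T\,dX) = \mathrm{tr}\big((b a^T)^T dX\big) \to \frac{df}{dX} = b a^T$$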
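Both first-order matrix results can be checked entry by entry with finite differences. The sketch below is plain Python; `bilinear`, `numeric_grad_matrix`, and the sample values are illustrative assumptions. It uses the fact that $a^T X^T b = b^T X a$, so one helper covers both cases.

```python
def bilinear(u, M, v):
    """Evaluate u^T M v for a matrix M stored as a list of rows."""
    return sum(u[i] * M[i][j] * v[j] for i in range(len(u)) for j in range(len(v)))

def numeric_grad_matrix(f, M, h=1e-6):
    """Central-difference gradient of a scalar f with respect to each entry of M."""
    G = [[0.0] * len(M[0]) for _ in M]
    for i in range(len(M)):
        for j in range(len(M[0])):
            Mp = [row[:] for row in M]
            Mm = [row[:] for row in M]
            Mp[i][j] += h
            Mm[i][j] -= h
            G[i][j] = (f(Mp) - f(Mm)) / (2 * h)
    return G

def max_abs_diff(P, Q):
    """Largest entrywise absolute difference between two matrices."""
    return max(abs(p - q) for rp, rq in zip(P, Q) for p, q in zip(rp, rq))

a = [1.0, -2.0]
b = [0.5, 3.0, -1.0]

# Case 1: f = a^T X b with X of shape 2 x 3, expected gradient a b^T
X1 = [[0.2, -0.4, 1.0], [1.5, 0.3, -0.7]]
expected1 = [[ai * bj for bj in b] for ai in a]
err1 = max_abs_diff(expected1, numeric_grad_matrix(lambda M: bilinear(a, M, b), X1))

# Case 2: f = a^T X^T b = b^T X a with X of shape 3 x 2, expected gradient b a^T
X2 = [[0.1, 0.9], [-1.2, 0.4], [2.0, -0.3]]
expected2 = [[bi * aj for aj in a] for bi in b]
err2 = max_abs_diff(expected2, numeric_grad_matrix(lambda M: bilinear(b, M, a), X2))

print(err1, err2)
```

In each case the gradient has the same shape as $X$, which is a useful sanity check on whether the answer should be $ab^T$ or $ba^T$.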
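Second-order (Quadratic Products)
Let $f = b^T X^T X c$, compute $\frac{df}{dX}$
Solution
$$df = b^T d(X^T) X c + b^T X^T\,dX\,c \tag{1}$$
From (1), each term is a scalar, so it equals its own transpose and its own trace:
$$b^T (dX)^T X c = (Xc)^T\,dX\,b = \mathrm{tr}\big(b (Xc)^T dX\big) = \mathrm{tr}(b c^T X^T dX) = \mathrm{tr}\big((X c b^T)^T dX\big) \tag{2}$$
$$b^T X^T\,dX\,c = \mathrm{tr}(c b^T X^T dX) = \mathrm{tr}\big((X b c^T)^T dX\big) \tag{3}$$
Substitute (2) and (3) into (1),
$$df = \mathrm{tr}\Big(\big((X c b^T)^T + (X b c^T)^T\big) dX\Big) \to \frac{df}{dX} = X(c b^T + b c^T)$$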
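The quadratic result $\frac{df}{dX} = X(cb^T + bc^T)$ can be verified the same way. This is a sketch in plain Python with assumed helper names and sample values; it evaluates $f = b^T X^T X c$ as the dot product of $Xb$ and $Xc$.

```python
def matvec(A, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(p * q for p, q in zip(row, x)) for row in A]

def f(X, b, c):
    """Evaluate b^T X^T X c as (X b) . (X c)."""
    return sum(p * q for p, q in zip(matvec(X, b), matvec(X, c)))

def numeric_grad_matrix(g, M, h=1e-6):
    """Central-difference gradient of a scalar g with respect to each entry of M."""
    G = [[0.0] * len(M[0]) for _ in M]
    for i in range(len(M)):
        for j in range(len(M[0])):
            Mp = [row[:] for row in M]
            Mm = [row[:] for row in M]
            Mp[i][j] += h
            Mm[i][j] -= h
            G[i][j] = (g(Mp) - g(Mm)) / (2 * h)
    return G

X = [[1.0, 2.0], [0.5, -1.0], [3.0, 0.0]]   # 3 x 2 (assumed values)
b = [1.0, -2.0]
c = [0.5, 4.0]
m, n = len(X), len(b)

# S = c b^T + b c^T, then analytic gradient X S
S = [[c[k] * b[j] + b[k] * c[j] for j in range(n)] for k in range(n)]
analytic = [[sum(X[i][k] * S[k][j] for k in range(n)) for j in range(n)] for i in range(m)]

numeric = numeric_grad_matrix(lambda M: f(M, b, c), X)
max_err = max(abs(analytic[i][j] - numeric[i][j]) for i in range(m) for j in range(n))
print(max_err)
```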
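Let $f = (X\theta - y)^T (X\theta - y)$, compute $\frac{df}{d\theta}$
Solution 1
$$df = d\big((X\theta - y)^T\big)(X\theta - y) + (X\theta - y)^T d(X\theta - y) \tag{1}$$
From (1), we need to compute $d\big((X\theta - y)^T\big)$ and $d(X\theta - y)$
$$d\big((X\theta - y)^T\big) = \big(d(X\theta - y)\big)^T = (X\,d\theta)^T \tag{2}$$
$$d(X\theta - y) = X\,d\theta \tag{3}$$
Apply (2) and (3) to (1),
$$df = (X\,d\theta)^T (X\theta - y) + (X\theta - y)^T X\,d\theta = (X\theta - y)^T X\,d\theta + (X\theta - y)^T X\,d\theta = 2(X\theta - y)^T X\,d\theta$$
From $df = \left(\frac{df}{dx}\right)^T dx$,
$$\frac{df}{d\theta} = 2X^T(X\theta - y)$$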
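Solution 2
By the chain rule for a linear function, the Jacobian of the inner map enters transposed:
$$\frac{df}{d\theta} = \left(\frac{d(X\theta - y)}{d\theta}\right)^T \frac{df}{d(X\theta - y)} \tag{1}$$
Let $u = X\theta - y$; we need to compute $\frac{d(u^T u)}{du}$ and $\frac{d(X\theta - y)}{d\theta}$
$$df = d(u^T)u + u^T du = (du)^T u + u^T du = u^T du + u^T du = 2u^T du$$
$$\frac{df}{du} = 2u \tag{2}$$
$$d(X\theta - y) = X\,d\theta$$
Considering $dz = \frac{dz}{dx}dx$ when $z$ and $x$ are vectors,
$$\frac{d(X\theta - y)}{d\theta} = X \tag{3}$$
Hence, applying (2) and (3) to (1),
$$\frac{df}{d\theta} = 2X^T(X\theta - y)$$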
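Both solutions arrive at $\frac{df}{d\theta} = 2X^T(X\theta - y)$, the gradient of the least-squares objective. A finite-difference check is sketched below in plain Python; the helper names and the small data set are assumptions for illustration only.

```python
def matvec(A, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(p * q for p, q in zip(row, x)) for row in A]

def residual(X, theta, y):
    """Residual vector X theta - y."""
    return [p - q for p, q in zip(matvec(X, theta), y)]

def objective(theta, X, y):
    """Least-squares objective (X theta - y)^T (X theta - y)."""
    return sum(r * r for r in residual(X, theta, y))

def numeric_grad(f, x, h=1e-6):
    """Central-difference approximation of the gradient of f at x."""
    grad = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        grad.append((f(xp) - f(xm)) / (2 * h))
    return grad

X = [[1.0, 0.5], [1.0, -1.0], [1.0, 2.0]]   # 3 samples, 2 features (assumed)
y = [1.0, 0.0, 3.0]
theta = [0.2, -0.4]

r = residual(X, theta, y)
Xt = [list(col) for col in zip(*X)]
analytic = [2 * v for v in matvec(Xt, r)]    # 2 X^T (X theta - y)
numeric = numeric_grad(lambda t: objective(t, X, y), theta)
max_err = max(abs(p - q) for p, q in zip(analytic, numeric))
print(analytic, max_err)
```

Setting this gradient to zero gives the normal equations $X^T X \theta = X^T y$.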
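Let $J(\theta) = y\log(\mathrm{sigmoid}(\theta^T X)) + (1 - y)\log(1 - \mathrm{sigmoid}(\theta^T X))$, compute $\frac{dJ(\theta)}{d\theta}$
Solution
Let $u = \mathrm{sigmoid}(\theta^T X)$, $v = \theta^T X$,
$$\frac{dJ}{d\theta} = \frac{d\left[y\log(u) + (1 - y)\log(1 - u)\right]}{du}\,\frac{du}{dv}\,\frac{dv}{d\theta}$$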
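Evaluating the three factors of this chain rule gives $\frac{dJ}{du} = \frac{y}{u} - \frac{1 - y}{1 - u}$, $\frac{du}{dv} = u(1 - u)$, and $\frac{dv}{d\theta} = X$, whose product simplifies to $(y - u)X$. The sketch below checks that closed form against a finite difference for a single example. It assumes the standard sigmoid $1/(1 + e^{-v})$, writes the feature vector as `x`, and uses made-up sample values.

```python
import math

def sigmoid(v):
    """Standard logistic function 1 / (1 + exp(-v))."""
    return 1.0 / (1.0 + math.exp(-v))

def dot(p, q):
    """Dot product of two vectors."""
    return sum(a * b for a, b in zip(p, q))

def J(theta, x, y):
    """Log-likelihood of one example: y log(u) + (1 - y) log(1 - u)."""
    u = sigmoid(dot(theta, x))
    return y * math.log(u) + (1 - y) * math.log(1 - u)

def numeric_grad(f, t, h=1e-6):
    """Central-difference approximation of the gradient of f at t."""
    grad = []
    for i in range(len(t)):
        tp, tm = list(t), list(t)
        tp[i] += h
        tm[i] -= h
        grad.append((f(tp) - f(tm)) / (2 * h))
    return grad

theta = [0.3, -0.2, 0.1]   # assumed parameter values
x = [1.0, 2.0, -1.5]       # assumed feature vector
y = 1.0                    # assumed label

u = sigmoid(dot(theta, x))
analytic = [(y - u) * xi for xi in x]   # (y - u) x
numeric = numeric_grad(lambda t: J(t, x, y), theta)
max_err = max(abs(p - q) for p, q in zip(analytic, numeric))
print(analytic, max_err)
```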