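What is an affine function, and how does it differ from a linear function?
The two are defined by different formulas:
- Affine function: \(f(x)=Ax+b\), where \(A\) is a nonzero matrix (a vector in the scalar-output case) and \(b\) is a constant offset.
- Linear function: \(f(x)=Ax\), where \(A\) is a nonzero matrix (a vector in the scalar-output case).
So a linear function is the special case of an affine function with \(b=0\).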
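One practical way to see the difference is additivity: a linear function satisfies \(f(x+y)=f(x)+f(y)\), while an affine function with \(b \neq 0\) does not. The sketch below checks this with NumPy. The values of `A`, `b`, `x`, and `y` are arbitrary and chosen only for illustration.

```python
import numpy as np

# Arbitrary example values (assumptions for illustration only).
A = np.array([[1.0, 2.0], [3.0, 4.0]])
b = np.array([0.5, -1.0])

def linear(x):
    # f(x) = Ax
    return A @ x

def affine(x):
    # f(x) = Ax + b
    return A @ x + b

x = np.array([1.0, 0.0])
y = np.array([0.0, 2.0])

# Additivity holds for the linear map but fails for the affine map,
# because affine(x) + affine(y) contains the offset b twice.
linear_is_additive = np.allclose(linear(x + y), linear(x) + linear(y))
affine_is_additive = np.allclose(affine(x + y), affine(x) + affine(y))
print(linear_is_additive, affine_is_additive)  # True False
```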
Problem setup
Given a scalar error \(e\) and a function \(Y=f(Q)\), suppose \(\frac{\partial e}{\partial Q}\) is known: it is a matrix of shape \((2,4)\), and we write \(\frac{\partial e}{\partial q_{ij}}\) for one of its entries. The affine map is \(Q(X;W)=XW + B\), where \(X\) is a matrix of shape \((2,3)\) with entries \(x_{ij}\); \(W\) is a matrix of shape \((3,4)\) with entries \(w_{ij}\); and \(B\) is a vector of shape \((4,)\) with entries \(b_{i}\). We want \(\frac{\partial e}{\partial W}\), \(\frac{\partial e}{\partial X}\), and \(\frac{\partial e}{\partial B}\).
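Derivatives
Derivative of e with respect to W
By the chain rule, \(\frac{\partial e}{\partial W}\) combines the known \(\frac{\partial e}{\partial Q}\) with \(\frac{\partial Q}{\partial W}\), so the key is working out how each entry of \(Q\) depends on the entries of \(W\).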
The matrix \(X\) is:
\[ X = \left[{\begin{array}{ccc} x_{00} & x_{01} & x_{02} \\ x_{10} & x_{11} & x_{12} \end{array}}\right] \]
The matrix \(W\) is:
\[ W = \left[{\begin{array}{cccc} w_{00} & w_{01} & w_{02} & w_{03} \\ w_{10} & w_{11} & w_{12} & w_{13} \\ w_{20} & w_{21} & w_{22} & w_{23} \end{array}}\right] \]
The vector \(B\) is:
\[ B = \left[{\begin{array}{c} b_0 \\ b_1 \\ b_2 \\ b_3 \end{array}}\right] \]
The matrix \(Q\) is:
\[ Q = \left[{\begin{array}{cccc} w_{00}x_{00} + w_{10}x_{01} + w_{20}x_{02} + b_0 & w_{01}x_{00} + w_{11}x_{01} + w_{21}x_{02} + b_1 & w_{02}x_{00} + w_{12}x_{01} + w_{22}x_{02} + b_2 & w_{03}x_{00} + w_{13}x_{01} + w_{23}x_{02} + b_3 \\ w_{00}x_{10} + w_{10}x_{11} + w_{20}x_{12} + b_0 & w_{01}x_{10} + w_{11}x_{11} + w_{21}x_{12} + b_1 & w_{02}x_{10} + w_{12}x_{11} + w_{22}x_{12} + b_2 & w_{03}x_{10} + w_{13}x_{11} + w_{23}x_{12} + b_3 \\ \end{array}}\right] \]
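The expanded form of \(Q\) can be checked numerically. The sketch below fills `X`, `W`, and `B` with random values (an assumption, the text only fixes their shapes), computes `Q = X @ W + B`, and compares two entries against the sums written out above.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((2, 3))  # shape (2, 3)
W = rng.standard_normal((3, 4))  # shape (3, 4)
B = rng.standard_normal(4)       # shape (4,)

# Forward pass of the affine map; B is broadcast over the rows.
Q = X @ W + B

# Entries written out by hand, following the expanded matrix above.
q00 = W[0, 0] * X[0, 0] + W[1, 0] * X[0, 1] + W[2, 0] * X[0, 2] + B[0]
q13 = W[0, 3] * X[1, 0] + W[1, 3] * X[1, 1] + W[2, 3] * X[1, 2] + B[3]

print(Q.shape, np.isclose(Q[0, 0], q00), np.isclose(Q[1, 3], q13))
```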
The matrix \(\frac{\partial e}{\partial Q}\) is:
\[ \frac{\partial e}{\partial Q} = \left[{\begin{array}{cccc} \frac{\partial e}{\partial q_{00}} & \frac{\partial e}{\partial q_{01}} & \frac{\partial e}{\partial q_{02}} & \frac{\partial e}{\partial q_{03}} \\ \frac{\partial e}{\partial q_{10}} & \frac{\partial e}{\partial q_{11}} & \frac{\partial e}{\partial q_{12}} & \frac{\partial e}{\partial q_{13}} \end{array}}\right] \]
Before computing \(\frac{\partial e}{\partial W}\), we first compute \(\frac{\partial e}{\partial w_{00}}\). Only \(q_{00}\) and \(q_{10}\) contain \(w_{00}\), with coefficients \(x_{00}\) and \(x_{10}\) respectively, so:
\[ \frac{\partial e}{\partial w_{00}} = \sum_{ij}\frac{\partial e}{\partial q_{ij}}\frac{\partial q_{ij}}{\partial w_{00}} = \frac{\partial e}{\partial q_{00}}x_{00} + \frac{\partial e}{\partial q_{10}}x_{10} \]
In the same way, we obtain:
\[ \frac{\partial e}{\partial W} = \left[{\begin{array}{cccc} \frac{\partial e}{\partial q_{00}}x_{00} + \frac{\partial e}{\partial q_{10}}x_{10} & \frac{\partial e}{\partial q_{01}}x_{00} + \frac{\partial e}{\partial q_{11}}x_{10} & \frac{\partial e}{\partial q_{02}}x_{00} + \frac{\partial e}{\partial q_{12}}x_{10} & \frac{\partial e}{\partial q_{03}}x_{00} + \frac{\partial e}{\partial q_{13}}x_{10} \\ \frac{\partial e}{\partial q_{00}}x_{01} + \frac{\partial e}{\partial q_{10}}x_{11} & \frac{\partial e}{\partial q_{01}}x_{01} + \frac{\partial e}{\partial q_{11}}x_{11} & \frac{\partial e}{\partial q_{02}}x_{01} + \frac{\partial e}{\partial q_{12}}x_{11} & \frac{\partial e}{\partial q_{03}}x_{01} + \frac{\partial e}{\partial q_{13}}x_{11} \\ \frac{\partial e}{\partial q_{00}}x_{02} + \frac{\partial e}{\partial q_{10}}x_{12} & \frac{\partial e}{\partial q_{01}}x_{02} + \frac{\partial e}{\partial q_{11}}x_{12} & \frac{\partial e}{\partial q_{02}}x_{02} + \frac{\partial e}{\partial q_{12}}x_{12} & \frac{\partial e}{\partial q_{03}}x_{02} + \frac{\partial e}{\partial q_{13}}x_{12} \end{array}}\right] \]
This is exactly:
\[ \frac{\partial e}{\partial W} = X^T\frac{\partial e}{\partial Q} \]
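As a sanity check, the formula \(\frac{\partial e}{\partial W} = X^T\frac{\partial e}{\partial Q}\) can be compared against central finite differences. The text does not specify \(e\), so the sketch below assumes \(e = \frac{1}{2}\sum_{ij} q_{ij}^2\), for which \(\frac{\partial e}{\partial Q} = Q\). The input values are random, and `error` is a hypothetical helper name.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((2, 3))
W = rng.standard_normal((3, 4))
B = rng.standard_normal(4)

def error(W_):
    # Assumed error for illustration: e = 0.5 * sum(Q**2).
    Q = X @ W_ + B
    return 0.5 * np.sum(Q ** 2)

# For this choice of e, de/dQ equals Q itself.
G = X @ W + B
dW_formula = X.T @ G  # the matrix formula derived above

# Central finite differences, one entry of W at a time.
dW_numeric = np.zeros_like(W)
eps = 1e-6
for m in range(W.shape[0]):
    for n in range(W.shape[1]):
        step = np.zeros_like(W)
        step[m, n] = eps
        dW_numeric[m, n] = (error(W + step) - error(W - step)) / (2 * eps)

print(np.allclose(dW_formula, dW_numeric, atol=1e-5))
```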
Derivative of e with respect to X
Before computing \(\frac{\partial e}{\partial X}\), we first compute \(\frac{\partial e}{\partial x_{00}}\). Only the first row of \(Q\) contains \(x_{00}\), so:
\[ \frac{\partial e}{\partial x_{00}} = \sum_{ij}\frac{\partial e}{\partial q_{ij}}\frac{\partial q_{ij}}{\partial x_{00}} = \frac{\partial e}{\partial q_{00}}w_{00} + \frac{\partial e}{\partial q_{01}}w_{01} + \frac{\partial e}{\partial q_{02}}w_{02} + \frac{\partial e}{\partial q_{03}}w_{03} \]
In the same way, we obtain:
\[ \frac{\partial e}{\partial X} = \left[{\begin{array}{ccc} \frac{\partial e}{\partial q_{00}}w_{00} + \frac{\partial e}{\partial q_{01}}w_{01} + \frac{\partial e}{\partial q_{02}}w_{02} + \frac{\partial e}{\partial q_{03}}w_{03} & \frac{\partial e}{\partial q_{00}}w_{10} + \frac{\partial e}{\partial q_{01}}w_{11} + \frac{\partial e}{\partial q_{02}}w_{12} + \frac{\partial e}{\partial q_{03}}w_{13} & \frac{\partial e}{\partial q_{00}}w_{20} + \frac{\partial e}{\partial q_{01}}w_{21} + \frac{\partial e}{\partial q_{02}}w_{22} + \frac{\partial e}{\partial q_{03}}w_{23} \\ \frac{\partial e}{\partial q_{10}}w_{00} + \frac{\partial e}{\partial q_{11}}w_{01} + \frac{\partial e}{\partial q_{12}}w_{02} + \frac{\partial e}{\partial q_{13}}w_{03} & \frac{\partial e}{\partial q_{10}}w_{10} + \frac{\partial e}{\partial q_{11}}w_{11} + \frac{\partial e}{\partial q_{12}}w_{12} + \frac{\partial e}{\partial q_{13}}w_{13} & \frac{\partial e}{\partial q_{10}}w_{20} + \frac{\partial e}{\partial q_{11}}w_{21} + \frac{\partial e}{\partial q_{12}}w_{22} + \frac{\partial e}{\partial q_{13}}w_{23} \end{array}}\right] \]
This is exactly:
\[ \frac{\partial e}{\partial X} = \frac{\partial e}{\partial Q}W^T \]
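The entrywise sums and the matrix product \(\frac{\partial e}{\partial Q}W^T\) can be compared directly. In the sketch below, `G` is a random stand-in for \(\frac{\partial e}{\partial Q}\) (an assumption, since the text treats it as given), and the triple loop spells out \(\frac{\partial e}{\partial x_{ik}} = \sum_{n}\frac{\partial e}{\partial q_{in}}w_{kn}\).

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.standard_normal((3, 4))
G = rng.standard_normal((2, 4))  # stand-in for de/dQ, shape (2, 4)

# Entry (i, k) of de/dX: sum over n of de/dq_in * w_kn.
dX_loops = np.zeros((2, 3))
for i in range(2):
    for k in range(3):
        for n in range(4):
            dX_loops[i, k] += G[i, n] * W[k, n]

# The same result as a single matrix product.
dX_matrix = G @ W.T

print(np.allclose(dX_loops, dX_matrix))  # True
```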
Derivative of e with respect to B
First we need to understand how this addition works. \(B\) has shape \((4,)\):
\[ B = \left[{\begin{array}{cccc} b_0 & b_1 & b_2 & b_3 \end{array}}\right]^T \]
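As written, this cannot be added entry by entry to a matrix of shape \((2,4)\). In NumPy, broadcasting in effect expands it to:
\[ B_{\text{broadcast}} = \left[{\begin{array}{cccc} b_0 & b_1 & b_2 & b_3 \\ b_0 & b_1 & b_2 & b_3 \end{array}}\right] \]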
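The broadcasting step can be made visible with `np.broadcast_to`, which produces the expanded \((2,4)\) view explicitly. The values of `B` and the all-zero `XW` below are placeholders chosen only for illustration.

```python
import numpy as np

B = np.array([10.0, 20.0, 30.0, 40.0])  # shape (4,)
XW = np.zeros((2, 4))                   # placeholder for the product X @ W

# Explicitly expand B to shape (2, 4): every row is a copy of B.
B_expanded = np.broadcast_to(B, (2, 4))

# Adding B directly relies on the same expansion happening implicitly.
Q = XW + B

print(B_expanded)
print(np.array_equal(Q, XW + B_expanded))  # True
```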
With the expanded form, computing \(\frac{\partial e}{\partial B}\) is straightforward. Each \(b_j\) appears with coefficient 1 in every entry of column \(j\) of \(Q\), so we first compute \(\frac{\partial e}{\partial b_0}\):
\[ \frac{\partial e}{\partial b_0} = \sum_{ij}\frac{\partial e}{\partial q_{ij}}\frac{\partial q_{ij}}{\partial b_{0}} = \frac{\partial e}{\partial q_{00}} + \frac{\partial e}{\partial q_{10}} \]
In the same way, we obtain:
\[ \frac{\partial e}{\partial B} = \left[{\begin{array}{cccc} \frac{\partial e}{\partial q_{00}} + \frac{\partial e}{\partial q_{10}} & \frac{\partial e}{\partial q_{01}} + \frac{\partial e}{\partial q_{11}} & \frac{\partial e}{\partial q_{02}} + \frac{\partial e}{\partial q_{12}} & \frac{\partial e}{\partial q_{03}} + \frac{\partial e}{\partial q_{13}} \end{array}}\right]^T \]
This is exactly:
\[ \frac{\partial e}{\partial B} = \texttt{np.sum}\left(\frac{\partial e}{\partial Q},\ \texttt{axis}=0\right) \]
Summary
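The core formula is \(\frac{\partial e}{\partial w_{mn}} = \sum_{ij}\frac{\partial e}{\partial q_{ij}}\frac{\partial q_{ij}}{\partial w_{mn}}\). With it, there is no need to differentiate a matrix with respect to a matrix, and the derivation is much easier to follow.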
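The three results can be collected into one backward function and checked against the entrywise chain rule. This is only a sketch: `affine_backward` is a hypothetical helper name, `G` is a random stand-in for \(\frac{\partial e}{\partial Q}\), and the loops use the fact that \(\frac{\partial q_{ij}}{\partial w_{mn}}\) equals \(x_{im}\) when \(j=n\) and 0 otherwise, while \(\frac{\partial q_{ij}}{\partial b_{n}}\) equals 1 when \(j=n\) and 0 otherwise.

```python
import numpy as np

def affine_backward(G, X, W):
    """Gradients of e for Q = X @ W + B, given G = de/dQ (hypothetical helper)."""
    dX = G @ W.T            # shape (2, 3)
    dW = X.T @ G            # shape (3, 4)
    dB = np.sum(G, axis=0)  # shape (4,)
    return dX, dW, dB

rng = np.random.default_rng(2)
X = rng.standard_normal((2, 3))
W = rng.standard_normal((3, 4))
G = rng.standard_normal((2, 4))  # stand-in for de/dQ

dX, dW, dB = affine_backward(G, X, W)

# Entrywise chain rule: only terms with j == n survive in the sum over (i, j).
dW_sum = np.zeros_like(W)
dB_sum = np.zeros(4)
for i in range(2):
    for j in range(4):
        dB_sum[j] += G[i, j]  # dq_ij/db_j = 1
        for m in range(3):
            dW_sum[m, j] += G[i, j] * X[i, m]  # dq_ij/dw_mj = x_im

print(np.allclose(dW, dW_sum), np.allclose(dB, dB_sum))  # True True
```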
If you find any mistakes, please email me at ye.wenjie@outlook.com. Thank you very much.