Regularization for Linear and Logistic Regression (Notes)¶
There are three ways to address overfitting:
(1) Collect more training data.
(2) Use fewer features.
(3) Regularization.
In general, regularization is applied only to the weights w, not to the bias b.
Regularization for Linear Regression¶
Cost Function¶
\[
J(\mathbf{w},b)=\frac{1}{2m}\sum_{i=0}^{m-1}\Big(f_{\mathbf{w},b}(\mathbf{x}^{(i)})-y^{(i)}\Big)^2
\; +\; \frac{\lambda}{2m}\sum_{j=0}^{n-1}w_j^2 \tag{1}
\]
- Equation (1) is the regularized cost function for linear regression: the first term is the original squared-error cost and the second is the regularization term. Adding the regularization term encourages gradient descent to shrink the components of w, which reduces overfitting. Here $f_{\mathbf{w},b}(\mathbf{x}^{(i)})=\mathbf{w}\cdot\mathbf{x}^{(i)}+b$.
import numpy as np

# Compute the regularized cost for linear regression
def compute_cost_linear_reg(X, y, w, b, lambda_ = 1):
    m = X.shape[0]
    n = len(w)
    cost = 0.
    # Loop over all examples to sum the squared errors
    for i in range(m):
        f_wb_i = np.dot(X[i], w) + b
        cost = cost + (f_wb_i - y[i])**2
    cost = cost / (2 * m)                      # original (unregularized) cost
    # Loop over all weights to sum the regularization term
    reg_cost = 0
    for j in range(n):
        reg_cost += (w[j]**2)
    reg_cost = (lambda_/(2*m)) * reg_cost      # regularization term
    # Total regularized cost
    total_cost = cost + reg_cost
    return total_cost
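A minimal usage sketch (the X_tmp, y_tmp, w_tmp, b_tmp values below are made up for illustration, not from the original notes): with lambda_ = 0 the function returns the plain squared-error cost, and a positive lambda_ adds the weight penalty on top.
# Hypothetical example data, only to exercise compute_cost_linear_reg
X_tmp = np.array([[1.0, 2.0], [2.0, 0.5], [3.0, 1.5]])
y_tmp = np.array([3.0, 2.0, 4.0])
w_tmp = np.array([0.5, -0.2])
b_tmp = 0.1
print(compute_cost_linear_reg(X_tmp, y_tmp, w_tmp, b_tmp, lambda_=0.0))  # unregularized cost
print(compute_cost_linear_reg(X_tmp, y_tmp, w_tmp, b_tmp, lambda_=0.7))  # adds (0.7/(2m)) * sum(w_j**2)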
Gradient Computation¶
\[
\begin{aligned}
\text{repeat until convergence:}\\
w_j &:= w_j-\alpha\,\frac{\partial J(\mathbf{w},b)}{\partial w_j},\qquad j=0,\dots,n-1,\\
b &:= b-\alpha\,\frac{\partial J(\mathbf{w},b)}{\partial b}.
\end{aligned}\tag{2}
\]
- Equation (2) is the basic gradient descent update.
\[
\frac{\partial J}{\partial w_j}
=\frac{1}{m}\sum_{i=0}^{m-1}\Big(f_{\mathbf{w},b}(\mathbf{x}^{(i)})-y^{(i)}\Big)\,x^{(i)}_j
+\frac{\lambda}{m}w_j\qquad\\
\frac{\partial J}{\partial b}
=\frac{1}{m}\sum_{i=0}^{m-1}\Big(f_{\mathbf{w},b}(\mathbf{x}^{(i)})-y^{(i)}\Big).\tag{3}
\]
- Equation (3) gives the gradients of the regularized cost for linear regression.
# Compute the gradient of the regularized linear regression cost
def compute_gradient_linear_reg(X, y, w, b, lambda_):
    m, n = X.shape
    dj_dw = np.zeros((n,))
    dj_db = 0.
    for i in range(m):
        err = (np.dot(X[i], w) + b) - y[i]          # error between prediction and target for example i
        for j in range(n):
            dj_dw[j] = dj_dw[j] + err * X[i, j]     # accumulate the unregularized gradient for each w_j
        dj_db = dj_db + err
    dj_dw = dj_dw / m                               # unregularized gradients
    dj_db = dj_db / m
    # Add the regularization term to each weight gradient
    for j in range(n):
        dj_dw[j] = dj_dw[j] + (lambda_/m) * w[j]
    return dj_db, dj_dw
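To connect equation (2) with the two functions above, here is a minimal gradient-descent sketch; the helper name gradient_descent_linear_reg, the learning rate alpha, the iteration count, and the starting values are all arbitrary choices for illustration, not from the original notes.
# Minimal gradient descent loop for regularized linear regression (equation (2))
def gradient_descent_linear_reg(X, y, w, b, alpha, num_iters, lambda_):
    for _ in range(num_iters):
        dj_db, dj_dw = compute_gradient_linear_reg(X, y, w, b, lambda_)
        w = w - alpha * dj_dw        # update all w_j simultaneously
        b = b - alpha * dj_db        # update b (no regularization term)
    return w, b

# Hypothetical run on the small example data from the earlier sketch
w_out, b_out = gradient_descent_linear_reg(X_tmp, y_tmp, np.zeros(2), 0.0,
                                           alpha=0.01, num_iters=1000, lambda_=0.7)
print(w_out, b_out, compute_cost_linear_reg(X_tmp, y_tmp, w_out, b_out, lambda_=0.7))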
Regularization for Logistic Regression¶
Cost Function¶
\[
J(\mathbf{w},b)=\frac{1}{m}\sum_{i=0}^{m-1}\Big[
-y^{(i)}\log f_{\mathbf{w},b}(\mathbf{x}^{(i)})
-\big(1-y^{(i)}\big)\log\big(1-f_{\mathbf{w},b}(\mathbf{x}^{(i)})\big)
\Big]
\; +\; \frac{\lambda}{2m}\sum_{j=0}^{n-1}w_j^2. \tag{4}
\]
- The regularized cost function for logistic regression is analogous to the linear case: the first term is the original logistic loss and the second is the regularization term, where $f_{\mathbf{w},b}(\mathbf{x}^{(i)})=\sigma\left(\mathbf{w}\cdot\mathbf{x}^{(i)}+b\right)$. As before, only w is regularized.
# Sigmoid function used by the logistic cost (standard logistic function)
def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Compute the regularized cost for logistic regression
def compute_cost_logistic_reg(X, y, w, b, lambda_ = 1):
    m, n = X.shape
    cost = 0.
    # Sum the unregularized logistic loss over all examples
    for i in range(m):
        z_i = np.dot(X[i], w) + b
        f_wb_i = sigmoid(z_i)
        cost += -y[i]*np.log(f_wb_i) - (1-y[i])*np.log(1-f_wb_i)
    cost = cost / m
    # Regularization term over the weights only
    reg_cost = 0
    for j in range(n):
        reg_cost += (w[j]**2)
    reg_cost = (lambda_/(2*m)) * reg_cost
    # Total regularized cost
    total_cost = cost + reg_cost
    return total_cost
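A quick sketch showing how the regularization term changes the logistic cost (the data below are hypothetical; labels must be 0 or 1):
# Hypothetical binary-classification data, only to exercise compute_cost_logistic_reg
X_tmp = np.array([[0.5, 1.5], [1.0, 1.0], [1.5, 0.5], [3.0, 0.5]])
y_tmp = np.array([0, 0, 1, 1])
w_tmp = np.array([0.5, -0.5])
b_tmp = 0.5
print(compute_cost_logistic_reg(X_tmp, y_tmp, w_tmp, b_tmp, lambda_=0.0))  # unregularized logistic cost
print(compute_cost_logistic_reg(X_tmp, y_tmp, w_tmp, b_tmp, lambda_=1.0))  # adds (1/(2m)) * sum(w_j**2)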
Gradient Computation¶
\[
\begin{aligned}
\frac{\partial J}{\partial w_j}
&= \frac{1}{m}\sum_{i=0}^{m-1}\Big(f_{\mathbf{w},b}(\mathbf{x}^{(i)})-y^{(i)}\Big)\,x^{(i)}_j
{}+ \frac{\lambda}{m}w_j,\\
\frac{\partial J}{\partial b}
&= \frac{1}{m}\sum_{i=0}^{m-1}\Big(f_{\mathbf{w},b}(\mathbf{x}^{(i)})-y^{(i)}\Big).
\end{aligned}\tag{5}
\]
- Equation (2) is still the basic gradient descent update, and equation (5) gives the gradients for regularized logistic regression. Equation (5) has the same form as (3); only the definition of \(f\) differs.
# Compute the gradient of the regularized logistic regression cost
def compute_gradient_logistic_reg(X, y, w, b, lambda_):
    m, n = X.shape
    dj_dw = np.zeros((n,))
    dj_db = 0.0
    for i in range(m):
        f_wb_i = sigmoid(np.dot(X[i], w) + b)       # sigmoid prediction for example i
        err_i = f_wb_i - y[i]                       # error between prediction and target
        for j in range(n):
            dj_dw[j] = dj_dw[j] + err_i * X[i, j]   # accumulate the unregularized gradient for each w_j
        dj_db = dj_db + err_i
    dj_dw = dj_dw / m
    dj_db = dj_db / m
    # Add the regularization term to each weight gradient
    for j in range(n):
        dj_dw[j] = dj_dw[j] + (lambda_/m) * w[j]
    return dj_db, dj_dw
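The gradient function can be exercised the same way, reusing the hypothetical data from the previous sketch; note that only dj_dw picks up the regularization term, while dj_db is the same as in the unregularized case.
# Gradients on the hypothetical logistic-regression data above
dj_db_tmp, dj_dw_tmp = compute_gradient_logistic_reg(X_tmp, y_tmp, w_tmp, b_tmp, lambda_=0.7)
print(dj_db_tmp)   # identical to the unregularized gradient for b
print(dj_dw_tmp)   # each component shifted by (0.7/m) * w_j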
Summary¶
\[
\nabla_{\mathbf{w}}J
=\frac{1}{m}\,\mathbf{X}^{\!\top}\big(\mathbf{f}-\mathbf{y}\big)+\frac{\lambda}{m}\,\mathbf{w}
\qquad\\
\frac{\partial J}{\partial b}
=\frac{1}{m}\,\mathbf{1}^{\!\top}\big(\mathbf{f}-\mathbf{y}\big).\tag{6}
\]
- Equation (6) is the vectorized form of the gradients, where \(\mathbf{f}\) is the vector of predictions \(f_{\mathbf{w},b}(\mathbf{x}^{(i)})\). At the implementation level, the only difference between the regularized and unregularized versions is the extra term \(\frac{\lambda}{m}w_j\) added to \(\partial J/\partial w_j\).
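As a NumPy sketch of equation (6) (the function name compute_gradients_vectorized is hypothetical, not from the original notes): the same vectorized code serves both models, only the prediction vector \(\mathbf{f}\) changes.
# Vectorized gradients of equation (6); f is the vector of predictions f_wb(x^(i))
def compute_gradients_vectorized(X, y, f, w, lambda_):
    m = X.shape[0]
    err = f - y
    dj_dw = (X.T @ err) / m + (lambda_ / m) * w   # gradient w.r.t. w, with the regularization term
    dj_db = np.sum(err) / m                       # gradient w.r.t. b, no regularization
    return dj_db, dj_dw

# For linear regression:   f = X @ w + b
# For logistic regression: f = sigmoid(X @ w + b)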