Gradient Descent for Logistic Regression: Formulas and Code Implementation (Notes)
1. The cost function of logistic regression
\[
\begin{align}
f_{\mathbf{w},b}(\mathbf{x^{(i)}}) &= g(z^{(i)})\tag{1} \\
z^{(i)} &= \mathbf{w} \cdot \mathbf{x}^{(i)}+ b\tag{2} \\
g(z^{(i)}) &= \frac{1}{1+e^{-z^{(i)}}}\tag{3}
\end{align}
\]
\[
J(\mathbf{w},b) = \frac{1}{m} \sum_{i=0}^{m-1} \left[ loss(f_{\mathbf{w},b}(\mathbf{x}^{(i)}), y^{(i)}) \right] \tag{4}
\]
- Equations (1)-(3) define the logistic regression model; equation (4) is the cost function of logistic regression.
\[
loss(f_{\mathbf{w},b}(\mathbf{x}^{(i)}), y^{(i)}) = -y^{(i)}\log\!\big(f_{\mathbf{w},b}\!\left( \mathbf{x}^{(i)} \right)\big) - \big(1-y^{(i)}\big)\log\!\Big(1-f_{\mathbf{w},b}\!\left( \mathbf{x}^{(i)} \right)\Big) \tag{5}
\]
- Equation (5) is the loss for a single training example; the cost in (4) is obtained by summing the losses of all examples and dividing by the number of examples m.
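- Because each label y is either 0 or 1, the loss in (5) reduces to a single term for every example:
\[
loss(f_{\mathbf{w},b}(\mathbf{x}^{(i)}), y^{(i)}) =
\begin{cases}
-\log\!\big(f_{\mathbf{w},b}(\mathbf{x}^{(i)})\big) & \text{if } y^{(i)} = 1 \\[4pt]
-\log\!\big(1 - f_{\mathbf{w},b}(\mathbf{x}^{(i)})\big) & \text{if } y^{(i)} = 0
\end{cases}
\]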
import numpy as np

def sigmoid(z):
    # Sigmoid g(z) = 1 / (1 + e^(-z)), equation (3)
    return 1.0 / (1.0 + np.exp(-z))

# Compute the logistic regression cost, equation (4)
def compute_cost_logistic(X, y, w, b):
    m = X.shape[0]
    cost = 0.0
    for i in range(m):
        z_i = np.dot(X[i], w) + b     # linear part, equation (2)
        f_wb_i = sigmoid(z_i)         # prediction for example i, equation (1)
        cost += -y[i]*np.log(f_wb_i) - (1 - y[i])*np.log(1 - f_wb_i)  # add the per-example loss, equation (5)
    cost = cost / m                   # average over all examples to get the cost
    return cost
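A quick sanity check (not in the original notes): the toy inputs below are made-up values, used only to show how compute_cost_logistic is called.

# Toy data for illustration only: 4 examples with 2 features each
X_tmp = np.array([[0.5, 1.5], [1.0, 1.0], [1.5, 0.5], [3.0, 0.5]])
y_tmp = np.array([0., 0., 1., 1.])
w_tmp = np.array([1.0, 1.0])
b_tmp = -3.0
print("cost:", compute_cost_logistic(X_tmp, y_tmp, w_tmp, b_tmp))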
2. Computing the gradient
\[
\begin{align}
\frac{\partial J(\mathbf w,b)}{\partial w_j}
&= \frac{1}{m}\sum_{i=0}^{m-1}\big(f_{\mathbf w,b}(\mathbf x^{(i)})-y^{(i)}\big)\,x_j^{(i)} \tag{6}\\
\frac{\partial J(\mathbf w,b)}{\partial b}
&= \frac{1}{m}\sum_{i=0}^{m-1}\big(f_{\mathbf w,b}(\mathbf x^{(i)})-y^{(i)}\big) \tag{7}
\end{align}
\]
- Equations (6) and (7) are the derivatives of the cost J with respect to an individual w_j and to b. They have the same form as the linear regression gradients, but here f is the sigmoid model defined by equations (1)-(3).
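- Why the form is the same: using g'(z) = g(z)(1 - g(z)), the derivative of the per-example loss (5) with respect to z simplifies to the prediction error, and multiplying by the derivative of z with respect to each parameter then gives (6) and (7):
\[
\frac{\partial\, loss}{\partial z^{(i)}}
= \left(-\frac{y^{(i)}}{f_{\mathbf{w},b}(\mathbf{x}^{(i)})} + \frac{1-y^{(i)}}{1-f_{\mathbf{w},b}(\mathbf{x}^{(i)})}\right) f_{\mathbf{w},b}(\mathbf{x}^{(i)})\big(1-f_{\mathbf{w},b}(\mathbf{x}^{(i)})\big)
= f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)}
\]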
# Compute the gradient of the logistic cost with respect to the parameters, equations (6) and (7)
def compute_gradient_logistic(X, y, w, b):
    m, n = X.shape
    dj_dw = np.zeros((n,))
    dj_db = 0.
    for i in range(m):
        f_wb_i = sigmoid(np.dot(X[i], w) + b)   # prediction for example i
        err_i = f_wb_i - y[i]                   # error between prediction and target
        for j in range(n):
            dj_dw[j] = dj_dw[j] + err_i * X[i, j]
        dj_db = dj_db + err_i
    dj_dw = dj_dw / m                           # gradient for each feature weight
    dj_db = dj_db / m                           # gradient for the bias
    return dj_db, dj_dw
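The double loop above mirrors equations (6) and (7) term by term. As a side note (not part of the original notes), the same gradients can be computed without explicit loops; a minimal vectorized sketch, assuming the sigmoid defined earlier:

# Vectorized equivalent of compute_gradient_logistic (sketch, not from the original notes)
def compute_gradient_logistic_vec(X, y, w, b):
    m = X.shape[0]
    f_wb = sigmoid(X @ w + b)      # predictions for all m examples at once
    err = f_wb - y                 # per-example errors
    dj_dw = (X.T @ err) / m        # equation (6) for every w_j in one step
    dj_db = np.sum(err) / m        # equation (7)
    return dj_db, dj_dw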
3. Running gradient descent
\[
\begin{align*}
&\text{repeat until convergence:}\;\lbrace \\
&\;\;\;w_j = w_j - \alpha \frac{\partial J(\mathbf{w},b)}{\partial w_j} \tag{8}\; & \text{for j := 0..n-1} \\
&\;\;\;\;\;b = b - \alpha \frac{\partial J(\mathbf{w},b)}{\partial b} \\
&\rbrace
\end{align*}
\]
- Equation (8) is the update rule for each w_j and for b: the gradients from (6) and (7) are computed first, and then all parameters are updated simultaneously with learning rate alpha.
import copy, math

# Run batch gradient descent for logistic regression, equation (8)
def gradient_descent(X, y, w_in, b_in, alpha, num_iters):   # num_iters is the number of iterations
    J_history = []                  # record the cost after each update
    w = copy.deepcopy(w_in)         # copy so the caller's w is not modified
    b = b_in
    for i in range(num_iters):
        dj_db, dj_dw = compute_gradient_logistic(X, y, w, b)   # gradients at the current parameters
        w = w - alpha * dj_dw       # update w
        b = b - alpha * dj_db       # update b
        if i < 100000:              # cap the stored history to limit memory use
            J_history.append(compute_cost_logistic(X, y, w, b))
        if i % math.ceil(num_iters / 10) == 0:
            print(f"Iteration {i:4d}: Cost {J_history[-1]}   ")
    return w, b, J_history
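An illustrative run (not from the original notes), reusing the toy data from the cost example above; alpha and num_iters are arbitrary choices:

# Illustrative call only; X_tmp and y_tmp are the made-up toy data defined earlier
w_init = np.zeros(X_tmp.shape[1])
b_init = 0.0
w_out, b_out, J_hist = gradient_descent(X_tmp, y_tmp, w_init, b_init, alpha=0.1, num_iters=1000)
print(f"Learned parameters: w = {w_out}, b = {b_out}")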
Edited on September 25