且构网


Why must a nonlinear activation function be used in a backpropagation neural network?

Updated: 2023-11-25 19:15:40


I've been reading some things on neural networks and I understand the general principle of a single-layer neural network. I understand the need for additional layers, but why are nonlinear activation functions used?

This question is followed by this one: What is a derivative of the activation function used for in backpropagation?

The purpose of the activation function is to introduce non-linearity into the network.

In turn, this allows you to model a response variable (aka target variable, class label, or score) that varies non-linearly with its explanatory variables.

Non-linear means that the output cannot be reproduced from a linear combination of the inputs (which is not the same as output that renders to a straight line; the word for that is affine).

Another way to think of it: without a non-linear activation function in the network, a NN, no matter how many layers it had, would behave just like a single-layer perceptron, because composing linear layers gives you just another linear function (see the definition just above).
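The collapse described above can be checked numerically. This is only a minimal sketch: the layer sizes, the random weights, and the names `W1`, `b1`, `W2`, `b2`, `stacked_linear`, `single_layer` and `stacked_tanh` are all made up for illustration and are not from the original answer.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical weights for a 3 -> 4 -> 2 network (illustrative only).
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)

# A small batch of inputs, one row per sample.
X = rng.normal(size=(5, 3))

def stacked_linear(X):
    """Two layers with no activation function in between."""
    hidden = X @ W1.T + b1
    return hidden @ W2.T + b2

def single_layer(X):
    """One layer whose weights are the composition of the two above."""
    W = W2 @ W1
    b = W2 @ b1 + b2
    return X @ W.T + b

def stacked_tanh(X):
    """The same two layers with tanh applied to the hidden layer."""
    hidden = np.tanh(X @ W1.T + b1)
    return hidden @ W2.T + b2

# Without an activation, the two-layer net matches the single layer.
collapsed = np.allclose(stacked_linear(X), single_layer(X))
# With tanh, that particular single layer no longer reproduces the output.
tanh_differs = not np.allclose(stacked_tanh(X), single_layer(X))
print(collapsed, tanh_differs)
```

The second check only shows that this one composed layer fails to match the tanh network on these inputs; it is not a proof that no single layer could.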

>>> import numpy as NP
>>> NP.set_printoptions(precision=2)   # show two decimals, as in the output below
>>> in_vec = NP.random.rand(10)
>>> in_vec
  array([ 0.94,  0.61,  0.65,  0.  ,  0.77,  0.99,  0.35,  0.81,  0.46,  0.59])

>>> # common activation function, hyperbolic tangent
>>> out_vec = NP.tanh(in_vec)
>>> out_vec
 array([ 0.74,  0.54,  0.57,  0.  ,  0.65,  0.76,  0.34,  0.67,  0.43,  0.53])

[Figure: a common activation function used in backprop (hyperbolic tangent), evaluated from -2 to 2]
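The plot itself did not survive on this page, but the values it showed can be tabulated with a short sketch. The nine-point grid and the names `xs` and `ys` are arbitrary choices made here, not part of the original answer.

```python
import numpy as np

# Evaluate tanh on an evenly spaced grid from -2 to 2 (step 0.5).
xs = np.linspace(-2.0, 2.0, 9)
ys = np.tanh(xs)

for x, y in zip(xs, ys):
    print(f"{x:+.1f} -> {y:+.3f}")
```

The values pass through 0 at the origin, are symmetric about it, and flatten toward -1 and +1 at the ends (tanh(2) is about 0.96), which gives the S-shaped curve.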