且构网

How to do exponential and logarithmic curve fitting in Python? I found only polynomial fitting

Updated: 2022-01-18 04:00:52

For fitting y = A + B log x, just fit y against (log x).

>>> import numpy
>>> x = numpy.array([1, 7, 20, 50, 79])
>>> y = numpy.array([10, 19, 30, 35, 51])
>>> numpy.polyfit(numpy.log(x), y, 1)
array([ 8.46295607,  6.61867463])
# y ≈ 8.46 log(x) + 6.62
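
As a rough sketch of how the fitted coefficients might be used: polyfit returns the highest-degree coefficient first, so the array above is [B, A] rather than [A, B]. The names `B`, `A`, `predict_log_model` and the query points below are illustrative assumptions, not part of the original answer.

```python
import numpy

x = numpy.array([1, 7, 20, 50, 79])
y = numpy.array([10, 19, 30, 35, 51])

# polyfit returns coefficients from highest degree to lowest: [slope, intercept]
B, A = numpy.polyfit(numpy.log(x), y, 1)

def predict_log_model(x_new):
    """Evaluate the fitted model y = A + B * log(x) at new x values (x > 0)."""
    return A + B * numpy.log(numpy.asarray(x_new, dtype=float))

# Hypothetical query points, chosen only for illustration
y_hat = predict_log_model([1, 10, 100])
print(A, B)
print(y_hat)
```

Since log(1) = 0, the prediction at x = 1 is simply the intercept A.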


For fitting $y = Ae^{Bx}$, take the logarithm of both sides, which gives $\log y = \log A + Bx$. So fit (log y) against x.

Note that fitting (log y) as if it were linear will emphasize small values of y, causing large deviations for large y. This is because polyfit (linear regression) works by minimizing $\sum_i (\Delta Y_i)^2 = \sum_i (Y_i - \hat{Y}_i)^2$. When $Y_i = \log y_i$, the residuals are $\Delta Y_i = \Delta(\log y_i) \approx \Delta y_i / |y_i|$. So even if polyfit makes a very bad decision for large y, the "divide-by-|y|" factor will compensate for it, causing polyfit to favor small values.
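
The approximation Δ(log y) ≈ Δy / |y| can be checked numerically. The following is a minimal sketch; the two y values and the absolute error `delta_y` are hypothetical numbers chosen only to make the effect visible.

```python
import numpy

y = numpy.array([1.0, 79.0])   # one small and one large y value
delta_y = 0.1                  # hypothetical absolute error, identical for both

# Exact change in log y caused by the same absolute error
delta_log_y = numpy.log(y + delta_y) - numpy.log(y)
# First-order approximation: delta(log y) ≈ delta_y / |y|
approx = delta_y / numpy.abs(y)

print(delta_log_y)
print(approx)
```

The same absolute error produces a far larger residual in log space at the small y value than at the large one, which is why an unweighted fit of log y is dominated by the small values.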

This can be alleviated by giving each entry a "weight" proportional to y. polyfit supports weighted least squares via the w keyword argument. Because polyfit applies w to the unsquared residuals, passing w=numpy.sqrt(y) weights each squared residual by y.

>>> x = numpy.array([10, 19, 30, 35, 51])
>>> y = numpy.array([1, 7, 20, 50, 79])
>>> numpy.polyfit(x, numpy.log(y), 1)
array([ 0.10502711, -0.40116352])
#    y ≈ exp(-0.401) * exp(0.105 * x) = 0.670 * exp(0.105 * x)
# (^ biased towards small values)
>>> numpy.polyfit(x, numpy.log(y), 1, w=numpy.sqrt(y))
array([ 0.06009446,  1.41648096])
#    y ≈ exp(1.42) * exp(0.0601 * x) = 4.12 * exp(0.0601 * x)
# (^ not so biased)
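
One way to see the effect of the weights is to compare both fits in the original y space. This is a sketch under the same data as above; the helper names `fit_exponential` and `sse` are illustrative and not part of NumPy.

```python
import numpy

x = numpy.array([10, 19, 30, 35, 51])
y = numpy.array([1, 7, 20, 50, 79])

def fit_exponential(x, y, weighted):
    """Fit y = A * exp(B * x) via a linear fit of log(y) against x."""
    w = numpy.sqrt(y) if weighted else None
    B, log_A = numpy.polyfit(x, numpy.log(y), 1, w=w)
    return numpy.exp(log_A), B

def sse(params):
    """Sum of squared residuals measured in y, not in log y."""
    A, B = params
    return numpy.sum((y - A * numpy.exp(B * x)) ** 2)

unweighted = fit_exponential(x, y, weighted=False)
weighted = fit_exponential(x, y, weighted=True)
print(unweighted, sse(unweighted))
print(weighted, sse(weighted))
```

For this data set the weighted fit gives a noticeably smaller sum of squared residuals in y, consistent with the "not so biased" remark above.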

Note that Excel, LibreOffice and most scientific calculators typically use the unweighted (biased) formula for exponential regression / trend lines. If you want your results to be compatible with these platforms, do not include the weights, even if they provide better results.

Now, if you can use scipy, you could use scipy.optimize.curve_fit to fit any model without transformations.

For y = A + B log x the result is the same as the transformation method:

>>> import scipy.optimize
>>> x = numpy.array([1, 7, 20, 50, 79])
>>> y = numpy.array([10, 19, 30, 35, 51])
>>> scipy.optimize.curve_fit(lambda t,a,b: a+b*numpy.log(t),  x,  y)
(array([ 6.61867467,  8.46295606]), 
 array([[ 28.15948002,  -7.89609542],
        [ -7.89609542,   2.9857172 ]]))
# y ≈ 6.62 + 8.46 log(x)
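
The second array returned by curve_fit is the estimated covariance matrix of the parameters. A common way to turn it into one-standard-deviation uncertainties is to take the square root of its diagonal, as sketched below. The names `log_model`, `popt`, `pcov` and `perr` are illustrative.

```python
import numpy
import scipy.optimize

x = numpy.array([1, 7, 20, 50, 79])
y = numpy.array([10, 19, 30, 35, 51])

def log_model(t, a, b):
    """Model y = a + b * log(t)."""
    return a + b * numpy.log(t)

popt, pcov = scipy.optimize.curve_fit(log_model, x, y)

# Square roots of the diagonal give one-standard-deviation parameter errors
perr = numpy.sqrt(numpy.diag(pcov))
print(popt)
print(perr)
```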

For $y = Ae^{Bx}$, however, we can get a better fit, since curve_fit minimizes the residuals in y directly instead of in log y. But we need to provide an initial guess so that curve_fit can reach the desired local minimum.

>>> x = numpy.array([10, 19, 30, 35, 51])
>>> y = numpy.array([1, 7, 20, 50, 79])
>>> scipy.optimize.curve_fit(lambda t,a,b: a*numpy.exp(b*t),  x,  y)
(array([  5.60728326e-21,   9.99993501e-01]),
 array([[  4.14809412e-27,  -1.45078961e-08],
        [ -1.45078961e-08,   5.07411462e+10]]))
# oops, definitely wrong.
>>> scipy.optimize.curve_fit(lambda t,a,b: a*numpy.exp(b*t),  x,  y,  p0=(4, 0.1))
(array([ 4.88003249,  0.05531256]),
 array([[  1.01261314e+01,  -4.31940132e-02],
        [ -4.31940132e-02,   1.91188656e-04]]))
# y ≈ 4.88 exp(0.0553 x). much better.
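
One possible way to pick the initial guess, rather than choosing it by hand, is to reuse the weighted log-transform fit from earlier as `p0`. This is only a sketch of that idea, not something the original answer prescribes; `exp_model`, `p0` and `sse` are illustrative names.

```python
import numpy
import scipy.optimize

x = numpy.array([10, 19, 30, 35, 51], dtype=float)
y = numpy.array([1, 7, 20, 50, 79], dtype=float)

def exp_model(t, a, b):
    """Model y = a * exp(b * t)."""
    return a * numpy.exp(b * t)

# Step 1: cheap estimate from the weighted linear fit of log(y) against x
B0, log_A0 = numpy.polyfit(x, numpy.log(y), 1, w=numpy.sqrt(y))
p0 = (numpy.exp(log_A0), B0)

# Step 2: refine with nonlinear least squares, starting from that estimate
popt, pcov = scipy.optimize.curve_fit(exp_model, x, y, p0=p0)

def sse(params):
    """Sum of squared residuals in y for the given (a, b)."""
    return numpy.sum((y - exp_model(x, *params)) ** 2)

print(p0, sse(p0))
print(popt, sse(popt))
```

Starting from the transformed fit keeps the optimizer near a sensible region, so it should avoid the degenerate solution seen when no `p0` is given.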