且构网

Sharing the things programmers run into during development...

Why do I get a MemoryError when running the ADF test?

Updated: 2023-11-14 08:12:22

This is my time series:


                          data    z_data   zp_data
time                                              
2018-01-01 00:00:00  -0.045988       NaN       NaN
2018-01-01 00:01:00  -0.046024       NaN       NaN
2018-01-01 00:02:00  -0.044360       NaN       NaN
2018-01-01 00:03:00  -0.044722       NaN       NaN
2018-01-01 00:04:00  -0.043637       NaN       NaN
                        ...       ...       ...
2018-12-12 23:55:00  11.454639  0.088124  1.631736
2018-12-12 23:56:00  11.498422  0.935382  2.551753
2018-12-12 23:57:00  11.521695  1.251496  1.223949
2018-12-12 23:58:00  11.476974  0.244583 -0.012273
2018-12-12 23:59:00  11.480120  0.278023  0.015562
[498240 rows x 3 columns]


I ran the Augmented Dickey-Fuller test, which assesses whether a time series is stationary, but I get a MemoryError. How can I solve this issue?

The autolag option of the statsmodels ADF test wastes memory because it keeps every full model in memory during the lag search.

See https://github.com/statsmodels/statsmodels/issues/1849
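
To get a feel for how many lags the search covers on a series this long, here is a rough sketch. It assumes that, when maxlag is not given, the cap follows the rule 12 * (nobs / 100) ** 0.25 described in the statsmodels documentation. The helper names default_maxlag and lag_matrix_bytes are made up for illustration, and the byte count is only an order-of-magnitude estimate for one matrix of lagged values, not a measurement of what statsmodels actually allocates.

```python
import math


def default_maxlag(nobs, ntrend=1):
    """Lag cap when maxlag is None (assumed rule: 12 * (nobs / 100) ** 0.25)."""
    rule = math.ceil(12.0 * (nobs / 100.0) ** 0.25)
    # The cap can never exceed what the sample size allows.
    return min(nobs // 2 - ntrend - 1, rule)


def lag_matrix_bytes(nobs, maxlag, itemsize=8):
    """Rough size of one float64 matrix with maxlag + 1 columns of lagged data."""
    return nobs * (maxlag + 1) * itemsize


if __name__ == "__main__":
    nobs = 498_240  # number of rows in the question's DataFrame
    maxlag = default_maxlag(nobs)
    size_mib = lag_matrix_bytes(nobs, maxlag) / 2**20
    print(f"lags searched by default: {maxlag}")
    print(f"one lag matrix: about {size_mib:.0f} MiB")
```

Under these assumptions the default search covers roughly a hundred lags, and a single matrix of lagged values already takes several hundred MiB, so holding many fitted models at once can exhaust memory.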

Two possible workarounds are to either

  • fix the number of lags and skip the automatic lag search, or
  • set maxlag to limit the number of lags that the lag search evaluates

The lag search wasn't really designed with large time series in mind.