且构网

Pandas - Split a dataframe into multiple dataframes based on date?

Updated: 2021-12-09 23:07:16

If you must loop, note that iterating over a groupby object yields (key, dataframe) pairs, so you need to unpack both:

import pandas as pd
import numpy as np
import statsmodels.api as sm
from patsy import dmatrices

df = pd.read_csv('data.csv')
df['date'] = pd.to_datetime(df['date'], format='%Y%m%d')
df = df.set_index('date')

Note the use of group_name here:

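# group_name is the month-end Timestamp; df_group holds that month's rows.
# (pandas 2.2+ spells this frequency 'ME'; 'M' is deprecated there.)
for group_name, df_group in df.groupby(pd.Grouper(freq='M')):
    y, X = dmatrices('value1 ~ value2 + value3',
                     data=df_group,
                     return_type='dataframe')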
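
To actually keep one dataframe per month, as the question title asks, the same unpacking pattern can fill a dictionary. The following is only a minimal sketch: the sample rows stand in for data.csv and are made up, the name monthly is hypothetical, and it groups on df.index.to_period('M') rather than pd.Grouper so that the frequency string behaves the same across pandas versions.

```python
import pandas as pd

# Made-up sample rows standing in for data.csv (assumption, not the asker's data)
df = pd.DataFrame({
    'date': ['20210105', '20210120', '20210203', '20210315', '20210316'],
    'value1': [1.0, 2.0, 3.0, 4.0, 5.0],
    'value2': [10, 20, 30, 40, 50],
})
df['date'] = pd.to_datetime(df['date'], format='%Y%m%d')
df = df.set_index('date')

# Iterating over a groupby yields (key, sub-dataframe) pairs;
# here the key is a monthly Period, converted to a string such as '2021-01'.
monthly = {
    str(period): group
    for period, group in df.groupby(df.index.to_period('M'))
}

for name, frame in monthly.items():
    print(name, len(frame))
```

Each value in monthly is an ordinary dataframe, so monthly['2021-01'] can be passed to dmatrices or any other function exactly like df_group in the loop above.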

If you want to avoid iteration, have a look at the notebook in Paul H's gist (see his comment), but a simple example using apply would be:

def do_regression(df_group, ret='outcome'):
    """Build the patsy design matrices for one group and return y or X."""
    y, X = dmatrices('value1 ~ value2 + value3',
                     data=df_group,
                     return_type='dataframe')
    if ret == 'outcome':
        return y
    else:
        return X

outcome = df.groupby(pd.Grouper(freq='M')).apply(do_regression, ret='outcome')
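
When the function passed to apply returns a single number per group, the result comes back as one Series indexed by the group key. The sketch below illustrates that with made-up data and a hypothetical helper fit_slope; it swaps in np.polyfit for the patsy/statsmodels step and groups on df.index.to_period('M') instead of pd.Grouper, so it is an illustration of the apply pattern rather than the regression from the answer.

```python
import numpy as np
import pandas as pd

# Made-up data: January follows value1 = 2*value2 + 1, February value1 = 3*value2
dates = pd.to_datetime(
    ['20210104', '20210111', '20210118', '20210201', '20210208', '20210215'],
    format='%Y%m%d',
)
df = pd.DataFrame(
    {
        'value1': [3.0, 5.0, 7.0, 3.0, 6.0, 9.0],
        'value2': [1.0, 2.0, 3.0, 1.0, 2.0, 3.0],
    },
    index=dates,
)

def fit_slope(df_group):
    """Fit value1 ~ value2 by least squares on one group and return the slope."""
    slope, intercept = np.polyfit(df_group['value2'], df_group['value1'], deg=1)
    return slope

# One scalar per group, collected into a Series indexed by monthly Period
slopes = df.groupby(df.index.to_period('M')).apply(fit_slope)
print(slopes)
```

Returning a scalar keeps the output flat; returning a dataframe from the applied function, as do_regression does, instead produces a result with the group key added as an extra index level.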