且构网

分享程序员开发的那些事...
且构网 - 分享程序员编程开发的那些事

R tbats 模型季节性客户标记 无结果

更新时间:2023-09-23 13:40:04

回过头来回答我是如何让它工作的,只是将原始"数据帧传递到 for 循环中.我的数据已经有空的 $0 月份,所以我不需要运行那部分代码.谢谢大家的帮助

I am following this blog to identify seasonal customers in my time series data: https://www.kristenkehrer.com/seasonality-code

My code is shamelessly nearly identical to the blog, with some small tweaks, code is below. I am able to run the code entirely, for 2000 customers. A few hours later, 0 customers were flagged as seasonal in my results.

Manually looking at customers data over time, I do believe I have many examples of seasonal customers that should have been picked up. Below is a sample of the data I am using.

Am I missing something stupid? am I in way over my head to even try this, being very new to python?

Note that I am adding the "0 months" in my data source, but I don't think it would hurt anything for that function to check again. I'm also not including the data source credentials step.

Thank you

import pandas as pa
import numpy as np
import pyodbc as py

cnxn = py.connect('DRIVER='+driver+';SERVER='+server+';PORT=1433;DATABASE='+database+';UID='+username+';PWD='+ password)

original = pa.read_sql_query('SELECT s.customer_id, s.yr, s.mnth, Case when s.usage<0 then 0 else s.usage end as usage  FROM dbo.Seasonal  s   Join   ( Select Top 2000  customer_id, SUM(usage) as usage   From dbo.Seasonal where Yr!=2018    Group by customer_id ) t ON s.customer_id = t.customer_id Where yr!= 2018 Order by customer_id, yr, mnth', cnxn)

grouped = original.groupby(by='customer_id')

def yearmonth_to_justmonth(year, month):

    return year * 12 + month - 1

def fillInForOwner(group):
    min = group.head(1).iloc[0]
    max = group.tail(1).iloc[0]
    minMonths = yearmonth_to_justmonth(min.yr, min.mnth)
    maxMonths = yearmonth_to_justmonth(max.yr, max.mnth)
    filled_index = pa.Index(np.arange(minMonths, maxMonths, 1), name="filled_months")
    group['months'] = group.yr * 12 + group.mnth - 1
    group = group.set_index('months')
    group = group.reindex(filled_index)
    group.customer_id = min.customer_id
    group.yr = group.index // 12
    group.mnth = group.index % 12 + 1
    group.usage = np.where(group.usage.isnull(), 0, group.usage).astype(int)
    return group

filledIn = grouped.apply(fillInForOwner)
newIndex = pa.Index(np.arange(filledIn.customer_id.count()))

import rpy2 as r
from rpy2.robjects.packages import importr
from rpy2.robjects import r, pandas2ri, globalenv
pandas2ri.activate()

base = importr('base')
colorspace = importr('colorspace')
forecast = importr('forecast')
times = importr('timeSeries')
stats = importr('stats')

outfile = 'results.csv'
df_list = []

for customerid, dataForCustomer in filledIn.groupby(by=['customer_id']):
    startYear = dataForCustomer.head(1).iloc[0].yr
    startMonth = dataForCustomer.head(1).iloc[0].mnth
    endYear = dataForCustomer.tail(1).iloc[0].yr
    endMonth = dataForCustomer.tail(1).iloc[0].mnth


    customerTS = stats.ts(dataForCustomer.usage.astype(int),
                      start=base.c(startYear,startMonth),
                      end=base.c(endYear, endMonth), 
                      frequency=12)
    r.assign('customerTS', customerTS)

    try:
        seasonal = r('''
                        fit<-tbats(customerTS, seasonal.periods = 12, 
                        use.parallel = TRUE)
                        fit$seasonal
                     ''')
    except: 
        seasonal = 1
    df_list.append({'customer_id': customerid, 'seasonal': seasonal})
    print(f' {customerid} | {seasonal} ')

seasonal_output = pa.DataFrame(df_list)
print(seasonal_output)
seasonal_output.to_csv(outfile)

Circling back to answer how I got this to work was by just passing the "original" dataframe into the for loop. My data already had the empty $0 months so I didn't need that part of the code to run. Thank you all for your help

推荐文章