Python/pandas equivalent of dplyr 1.0.0 summarize(across())

Updated: 2022-05-03 10:01:34

For the first scenario, pandas concat suffices:

import pandas as pd

# df is the mtcars data frame from the question
dat = df.groupby("cyl")

pd.concat([dat[["mpg", "disp"]].sum(), dat[["drat", "wt", "qsec"]].mean()], axis=1)
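To see the concat approach run end to end without loading mtcars, here is a minimal sketch on a small made-up DataFrame. The values are invented for illustration and only the column names are borrowed from mtcars. Both pieces share the same `cyl` group index, so `axis=1` lines them up row by row.

```python
import pandas as pd

# Made-up data that reuses a few mtcars column names (not the real mtcars values)
df = pd.DataFrame({
    "cyl":  [4, 4, 6, 6],
    "mpg":  [30.0, 20.0, 18.0, 16.0],
    "disp": [100.0, 120.0, 200.0, 220.0],
    "drat": [4.0, 3.0, 3.5, 3.5],
    "wt":   [2.0, 3.0, 3.0, 4.0],
    "qsec": [18.0, 20.0, 19.0, 17.0],
})

dat = df.groupby("cyl")

# Sum one set of columns, average another, then place the results side by side.
# Both results are indexed by "cyl", so axis=1 aligns them on the group key.
out = pd.concat([dat[["mpg", "disp"]].sum(), dat[["drat", "wt", "qsec"]].mean()], axis=1)
print(out)
```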

For the regex/string processing part, some verbosity is unavoidable:

cols_p = [col for col in df.columns if col.endswith("p")]
cols_t = [col for col in df.columns if col.endswith("t")]

pd.concat((dat[cols_p].sum(), dat[cols_t].mean()), axis=1)

It would be cool, though, if you could write a function that encapsulates across, particularly for the regex case; that's a lovely trick.
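A minimal sketch of such a wrapper follows. `summarise_across` is a hypothetical helper, not a pandas API. It assumes a single grouping column and a list of (regex, function) pairs, and it builds the {column: function} dictionary that `GroupBy.agg` accepts. The data is made up.

```python
import re

import pandas as pd


def summarise_across(df, by, spec):
    """Hypothetical across()-style helper; not part of pandas.

    df:   the DataFrame to summarise
    by:   name of a single grouping column
    spec: list of (regex, func) pairs; each non-grouping column whose name
          matches a regex (via re.search) is aggregated with that func.
          The first matching pair wins; unmatched columns are dropped.
    """
    agg_map = {}
    for col in df.columns:
        if col == by:
            continue  # the grouping column becomes the index, not a result
        for pattern, func in spec:
            if re.search(pattern, col):
                agg_map[col] = func
                break
    if not agg_map:
        raise ValueError("no column matched any pattern in spec")
    # agg() with a dict applies each function to its own column
    return df.groupby(by).agg(agg_map)


# Made-up data that reuses a few mtcars column names
df = pd.DataFrame({
    "cyl":  [4, 4, 6, 6],
    "mpg":  [30.0, 20.0, 18.0, 16.0],
    "disp": [100.0, 120.0, 200.0, 220.0],
    "drat": [4.0, 3.0, 3.5, 3.5],
    "wt":   [2.0, 3.0, 3.0, 4.0],
})

# Roughly summarise(across(ends_with("p"), sum), across(ends_with("t"), mean))
res = summarise_across(df, "cyl", [(r"p$", "sum"), (r"t$", "mean")])
print(res)
```

Because matched columns are collected in DataFrame column order, the result keeps that order (`disp`, `drat`, `wt` here). Columns that match no pattern, such as `mpg`, are left out, much as across() only touches the columns it selects.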

Note: passing a dictionary is no longer or more verbose than the first example you quoted, and I would suggest it over the pandas concat method:

dat.agg({"mpg": "sum", 
         "disp": "sum", 
         "drat": "mean", 
         "wt": "mean", 
         "qsec": "mean"})
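If writing each column next to its function feels repetitive, `dict.fromkeys(keys, value)` can build the same dictionary from two column lists. This is a sketch on made-up data, not a change to the approach above:

```python
import pandas as pd

# Made-up data that reuses a few mtcars column names
df = pd.DataFrame({
    "cyl":  [4, 4, 6, 6],
    "mpg":  [30.0, 20.0, 18.0, 16.0],
    "disp": [100.0, 120.0, 200.0, 220.0],
    "drat": [4.0, 3.0, 3.5, 3.5],
    "wt":   [2.0, 3.0, 3.0, 4.0],
    "qsec": [18.0, 20.0, 19.0, 17.0],
})
dat = df.groupby("cyl")

# dict.fromkeys maps every listed column to the same function name;
# ** unpacking merges the two mappings into one {column: function} dict.
spec = {**dict.fromkeys(["mpg", "disp"], "sum"),
        **dict.fromkeys(["drat", "wt", "qsec"], "mean")}

out = dat.agg(spec)
print(out)
```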

That doesn't take away the shine from across, though; it looks cool.

Update: for the regex/string part, taking a cue from @Richiev's post, a dictionary comprehension fits in quite nicely here:

dat.agg({col :'mean'
         if col.endswith('t') 
         else 'sum' 
         for col in df.filter(regex=r".*(p|t)$").columns
         })

Alternatively, you could do it without summoning filter (I had to play with the code again and look through Stack Overflow ideas to pull this off):

    dat.agg({col: "mean" 
             if col.endswith("t") else "sum"
             for col in df
             if col.endswith(("t", "p"))})

Another idea, from here:

    mapping = {"t": "mean", "p": "sum"}
    dat.agg({col: mapping.get(col[-1])
             for col in df
             if col.endswith(("t", "p"))})
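A quick way to convince yourself that the comprehension variants and the concat version give the same result is to run them side by side. This is a sketch on made-up data that includes `hp`, so that two columns end in "p" and two end in "t". Columns are sorted before comparing because the variants are not guaranteed to order them identically.

```python
import pandas as pd

# Made-up data that reuses a few mtcars column names
df = pd.DataFrame({
    "cyl":  [4, 4, 6, 6],
    "mpg":  [30.0, 20.0, 18.0, 16.0],
    "disp": [100.0, 120.0, 200.0, 220.0],
    "hp":   [90.0, 100.0, 110.0, 130.0],
    "drat": [4.0, 3.0, 3.5, 3.5],
    "wt":   [2.0, 3.0, 3.0, 4.0],
    "qsec": [18.0, 20.0, 19.0, 17.0],
})
dat = df.groupby("cyl")
mapping = {"t": "mean", "p": "sum"}

via_filter = dat.agg({col: "mean" if col.endswith("t") else "sum"
                      for col in df.filter(regex=r".*(p|t)$").columns})
via_endswith = dat.agg({col: "mean" if col.endswith("t") else "sum"
                        for col in df if col.endswith(("t", "p"))})
via_mapping = dat.agg({col: mapping.get(col[-1])
                       for col in df if col.endswith(("t", "p"))})

cols_p = [col for col in df.columns if col.endswith("p")]
cols_t = [col for col in df.columns if col.endswith("t")]
via_concat = pd.concat((dat[cols_p].sum(), dat[cols_t].mean()), axis=1)


def same(a, b):
    # Compare values and dtypes, ignoring column order
    return a.sort_index(axis=1).equals(b.sort_index(axis=1))


all_agree = all(same(v, via_concat) for v in (via_filter, via_endswith, via_mapping))
print(all_agree)
```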

There are probably more ways to pull it off, using the available tools within Python.