且构网

Sharing all things programming and development

Renaming column names in the Pandas groupby function

Updated: 2023-11-22 22:45:34

1). I have the following example dataset:

>>> df
    ID     Region  count
0  100       Asia      2
1  101     Europe      3
2  102         US      1
3  103     Africa      5
4  100     Russia      5
5  101  Australia      7
6  102         US      8
7  104       Asia     10
8  105     Europe     11
9  110     Africa     23

I wanted to group the observations of this dataset by ID and Region and sum the count for each group. So I used something like this:

>>> print(df.groupby(['ID','Region'],as_index=False)['count'].sum())

    ID     Region  count
0  100       Asia      2
1  100     Russia      5
2  101  Australia      7
3  101     Europe      3
4  102         US      9
5  103     Africa      5
6  104       Asia     10
7  105     Europe     11
8  110     Africa     23

By using as_index=False I am able to get "SQL-like" output. My problem is that I am unable to rename the aggregate variable count here. In SQL, if I wanted to do the same thing, I would write something like this:

select ID, Region, sum(count) as Total_Numbers
from df
group by ID,Region
order by ID, Region

As we can see, it's very easy to rename the aggregate variable 'count' to Total_Numbers in SQL. I wanted to do the same thing in Pandas but was unable to find such an option in the groupby function. Can somebody help?

2). The second question, which is more of an observation: is it possible to use column names directly in Pandas DataFrame functions without enclosing them in quotes? I understand that the column names are strings, so they have to be inside quotes, but when a column is used outside a DataFrame function, as an attribute, the quotes are not required, as in df.ID.sum(). It's only when we use it in a DataFrame function like df.sort() or df.groupby that we have to put it in quotes. This is a bit of a pain, since in SQL, SAS, or other languages we simply use the variable name without quoting it. Any suggestions on this?

Kindly advise on the above two points (the 1st is the main one, the 2nd is more of an opinion).

Thanks

For the first question, I think the answer would be:

<your DataFrame>.rename(columns={'count':'Total_Numbers'})

or

<your DataFrame>.columns = ['ID', 'Region', 'Total_Numbers']
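As a rough sketch of how this could look on the dataset from the question: rename() returns a new DataFrame rather than modifying the original, while assigning to .columns changes it in place. The variable names summed, renamed and named are only illustrative. The second variant uses named aggregation, which is not part of the original answer and assumes pandas 0.25 or later.

```python
import pandas as pd

# The example dataset from the question.
df = pd.DataFrame({
    "ID": [100, 101, 102, 103, 100, 101, 102, 104, 105, 110],
    "Region": ["Asia", "Europe", "US", "Africa", "Russia",
               "Australia", "US", "Asia", "Europe", "Africa"],
    "count": [2, 3, 1, 5, 5, 7, 8, 10, 11, 23],
})

# Option 1: aggregate first, then rename the aggregated column.
# rename() returns a new DataFrame, so keep the result.
summed = df.groupby(["ID", "Region"], as_index=False)["count"].sum()
renamed = summed.rename(columns={"count": "Total_Numbers"})

# Option 2 (assumes pandas >= 0.25): named aggregation sets the
# output column name directly, much like "sum(count) as Total_Numbers".
named = df.groupby(["ID", "Region"], as_index=False).agg(
    Total_Numbers=("count", "sum")
)

print(renamed)
```

Both variants should give the columns ID, Region and Total_Numbers, sorted by the group keys like the ORDER BY in the SQL version.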

As for the second one, I'd say the answer is no. It's possible to use it like 'df.ID' because of the Python data model:

Attribute references are translated to lookups in this dictionary, e.g., m.x is equivalent to m.__dict__["x"]
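
A small sketch of what this means in practice, using a hypothetical class M and a two-row DataFrame made up for illustration. As far as I know, pandas provides df.ID through its own attribute-lookup fallback rather than by storing columns in __dict__, so this only illustrates the general idea. It also shows why attribute access is fragile: a column named count collides with the DataFrame.count method, so bracket access is the safer choice there.

```python
import pandas as pd

# Plain Python object: attribute access is a dictionary lookup.
class M:
    pass

m = M()
m.x = 1
plain_lookup_matches = (m.x == m.__dict__["x"])

# Illustrative two-row frame (not the full dataset from the question).
df = pd.DataFrame({"ID": [100, 101], "count": [2, 3]})

# df.ID and df["ID"] return the same column.
same_column = df.ID.equals(df["ID"])

# "count" is also the name of a DataFrame method, so df.count is the
# method, not the column; use brackets to reach the column.
count_is_method = callable(df.count)
total = df["count"].sum()

print(plain_lookup_matches, same_column, count_is_method, total)
```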