且构网

分享程序员开发的那些事...
且构网 - 分享程序员编程开发的那些事

将CSV加载到Pandas MultiIndex DataFrame

更新时间:2022-10-21 17:56:55

您可以使用 pd.read_csv

 >>> df = pd.read_csv(test_data2.csv,index_col = [0,1],skipinitialspace = True)
>>> df
dep freq arr代码模式
从到
RGBOXFD RGBPADTON 127 0 27 99999 2
RGBPADTON 127 0 33 99999 2
RGBRDLEY 127 0 1425 99999 2
RGBCHOLSEY 127 0 52 99999 2
RGBMDNHEAD 127 0 91 99999 2
RGBDIDCOTP RGBPADTON 127 0 46 99999 2
RGBPADTON 127 0 3 99999 2
RGBCHOLSEY 127 0 61 99999 2
RGBRDLEY 127 0 1430 99999 2
RGBPADTON 127 0 115 99999 2

ve使用 skipinitialspace = True 来摆脱标题行中的那些恼人的空格。


I have a 719mb CSV file that looks like:

from, to, dep, freq, arr, code, mode   (header row)
RGBOXFD,RGBPADTON,127,0,27,99999,2
RGBOXFD,RGBPADTON,127,0,33,99999,2
RGBOXFD,RGBRDLEY,127,0,1425,99999,2
RGBOXFD,RGBCHOLSEY,127,0,52,99999,2
RGBOXFD,RGBMDNHEAD,127,0,91,99999,2
RGBDIDCOTP,RGBPADTON,127,0,46,99999,2
RGBDIDCOTP,RGBPADTON,127,0,3,99999,2
RGBDIDCOTP,RGBCHOLSEY,127,0,61,99999,2
RGBDIDCOTP,RGBRDLEY,127,0,1430,99999,2
RGBDIDCOTP,RGBPADTON,127,0,115,99999,2
and so on... 

I want to load in to a pandas DataFrame. Now I know there is a load from csv method:

 r = pd.DataFrame.from_csv('test_data2.csv')

But I specifically want to load it as a 'MultiIndex' DataFrame where from and to are the indexes:

So ending up with:

                   dep, freq, arr, code, mode
RGBOXFD RGBPADTON  127     0   27  99999    2
        RGBRDLEY   127     0   33  99999    2
        RGBCHOLSEY 127     0 1425  99999    2
        RGBMDNHEAD 127     0 1525  99999    2

etc. I'm not sure how to do that?

You could use pd.read_csv:

>>> df = pd.read_csv("test_data2.csv", index_col=[0,1], skipinitialspace=True)
>>> df
                       dep  freq   arr   code  mode
from       to                                      
RGBOXFD    RGBPADTON   127     0    27  99999     2
           RGBPADTON   127     0    33  99999     2
           RGBRDLEY    127     0  1425  99999     2
           RGBCHOLSEY  127     0    52  99999     2
           RGBMDNHEAD  127     0    91  99999     2
RGBDIDCOTP RGBPADTON   127     0    46  99999     2
           RGBPADTON   127     0     3  99999     2
           RGBCHOLSEY  127     0    61  99999     2
           RGBRDLEY    127     0  1430  99999     2
           RGBPADTON   127     0   115  99999     2

where I've used skipinitialspace=True to get rid of those annoying spaces in the header row.