Python Pandas: check whether a string is only a date, only a time, or a datetime

Updated: 2023-02-26 20:56:39

To test for times: when pandas parses a string that contains only a time, it fills in today's date by default. One possible solution is therefore to compare the parsed dates (Series.dt.date) with today's date (Timestamp.date) and use Series.all to check that every value in the column matches.
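The default-date behaviour can be checked directly. This is a minimal sketch, assuming the installed pandas version still falls back to the current date when a string contains only a time; the variable names are illustrative.

```python
import pandas as pd

# A time-only string carries no date, so pandas has to fill one in.
parsed = pd.Timestamp("12:23:10")

# Today's date as a plain datetime.date, for comparison.
today = pd.Timestamp("now").date()

print(parsed.time())           # the time that was actually in the string
print(parsed.date() == today)  # whether the filled-in date is today's
```

If the second line prints True, a column of time-only strings will pass the "all dates equal today" test described above.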

Also added is another solution for testing dates: check whether the values stay the same after the times are removed with Series.dt.floor:

import pandas as pd

df = pd.DataFrame({'a':['2019-01-01 12:23:10',
                        '2019-01-02 12:23:10'],
                   'b':['2019-01-01',
                        '2019-01-02'],
                   'c':['12:23:10',
                        '15:23:10'],
                   'd':['a','b']})
print(df)
                     a           b         c  d
0  2019-01-01 12:23:10  2019-01-01  12:23:10  a
1  2019-01-02 12:23:10  2019-01-02  15:23:10  b

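def check(col):
    try:
        # Strings that are not dates or times make to_datetime raise
        dt = pd.to_datetime(df[col])

        # Flooring to the day changes nothing only if every time is midnight
        if (dt.dt.floor('D') == dt).all():
            return ('Its a pure date field')
        # Time-only strings were parsed with today's date filled in
        elif (dt.dt.date == pd.Timestamp('now').date()).all():
            return ('Its a pure time field')
        else:
            return ('Its a Datetime field')
    except (ValueError, TypeError):
        return ('its not a datefield')


print(check('a'))
print(check('b'))
print(check('c'))
print(check('d'))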
Its a Datetime field
Its a pure date field
Its a pure time field
its not a datefield
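
To see why the floor comparison separates pure dates from datetimes, the two cases can be run side by side. This is a small sketch with made-up sample values, not part of the original answer. Note that under this test a datetime column whose times are all exactly midnight would also count as a pure date.

```python
import pandas as pd

dates = pd.to_datetime(pd.Series(["2019-01-01", "2019-01-02"]))
stamps = pd.to_datetime(pd.Series(["2019-01-01 12:23:10",
                                   "2019-01-02 12:23:10"]))

# floor("D") resets each value to midnight of the same day,
# so only values that already sit at midnight stay unchanged.
dates_unchanged = bool((dates.dt.floor("D") == dates).all())
stamps_unchanged = bool((stamps.dt.floor("D") == stamps).all())

print(dates_unchanged)   # True
print(stamps_unchanged)  # False
```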


Another idea is to check first whether the column is numeric and, if so, report it as not a date field straight away, which prevents numbers from being cast to datetimes. In addition, because a datetime column may happen to contain only today's date (column f), the test for times is done differently here: Series.str.contains matches the pattern HH:MM:SS or H:MM:SS:

import numpy as np
import pandas as pd

df = pd.DataFrame({'a':['2019-01-01 12:23:10',
                        '2019-01-02'],
                   'b':['2019-01-01',
                        '2019-01-02'],
                   'c':['12:23:10',
                        '15:23:10'],
                   'd':['a','b'],
                   'e':[1,2],
                   'f':['2019-11-13 12:23:10',
                        '2019-11-13']})
print(df)
                     a           b         c  d  e                    f
0  2019-01-01 12:23:10  2019-01-01  12:23:10  a  1  2019-11-13 12:23:10
1           2019-01-02  2019-01-02  15:23:10  b  2           2019-11-13


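def check(col):
    # Numeric columns are rejected first so they are never cast to datetimes
    if np.issubdtype(df[col].dtype, np.number):
        return ('its not a datefield')

    try:
        # Note: pandas 2.x infers one format from the first value, so columns
        # that mix formats (like 'a' and 'f') may need format='mixed' here
        dt = pd.to_datetime(df[col])
        # Flooring to the day changes nothing only if every time is midnight
        if (dt.dt.floor('D') == dt).all():
            return ('Its a pure date field')
        # Match the raw strings against H:MM:SS or HH:MM:SS
        elif df[col].str.contains(r"^\d{1,2}:\d{2}:\d{2}$").all():
            return ('Its a pure time field')
        else:
            return ('Its a Datetime field')
    except (ValueError, TypeError):
        return ('its not a datefield')


print(check('a'))
print(check('b'))
print(check('c'))
print(check('d'))
print(check('e'))
print(check('f'))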
Its a Datetime field
Its a pure date field
Its a pure time field
its not a datefield
its not a datefield
Its a Datetime field
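
The same checks can also be written as a function that takes a Series instead of reading the global df, so it can be applied to every column at once. The following is only a sketch under a few assumptions: the names classify, TIME_PATTERN and sample are hypothetical, pd.api.types.is_numeric_dtype is swapped in for np.issubdtype, and Series.str.fullmatch replaces the anchored Series.str.contains pattern. The time pattern is tested before parsing, so time-only strings never reach pd.to_datetime.

```python
import pandas as pd

# H:MM:SS or HH:MM:SS; fullmatch anchors the pattern at both ends
TIME_PATTERN = r"\d{1,2}:\d{2}:\d{2}"


def classify(series):
    """Label a column as date, time, datetime, or not a date field."""
    # Numeric columns are rejected first so they are never cast to datetimes
    if pd.api.types.is_numeric_dtype(series):
        return "not a date field"

    text = series.astype(str)

    # Pure times are recognised from the raw strings, before any parsing
    if text.str.fullmatch(TIME_PATTERN).all():
        return "pure time field"

    try:
        parsed = pd.to_datetime(text)
    except (ValueError, TypeError):
        return "not a date field"

    # Pure dates are unchanged when floored to the day
    if (parsed.dt.floor("D") == parsed).all():
        return "pure date field"
    return "datetime field"


# Illustrative sample data; every column uses one consistent format
sample = pd.DataFrame({
    "stamp": ["2019-01-01 12:23:10", "2019-01-02 12:23:10"],
    "day": ["2019-01-01", "2019-01-02"],
    "clock": ["12:23:10", "15:23:10"],
    "text": ["foo", "bar"],
    "number": [1, 2],
})

result = {name: classify(sample[name]) for name in sample.columns}
print(result)
```

Passing the Series in keeps the function independent of any particular DataFrame, and checking the time pattern first avoids relying on the date that pandas fills in for time-only strings.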