Updated: 2023-09-03 18:29:10
I've got a pandas dataframe called data and I want to remove all rows that contain a string in any column. For example, below we see the 'gdp' column has a string at index 3, and 'cap' at index 1.
data =
y gdp cap
0 1 2 5
1 2 3 ab
2 8 7 2
3 3 bc 7
4 6 7 7
5 4 8 3
...
I've been trying to use something like this script because I will not know what is contained in exp_list ahead of time. Unfortunately, "data.var_name" raises this error: 'DataFrame' object has no attribute 'var_name'. I also don't know what the strings will be ahead of time, so is there any way to generalize that as well?
exp_list = ['gdp', 'cap']
for var_name in exp_list:
    data = data[data.var_name != 'ab']
You can apply a function that tests each row of your DataFrame for the presence of strings. For example, if df is your DataFrame:
rows_with_strings = df.apply(
    # On Python 3 use str; basestring exists only on Python 2.
    lambda row: any(isinstance(e, str) for e in row),
    axis=1)
This produces a boolean mask for your DataFrame indicating which rows contain at least one string. You can then select the rows without strings by inverting the mask:
df_with_no_strings = df[~rows_with_strings]
Example:
import pandas as pd
a = [[1, 2], ['a', 2], [3, 4], [7, 'd']]
df = pd.DataFrame(a, columns=['a', 'b'])
df
   a  b
0  1  2
1  a  2
2  3  4
3  7  d
# str is the Python 3 replacement for basestring
select = df.apply(lambda r: any(isinstance(e, str) for e in r), axis=1)
df[~select]
   a  b
0  1  2
2  3  4
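To tie this back to the question, here is a minimal sketch that limits the check to the columns named in exp_list. It assumes Python 3, so the test is isinstance(e, str), and it rebuilds only the six rows visible in the question by hand. The names has_string and cleaned are illustrative and do not come from the original posts. Columns are selected with bracket indexing, data[exp_list], because attribute access such as data.var_name looks for a column literally named "var_name".

```python
import pandas as pd

# The six visible rows from the question; mixing ints and strings
# gives the 'gdp' and 'cap' columns object dtype.
data = pd.DataFrame({
    "y":   [1, 2, 8, 3, 6, 4],
    "gdp": [2, 3, 7, "bc", 7, 8],
    "cap": [5, "ab", 2, 7, 7, 3],
})

exp_list = ["gdp", "cap"]  # column names known only at run time

# Bracket indexing accepts a list of column names held in a variable.
# The lambda flags a row if any value in the selected columns is a str.
has_string = data[exp_list].apply(
    lambda row: any(isinstance(e, str) for e in row), axis=1
)

# Invert the mask to keep only the rows without strings.
cleaned = data[~has_string]
print(cleaned)
```

Because the mask tests the type of each value and not its content, there is no need to know in advance which strings ('ab', 'bc', and so on) appear in the data.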