Updated: 2023-11-21 23:03:16
The way that pandas handles x-ticks for bar plots can be quite confusing if your x-labels have numeric values. Let's take this example:
import pandas as pd
import numpy as np
x = np.linspace(0, 1, 21)
y = np.random.rand(21)
s = pd.Series(y, index=x)
ax = s.plot(kind='bar', figsize=(10, 3))
ax.figure.tight_layout()
You might expect the tick locations to correspond directly to the values in x, i.e. 0, 0.05, 0.1, ..., 1.0. However, this isn't the case:
print(ax.get_xticks())
# [ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20]
Instead, pandas sets the tick locations according to the indices of each element in x, but then sets the tick labels according to the values in x:
print(' '.join(label.get_text() for label in ax.get_xticklabels()))
# 0.0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.5 0.55 0.6 0.65 0.7 0.75 0.8 0.85 0.9 0.95 1.0
Because of this, setting the tick positions directly (either by using ax.set_xticks or by passing the xticks= argument to pd.Series.plot()) will not give you the effect you are expecting:
new_ticks = np.linspace(0, 1, 11) # 0.0, 0.1, 0.2, ..., 1.0
ax.set_xticks(new_ticks)
Instead you would need to update the positions and the labels of your x-ticks separately:
# positions of each tick, relative to the indices of the x-values
ax.set_xticks(np.interp(new_ticks, s.index, np.arange(s.size)))
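# labels, formatted to one decimal place to avoid float noise such as 0.30000000000000004
ax.set_xticklabels([f"{t:.1f}" for t in new_ticks])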
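To make the position mapping less opaque, here is a minimal pure-Python sketch of the linear interpolation that the np.interp call performs. The helper name value_to_bar_position is hypothetical (it is not part of pandas or NumPy), and the sketch assumes the index values are sorted in increasing order.

```python
from bisect import bisect_right


def value_to_bar_position(value, index_values):
    """Map a data value onto the 0..n-1 bar-position axis by linear interpolation.

    Hypothetical helper for illustration; assumes index_values is sorted ascending.
    Values outside the index range are clamped to the first or last bar.
    """
    if value <= index_values[0]:
        return 0.0
    if value >= index_values[-1]:
        return float(len(index_values) - 1)
    hi = bisect_right(index_values, value)  # first index whose value exceeds `value`
    lo = hi - 1
    fraction = (value - index_values[lo]) / (index_values[hi] - index_values[lo])
    return lo + fraction


index_values = [i / 20 for i in range(21)]  # same values as x: 0.0, 0.05, ..., 1.0
wanted = [i / 10 for i in range(11)]        # same values as new_ticks: 0.0, 0.1, ..., 1.0
positions = [value_to_bar_position(v, index_values) for v in wanted]
print(positions)
```

Each wanted tick value lands on every second bar position (0, 2, 4, ..., 20), which is why the tick positions have to be expressed in bar indices rather than in data values.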
This behavior actually makes a lot of sense in most cases. For bar plots it is common for the x-labels to be non-numeric (e.g. strings corresponding to categories), in which case it wouldn't be possible to use the values in x to set the tick locations. Without introducing another argument to specify their locations, the most logical choice would be to use their indices instead.
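The categorical case can be illustrated with a short sketch. The fruit names and counts below are made-up illustration data, and the Agg backend is selected only so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; an assumption so this runs without a display
import pandas as pd

# Made-up categorical data: the index holds strings, not numbers
counts = pd.Series([3, 7, 5], index=["apple", "banana", "cherry"])
ax = counts.plot(kind="bar")

# Tick positions are the integer positions of the bars...
tick_positions = [int(t) for t in ax.get_xticks()]
# ...while the tick labels come from the index values
tick_labels = [label.get_text() for label in ax.get_xticklabels()]

print(tick_positions)
print(tick_labels)
```

With a string index there is no numeric value to place each bar at, so the bars sit at positions 0, 1, 2 and the index values only appear as labels. A numeric index is treated the same way.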