Updated: 2023-11-09 18:26:34
I am trying to plot a histogram using the matplotlib.hist() function, but I am not sure how to do it.
I have a list
probability = [0.3602150537634409, 0.42028985507246375,
0.373117033603708, 0.36813186813186816, 0.32517482517482516,
0.4175257731958763, 0.41025641025641024, 0.39408866995073893,
0.4143222506393862, 0.34, 0.391025641025641, 0.3130841121495327,
0.35398230088495575]
and a list of names (strings).
How do I make the probability as my y-value of each bar and names as x-values?
If you want a histogram, you don't need to attach any 'names' to the x-values, because the x-axis shows data bins:
import matplotlib.pyplot as plt
import numpy as np
%matplotlib inline
np.random.seed(42)
x = np.random.normal(size=1000)
plt.hist(x, density=True, bins=30) # density=False would make counts
plt.ylabel('Probability')
plt.xlabel('Data');
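To make the effect of density=True concrete, here is a small sketch using np.histogram, which plt.hist relies on for binning. It assumes the same seed and sample as above; the variable names are only illustrative. With density=True the bar heights are densities, so it is the bar areas (height times bin width) that sum to 1, not the heights themselves.

```python
import numpy as np

np.random.seed(42)
x = np.random.normal(size=1000)

# Same binning as plt.hist(x, bins=30), once as densities and once as counts
density, edges = np.histogram(x, bins=30, density=True)
counts, _ = np.histogram(x, bins=30, density=False)
widths = np.diff(edges)

# Area under the density histogram: sum of height * bin width
area = np.sum(density * widths)
print("Area under density histogram:", area)
print("Total count:", counts.sum())
```

If you want bar heights that themselves sum to 1, one option is to divide the counts by counts.sum() and plot those instead.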
Note that the number of bins=30 was chosen arbitrarily. The Freedman–Diaconis rule is a more principled way to choose the "right" bin width:
$$\text{bin width} = 2 \, \frac{\mathrm{IQR}(x)}{\sqrt[3]{n}}$$
where IQR is the interquartile range and n is the total number of data points to plot.
So, according to this rule, one may calculate the number of bins as:
q25, q75 = np.percentile(x, [25, 75])
bin_width = 2 * (q75 - q25) * len(x) ** (-1/3)
bins = round((x.max() - x.min()) / bin_width)
print("Freedman–Diaconis number of bins:", bins)
plt.hist(x, bins=bins);
Freedman–Diaconis number of bins: 82
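As a cross-check on the manual calculation above, NumPy ships a Freedman–Diaconis estimator that can be selected with the string "fd". The sketch below assumes the same seed and sample. As far as I know, NumPy rounds the bin count up rather than to the nearest integer, so its result may differ from the manual value by one.

```python
import numpy as np

np.random.seed(42)
x = np.random.normal(size=1000)

# Manual Freedman–Diaconis bin count, as computed above
q25, q75 = np.percentile(x, [25, 75])
bin_width = 2 * (q75 - q25) * len(x) ** (-1/3)
manual_bins = round((x.max() - x.min()) / bin_width)

# NumPy's built-in estimator returns the bin edges;
# the number of bins is one less than the number of edges
edges = np.histogram_bin_edges(x, bins="fd")
numpy_bins = len(edges) - 1

print("manual:", manual_bins, "numpy 'fd':", numpy_bins)
```

plt.hist accepts the same string, so plt.hist(x, bins="fd") should give the same binning without computing the bin count yourself.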
Finally, you can make your histogram a bit fancier by adding a PDF line (here a Gaussian kernel density estimate), a title, and a legend:
import scipy.stats as st
plt.hist(x, density=True, bins=82, label="Data")
mn, mx = plt.xlim()
plt.xlim(mn, mx)
kde_xs = np.linspace(mn, mx, 300)
kde = st.gaussian_kde(x)
plt.plot(kde_xs, kde.pdf(kde_xs), label="PDF")
plt.legend(loc="upper left")
plt.ylabel("Probability")
plt.xlabel("Data")
plt.title("Histogram");
If you're willing to explore other options, there is a shortcut with seaborn:
# !pip install seaborn
import seaborn as sns
sns.displot(x, bins=82, kde=True);
Now back to the OP.
If you have a limited number of data points, a bar plot makes more sense for representing your data. You can then attach labels to the x-axis:
x = np.arange(3)
plt.bar(x, height=[1,2,3])
plt.xticks(x, ['a','b','c']);
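Applied to the list from the question, the same idea might look like the sketch below. The question does not show the names, so the names list here is a hypothetical placeholder; substitute your own list of strings of the same length.

```python
import matplotlib.pyplot as plt
import numpy as np

probability = [0.3602150537634409, 0.42028985507246375,
               0.373117033603708, 0.36813186813186816, 0.32517482517482516,
               0.4175257731958763, 0.41025641025641024, 0.39408866995073893,
               0.4143222506393862, 0.34, 0.391025641025641, 0.3130841121495327,
               0.35398230088495575]

# Hypothetical placeholder labels; the real names are not given in the question
names = [f"name_{i}" for i in range(1, len(probability) + 1)]

positions = np.arange(len(probability))

fig, ax = plt.subplots(figsize=(10, 4))
bars = ax.bar(positions, probability)               # one bar per probability
ax.set_xticks(positions)
ax.set_xticklabels(names, rotation=45, ha="right")  # names as x-axis labels
ax.set_ylabel("Probability")
fig.tight_layout()
# In a script, call plt.show() or fig.savefig(...) here to see the result
```

Rotating the tick labels is just one way to keep 13 labels readable; a horizontal bar chart with ax.barh is another.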