且构网

Plotting a graph using arrays

Updated: 2023-11-10 12:43:58

I have a set of data that I want to plot. It is a list of timestamps that I want to group per hour, and then show the number of points per hour as a line graph. The data spans multiple days, and I want one graph per day.

I have the number of points per hour and the hours at which they occur, but I cannot get a line to show in my graph, and I think I am missing a simple solution. I have posted a picture as well so you can see the output. What is the next step to get the line to show?

I have the following code:

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import csv
from datetime import timedelta
import datetime as dt
 
data= pd.read_csv('test2.csv', header=0, index_col=None, parse_dates=True, sep=';', usecols=[0,1])
df=pd.DataFrame(data, columns=['Date', 'Time'])
df['DateTime'] = df['Date'] + df['Time']

#for date in df['DateTime']:


def RemoveMilliSeconds(x):
    return x[:-5]

df['Time'] = df['Time'].apply(RemoveMilliSeconds)

df['DateTime'] = df['Date'] + df['Time']
df['DateTime'] = pd.to_datetime(df['DateTime'], format="%Y:%m:%d %H:%M:%S")
df['TimeDelta'] = df.groupby('Date')['DateTime'].apply(lambda x: x.diff())

#print(df['TimeDelta'] / np.timedelta64(1, 'h'))
df['HourOfDay'] = df['DateTime'].dt.hour
df['Day'] = df['DateTime'].dt.day

grouped_df = df.groupby('Day')

for key, item in grouped_df:
    print(grouped_df.get_group(key)['HourOfDay'].value_counts(), "\n\n")


res=[]
for i in df['DateTime'].dt.hour:
    if i not in res:
        res.append(i)
print("enkele lijst:" + str(res))
#range = (0,24)
#bins = 2
#plt.hist(df['DateTime'].dt.hour, bins, range)

x=np.array([res])

y=np.array([df['HourOfDay'].value_counts()])
plt.plot(x,y)
plt.show()

#times = pd.DatetimeIndex(df.Time)
#grouped = df.groupby([times.hour])

The picture that shows the output

My sample data:

Date;Time
2020:02:13 ;12:39:02:913 
2020:02:13 ;12:39:42:915 
2020:02:13 ;13:06:20:718 
2020:02:13 ;13:18:25:988 
2020:02:13 ;13:34:02:835 
2020:02:13 ;13:46:35:793 
2020:02:13 ;13:59:10:659 
2020:02:13 ;14:14:33:571 
2020:02:13 ;14:25:36:381 
2020:02:13 ;14:35:38:342 
2020:02:13 ;14:46:04:006 
2020:02:13 ;14:56:57:346 
2020:02:13 ;15:07:39:752 
2020:02:13 ;15:19:44:868 
2020:02:13 ;15:32:31:438 
2020:02:13 ;15:44:44:928 
2020:02:13 ;15:56:54:453 
2020:02:13 ;16:08:21:023 
2020:02:13 ;16:19:17:620 
2020:02:13 ;16:29:56:944 
2020:02:13 ;16:40:11:132 
2020:02:13 ;16:49:12:113 
2020:02:13 ;16:57:26:652 
2020:02:13 ;16:57:26:652 
2020:02:13 ;17:04:22:092 
2020:02:17 ;08:58:08:562 
2020:02:17 ;08:58:42:545 

You did not prepare your x-y data in a way that matplotlib can understand their relationship.
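To make this concrete, here is a minimal sketch of what the extra brackets in `np.array([res])` do. The hour list and counts are tallied by hand from the question's sample data, the variable names `x_wrapped`, `x_flat` and so on are made up for illustration, and the `Agg` backend is only an assumption so the sketch runs without a display. Wrapping the list in another list gives a 2D array with one row, and matplotlib treats each column of a 2D input as a separate data set, so every "line" consists of a single point and nothing visible is drawn.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; drop this line to see the window
import matplotlib.pyplot as plt
import numpy as np

# Hours in order of first appearance and their counts (tallied from the sample data)
res = [12, 13, 14, 15, 16, 17, 8]
counts = [2, 5, 5, 5, 7, 1, 2]

# As in the question: the extra brackets create a 2D array with a single row
x_wrapped = np.array([res])
y_wrapped = np.array([counts])

# Without the extra brackets: plain 1D arrays
x_flat = np.array(res)
y_flat = np.array(counts)

# plt.plot returns one Line2D object per data set it draws
lines_wrapped = plt.plot(x_wrapped, y_wrapped)
lines_flat = plt.plot(x_flat, y_flat)

print(x_wrapped.shape, len(lines_wrapped))
print(x_flat.shape, len(lines_flat))
```

The 1D version produces a single connected line, which is what the question is after.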

The easy "answer" would be to plot res and df['HourOfDay'].value_counts() directly against each other:

#.....
#range = (0,24)
#bins = 2
#plt.hist(df['DateTime'].dt.hour, bins, range)

plt.plot(res, df['HourOfDay'].value_counts())
plt.show()

But the sample output shows you the problem:

matplotlib does not order the x-values for you (that would misrepresent the data in a different context). So, we have to do this before plotting:

#.....
#range = (0,24)
#bins = 2
#plt.hist(df['DateTime'].dt.hour, bins, range)

xy=np.stack((res, df['HourOfDay'].value_counts()))
xy = xy[:, np.argsort(xy[0,:])]
plt.plot(*xy)
plt.show()

Now, the x-values are in the correct order, and the y-values have been sorted with them in the combined xy array that we created for this purpose:
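The sorting step can be tried in isolation with a few made-up numbers. This is only a sketch of the `np.stack` / `np.argsort` idiom used above; the values in `x` and `y` are illustrative and not taken from the question.

```python
import numpy as np

# Illustrative values: x is out of order, y[i] belongs to x[i]
x = [12, 13, 14, 8]
y = [2, 5, 5, 2]

# Stack into a 2 x n array: row 0 holds x, row 1 holds y
xy = np.stack((x, y))

# Indices that would sort the x row in ascending order
order = np.argsort(xy[0, :])

# Reorder the columns, so each y value moves together with its x value
xy_sorted = xy[:, order]

print(xy_sorted)
```

Because whole columns are reordered, each pair stays together; only the order in which the pairs are drawn changes.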

Obviously, it would be better to prepare res and df['HourOfDay'].value_counts() directly, so we don't have to create a combined array to sort them together. Since you did not explain what your code is supposed to do, we can only fix the problem after the fact. You should structure the code differently, so that this problem does not occur in the first place. But only you can do this (or people who understand the intention of your code; I don't).
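One possible way to prepare the data directly is sketched below, assuming the goal is simply "count per hour, plotted in hour order". By default `value_counts()` returns its result sorted by count in descending order, with the hours as its index, so its positions do not necessarily line up with a separately built list such as res. Sorting the result by its index and using that index as the x-values keeps each count attached to its hour. The small `hours` series is made up for illustration.

```python
import pandas as pd

# Made-up hour-of-day values standing in for df['HourOfDay']
hours = pd.Series([12, 12, 13, 13, 13, 16, 16, 16, 16, 8], name="HourOfDay")

# Default ordering: most frequent hour first, hours stored in the index
counts = hours.value_counts()

# Reorder by hour so that x runs from the earliest to the latest hour
counts_by_hour = counts.sort_index()

x = counts_by_hour.index.to_numpy()   # hours
y = counts_by_hour.to_numpy()         # number of entries in each hour

print(counts_by_hour)
# These can then be passed to matplotlib, e.g. plt.plot(x, y)
```

With this approach neither the res list nor the combined xy array is needed.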

I also suggest spending some time with the instructive matplotlib tutorials - this time is not wasted.

Update
It seems you are trying to create a subplot for each day and count the number of entries per hour. I would approach it like this (but I am sure some pandas experts have better ways of doing this):

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
 
#read your data and create datetime index
df= pd.read_csv('test1.txt', sep=";") 
df.index = pd.to_datetime(df["Date"]+df["Time"].str[:-5], format="%Y:%m:%d %H:%M:%S")

#group by date and hour, count entries
dfcounts = df.groupby([df.index.date, df.index.hour]).size().reset_index()
dfcounts.columns = ["Date", "Hour", "Count"]
maxcount = dfcounts.Count.max()

#group by date for plotting
dfplot = dfcounts.groupby(dfcounts.Date)

#plot each day into its own subplot
#squeeze=False keeps axs a 2D array, so indexing also works if there is only one day
fig, axs = plt.subplots(dfplot.ngroups, figsize=(6,8), squeeze=False)

for i, (groupdate, group) in enumerate(dfplot):
    ax=axs[i, 0]
    #the marker is not really necessary but has been added in case there is just one entry per day
    ax.plot(group.Hour, group.Count, color="blue", marker="o")
    ax.set_title(str(groupdate))
    ax.set_xlim(0, 24)
    ax.set_ylim(0, maxcount * 1.1)
    ax.xaxis.set_ticks(np.arange(0, 25, 2))

plt.tight_layout()
plt.show()

Sample output:
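As a small check of the grouping step in the code above, the following sketch runs the same read, parse and count logic on five rows copied from the test data, read from an in-memory string instead of a file (the `raw` buffer is only there for illustration). It shows the shape of the intermediate table that the plotting loop works with: one row per date and hour, with the number of entries.

```python
import io
import pandas as pd

# Five rows in the same "Date;Time" layout as the test data, including trailing spaces
raw = io.StringIO(
    "Date;Time\n"
    "2020:02:13 ;12:39:02:913 \n"
    "2020:02:13 ;12:39:42:915 \n"
    "2020:02:13 ;13:06:20:718 \n"
    "2020:02:17 ;08:58:08:562 \n"
    "2020:02:17 ;08:58:42:545 \n"
)

df = pd.read_csv(raw, sep=";")

# Drop the last five characters (":mmm" plus the trailing space) and parse the rest
df.index = pd.to_datetime(df["Date"] + df["Time"].str[:-5], format="%Y:%m:%d %H:%M:%S")

# Count entries per (date, hour) pair and give the columns readable names
dfcounts = df.groupby([df.index.date, df.index.hour]).size().reset_index()
dfcounts.columns = ["Date", "Hour", "Count"]

print(dfcounts)
```

Hours without any entries do not appear in this table, so the plotted line connects only the hours that actually contain data.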

Update 2
To plot them into individual figures, you can modify the loop:

#...
dfplot = dfcounts.groupby(dfcounts.Date)

for groupdate in dfplot.groups:
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 4))
    fig.suptitle("Date:"+str(groupdate), fontsize=16)

    #scaled for comparability among graphs
    ax1.plot(dfplot.get_group(groupdate).Hour, dfplot.get_group(groupdate).Count, color="blue", marker="o")
    ax1.set_xlim(0, 24)
    ax1.xaxis.set_ticks(np.arange(0, 25, 2))
    ax1.set_ylim(0, maxcount * 1.1)
    ax1.set_title("comparable version")

    #scaled to maximize visibility per day
    ax2.plot(dfplot.get_group(groupdate).Hour, dfplot.get_group(groupdate).Count, color="red", marker="x")
    ax2.set_xlim(0, 24)
    ax2.xaxis.set_ticks(np.arange(0, 25, 2))
    ax2.set_title("expanded version")
    
    plt.tight_layout()
    #save optionally 
    #plt.savefig("MyDataForDay"+str(groupdate)+".eps")

print("All figures generated")
plt.show()

Sample output for one of the days:

created with the following test data:

Date;Time
2020:02:13 ;12:39:02:913 
2020:02:13 ;12:39:42:915 
2020:02:13 ;13:06:20:718 
2020:02:13 ;13:18:25:988 
2020:02:13 ;13:34:02:835 
2020:02:13 ;13:46:35:793 
2020:02:13 ;13:59:10:659 
2020:02:13 ;14:14:33:571 
2020:02:13 ;14:25:36:381 
2020:02:13 ;14:35:38:342 
2020:02:13 ;14:46:04:006 
2020:02:13 ;14:56:57:346 
2020:02:13 ;15:07:39:752 
2020:02:13 ;15:19:44:868 
2020:02:13 ;15:32:31:438 
2020:02:13 ;15:44:44:928 
2020:02:13 ;15:56:54:453 
2020:02:13 ;16:08:21:023 
2020:02:13 ;16:19:17:620 
2020:02:13 ;16:29:56:944 
2020:02:13 ;16:40:11:132 
2020:02:13 ;16:49:12:113 
2020:02:13 ;16:57:26:652 
2020:02:13 ;16:57:26:652 
2020:02:13 ;17:04:22:092 
2020:02:17 ;08:58:08:562 
2020:02:17 ;08:58:42:545 
2020:02:17 ;15:19:44:868 
2020:02:17 ;17:32:31:438 
2020:02:17 ;17:44:44:928 
2020:02:17 ;17:56:54:453 
2020:02:17 ;18:08:21:023 
2020:03:19 ;06:19:17:620 
2020:03:19 ;06:29:56:944 
2020:03:19 ;06:40:11:132 
2020:03:19 ;14:49:12:113 
2020:03:19 ;16:57:26:652 
2020:03:19 ;16:57:26:652 
2020:03:19 ;17:04:22:092 
2020:03:19 ;18:58:08:562 
2020:03:19 ;18:58:42:545