MySQL: selecting the number of rows within a time span

OK, I realise I'm a bit late, but I wanted to post my answer anyway :-)

What you require can be done using a subquery, but this might take ages to complete on a large table...

Thinking about the question, I came up with two different approaches.

One of them has already been dealt with in the other answers: it works by starting at a specific point in time, looking at the interval that begins at this time and then looking at the interval of equal duration that immediately follows. This leads to clear, understandable results and is probably what would be required (e.g. the user must not exceed 100 downloads per calendar day). This, however, would completely miss situations in which a user does 99 downloads during the hour before midnight and another 99 in the first hour of the new day.
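
For completeness, this first approach could look like the following sketch (it assumes the same downloads table as set up below, and the 100-per-day limit is just the example figure from above):

-- fixed calendar-day intervals: count each user's downloads per day
-- and keep only the days on which a user exceeded the limit
SELECT user_id,
       DATE(dl_date) AS dl_day,
       COUNT(*) AS downloads_that_day
FROM downloads
GROUP BY user_id, DATE(dl_date)
HAVING downloads_that_day > 100;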

So if the required result is more of a "top ten downloaders list", then this is the other approach. The results here may not be as understandable at first glance, because one single download can count towards multiple intervals. This is because the intervals will (and need to) overlap.

Here's my setup. I've created the table from your statement and added two indexes:

CREATE INDEX downloads_timestamp on downloads (dl_date);
CREATE INDEX downloads_user_id on downloads (user_id);
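
Depending on your data distribution, a combined index over both columns might serve the correlated subquery further down even better than the two separate ones; this is an assumption on my part, not something I have measured:

-- assumption: a composite index could let MySQL answer the per-user
-- date-range lookup in the correlated subquery from a single index
CREATE INDEX downloads_user_id_date ON downloads (user_id, dl_date);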

Here's the data I've inserted into the table:

SELECT * FROM downloads;
+----+----------+---------+---------------------+
| id | stuff_id | user_id | dl_date             |
+----+----------+---------+---------------------+
|  1 |        1 |       1 | 2011-01-24 09:00:00 |
|  2 |        1 |       1 | 2011-01-24 09:30:00 |
|  3 |        1 |       1 | 2011-01-24 09:35:00 |
|  4 |        1 |       1 | 2011-01-24 10:00:00 |
|  5 |        1 |       1 | 2011-01-24 11:00:00 |
|  6 |        1 |       1 | 2011-01-24 11:15:00 |
|  7 |        1 |       1 | 2011-01-25 09:15:00 |
|  8 |        1 |       1 | 2011-01-25 09:30:00 |
|  9 |        1 |       1 | 2011-01-25 09:45:00 |
| 10 |        1 |       2 | 2011-01-24 08:00:00 |
| 11 |        1 |       2 | 2011-01-24 12:00:00 |
| 12 |        1 |       2 | 2011-01-24 12:01:00 |
| 13 |        1 |       2 | 2011-01-24 12:02:00 |
| 14 |        1 |       2 | 2011-01-24 12:03:00 |
| 15 |        1 |       2 | 2011-01-24 12:00:00 |
| 16 |        1 |       2 | 2011-01-24 12:04:00 |
| 17 |        1 |       2 | 2011-01-24 12:05:00 |
| 18 |        1 |       2 | 2011-01-24 12:06:00 |
| 19 |        1 |       2 | 2011-01-24 12:07:00 |
| 20 |        1 |       2 | 2011-01-24 12:08:00 |
| 21 |        1 |       2 | 2011-01-24 12:09:00 |
| 22 |        1 |       2 | 2011-01-24 12:10:00 |
| 23 |        1 |       2 | 2011-01-25 14:00:00 |
| 24 |        1 |       2 | 2011-01-25 14:12:00 |
| 25 |        1 |       2 | 2011-01-25 14:25:00 |
+----+----------+---------+---------------------+
25 rows in set (0.00 sec)

As you can see, all downloads occurred either yesterday or today and were executed by two different users.

Now, what we have to keep in mind is the following: there is (mathematically) an infinite number of 24-hour intervals (or intervals of any other duration) between '2011-01-24 0:00' and '2011-01-25 23:59:59'. But as the server's precision is one second, this boils down to 86,400 intervals:

First interval:  2011-01-24 0:00:00 -> 2011-01-25 0:00:00
Second interval: 2011-01-24 0:00:01 -> 2011-01-25 0:00:01
Third interval:  2011-01-24 0:00:02 -> 2011-01-25 0:00:02
   .
   .
   .
86400th interval: 2011-01-24 23:59:59 -> 2011-01-25 23:59:59

So we could use a loop to iterate over all these intervals and calculate the number of downloads per user and per interval. Of course, not all intervals are equally interesting to us, so we can skip most of them by using only the timestamps in the table as "beginning of interval".

This is what the following query does. It uses every download timestamp in the table as "start of interval", adds the interval duration and then queries the number of downloads per user during this interval.

-- interval duration and download limit
SET @duration = '24:00:00';
SET @limit = 5;
SELECT * FROM
    (SELECT t1.user_id,
            t1.dl_date startOfPeriod,
            ADDTIME(t1.dl_date, @duration) endOfPeriod,
            -- correlated subquery: this user's downloads within
            -- [dl_date, dl_date + @duration]
            (SELECT COUNT(1)
             FROM downloads t2
             WHERE t1.user_id = t2.user_id
             AND t1.dl_date <= t2.dl_date
             AND ADDTIME(t1.dl_date, @duration) >= t2.dl_date) count
     FROM downloads t1) t3
WHERE count > @limit;

And here's the result:

+---------+---------------------+---------------------+-------+
| user_id | startOfPeriod       | endOfPeriod         | count |
+---------+---------------------+---------------------+-------+
|       1 | 2011-01-24 09:00:00 | 2011-01-25 09:00:00 |     6 |
|       1 | 2011-01-24 09:30:00 | 2011-01-25 09:30:00 |     7 |
|       1 | 2011-01-24 09:35:00 | 2011-01-25 09:35:00 |     6 |
|       1 | 2011-01-24 10:00:00 | 2011-01-25 10:00:00 |     6 |
|       2 | 2011-01-24 08:00:00 | 2011-01-25 08:00:00 |    13 |
|       2 | 2011-01-24 12:00:00 | 2011-01-25 12:00:00 |    12 |
|       2 | 2011-01-24 12:01:00 | 2011-01-25 12:01:00 |    10 |
|       2 | 2011-01-24 12:02:00 | 2011-01-25 12:02:00 |     9 |
|       2 | 2011-01-24 12:03:00 | 2011-01-25 12:03:00 |     8 |
|       2 | 2011-01-24 12:00:00 | 2011-01-25 12:00:00 |    12 |
|       2 | 2011-01-24 12:04:00 | 2011-01-25 12:04:00 |     7 |
|       2 | 2011-01-24 12:05:00 | 2011-01-25 12:05:00 |     6 |
+---------+---------------------+---------------------+-------+
12 rows in set (0.00 sec)
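
If what you are after really is a "top downloaders list" with one row per user, the same subquery can be collapsed to each user's busiest interval. This is a sketch along the lines of the query above; the cnt and peak_downloads names are my own:

SET @duration = '24:00:00';
SET @limit = 5;
-- one row per user: the maximum number of downloads the user managed
-- to fit into any single interval of @duration
SELECT user_id, MAX(cnt) AS peak_downloads
FROM
    (SELECT t1.user_id,
            (SELECT COUNT(1)
             FROM downloads t2
             WHERE t1.user_id = t2.user_id
             AND t1.dl_date <= t2.dl_date
             AND ADDTIME(t1.dl_date, @duration) >= t2.dl_date) cnt
     FROM downloads t1) t3
GROUP BY user_id
HAVING peak_downloads > @limit
ORDER BY peak_downloads DESC;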