且构网

分享程序员开发的那些事...
且构网 - 分享程序员编程开发的那些事

按月分组后找到最高的

更新时间:2023-01-29 21:01:36

测试数据:

CREATE TEMPORARY TABLE foo( id SERIAL PRIMARY KEY, name INTEGER NOT NULL,
    dt DATE NULL, request BOOL NOT NULL );
INSERT INTO foo (name,dt,request) SELECT random()*1000, 
   '2010-01-01'::DATE+('1 DAY'::INTERVAL)*(random()*3650), random()>0.5 
   FROM generate_series(1,100000) n;
 SELECT * FROM foo LIMIT 10;
 id | name |     dt     | request
----+------+------------+---------
  1 |  110 | 2014-11-05 | f
  2 |  747 | 2015-03-12 | t
  3 |  604 | 2014-09-26 | f
  4 |  211 | 2011-12-14 | t
  5 |  588 | 2016-12-15 | f
  6 |   96 | 2012-02-19 | f
  7 |   17 | 2018-09-18 | t
  8 |  591 | 2018-02-15 | t
  9 |  370 | 2015-07-28 | t
 10 |  844 | 2019-05-16 | f

现在,您必须获取每个名称和月份的计数,然后获取最大计数,但这不会给您提供具有最大计数的名称,这需要返回上一个结果.为了只进行一次分组,需要在CTE中完成:

Now you have to get the count per name and month, then get the max count, but that won't give you the name that has the maximum, which requires joining back with the previous result. In order to do the group by only once, it is done in a CTE:

WITH totals AS (
     SELECT EXTRACT(month FROM dt) mon, name, count(*) cnt FROM foo 
      WHERE request=true GROUP BY name,mon
  )
SELECT * FROM 
   (SELECT mon, max(cnt) cnt FROM totals GROUP BY mon) x
   JOIN totals USING (mon,cnt);

如果多个名称具有相同的最大计数,则两个名称都将被返回.要只保留一个,可以使用DISTICT ON:

If several names have the same maximum count, they will be returned both. To keep only one, you can use DISTICT ON:

WITH (same as above)
SELECT DISTINCT ON (mon) * FROM
   (SELECT mon, max(cnt) cnt FROM totals GROUP BY mon) x
   JOIN totals USING (mon,cnt) ORDER BY mon,name;

您还可以使用DISTINCT ON来按计数desc在该cas中仅保留每月一次由ORDER子句指定的行,以便它保留最高的计数.

You can also use DISTINCT ON to keep only one row per month, specified by the ORDER clause, in this cas by count desc, so it keeps the highest count.

SELECT DISTINCT ON (mon) * FROM (
     SELECT EXTRACT(month FROM dt) mon, name, count(*) cnt FROM foo 
      WHERE request=true GROUP BY name,mon
  )x ORDER BY mon, cnt DESC;

...或者您可以通过将主键粘贴到传递给max()的数组中来破解argmax()函数,这意味着它将返回具有最大值的行的id:

...or you could hack an argmax() function by sticking the primary key into an array passed to max(), which means it will return the id of the row which has the maximum value:

SELECT mon, cntid[1] cnt, name FROM
(SELECT mon, max(ARRAY[cnt,id]) cntid FROM (
     SELECT EXTRACT(month FROM dt) mon, name, count(*) cnt, min(id) id FROM foo
      WHERE request=true GROUP BY name,mon
  ) x GROUP BY mon)y
 JOIN foo ON (foo.id=cntid[2]);

哪个会更快?...