Selecting data in 15-minute windows - PostgreSQL

Updated: 2022-01-28 07:29:06

Quick and dirty way: http://sqlfiddle.com/#!1/bd2f6/21. I named my column tstamp instead of your timestamp.

with t as (
  select
    generate_series(mitstamp,matstamp,'15 minutes') as int,
    duration
  from
    (select min(tstamp) mitstamp, max(tstamp) as matstamp from tmp) a,
    (select duration from tmp group by duration) b
)

select
  int as timestampwindowstart,
  t.duration,
  count(tmp.duration)
from
   t
   left join tmp on 
         (tmp.tstamp >= t.int and 
          tmp.tstamp < (t.int + interval '15 minutes') and 
          t.duration = tmp.duration)
group by
  int,
  t.duration
order by
  int,
  t.duration
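
To try the query outside the fiddle, a table along the following lines should be enough. The column types and the sample rows are assumptions made for illustration; the schema from the original question is not shown here.

```sql
-- Hypothetical sample data: column types and values are assumed,
-- not taken from the original question.
create table tmp (
  tstamp   timestamp not null,
  duration integer   not null
);

insert into tmp (tstamp, duration) values
  ('2012-01-01 10:00:00', 5),
  ('2012-01-01 10:07:00', 5),
  ('2012-01-01 10:20:00', 10),
  ('2012-01-01 10:50:00', 5);
```

With these rows the series has four window starts (10:00, 10:15, 10:30, 10:45) and two distinct durations, so the query returns eight rows, including the combinations that have no matching data.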

Brief explanation:

  1. Calculate the minimum and maximum timestamp.
  2. Generate 15-minute intervals between the minimum and maximum.
  3. Cross join the result with the unique values of duration.
  4. Left join the original data. The left join is important: it keeps every possible combination in the output, with null wherever a duration does not exist for a given interval.
  5. Aggregate the data. count(null) = 0, so empty windows are reported as 0.
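
Step 5 works because count(expression) ignores nulls, which is presumably why the query counts tmp.duration rather than using count(*): an empty window still produces one row from the left join, and count(*) would report it as 1. A minimal check, where v and x are throwaway names used only for this illustration:

```sql
-- count(expression) skips nulls; count(*) counts rows regardless.
select
  count(x) as non_null_values,  -- 0
  count(*) as all_rows          -- 1
from (values (null::integer)) as v(x);
```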

If you have more tables and the algorithm should be applied to their union: suppose we have three tables tmp1, tmp2 and tmp3, all with columns tstamp and duration. Then we can extend the previous solution:

with 

tmpout as (
  select * from tmp1 union all
  select * from tmp2 union all
  select * from tmp3
)

,t as (
  select
    generate_series(mitstamp,matstamp,'15 minutes') as int,
    duration
  from
    (select min(tstamp) mitstamp, max(tstamp) as matstamp from tmpout) a,
    (select duration from tmpout group by duration) b
)

select
  int as timestampwindowstart,
  t.duration,
  count(tmp.duration)
from
   t
   left join tmpout tmp on 
         (tmp.tstamp >= t.int and 
          tmp.tstamp < (t.int + interval '15 minutes') and 
          t.duration = tmp.duration)
group by
  int,
  t.duration
order by
  int,
  t.duration
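
Note that select * in tmpout assumes the three tables declare the same columns in the same order. If that is not guaranteed, one option is to list the columns explicitly. The sketch below only shows the union part, with a count to check that it runs:

```sql
-- Sketch: name the columns so each branch of the union lines up,
-- even if the tables declare them in a different order or carry extra columns.
with tmpout as (
  select tstamp, duration from tmp1 union all
  select tstamp, duration from tmp2 union all
  select tstamp, duration from tmp3
)
select count(*) as total_rows from tmpout;
```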

You should really get to know the with clause. It is an invaluable tool for any data analysis in PostgreSQL.