且构网

Combine consecutive date intervals by type

Updated: 2022-10-17 22:49:41

Suppose we have a table like this:

declare @periods table (
    s date, 
    e date,
    t tinyint
);

with date intervals that have no gaps between them, ordered by start date (s):

insert into @periods values
('2013-01-01' , '2013-01-02', 3),
('2013-01-02' , '2013-01-04', 1),
('2013-01-04' , '2013-01-05', 1),
('2013-01-05' , '2013-01-06', 2),
('2013-01-06' , '2013-01-07', 2),
('2013-01-07' , '2013-01-08', 2),
('2013-01-08' , '2013-01-09', 1);

The date intervals have various types (t).

The task is to merge date intervals of the same type wherever they are not interrupted by an interval of another type (with all intervals ordered by start date).

So the result should look like this:

      s     |      e     |  t
------------|------------|-----
 2013-01-01 | 2013-01-02 |  3
 2013-01-02 | 2013-01-05 |  1
 2013-01-05 | 2013-01-08 |  2
 2013-01-08 | 2013-01-09 |  1
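To pin down the requirement, here is a hedged Python sketch of the row-by-row logic that a cursor-based solution would follow. It walks the rows in order of s and extends the current group for as long as t stays the same. The list `periods` mirrors the INSERT above, and the function name `merge_by_type` is hypothetical. It serves only as a reference for the expected output, not as a set-based answer.

```python
from datetime import date

# Sample rows mirroring the INSERT above: (s, e, t)
periods = [
    (date(2013, 1, 1), date(2013, 1, 2), 3),
    (date(2013, 1, 2), date(2013, 1, 4), 1),
    (date(2013, 1, 4), date(2013, 1, 5), 1),
    (date(2013, 1, 5), date(2013, 1, 6), 2),
    (date(2013, 1, 6), date(2013, 1, 7), 2),
    (date(2013, 1, 7), date(2013, 1, 8), 2),
    (date(2013, 1, 8), date(2013, 1, 9), 1),
]

def merge_by_type(rows):
    """Merge runs of consecutive intervals (ordered by s) that share the same t."""
    merged = []
    for s, e, t in sorted(rows, key=lambda r: r[0]):
        if merged and merged[-1][2] == t:
            # Same type as the current run: keep its start, move its end forward
            merged[-1] = (merged[-1][0], e, t)
        else:
            # First row, or the type changed: start a new run
            merged.append((s, e, t))
    return merged

for row in merge_by_type(periods):
    print(row)
```

The four tuples it produces correspond to the four rows of the expected result table.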

Any ideas how to do this without a cursor?


I've got one working solution:

declare @periods table (
    s datetime primary key clustered, 
    e datetime,
    t tinyint,
    period_number int   
);

insert into @periods (s, e, t) values
('2013-01-01' , '2013-01-02', 3),
('2013-01-02' , '2013-01-04', 1),
('2013-01-04' , '2013-01-05', 1),
('2013-01-05' , '2013-01-06', 2),
('2013-01-06' , '2013-01-07', 2),
('2013-01-07' , '2013-01-08', 2),
('2013-01-08' , '2013-01-09', 1);

declare @t tinyint = null;  
declare @PeriodNumber int = 0;
declare @anchor datetime;

update @periods
    set  period_number = @PeriodNumber, 
    @PeriodNumber = case
                        when @t <> t
                            then  @PeriodNumber + 1
                        else
                            @PeriodNumber
                    end,
    @t = t,
    @anchor = s
option (maxdop 1);

select 
    s = min(s),
    e = max(e),
    t = min(t)
from 
    @periods    
group by 
    period_number
order by 
    s;

but I am not sure whether I can rely on this behavior of the UPDATE statement.

I use SQL Server 2008 R2.


Edit:

Thanks to Daniel and this article: http://www.sqlservercentral.com/articles/T-SQL/68467/

I found three important things that were missing from the solution above:

  1. There must be a clustered index on the table.
  2. There must be an anchor variable that is assigned the clustered index column.
  3. The UPDATE statement must run on a single processor, i.e. without parallelism.

I've changed the above solution in accordance with these rules.
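The quirky UPDATE is meant to produce a running counter that increments whenever t differs from the t of the previous row, with rows visited in clustered index order (by s). Below is a minimal Python sketch of that *intended* numbering. It says nothing about how SQL Server actually processes the UPDATE, which is precisely what is in doubt. The helper name `period_numbers` is hypothetical. As with `@t = null` in the T-SQL, the first row never counts as a change.

```python
def period_numbers(types):
    """Number runs of equal values: the counter increments whenever the value changes."""
    numbers = []
    counter = 0
    previous = None  # plays the role of @t, which starts out as null
    for t in types:
        if previous is not None and t != previous:
            counter += 1
        numbers.append(counter)
        previous = t
    return numbers

# t values of the sample rows, ordered by s
print(period_numbers([3, 1, 1, 2, 2, 2, 1]))  # [0, 1, 1, 2, 2, 2, 3]
```

Grouping by these numbers and taking MIN(s), MAX(e) yields the four expected rows.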

Since your ranges are continuous, the problem essentially becomes a gaps-and-islands one. If only you had a criterion to help you distinguish between different sequences with the same t value, you could group all the rows by that criterion and then just take MIN(s), MAX(e) for every group.
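Continuity matters because it guarantees that the MIN(s) and MAX(e) of a group describe one unbroken range. A small Python sketch for checking that property on the sample data follows. It assumes the rows are mirrored as a plain list of (s, e, t) tuples, and `is_contiguous` is a hypothetical helper name.

```python
from datetime import date

# Sample rows mirroring the INSERT in the question: (s, e, t)
periods = [
    (date(2013, 1, 1), date(2013, 1, 2), 3),
    (date(2013, 1, 2), date(2013, 1, 4), 1),
    (date(2013, 1, 4), date(2013, 1, 5), 1),
    (date(2013, 1, 5), date(2013, 1, 6), 2),
    (date(2013, 1, 6), date(2013, 1, 7), 2),
    (date(2013, 1, 7), date(2013, 1, 8), 2),
    (date(2013, 1, 8), date(2013, 1, 9), 1),
]

def is_contiguous(rows):
    """Return True if, ordered by s, every interval ends exactly where the next one starts."""
    ordered = sorted(rows, key=lambda r: r[0])
    return all(prev[1] == nxt[0] for prev, nxt in zip(ordered, ordered[1:]))

print(is_contiguous(periods))  # True
```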

One method of obtaining such a criterion is to use two ROW_NUMBER calls. Consider the following query:

SELECT
  *,
  rnk1 = ROW_NUMBER() OVER (               ORDER BY s),
  rnk2 = ROW_NUMBER() OVER (PARTITION BY t ORDER BY s)
FROM @periods
;

For your example it would return the following set:

s           e           t   rnk1  rnk2
----------  ----------  --  ----  ----
2013-01-01  2013-01-02  3   1     1
2013-01-02  2013-01-04  1   2     1
2013-01-04  2013-01-05  1   3     2
2013-01-05  2013-01-06  2   4     1
2013-01-06  2013-01-07  2   5     2
2013-01-07  2013-01-08  2   6     3
2013-01-08  2013-01-09  1   7     3
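For readers who want to see how the two rankings arise, here is a hedged Python emulation of the two ROW_NUMBER calls. rnk1 is a single running count over all rows ordered by s, and rnk2 is a separate running count for each value of t. Only s and t are kept because e does not affect either ranking. The function name `row_numbers` is hypothetical.

```python
from collections import defaultdict

# (s, t) pairs from the sample; ISO date strings sort in date order
rows = [
    ("2013-01-01", 3),
    ("2013-01-02", 1),
    ("2013-01-04", 1),
    ("2013-01-05", 2),
    ("2013-01-06", 2),
    ("2013-01-07", 2),
    ("2013-01-08", 1),
]

def row_numbers(rows):
    """Emulate ROW_NUMBER() OVER (ORDER BY s) and ROW_NUMBER() OVER (PARTITION BY t ORDER BY s)."""
    per_type = defaultdict(int)  # one running counter per t value (the PARTITION BY t part)
    result = []
    for rnk1, (s, t) in enumerate(sorted(rows), start=1):
        per_type[t] += 1
        result.append((s, t, rnk1, per_type[t]))
    return result

for r in row_numbers(rows):
    print(r)
```

The printed (s, t, rnk1, rnk2) tuples match the table above.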

The interesting thing about the rnk1 and rnk2 rankings is that if you subtract one from the other, you will get values that, together with t, uniquely identify every distinct sequence of rows with the same t:

s           e           t   rnk1  rnk2  rnk1 - rnk2
----------  ----------  --  ----  ----  -----------
2013-01-01  2013-01-02  3   1     1     0
2013-01-02  2013-01-04  1   2     1     1
2013-01-04  2013-01-05  1   3     2     1
2013-01-05  2013-01-06  2   4     1     3
2013-01-06  2013-01-07  2   5     2     3
2013-01-07  2013-01-08  2   6     3     3
2013-01-08  2013-01-09  1   7     3     4
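Why this works: inside an unbroken run of one type, each new row increases both rnk1 and rnk2 by exactly 1, so rnk1 - rnk2 stays constant. When the run is interrupted by rows of other types, rnk1 keeps counting those rows but rnk2 for this t does not, so the difference jumps. The difference alone can repeat across types, so the key has to be the pair (t, rnk1 - rnk2). Here is a small Python sketch of that key computation, in which `island_keys` is a hypothetical name:

```python
from collections import defaultdict

def island_keys(types):
    """Return (t, rnk1 - rnk2) for each row, with rows given in order of s."""
    per_type = defaultdict(int)  # rnk2 counter per t
    keys = []
    for rnk1, t in enumerate(types, start=1):
        per_type[t] += 1
        keys.append((t, rnk1 - per_type[t]))
    return keys

print(island_keys([3, 1, 1, 2, 2, 2, 1]))
# [(3, 0), (1, 1), (1, 1), (2, 3), (2, 3), (2, 3), (1, 4)]
```

For example, with types [1, 2, 1, 2] the second and third rows both get a difference of 1 but belong to different islands. This is why t must also appear in the GROUP BY.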

Knowing that, you can easily apply grouping and aggregation. This is what the final query might look like:

WITH partitioned AS (
  SELECT
    *,
    g = ROW_NUMBER() OVER (               ORDER BY s)
      - ROW_NUMBER() OVER (PARTITION BY t ORDER BY s)
  FROM @periods
)
SELECT
  s = MIN(s),
  e = MAX(e),
  t
FROM partitioned
GROUP BY
  t,
  g
;
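To try the query without a SQL Server instance, here is a sketch that runs the same query against an in-memory SQLite database through Python's standard `sqlite3` module. SQLite is a stand-in here, and the sketch assumes a SQLite build of 3.25 or later, which is when window functions were added. Because SQLite has no table variables, the T-SQL `alias = expression` syntax has been rewritten as `expression AS alias` and `@periods` has become an ordinary table. An `ORDER BY` has also been added so that the output order is deterministic. ROW_NUMBER itself has been available in SQL Server since 2005, so the original T-SQL should run on 2008 R2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE periods (s TEXT, e TEXT, t INTEGER)")
conn.executemany(
    "INSERT INTO periods VALUES (?, ?, ?)",
    [
        ("2013-01-01", "2013-01-02", 3),
        ("2013-01-02", "2013-01-04", 1),
        ("2013-01-04", "2013-01-05", 1),
        ("2013-01-05", "2013-01-06", 2),
        ("2013-01-06", "2013-01-07", 2),
        ("2013-01-07", "2013-01-08", 2),
        ("2013-01-08", "2013-01-09", 1),
    ],
)

# Same gaps-and-islands query as above, in SQLite syntax
query = """
WITH partitioned AS (
  SELECT
    *,
    ROW_NUMBER() OVER (ORDER BY s)
      - ROW_NUMBER() OVER (PARTITION BY t ORDER BY s) AS g
  FROM periods
)
SELECT MIN(s) AS s, MAX(e) AS e, t
FROM partitioned
GROUP BY t, g
ORDER BY MIN(s)
"""
results = conn.execute(query).fetchall()
for row in results:
    print(row)
```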

If you like, you can play with this solution at SQL Fiddle.