且构网

分享程序员开发的那些事...
且构网 - 分享程序员编程开发的那些事

SQL Server 2014:根据来自第三个表的值配对来自 2 个表的行

更新时间:2023-01-31 11:11:00

下面的更新/新版本 - 现在允许 T1 中的行而不匹配 T2 中的行.

根据对以下评论的讨论,我更新了此建议.

此代码替换了从 DECLARE @EventIdSELECT 语句结尾的所有内容.

逻辑如下——对于T1中的每一行...

  1. 确定 T1 中该行的时间边界(在其 EventTimestamp 与该车辆的 T1 中的下一个 EventTimestamp 之间;如果没有下一个事件,则为未来 1 天)
  2. 在 T2 中找到匹配的行,其中匹配"意味着 a) 相同的 VehicleId,b) 相同的 EventId,c) EventValue 受#E 中可能性的限制,并且 d) 发生在 T1 的时间边界内
  3. 查找这些行中的第一行(如果有)
  4. 将 EventDelay 计算为两个时间戳之间的时间

;与 t1 AS(选择车辆 ID,事件时间戳,事件 ID,事件值,COALESCE(LEAD(EventTimestamp, 1) OVER (PARTITION BY VehicleID ORDER BY EventTimestamp), DATEADD(day, 1, getdate())) AS NextT1_EventTimeStamp从#T1),订单AS(选择 t1.VehicleId,t1.EventTimestamp AS 首先,t2.EventTimestamp AS 最后,t1.EventId,t2.EventValue,ROW_NUMBER() OVER (PARTITION BY t1.VehicleId, t1.EventTimestamp, t1.EventId ORDER BY t2.EventTimestamp) AS rn从 t1LEFT OUTER JOIN #E AS e ON t1.EventId = e.EventIdAND t1.EventValue = e.FirstValue左外连接#T2 AS t2 ON t1.VehicleID = t2.VehicleIDAND t1.EventID = t2.EventIDAND t2.eventId = e.EventIdAND t2.EventValue = e.LastValueAND t2.EventTimestamp >t1.EventTimestampAND t2.EventTimestamp 

不断增长的DB<>fiddle 更新以及原始帖子和以前的建议.

I have 2 tables that contains typed events over time.

The first table #T1 contains events that always comes before events in the second table #T2.

A third table #E contains records that defines for an event the values that comes in #T1 and #T2 respectively.

Sample data:

CREATE TABLE #T1 
(
    EventTimestamp DateTime, 
    VehicleId int, 
    EventId varchar(50), 
    EventValue varchar(50)
);

CREATE TABLE #T2 
(
    EventTimestamp DateTime, 
    VehicleId int, 
    EventId varchar(50), 
    EventValue varchar(50)
);

CREATE TABLE #E 
(
     EventId varchar(50), 
     FirstValue int, 
     LastValue varchar(50)
);

INSERT INTO #T1(EventTimestamp, VehicleId , EventId, EventValue)
VALUES (GETDATE(), 1, 'TwigStatus', '12'),
       (GETDATE(), 2, 'SafeProtectEvent', '5')

INSERT INTO #T2(EventTimestamp, VehicleId , EventId, EventValue)
VALUES (DATEADD(second, 30, GETDATE()), 1, 'TwigStatus', '7'),
       (DATEADD(second, 30, GETDATE()), 2, 'SafeProtectEvent', '6')

INSERT INTO #E(EventId, FirstValue, LastValue)
VALUES ('TwigStatus', '12', '7'),
       ('SafeProtectEvent', '5', '6')

DECLARE @EventId varchar(50)  = 'TwigStatus';
DECLARE @FirstValue varchar(50) = '12';
DECLARE @LastValue varchar(50) = '7';

WITH ord AS 
(
    SELECT
        first, last,
        EventNr = ROW_NUMBER() OVER (ORDER BY first) 
    FROM
        (SELECT 
             first = t1.EventTimestamp, last = t2.EventTimestamp,
             rn = ROW_NUMBER() OVER (PARTITION BY t1.VehicleId ORDER BY t2.EventTimestamp) 
        FROM
            #T1 t1
        INNER JOIN 
            #T2 t2 ON t2.EventTimestamp > t1.EventTimestamp  
                   AND t2.EventValue = @LastValue
        WHERE 
            t1.EventId = @EventId AND t1.EventValue = @FirstValue) ids
    WHERE 
        rn = 1
)
SELECT
    t.VehicleId, o.first, o.last, t.EventId, t.EventValue 
FROM
    #T2 t
INNER JOIN 
    ord o ON t.EventTimestamp >= o.first 
          AND t.EventTimestamp <= o.last;
WHERE t.EventId = @EventId;    

DROP TABLE #E;
DROP TABLE #T1;
DROP TABLE #T2;

Basically, for a record in table E you see that for eventID 'TwigStatus' the value '12' should come first in table T1 and then '7' should be next in table T2. There is a second event sequence that is defined.

The VehicleId column is the link between the tables T1 and T2.

I need to compute the delay between two matching events in table T1 and T2.

To start simple, I do not use the table E yet, I'm using variables that contains predefined values and I'm returning timestamps.

But the result of the query above;

VehicleId   first                   last                    EventId             EventValue
1           2020-09-15 16:00:37.670 2020-09-15 16:01:07.670 TwigStatus          7
2           2020-09-15 16:00:37.670 2020-09-15 16:01:07.670 SafeProtectEvent    6

Is not what I'm expecting because the EventId 'SafeProtectEvent' Should be filtered out for now.

So I have 2 questions:

  1. How to avoid the second event to show with the actual query.
  2. How to work with the content of the table E and get rid of variables to process event sequences.


EDIT 1: Problem 1 Solved by adding a restriction on the query (see above)

Update/new version below - now allows rows in T1 without matching rows in T2.

Based on discussion on comments below, I have updated this suggestion.

This code replaces everything from the DECLARE @EventId to the end of that SELECT statement.

Logic is as follows - for each row in T1 ...

  1. Determine the time boundaries for that row in T1 (between its EventTimestamp, and the next EventTimestamp in T1 for that vehicle; or 1 day in the future if there is no next event)
  2. Find the matching rows in T2, where 'matching' means a) same VehicleId, b) same EventId, c) EventValue is limited by possibilities in #E, and d) occurs within the time boundaries of T1
  3. Find the first of these rows, if available
  4. Calculate EventDelay as the times between the two timestamps

; WITH t1 AS
        (SELECT     VehicleId, 
                    EventTimestamp, 
                    EventId,
                    EventValue,
                    COALESCE(LEAD(EventTimestamp, 1) OVER (PARTITION BY VehicleID ORDER BY EventTimestamp), DATEADD(day, 1, getdate())) AS NextT1_EventTimeStamp
            FROM    #T1
        ),
   ord AS
        (SELECT     t1.VehicleId, 
                    t1.EventTimestamp AS first, 
                    t2.EventTimestamp AS last,
                    t1.EventId, 
                    t2.EventValue,
                    ROW_NUMBER() OVER (PARTITION BY t1.VehicleId, t1.EventTimestamp, t1.EventId ORDER BY t2.EventTimestamp) AS rn
            FROM    t1
                    LEFT OUTER JOIN #E  AS e ON t1.EventId = e.EventId 
                                       AND t1.EventValue = e.FirstValue
                    LEFT OUTER JOIN #T2 AS t2 ON t1.VehicleID = t2.VehicleID 
                                       AND t1.EventID = t2.EventID
                                       AND t2.eventId = e.EventId 
                                       AND t2.EventValue = e.LastValue
                                       AND t2.EventTimestamp > t1.EventTimestamp
                                       AND t2.EventTimestamp < NextT1_EventTimeStamp
        )
    SELECT      VehicleId, first, last, EventId, EventValue,
                DATEDIFF(second, first, last) AS EventDelay
        FROM    ord
        WHERE   rn = 1

The ever-growing DB<>fiddle has the latest updates, as well as original posts and previous suggestions.