更新时间:2023-01-29 11:32:43
忽略 LineNumber 为空.在这种情况下应该如何处理 IsRepeat?
Ignores LineNumber is null. How should IsRepeat be handled in that case?
它适用于测试数据.它的效率是否足以满足生产量的需求?
It works for test data. Whether it will be efficient enough for production volumes?
在pair上有重复(LineNumber, CreatedOn)的情况下,任意选择一个.(JobId 最小的那个)
In the case of duplicate (LineNumber, CreatedOn) on pairs, arbitrarily choose one. (The one with minimum JobId)
基本思路:
; with AllPairsByLineNumberAtLeast7DaysApart (LineNumber
, LeftJobId
, RightJobId
, BeginCreatedOn
, EndCreatedOn) as
(select l.LineNumber
, l.JobId
, r.JobId
, dateadd(day, 7, l.CreatedOn)
, r.CreatedOn
from Job l
inner join Job r
on l.LineNumber = r.LineNumber
and dateadd(day, 7, l.CreatedOn) < r.CreatedOn
and l.JobId <> r.JobId)
-- Count the number of rows within from BeginCreatedOn
-- up to and including EndCreatedOn
-- In the case of CreatedOn = EndCreatedOn,
-- include only jobId <= jobid, to handle ties in CreatedOn
, AllPairsCount(LineNumber, LeftJobId, RightJobId, Cnt) as
(select ap.LineNumber, ap.LeftJobId, ap.RightJobId, count(*)
from AllPairsByLineNumberAtLeast7DaysApart ap
inner join Job j
on j.LineNumber = ap.LineNumber
and ap.BeginCreatedOn <= j.createdOn
and (j.CreatedOn < ap.EndCreatedOn
or (j.CreatedOn = ap.EndCreatedOn
and j.JobId <= ap.RightJobId))
group by ap.LineNumber, ap.LeftJobId, ap.RightJobId)
, Step1 (LineNumber, JobId, CreatedOn, RN) as
(select LineNumber, JobId, CreatedOn
, row_number() over
(partition by LineNumber order by CreatedOn, JobId)
from Job)
, Results (JobId, LineNumber, CreatedOn) as
-- Start with the first rows.
(select JobId, LineNumber, CreatedOn
from Step1
where RN = 1
and LineNumber is not null
-- get the next row
union all
select j.JobId, j.LineNumber, j.CreatedOn
from Results r
inner join AllPairsCount apc on apc.LeftJobId = r.JobId
inner join Job j
on j.JobId = apc.RightJobId
and apc.CNT = 1)
update j
set IsRepeat = case when R.JobId is not null then 0 else 1 end
from Job j
left outer join Results r
on j.JobId = R.JobId
where j.LineNumber is not null
昨晚我关掉电脑后,我意识到我让事情变得比他们需要的更复杂.一个更直接(在测试数据上,稍微更有效)的查询:
After I turned off the computer last night I realized I had made things more complicated than they needed to be. A more straightforward (and on the test data, slightly more effecient) query:
基本思路:
; with PotentialSteps (FromJobId, ToJobId) as
(select FromJobId, ToJobId
from (select f.JobId as FromJobId
, t.JobId as ToJobId
, row_number() over
(partition by f.LineNumber order by t.CreatedOn, t.JobId) as RN
from Job f
inner join Job t
on f.LineNumber = t.LineNumber
and dateadd(day, 7, f.CreatedOn) < t.CreatedOn) t
where RN = 1)
, NonRepeats (JobId) as
(select JobId
from (select JobId
, row_number() over
(partition by LineNumber order by CreatedOn, JobId) as RN
from Job) Start
where RN = 1
union all
select J.JobId
from NonRepeats NR
inner join PotentialSteps PS
on NR.JobId = PS.FromJobId
inner join Job J
on PS.ToJobId = J.JobId)
update J
set IsRepeat = case when NR.JobId is not null then 0 else 1 end
from Job J
left outer join NonRepeats NR
on J.JobId = NR.JobId
where J.LineNumber is not null