且构网

分享程序员开发的那些事...
且构网 - 分享程序员编程开发的那些事

如何使用BigQuery和Apache Beam将SQL表转换为行序列列表?

更新时间:2023-11-21 23:20:52

以下是BigQuery标准SQL

Below is for BigQuery Standard SQL

#standardSQL
SELECT trip.vehicle_id, ARRAY_AGG(trip ORDER BY trip.start_time) trips
FROM (
  SELECT trip, DIV(ROW_NUMBER() OVER(PARTITION BY vehicle_id ORDER BY start_time) - 1, 4) grp   
  FROM `project.dataset.table` trip
)
GROUP BY trip.vehicle_id, grp

以上假设行程按start_time排序,序列长度= 4
另外,它会在数组中将vehicle_id作为行程信息的一部分返回-如下面的示例

Above assumes ordering of trips by start_time and sequence length = 4
Also, it returns vehicle_id as a part of trip info in array - like in below example

Row vehicle_id  trips.vehicle_id    trips.trip_id   trips.start_time    trips.stop_time  
1   Vehicle1    Vehicle1            Trip1           1                   2    
                Vehicle1            Trip2           2                   3    
                Vehicle1            Trip3           3                   4    
                Vehicle1            Trip4           4                   5    
2   Vehicle1    Vehicle1            Trip5           5                   6    
                Vehicle1            Trip6           6                   6    
                Vehicle1            Trip7           7                   6    
3   Vehicle2    Vehicle2            Trip1           2                   3    
                Vehicle2            Trip2           3                   4    
                Vehicle2            Trip3           4                   5    
                Vehicle2            Trip4           5                   6    

要消除这种情况-请尝试以下

To eliminate this - try below

#standardSQL
SELECT vehicle_id,
  ARRAY( 
    SELECT AS STRUCT * EXCEPT(vehicle_id)
    FROM UNNEST(trips)
    ORDER BY start_time
  ) trips
FROM (
  SELECT trip.vehicle_id, ARRAY_AGG(trip ORDER BY trip.start_time) trips
  FROM (
    SELECT trip, DIV(ROW_NUMBER() OVER(PARTITION BY vehicle_id ORDER BY start_time) - 1, 4) grp   
    FROM `project.dataset.table` trip
  )
  GROUP BY trip.vehicle_id, grp
)


Row vehicle_id  trips.trip_id   trips.start_time    trips.stop_time  
1   Vehicle1    Trip1           1                   2    
                Trip2           2                   3    
                Trip3           3                   4    
                Trip4           4                   5    
2   Vehicle1    Trip5           5                   6    
                Trip6           6                   6    
                Trip7           7                   6    
3   Vehicle2    Trip1           2                   3    
                Trip2           3                   4    
                Trip3           4                   5    
                Trip4           5                   6