更新时间:2023-01-29 22:00:05
It could be that you have many duplicate values for a particular id
, so computing row numbers takes a long time. If you want to check for whether this is the case, you can try:
#standardSQL
SELECT id, COUNT(*) AS id_count
FROM rawData.stock_movement
GROUP BY id
ORDER BY id_count DESC LIMIT 5;
With that said, it may be faster to remove duplicates with this query instead:
#standardSQL
SELECT latest_row.*
FROM (
SELECT ARRAY_AGG(t ORDER BY timestamp DESC LIMIT 1)[OFFSET(0)] AS latest_row
FROM rawData.stock_movement AS t
GROUP BY t.id
);
Here is an example:
#standardSQL
WITH T AS (
SELECT 1 AS id, 'foo' AS x, TIMESTAMP '2017-04-01' AS timestamp UNION ALL
SELECT 2, 'bar', TIMESTAMP '2017-04-02' UNION ALL
SELECT 1, 'baz', TIMESTAMP '2017-04-03')
SELECT latest_row.*
FROM (
SELECT ARRAY_AGG(t ORDER BY timestamp DESC LIMIT 1)[OFFSET(0)] AS latest_row
FROM rawData.stock_movement AS t
GROUP BY t.id
);
The reason that this may be faster is that BigQuery will only keep the row with the largest timestamp in memory at any given point in time.