且构网

分享程序员开发的那些事...
且构网 - 分享程序员编程开发的那些事

T-SQL 的“ORDER BY RAND()"的官方文档在哪里?和“按 NEWID() 订购"?

更新时间:2023-02-03 19:39:09

If you're trying to determine why these behave differently, the reason is simple: one is evaluated once, and treated as a runtime constant (RAND()), while the other is evaluated for every single row (NEWID()). Observe this simple example:

SELECT TOP (5) RAND(), NEWID() FROM sys.objects;

Results:

0.240705716465209        8D5D2B55-E5DE-4FF9-BA84-BC82F37B8F3A
0.240705716465209        C4CBF1CA-E6D0-4076-B6A6-5048EA612048
0.240705716465209        9BFAE5BB-B5B9-47DE-B8F9-77AAEFA5F9DB
0.240705716465209        89FFD8A1-AC73-4CEB-A5C0-00A76D040382
0.240705716465209        BCC89923-735E-43B3-9ECA-622A8C98AD7D

Now, if you apply an order by to the left column, SQL Server says, ok, but every single value is the same, so I'm basically just to ignore your request and move on to the next ORDER BY column. If there isn't one, then SQL Server will default to returning the rows in whatever order it deems most efficient.

If you apply an order by to the right column, now SQL Server actually has to sort all of the values. This introduces a Sort (or a TopN Sort if TOP is used) operator into the plan, and is likely going to take more CPU (though overall duration may not be substantially affected, depending on the size of the set and other factors).

Let's compare the plans for these two queries:

SELECT RAND() FROM sys.all_columns ORDER BY RAND();

The plan:

There is no sort operator going on, and both of the scans are Ordered = False - this means that SQL Server has not decided to explicitly implement any ordering, but this certainly does not mean that the order will be any different on each execution - it just means that the order is non-deterministic (unless you add a secondary ORDER BY - but even in that case, the RAND() ordering is still ignored because, well, it's the same value on every row).

And now NEWID():

SELECT NEWID() FROM sys.all_columns ORDER BY NEWID();

The plan:

There is a new Sort operator there, which means that SQL Server must reorder all the rows to be returned in the order of the generated GUID values on each row. The scans of course are still unordered, but the Sort ultimately applies the order.

I don't know that this specific implementation detail is officially documented anywhere, though I did find this article which includes an explicit ORDER BY NEWID(). I doubt you'll find anything official that documents ORDER BY RAND() in any way, because that just doesn't make any sense to do, officially supported or not.

Re: the comment that SQL Server assigns a seed value at random - this should not be interpreted as a seed value **per row** at random. Demonstration:

SELECT MAX(r), MIN(r) FROM 
(
  SELECT RAND() FROM sys.all_columns AS s1 
  CROSS JOIN sys.all_columns AS s2
) AS x(r);

Results:

0.4866202638872        0.4866202638872

On my machine, this took about 15 seconds to run, and the results were always the same for both MIN and MAX. Keep increasing the number of rows returned and the amount of time it takes, and I guarantee you will continue to see the exact same value for RAND() on every row. It is calculated exactly once, and that is not because SQL Server is wise to the fact that I am not returning all of the rows. This also yielded the same result (and it took just under 2 minutes to populate the entire table with 72 million rows):

SELECT RAND() AS r INTO #x 
      FROM sys.all_columns AS s1 
CROSS JOIN sys.all_columns AS s2
CROSS JOIN sys.all_columns AS s3;

SELECT MAX(r), MIN(r) FROM #x;

(In fact the SELECT took almost as long as the initial population. Do not try this on a single-core laptop with 4GB of RAM.)

The result:

0.302690214345828        0.302690214345828