更新时间:2023-10-23 14:15:16
问题的核心不是为什么这个顺序与LINQ有关系? LINQ只是翻译字面上没有重新排序。真正的问题是为什么这两个SQL查询具有不同的性能?
我只能插入100k行才能重现问题。在这种情况下,优化器的弱点正在被触发:由于复杂的条件,它不能识别它可以在 Color
上进行搜索。在第一个查询中,优化器确实识别了模式并创建了一个索引搜索。
没有什么语义的原因。即使在 NULL
上寻求索引时也可以进行搜索。这是优化器中的弱点/错误。以下是两个计划:
EF尝试在这里有用,因为它假定列和过滤器变量都可以为空。在这种情况下,它试图给你一个匹配(根据C#语义是正确的)。
我尝试通过添加以下过滤器来撤销:
颜色IS NOT NULL AND @ p__linq__0 IS NOT NULL
AND Size is NOT NULL AND @ p__linq__1 IS NOT NULL
希望优化器现在使用这些知识来简化复合EF滤波器表达式。它没有这样做。如果这样做的话,EF查询就可以将相同的过滤器添加到一个简单的修复程序。
这是我按照你应该尝试的顺序推荐的修复:
颜色,大小
和/或尺寸,颜色
。 INTERSECT
/ Queryable.Intersect
组合过滤器。这通常会导致不同的计划形状。所有这些都是解决方法,而不是根本原因修复。
最后我在这里,我不满意SQL Server和EF。两种产品都应该固定。唉,他们可能不会,你也等不及了。
这是索引脚本:
CREATE NONCLUSTERED INDEX IX_Widget_Colour_Size ON dbo.Widget
(
颜色,大小
)WITH(STATISTICS_NORECOMPUTE = OFF ,IGNORE_DUP_KEY = OFF,ALLOW_ROW_LOCKS = ON,ALLOW_PAGE_LOCKS = ON)ON [PRIMARY]
CREATE NONCLUSTERED INDEX IX_Widget_Size_Colour ON dbo.Widget
(
大小,颜色
)WITH(STATISTICS_NORECOMPUTE = OFF,IGNORE_DUP_KEY = OFF,ALLOW_ROW_LOCKS = ON,ALLOW_PAGE_LOCKS = ON)ON [PRIMARY]
I'm using Entity Framework (code first) and finding the order I specify clauses in my LINQ queries is having a huge performance impact, so for example:
using (var db = new MyDbContext())
{
var mySize = "medium";
var myColour = "vermilion";
var list1 = db.Widgets.Where(x => x.Colour == myColour && x.Size == mySize).ToList();
var list2 = db.Widgets.Where(x => x.Size == mySize && x.Colour == myColour).ToList();
}
Where the (rare) colour clause precedes the (common) size clause it's fast, but the other way round it's orders of magnitude slower. The table has a couple of million rows and the two fields in question are nvarchar(50), so not normalised but they are each indexed. The fields are specified in a code first fashion as follows:
[StringLength(50)]
public string Colour { get; set; }
[StringLength(50)]
public string Size { get; set; }
Am I really supposed to have to worry about such things in my LINQ queries, I thought that was the database's job?
System specs are:
Right, to any gluttons for punishment the effect can be replicated as below. The issue seems to be tremendously sensitive to a number of factors so please bear with the contrived nature of some of this:
Install EntityFramework 6.0.0-beta1 via nuget, then generate code first style with:
public class Widget
{
[Key]
public int WidgetId { get; set; }
[StringLength(50)]
public string Size { get; set; }
[StringLength(50)]
public string Colour { get; set; }
}
public class MyDbContext : DbContext
{
public MyDbContext()
: base("DefaultConnection")
{
}
public DbSet<Widget> Widgets { get; set; }
}
Generate the dummy data with the following SQL:
insert into gadget (Size, Colour)
select RND1 + ' is the name is this size' as Size,
RND2 + ' is the name of this colour' as Colour
from (Select top 1000000
CAST(abs(Checksum(NewId())) % 100 as varchar) As RND1,
CAST(abs(Checksum(NewId())) % 10000 as varchar) As RND2
from master..spt_values t1 cross join master..spt_values t2) t3
Add one index each for Colour and Size, then query with:
string mySize = "99 is the name is this size";
string myColour = "9999 is the name of this colour";
using (var db = new WebDbContext())
{
var list1= db.Widgets.Where(x => x.Colour == myColour && x.Size == mySize).ToList();
}
using (var db = new WebDbContext())
{
var list2 = db.Widgets.Where(x => x.Size == mySize && x.Colour == myColour).ToList();
}
The issue seems connected with the obtuse collection of NULL comparisons in the generated SQL, as below.
exec sp_executesql N'SELECT
[Extent1].[WidgetId] AS [WidgetId],
[Extent1].[Size] AS [Size],
[Extent1].[Colour] AS [Colour]
FROM [dbo].[Widget] AS [Extent1]
WHERE ((([Extent1].[Size] = @p__linq__0)
AND ( NOT ([Extent1].[Size] IS NULL OR @p__linq__0 IS NULL)))
OR (([Extent1].[Size] IS NULL) AND (@p__linq__0 IS NULL)))
AND ((([Extent1].[Colour] = @p__linq__1) AND ( NOT ([Extent1].[Colour] IS NULL
OR @p__linq__1 IS NULL))) OR (([Extent1].[Colour] IS NULL)
AND (@p__linq__1 IS NULL)))',N'@p__linq__0 nvarchar(4000),@p__linq__1 nvarchar(4000)',
@p__linq__0=N'99 is the name is this size',
@p__linq__1=N'9999 is the name of this colour'
go
Changing the equality operator in the LINQ to StartWith() makes the problem go away, as does changing either one of the two fields to be non nullable at the database.
I despair!
Some assistance for any bounty hunters, the issue can be reproduced on SQL Server 2008 R2 Web (64 bit) in a clean database, as follows:
CREATE TABLE [dbo].[Widget](
[WidgetId] [int] IDENTITY(1,1) NOT NULL,
[Size] [nvarchar](50) NULL,
[Colour] [nvarchar](50) NULL,
CONSTRAINT [PK_dbo.Widget] PRIMARY KEY CLUSTERED
(
[WidgetId] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX IX_Widget_Size ON dbo.Widget
(
Size
) WITH( STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX IX_Widget_Colour ON dbo.Widget
(
Colour
) WITH( STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
insert into Widget (Size, Colour)
select RND1 + ' is the name is this size' as Size,
RND2 + ' is the name of this colour' as Colour
from (Select top 1000000
CAST(abs(Checksum(NewId())) % 100 as varchar) As RND1,
CAST(abs(Checksum(NewId())) % 10000 as varchar) As RND2
from master..spt_values t1 cross join master..spt_values t2) t3
GO
and then compare the relative performance of the following two queries (you may need to adjust the parameter test values in order to get a query which returns a couple of rows in order to observe the effect, i.e. the second query id much slower).
exec sp_executesql N'SELECT
[Extent1].[WidgetId] AS [WidgetId],
[Extent1].[Size] AS [Size],
[Extent1].[Colour] AS [Colour]
FROM [dbo].[Widget] AS [Extent1]
WHERE ((([Extent1].[Colour] = @p__linq__0)
AND ( NOT ([Extent1].[Colour] IS NULL
OR @p__linq__0 IS NULL)))
OR (([Extent1].[Colour] IS NULL)
AND (@p__linq__0 IS NULL)))
AND ((([Extent1].[Size] = @p__linq__1)
AND ( NOT ([Extent1].[Size] IS NULL
OR @p__linq__1 IS NULL)))
OR (([Extent1].[Size] IS NULL) AND (@p__linq__1 IS NULL)))',
N'@p__linq__0 nvarchar(4000),@p__linq__1 nvarchar(4000)',
@p__linq__0=N'9999 is the name of this colour',
@p__linq__1=N'99 is the name is this size'
go
exec sp_executesql N'SELECT
[Extent1].[WidgetId] AS [WidgetId],
[Extent1].[Size] AS [Size],
[Extent1].[Colour] AS [Colour]
FROM [dbo].[Widget] AS [Extent1]
WHERE ((([Extent1].[Size] = @p__linq__0)
AND ( NOT ([Extent1].[Size] IS NULL
OR @p__linq__0 IS NULL)))
OR (([Extent1].[Size] IS NULL)
AND (@p__linq__0 IS NULL)))
AND ((([Extent1].[Colour] = @p__linq__1)
AND ( NOT ([Extent1].[Colour] IS NULL
OR @p__linq__1 IS NULL)))
OR (([Extent1].[Colour] IS NULL)
AND (@p__linq__1 IS NULL)))',
N'@p__linq__0 nvarchar(4000),@p__linq__1 nvarchar(4000)',
@p__linq__0=N'99 is the name is this size',
@p__linq__1=N'9999 is the name of this colour'
You may also find, as I do, that if you rerun the dummy data insert so that there are now two million rows, the problem goes away.
The core of the question is not "why does the order matter with LINQ?". LINQ just translates literally without reordering. The real question is "why do the two SQL queries have different performance?".
I was able to reproduce the problem by only inserting 100k rows. In that case a weakness in the optimizer is being triggered: it does not recognize that it can do a seek on Colour
due to the complex condition. In the first query the optimizer does recognize the pattern and creates an index seek.
There is no semantic reason why this should be. A seek on an index is possible even when seeking on NULL
. This is a weakness/bug in the optimizer. Here are the two plans:
EF tries to be helpful here because it assumes that both the column and the filter variable can be null. In that case it tries to give you a match (which according to C# semantics is the right thing).
I tried undoing that by adding the following filter:
Colour IS NOT NULL AND @p__linq__0 IS NOT NULL
AND Size IS NOT NULL AND @p__linq__1 IS NOT NULL
Hoping that the optimizer now uses that knowledge to simplify the complex EF filter expression. It did not manage to do so. If this had worked the same filter could have been added to the EF query providing an easy fix.
Here are the fixes the I recommend in the order that you should try them:
Colour, Size
and/or Size, Colour
. They also remove them problem.INTERSECT
/Queryable.Intersect
to combine the filters. This often results in different plan shapes.All of these are workarounds, not root cause fixes.
In the end I am not happy with both SQL Server and EF here. Both products should be fixed. Alas, they likely won't be and you can't wait for that either.
Here are the index scripts:
CREATE NONCLUSTERED INDEX IX_Widget_Colour_Size ON dbo.Widget
(
Colour, Size
) WITH( STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
CREATE NONCLUSTERED INDEX IX_Widget_Size_Colour ON dbo.Widget
(
Size, Colour
) WITH( STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]