且构网

分享程序员开发的那些事...
且构网 - 分享程序员编程开发的那些事

SQL Server CONTAINSTABLE不适用于单个数字的数字

更新时间:2023-02-10 15:02:57

Finding the cause of the issue

I tried many things such as changing the query to the:

SELECT * FROM CONTAINSTABLE (Address, FullAddress, 'NEAR((1, YONGE), 5, TRUE)') 
-- or this
SELECT * FROM CONTAINSTABLE (Address, FullAddress, '1 YON*')

but without any luck.

After some searching online, I started thinking (since the issue only happens with single digits) that it may have something to do with Stopwords:

Stopwords. A stopword can be a word with meaning in a specific language. For example, in the English language, words such as "a," "and," "is," and "the" are left out of the full-text index since they are known to be useless to a search. A stopword can also be a token that does not have linguistic meaning.

Then with the help of this SO Answer, I was able to figure out how SQL Server was interpreting my search. Here is the query and the result of the query:

select * from sys.dm_fts_parser('"1" NEAR "Yon*"',2057, 0, 0)

Notice how the search term 1 is treated as Noise. This was the issue. Then running this query helped me find all the noise words and sure enough the numbers 0-9 were all there:

SELECT ssw.*, ssw.stopword, slg.name
      FROM sys.fulltext_system_stopwords ssw
      JOIN sys.fulltext_languages slg
      ON slg.lcid = ssw.language_id
      WHERE slg.lcid = 1033 -- English

Solution

One solution would be to remove the single digit numbers from the noise words. But I could not find how to do that. Actually, in my case that will not be idea anyhow since the users of my system will only be searching for addresses so if they type is or the, I do not want the system to treat it as noise since they may be searching for a street which starts with is.

I removed the stoplist altogether using the query below and now everything works as expected:

ALTER FULLTEXT INDEX ON [Address] SET STOPLIST = off

Hopefully this helps someone else.