且构网

分享程序员开发的那些事...
且构网 - 分享程序员编程开发的那些事

MySQL REGEXP查询-重音不敏感搜索

更新时间:2022-05-15 23:11:40

您不走运:

警告

REGEXP和RLIKE运算符以字节方式工作,因此它们是 不是多字节安全的,使用多字节可能会产生意外的结果 字符集.此外,这些运算符还通过 它们的字节值和重音字符的比较结果可能不相等 即使给定的归类将它们视为相等.

The REGEXP and RLIKE operators work in byte-wise fashion, so they are not multi-byte safe and may produce unexpected results with multi-byte character sets. In addition, these operators compare characters by their byte values and accented characters may not compare as equal even if a given collation treats them as equal.

[[:<:]][[:>:]]正则表达式运算符是单词边界的标记.使用LIKE运算符可以达到的最接近的结果是在这行上:

The [[:<:]] and [[:>:]] regexp operators are markers for word boundaries. The closest you can achieve with the LIKE operator is something on this line:

SELECT *
FROM `table`
WHERE wine_name = 'Faugères'
   OR wine_name LIKE 'Faugères %'
   OR wine_name LIKE '% Faugères'

如您所见,它并不完全等效,因为我将单词边界的概念限制为空格.为其他边界添加更多子句将是一团糟.

As you can see it's not fully equivalent because I've restricted the concept of word boundary to spaces. Adding more clauses for other boundaries would be a mess.

您也可以使用全文本搜索(尽管不一样),但是您无法在InnoDB表中定义全文本索引(至今).

You could also use full text searches (although it isn't the same) but you can't define full text indexes in InnoDB tables (yet).

您当然不走运:)

附录:此已更改从MySQL 8.0开始:

Addendum: this has changed as of MySQL 8.0:

MySQL使用Unicode国际组件(ICU)实现了正则表达式支持,该组件提供了完整的Unicode支持并且是多字节安全的. (在MySQL 8.0.4之前,MySQL使用Henry Spencer的正则表达式实现,该实现以字节方式运行,并且不是多字节安全的.

MySQL implements regular expression support using International Components for Unicode (ICU), which provides full Unicode support and is multibyte safe. (Prior to MySQL 8.0.4, MySQL used Henry Spencer's implementation of regular expressions, which operates in byte-wise fashion and is not multibyte safe.