Updated: 2023-02-11 14:00:18
The point is that the ElasticSearch regex you are using requires a full string match:
"Lucene's patterns are always anchored. The pattern provided must match the entire string."
Thus, to match any characters (other than a newline) around the keyword, you can use the .* pattern:
match: { text: '.*google.*'}
                ^^      ^^
One more variation is for cases when your string can have newlines: match: { text: '(.|\n)*google(.|\n)*'}. This awful (.|\n)* is a must in ElasticSearch because this regex flavor does not allow any [\s\S] workarounds, nor any DOTALL/Singleline flags. "The Lucene regular expression engine is not Perl-compatible but supports a smaller range of operators."
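The anchoring behaviour described above can be imitated in Python, purely as an illustration. This is a minimal sketch that assumes `re.fullmatch` is a fair stand-in for "the pattern must match the entire string"; Python's `re` is a different engine from Lucene's, and `anchored_match` is a hypothetical helper name, not an Elasticsearch API.

```python
import re


def anchored_match(pattern: str, text: str) -> bool:
    """Return True only if `pattern` matches the whole of `text`.

    Hypothetical helper: mimics an always-anchored regex by using
    re.fullmatch. No DOTALL flag is passed, so `.` does not match a newline.
    """
    return re.fullmatch(pattern, text) is not None


single_line = "www.google.com"
multi_line = "first line\nwww.google.com\nlast line"

# A bare keyword does not cover the whole string, so it fails.
print(anchored_match(r"google", single_line))                 # False
# Wrapping the keyword in .* lets the pattern span the whole string.
print(anchored_match(r".*google.*", single_line))             # True
# `.` stops at newlines, so the same pattern fails on multi-line text.
print(anchored_match(r".*google.*", multi_line))              # False
# (.|\n)* consumes newlines explicitly, without any DOTALL flag.
print(anchored_match(r"(.|\n)*google(.|\n)*", multi_line))    # True
```

The last two calls show why the (.|\n)* form is needed once the text may contain line breaks.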
However, if you do not plan to match any complicated patterns and need no word boundary checking, a search for a plain substring is better performed with a wildcard query than with a regex:
{
  "query": {
    "wildcard": {
      "text": {
        "value": "*google*",
        "boost": 1.0,
        "rewrite": "constant_score"
      }
    }
  }
}
See Wildcard search for more details.
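For readers who assemble the request body in code, the wildcard query shown earlier could be built along these lines. This is only a sketch: `build_wildcard_query` is a hypothetical helper, not part of any Elasticsearch client, and it does not escape `*` or `?` that may already occur in the substring.

```python
import json


def build_wildcard_query(field: str, substring: str) -> dict:
    """Build a wildcard query body that looks for `substring` anywhere in `field`.

    Hypothetical helper for illustration; the boost and rewrite values are
    copied from the query in the answer.
    """
    return {
        "query": {
            "wildcard": {
                field: {
                    # Leading and trailing * turn this into a "contains" match.
                    "value": f"*{substring}*",
                    "boost": 1.0,
                    "rewrite": "constant_score",
                }
            }
        }
    }


body = build_wildcard_query("text", "google")
# Serialize to JSON, ready to be sent as a search request body.
print(json.dumps(body, indent=2))
```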
NOTE: The wildcard pattern also needs to match the whole input string, thus
google* finds all strings starting with google
*google* finds all strings containing google
*google finds all strings ending with google
Also, bear in mind the only pair of special characters in wildcard patterns:
?, which matches any single character
*, which can match zero or more characters, including an empty one
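The whole-string wildcard semantics above can be sketched in Python by translating a wildcard pattern into an anchored regex. This is an illustrative approximation, not how Elasticsearch evaluates wildcard queries internally; `wildcard_to_regex` and `wildcard_match` are hypothetical names.

```python
import re


def wildcard_to_regex(pattern: str) -> str:
    """Translate a wildcard pattern into an equivalent regex string.

    `*` becomes `.*` (zero or more characters), `?` becomes `.` (exactly one
    character), and every other character is escaped so it matches literally.
    """
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return "".join(parts)


def wildcard_match(pattern: str, text: str) -> bool:
    """Return True if the wildcard pattern matches the whole of `text`."""
    regex = wildcard_to_regex(pattern)
    # DOTALL lets the wildcards cover any character, newlines included.
    return re.fullmatch(regex, text, flags=re.DOTALL) is not None


print(wildcard_match("google*", "google.com"))       # True: starts with google
print(wildcard_match("google*", "www.google.com"))   # False: does not start with google
print(wildcard_match("*google*", "www.google.com"))  # True: contains google
print(wildcard_match("*google", "mygoogle"))         # True: ends with google
print(wildcard_match("?oogle", "google"))            # True: ? matches one character
```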