且构网

Lucene analyzer for indexing and searching

Updated: 2023-02-26 12:31:39

I have a field that I am indexing with Lucene like so:

@Field(name="hungerState", index=Index.TOKENIZED, store=Store.YES)
public HungerState getHungerState() {
    // ... (getter body not shown in the question)
}

The possible values of this field are HUNGRY, SLIGHTLY_HUNGRY, and NOT_HUNGRY.

When these values are indexed using the StandardAnalyzer, the only terms that end up in the index are "hungry" and "slightly": the analyzer splits on the underscore, lowercases each token, and drops "not" because it is an English stop word.
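To make that tokenization concrete, here is a simplified, stdlib-only Java model of the two analysis chains. It is a hypothetical sketch, not Lucene's code: AnalyzerModel, standard() and keyword() are invented names, the tokenizer is approximated by splitting on any character that is not a letter or digit, and only a small subset of the English stop words is included (Lucene's default stop set does contain "not").

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical, stdlib-only approximation of two Lucene analysis chains.
// This is NOT Lucene's tokenizer; it only models the behavior described above.
class AnalyzerModel {
    // Small subset of English stop words (Lucene's default set also contains "not").
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "and", "not", "of", "the", "to"));

    // StandardAnalyzer-like: split on anything that is not a letter or digit
    // (so "_" separates tokens), lowercase each piece, then drop stop words.
    static List<String> standard(String text) {
        List<String> terms = new ArrayList<>();
        for (String piece : text.split("[^\\p{L}\\p{N}]+")) {
            String term = piece.toLowerCase(Locale.ROOT);
            if (!term.isEmpty() && !STOP_WORDS.contains(term)) {
                terms.add(term);
            }
        }
        return terms;
    }

    // KeywordAnalyzer-like (and what Index.UN_TOKENIZED stores):
    // the whole value becomes a single, unchanged term.
    static List<String> keyword(String text) {
        return Collections.singletonList(text);
    }
}
```

For example, AnalyzerModel.standard("NOT_HUNGRY") yields [hungry] and AnalyzerModel.standard("SLIGHTLY_HUNGRY") yields [slightly, hungry], so a search for "hungry" hits all three values, while keyword() keeps each value as one exact, case-preserved term.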

If I change the index to index=Index.UN_TOKENIZED, the indexed terms are HUNGRY, SLIGHTLY_HUNGRY, and NOT_HUNGRY, as expected.

My search API has a single "search" method that constructs the Query like so:

MultiFieldQueryParser parser = new MultiFieldQueryParser(Version.LUCENE_30, getSearchFields(), new StandardAnalyzer(Version.LUCENE_30));
parser.setDefaultOperator(QueryParser.AND_OPERATOR);
Query query = parser.parse(searchTerms);

This handles searches where searchTerms = "foo", which searches every field returned by getSearchFields() for "foo", and also searches where searchTerms names the fields and values to search (i.e. "hungerState:HUNGRY").

My problem is with the latter scenario. Since the query parser uses a StandardAnalyzer, a search for hungerState:SLIGHTLY_HUNGRY gets parsed into hungerState:"slightly hungry", and a search for hungerState:NOT_HUNGRY gets parsed into hungerState:hungry.

When the field is indexed using the StandardAnalyzer, I get unexpected results (searches for HUNGRY and NOT_HUNGRY return results for all three values). When the field is indexed as UN_TOKENIZED, I don't get any results, since the query parser tokenizes the search string and lowercases it.

I've also tried specifying an analyzer such as KeywordAnalyzer for indexing, but it has essentially no effect, since the entire search string is still analyzed with StandardAnalyzer every time.

Any advice would be appreciated. Thanks!

You're using StandardAnalyzer for your query parser, so your query is analyzed with StandardAnalyzer. Switch to KeywordAnalyzer instead:

MultiFieldQueryParser parser = new MultiFieldQueryParser(Version.LUCENE_30, getSearchFields(),
          new KeywordAnalyzer()); // KeywordAnalyzer takes no Version argument in Lucene 3.0

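If your other fields are not keyword fields, you may want to use a PerFieldAnalyzerWrapper, so that only hungerState is analyzed with KeywordAnalyzer and the remaining fields keep StandardAnalyzer.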
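The per-field idea can be sketched as a small, stdlib-only Java model. This is hypothetical code, not Lucene's API: PerFieldModel, standard(), keyword() and matches() are invented names, analyzers are modeled as plain functions from text to terms, and a query is treated as matching when all of its terms appear among the indexed terms (phrase positions are ignored). In Lucene 3.0 itself, the corresponding setup would be new PerFieldAnalyzerWrapper(new StandardAnalyzer(Version.LUCENE_30)) followed by addAnalyzer("hungerState", new KeywordAnalyzer()), with the wrapper passed to the MultiFieldQueryParser.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Hypothetical, stdlib-only model of per-field analysis in the spirit of
// Lucene's PerFieldAnalyzerWrapper. This is NOT Lucene's API.
class PerFieldModel {
    // Small subset of English stop words (Lucene's default set also contains "not").
    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "and", "not", "of", "the", "to"));

    private final Function<String, List<String>> defaultAnalyzer;
    private final Map<String, Function<String, List<String>>> fieldAnalyzers = new HashMap<>();

    PerFieldModel(Function<String, List<String>> defaultAnalyzer) {
        this.defaultAnalyzer = defaultAnalyzer;
    }

    // Registers an analyzer for one field; every other field uses the default.
    void addAnalyzer(String field, Function<String, List<String>> analyzer) {
        fieldAnalyzers.put(field, analyzer);
    }

    // Analyzes text with the field-specific analyzer if one exists, else the default.
    List<String> analyze(String field, String text) {
        return fieldAnalyzers.getOrDefault(field, defaultAnalyzer).apply(text);
    }

    // Toy stand-in for StandardAnalyzer: split on non-alphanumerics, lowercase, drop stop words.
    static List<String> standard(String text) {
        List<String> terms = new ArrayList<>();
        for (String piece : text.split("[^\\p{L}\\p{N}]+")) {
            String term = piece.toLowerCase(Locale.ROOT);
            if (!term.isEmpty() && !STOP_WORDS.contains(term)) {
                terms.add(term);
            }
        }
        return terms;
    }

    // Toy stand-in for KeywordAnalyzer: the whole value is one unchanged term.
    static List<String> keyword(String text) {
        return Collections.singletonList(text);
    }

    // True when every query term appears among the indexed terms for this field
    // (term positions are ignored, so phrase queries are only approximated).
    boolean matches(String field, String indexedValue, String queryValue) {
        Set<String> indexed = new HashSet<>(analyze(field, indexedValue));
        return indexed.containsAll(analyze(field, queryValue));
    }
}
```

With only the standard default, new PerFieldModel(PerFieldModel::standard).matches("hungerState", "HUNGRY", "NOT_HUNGRY") is true, which reproduces the problem in the question. After addAnalyzer("hungerState", PerFieldModel::keyword), only the exact value matches, while other fields keep the lowercasing behavior. The point of the design is that each field's analyzer at query time agrees with how that field was indexed.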