且构网

Supporting non-English languages with SolrNet

Updated: 2023-12-04 22:25:04

I am using SolrNet to search Solr from a .NET application. Everything works fine when I search for English words. However, if I use Spanish words like español, I get no search results even though I have indexed them. When I debugged the query in Solr, I found that it was parsed as espaA+ol.

Do I have to do some UTF-8 encoding, or does SolrNet support searching over ASCII characters only?

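This is not a SolrNet issue; it is related to how Solr handles characters outside the 7-bit ASCII range (code points above 127). The best recommendation is to add the ASCIIFoldingFilterFactory to the Solr field where you store the Spanish words. This filter converts accented characters to their closest ASCII equivalents where one exists, so español is indexed and searched as espanol.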
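
To make the effect of folding concrete, here is a minimal Python sketch that approximates it with Unicode decomposition. It is not Solr's implementation: ASCIIFoldingFilter uses its own mapping table and covers characters that this approach leaves untouched (for example ø or ß). The function name `fold_accents` is hypothetical and exists only for this illustration.

```python
import unicodedata


def fold_accents(text: str) -> str:
    """Approximate ASCII folding (illustration only, not Solr's algorithm).

    NFKD splits a character such as 'ñ' into the base letter 'n' plus a
    combining tilde; dropping the combining marks leaves the base letters.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(fold_accents("español"))  # espanol
print(fold_accents("canción"))  # cancion
```

With the filter in place, a search for either español or espanol should resolve to the same indexed term.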

For example, suppose you are using the text_general fieldType as defined in the Solr example, which is set up as follows in the schema.xml file:

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

I would recommend modifying it as follows, adding the ASCIIFoldingFilterFactory to both the index and query analyzers:

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    -->
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
  </analyzer>
</fieldType>

Also, please note that you will need to reindex your data after making this schema change, because documents indexed earlier were analyzed without the new filter and their terms stay unfolded until they are indexed again.
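
The reason for reindexing can be seen in a toy model of an inverted index, sketched below in Python. Everything here is an assumption made for illustration: the whitespace tokenizer stands in for StandardTokenizerFactory, stop words and synonyms are ignored, and the helper names (`analyze`, `build_index`, `search`) are hypothetical and unrelated to the Solr or SolrNet APIs.

```python
import unicodedata


def fold(text):
    """Drop combining marks after NFKD decomposition (rough stand-in for ASCII folding)."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def analyze(text, fold_accents):
    """Toy analyzer: whitespace tokenizer, lowercase, optional accent folding."""
    tokens = text.lower().split()
    return [fold(t) if fold_accents else t for t in tokens]


def build_index(docs, fold_accents):
    """Map each analyzed term to the set of document ids that contain it."""
    index = {}
    for doc_id, text in docs.items():
        for term in analyze(text, fold_accents):
            index.setdefault(term, set()).add(doc_id)
    return index


def search(index, query, fold_accents):
    """Return the ids of documents that contain every analyzed query term."""
    terms = analyze(query, fold_accents)
    if not terms:
        return set()
    return set.intersection(*(index.get(t, set()) for t in terms))


docs = {1: "Curso de español", 2: "English course"}

old_index = build_index(docs, fold_accents=False)  # built before the schema change
new_index = build_index(docs, fold_accents=True)   # rebuilt after the schema change

# The query side now folds, but the stale index still holds the unfolded term.
stale = search(old_index, "español", fold_accents=True)
# After reindexing, both the accented and unaccented spellings match.
fresh = search(new_index, "español", fold_accents=True)
plain = search(new_index, "espanol", fold_accents=True)
print(stale, fresh, plain)
```

In this model the folded query finds nothing in the stale index and finds document 1 once the index has been rebuilt, which is the same mismatch the reindexing step avoids.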