且构网


Boosting Lucene terms at index time

Updated: 2023-11-10 15:33:46



Is it possible to mark specific terms as more important than others when creating the index (not when querying it)?

Consider for example a synonym filter:
doc 1: "this is a nice car"
doc 2: "this is a nice vehicle"

I want to add the term vehicle to the first doc and the term car to the second doc. But if the index is later queried with the word car, the first document should score higher than the second, and if it is queried for vehicle, it should be the other way around.

Will calling setBoost on the fields before adding them to their respective documents do the trick?

Or maybe I should add the synonyms to a different field name?

Or am I looking at this from the wrong angle?

Thanks

Setting a boost on a field affects all terms in that field, so this wouldn't work in your case.

But it should be possible using Lucene payloads (a byte array that can be stored with each term occurrence). You would use them to set term-specific boosts (for example, vehicle at 0.5 in doc 1). Then you implement your own Similarity, override its scorePayload() method to decode that boost, and query with PayloadTermQuery, which lets the boost stored in that term's payload contribute to the score.
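
To make the idea concrete, here is a minimal sketch in plain Java of the encode/decode round trip that a payload-based boost relies on. It deliberately does not use Lucene classes: the class name TermPayloadBoostSketch, the helper methods, the `term|boost` token notation, and the toy scoring rule (score = decoded boost) are all assumptions made for illustration. In Lucene itself, PayloadHelper.encodeFloat and PayloadHelper.decodeFloat do the byte conversion, DelimitedPayloadTokenFilter can attach payloads written as `term|boost`, and the decoding would live in your scorePayload() override.

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;

// Hypothetical, Lucene-free model of per-term payload boosts.
class TermPayloadBoostSketch {

    // Encode a per-term boost as a 4-byte payload (big-endian float).
    public static byte[] encodeBoost(float boost) {
        return ByteBuffer.allocate(4).putFloat(boost).array();
    }

    // Decode the boost; a term stored without a payload gets the neutral boost 1.0.
    public static float decodeBoost(byte[] payload) {
        if (payload == null || payload.length < 4) {
            return 1.0f;
        }
        return ByteBuffer.wrap(payload).getFloat();
    }

    // "Index" one document: map each term to its payload.
    // Tokens written as "term|boost" carry a payload; plain tokens carry none.
    public static Map<String, byte[]> indexDoc(String... tokens) {
        Map<String, byte[]> postings = new HashMap<>();
        for (String token : tokens) {
            int bar = token.indexOf('|');
            if (bar < 0) {
                postings.put(token, null);
            } else {
                float boost = Float.parseFloat(token.substring(bar + 1));
                postings.put(token.substring(0, bar), encodeBoost(boost));
            }
        }
        return postings;
    }

    // Toy scoring rule (an assumption): 0 if the term is absent,
    // otherwise the boost decoded from the term's payload.
    public static float score(Map<String, byte[]> doc, String term) {
        if (!doc.containsKey(term)) {
            return 0.0f;
        }
        return decodeBoost(doc.get(term));
    }

    public static void main(String[] args) {
        // Original term keeps full weight; the injected synonym is stored at 0.5.
        Map<String, byte[]> doc1 = indexDoc("this", "is", "a", "nice", "car", "vehicle|0.5");
        Map<String, byte[]> doc2 = indexDoc("this", "is", "a", "nice", "vehicle", "car|0.5");

        for (String query : new String[] {"car", "vehicle"}) {
            System.out.println(query + ": doc1=" + score(doc1, query)
                    + " doc2=" + score(doc2, query));
        }
    }
}
```

In this model the injected synonym is stored with a lower boost than the original word, so a query for car ranks doc 1 above doc 2 and a query for vehicle ranks them the other way around, which is the behavior asked for in the question. A real Similarity would combine the decoded payload with the usual scoring factors rather than use it as the whole score.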