Looking up dictionary words

I'm not sure how much time you have or how often you'll need to do this (is it a one-time operation? daily? weekly?), but you're obviously going to want a quick, weighted dictionary lookup.

You'll also want to have a conflict resolution mechanism, perhaps a side-queue to manually resolve conflicts on tuples that have multiple possible meanings.
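
A minimal sketch of what such a side-queue could look like in Python; the names, the scoring margin, and the overall shape are my own assumptions, not something from the original answer:

```python
from collections import deque

# Ambiguous tuples get parked in a side-queue for manual review
# instead of being resolved automatically.
review_queue = deque()

def resolve(tuple_text, candidates):
    """candidates: list of (meaning, score) pairs for one tuple."""
    if not candidates:
        return None
    candidates = sorted(candidates, key=lambda c: c[1], reverse=True)
    # Auto-resolve only when the best candidate clearly wins
    # (the 2x margin is an arbitrary placeholder threshold).
    if len(candidates) == 1 or candidates[0][1] >= 2 * candidates[1][1]:
        return candidates[0][0]
    review_queue.append((tuple_text, candidates))  # defer to a human
    return None
```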

I would look into Tries. Using one you can efficiently find (and weight) your prefixes, which are precisely what you will be looking for.
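
For illustration, here is a minimal weighted Trie sketch in Python; the class names and the way weights are stored are assumptions of mine, not part of the original answer:

```python
class TrieNode:
    def __init__(self):
        self.children = {}   # char -> TrieNode
        self.weight = 0.0    # > 0 only on nodes that complete a full word


class WeightedTrie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word, weight=1.0):
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.weight = weight  # the weight lives on the node that ends the word

    def completions(self, prefix):
        """Yield (word, weight) for every stored word that starts with `prefix`."""
        node = self.root
        for ch in prefix:
            if ch not in node.children:
                return           # prefix not in the trie at all
            node = node.children[ch]
        stack = [(node, prefix)]
        while stack:
            cur, text = stack.pop()
            if cur.weight > 0:
                yield text, cur.weight
            for ch, child in cur.children.items():
                stack.append((child, text + ch))
```

A lookup only walks len(prefix) nodes before enumerating completions, which is what keeps the weighted prefix query fast.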

You'll have to build the Tries yourself from a good dictionary source, and weight the nodes on full words to give yourself a good-quality reference mechanism.
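
The build step might look roughly like this, assuming the `WeightedTrie` sketch above and a hypothetical tab-separated word-frequency file (the filename and format are placeholders):

```python
# Hypothetical build step: load a "word<TAB>count" frequency list into the trie.
def build_trie(path="words_by_freq.txt"):
    trie = WeightedTrie()
    with open(path, encoding="utf-8") as f:
        for line in f:
            word, count = line.rstrip("\n").split("\t")
            # Use the corpus count as the weight on the full-word node.
            trie.insert(word.lower(), weight=float(count))
    return trie
```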

Just brainstorming here, but if you know your dataset consists primarily of duplets or triplets, you could probably get away with multiple Trie lookups: for example, looking up 'Spic' and then 'ejet', finding that both results have a low score, and falling back to 'Spice' and 'Jet', where the two Trie lookups together would yield a good combined result.
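
Here is one rough way that brainstorm could be coded, again on top of the `WeightedTrie` sketch; `score` and `best_split` are hypothetical helpers, and treating a failed lookup as score 0 is my own simplification:

```python
def score(trie, token):
    """Weight of `token` if it is a full word in the trie, else 0."""
    return max((w for word, w in trie.completions(token) if word == token), default=0.0)

def best_split(trie, text):
    """Try every split point, e.g. 'spicejet' -> ('spic', 'ejet') vs ('spice', 'jet'),
    and keep the pair with the highest combined score."""
    best_score, best_pair = 0.0, (text,)
    for i in range(1, len(text)):
        left, right = text[:i], text[i:]
        combined = score(trie, left) + score(trie, right)
        if combined > best_score:
            best_score, best_pair = combined, (left, right)
    return best_pair

# e.g. best_split(trie, "spicejet") should prefer ('spice', 'jet') over ('spic', 'ejet')
# as long as 'spice' and 'jet' carry real weights in the dictionary.
```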

Also, I would consider using frequency analysis on the most common prefixes, up to an arbitrary or dynamic limit, e.g. filtering 'the', 'un', or 'in' and weighting those accordingly.
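
A rough sketch of that frequency pass; the prefix lengths and the limit of 50 are arbitrary placeholders:

```python
from collections import Counter

def common_prefixes(words, max_len=3, limit=50):
    """Count the most frequent short prefixes (e.g. 'the', 'un', 'in') in a word list."""
    counts = Counter()
    for w in words:
        for n in range(2, max_len + 1):
            if len(w) > n:
                counts[w[:n]] += 1
    return counts.most_common(limit)

# The resulting counts can be folded back into the trie as extra prefix weights,
# or used to filter out ubiquitous prefixes like 'the', 'un', 'in'.
```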

Sounds like a fun problem, good luck!