Updated: 2023-02-26 13:44:55
Almost certainly there is no language identification in Stanford CoreNLP at the moment. "Almost", because nonexistence is much harder to prove than existence.
Nevertheless, here is some circumstantial evidence:
CoreNLP has Language classes, but nothing related to language identification; you can check this manually across all 84 occurrences of the word 'language'.
Try Apache Tika, TextCat, or the Language Detection Library for Java (its authors report "99% over precision for 53 languages").
In general, quality depends on the size of the input text: if it is long enough (say, at least several words, and not specially chosen to be ambiguous), precision can be pretty good, about 95%.
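To give a feel for why input length matters, here is a toy sketch of character-trigram rank-order ("out-of-place") matching, which is the general idea behind TextCat-style detectors. It is not the code of any library mentioned above: the class name, the method names, the profile cap of 300, and the two one-sentence training samples are all assumptions made up for illustration, and real detectors train on far larger corpora.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical toy class, not part of CoreNLP, Tika, or TextCat.
class NgramLanguageGuesser {
    // Assumed cap on profile size; also the penalty for an unseen trigram.
    static final int MAX_PROFILE = 300;

    // Maps each character trigram of the text to its frequency rank (0 = most frequent).
    static Map<String, Integer> profile(String text) {
        String cleaned = text.toLowerCase(Locale.ROOT).replaceAll("[^\\p{L}]+", " ").trim();
        String padded = " " + cleaned + " ";
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i + 3 <= padded.length(); i++) {
            counts.merge(padded.substring(i, i + 3), 1, Integer::sum);
        }
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        // Sort by descending count; break ties alphabetically so ranks are deterministic.
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        Map<String, Integer> ranks = new LinkedHashMap<>();
        for (int r = 0; r < entries.size() && r < MAX_PROFILE; r++) {
            ranks.put(entries.get(r).getKey(), r);
        }
        return ranks;
    }

    // Out-of-place distance: sum of rank differences, with a fixed penalty for missing trigrams.
    static int distance(Map<String, Integer> doc, Map<String, Integer> lang) {
        int total = 0;
        for (Map.Entry<String, Integer> e : doc.entrySet()) {
            Integer rank = lang.get(e.getKey());
            total += (rank == null) ? MAX_PROFILE : Math.abs(rank - e.getValue());
        }
        return total;
    }

    // Returns the label of the language profile closest to the text.
    static String guess(String text, Map<String, Map<String, Integer>> models) {
        Map<String, Integer> doc = profile(text);
        String best = null;
        int bestDistance = Integer.MAX_VALUE;
        for (Map.Entry<String, Map<String, Integer>> model : models.entrySet()) {
            int d = distance(doc, model.getValue());
            if (d < bestDistance) {
                bestDistance = d;
                best = model.getKey();
            }
        }
        return best;
    }

    // Tiny made-up training data: one sentence per language.
    static Map<String, Map<String, Integer>> toyModels() {
        Map<String, Map<String, Integer>> models = new LinkedHashMap<>();
        models.put("en", profile("the quick brown fox jumps over the lazy dog"));
        models.put("de", profile("der schnelle braune fuchs springt \u00fcber den faulen hund"));
        return models;
    }

    public static void main(String[] args) {
        Map<String, Map<String, Integer>> models = toyModels();
        System.out.println(guess("the quick brown fox", models));
        System.out.println(guess("den faulen hund", models));
    }
}
```

A one- or two-word input yields only a handful of trigrams, so a single shared or missing trigram can swing the result; with several words there are enough trigrams for the distances to separate clearly.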