How do I index documents in Solr?

Updated: 2023-01-29 15:52:12


I'm running Solr 1.4 on Ubuntu 10.04 (installed via apt-get install solr-tomcat) and it seems to be working fine. I'm having some difficulty finding any coherent information on how to index documents, though. I'm new to Solr, so bear with me! I have a folder (/mnt/folder), a mounted Windows share, that contains Word and PDF files I would like indexed. What's the easiest way to get Solr to index the entire folder?

The documentation for Solr is pretty poor, and it's impossible to find any decent tutorials on getting things done with it, so any help is greatly appreciated!


Take a look at the Solr wiki; the documentation there is pretty thorough.

In particular see the ExtractingRequestHandler, which allows you to index binary files like Word and PDF documents. Here's an introduction to the topic.

If the wiki isn't enough for you, there's also a great book about Solr.
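
For the folder in the question, the ExtractingRequestHandler approach can be sketched roughly as follows. This is a minimal sketch, not a tested recipe for this exact setup. It assumes that the handler is enabled at /update/extract in solrconfig.xml, that Solr is reachable at http://localhost:8080/solr (a guess based on the solr-tomcat package; adjust to your install), and that the file path is acceptable as the unique id. The function names are made up for illustration.

```python
import mimetypes
import os
import urllib.parse
import urllib.request

# Assumed endpoint: adjust host, port, and path to match your Tomcat/Solr setup.
SOLR_EXTRACT_URL = "http://localhost:8080/solr/update/extract"

# File types to pick up from the folder (Word and PDF, as in the question).
EXTENSIONS = (".doc", ".docx", ".pdf")


def find_documents(folder, extensions=EXTENSIONS):
    """Yield paths of files under `folder` whose names end with one of `extensions`."""
    for root, _dirs, files in os.walk(folder):
        for name in sorted(files):
            if name.lower().endswith(extensions):
                yield os.path.join(root, name)


def build_extract_url(base_url, doc_id, commit=False):
    """Build the request URL; literal.id sets the document's id field."""
    params = {"literal.id": doc_id}
    if commit:
        params["commit"] = "true"
    return base_url + "?" + urllib.parse.urlencode(params)


def index_file(path, base_url=SOLR_EXTRACT_URL, commit=False):
    """POST the raw file body to Solr and return the HTTP status code."""
    content_type = mimetypes.guess_type(path)[0] or "application/octet-stream"
    with open(path, "rb") as handle:
        body = handle.read()
    request = urllib.request.Request(
        build_extract_url(base_url, path, commit=commit),
        data=body,
        headers={"Content-Type": content_type},
    )
    with urllib.request.urlopen(request) as response:
        return response.status


def index_folder(folder, base_url=SOLR_EXTRACT_URL):
    """Index every matching file, committing only with the last one."""
    paths = list(find_documents(folder))
    for position, path in enumerate(paths):
        index_file(path, base_url, commit=(position == len(paths) - 1))
    return len(paths)
```

Calling index_folder("/mnt/folder") would then walk the mounted share and send each Word or PDF file to Solr. Committing once at the end rather than per file keeps the run faster on large folders.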