Posted to solr-user@lucene.apache.org by alessio crisantemi <al...@gmail.com> on 2012/02/05 13:02:48 UTC

indexing data on solr

Dear all,
I am indexing data in Solr following the SolrCell tutorial
(http://wiki.apache.org/solr/ExtractingRequestHandler).
I run the 'curl' command once for each file.
I have read that this is not the best solution, but the tutorial does not
show another way.

So: can I index a complete directory containing PDF files?
Which method should I use?

(If I index a zip archive of PDFs, I can find only the titles of the PDF
files in Solr; I cannot read the PDF contents.)
Thank you.
alessio
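
The per-file 'curl' call described above can be scripted so that a whole directory is posted in one run. What follows is only a minimal sketch, not the tutorial's own method: it assumes the handler is mapped at /update/extract and that the file path is acceptable as the document id, and the base URL http://localhost:8983/solr and the directory /data/pdfs are placeholders. The function names are made up for this illustration.

```python
import os
import urllib.parse
import urllib.request


def find_pdfs(root):
    """Return the sorted paths of all .pdf files under root (recursive)."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith(".pdf"):
                found.append(os.path.join(dirpath, name))
    return sorted(found)


def extract_url(base_url, doc_id, commit=False):
    """Build the extract URL, passing doc_id as the literal.id parameter."""
    params = {"literal.id": doc_id}
    if commit:
        params["commit"] = "true"
    query = urllib.parse.urlencode(params)
    return base_url.rstrip("/") + "/update/extract?" + query


def post_pdf(base_url, path, commit=False):
    """POST one PDF as the raw request body and return the HTTP status."""
    with open(path, "rb") as fh:
        body = fh.read()
    request = urllib.request.Request(
        extract_url(base_url, path, commit),  # assumption: path used as id
        data=body,
        headers={"Content-Type": "application/pdf"},
    )
    with urllib.request.urlopen(request) as response:
        return response.status


def index_directory(base_url, root):
    """Post every PDF under root, asking for a commit only on the last one."""
    paths = find_pdfs(root)
    for position, path in enumerate(paths):
        is_last = position == len(paths) - 1
        post_pdf(base_url, path, commit=is_last)
    return len(paths)


if __name__ == "__main__":
    # Placeholder URL and directory; adjust both to the local setup.
    print(index_directory("http://localhost:8983/solr", "/data/pdfs"))
```

Committing only after the last file avoids one commit per document, which is usually the slow part of the one-curl-per-file approach.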

Re: indexing data on solr

Posted by alessio crisantemi <al...@gmail.com>.
OK, I will try.
But I wonder:

If I index a zip archive containing PDF files and then run a query in
Solr, I see only the list of the PDF titles in my archive; the query does
not search inside the individual documents.

I read in the Tika documentation that "Package formats can contain multiple
separate documents inside one file. In such a case the Tika extracted
content will include content from all of the included documents".

Why do you think this happens?
Best,
a.

2012/2/5 O. Klein <kl...@octoweb.nl>

> Read http://wiki.apache.org/solr/DataImportHandler for a better method. The
> FileListEntityProcessor is what you are looking for.
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/indexing-data-on-solr-tp3717111p3717208.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
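
One way to sidestep the archive question raised in this message is to unpack the zip first, so that each PDF can be indexed as its own document. The sketch below is an assumption-laden illustration and not something proposed in the thread: the function name and both paths are hypothetical, and it does not explain why Solr returned only the titles.

```python
import os
import zipfile


def extract_pdfs(zip_path, out_dir):
    """Copy every .pdf member of zip_path into out_dir; return the new paths."""
    os.makedirs(out_dir, exist_ok=True)
    written = []
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            if info.is_dir() or not info.filename.lower().endswith(".pdf"):
                continue
            # Keep only the base name so nothing is written outside out_dir.
            # Members that share a base name will overwrite each other.
            target = os.path.join(out_dir, os.path.basename(info.filename))
            with archive.open(info) as src, open(target, "wb") as dst:
                dst.write(src.read())
            written.append(target)
    return written


if __name__ == "__main__":
    # Placeholder paths; adjust both to the local setup.
    for path in extract_pdfs("/data/archive.zip", "/data/pdfs"):
        print(path)
```

The extracted files can then be indexed one by one with whatever method is used for ordinary PDF files.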

Re: indexing data on solr

Posted by "O. Klein" <kl...@octoweb.nl>.
Read http://wiki.apache.org/solr/DataImportHandler for a better method. The
FileListEntityProcessor is what you are looking for.
