Posted to solr-user@lucene.apache.org by Sean Gilhuly <gi...@gmail.com> on 2017/12/13 13:36:35 UTC

Recursive archive indexing using solr cell

Hello,

I have successfully indexed archive files (zip, tar, and the like) using
Solr Cell, but each archive comes back as a single document when I query.
Is there a way to configure it so that the files inside are extracted
recursively and indexed as separate documents?

I know that if I set the extractOnly flag to true, I can parse the returned
XML and send each file back as its own document. Is there an easier way?

Regards,
Sean
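For the extractOnly route described above, here is a minimal Python sketch of the "parse the returned XML" step. It assumes Tika wraps each archive member in a `<div class="package-entry">` whose first `<h1>` holds the member's name; check that against your own extractOnly output before relying on it. The function names and the `id`/`content` keys are hypothetical, and fetching the response and posting the results back to Solr are left out.

```python
import xml.etree.ElementTree as ET

XHTML = "{http://www.w3.org/1999/xhtml}"


def _is_entry(elem: ET.Element) -> bool:
    """True for the <div class="package-entry"> wrapper assumed around each member."""
    return elem.tag == XHTML + "div" and elem.get("class") == "package-entry"


def _own_text(elem: ET.Element, skip=None) -> str:
    """Text under elem, leaving out `skip` (the name heading) and nested entries."""
    parts = [elem.text or ""]
    for child in elem:
        if child is not skip and not _is_entry(child):
            parts.append(_own_text(child))
        parts.append(child.tail or "")  # text after the child still belongs to elem
    return "".join(parts)


def _collect(elem: ET.Element, prefix: str, docs: list) -> None:
    for child in elem:
        if _is_entry(child):
            heading = child.find(XHTML + "h1")
            name = (heading.text or "").strip() if heading is not None else ""
            path = f"{prefix}/{name}" if prefix else name
            text = " ".join(_own_text(child, heading).split())  # normalize whitespace
            docs.append({"id": path, "content": text})
            _collect(child, path, docs)  # descend into nested archives
        else:
            _collect(child, prefix, docs)


def split_package_entries(xhtml: str) -> list:
    """Turn one extractOnly XHTML string into one dict per archive member."""
    docs = []
    _collect(ET.fromstring(xhtml), "", docs)
    return docs
```

Members of a nested archive get path-like ids such as `inner.zip/b.txt`, so each one can be sent to Solr as its own document without colliding with a same-named file elsewhere in the archive.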

Re: Recursive archive indexing using solr cell

Posted by Erick Erickson <er...@gmail.com>.
You can do this with DIH (the DataImportHandler), but that has some
problems. I'd strongly recommend you consider running Tika in independent
client code instead. Here's an article with a program to get you started:

https://lucidworks.com/2012/02/14/indexing-with-solrj/

Best,
Erick
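The suggestion above is to do the unpacking in your own client. The linked article uses SolrJ and Tika in Java; the sketch below shows only the recursion, in Python with the standard library. It recognizes archives by file extension (a simplification; Tika detects formats by content) and decodes each member as UTF-8 as a stand-in for real Tika text extraction. The function names and the `archive_s` and `content` field names are hypothetical and depend on your schema.

```python
import io
import json
import tarfile
import zipfile

TAR_SUFFIXES = (".tar", ".tar.gz", ".tgz")


def walk_archive(name: str, data: bytes):
    """Yield (path, bytes) for every regular file, descending into nested archives.

    Archives are recognized by extension only, which is a simplification.
    """
    lower = name.lower()
    if lower.endswith(".zip"):
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            for info in zf.infolist():
                if not info.is_dir():
                    yield from walk_archive(f"{name}/{info.filename}", zf.read(info))
    elif lower.endswith(TAR_SUFFIXES):
        with tarfile.open(fileobj=io.BytesIO(data), mode="r:*") as tf:
            for member in tf.getmembers():
                if member.isfile():
                    member_bytes = tf.extractfile(member).read()
                    yield from walk_archive(f"{name}/{member.name}", member_bytes)
    else:
        yield name, data  # a leaf file: becomes one Solr document


def to_solr_docs(archive_name: str, data: bytes) -> list:
    """One Solr document per file inside the archive (field names are hypothetical)."""
    docs = []
    for path, content in walk_archive(archive_name, data):
        docs.append({
            "id": path,                 # full path inside the archive as the unique key
            "archive_s": archive_name,  # hypothetical dynamic string field
            # Stand-in for Tika: real code would extract text from `content` here.
            "content": content.decode("utf-8", errors="replace"),
        })
    return docs


def to_update_body(docs: list) -> str:
    """JSON array of documents, the shape Solr's /update handler accepts."""
    return json.dumps(docs)
```

The string from `to_update_body` can be POSTed to a collection's `/update` handler with `Content-Type: application/json`. Keeping extraction in the client means a malformed file fails in your process rather than inside Solr.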
