Posted to solr-user@lucene.apache.org by an...@daimler.com on 2016/06/07 02:57:05 UTC

Using Solr to index zip files

Hi,

I have a use case where I need to search zip files in HDFS quickly. I intend to use Solr, but I cannot find any relevant information on whether it can index zip files.
These are nested zip files, i.e. zips within a zip file. Any help or information is much appreciated.

Thank you,
Regards,
Anupama


If you are not the addressee, please inform us immediately that you have received this e-mail by mistake, and delete it. We thank you for your support.


RE: Using Solr to index zip files

Posted by "BURN, James" <Ja...@oup.com>.
Hi
I think you'll need to unzip your zip files with an unzip application before you post them to Solr. If you do this via an OS-level batch script, you can apply logic there to deal with the nested zips. Then post the unzipped files to Solr via curl.

James


RE: Using Solr to index zip files

Posted by an...@daimler.com.
Hi,

The nesting level is fixed. The outer zip contains many inner zip files (i.e. 1.zip contains many zip files).
Currently the outer zip path and the inner zip names are stored in a Hive table for reference.
I use a Hive query to find the zip I need.

I intend to index the outer zip file and store all the inner zips as fields (search criteria) for this index.

Thank you,
Regards,
Anupama



Re: Using Solr to index zip files

Posted by Alexandre Rafalovitch <ar...@gmail.com>.
I _think_ DataImportHandler could handle zip files with a fixed level of
nesting, but it could not read them from HDFS.

I don't think anything else in Solr will. So doing it outside of Solr
is probably best, especially since you would need to decide how you
actually want to map these files (e.g. whether you keep the path for a
zip within a zip, etc.).

Regards,
    Alex.
----
Newsletter and resources for Solr beginners and intermediates:
http://www.solr-start.com/

