Posted to solr-user@lucene.apache.org by Munish Kumar Arora <mu...@gmail.com> on 2017/12/06 05:09:35 UTC

Does apache solr stores the file?

Hi guys,

I hope you are all well. I could not find anywhere else to ask my question, so
I am sending it to the list. Any help would be appreciated.

I am currently working with Apache Solr 7. I need to complete a POC and am
short on time, so I am asking here. I have set up Solr on my Windows machine,
created a core, and uploaded a PDF document using /update/extract from the
Admin UI. After uploading, I can see the file's metadata when I run a query
from the Admin UI. I was wondering whether I can get the actual content of the
PDF as well. I can see that a tlog file is generated under
/data/tlog/tlog000... containing raw PDF data, but not the actual file.

So my questions are:
1. Can I get the PDF content?
2. Does Solr store the actual file somewhere?
   a. If it does, where is it stored?
   b. If it does not, is there a way to store the file?


I hope someone can answer my question.

Regards,
Munish Arora

Re: Does apache solr stores the file?

Posted by Charlie Hull <ch...@flax.co.uk>.

On 06/12/2017 10:10, Gora Mohanty wrote:
> On 6 December 2017 at 10:39, Munish Kumar Arora
> <mu...@gmail.com> wrote:
>>
>> So the questions are,
>> 1. Can I get the PDF content?
>> 2. does Solr stores the actual file somewhere?
>>             a. If it stores then where it does?
>>              b. If it does not store then, is there a way to store THE FILE?
> 
> Normal practice would be to store the PDF file somewhere on the file
> system where it can be served through an HTTP request. Then, store the
> filesystem path to the PDF file in Solr so that it can be returned in
> a Solr search request.
> 
> Regards,
> Gora
> 
Yes, you *can* store the entire contents of an indexed file in Solr. No,
you really, really shouldn't. Always make sure you can regenerate your 
index from the original sources if you need to - a search engine is not 
a database.

I'll just write that again: a search engine is not a database.

The method described above is the usual way to deal with this situation.

Best

Charlie
-- 
Charlie Hull
Flax - Open Source Enterprise Search

tel/fax: +44 (0)8700 118334
mobile:  +44 (0)7767 825828
web: www.flax.co.uk

Re: Does apache solr stores the file?

Posted by Gora Mohanty <go...@mimirtech.com>.

On 6 December 2017 at 10:39, Munish Kumar Arora
<mu...@gmail.com> wrote:
>
> So the questions are,
> 1. Can I get the PDF content?
> 2. does Solr stores the actual file somewhere?
>            a. If it stores then where it does?
>             b. If it does not store then, is there a way to store THE FILE?

Normal practice would be to store the PDF file somewhere on the file
system where it can be served through an HTTP request. Then, store the
filesystem path to the PDF file in Solr so that it can be returned in
a Solr search request.
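
As a rough illustration of the approach above, here is a minimal Python
sketch that indexes only the path to the PDF and asks for that path back at
query time. Everything specific in it is an assumption rather than something
from this thread: the base URL, the core name "poc", the example path, and
the field name "file_path_s" (chosen because "*_s" is a string dynamic field
in Solr's default configset; adjust it to your own schema). The functions only
build the requests, so nothing is sent until you call urlopen yourself.

```python
import json
import urllib.parse
import urllib.request


def build_index_request(solr_url, core, doc_id, pdf_path):
    """Build (but do not send) an update request that indexes the PDF's
    filesystem path instead of the file itself."""
    # "file_path_s" is a hypothetical field name; it relies on the "*_s"
    # dynamic string field being present in the core's schema.
    doc = {"id": doc_id, "file_path_s": pdf_path}
    url = "{}/{}/update?commit=true".format(solr_url.rstrip("/"), core)
    return urllib.request.Request(
        url,
        data=json.dumps([doc]).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_query_url(solr_url, core, query):
    """Build a select URL that returns only the id and the stored path."""
    params = urllib.parse.urlencode(
        {"q": query, "fl": "id,file_path_s", "wt": "json"}
    )
    return "{}/{}/select?{}".format(solr_url.rstrip("/"), core, params)


if __name__ == "__main__":
    # Assumed values for illustration only.
    base = "http://localhost:8983/solr"
    req = build_index_request(base, "poc", "doc-1", "C:/docs/report.pdf")
    print(req.full_url)
    print(req.data.decode("utf-8"))
    print(build_query_url(base, "poc", "id:doc-1"))
    # To send the update to a running Solr instance:
    # urllib.request.urlopen(req)
```

The application would then take the path returned in the search result and
serve the PDF from the file system, so the index can always be rebuilt from
the original files.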

Regards,
Gora