Posted to solr-user@lucene.apache.org by Augusto Camarotti <au...@prpb.mpf.gov.br> on 2012/02/06 17:09:51 UTC

Re: SolrCell maximum file size

Thanks for the tips, Erick. I'm really talking about 2.5 GB files full of data to be indexed, such as .csv, .xls, .ods files and so on. I guess I will try greatly increasing the memory available to the JVM.
 
Regards,
 
Augusto

>>> Erick Erickson <er...@gmail.com> 1/27/2012 1:22 pm >>>
Hmmm, I'd go considerably higher than 2.5G. The problem is that the Tika
processing will need memory, and I have no idea how much. Then you'll
need a bunch more for Solr to index the result, etc.

But I also suspect that this will be just about useless to index (assuming
you're talking about lots of data, not, say, just the metadata associated
with a video or something). How do you provide a meaningful snippet
of such a huge amount of data?

If it *is*, say, a video or whatever, where almost all of the data won't
make it into the index anyway, you're probably better off using
Tika directly on the client and only sending the bits to Solr that
you need in the form of a SolrInputDocument (I'm thinking that
you'll be doing this in SolrJ) rather than transmitting 2.5G over the
network and throwing almost all of it away....
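
A minimal sketch of the "send only what you need" idea, written in Python
rather than SolrJ and using only the standard library. It does not run Tika;
as a stand-in it reads the file name and size from the filesystem. The
function name build_update_request, the field names file_name_s and size_l
(which assume a schema with *_s and *_l dynamic fields), and the update URL
are all assumptions for illustration. The JSON update path also differs
between Solr versions, so treat the URL as a placeholder.

```python
import json
import os
import urllib.request


def build_update_request(path, solr_update_url):
    """Build (but do not send) a small JSON update request for one file.

    Only a few metadata fields are included; the file content itself
    never leaves the client.
    """
    stat = os.stat(path)
    doc = {
        "id": os.path.abspath(path),              # hypothetical choice of unique key
        "file_name_s": os.path.basename(path),    # assumed dynamic string field
        "size_l": stat.st_size,                   # assumed dynamic long field
    }
    body = json.dumps([doc]).encode("utf-8")
    return urllib.request.Request(
        solr_update_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned request to urllib.request.urlopen would send it; the
request body is a few hundred bytes no matter how large the file is.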

If the entire 2.5G is data to be indexed, you'll probably want to
consider breaking it up into smaller chunks in order to make it
useful.
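
For a large .csv file, the chunking could be sketched roughly as below, in
Python with only the standard library. It assumes the file has a single
header row and is UTF-8 encoded; the function name split_csv, the chunk file
naming, and the default of 100,000 rows per chunk are arbitrary choices for
illustration, not a recommendation. Rows are streamed one at a time, so the
whole file is never held in memory.

```python
import csv
import os


def split_csv(src_path, out_dir, rows_per_chunk=100_000):
    """Split a CSV file into numbered chunk files.

    Each chunk holds at most rows_per_chunk data rows and repeats the
    header row. Returns the list of chunk file paths in order.
    """
    os.makedirs(out_dir, exist_ok=True)
    chunk_paths = []
    out_file = None
    writer = None
    try:
        with open(src_path, newline="", encoding="utf-8") as src:
            reader = csv.reader(src)
            header = next(reader, None)
            if header is None:
                return chunk_paths  # empty input, nothing to split
            for i, row in enumerate(reader):
                if i % rows_per_chunk == 0:
                    # Start a new chunk file and repeat the header in it.
                    if out_file is not None:
                        out_file.close()
                    name = f"chunk_{len(chunk_paths):05d}.csv"
                    path = os.path.join(out_dir, name)
                    out_file = open(path, "w", newline="", encoding="utf-8")
                    writer = csv.writer(out_file)
                    writer.writerow(header)
                    chunk_paths.append(path)
                writer.writerow(row)
    finally:
        if out_file is not None:
            out_file.close()
    return chunk_paths
```

Each resulting chunk can then be sent to Solr as its own document or update
request, which also gives search results something more specific to point at
than one 2.5G blob.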

Best
Erick

On Fri, Jan 27, 2012 at 3:43 AM, Augusto Camarotti
<au...@prpb.mpf.gov.br> wrote:
> I'm talking about 2 GB files. Does that mean I'll have to allocate something bigger than that for the JVM? Something like 2.5 GB?
>
> Thanks,
>
> Augusto Camarotti
>
>>>> Erick Erickson <er...@gmail.com> 1/25/2012 1:48 pm >>>
> Mostly it depends on your container settings; quite often that's
> where the limits are. I don't think Solr imposes any restrictions.
>
> What size are we talking about anyway? There are implicit
> issues with how much memory parsing the file requires, but you
> can allocate lots of memory to the JVM to handle that.
>
> Best
> Erick
>
> On Tue, Jan 24, 2012 at 10:24 AM, Augusto Camarotti
> <au...@prpb.mpf.gov.br> wrote:
>> Hi everybody
>>
>> Does anyone know if there is a maximum file size that can be uploaded to the ExtractingRequestHandler via an HTTP request?
>>
>> Thanks in advance,
>>
>> Augusto Camarotti