Posted to solr-user@lucene.apache.org by Pravin Karne <pr...@persistent.co.in> on 2009/10/08 12:47:54 UTC

how to post(index) large file of 5 GB or greater than this

Hi,
I am new to Solr. I am able to index, search, and update with small files (around 500 MB).
But if I try to index a file of 5 to 10 GB or more (well beyond 500 MB), it gives a heap memory exception.
While investigating, I found that post.jar or post.sh loads the whole file into memory.

As a workaround, I split the large file into smaller files, and that works.

Is there any other way to post a large file? The above workaround is not feasible for a 1 TB file.

Thanks
-Pravin


DISCLAIMER
==========
This e-mail may contain privileged and confidential information which is the property of Persistent Systems Ltd. It is intended only for the use of the individual or entity to which it is addressed. If you are not the intended recipient, you are not authorized to read, retain, copy, print, distribute or use this message. If you have received this communication in error, please notify the sender and delete all copies of this message. Persistent Systems Ltd. does not accept any liability for virus infected mails.

Re: how to post(index) large file of 5 GB or greater than this

Posted by Yonik Seeley <yo...@lucidimagination.com>.
What is this huge file?  Solr XML? CSV?

Anyway, if it's a local file, you can get Solr to directly read/stream
it via stream.file
Examples in http://wiki.apache.org/solr/UpdateCSV
but it should work for any update format, not just CSV.
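
As a rough illustration of the stream.file approach, here is a minimal Python sketch that builds such a request. It assumes a Solr instance at http://localhost:8983/solr, the CSV handler at /update/csv as on the wiki page, and remote streaming enabled in solrconfig.xml. The host, port, and file path are placeholders and the function names are hypothetical. The path is read by the Solr server itself, so it must exist on that machine.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def build_stream_file_url(file_path,
                          base_url="http://localhost:8983/solr/update/csv",
                          content_type="text/plain;charset=utf-8",
                          commit=True):
    """Build an update URL that asks Solr to read file_path from its own disk."""
    params = {
        "stream.file": file_path,            # path as seen by the Solr server
        "stream.contentType": content_type,  # how Solr should interpret the file
    }
    if commit:
        params["commit"] = "true"
    return base_url + "?" + urlencode(params)


def index_local_file(file_path, **kwargs):
    """Send the request; only the URL travels over HTTP, not the file."""
    with urlopen(build_stream_file_url(file_path, **kwargs)) as resp:
        return resp.status


if __name__ == "__main__":
    # Hypothetical path, shown only to illustrate the resulting URL.
    print(build_stream_file_url("/data/big.csv"))
```

Because the client sends only a short URL, its memory use does not depend on the size of the file.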

-Yonik
http://www.lucidimagination.com



On Thu, Oct 8, 2009 at 6:47 AM, Pravin Karne
<pr...@persistent.co.in> wrote:
> Hi,
> I am new to Solr. I am able to index, search, and update with small files (around 500 MB).
> But if I try to index a file of 5 to 10 GB or more (well beyond 500 MB), it gives a heap memory exception.
> While investigating, I found that post.jar or post.sh loads the whole file into memory.
>
> As a workaround, I split the large file into smaller files, and that works.
>
> Is there any other way to post a large file? The above workaround is not feasible for a 1 TB file.
>
> Thanks
> -Pravin

Re: how to post(index) large file of 5 GB or greater than this

Posted by Walter Underwood <wu...@wunderwood.org>.
Are you indexing multiple documents? If so, split them into multiple files.
A single XML file with all documents is not a good idea. Solr is designed to
use batches for indexing.
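
One way to produce such batches without loading the whole file is to parse it incrementally. The sketch below is only an illustration under assumptions not stated in this thread: the input is Solr XML with a single <add> root containing <doc> children, and the batch size of 1000 and the function name are hypothetical.

```python
import xml.etree.ElementTree as ET


def iter_doc_batches(source, batch_size=1000):
    """Yield <add>...</add> strings, each holding up to batch_size <doc> elements.

    source is a file path or a binary file object containing Solr XML.
    """
    batch = []
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # first event is the start of the enclosing <add>
    for event, elem in context:
        if event == "end" and elem.tag == "doc":
            elem.tail = None  # drop whitespace that follows the element
            batch.append(ET.tostring(elem, encoding="unicode"))
            root.clear()      # release parsed docs so memory stays bounded
            if len(batch) == batch_size:
                yield "<add>" + "".join(batch) + "</add>"
                batch = []
    if batch:
        # Final, possibly smaller, batch.
        yield "<add>" + "".join(batch) + "</add>"
```

Each yielded string can be written to its own file or posted to Solr as a separate update request.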

It will be extremely hard to index a 1 TB XML file. I would guess that it would
need a JVM heap of well over 1 TB.

wunder

On Oct 8, 2009, at 6:56 AM, Noble Paul നോബിള്‍ नोब्ळ् wrote:

> You can write a simple program that streams the file from disk and
> posts it to Solr.
>
>
> On Thu, Oct 8, 2009 at 7:10 PM, Elaine Li <el...@gmail.com> wrote:
>> You can increase the Java heap size, e.g. java -Xms128m -Xmx8192m -jar post.jar <*.xml>
>> Or I split the file if it is too big.
>>
>> Elaine


Re: how to post(index) large file of 5 GB or greater than this

Posted by Noble Paul നോബിള്‍ नोब्ळ् <no...@corp.aol.com>.
You can write a simple program that streams the file from disk and
posts it to Solr.
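
A minimal sketch of such a program in Python follows, using only the standard library. The host, port, update path, content type, chunk size, and function names are assumptions for illustration, not details from this thread. The file is read and sent in fixed-size chunks, so the client never holds the whole file in memory. Solr still has to process the request body on its side.

```python
import http.client
import os


def iter_chunks(fileobj, chunk_size=64 * 1024):
    """Yield successive chunks of at most chunk_size bytes from fileobj."""
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            break
        yield chunk


def stream_file_to_solr(path, host="localhost", port=8983,
                        update_path="/solr/update",
                        content_type="text/xml; charset=utf-8"):
    """POST the file at path to Solr, reading it from disk chunk by chunk."""
    headers = {
        "Content-Type": content_type,
        # An explicit length lets the body be sent as-is, without chunked encoding.
        "Content-Length": str(os.path.getsize(path)),
    }
    conn = http.client.HTTPConnection(host, port)
    try:
        with open(path, "rb") as f:
            conn.request("POST", update_path, body=iter_chunks(f), headers=headers)
            resp = conn.getresponse()
            return resp.status, resp.read()
    finally:
        conn.close()
```

A commit would still need to be sent afterwards for the documents to become searchable.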


On Thu, Oct 8, 2009 at 7:10 PM, Elaine Li <el...@gmail.com> wrote:
> You can increase the Java heap size, e.g. java -Xms128m -Xmx8192m -jar post.jar <*.xml>
> Or I split the file if it is too big.
>
> Elaine



-- 
-----------------------------------------------------
Noble Paul | Principal Engineer| AOL | http://aol.com

Re: how to post(index) large file of 5 GB or greater than this

Posted by Elaine Li <el...@gmail.com>.
You can increase the Java heap size, e.g. java -Xms128m -Xmx8192m -jar post.jar <*.xml>
Or I split the file if it is too big.

Elaine

On Thu, Oct 8, 2009 at 6:47 AM, Pravin Karne
<pr...@persistent.co.in> wrote:
> Hi,
> I am new to Solr. I am able to index, search, and update with small files (around 500 MB).
> But if I try to index a file of 5 to 10 GB or more (well beyond 500 MB), it gives a heap memory exception.
> While investigating, I found that post.jar or post.sh loads the whole file into memory.
>
> As a workaround, I split the large file into smaller files, and that works.
>
> Is there any other way to post a large file? The above workaround is not feasible for a 1 TB file.
>
> Thanks
> -Pravin