Posted to solr-user@lucene.apache.org by Pravin Karne <pr...@persistent.co.in> on 2009/10/08 12:47:54 UTC
how to post(index) large file of 5 GB or greater than this
Hi,
I am new to Solr. I can index, search, and update small files (around 500 MB).
But if I try to index a much larger file (5 to 10 GB or more), I get a Java heap memory exception.
While investigating, I found that post.jar and post.sh load the whole file into memory.
As a workaround, I split the large file into smaller files, and that works.
Is there any other way to post a large file? The workaround above is not feasible for a 1 TB file.
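Splitting does not itself require loading the big file. Below is a minimal sketch, assuming the file is CSV with a header row and one record per line. Quoted fields that contain newlines would break this line-based approach. The function names are hypothetical. Each chunk repeats the header, since Solr's CSV handler reads field names from the first line by default, so each chunk can be posted on its own.

```python
import itertools
from typing import Iterable, Iterator, List


def split_csv_lines(lines: Iterable[str], rows_per_chunk: int) -> Iterator[List[str]]:
    """Yield chunks of lines; every chunk starts with the header row."""
    if rows_per_chunk < 1:
        raise ValueError("rows_per_chunk must be >= 1")
    it = iter(lines)
    header = next(it, None)
    if header is None:  # empty input: nothing to split
        return
    while True:
        rows = list(itertools.islice(it, rows_per_chunk))  # reads lazily
        if not rows:
            return
        yield [header] + rows


def split_csv_file(src_path: str, dest_prefix: str, rows_per_chunk: int = 100_000) -> List[str]:
    """Write numbered chunk files; only one chunk is held in memory at a time."""
    written = []
    with open(src_path, encoding="utf-8", newline="") as src:
        for n, chunk in enumerate(split_csv_lines(src, rows_per_chunk)):
            out_path = f"{dest_prefix}{n:05d}.csv"
            with open(out_path, "w", encoding="utf-8", newline="") as dst:
                dst.writelines(chunk)
            written.append(out_path)
    return written
```

For example, `split_csv_file("/data/huge.csv", "/data/part-")` would write `/data/part-00000.csv`, `/data/part-00001.csv`, and so on (paths are placeholders).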
Thanks
-Pravin
DISCLAIMER
==========
This e-mail may contain privileged and confidential information which is the property of Persistent Systems Ltd. It is intended only for the use of the individual or entity to which it is addressed. If you are not the intended recipient, you are not authorized to read, retain, copy, print, distribute or use this message. If you have received this communication in error, please notify the sender and delete all copies of this message. Persistent Systems Ltd. does not accept any liability for virus infected mails.
Re: how to post(index) large file of 5 GB or greater than this
Posted by Yonik Seeley <yo...@lucidimagination.com>.
What is this huge file? Solr XML? CSV?
Anyway, if it's a local file (local to the Solr server), you can have Solr read/stream it directly via stream.file.
Examples are in http://wiki.apache.org/solr/UpdateCSV,
but it should work for any update format, not just CSV.
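A minimal sketch of building such a request. It assumes the `stream.file` and `stream.contentType` parameters shown on the UpdateCSV wiki page, a Solr base URL like `http://localhost:8983/solr`, and that remote streaming is enabled (`enableRemoteStreaming` on `<requestParsers>` in solrconfig.xml). The path must exist on the Solr server's filesystem. The helper names are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def stream_file_url(solr_base: str, server_path: str, handler: str = "update/csv",
                    content_type: str = "text/plain;charset=utf-8",
                    commit: bool = True) -> str:
    """Build an update URL asking Solr to read `server_path` from its own disk."""
    params = {"stream.file": server_path, "stream.contentType": content_type}
    if commit:
        params["commit"] = "true"
    return f"{solr_base.rstrip('/')}/{handler}?{urlencode(params)}"


def trigger_stream_file(url: str, timeout: float = 3600.0) -> bytes:
    """Send the request: only the short URL goes over the wire; Solr reads the file."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read()
```

For an XML file the same idea would use the plain update handler, e.g. `stream_file_url("http://localhost:8983/solr", "/data/big.xml", handler="update", content_type="text/xml;charset=utf-8")`.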
-Yonik
http://www.lucidimagination.com
Re: how to post(index) large file of 5 GB or greater than this
Posted by Walter Underwood <wu...@wunderwood.org>.
Are you indexing multiple documents? If so, split them into multiple files.
A single XML file with all documents is not a good idea. Solr is designed to use batches for indexing.
It will be extremely hard to index a 1 TB XML file. I would guess that would need a JVM heap of well over 1 TB.
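A minimal batching sketch, assuming the big file is in Solr's XML update format (`<add><doc>...</doc>...</add>`). Python's `xml.etree.ElementTree.iterparse` reads the file incrementally, and the hypothetical helper `iter_doc_batches` yields `<add>` payloads of at most `batch_size` documents that could then be posted one request at a time. Attributes on the outer `<add>` element are not carried over in this sketch.

```python
import io
import xml.etree.ElementTree as ET
from typing import BinaryIO, Iterator, List, Union


def iter_doc_batches(source: Union[str, BinaryIO], batch_size: int) -> Iterator[bytes]:
    """Yield <add> payloads holding at most batch_size <doc> elements each."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    root = None
    batch: List[bytes] = []
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if root is None:
            root = elem  # first "start" event: the outer <add> element
            continue
        if event == "end" and elem.tag == "doc":
            batch.append(ET.tostring(elem))
            root.clear()  # drop already-parsed <doc> elements to bound memory
            if len(batch) == batch_size:
                yield b"<add>" + b"".join(batch) + b"</add>"
                batch = []
    if batch:  # remaining documents that did not fill a whole batch
        yield b"<add>" + b"".join(batch) + b"</add>"


if __name__ == "__main__":
    sample = b"<add>" + b"".join(
        b'<doc><field name="id">%d</field></doc>' % i for i in range(5)
    ) + b"</add>"
    for payload in iter_doc_batches(io.BytesIO(sample), batch_size=2):
        print(payload.decode("utf-8"))
```

Clearing the root after each document is what keeps memory roughly proportional to one batch rather than to the whole file.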
wunder
On Oct 8, 2009, at 6:56 AM, Noble Paul നോബിള് नोब्ळ् wrote:
> You can write a simple program that streams the file from disk and
> posts it to Solr.
Re: how to post(index) large file of 5 GB or greater than this
Posted by Noble Paul നോബിള് नोब्ळ् <no...@corp.aol.com>.
You can write a simple program that streams the file from disk and posts it to Solr.
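A minimal sketch of such a program using only Python's standard library (an assumption; any HTTP client that can stream a request body would do). `http.client` sends a file-object body in `blocksize` chunks (the `blocksize` argument needs Python 3.7+), so the client never holds the whole file in memory. The helper name `stream_file_to_solr` and the example URL are hypothetical. This only addresses client-side memory; the server still receives one very large request.

```python
import http.client
import os
from typing import Tuple
from urllib.parse import urlsplit


def stream_file_to_solr(path: str, update_url: str,
                        content_type: str = "text/xml; charset=utf-8",
                        block_size: int = 64 * 1024) -> Tuple[int, bytes]:
    """POST the file at `path` to `update_url` without reading it all into memory."""
    parts = urlsplit(update_url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    # blocksize controls how many bytes of the file are read per send.
    conn = conn_cls(parts.hostname, parts.port, blocksize=block_size)
    target = parts.path + ("?" + parts.query if parts.query else "")
    try:
        with open(path, "rb") as body:
            headers = {
                "Content-Type": content_type,
                # An explicit length avoids chunked transfer encoding.
                "Content-Length": str(os.path.getsize(path)),
            }
            # A file object passed as body is sent block by block.
            conn.request("POST", target, body=body, headers=headers)
        response = conn.getresponse()
        return response.status, response.read()
    finally:
        conn.close()
```

Usage might look like `stream_file_to_solr("/data/big.xml", "http://localhost:8983/solr/update")`, followed by a separate `<commit/>` (or `commit=true` on the URL) so the documents become searchable.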
On Thu, Oct 8, 2009 at 7:10 PM, Elaine Li <el...@gmail.com> wrote:
> You can increase the Java heap size, e.g. java -Xms128m -Xmx8192m -jar post.jar <file>.xml
> Or I split the file if it is too big.
>
> Elaine
--
-----------------------------------------------------
Noble Paul | Principal Engineer| AOL | http://aol.com
Re: how to post(index) large file of 5 GB or greater than this
Posted by Elaine Li <el...@gmail.com>.
You can increase the Java heap size, e.g. java -Xms128m -Xmx8192m -jar post.jar <file>.xml
Or I split the file if it is too big.
Elaine