Posted to solr-user@lucene.apache.org by Harald Kirsch <Ha...@raytion.com> on 2013/08/16 16:09:17 UTC

Shard splitting at 23 million documents -> OOM

Hi all.

Using the example setup of solr-4.4.0, I was able to easily feed 23 
million documents from ClueWeb09.

Then I tried to split the one shard into two. The size on disk is:

% du -sh collection1
118G    collection1

I started Solr with 8GB for the JVM:

java -Xmx8000m -DzkRun -DnumShards=2 \
  -Dbootstrap_confdir=./solr/collection1/conf \
  -Dcollection.configName=myconf -jar start.jar

Then I asked for the split:

http://localhost:8983/solr/admin/collections?action=SPLITSHARD&collection=collection1&shard=shard1
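For anyone who wants to script the same request, here is a minimal sketch that only assembles the URL shown above from its parts. The helper name split_shard_url is hypothetical, and the base URL, collection and shard names are simply the values from this message; it does not contact Solr.

```python
from urllib.parse import urlencode


def split_shard_url(base_url, collection, shard):
    """Assemble a Collections API SPLITSHARD request URL (hypothetical helper).

    base_url is the Solr root, e.g. http://localhost:8983/solr as used above.
    """
    params = {
        "action": "SPLITSHARD",
        "collection": collection,
        "shard": shard,
    }
    # urlencode escapes the values, so names with special characters stay valid.
    return f"{base_url.rstrip('/')}/admin/collections?{urlencode(params)}"


url = split_shard_url("http://localhost:8983/solr", "collection1", "shard1")
print(url)
```

Actually sending it (for example with urllib.request.urlopen(url)) assumes a SolrCloud instance is listening at that address, which the sketch deliberately leaves out.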

After a while I got the OOM in the logs:

841168 [qtp614872954-17] ERROR 
org.apache.solr.servlet.SolrDispatchFilter  – 
null:java.lang.RuntimeException: java.lang.OutOfMemoryError: Java heap space

My question: is it to be expected that the split needs huge amounts of 
RAM or is there a chance that some configuration or procedure change 
could get me past this?

Regards,
Harald.
-- 
Harald Kirsch
Raytion GmbH
Kaiser-Friedrich-Ring 74
40547 Duesseldorf
Fon +49-211-550266-0
Fax +49-211-550266-19
http://www.raytion.com

Re: Shard splitting at 23 million documents -> OOM

Posted by Bastian Mathes <ba...@raytion.com>.
Hi Greg,

I am a colleague of Harald and had a look at his experiments last week.

You are right: unpacking a fresh Solr 4.4, feeding a small number of
documents (in my case 144), and trying to split the shard does not work.
I get the same error message ("maxValue must be non-negative") that was
discussed on this list on August 13th. The outcome of that discussion
seems to have been that there is no workaround until 4.5.

I then downloaded the 4.5 nightly build and tried the same, but got a
NullPointerException (OverseerCollectionProcessor.java:494). As it is
not a release but just a nightly build, I guess that may happen. However,
I think we have to conclude that a useful feature is growing here (and
it may already work under certain circumstances), but it is not ready to
use yet (maybe in 4.5/4.6, maybe 5.x).

Best regards,

Bastian



On 08/16/2013 06:31 PM, Greg Preston wrote:
> Have you tried it with a smaller number of documents?  I haven't been able
> to successfully split a shard with 4.4.0 with even a handful of docs.
> 
> 
> -Greg

-- 
Bastian Mathes
Raytion GmbH
Kaiser-Friedrich-Ring 74
40547 Duesseldorf
Fon +49-211-550266-0
Fax +49-211-550266-19
http://www.raytion.com

Re: Shard splitting at 23 million documents -> OOM

Posted by Greg Preston <gp...@marinsoftware.com>.
Have you tried it with a smaller number of documents?  I haven't been able
to successfully split a shard with 4.4.0 with even a handful of docs.


-Greg

