Posted to solr-user@lucene.apache.org by Kalle Aaltonen <ka...@zemanta.com> on 2013/10/08 08:43:44 UTC

SolrCloud shard splitting keeps failing

I have a test system where I have an index of 15M documents in one shard
that I would like to split in two. I've tried it four times now. I have a
stand-alone ZooKeeper running on the same machine.
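
For reference, a split like this is normally requested through the
Collections API SPLITSHARD action. Below is a minimal Python sketch of
building that request. The base URL, collection name, and shard name are
assumptions for illustration and do not come from the actual setup.

```python
from urllib.parse import urlencode


def split_shard_url(base_url, collection, shard):
    """Build a Collections API URL that asks Solr to split one shard."""
    params = {
        "action": "SPLITSHARD",
        "collection": collection,  # hypothetical collection name
        "shard": shard,            # hypothetical name of the shard to split
    }
    return f"{base_url}/admin/collections?{urlencode(params)}"


# Hypothetical host and names, for illustration only.
print(split_shard_url("http://localhost:8983/solr", "collection1", "shard1"))
```

Sending a GET request to the resulting URL would start the split.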

The end result is that I have two new shards with state "construction", and
each has one replica which is down.

Two of the attempts failed because they ran out of heap space. The heap size
is now 24GB. I can't figure out from the logs what is going on.

I've attached a log of the latest attempt. Any help would be much
appreciated.

- Kalle Aaltonen

Re: SolrCloud shard splitting keeps failing

Posted by Shalin Shekhar Mangar <sh...@gmail.com>.

I was wrong to say that we don't need to open a searcher; we do. I
committed a fix in SOLR-5314 to use soft commits instead of hard commits. I
also increased the read timeout value. Together, these two changes will reduce
the likelihood of such a failure.

https://issues.apache.org/jira/browse/SOLR-5314
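
As an illustration only (this is not the SOLR-5314 patch itself, which
changes Solr's internal split logic), the difference between the two commit
types can be seen from the client side in the parameters of an update
request. The sketch below assumes a hypothetical base URL and collection
name; `commit` and `softCommit` are standard update request parameters.

```python
from urllib.parse import urlencode


def commit_url(base_url, collection, soft):
    """Build an update URL that triggers a hard or a soft commit."""
    params = {"commit": "true"}
    if soft:
        # A soft commit makes changes visible to searches without
        # forcing the index to be flushed to stable storage.
        params["softCommit"] = "true"
    return f"{base_url}/{collection}/update?{urlencode(params)}"


# Hypothetical host and collection, for illustration only.
print(commit_url("http://localhost:8983/solr", "collection1", soft=True))
```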


On Tue, Oct 8, 2013 at 1:24 PM, Shalin Shekhar Mangar <
shalinmangar@gmail.com> wrote:

> Hi Kalle,
>
> The problem here is that certain actions are taking too long, causing the
> split process to terminate partway through. For example, a commit on the
> parent shard leader took 83 seconds in your case, but the read timeout value
> is set to only 60 seconds. We actually do not need to open a searcher during
> this commit. I'll open an issue and attach a fix.
>
> Longer term, we need to introduce asynchronous commands so that status can
> be reported in a better way.
>
>
> On Tue, Oct 8, 2013 at 12:13 PM, Kalle Aaltonen <
> kalle.aaltonen@zemanta.com> wrote:
>
>>
>> I have a test system where I have an index of 15M documents in one shard
>> that I would like to split in two. I've tried it four times now. I have a
>> stand-alone ZooKeeper running on the same machine.
>>
>> The end result is that I have two new shards with state "construction",
>> and each has one replica which is down.
>>
>> Two of the attempts failed because they ran out of heap space. The heap
>> size is now 24GB. I can't figure out from the logs what is going on.
>>
>> I've attached a log of the latest attempt. Any help would be much
>> appreciated.
>>
>> - Kalle Aaltonen
>
> --
> Regards,
> Shalin Shekhar Mangar.
>



-- 
Regards,
Shalin Shekhar Mangar.

Re: SolrCloud shard splitting keeps failing

Posted by Shalin Shekhar Mangar <sh...@gmail.com>.

Hi Kalle,

The problem here is that certain actions are taking too long, causing the
split process to terminate partway through. For example, a commit on the parent
shard leader took 83 seconds in your case, but the read timeout value is set
to only 60 seconds. We actually do not need to open a searcher during this
commit. I'll open an issue and attach a fix.

Longer term, we need to introduce asynchronous commands so that status can
be reported in a better way.
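
The timeout mismatch described above can be reproduced in miniature. The
following is a sketch under stated assumptions, not Solr code: a worker
thread stands in for the slow commit, the caller's wait stands in for the
read timeout, and the 83-second and 60-second figures are scaled down by a
factor of 100 so the example finishes quickly.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def run_with_timeout(work_seconds, timeout_seconds):
    """Wait for a slow task, giving up once timeout_seconds have passed."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        # time.sleep stands in for the long-running commit.
        future = pool.submit(time.sleep, work_seconds)
        try:
            future.result(timeout=timeout_seconds)
            return "completed"
        except FutureTimeout:
            # The caller stops waiting even though the work is still running.
            return "timed out"


# Scaled-down version of the 83-second commit against a 60-second timeout.
print(run_with_timeout(0.83, 0.60))
```

Raising the timeout above the duration of the work, or making the work
itself faster, lets the same call complete.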

On Tue, Oct 8, 2013 at 12:13 PM, Kalle Aaltonen
<ka...@zemanta.com> wrote:

>
> I have a test system where I have an index of 15M documents in one shard
> that I would like to split in two. I've tried it four times now. I have a
> stand-alone ZooKeeper running on the same machine.
>
> The end result is that I have two new shards with state "construction",
> and each has one replica which is down.
>
> Two of the attempts failed because they ran out of heap space. The heap
> size is now 24GB. I can't figure out from the logs what is going on.
>
> I've attached a log of the latest attempt. Any help would be much
> appreciated.
>
> - Kalle Aaltonen
>

-- 
Regards,
Shalin Shekhar Mangar.

Re: SolrCloud shard splitting keeps failing

Posted by Harald Kirsch <Ha...@raytion.com>.

Hello Kalle,

We noticed the same problem some weeks ago:

http://lucene.472066.n3.nabble.com/Share-splitting-at-23-million-documents-gt-OOM-td4085064.html

It would be interesting to hear whether there is more positive feedback this
time.

We finally concluded that it may be worth starting with many shards
right away. As they grow, they can be distributed to other machines.
This works, as we have tested (though not yet in production).
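
A minimal sketch of that approach, assuming the Collections API CREATE
action. The base URL, collection name, and shard counts are hypothetical
values chosen for illustration; none of them come from this thread.

```python
from urllib.parse import urlencode


def create_collection_url(base_url, name, num_shards, shards_per_node):
    """Build a Collections API URL that creates a collection with many shards."""
    params = {
        "action": "CREATE",
        "name": name,                         # hypothetical collection name
        "numShards": num_shards,              # start with more shards than needed today
        "replicationFactor": 1,
        "maxShardsPerNode": shards_per_node,  # allow several shards on one machine
    }
    return f"{base_url}/admin/collections?{urlencode(params)}"


# Eight shards on a single node to begin with, for illustration only.
print(create_collection_url("http://localhost:8983/solr", "docs", 8, 8))
```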

Regards,
Harald.

On 08.10.2013 08:43, Kalle Aaltonen wrote:
>
> I have a test system where I have an index of 15M documents in one shard
> that I would like to split in two. I've tried it four times now. I have
> a stand-alone ZooKeeper running on the same machine.
>
> The end result is that I have two new shards with state "construction",
> and each has one replica which is down.
>
> Two of the attempts failed because they ran out of heap space. The heap
> size is now 24GB. I can't figure out from the logs what is going on.
>
> I've attached a log of the latest attempt. Any help would be much
> appreciated.
>
> - Kalle Aaltonen