Posted to solr-user@lucene.apache.org by Markus Jelsma <ma...@openindex.io> on 2012/11/02 14:45:33 UTC

SolrCloud indexing blocks if node is recovering

Hi,

We just tested indexing a few million docs from Hadoop to a 10-node, 2-replica SolrCloud cluster running this week's trunk. One of the nodes hit an OOM, but indexing continued without interruption. When I restarted the node, indexing stopped completely; the node tried to recover, which was unsuccessful. I restarted the node again, but that didn't help either. Finally I decided to stop the node completely and see what would happen: indexing resumed.

Why would the other nodes stop accepting incoming documents when one node misbehaves? The dying node wasn't the node we were sending documents to, and we are not using CloudSolrServer yet (see other thread). Is this known behavior? Is it a bug?

Thanks,
Markus

RE: SolrCloud indexing blocks if node is recovering

Posted by Markus Jelsma <ma...@openindex.io>.

https://issues.apache.org/jira/browse/SOLR-4038
I'm still trying to gather the logs.
 
 
-----Original message-----
> From:Mark Miller <ma...@gmail.com>
> Sent: Sat 03-Nov-2012 14:17
> To: Markus Jelsma <ma...@openindex.io>
> Cc: solr-user@lucene.apache.org
> Subject: Re: SolrCloud indexing blocks if node is recovering
> 
> The OOM machine and any surrounding nodes if possible (especially the leader of the shard).
> 
> Not sure what I'm looking for yet, so the more info the better.
> 
> - Mark
> 
> On Nov 3, 2012, at 5:23 AM, Markus Jelsma <ma...@openindex.io> wrote:
> 
> > Hi - yes, I should be able to make sense of them next Monday. I assume you're not so much interested in the OOM machine as in the surrounding nodes that blocked?
> > 
> > 
> > 
> > -----Original message-----
> >> From:Mark Miller <ma...@gmail.com>
> >> Sent: Sat 03-Nov-2012 03:14
> >> To: solr-user@lucene.apache.org
> >> Subject: Re: SolrCloud indexing blocks if node is recovering
> >> 
> >> Doesn't sound right. Still have the logs?
> >> 
> >> - Mark
> >> 
> >> On Fri, Nov 2, 2012 at 9:45 AM, Markus Jelsma
> >> <ma...@openindex.io> wrote:
> >>> Hi,
> >>> 
> >>> We just tested indexing a few million docs from Hadoop to a 10-node, 2-replica SolrCloud cluster running this week's trunk. One of the nodes hit an OOM, but indexing continued without interruption. When I restarted the node, indexing stopped completely; the node tried to recover, which was unsuccessful. I restarted the node again, but that didn't help either. Finally I decided to stop the node completely and see what would happen: indexing resumed.
> >>>
> >>> Why would the other nodes stop accepting incoming documents when one node misbehaves? The dying node wasn't the node we were sending documents to, and we are not using CloudSolrServer yet (see other thread). Is this known behavior? Is it a bug?
> >>> 
> >>> Thanks,
> >>> Markus
> >> 
> >> 
> >> 
> >> -- 
> >> - Mark
> >> 
> 
>

Re: SolrCloud indexing blocks if node is recovering

Posted by Mark Miller <ma...@gmail.com>.

The OOM machine and any surrounding nodes if possible (especially the leader of the shard).

Not sure what I'm looking for yet, so the more info the better.

- Mark


RE: SolrCloud indexing blocks if node is recovering

Posted by Markus Jelsma <ma...@openindex.io>.

Hi - yes, I should be able to make sense of them next Monday. I assume you're not so much interested in the OOM machine as in the surrounding nodes that blocked?
 



Re: SolrCloud indexing blocks if node is recovering

Posted by Mark Miller <ma...@gmail.com>.

Doesn't sound right. Still have the logs?

- Mark
