Posted to solr-user@lucene.apache.org by Olivier <ol...@gmail.com> on 2015/11/19 21:43:07 UTC

Large multivalued field and overseer problem

Hi,

We have a SolrCloud cluster with 3 nodes (4 processors, 24 GB RAM per node).
We have 3 shards per node and the replication factor is 3. We host 3
collections; the biggest has only about 40K documents.
The most important thing is a multivalued field with about 200K to 300K
values per document (each value is a kind of product reference of type
String).
We have some very big issues with our SolrCloud cluster. It crashes
entirely, very frequently, at indexing time. It starts with an Overseer
issue:

Overseer session expired: KeeperErrorCode = Session expired for
/overseer_elect/leader

Then another node is elected Overseer. But the recovery phase seems to
fail indefinitely. It seems that communication between the Overseer
and ZK is impossible.
And after a short period of time, all the cluster is unavailable (out of
memory JVM error). And we have to restart it.

So I wanted to know if we can continue to use a huge multivalued field with
SolrCloud.
We are on Solr 4.10.4 for now. Do you think that upgrading to Solr 5,
with an Overseer per collection, could fix our issues?
Or do we have to rethink the schema to avoid this very large multivalued
field?

Thanks,
Best,

Olivier

Re: Large multivalued field and overseer problem

Posted by Erick Erickson <er...@gmail.com>.
In addition to Anshum's excellent points:

bq: And after a short period of time, all the cluster is unavailable (out of
memory JVM error).

This is where I'd focus my efforts. I suspect you're memory-bound and are
actually seeing OOM errors about the time this problem manifests itself. Or
you're getting long GC pauses that make ZooKeeper think the Solr instance is
gone.

I'd turn on GC logging and analyze that as a first step.
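
As a rough sketch of that first step (not something from the thread itself): assuming a HotSpot JVM on Java 7 or 8 started with -Xloggc:<file>, -XX:+PrintGCDetails, -XX:+PrintGCDateStamps and -XX:+PrintGCApplicationStoppedTime, the log contains "Total time for which application threads were stopped" lines. The Python helper below scans for stops at or above a threshold. The function name long_pauses and the threshold are illustrative assumptions; the threshold should be set to whatever ZooKeeper session timeout (zkClientTimeout) the cluster actually uses.

```python
import re

# Line emitted by HotSpot (Java 7/8) when -XX:+PrintGCApplicationStoppedTime
# is enabled. Some locales print a comma as the decimal separator.
STOPPED_RE = re.compile(
    r"Total time for which application threads were stopped: "
    r"(\d+[.,]\d+) seconds"
)


def long_pauses(lines, threshold_s):
    """Return (line_number, pause_seconds) for every stop >= threshold_s.

    `lines` is any iterable of log lines; `threshold_s` is an assumption
    the caller supplies, e.g. the ZooKeeper session timeout in seconds.
    """
    hits = []
    for number, line in enumerate(lines, start=1):
        match = STOPPED_RE.search(line)
        if match is None:
            continue  # not a stopped-time line
        pause = float(match.group(1).replace(",", "."))
        if pause >= threshold_s:
            hits.append((number, pause))
    return hits
```

For example, calling long_pauses(open("solr_gc.log"), 15.0) on a hypothetical log file would list every stop of 15 seconds or more. If such stops line up with the "Session expired" messages, GC pauses rather than the Overseer are the likelier culprit.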

Best,
Erick

On Thu, Nov 19, 2015 at 1:19 PM, Anshum Gupta <an...@anshumgupta.net> wrote:
> Hi Olivier,
>
> A few things that you should know:
> 1. The Overseer is at a per cluster level and not at a per-collection level.
> 2. Also, documents/fields/etc. should have zero impact on the Overseer
> itself.
>
> So, while the upgrade to a more recent Solr version comes with a lot of
> good stuff, the cluster state or the Overseer are not what you should be
> looking at. Also, failing recovery has nothing to do with the Overseer.
>
> Now, some questions whose answers might help people here help you better.
>
> Can you tell us something about your ZooKeeper setup (version, number of nodes)?
>
> Also, is the network between the Solr nodes and ZK fine?
>
> You mention that you're seeing this issue while indexing. How are you
> indexing (CloudSolrClient?) and what are your indexing settings
> (auto-commit etc.)?
>
> Most importantly, what is the heap size of the Solr processes?
>
>
> --
> Anshum Gupta

Re: Large multivalued field and overseer problem

Posted by Anshum Gupta <an...@anshumgupta.net>.
Hi Olivier,

A few things that you should know:
1. The Overseer is at a per cluster level and not at a per-collection level.
2. Also, documents/fields/etc. should have zero impact on the Overseer
itself.

So, while the upgrade to a more recent Solr version comes with a lot of
good stuff, the cluster state or the Overseer are not what you should be
looking at. Also, failing recovery has nothing to do with the Overseer.

Now, some questions whose answers might help people here help you better.

Can you tell us something about your ZooKeeper setup (version, number of nodes)?

Also, is the network between the Solr nodes and ZK fine?

You mention that you're seeing this issue while indexing. How are you
indexing (CloudSolrClient?) and what are your indexing settings
(auto-commit etc.)?

Most importantly, what is the heap size of the Solr processes?



-- 
Anshum Gupta