Posted to solr-user@lucene.apache.org by Mads Tomasgård Bjørgan <mt...@dips.no> on 2016/07/05 07:45:55 UTC

Memory issues when indexing

Hello,
We're struggling with memory issues when posting documents to Solr, and we are unsure why the problem occurs.

The documents are indexed into a SolrCloud running Solr 6.1.0 on top of ZooKeeper 3.4.8, on three VMs running CentOS 7 and JRE 1.8.0.
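For reference, a minimal SolrJ 6.1 sketch of posting documents to the cloud looks roughly like this (the ZooKeeper addresses, collection name, and fields below are placeholders, not our actual setup):

    import org.apache.solr.client.solrj.impl.CloudSolrClient;
    import org.apache.solr.common.SolrInputDocument;

    public class IndexSketch {
        public static void main(String[] args) throws Exception {
            // Placeholder ZooKeeper ensemble and collection name.
            String zkHosts = "zk1:2181,zk2:2181,zk3:2181";
            try (CloudSolrClient client = new CloudSolrClient.Builder()
                    .withZkHost(zkHosts)
                    .build()) {
                client.setDefaultCollection("mycollection");
                for (int i = 0; i < 1000; i++) {
                    SolrInputDocument doc = new SolrInputDocument();
                    doc.addField("id", "doc-" + i);
                    doc.addField("title_s", "document " + i);
                    client.add(doc);
                }
                client.commit();
            }
        }
    }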

After various attempts with different configurations, the heap always fills up on one, and only one, of the machines (let's call it machine 1), eventually yielding the following exception:
(....) o.a.s.s.HttpSolrCall null:org.apache.solr.update.processor.DistributedUpdateProcessor$DistributedUpdatesAsyncException: Async exception during distributed update: Cannot assign requested address
The remaining two machines always have a lot of free memory compared with machine 1.

Thus, we decided to index only a small fraction of the documents to see whether the exception was due to memory limitations or not. We stopped indexing when the memory of machine 1 reached 2.5 GB out of a total of 4 GB. As seen in the JConsole picture, machine 2 was only using 1.4 GB of the available memory at the same time (the same goes for machine 3). After indexing stopped, both machine 2 and machine 3 released most of their memory when a garbage collection was performed. Machine 1, however, was unaffected: very little memory was freed, and Solr still used around 2.5 GB. I would have expected the memory of machine 1 to be released in the same way as on machines 2 and 3 once indexing stopped. Most of the memory belonged to the "CMS Old Gen" pool (well above 2 GB).
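For completeness, the per-pool numbers JConsole shows come from the JVM's memory MBeans, which can also be read programmatically. A small illustrative sketch (it prints the heap pools of whatever JVM it runs in; to look at the Solr nodes one would connect over JMX instead):

    import java.lang.management.ManagementFactory;
    import java.lang.management.MemoryPoolMXBean;
    import java.lang.management.MemoryType;

    public class PoolUsage {
        public static void main(String[] args) {
            // Print used/max bytes for every heap pool, e.g. "CMS Old Gen".
            for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
                if (pool.getType() == MemoryType.HEAP) {
                    System.out.printf("%-20s %,d / %,d bytes%n",
                            pool.getName(),
                            pool.getUsage().getUsed(),
                            pool.getUsage().getMax());
                }
            }
        }
    }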

Indexing until the memory is full on machine 1 gives a "File Descriptor Count" of 50 000, while the number of files in the index folder is around 150 for each node. I was told that the number of files in the index folder and the file descriptor count should roughly match? Machine 1 has an enormous number of TCP connections stalled in CLOSE_WAIT, while machines 2 and 3 don't have the corresponding FIN_WAITs, even though almost all of machine 1's TCP connections point at those machines.
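For reference, a small sketch of how those two numbers can be checked outside JConsole. The file-descriptor count comes from the standard UnixOperatingSystemMXBean (and so reports the JVM running the snippet), and the CLOSE_WAIT count parses /proc/net/tcp, where the hexadecimal state 08 means CLOSE_WAIT:

    import java.lang.management.ManagementFactory;
    import java.nio.file.Files;
    import java.nio.file.Paths;

    public class SocketCheck {
        public static void main(String[] args) throws Exception {
            // Open file descriptor count of the JVM running this snippet
            // (HotSpot on Linux/Unix).
            com.sun.management.UnixOperatingSystemMXBean os =
                    (com.sun.management.UnixOperatingSystemMXBean)
                            ManagementFactory.getOperatingSystemMXBean();
            System.out.println("open file descriptors: "
                    + os.getOpenFileDescriptorCount());

            // Count IPv4 sockets in CLOSE_WAIT; the state column (4th field)
            // in /proc/net/tcp is hexadecimal and 08 means CLOSE_WAIT.
            long closeWait = Files.readAllLines(Paths.get("/proc/net/tcp")).stream()
                    .skip(1)                                   // header line
                    .map(line -> line.trim().split("\\s+"))
                    .filter(cols -> cols.length > 3 && cols[3].equals("08"))
                    .count();
            System.out.println("CLOSE_WAIT sockets: " + closeWait);
        }
    }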

JConsole pictures for machine 1 and 2, respectively. We resumed indexing at 08:45, and the same exception as shown above appeared around 08:52. Machine 2 frees most of its memory at GC, in contrast to machine 1.


We have no idea whether this is a bug or a fault in our configuration, and we were hoping someone could help us with the problem.

Greetings,
Mads

RE: Memory issues when indexing

Posted by Mads Tomasgård Bjørgan <mt...@dips.no>.
Another update:

After creating a new certificate, properly specified for its context of use, we still end up in the situation described. Thus, it seems SSL itself is the underlying cause of the leak.


RE: Memory issues when indexing

Posted by Mads Tomasgård Bjørgan <mt...@dips.no>.
Hi again,
We turned off SSL, and now everything works normally.

The certificate was not originally meant to be used on the current servers, but we would like to keep it, as it has already been deployed and is used by our customers. Thus we need to launch the cloud with "-Dsolr.ssl.checkPeerName=false" - but it seems quite obvious that the nodes still can't communicate properly.
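For anyone wanting to see what the nodes actually present to each other, here is a small illustrative sketch (host and port are placeholders) that opens an SSL connection and prints the subject and subject alternative names of the peer certificate - handy for seeing which names peer-name checking would accept. It assumes the JVM trusts the certificate, e.g. by pointing -Djavax.net.ssl.trustStore at the same truststore the nodes use:

    import javax.net.ssl.SSLSocket;
    import javax.net.ssl.SSLSocketFactory;
    import java.security.cert.X509Certificate;
    import java.util.List;

    public class PeerCertCheck {
        public static void main(String[] args) throws Exception {
            String host = "solr-node-1";   // placeholder node name
            int port = 8983;               // placeholder SSL port
            SSLSocketFactory factory = (SSLSocketFactory) SSLSocketFactory.getDefault();
            try (SSLSocket socket = (SSLSocket) factory.createSocket(host, port)) {
                socket.startHandshake();
                X509Certificate cert =
                        (X509Certificate) socket.getSession().getPeerCertificates()[0];
                System.out.println("subject: " + cert.getSubjectX500Principal());
                // Each SAN entry is a List: element 0 is the type, element 1 the name.
                if (cert.getSubjectAlternativeNames() != null) {
                    for (List<?> san : cert.getSubjectAlternativeNames()) {
                        System.out.println("SAN: " + san.get(1));
                    }
                }
            }
        }
    }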

Our last resort is to replace the certificate, so the question now is whether it is possible to tweak the configuration so that we can deploy a SolrCloud with the same certificate.

Thanks,
Mads
