Posted to solr-user@lucene.apache.org by Raji N <ra...@gmail.com> on 2020/03/29 08:17:35 UTC

Solrcloud 7.6 OOM due to unable to create native threads

Hi All,

We are running SolrCloud 7.6 (with the patch
https://issues.apache.org/jira/secure/attachment/12969150/SOLR-11724.patch)
in production on 7 hosts in containers. The container memory is 48GB and the
heap is 24GB.

ulimit -v
unlimited

ulimit -m
unlimited

We don't have any custom code in Solr. We have set up bidirectional CDCR
between the primary and secondary datacenters. Our secondary DC is very
unstable, and many of its instances are often down.

We get the exception below quite often. Is this because the CDCR connection
is broken?

WARN  (cdcr-update-log-synchronizer-80-thread-1) [   ] o.a.s.h.CdcrUpdateLogSynchronizer Caught unexpected exception
java.lang.OutOfMemoryError: unable to create new native thread
        at java.lang.Thread.start0(Native Method) ~[?:1.8.0_211]
        at java.lang.Thread.start(Thread.java:717) ~[?:1.8.0_211]
        at org.apache.http.impl.client.IdleConnectionEvictor.start(IdleConnectionEvictor.java:96) ~[httpclient-4.5.3.jar:4.5.3]
        at org.apache.http.impl.client.HttpClientBuilder.build(HttpClientBuilder.java:1219) ~[httpclient-4.5.3.jar:4.5.3]
        at org.apache.solr.client.solrj.impl.HttpClientUtil.createClient(HttpClientUtil.java:319) ~[solr-solrj-7.6.0.jar:7.6.0 719cde97f84640faa1e3525690d262946571245f - nknize - 2018-12-07 14:47:53]
        at org.apache.solr.client.solrj.impl.HttpClientUtil.createClient(HttpClientUtil.java:330) ~[solr-solrj-7.6.0.jar:7.6.0 719cde97f84640faa1e3525690d262946571245f - nknize - 2018-12-07 14:47:53]
        at org.apache.solr.client.solrj.impl.HttpClientUtil.createClient(HttpClientUtil.java:268) ~[solr-solrj-7.6.0.jar:7.6.0 719cde97f84640faa1e3525690d262946571245f - nknize - 2018-12-07 14:47:53]
        at org.apache.solr.client.solrj.impl.HttpClientUtil.createClient(HttpClientUtil.java:255) ~[solr-solrj-7.6.0.jar:7.6.0 719cde97f84640faa1e3525690d262946571245f - nknize - 2018-12-07 14:47:53]
        at org.apache.solr.client.solrj.impl.HttpSolrClient.<init>(HttpSolrClient.java:200) ~[solr-solrj-7.6.0.jar:7.6.0 719cde97f84640faa1e3525690d262946571245f - nknize - 2018-12-07 14:47:53]
        at org.apache.solr.client.solrj.impl.HttpSolrClient$Builder.build(HttpSolrClient.java:957) ~[solr-solrj-7.6.0.jar:7.6.0 719cde97f84640faa1e3525690d262946571245f - nknize - 2018-12-07 14:47:53]
        at org.apache.solr.handler.CdcrUpdateLogSynchronizer$UpdateLogSynchronisation.run(CdcrUpdateLogSynchronizer.java:139) [solr-core-7.6.0.jar:7.6.0-SNAPSHOT 34d82ed033cccd8120431b73e93554b85b24a278 - i843100 - 2019-09-30 14:02:46]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_211]
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) [?:1.8.0_211]
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) [?:1.8.0_211]
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) [?:1.8.0_211]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_211]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_211]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_211]

 Thanks,
 Raji
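
For context: "unable to create new native thread" is usually tied to
thread and process limits rather than to the heap or to the ulimit -v and
ulimit -m values, so the limits worth checking are the ones that cap thread
counts (for example ulimit -u). The following is a minimal sketch, assuming a
Linux host; the two cgroup paths are typical locations inside a container and
are assumptions that may not match every setup.

```python
import resource
from pathlib import Path


def read_first_line(path):
    """Return the stripped first line of a file, or None if it cannot be read."""
    try:
        return Path(path).read_text().splitlines()[0].strip()
    except (OSError, IndexError):
        return None


def thread_limits():
    """Collect the limits that commonly cap how many threads a process can start."""
    # RLIMIT_NPROC is what `ulimit -u` reports; -1 means unlimited.
    soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)
    return {
        "rlimit_nproc_soft": soft,
        "rlimit_nproc_hard": hard,
        "kernel_threads_max": read_first_line("/proc/sys/kernel/threads-max"),
        "kernel_pid_max": read_first_line("/proc/sys/kernel/pid_max"),
        # Assumed cgroup locations (v2 first, then v1); None if absent.
        "cgroup_v2_pids_max": read_first_line("/sys/fs/cgroup/pids.max"),
        "cgroup_v1_pids_max": read_first_line("/sys/fs/cgroup/pids/pids.max"),
    }


if __name__ == "__main__":
    for name, value in thread_limits().items():
        print(f"{name}: {value}")
```

Running it as the same user and inside the same container as Solr gives the
numbers to compare against the thread count seen in a thread dump.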

Re: Solrcloud 7.6 OOM due to unable to create native threads

Posted by Walter Underwood <wu...@wunderwood.org>.
We have defined a “search feed” as a file of JSONL objects, one per line.
The feed files can be stored in S3, reloaded, sent to two clusters, etc.
Each destination can keep its own log of failures and retries. We’ve been
doing this for full batch feeds and incrementals for a few years.

We’ve been using a Java program to load those, but I just wrote a
multi-threaded Python thingy that uses the JSON update handlers.
That is pretty simple code. 
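
A loader along these lines might look like the sketch below. It is not the
actual program described above: the batch size, worker count, and update URL
are assumptions, and it relies only on the fact that Solr's update handler
accepts a JSON array of documents posted with Content-Type application/json.
Commits are left to the collection's autoCommit settings.

```python
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def read_batches(lines, batch_size):
    """Yield lists of parsed JSON objects, at most batch_size per list."""
    batch = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines in the feed
        batch.append(json.loads(line))
        if len(batch) >= batch_size:
            yield batch
            batch = []
    if batch:
        yield batch


def build_update_request(update_url, docs):
    """Build a POST request that sends docs as one JSON array."""
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        update_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_batch(update_url, docs):
    """Send one batch; return (doc_count, error_or_None) so failures can be logged."""
    try:
        request = build_update_request(update_url, docs)
        with urllib.request.urlopen(request, timeout=60) as response:
            response.read()
        return len(docs), None
    except OSError as exc:  # URLError, HTTPError and timeouts are OSError subclasses
        return len(docs), exc


def load_feed(path, update_url, batch_size=500, workers=4):
    """Post a JSONL feed file in parallel and return the failed batches."""
    with open(path, encoding="utf-8") as feed, \
            ThreadPoolExecutor(max_workers=workers) as pool:
        # Note: Executor.map submits every batch up front, so the whole
        # feed is parsed into memory before the first result is read.
        results = pool.map(lambda docs: post_batch(update_url, docs),
                           read_batches(feed, batch_size))
        return [result for result in results if result[1] is not None]
```

A call such as load_feed("feed.jsonl", "http://localhost:8983/solr/mycollection/update")
(a hypothetical file and collection) returns the failed batches, which each
destination could write to its own retry log.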

wunder
Walter Underwood
wunder@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Mar 31, 2020, at 11:19 PM, S G <sg...@gmail.com> wrote:
> 
> One approach could be to buffer the messages in Kafka before pushing to
> Solr, and then use "Kafka mirror" to replicate the messages to the other DC.
> Both DCs' Kafka pipelines are then kept in sync by the mirror, and you can
> run Storm/Spark/Flink etc. jobs to consume from the local Kafka and publish
> to the local Solr clusters.
> This moves the responsibility of DR sync to something designed specifically
> for this purpose: Kafka mirror.
> However, do not use a version of Kafka more than a year old, as older
> versions had a lot of issues with mirroring.


Re: Solrcloud 7.6 OOM due to unable to create native threads

Posted by S G <sg...@gmail.com>.
One approach could be to buffer the messages in Kafka before pushing to
Solr, and then use "Kafka mirror" to replicate the messages to the other DC.
Both DCs' Kafka pipelines are then kept in sync by the mirror, and you can
run Storm/Spark/Flink etc. jobs to consume from the local Kafka and publish
to the local Solr clusters.
This moves the responsibility of DR sync to something designed specifically
for this purpose: Kafka mirror.
However, do not use a version of Kafka more than a year old, as older
versions had a lot of issues with mirroring.
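
The consuming job in each DC can be sketched roughly as follows. This is not
a Kafka client: records, index_batch, and commit are hypothetical stand-ins
for whatever the consumer library and the Solr client provide. The point it
illustrates is ordering: the offset is committed only after Solr has accepted
the batch, so a failure causes a re-read rather than a lost update.

```python
def flush(pending, index_batch, commit):
    """Index one batch, then commit the offset that follows its last record."""
    index_batch([doc for _, doc in pending])  # raises on failure, so commit is skipped
    commit(pending[-1][0] + 1)                # committed offset = next offset to read
    return len(pending)


def drain(records, index_batch, commit, batch_size=100):
    """Consume (offset, document) pairs and index them in batches.

    records     -- any iterable of (offset, document) pairs (assumed shape)
    index_batch -- callable that sends a list of documents to the local Solr
    commit      -- callable that records the next offset to consume
    """
    pending = []
    indexed = 0
    for record in records:
        pending.append(record)
        if len(pending) >= batch_size:
            indexed += flush(pending, index_batch, commit)
            pending = []
    if pending:
        indexed += flush(pending, index_batch, commit)
    return indexed
```

Because a batch can be re-read after a crash, documents may be sent to Solr
more than once; with a unique key field, a repeated add overwrites the same
document, so this is normally harmless.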


On Mon, Mar 30, 2020 at 11:43 PM Raji N <ra...@gmail.com> wrote:

> Hi Erick,
>
> What are your recommendations for a SolrCloud DR strategy?
>
> Thanks,
> Raji

Re: Solrcloud 7.6 OOM due to unable to create native threads

Posted by Raji N <ra...@gmail.com>.
Hi Erick,

What are your recommendations for a SolrCloud DR strategy?

Thanks,
Raji

On Sun, Mar 29, 2020 at 6:25 PM Erick Erickson <er...@gmail.com>
wrote:

> I don’t recommend CDCR at this point; I think there are better approaches.
>
> The root problem is that CDCR uses tlog files as a queueing mechanism.
> If the connection between the DCs is broken for any reason, the tlogs grow
> without limit. This could probably be fixed, but a better alternative is to
> use something designed to ensure messages (updates) are delivered to
> separate DCs rather than try to have CDCR re-invent that wheel.
>
> Best,
> Erick

Re: Solrcloud 7.6 OOM due to unable to create native threads

Posted by Erick Erickson <er...@gmail.com>.
I don’t recommend CDCR at this point; I think there are better approaches.

The root problem is that CDCR uses tlog files as a queueing mechanism.
If the connection between the DCs is broken for any reason, the tlogs grow
without limit. This could probably be fixed, but a better alternative is to
use something designed to ensure messages (updates) are delivered to
separate DCs rather than try to have CDCR re-invent that wheel.
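
One way to watch for that growth is to total the size of each tlog
directory. This is a minimal sketch, assuming the usual layout in which each
core keeps its transaction logs in a directory named "tlog" under its data
directory; the root path passed in is whatever your Solr home happens to be.

```python
import os


def tlog_usage(solr_home):
    """Map each directory named 'tlog' under solr_home to its total size in bytes."""
    usage = {}
    for dirpath, dirnames, filenames in os.walk(solr_home):
        if os.path.basename(dirpath) == "tlog":
            usage[dirpath] = sum(
                os.path.getsize(os.path.join(dirpath, name)) for name in filenames
            )
            dirnames[:] = []  # do not descend further below a tlog directory
    return usage


if __name__ == "__main__":
    import sys

    # Print the largest tlog directories first.
    for path, size in sorted(tlog_usage(sys.argv[1]).items(), key=lambda item: -item[1]):
        print(f"{size:>15,d}  {path}")
```

Sampling this periodically while the link to the other DC is down should
show whether the logs are in fact accumulating.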

Best,
Erick

> On Mar 29, 2020, at 6:47 PM, S G <sg...@gmail.com> wrote:
> 
> Is CDCR even recommended for use in production?
> Or was it abandoned before it could become production-ready?
> 
> Thanks
> SG


Re: Solrcloud 7.6 OOM due to unable to create native threads

Posted by S G <sg...@gmail.com>.
Is CDCR even recommended for use in production?
Or was it abandoned before it could become production-ready?

Thanks
SG


On Sun, Mar 29, 2020 at 5:18 AM Erick Erickson <er...@gmail.com>
wrote:

> What that error usually means is that there are a zillion threads running.
>
> Try taking a thread dump. It’s _probable_ that it’s CDCR, but
> take a look at the thread dump to see if you have lots of
> threads that are running. And by “lots” here, I mean 100s of threads
> that reference the same component, in this case that have cdcr in
> the stack trace.
>
> CDCR is not getting active work at this point, so you might want to
> consider another replication strategy if you’re not willing to fix
> the code.
>
> Best,
> Erick

Re: Solrcloud 7.6 OOM due to unable to create native threads

Posted by Raji N <ra...@gmail.com>.
Thanks, Erick. I don't see anywhere that CDCR is not recommended for
production use. I took a thread dump and am seeing about 140 CDCR threads,
for example:


"cdcr-replicator-219-thread-8" #787 prio=5 os_prio=0 tid=0x00007f7c34009000 nid=0x50a waiting on condition [0x00007f7ec871b000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00000001da724ca0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
        at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
        at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)




"cdcr-update-log-synchronizer-157-thread-1" #240 prio=5 os_prio=0 tid=0x00007f8782543800 nid=0x2e5 waiting on condition [0x00007f82ad99c000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00000001d7f9e8e8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
        at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1081)
        at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809)
        at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)


Thanks,

Raji

On Sun, Mar 29, 2020 at 5:18 AM Erick Erickson <er...@gmail.com>
wrote:

> What that error usually means is that there are a zillion threads running.
>
> Try taking a thread dump. It’s _probable_ that it’s CDCR, but
> take a look at the thread dump to see if you have lots of
> threads that are running. Any by “lots” here, I mean 100s of threads
> that reference the same component, in this case that have cdcr in
> the stack trace.
>
> CDCR is not getting active work at this point, you might want to
> consider another replication strategy if you’re not willing to fix
> the code.
>
> Best,
> Erick

Re: Solrcloud 7.6 OOM due to unable to create native threads

Posted by Erick Erickson <er...@gmail.com>.
What that error usually means is that there are a zillion threads running.

Try taking a thread dump. It’s _probable_ that it’s CDCR, but
take a look at the thread dump to see if you have lots of
threads that are running. And by “lots” here, I mean 100s of threads
that reference the same component, in this case that have cdcr in
the stack trace.
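
To make "lots of threads that reference the same component" concrete, here is
a minimal sketch that tallies threads per pool from a saved thread dump. It
assumes the standard HotSpot dump layout, where each thread starts with a
header line such as "cdcr-replicator-219-thread-8" #787 ..., and it assumes
pool threads follow the <pool>-<n>-thread-<m> naming seen in this thread. The
function names are illustrative and are not part of Solr or the JDK.

```python
import re
from collections import Counter

# Header line of one thread in a HotSpot thread dump, for example:
#   "cdcr-replicator-219-thread-8" #787 prio=5 os_prio=0 tid=... nid=...
THREAD_HEADER = re.compile(r'^"(?P<name>[^"]+)"\s+#\d+')


def pool_prefix(thread_name):
    """Drop the numeric pool and thread ids from an executor thread name.

    'cdcr-replicator-219-thread-8' -> 'cdcr-replicator'
    Names that do not follow this pattern are returned unchanged.
    """
    return re.sub(r'-\d+-thread-\d+$', '', thread_name)


def count_threads_by_pool(dump_text):
    """Return a Counter mapping pool prefix -> number of threads in the dump."""
    counts = Counter()
    for line in dump_text.splitlines():
        match = THREAD_HEADER.match(line.strip())
        if match:
            counts[pool_prefix(match.group("name"))] += 1
    return counts
```

One way to use it: save a dump with jstack <pid> > dump.txt, read the file
into a string, and print count_threads_by_pool(text).most_common(10). A prefix
with a count in the hundreds is the component to look at first.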

CDCR is not being actively worked on at this point, so you might want to
consider another replication strategy if you’re not willing to fix
the code.

Best,
Erick

> On Mar 29, 2020, at 4:17 AM, Raji N <ra...@gmail.com> wrote:
> 
> Hi All,
> 
> We running solrcloud 7.6  (with the patch #
> https://issues.apache.org/jira/secure/attachment/12969150)/SOLR-11724.patchon
> production on 7 hosts in  containers. The container memory is 48GB , heap
> is 24GB.
> ulimit -v
> 
> unlimited
> 
> ulimit -m
> 
> unlimited
> We don't have any custom code in solr. We have set up  bidirectional CDCR
> between primary and secondary Datacenter. Our secondary DC is very unstable
> and many times many instances are down.
> 
> We get below exception quite often. Is this because the CDCR connection is
> broken.
> 
> WARN  (cdcr-update-log-synchronizer-80-thread-1) [   ]
> o.a.s.h.CdcrUpdateLogSynchronizer Caught unexpected exception
> 
> java.lang.OutOfMemoryError: unable to create new native thread
> 
>               at java.lang.Thread.start0(Native Method) ~[?:1.8.0_211]
> 
>               at java.lang.Thread.start(Thread.java:717) ~[?:1.8.0_211]
> 
>               at
> org.apache.http.impl.client.IdleConnectionEvictor.start(IdleConnectionEvictor.java:96)
> ~[httpclient-4.5.3.jar:4.5.3]
> 
>               at
> org.apache.http.impl.client.HttpClientBuilder.build(HttpClientBuilder.java:1219)
> ~[httpclient-4.5.3.jar:4.5.3]
> 
>               at
> org.apache.solr.client.solrj.impl.HttpClientUtil.createClient(HttpClientUtil.java:319)
> ~[solr-solrj-7.6.0.jar:7.6.0 719cde97f84640faa1e3525690d262946571245f
> - nknize - 2018-12-07 14:47:53]
> 
>               at
> org.apache.solr.client.solrj.impl.HttpClientUtil.createClient(HttpClientUtil.java:330)
> ~[solr-solrj-7.6.0.jar:7.6.0 719cde97f84640faa1e3525690d262946571245f
> - nknize - 2018-12-07 14:47:53]
> 
>               at
> org.apache.solr.client.solrj.impl.HttpClientUtil.createClient(HttpClientUtil.java:268)
> ~[solr-solrj-7.6.0.jar:7.6.0 719cde97f84640faa1e3525690d262946571245f
> - nknize - 2018-12-07 14:47:53]
> 
>               at
> org.apache.solr.client.solrj.impl.HttpClientUtil.createClient(HttpClientUtil.java:255)
> ~[solr-solrj-7.6.0.jar:7.6.0 719cde97f84640faa1e3525690d262946571245f
> - nknize - 2018-12-07 14:47:53]
> 
>               at
> org.apache.solr.client.solrj.impl.HttpSolrClient.<init>(HttpSolrClient.java:200)
> ~[solr-solrj-7.6.0.jar:7.6.0 719cde97f84640faa1e3525690d262946571245f
> - nknize - 2018-12-07 14:47:53]
> 
>               at
> org.apache.solr.client.solrj.impl.HttpSolrClient$Builder.build(HttpSolrClient.java:957)
> ~[solr-solrj-7.6.0.jar:7.6.0 719cde97f84640faa1e3525690d262946571245f
> - nknize - 2018-12-07 14:47:53]
> 
>               at
> org.apache.solr.handler.CdcrUpdateLogSynchronizer$UpdateLogSynchronisation.run(CdcrUpdateLogSynchronizer.java:139)
> [solr-core-7.6.0.jar:7.6.0-SNAPSHOT
> 34d82ed033cccd8120431b73e93554b85b24a278 - i843100 - 2019-09-30
> 14:02:46]
> 
>               at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> [?:1.8.0_211]
> 
>               at
> java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
> [?:1.8.0_211]
> 
>               at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
> [?:1.8.0_211]
> 
>               at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
> [?:1.8.0_211]
> 
>               at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> [?:1.8.0_211]
> 
>               at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> [?:1.8.0_211]
> 
>               at java.lang.Thread.run(Thread.java:748) [?:1.8.0_211]
> 
> Thanks,
> Raji