Posted to users@solr.apache.org by Dominique Bejean <do...@eolya.fr> on 2021/09/01 07:18:31 UTC

Re: Solr heap memory

Hi,

As previously said, long GC pauses are the most likely cause of the Solr/Zookeeper
communication issues. Analyse your GC logs with gceasy.io to
confirm this. After that, you need to investigate what is causing so much
heap memory consumption. Maybe you will discover design mistakes in
your schema and in some of your queries (facets, leading wildcards on
misconfigured fields, huge boolean queries, ...) and you will be able to
drastically reduce memory requirements. Maybe you have so much data that
you will need to reconsider your sharding and infrastructure. Either way,
increasing the heap size indefinitely is not the solution, and in any case never
go over 31g.

Dominique
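Before uploading to gceasy.io, a quick script can already show whether long stop-the-world pauses are present. This is a minimal sketch, assuming JDK 9+ unified GC logging, where pause lines end with a duration such as `12.345ms`; the sample lines used with it are hypothetical, and older (JDK 8) or other collector log formats need a different pattern.

```python
import re
from typing import Iterable, List, Tuple

# Matches "Pause <Type> ... <duration>ms" at the end of a JDK 9+ unified GC log line,
# e.g. "... GC(121) Pause Full (G1 Compaction Pause) 30000M->29500M(31744M) 61234.567ms".
PAUSE_RE = re.compile(r"(Pause [A-Za-z]+)\b.*?(\d+(?:\.\d+)?)ms\s*$")


def long_pauses(lines: Iterable[str], threshold_ms: float = 1000.0) -> List[Tuple[str, float]]:
    """Return (pause_type, duration_ms) for every pause at or above threshold_ms."""
    found = []
    for line in lines:
        match = PAUSE_RE.search(line)
        if match:
            duration_ms = float(match.group(2))
            if duration_ms >= threshold_ms:
                found.append((match.group(1), duration_ms))
    return found
```

Against a real log this would be used as `with open(gc_log_path) as f: print(long_pauses(f))`, where `gc_log_path` is a placeholder for wherever the Solr instance writes its GC log. Pauses measured in tens of seconds would support the GC theory put forward in this thread.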


On Mon, Aug 30, 2021 at 15:15, HariBabu kuruva <ha...@gmail.com>
wrote:

> In the logs I can see this WARN:
>
> 2021-08-30 13:15:52.301 WARN  (zkCallback-12-thread-3) [c:quoteStore
> s:shard1 r:core_node6 x:quoteStore_shard1_replica_n5]
> o.a.s.c.RecoveryStrategy Stopping recovery for
> core=[quoteStore_shard1_replica_n5] coreNodeName=[core_node6]
>
> On Mon, Aug 30, 2021 at 6:43 PM HariBabu kuruva <hari2708.kuruva@gmail.com>
> wrote:
>
> > Hi Zisis,
> >
> > Thanks for your email.
> >
> > We suspect the issue is with one particular Solr collection (or
> > store). Wherever replicas of that store are present, those nodes are
> > going down.
> >
> > Also, that shard is now in recovery mode and no leader has been elected.
> > Could you please suggest a way to bring this store back up?
> >
> > On Mon, Aug 30, 2021 at 1:55 PM Zisis Tachtsidis <zi...@runbox.com>
> > wrote:
> >
> >> My guess is that the Solr/Zookeeper communication issues are due to GC
> >> pauses. You are saying that you end up with OOM problems. High memory usage
> >> puts pressure on GC. Long GC pauses lead to timeouts in Solr/Zookeeper
> >> communication. We've seen that happening.
> >>
> >> First thing I'd do is to get a heap dump once the OOM is triggered and
> >> analyze that to see what is occupying the memory. Otherwise we are blind
> >> here. Is it due to heavy indexing? Heavy querying? Both of them? Do you
> >> have customizations in the analysis chain that might generate more objects
> >> than usual?
> >>
> >> Zisis
> >>
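The link between GC pauses and ZooKeeper timeouts can be put into numbers: a JVM frozen in a stop-the-world pause sends no heartbeats, so a long enough pause costs the node its session. The sketch below is a simplified model, not ZooKeeper's actual code. It assumes the Java client drops the connection after about 2/3 of the negotiated session timeout and that the session expires once the full timeout passes. The 65584 ms silence comes from the log quoted later in this thread; the 30000 ms timeout is a hypothetical value, so check the ZooKeeper client timeout actually configured for the cluster (`zkClientTimeout` in solr.xml).

```python
def zk_session_outcome(silence_ms: int, session_timeout_ms: int) -> str:
    """Classify a period in which the Solr node heard nothing from ZooKeeper.

    Simplified model: the client gives up on the connection after about 2/3 of
    the negotiated session timeout, and the session is expired once the full
    timeout has passed without the client being heard from.
    """
    client_read_timeout_ms = session_timeout_ms * 2 // 3
    if silence_ms >= session_timeout_ms:
        return "expired"
    if silence_ms >= client_read_timeout_ms:
        return "disconnected"
    return "ok"


# 65584 ms of silence is taken from the log quoted later in this thread;
# 30000 ms is a hypothetical session timeout.
print(zk_session_outcome(65584, 30000))  # expired
```

Raising the timeout only hides the symptom; a pause of more than a minute points back at heap pressure.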
> >> On Mon, 30 Aug 2021 03:44:13 -0400, Dave <ha...@gmail.com>
> >> wrote:
> >>
> >> > I can't help beyond that. I don't like SolrCloud or Zookeeper; whenever I
> >> > can, I stick to a standalone Solr instance.
> >> >
> >> > > On Aug 30, 2021, at 3:23 AM, HariBabu kuruva <hari2708.kuruva@gmail.com> wrote:
> >> > >
> >> > > Hi Dave
> >> > >
> >> > > We tried setting the memory as per your suggestions.
> >> > >
> >> > > But Solr still goes down within a couple of minutes with an
> >> > > OOM error. The Solr logs also show the connectivity issue below between
> >> > > Solr and Zookeeper. Please advise.
> >> > >
> >> > > Zookeeper is running fine.
> >> > >
> >> > >
> >> > > 2021-08-30 06:24:13.070 WARN  (main-SendThread(lxeisprdas06.corp.equinix.com:2181)) [   ] o.a.z.ClientCnxn Client session timed out, have not heard from server in 65584ms for session id 0x1000019354b021b
> >> > > 2021-08-30 06:24:13.071 WARN  (main-SendThread(lxeisprdas06.corp.equinix.com:2181)) [   ] o.a.z.ClientCnxn Session 0x1000019354b021b for sever lxeisprdas06.corp.equinix.com/10.**.*.*:2181, Closing socket connection. Attempting reconnect except it is a SessionExpiredException. =>
> >> > > org.apache.zookeeper.ClientCnxn$SessionTimeoutException: Client session timed out, have not heard from server in 65584ms for session id 0x1000019354b021b
> >> > >        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1243)
> >> > > org.apache.zookeeper.ClientCnxn$SessionTimeoutException: Client session timed out, have not heard from server in 65584ms for session id 0x1000019354b021b
> >> > >        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1243) ~[zookeeper-3.6.2.jar:3.6.2]
> >> > > 2021-08-30 06:24:26.182 ERROR (qtp1198197478-540) [   ] o.a.s.s.PKIAuthenticationPlugin Invalid key request timestamp: 1630304577209 , received timestamp: 1630304666181 , TTL: 15000
> >> > > 2021-08-30 06:24:26.182 ERROR (qtp1198197478-531) [   ] o.a.s.s.PKIAuthenticationPlugin Invalid key request timestamp: 1630304527726 , received timestamp: 1630304600766 , TTL: 15000
> >> > > 2021-08-30 06:26:36.014 WARN  (zkConnectionManagerCallback-13-thread-1) [   ] o.a.s.c.c.ConnectionManager Watcher org.apache.solr.common.cloud.ConnectionManager@e31302e name: ZooKeeperConnection Watcher:zookeeper2.corp.equinix.com:2181,zookeeper1.corp.equinix.com:2182,zookeeper3.corp.equinix.com:2183,zookeeper4.corp.equinix.com:2184,zookeeper5.corp.equinix.com:2185 got event WatchedEvent state:Disconnected type:None path:null path: null type: None
> >> > > 2021-08-30 06:26:36.014 WARN  (zkConnectionManagerCallback-13-thread-1) [   ] o.a.s.c.c.ConnectionManager zkClient has disconnected
> >> > > 2021-08-30 07:06:32.484 WARN  (main-SendThread(zookeeper5.corp.equ.com:2185)) [   ] o.a.z.ClientCnxn Client session timed out, have not heard from server in 1851316ms for session id 0x1000019354b021b
> >> > >
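The two PKIAuthenticationPlugin errors in this log fit the same picture. Subtracting the request timestamp from the received timestamp (both epoch milliseconds) shows how old each inter-node request was when it was checked, which can be compared with the 15000 ms TTL in the log. This quick check uses only the numbers shown above; reading the delay as a GC stall is an inference, not something the log states.

```python
# (request_timestamp_ms, received_timestamp_ms) pairs copied from the ERROR lines above.
PKI_ERRORS = [
    (1630304577209, 1630304666181),
    (1630304527726, 1630304600766),
]
TTL_MS = 15000  # "TTL: 15000" in the log


def request_age_ms(request_ts: int, received_ts: int) -> int:
    """How old the signed request was when the receiving node checked it."""
    return received_ts - request_ts


for request_ts, received_ts in PKI_ERRORS:
    age = request_age_ms(request_ts, received_ts)
    print(f"age={age} ms ({age / 1000:.1f} s), over TTL by {age - TTL_MS} ms")
# age=88972 ms (89.0 s), over TTL by 73972 ms
# age=73040 ms (73.0 s), over TTL by 58040 ms
```

Both requests arrived 73 to 89 seconds late, the same order of magnitude as the 65584 ms ZooKeeper silence logged a few seconds earlier.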
> >> > >> On Sun, Aug 29, 2021 at 11:38 PM Dave <hastings.recursive@gmail.com> wrote:
> >> > >>
> >> > >> Yes. Don't set those memory restrictions; set just -Xms and -Xmx, both to 31
> >> > >> gigs. Java has problems past that line (above roughly 32 GB the JVM can no
> >> > >> longer use compressed object pointers, so object references double in size)
> >> > >> and the GC can go into a bad loop. This link explains why:
> >> > >>
> >> > >> https://community.datastax.com/questions/3661/why-is-a-32-gb-heap-allocation-not-recommended.html
> >> > >>
> >> > >> It's almost a well-kept secret.
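The arithmetic behind the 31g advice, as a sketch: a compressed object pointer is a 32-bit value scaled by the JVM's default 8-byte object alignment, so it can address at most 2^32 × 8 bytes = 32 GiB. The exact cutoff on a given JVM is somewhat lower, which is why people stop at 31g. The constant and function names below are illustrative; running `java -XX:+PrintFlagsFinal -version` with the intended heap settings and looking at `UseCompressedOops` is the authoritative check.

```python
GIB = 1024 ** 3

# A compressed oop is a 32-bit value scaled by the default 8-byte object alignment.
COMPRESSED_OOP_BITS = 32
OBJECT_ALIGNMENT_BYTES = 8
COMPRESSED_OOPS_LIMIT = (2 ** COMPRESSED_OOP_BITS) * OBJECT_ALIGNMENT_BYTES


def fits_compressed_oops(heap_bytes: int) -> bool:
    """Rough check that a heap stays below the 32 GiB compressed-oops range.

    The real cutoff on a given JVM is somewhat lower, so this is an upper bound,
    not a guarantee; confirm with -XX:+PrintFlagsFinal.
    """
    return heap_bytes < COMPRESSED_OOPS_LIMIT


print(COMPRESSED_OOPS_LIMIT // GIB)  # 32
```

A heap just over the limit can therefore hold fewer objects than a 31g heap, because every reference grows from 4 to 8 bytes.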
> >> > >>
> >> > >>>> On Aug 29, 2021, at 1:52 PM, Shawn Heisey <ap...@elyograg.org> wrote:
> >> > >>>
> >> > >>> On 8/29/2021 2:38 AM, HariBabu kuruva wrote:
> >> > >>>> Is it required to define both SOLR_HEAP and SOLR_JAVA_MEM,
> >> > >>>> or can I comment out SOLR_HEAP and define only SOLR_JAVA_MEM?
> >> > >>>> Also, what is the highest Xmx value I can go to if I get OOM errors
> >> > >>>> with 31g? Only Solr is running on that node.
> >> > >>>
> >> > >>> If both are defined, I do not know which one will actually take effect.
> >> > >>> Figuring that out would require looking at the startup script and doing
> >> > >>> some experiments to see what Java actually does.
> >> > >>>
> >> > >>> I would personally remove SOLR_JAVA_MEM and only go with SOLR_HEAP. Then
> >> > >>> you can do something very simple like the following, and the Solr startup
> >> > >>> script will set both -Xms and -Xmx java options to that value:
> >> > >>>
> >> > >>> SOLR_HEAP=4g
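The behaviour Shawn describes, one size turned into matching -Xms and -Xmx options, can be modelled as below. This is a hypothetical Python model, not the actual bin/solr logic: `parse_size` and `jvm_heap_flags` are illustrative names, the parser only understands the k/m/g suffixes used in this thread, and the 31g ceiling check encodes the advice given elsewhere in the thread.

```python
from typing import List

# Multipliers for the size suffixes used in this thread (4g, 31g, 512m, ...).
UNITS = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def parse_size(value: str) -> int:
    """Convert a JVM-style size such as '4g' or '512m' to bytes."""
    value = value.strip().lower()
    if value and value[-1] in UNITS:
        return int(value[:-1]) * UNITS[value[-1]]
    return int(value)  # no suffix: plain byte count


def jvm_heap_flags(solr_heap: str, ceiling: str = "31g") -> List[str]:
    """Hypothetical model of SOLR_HEAP=<size>: one value for both -Xms and -Xmx."""
    if parse_size(solr_heap) > parse_size(ceiling):
        raise ValueError(f"SOLR_HEAP={solr_heap} is above the {ceiling} ceiling")
    return [f"-Xms{solr_heap}", f"-Xmx{solr_heap}"]


print(jvm_heap_flags("4g"))  # ['-Xms4g', '-Xmx4g']
```

Setting both options to the same value means the heap starts at its final size instead of growing under load.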
> >> > >>>
> >> > >>>> And could you please explain why swap memory should be disabled?
> >> > >>>
> >> > >>> If a system starts actively swapping, its performance in general will be
> >> > >>> extremely low. If that happens, it is an indication that there is not
> >> > >>> enough physical memory and the system needs more, or that configurations
> >> > >>> need to be adjusted to require less memory.
> >> > >>>
> >> > >>> Disabling swap makes it impossible for the OS to try and use disk space
> >> > >>> as memory. In situations where programs are asking for too much memory and
> >> > >>> you have swap completely disabled, either Java or the OS will simply kill
> >> > >>> the process that's asking for too much memory, rather than letting it run
> >> > >>> and destroy overall performance.
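To confirm that swap really is off on a Linux node, the `SwapTotal` line of `/proc/meminfo` can be checked (`swapon --show` and `free` report the same information). This is a minimal sketch, assuming Linux and the usual `Name:   value kB` layout of that file; the function names are illustrative.

```python
from typing import Dict


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo content into {field: value} (values are mostly in kB)."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts:
            fields[name.strip()] = int(parts[0])
    return fields


def swap_enabled(meminfo_text: str) -> bool:
    """True if the kernel reports any configured swap space."""
    return parse_meminfo(meminfo_text).get("SwapTotal", 0) > 0
```

On a live node, `swap_enabled(Path("/proc/meminfo").read_text())` (with `from pathlib import Path`) runs the check against the real file.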
> >> > >>>
> >> > >>> ---
> >> > >>>
> >> > >>> Responding to something in the OP:
> >> > >>>
> >> > >>> It is completely normal to see 100 percent memory utilization on just
> >> > >>> about any server, whether it's running Solr or not. The OS will use all
> >> > >>> available memory for caching purposes, to speed everything up. The only
> >> > >>> time you won't see 100 percent memory usage is when you have far more
> >> > >>> memory than the system actually needs, for instance 512GB of memory on
> >> > >>> a system that only handles megabytes of data.
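Shawn's point can be checked on Linux: much of the memory that monitoring reports as "used" is page cache, which the kernel hands back when a program needs it. This is a minimal sketch, assuming Linux's `/proc/meminfo` field names (`MemTotal`, `MemFree`, `Cached`, `MemAvailable`); the function names and the numbers in the test are hypothetical.

```python
from typing import Dict


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo content into {field: value} (values are mostly in kB)."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if sep and rest.split():
            fields[name.strip()] = int(rest.split()[0])
    return fields


def memory_breakdown(meminfo_text: str) -> Dict[str, float]:
    """Express free memory, page cache and available memory as % of MemTotal."""
    info = parse_meminfo(meminfo_text)
    total = info["MemTotal"]
    return {
        "free_pct": 100.0 * info["MemFree"] / total,
        "cached_pct": 100.0 * info["Cached"] / total,
        # MemAvailable already counts reclaimable cache, so it is the better
        # estimate of how much memory programs could still obtain.
        "available_pct": 100.0 * info["MemAvailable"] / total,
    }
```

A machine where `free_pct` is near zero but `available_pct` is high is in the normal state Shawn describes; a low `available_pct` is what actually signals memory pressure.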
> >> > >>>
> >> > >>>
> >> > >>> https://cwiki.apache.org/confluence/display/solr/SolrPerformanceProblems
> >> > >>>
> >> > >>> (disclaimer: I wrote the wiki page linked here. Any errors are mine.)
> >> > >>>
> >> > >>> Thanks,
> >> > >>> Shawn
> >> > >>
> >> > >
> >> > >
> >> > > --
> >> > >
> >> > > Thanks and Regards,
> >> > > Hari
> >> > > Mobile:9790756568