Posted to user@ignite.apache.org by mvkarp <li...@hotmail.com> on 2019/11/01 00:24:59 UTC

Re: recoveryBallotBoxes in MvccProcessorImpl memory leak?

<http://apache-ignite-users.70518.x6.nabble.com/file/t2658/heapanalysisMAT.jpg> 

I've attached an Eclipse MAT heap analysis. As you can see, MVCC is disabled
(there are no TRANSACTIONAL_SNAPSHOT caches in the cluster).



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/

Re: recoveryBallotBoxes in MvccProcessorImpl memory leak?

Posted by Ivan Pavlukhin <vo...@gmail.com>.
Hi,

Sounds like a bug. It would be great to have a ticket with a reproducer.

Fri, Nov 1, 2019 at 03:25, mvkarp <li...@hotmail.com>:
>
> <http://apache-ignite-users.70518.x6.nabble.com/file/t2658/heapanalysisMAT.jpg>
>
> I've attached an Eclipse MAT heap analysis. As you can see, MVCC is disabled
> (there are no TRANSACTIONAL_SNAPSHOT caches in the cluster).
>
>
>
> --
> Sent from: http://apache-ignite-users.70518.x6.nabble.com/



-- 
Best regards,
Ivan Pavlukhin

Re: recoveryBallotBoxes in MvccProcessorImpl memory leak?

Posted by Ivan Pavlukhin <vo...@gmail.com>.
Hi,

Thank you for digging deeper! I have no good ideas about problems in
curCrd.local().

Have you tried to reproduce the leak by starting and stopping a huge
number of client nodes?
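
For example, something along these lines (a minimal churn sketch; the
instance names and iteration count are arbitrary, and it assumes a server
node is already running and reachable via default discovery):

import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;

public class ClientChurnReproducer {
    public static void main(String[] args) throws Exception {
        for (int i = 0; i < 100_000; i++) {
            IgniteConfiguration cfg = new IgniteConfiguration()
                .setIgniteInstanceName("churn-client-" + i)
                .setClientMode(true);

            // Join and immediately leave: every NODE_LEFT event should
            // trigger the "recovery finished" voting on the server side.
            try (Ignite client = Ignition.start(cfg)) {
                // No-op: the join/leave cycle itself is the test.
            }
        }
    }
}

Watching the size of MvccProcessorImpl.recoveryBallotBoxes on the surviving
server during such a run should show whether the map grows without bound.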

Sun, Nov 17, 2019 at 13:16, mvkarp <li...@hotmail.com>:
>
> The only other thing I can think of, if it's through onDiscovery(), is
> that curCrd.local() is somehow returning true. However, I am unable to
> find exactly how local() is determined, since there appears to be a big
> chain.
>
> I know that the node UUID on the leaking server is on a different physical
> node and has a completely different node ID
> (bbbbb-bbbbb-bbbbb-bbbbb-bbbbb-bbbbb) from the MVCC coordinator's
> (mvccCrd=aaaaa-aaaa-aaaaa-aaaaa-aaaaa).
>
> Is there any way that curCrd.local() could be returning true on the
> leaking server JVM? I am trying to investigate how local() is determined
> and what could cause it to be true.
>
>
> Ivan Pavlukhin wrote
> > But currently I suspect that you faced a leak in
> > MvccProcessorImpl.onDiscovery on non-MVCC-coordinator nodes. Do you
> > think that there is another reason in your case?
>
>
>
>
>
> --
> Sent from: http://apache-ignite-users.70518.x6.nabble.com/



-- 
Best regards,
Ivan Pavlukhin

Re: recoveryBallotBoxes in MvccProcessorImpl memory leak?

Posted by mvkarp <li...@hotmail.com>.
The only other thing I can think of, if it's through onDiscovery(), is
that curCrd.local() is somehow returning true. However, I am unable to
find exactly how local() is determined, since there appears to be a big
chain.

I know that the node UUID on the leaking server is on a different physical
node and has a completely different node ID
(bbbbb-bbbbb-bbbbb-bbbbb-bbbbb-bbbbb) from the MVCC coordinator's
(mvccCrd=aaaaa-aaaa-aaaaa-aaaaa-aaaaa).

Is there any way that curCrd.local() could be returning true on the
leaking server JVM? I am trying to investigate how local() is determined
and what could cause it to be true.
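
For what it's worth, a reflection probe along these lines could show what a
given node believes (a sketch; curCrd and recoveryBallotBoxes are internal
field names taken from the sources discussed in this thread, so this is
version-dependent and unsupported):

import java.lang.reflect.Field;
import java.util.Map;

import org.apache.ignite.Ignite;
import org.apache.ignite.internal.IgniteKernal;

public class MvccCoordinatorProbe {
    /** Dumps the node's view of the MVCC coordinator and the ballot boxes. */
    static void dump(Ignite ignite) throws Exception {
        // MvccProcessorImpl instance of this node.
        Object proc = ((IgniteKernal)ignite).context().coordinators();

        Field crdField = proc.getClass().getDeclaredField("curCrd");
        crdField.setAccessible(true);
        Object curCrd = crdField.get(proc); // toString() includes nodeId

        Field boxesField = proc.getClass().getDeclaredField("recoveryBallotBoxes");
        boxesField.setAccessible(true);
        Map<?, ?> boxes = (Map<?, ?>)boxesField.get(proc);

        System.out.println("locNode=" + ignite.cluster().localNode().id()
            + ", curCrd=" + curCrd
            + ", recoveryBallotBoxes.size=" + boxes.size());
    }
}

Comparing locNode with the nodeId printed for curCrd would show directly
whether the leaking JVM considers the coordinator local.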


Ivan Pavlukhin wrote
> But currently I suspect that you faced a leak in
> MvccProcessorImpl.onDiscovery on non-MVCC-coordinator nodes. Do you
> think that there is another reason in your case?





--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/

Re: recoveryBallotBoxes in MvccProcessorImpl memory leak?

Posted by Ivan Pavlukhin <vo...@gmail.com>.
> But the population might happen through processRecoveryFinishedMessage(), which does not do any isLocal() check and goes straight to processing the message, since waitForCoordinatorInit() on a MvccRecoveryFinishedMessage always returns false?

All nodes send messages intended for handling in
processRecoveryFinishedMessage() to the MVCC coordinator node. So we
assume that only the coordinator receives such messages. If you are
interested, you can find the other side of the code in
IgniteTxManager.NodeFailureTimeoutObject, where the "recovery finished"
messages are issued.

But currently I suspect that you faced a leak in
MvccProcessorImpl.onDiscovery on non-MVCC-coordinator nodes. Do you
think that there is another reason in your case?

Sat, Nov 16, 2019 at 12:30, mvkarp <li...@hotmail.com>:
>
> 1 & 2. Actually, looking at both the 2.7.5 release and the current
> master, it is the 'pickMvccCoordinator' function that returns the
> coordinator (this is the same function that selects a node that is not a
> client and has Ignite version >= 2.7). curCrd is then assigned the return
> value of pickMvccCoordinator, which becomes the active MVCC coordinator.
> So it looks like it does become active, but I am not sure of the effect
> of that yet.
>
> 3. Assuming it is then active, it looks like there are two entry points
> into recoveryBallotBoxes: through onDiscovery() and via
> processRecoveryFinishedMessage().
>
> Is it possible that onDiscovery() does not populate recoveryBallotBoxes
> because of the curCrd0.local() check, so processing will only be done if
> the MVCC coordinator is local? Thus a node that actually is the MVCC
> coordinator will clear out recoveryBallotBoxes (which is the explicit
> check that you mentioned).
>
> But the population might happen through processRecoveryFinishedMessage(),
> which does not do any isLocal() check and goes straight to processing the
> message, since waitForCoordinatorInit() on a MvccRecoveryFinishedMessage
> always returns false?
>
>
> Ivan Pavlukhin wrote
> > 1. The MVCC coordinator should not do anything when there are no MVCC
> > caches; actually, it should not be active in such a case. Basically, the
> > MVCC coordinator is needed to have a consistent order between
> > transactions.
> > 2. In 2.7.5 an "assigned coordinator" is always selected, but that does
> > not mean that it is active. The MvccProcessorImpl.curCrd variable
> > corresponds to the active MVCC coordinator.
> > 3. If that statement is true, then it should be rather easy to
> > reproduce the problem by starting and stopping client nodes frequently.
> > recoveryBallotBoxes was not assumed to be populated on nodes other than
> > the MVCC coordinator. If that happens, then we have found a bug.
> > Actually, the code in master is different and has an explicit check
> > that recoveryBallotBoxes is populated only on the MVCC coordinator.
> >
> > Thu, Nov 14, 2019 at 15:42, mvkarp <liquid_ninja2k@hotmail.com>:
> >>
> >> Hi, after investigating I have a few questions regarding this issue.
> >>
> >> 1. Lacking knowledge of what the MVCC coordinator is used for, are you
> >> able to shed some light on its role and purpose? What does the MVCC
> >> coordinator do, and why is one selected? Should an MVCC coordinator be
> >> selected regardless of MVCC being disabled? (i.e. is it used for any
> >> other base features, and is this just the way Ignite is meant to work?)
> >>
> >> 2. Following on from this, after looking at the code of the
> >> MvccProcessorImpl.java class on the Ignite 2.7.5 GitHub, it looks like
> >> an MVCC coordinator is ALWAYS selected, assigning one of the server
> >> nodes as the MVCC coordinator, regardless of whether there is a
> >> TRANSACTIONAL_SNAPSHOT cache (mvccEnabled can be false but an MVCC
> >> coordinator is still selected).
> >>
> >> https://github.com/apache/ignite/blob/ignite-2.7.5/modules/core/src/main/java/org/apache/ignite/internal/processors/cache/mvcc/MvccProcessorImpl.java
> >>
> >> On line 861, in the assignMvccCoordinator method, it loops through all
> >> nodes in the cluster with only these two conditions:
> >>
> >> *if (!node.isClient() && supportsMvcc(node))*
> >>
> >> It only checks that the node is not a client and that it supportsMvcc
> >> (which is true for all versions > 2.7). It does not check mvccEnabled
> >> at all.
> >>
> >>
> >> Can you confirm that the above is intentional/expected, or whether
> >> there is another piece of code I am missing?
> >>
> >>
> >> 3. As extra information, the node that happens to be selected as MVCC
> >> coordinator does not get the leak, but every other client/server gets
> >> the leak.
> >>
> >>
> >>
> >> Ivan Pavlukhin wrote
> >> > Hi,
> >> >
> >> > I suspect the following here. Some node treats itself as the MVCC
> >> > coordinator and creates a new RecoveryBallotBox each time a client
> >> > node leaves. Some (maybe all) other nodes think that MVCC is disabled
> >> > and do not send a vote (expected for the aforementioned ballot box)
> >> > to the MVCC coordinator. Consequently, a memory leak.
> >> >
> >> > The following could be done:
> >> > 1. Figure out why some node treats itself as MVCC coordinator while
> >> > others think that MVCC is disabled.
> >> > 2. Try to introduce some defensive measures in Ignite code to protect
> >> > from the leak in a long-running cluster.
> >> >
> >> > As a last-chance workaround I can suggest writing custom code which
> >> > cleans the recoveryBallotBoxes map from time to time (most likely
> >> > using reflection).
> >> >
> >> > Mon, Nov 11, 2019 at 08:53, mvkarp <liquid_ninja2k@hotmail.com>:
> >> >>
> >> >> We frequently stop and start clients in short-lived client JVM
> >> >> processes as required for our purposes; this seems to lead to a huge
> >> >> number of PMEs (but no rebalancing) and topology changes
> >> >> (topVer=300,000+).
> >> >>
> >> >> Still cannot figure out why this map won't clear (there are no
> >> >> exceptions or errors at all in the entire log).
> >> >>
> >> >>
> >> >>
> >> >> --
> >> >> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
> >> >
> >> >
> >> >
> >> > --
> >> > Best regards,
> >> > Ivan Pavlukhin
> >>
> >>
> >>
> >>
> >>
> >> --
> >> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
> >
> >
> >
> > --
> > Best regards,
> > Ivan Pavlukhin
>
>
>
>
>
> --
> Sent from: http://apache-ignite-users.70518.x6.nabble.com/



-- 
Best regards,
Ivan Pavlukhin

Re: recoveryBallotBoxes in MvccProcessorImpl memory leak?

Posted by mvkarp <li...@hotmail.com>.
1 & 2. Actually, looking at both the 2.7.5 release and the current master,
it is the 'pickMvccCoordinator' function that returns the coordinator
(this is the same function that selects a node that is not a client and
has Ignite version >= 2.7). curCrd is then assigned the return value of
pickMvccCoordinator, which becomes the active MVCC coordinator. So it
looks like it does become active, but I am not sure of the effect of that
yet.

3. Assuming it is then active, it looks like there are two entry points
into recoveryBallotBoxes: through onDiscovery() and via
processRecoveryFinishedMessage().

Is it possible that onDiscovery() does not populate recoveryBallotBoxes
because of the curCrd0.local() check, so processing will only be done if
the MVCC coordinator is local? Thus a node that actually is the MVCC
coordinator will clear out recoveryBallotBoxes (which is the explicit
check that you mentioned).

But the population might happen through processRecoveryFinishedMessage(),
which does not do any isLocal() check and goes straight to processing the
message, since waitForCoordinatorInit() on a MvccRecoveryFinishedMessage
always returns false?
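
To make that hypothesis concrete, here is a self-contained toy model of it
(an illustration only, not Ignite code; the names merely mirror the ones
discussed in this thread): the discovery path is guarded by the
coordinator-is-local check, the message path is not, so a node that is NOT
the coordinator accumulates ballot boxes it never clears.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class BallotBoxLeakModel {
    final Map<Long, Object> recoveryBallotBoxes = new ConcurrentHashMap<>();
    final boolean crdIsLocal;

    BallotBoxLeakModel(boolean crdIsLocal) { this.crdIsLocal = crdIsLocal; }

    /** Guarded path: only the local coordinator completes and removes. */
    void onDiscovery(long topVer) {
        if (crdIsLocal)
            recoveryBallotBoxes.remove(topVer);
    }

    /** Unguarded path: any receiver registers a ballot box. */
    void processRecoveryFinishedMessage(long topVer) {
        recoveryBallotBoxes.computeIfAbsent(topVer, v -> new Object());
    }

    public static void main(String[] args) {
        BallotBoxLeakModel nonCrd = new BallotBoxLeakModel(false);

        for (long topVer = 1; topVer <= 300_000; topVer++) { // client churn
            nonCrd.processRecoveryFinishedMessage(topVer);
            nonCrd.onDiscovery(topVer); // no-op: the coordinator is remote
        }

        System.out.println("leaked ballot boxes: "
            + nonCrd.recoveryBallotBoxes.size()); // 300000
    }
}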


Ivan Pavlukhin wrote
> 1. The MVCC coordinator should not do anything when there are no MVCC
> caches; actually, it should not be active in such a case. Basically, the
> MVCC coordinator is needed to have a consistent order between
> transactions.
> 2. In 2.7.5 an "assigned coordinator" is always selected, but that does
> not mean that it is active. The MvccProcessorImpl.curCrd variable
> corresponds to the active MVCC coordinator.
> 3. If that statement is true, then it should be rather easy to
> reproduce the problem by starting and stopping client nodes frequently.
> recoveryBallotBoxes was not assumed to be populated on nodes other than
> the MVCC coordinator. If that happens, then we have found a bug.
> Actually, the code in master is different and has an explicit check
> that recoveryBallotBoxes is populated only on the MVCC coordinator.
> 
> Thu, Nov 14, 2019 at 15:42, mvkarp <liquid_ninja2k@hotmail.com>:
>>
>> Hi, after investigating I have a few questions regarding this issue.
>>
>> 1. Lacking knowledge of what the MVCC coordinator is used for, are you
>> able to shed some light on its role and purpose? What does the MVCC
>> coordinator do, and why is one selected? Should an MVCC coordinator be
>> selected regardless of MVCC being disabled? (i.e. is it used for any
>> other base features, and is this just the way Ignite is meant to work?)
>>
>> 2. Following on from this, after looking at the code of the
>> MvccProcessorImpl.java class on the Ignite 2.7.5 GitHub, it looks like
>> an MVCC coordinator is ALWAYS selected, assigning one of the server
>> nodes as the MVCC coordinator, regardless of whether there is a
>> TRANSACTIONAL_SNAPSHOT cache (mvccEnabled can be false but an MVCC
>> coordinator is still selected).
>>
>> https://github.com/apache/ignite/blob/ignite-2.7.5/modules/core/src/main/java/org/apache/ignite/internal/processors/cache/mvcc/MvccProcessorImpl.java
>>
>> On line 861, in the assignMvccCoordinator method, it loops through all
>> nodes in the cluster with only these two conditions:
>>
>> *if (!node.isClient() && supportsMvcc(node))*
>>
>> It only checks that the node is not a client and that it supportsMvcc
>> (which is true for all versions > 2.7). It does not check mvccEnabled
>> at all.
>>
>>
>> Can you confirm that the above is intentional/expected, or whether
>> there is another piece of code I am missing?
>>
>>
>> 3. As extra information, the node that happens to be selected as MVCC
>> coordinator does not get the leak, but every other client/server gets
>> the leak.
>>
>>
>>
>> Ivan Pavlukhin wrote
>> > Hi,
>> >
>> > I suspect the following here. Some node treats itself as the MVCC
>> > coordinator and creates a new RecoveryBallotBox each time a client node
>> > leaves. Some (maybe all) other nodes think that MVCC is disabled and
>> > do not send a vote (expected for the aforementioned ballot box) to the
>> > MVCC coordinator. Consequently, a memory leak.
>> >
>> > The following could be done:
>> > 1. Figure out why some node treats itself as MVCC coordinator while
>> > others think that MVCC is disabled.
>> > 2. Try to introduce some defensive measures in Ignite code to protect
>> > from the leak in a long-running cluster.
>> >
>> > As a last-chance workaround I can suggest writing custom code which
>> > cleans the recoveryBallotBoxes map from time to time (most likely
>> > using reflection).
>> >
>> > Mon, Nov 11, 2019 at 08:53, mvkarp <liquid_ninja2k@hotmail.com>:
>> >>
>> >> We frequently stop and start clients in short-lived client JVM
>> >> processes as required for our purposes; this seems to lead to a huge
>> >> number of PMEs (but no rebalancing) and topology changes
>> >> (topVer=300,000+).
>> >>
>> >> Still cannot figure out why this map won't clear (there are no
>> >> exceptions or errors at all in the entire log).
>> >>
>> >>
>> >>
>> >> --
>> >> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>> >
>> >
>> >
>> > --
>> > Best regards,
>> > Ivan Pavlukhin
>>
>>
>>
>>
>>
>> --
>> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
> 
> 
> 
> -- 
> Best regards,
> Ivan Pavlukhin





--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/

Re: recoveryBallotBoxes in MvccProcessorImpl memory leak?

Posted by Ivan Pavlukhin <vo...@gmail.com>.
1. The MVCC coordinator should not do anything when there are no MVCC
caches; actually, it should not be active in such a case. Basically, the
MVCC coordinator is needed to have a consistent order between transactions.
2. In 2.7.5 an "assigned coordinator" is always selected, but that does not
mean that it is active. The MvccProcessorImpl.curCrd variable corresponds
to the active MVCC coordinator.
3. If that statement is true, then it should be rather easy to reproduce
the problem by starting and stopping client nodes frequently.
recoveryBallotBoxes was not assumed to be populated on nodes other than the
MVCC coordinator. If that happens, then we have found a bug. Actually, the
code in master is different and has an explicit check that
recoveryBallotBoxes is populated only on the MVCC coordinator.

Thu, Nov 14, 2019 at 15:42, mvkarp <li...@hotmail.com>:
>
> Hi, after investigating I have a few questions regarding this issue.
>
> 1. Lacking knowledge of what the MVCC coordinator is used for, are you
> able to shed some light on its role and purpose? What does the MVCC
> coordinator do, and why is one selected? Should an MVCC coordinator be
> selected regardless of MVCC being disabled? (i.e. is it used for any
> other base features, and is this just the way Ignite is meant to work?)
>
> 2. Following on from this, after looking at the code of the
> MvccProcessorImpl.java class on the Ignite 2.7.5 GitHub, it looks like an
> MVCC coordinator is ALWAYS selected, assigning one of the server nodes as
> the MVCC coordinator, regardless of whether there is a
> TRANSACTIONAL_SNAPSHOT cache (mvccEnabled can be false but an MVCC
> coordinator is still selected).
>
> https://github.com/apache/ignite/blob/ignite-2.7.5/modules/core/src/main/java/org/apache/ignite/internal/processors/cache/mvcc/MvccProcessorImpl.java
>
> On line 861, in the assignMvccCoordinator method, it loops through all
> nodes in the cluster with only these two conditions:
>
> *if (!node.isClient() && supportsMvcc(node))*
>
> It only checks that the node is not a client and that it supportsMvcc
> (which is true for all versions > 2.7). It does not check mvccEnabled at
> all.
>
>
> Can you confirm that the above is intentional/expected, or whether there
> is another piece of code I am missing?
>
>
> 3. As extra information, the node that happens to be selected as MVCC
> coordinator does not get the leak, but every other client/server gets
> the leak.
>
>
>
> Ivan Pavlukhin wrote
> > Hi,
> >
> > I suspect the following here. Some node treats itself as the MVCC
> > coordinator and creates a new RecoveryBallotBox each time a client node
> > leaves. Some (maybe all) other nodes think that MVCC is disabled and
> > do not send a vote (expected for the aforementioned ballot box) to the
> > MVCC coordinator. Consequently, a memory leak.
> >
> > The following could be done:
> > 1. Figure out why some node treats itself as MVCC coordinator while
> > others think that MVCC is disabled.
> > 2. Try to introduce some defensive measures in Ignite code to protect
> > from the leak in a long-running cluster.
> >
> > As a last-chance workaround I can suggest writing custom code which
> > cleans the recoveryBallotBoxes map from time to time (most likely using
> > reflection).
> >
> > Mon, Nov 11, 2019 at 08:53, mvkarp <liquid_ninja2k@hotmail.com>:
> >>
> >> We frequently stop and start clients in short-lived client JVM
> >> processes as required for our purposes; this seems to lead to a huge
> >> number of PMEs (but no rebalancing) and topology changes
> >> (topVer=300,000+).
> >>
> >> Still cannot figure out why this map won't clear (there are no
> >> exceptions or errors at all in the entire log).
> >>
> >>
> >>
> >> --
> >> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
> >
> >
> >
> > --
> > Best regards,
> > Ivan Pavlukhin
>
>
>
>
>
> --
> Sent from: http://apache-ignite-users.70518.x6.nabble.com/



-- 
Best regards,
Ivan Pavlukhin

Re: recoveryBallotBoxes in MvccProcessorImpl memory leak?

Posted by mvkarp <li...@hotmail.com>.
Hi, after investigating I have a few questions regarding this issue.

1. Lacking knowledge of what the MVCC coordinator is used for, are you
able to shed some light on its role and purpose? What does the MVCC
coordinator do, and why is one selected? Should an MVCC coordinator be
selected regardless of MVCC being disabled? (i.e. is it used for any other
base features, and is this just the way Ignite is meant to work?)

2. Following on from this, after looking at the code of the
MvccProcessorImpl.java class on the Ignite 2.7.5 GitHub, it looks like an
MVCC coordinator is ALWAYS selected, assigning one of the server nodes as
the MVCC coordinator, regardless of whether there is a
TRANSACTIONAL_SNAPSHOT cache (mvccEnabled can be false but an MVCC
coordinator is still selected).

https://github.com/apache/ignite/blob/ignite-2.7.5/modules/core/src/main/java/org/apache/ignite/internal/processors/cache/mvcc/MvccProcessorImpl.java

On line 861, in the assignMvccCoordinator method, it loops through all
nodes in the cluster with only these two conditions:

*if (!node.isClient() && supportsMvcc(node))*

It only checks that the node is not a client and that it supportsMvcc
(which is true for all versions > 2.7). It does not check mvccEnabled at
all.
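
As a runnable paraphrase of that selection (a sketch of my reading, not the
literal 2.7.5 source; supportsMvcc below is a stand-in for the internal
version check):

import java.util.Collection;

import org.apache.ignite.cluster.ClusterNode;
import org.apache.ignite.lang.IgniteProductVersion;

public class CoordinatorSelectionSketch {
    /** Stand-in for the internal check; MVCC support appeared in 2.7. */
    static boolean supportsMvcc(ClusterNode node) {
        return node.version().compareTo(IgniteProductVersion.fromString("2.7.0")) >= 0;
    }

    /** Picks the first eligible server node; mvccEnabled is never consulted. */
    static ClusterNode assignMvccCoordinator(Collection<ClusterNode> nodes) {
        for (ClusterNode node : nodes) {
            if (!node.isClient() && supportsMvcc(node))
                return node;
        }

        return null;
    }
}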


Can you confirm that the above is intentional/expected, or whether there is
another piece of code I am missing?


3. As extra information, the node that happens to be selected as MVCC
coordinator does not get the leak, but every other client/server gets the
leak.



Ivan Pavlukhin wrote
> Hi,
> 
> I suspect the following here. Some node treats itself as the MVCC
> coordinator and creates a new RecoveryBallotBox each time a client node
> leaves. Some (maybe all) other nodes think that MVCC is disabled and do
> not send a vote (expected for the aforementioned ballot box) to the MVCC
> coordinator. Consequently, a memory leak.
>
> The following could be done:
> 1. Figure out why some node treats itself as MVCC coordinator while
> others think that MVCC is disabled.
> 2. Try to introduce some defensive measures in Ignite code to protect
> from the leak in a long-running cluster.
>
> As a last-chance workaround I can suggest writing custom code which
> cleans the recoveryBallotBoxes map from time to time (most likely using
> reflection).
> 
> Mon, Nov 11, 2019 at 08:53, mvkarp <liquid_ninja2k@hotmail.com>:
>>
>> We frequently stop and start clients in short-lived client JVM
>> processes as required for our purposes; this seems to lead to a huge
>> number of PMEs (but no rebalancing) and topology changes
>> (topVer=300,000+).
>>
>> Still cannot figure out why this map won't clear (there are no
>> exceptions or errors at all in the entire log).
>>
>>
>>
>> --
>> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
> 
> 
> 
> -- 
> Best regards,
> Ivan Pavlukhin





--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/

Re: recoveryBallotBoxes in MvccProcessorImpl memory leak?

Posted by Ivan Pavlukhin <vo...@gmail.com>.
Hi,

My first thought is deploying a service [1] (either dynamically via
Ignite.services().deploy() or statically via
IgniteConfiguration.setServiceConfiguration()) that clears the problematic
map periodically.

[1] https://apacheignite.readme.io/docs/service-grid
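
A minimal sketch of such a service (the recoveryBallotBoxes field name
comes from this thread; it is an Ignite internal, so treat this as a
version-dependent stopgap, not a supported API):

import java.lang.reflect.Field;
import java.util.Map;

import org.apache.ignite.Ignite;
import org.apache.ignite.internal.IgniteKernal;
import org.apache.ignite.resources.IgniteInstanceResource;
import org.apache.ignite.services.Service;
import org.apache.ignite.services.ServiceContext;

public class BallotBoxCleaner implements Service {
    @IgniteInstanceResource
    private Ignite ignite;

    @Override public void init(ServiceContext ctx) {
        // Nothing to initialize.
    }

    @Override public void execute(ServiceContext ctx) throws Exception {
        while (!ctx.isCancelled()) {
            // Reach into MvccProcessorImpl and clear the leaking map.
            Object proc = ((IgniteKernal)ignite).context().coordinators();

            Field f = proc.getClass().getDeclaredField("recoveryBallotBoxes");
            f.setAccessible(true);
            ((Map<?, ?>)f.get(proc)).clear();

            Thread.sleep(10 * 60_000); // repeat every 10 minutes
        }
    }

    @Override public void cancel(ServiceContext ctx) {
        // The execute() thread is interrupted on cancel; nothing extra to do.
    }
}

Deployed per server node it would look like
ignite.services().deployNodeSingleton("ballotBoxCleaner", new BallotBoxCleaner());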

Mon, Nov 11, 2019 at 13:20, mvkarp <li...@hotmail.com>:
>
> Hi,
>
> Would you have any suggestion on how to implement a last-chance workaround
> for this issue for the server JVM?
>
>
> Ivan Pavlukhin wrote
> > Hi,
> >
> > I suspect the following here. Some node treats itself as the MVCC
> > coordinator and creates a new RecoveryBallotBox each time a client node
> > leaves. Some (maybe all) other nodes think that MVCC is disabled and
> > do not send a vote (expected for the aforementioned ballot box) to the
> > MVCC coordinator. Consequently, a memory leak.
> >
> > The following could be done:
> > 1. Figure out why some node treats itself as MVCC coordinator while
> > others think that MVCC is disabled.
> > 2. Try to introduce some defensive measures in Ignite code to protect
> > from the leak in a long-running cluster.
> >
> > As a last-chance workaround I can suggest writing custom code which
> > cleans the recoveryBallotBoxes map from time to time (most likely using
> > reflection).
> >
> > Mon, Nov 11, 2019 at 08:53, mvkarp <liquid_ninja2k@hotmail.com>:
> >>
> >> We frequently stop and start clients in short-lived client JVM
> >> processes as required for our purposes; this seems to lead to a huge
> >> number of PMEs (but no rebalancing) and topology changes
> >> (topVer=300,000+).
> >>
> >> Still cannot figure out why this map won't clear (there are no
> >> exceptions or errors at all in the entire log).
> >>
> >>
> >>
> >> --
> >> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
> >
> >
> >
> > --
> > Best regards,
> > Ivan Pavlukhin
>
>
>
>
>
> --
> Sent from: http://apache-ignite-users.70518.x6.nabble.com/



-- 
Best regards,
Ivan Pavlukhin

Re: recoveryBallotBoxes in MvccProcessorImpl memory leak?

Posted by mvkarp <li...@hotmail.com>.
Hi,

Would you have any suggestion on how to implement a last-chance workaround
for this issue for the server JVM?


Ivan Pavlukhin wrote
> Hi,
> 
> I suspect the following here. Some node treats itself as the MVCC
> coordinator and creates a new RecoveryBallotBox each time a client node
> leaves. Some (maybe all) other nodes think that MVCC is disabled and do
> not send a vote (expected for the aforementioned ballot box) to the MVCC
> coordinator. Consequently, a memory leak.
>
> The following could be done:
> 1. Figure out why some node treats itself as MVCC coordinator while
> others think that MVCC is disabled.
> 2. Try to introduce some defensive measures in Ignite code to protect
> from the leak in a long-running cluster.
>
> As a last-chance workaround I can suggest writing custom code which
> cleans the recoveryBallotBoxes map from time to time (most likely using
> reflection).
> 
> Mon, Nov 11, 2019 at 08:53, mvkarp <liquid_ninja2k@hotmail.com>:
>>
>> We frequently stop and start clients in short-lived client JVM
>> processes as required for our purposes; this seems to lead to a huge
>> number of PMEs (but no rebalancing) and topology changes
>> (topVer=300,000+).
>>
>> Still cannot figure out why this map won't clear (there are no
>> exceptions or errors at all in the entire log).
>>
>>
>>
>> --
>> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
> 
> 
> 
> -- 
> Best regards,
> Ivan Pavlukhin





--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/

Re: recoveryBallotBoxes in MvccProcessorImpl memory leak?

Posted by Ivan Pavlukhin <vo...@gmail.com>.
Hi,

I suspect the following here. Some node treats itself as the MVCC
coordinator and creates a new RecoveryBallotBox each time a client node
leaves. Some (maybe all) other nodes think that MVCC is disabled and do
not send a vote (expected for the aforementioned ballot box) to the MVCC
coordinator. Consequently, a memory leak.

The following could be done:
1. Figure out why some node treats itself as MVCC coordinator while
others think that MVCC is disabled.
2. Try to introduce some defensive measures in Ignite code to protect
from the leak in a long-running cluster.

As a last-chance workaround I can suggest writing custom code which
cleans the recoveryBallotBoxes map from time to time (most likely using
reflection).

Mon, Nov 11, 2019 at 08:53, mvkarp <li...@hotmail.com>:
>
> We frequently stop and start clients in short-lived client JVM processes
> as required for our purposes; this seems to lead to a huge number of PMEs
> (but no rebalancing) and topology changes (topVer=300,000+).
>
> Still cannot figure out why this map won't clear (there are no exceptions
> or errors at all in the entire log).
>
>
>
> --
> Sent from: http://apache-ignite-users.70518.x6.nabble.com/



-- 
Best regards,
Ivan Pavlukhin

Re: recoveryBallotBoxes in MvccProcessorImpl memory leak?

Posted by mvkarp <li...@hotmail.com>.
We frequently stop and start clients in short-lived client JVM processes
as required for our purposes; this seems to lead to a huge number of PMEs
(but no rebalancing) and topology changes (topVer=300,000+).

Still cannot figure out why this map won't clear (there are no exceptions
or errors at all in the entire log).



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/

Re: recoveryBallotBoxes in MvccProcessorImpl memory leak?

Posted by mvkarp <li...@hotmail.com>.
Hi,

There are no further exceptions or errors in the logs, only hundreds of
thousands of entries like the ones below; heap usage is still increasing
steeply and the leak is still present.

[13:46:17,632][INFO][disco-event-worker-#102][GridDiscoveryManager] Topology
snapshot [ver=366003, locNode=6a9db3c2, servers=2, clients=17, state=ACTIVE,
CPUs=64, offheap=960.0GB, heap=46.0GB]
[13:46:17,632][INFO][disco-event-worker-#102][GridDiscoveryManager]   ^--
Baseline [id=0, size=2, online=0, offline=2]
[13:46:17,683][INFO][exchange-worker-#103][time] Started exchange init
[topVer=AffinityTopologyVersion [topVer=366003, minorTopVer=0],
mvccCrd=MvccCoordinator [nodeId=99624746-b624-49d6-9e36-bb6d648e9c3b,
crdVer=1571956920778, topVer=AffinityTopologyVersion [topVer=315751,
minorTopVer=0]], mvccCrdChange=false, crd=false, evt=NODE_LEFT,
evtNode=824dca07-a847-4fd7-81a5-ac0aa8644b26, customEvt=null,
allowMerge=true]
[13:46:17,685][INFO][exchange-worker-#103][GridDhtPartitionsExchangeFuture]
Finish exchange future [startVer=AffinityTopologyVersion [topVer=366003,
minorTopVer=0], resVer=AffinityTopologyVersion [topVer=366003,
minorTopVer=0], err=null]
[13:46:17,708][INFO][exchange-worker-#103][GridDhtPartitionsExchangeFuture]
Completed partition exchange
[localNode=6a9db3c2-08df-4bc2-8a26-13df50b86207,
exchange=GridDhtPartitionsExchangeFuture [topVer=AffinityTopologyVersion
[topVer=366003, minorTopVer=0], evt=NODE_LEFT, evtNode=TcpDiscoveryNode
[id=824dca07-a847-4fd7-81a5-ac0aa8644b26, addrs=[10.16.1.47, 127.0.0.1],
sockAddrs=[/127.0.0.1:0, xxxxx.com.au/10.16.1.47:0], discPort=0,
order=365983, intOrder=183032, lastExchangeTime=1573393534700, loc=false,
ver=2.7.5#20190603-sha1:be4f2a15, isClient=true], done=true],
topVer=AffinityTopologyVersion [topVer=366003, minorTopVer=0],
durationFromInit=21]
[13:46:17,708][INFO][exchange-worker-#103][time] Finished exchange init
[topVer=AffinityTopologyVersion [topVer=366003, minorTopVer=0], crd=false]
[13:46:17,770][INFO][exchange-worker-#103][GridCachePartitionExchangeManager]
Skipping rebalancing (nothing scheduled) [top=AffinityTopologyVersion
[topVer=366003, minorTopVer=0], force=false, evt=NODE_LEFT,
node=824dca07-a847-4fd7-81a5-ac0aa8644b26]
[13:46:18,620][INFO][disco-event-worker-#102][GridDiscoveryManager] Added
new node to topology: TcpDiscoveryNode
[id=1115b6b7-7caf-4737-9c61-930e193468f6, addrs=[10.16.1.43, 127.0.0.1],
sockAddrs=[/127.0.0.1:0, xxxxx/10.16.1.43:0], discPort=0, order=366004,
intOrder=183041, lastExchangeTime=1573393578569, loc=false,
ver=2.7.5#20190603-sha1:be4f2a15, isClient=true]


*Lots of these warnings*
[13:49:04,673][WARNING][jvm-pause-detector-worker][IgniteKernal] Possible
too long JVM pause: 798 milliseconds.

*and sometimes this*
[13:49:04,863][INFO][exchange-worker-#103][GridCachePartitionExchangeManager]
Skipping rebalancing (nothing scheduled) [top=AffinityTopologyVersion
[topVer=366038, minorTopVer=0], force=false, evt=NODE_JOINED,
node=7b25d879-b674-4e7d-b5f7-d1c6619e0091]
[13:49:05,677][INFO][grid-nio-worker-tcp-comm-0-#72][TcpCommunicationSpi]
Accepted incoming communication connection [locAddr=/10.16.1.47:47101,
rmtAddr=/10.16.1.48:50550]
[13:49:05,706][INFO][tcp-disco-srvr-#3][TcpDiscoverySpi] TCP discovery
accepted incoming connection [rmtAddr=/10.16.1.47, rmtPort=53836]
[13:49:05,706][INFO][tcp-disco-srvr-#3][TcpDiscoverySpi] TCP discovery
spawning a new thread for connection [rmtAddr=/10.16.1.47, rmtPort=53836]
[13:49:05,707][INFO][tcp-disco-sock-reader-#25013][TcpDiscoverySpi] Started
serving remote node connection [rmtAddr=/10.16.1.47:53836, rmtPort=53836]



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/

Re: recoveryBallotBoxes in MvccProcessorImpl memory leak?

Posted by Ilya Kasnacheev <il...@gmail.com>.
Hello!

You seem to have an awful lot of errors related to connectivity problems
between nodes, such as:

Suppressed: class org.apache.ignite.IgniteCheckedException: Failed to
connect to address [addr=ult-s2-svr1.dataprocessors.com.au/10.16.1.47:47106,
err=Connection refused]

Suppressed: class org.apache.ignite.IgniteCheckedException: Failed to
connect to address [addr=ult-s2-svr3/10.16.1.43:47102, err=Remote node ID
is not as expected [expected=d97b5e5d-fb46-4b5b-91ad-79a69fce738f,
rcvd=1dc23ebb-0997-4858-9433-d5d30c9b643e]]

I recommend figuring those errors out: it's possible that you have nodes in
your cluster which are not reachable via communication from the server
node(s) but are present in discovery. Such nodes will cause all kinds of
problems in the cluster.

Regards,
-- 
Ilya Kasnacheev


Fri, Nov 8, 2019 at 17:12, mvkarp <li...@hotmail.com>:

> OK, there are no exceptions in the Ignite logs for the client JVMs, but
> I've attached the log for one of the problem servers. It looks like there
> are a few errors, but I am unable to determine the root cause.
> ignite-46073e05.zip
> <http://apache-ignite-users.70518.x6.nabble.com/file/t2658/ignite-46073e05.zip>
>
>
>
> ilya.kasnacheev wrote
> > Hello!
> >
> > This is very strange, since we expect this collection to be cleared on
> > exchange.
> >
> > Please make sure you don't have any stray exceptions during exchange in
> > your logs.
> >
> > Regards,
> > --
> > Ilya Kasnacheev
> >
> >
> > Fri, Nov 8, 2019 at 12:49, mvkarp <liquid_ninja2k@hotmail.com>:
> >
> >> Hi,
> >>
> >> This is not the case. There is always a maximum total of two server
> >> nodes, with one server JVM on each. However, there are many client JVMs
> >> that start and stop caches with setClientMode=true. It looks like one
> >> of the server instances is immune to the issue, whilst the most newly
> >> created one gets the leak, with a lot of partition exchanges happening
> >> for EVT_NODE_JOINED and EVT_NODE_LEFT (one of the nodes doesn't get any
> >> of these partition exchanges; however, the exact server node that gets
> >> this can alternate, so it's not linked to one node in particular but
> >> seems to be linked to the most newly launched server).
> >>
> >>
> >> ilya.kasnacheev wrote
> >> > Hello!
> >> >
> >> > How many nodes do you have in your cluster?
> >> >
> >> > From the dump it seems that the number of server nodes is in
> thousands.
> >> Is
> >> > this the case?
> >> >
> >> > Regards,
> >> > --
> >> > Ilya Kasnacheev
> >> >
> >> >
> >> > Fri, Nov 8, 2019 at 10:26, mvkarp <liquid_ninja2k@hotmail.com>:
> >> >
> >> >> Let me know if these help or if you need anything more specific.
> >> >> recoveryBallotBoxes.zip
> >> >> <http://apache-ignite-users.70518.x6.nabble.com/file/t2658/recoveryBallotBoxes.zip>
> >> >>
> >> >>
> >> >>
> >> >> ilya.kasnacheev wrote
> >> >> > Hello!
> >> >> >
> >> >> > Can you please check whether there are any especially large
> >> >> > objects inside the recoveryBallotBoxes object graph? Sorting by
> >> >> > retained heap may help in determining this. It would be nice to
> >> >> > know the type histogram of what's inside recoveryBallotBoxes and
> >> >> > where the bulk of the heap usage resides.
> >> >> >
> >> >> > Regards,
> >> >> > --
> >> >> > Ilya Kasnacheev
> >> >> >
> >> >> >
> >> >> > Thu, Nov 7, 2019 at 06:23, mvkarp <liquid_ninja2k@hotmail.com>:
> >> >> >
> >> >> >> I've attached another set of screenshots; they might be clearer.
> >> >> >> heap.zip
> >> >> >> <http://apache-ignite-users.70518.x6.nabble.com/file/t2658/heap.zip>
> >> >> >>
> >> >> >>
> >> >> >> mvkarp wrote
> >> >> >> > I've attached some extra screenshots showing what is inside
> >> >> >> > these records and the path to GC roots. heap.zip
> >> >> >> > <http://apache-ignite-users.70518.x6.nabble.com/file/t2658/heap.zip>
> >> >> >> >
> >> >> >> >
> >> >> >> >
> >> >> >> >
> >> >> >> >
> >> >> >> > --
> >> >> >> > Sent from: http://apache-ignite-users.70518.x6.nabble.com/
> >> >> >>
> >> >> >>
> >> >> >>
> >> >> >>
> >> >> >>
> >> >> >> --
> >> >> >> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
> >> >> >>
> >> >>
> >> >>
> >> >>
> >> >>
> >> >>
> >> >> --
> >> >> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
> >> >>
> >>
> >>
> >>
> >>
> >>
> >> --
> >> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
> >>
>
>
>
>
>
> --
> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>

Re: recoveryBallotBoxes in MvccProcessorImpl memory leak?

Posted by mvkarp <li...@hotmail.com>.
OK, there are no exceptions in the Ignite logs for the client JVMs, but
I've attached the log for one of the problem servers. It looks like there
are a few errors, but I am unable to determine the root cause.
ignite-46073e05.zip
<http://apache-ignite-users.70518.x6.nabble.com/file/t2658/ignite-46073e05.zip>  


ilya.kasnacheev wrote
> Hello!
> 
> This is very strange, since we expect this collection to be cleared on
> exchange.
> 
> Please make sure you don't have any stray exceptions during exchange in
> your logs.
> 
> Regards,
> -- 
> Ilya Kasnacheev
> 
> 
> Fri, Nov 8, 2019 at 12:49, mvkarp <liquid_ninja2k@hotmail.com>:
> 
>> Hi,
>>
>> This is not the case. There is always a maximum total of two server
>> nodes, with one server JVM on each. However, there are many client JVMs
>> that start and stop caches with setClientMode=true. It looks like one of
>> the server instances is immune to the issue, whilst the most newly
>> created one gets the leak, with a lot of partition exchanges happening
>> for EVT_NODE_JOINED and EVT_NODE_LEFT (one of the nodes doesn't get any
>> of these partition exchanges; however, the exact server node that gets
>> this can alternate, so it's not linked to one node in particular but
>> seems to be linked to the most newly launched server).
>>
>>
>> ilya.kasnacheev wrote
>> > Hello!
>> >
>> > How many nodes do you have in your cluster?
>> >
>> > From the dump it seems that the number of server nodes is in thousands.
>> Is
>> > this the case?
>> >
>> > Regards,
>> > --
>> > Ilya Kasnacheev
>> >
>> >
>> > Fri, Nov 8, 2019 at 10:26, mvkarp <liquid_ninja2k@hotmail.com>:
>> >
>> >> Let me know if these help or if you need anything more specific.
>> >> recoveryBallotBoxes.zip
>> >> <http://apache-ignite-users.70518.x6.nabble.com/file/t2658/recoveryBallotBoxes.zip>
>> >>
>> >>
>> >>
>> >> ilya.kasnacheev wrote
>> >> > Hello!
>> >> >
>> >> > Can you please check whether there are any especially large objects
>> >> > inside the recoveryBallotBoxes object graph? Sorting by retained
>> >> > heap may help in determining this. It would be nice to know the type
>> >> > histogram of what's inside recoveryBallotBoxes and where the bulk of
>> >> > the heap usage resides.
>> >> >
>> >> > Regards,
>> >> > --
>> >> > Ilya Kasnacheev
>> >> >
>> >> >
>> >> > Thu, Nov 7, 2019 at 06:23, mvkarp <liquid_ninja2k@hotmail.com>:
>> >> >
>> >> >> I've attached another set of screenshots; they might be clearer.
>> >> >> heap.zip
>> >> >> <http://apache-ignite-users.70518.x6.nabble.com/file/t2658/heap.zip>
>> >> >>
>> >> >>
>> >> >> mvkarp wrote
>> >> >> > I've attached some extra screenshots showing what is inside these
>> >> >> > records and the path to GC roots. heap.zip
>> >> >> > <http://apache-ignite-users.70518.x6.nabble.com/file/t2658/heap.zip>
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> > --
>> >> >> > Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>> >> >>
>> >> >>
>> >> >>
>> >> >>
>> >> >>
>> >> >> --
>> >> >> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>> >> >>
>> >>
>> >>
>> >>
>> >>
>> >>
>> >> --
>> >> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>> >>
>>
>>
>>
>>
>>
>> --
>> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>>





--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/

Re: recoveryBallotBoxes in MvccProcessorImpl memory leak?

Posted by Ilya Kasnacheev <il...@gmail.com>.
Hello!

This is very strange, since we expect this collection to be cleared on
exchange.

Please make sure you don't have any stray exceptions during exchange in
your logs.

Regards,
-- 
Ilya Kasnacheev


Fri, Nov 8, 2019 at 12:49, mvkarp <li...@hotmail.com>:

> Hi,
>
> This is not the case. There is always a maximum total of two server
> nodes, with one server JVM on each. However, there are many client JVMs
> that start and stop caches with setClientMode=true. It looks like one of
> the server instances is immune to the issue, whilst the most newly
> created one gets the leak, with a lot of partition exchanges happening
> for EVT_NODE_JOINED and EVT_NODE_LEFT (one of the nodes doesn't get any
> of these partition exchanges; however, the exact server node that gets
> this can alternate, so it's not linked to one node in particular but
> seems to be linked to the most newly launched server).
>
>
> ilya.kasnacheev wrote
> > Hello!
> >
> > How many nodes do you have in your cluster?
> >
> > From the dump it seems that the number of server nodes is in thousands.
> Is
> > this the case?
> >
> > Regards,
> > --
> > Ilya Kasnacheev
> >
> >
> > Fri, Nov 8, 2019 at 10:26, mvkarp <liquid_ninja2k@hotmail.com>:
> >
> >> Let me know if these help or if you need anything more specific.
> >> recoveryBallotBoxes.zip
> >> <http://apache-ignite-users.70518.x6.nabble.com/file/t2658/recoveryBallotBoxes.zip>
> >>
> >>
> >>
> >> ilya.kasnacheev wrote
> >> > Hello!
> >> >
> >> > Can you please check whether there are any especially large objects
> >> > inside the recoveryBallotBoxes object graph? Sorting by retained heap
> >> > may help in determining this. It would be nice to know the type
> >> > histogram of what's inside recoveryBallotBoxes and where the bulk of
> >> > the heap usage resides.
> >> >
> >> > Regards,
> >> > --
> >> > Ilya Kasnacheev
> >> >
> >> >
> >> > Thu, Nov 7, 2019 at 06:23, mvkarp <liquid_ninja2k@hotmail.com>:
> >> >
> >> >> I've attached another set of screenshots; they might be clearer.
> >> >> heap.zip
> >> >> <http://apache-ignite-users.70518.x6.nabble.com/file/t2658/heap.zip>
> >> >>
> >> >>
> >> >> mvkarp wrote
> >> >> > I've attached some extra screenshots showing what is inside these
> >> >> > records and the path to GC roots. heap.zip
> >> >> > <http://apache-ignite-users.70518.x6.nabble.com/file/t2658/heap.zip>
> >> >> >
> >> >> >
> >> >> >
> >> >> >
> >> >> >
> >> >> > --
> >> >> > Sent from: http://apache-ignite-users.70518.x6.nabble.com/
> >> >>
> >> >>
> >> >>
> >> >>
> >> >>
> >> >> --
> >> >> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
> >> >>
> >>
> >>
> >>
> >>
> >>
> >> --
> >> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
> >>
>
>
>
>
>
> --
> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>

Re: recoveryBallotBoxes in MvccProcessorImpl memory leak?

Posted by mvkarp <li...@hotmail.com>.
Hi, 

This is not the case. There is always a maximum total of two server nodes,
with one server JVM on each. However, there are many client JVMs that start
and stop caches with setClientMode=true. It looks like one of the server
instances is immune to the issue, whilst the most newly created one gets
the leak, with a lot of partition exchanges happening for EVT_NODE_JOINED
and EVT_NODE_LEFT (one of the nodes doesn't get any of these partition
exchanges; however, the exact server node that gets this can alternate, so
it's not linked to one node in particular but seems to be linked to the
most newly launched server).


ilya.kasnacheev wrote
> Hello!
> 
> How many nodes do you have in your cluster?
> 
> From the dump it seems that the number of server nodes is in thousands. Is
> this the case?
> 
> Regards,
> -- 
> Ilya Kasnacheev
> 
> 
> Fri, Nov 8, 2019 at 10:26, mvkarp <liquid_ninja2k@hotmail.com>:
> 
>> Let me know if these help or if you need anything more specific.
>> recoveryBallotBoxes.zip
>> <http://apache-ignite-users.70518.x6.nabble.com/file/t2658/recoveryBallotBoxes.zip>
>>
>>
>>
>> ilya.kasnacheev wrote
>> > Hello!
>> >
>> > Can you please check whether there are any especially large objects
>> > inside the recoveryBallotBoxes object graph? Sorting by retained heap
>> > may help in determining this. It would be nice to know the type
>> > histogram of what's inside recoveryBallotBoxes and where the bulk of
>> > the heap usage resides.
>> >
>> > Regards,
>> > --
>> > Ilya Kasnacheev
>> >
>> >
>> > Thu, Nov 7, 2019 at 06:23, mvkarp <liquid_ninja2k@hotmail.com>:
>> >
>> >> I've attached another set of screenshots; they might be clearer.
>> >> heap.zip
>> >> <http://apache-ignite-users.70518.x6.nabble.com/file/t2658/heap.zip>
>> >>
>> >>
>> >> mvkarp wrote
>> >> > I've attached some extra screenshots showing what is inside these
>> >> > records and the path to GC roots. heap.zip
>> >> > <http://apache-ignite-users.70518.x6.nabble.com/file/t2658/heap.zip>
>> >> >
>> >> >
>> >> >
>> >> >
>> >> >
>> >> > --
>> >> > Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>> >>
>> >>
>> >>
>> >>
>> >>
>> >> --
>> >> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>> >>
>>
>>
>>
>>
>>
>> --
>> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>>





--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/

Re: recoveryBallotBoxes in MvccProcessorImpl memory leak?

Posted by Ilya Kasnacheev <il...@gmail.com>.
Hello!

How many nodes do you have in your cluster?

From the dump it seems that the number of server nodes is in thousands. Is
this the case?

Regards,
-- 
Ilya Kasnacheev


Fri, Nov 8, 2019 at 10:26, mvkarp <li...@hotmail.com>:

> Let me know if these help or if you need anything more specific.
> recoveryBallotBoxes.zip
> <http://apache-ignite-users.70518.x6.nabble.com/file/t2658/recoveryBallotBoxes.zip>
>
>
>
> ilya.kasnacheev wrote
> > Hello!
> >
> > Can you please check whether there are any especially large objects
> > inside the recoveryBallotBoxes object graph? Sorting by retained heap
> > may help in determining this. It would be nice to know the type
> > histogram of what's inside recoveryBallotBoxes and where the bulk of
> > the heap usage resides.
> >
> > Regards,
> > --
> > Ilya Kasnacheev
> >
> >
> > Thu, Nov 7, 2019 at 06:23, mvkarp <liquid_ninja2k@hotmail.com>:
> >
> >> I've attached another set of screenshots; they might be clearer.
> >> heap.zip
> >> <http://apache-ignite-users.70518.x6.nabble.com/file/t2658/heap.zip>
> >>
> >>
> >> mvkarp wrote
> >> > I've attached some extra screenshots showing what is inside these
> >> > records and the path to GC roots. heap.zip
> >> > <http://apache-ignite-users.70518.x6.nabble.com/file/t2658/heap.zip>
> >> >
> >> >
> >> >
> >> >
> >> >
> >> > --
> >> > Sent from: http://apache-ignite-users.70518.x6.nabble.com/
> >>
> >>
> >>
> >>
> >>
> >> --
> >> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
> >>
>
>
>
>
>
> --
> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>

Re: recoveryBallotBoxes in MvccProcessorImpl memory leak?

Posted by mvkarp <li...@hotmail.com>.
Let me know if these help or if you need anything more specific.
recoveryBallotBoxes.zip
<http://apache-ignite-users.70518.x6.nabble.com/file/t2658/recoveryBallotBoxes.zip>  


ilya.kasnacheev wrote
> Hello!
> 
> Can you please check whether there are any especially large objects
> inside the recoveryBallotBoxes object graph? Sorting by retained heap may
> help in determining this. It would be nice to know the type histogram of
> what's inside recoveryBallotBoxes and where the bulk of the heap usage
> resides.
> 
> Regards,
> -- 
> Ilya Kasnacheev
> 
> 
> Thu, Nov 7, 2019 at 06:23, mvkarp <liquid_ninja2k@hotmail.com>:
> 
>> I've attached another set of screenshots; they might be clearer.
>> heap.zip
>> <http://apache-ignite-users.70518.x6.nabble.com/file/t2658/heap.zip>
>>
>>
>> mvkarp wrote
>> > I've attached some extra screenshots showing what is inside these
>> > records and the path to GC roots. heap.zip
>> > <http://apache-ignite-users.70518.x6.nabble.com/file/t2658/heap.zip>
>> >
>> >
>> >
>> >
>> >
>> > --
>> > Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>>
>>
>>
>>
>>
>> --
>> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>>





--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/

Re: recoveryBallotBoxes in MvccProcessorImpl memory leak?

Posted by Ilya Kasnacheev <il...@gmail.com>.
Hello!

Can you please check whether there are any especially large objects inside
the recoveryBallotBoxes object graph? Sorting by retained heap may help in
determining this. It would be nice to know the type histogram of what's
inside recoveryBallotBoxes and where the bulk of the heap usage resides.

Regards,
-- 
Ilya Kasnacheev


Thu, Nov 7, 2019 at 06:23, mvkarp <li...@hotmail.com>:

> I've attached another set of screenshots; they might be clearer.
> heap.zip
> <http://apache-ignite-users.70518.x6.nabble.com/file/t2658/heap.zip>
>
>
> mvkarp wrote
> > I've attached some extra screenshots showing what is inside these records
> > and the path to GC roots. heap.zip
> > <http://apache-ignite-users.70518.x6.nabble.com/file/t2658/heap.zip>
> >
> >
> >
> >
> >
> > --
> > Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>
>
>
>
>
> --
> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>

Re: recoveryBallotBoxes in MvccProcessorImpl memory leak?

Posted by mvkarp <li...@hotmail.com>.
I've attached another set of screenshots; they might be clearer.
heap.zip
<http://apache-ignite-users.70518.x6.nabble.com/file/t2658/heap.zip>  


mvkarp wrote
> I've attached some extra screenshots showing what is inside these records
> and the path to GC roots. heap.zip
> <http://apache-ignite-users.70518.x6.nabble.com/file/t2658/heap.zip>
> 
> 
> 
> 
> 
> --
> Sent from: http://apache-ignite-users.70518.x6.nabble.com/





--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/

Re: recoveryBallotBoxes in MvccProcessorImpl memory leak?

Posted by mvkarp <li...@hotmail.com>.
I've created a ticket; I'm not too sure how to go about creating a
reproducer for this: https://issues.apache.org/jira/browse/IGNITE-12350

I've attached some extra screenshots showing what is inside these records
and the path to GC roots. heap.zip
<http://apache-ignite-users.70518.x6.nabble.com/file/t2658/heap.zip>  





--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/

Re: recoveryBallotBoxes in MvccProcessorImpl memory leak?

Posted by Ilya Kasnacheev <il...@gmail.com>.
Hello!

Can you please show the contents of some of these records, as well as
their referential path to MvccProcessorImpl?

Regards,
-- 
Ilya Kasnacheev


Fri, Nov 1, 2019 at 03:25, mvkarp <li...@hotmail.com>:

> <http://apache-ignite-users.70518.x6.nabble.com/file/t2658/heapanalysisMAT.jpg>
>
>
> I've attached an Eclipse MAT heap analysis. As you can see, MVCC is disabled
> (there are no TRANSACTIONAL_SNAPSHOT caches in the cluster).
>
>
>
> --
> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>