Posted to dev@ignite.apache.org by Roman Shtykh <rs...@yahoo.com.INVALID> on 2018/08/20 08:19:09 UTC

Unknown known issue on cache rebalancing delayed

Igniters,
I have found a "Known issue, possible deadlock in case of low priority cache rebalancing delayed" comment in GridCacheRebalancingSyncSelfTest#getConfiguration. Can you please explain when using rebalance delay can be an issue and why?

-- Roman

Re: Unknown known issue on cache rebalancing delayed

Posted by Roman Shtykh <rs...@yahoo.com.INVALID>.
Anton, Maxim, thanks for following up! Looks like a good enough trade-off.
Sorry, couldn't catch the conversation because of the different time zone ;)


Re: Unknown known issue on cache rebalancing delayed

Posted by Anton Vinogradov <av...@apache.org>.
Maxim,

Let's create a branch that runs the Sync test 10 times and the Async test 10 times.
Then run it 20 times on TC.
That should be enough, I think.


Re: Unknown known issue on cache rebalancing delayed

Posted by Maxim Muzafarov <ma...@gmail.com>.
Anton,

I agree with you, 20 times is not enough. I've checked a single run of the
test class: each execution takes ~7 min.
The total execution timeout of CacheSuite8 is 210 min, so we can perform at
most 30 class executions in this suite. Our strategy here is to run the class
20 times within a single suite run and put 50 such runs into the TC queue.
Total: ~7000 min, or about 5 days.

I'm not sure we should perform exactly 1000 executions; hopefully, we will
stop adding new tasks to the queue at some point.
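
The numbers in this message can be checked with a few lines of arithmetic. The sketch below is only an illustration: the class and constant names are made up, and it assumes the runs execute one after another and that only the test class itself consumes suite time.

```java
/** Back-of-the-envelope check of the TC run budget quoted in the message. */
final class RebalanceRunBudget {
    static final int MINUTES_PER_CLASS_RUN = 7;    // ~7 min per execution of the test class
    static final int SUITE_TIMEOUT_MINUTES = 210;  // CacheSuite8 total execution timeout
    static final int REPEATS_PER_SUITE_RUN = 20;   // class executions inside one suite run
    static final int QUEUED_SUITE_RUNS = 50;       // suite runs put into the TC queue

    /** Upper bound on class executions that fit into one suite run. */
    static int maxRepeatsPerSuiteRun() {
        return SUITE_TIMEOUT_MINUTES / MINUTES_PER_CLASS_RUN;
    }

    /** Total class executions across all queued suite runs. */
    static int totalExecutions() {
        return REPEATS_PER_SUITE_RUN * QUEUED_SUITE_RUNS;
    }

    /** Total machine time in minutes, assuming strictly sequential runs. */
    static int totalMinutes() {
        return totalExecutions() * MINUTES_PER_CLASS_RUN;
    }

    /** The same total expressed in days. */
    static double totalDays() {
        return totalMinutes() / (60.0 * 24.0);
    }

    public static void main(String[] args) {
        System.out.println("max repeats per suite run: " + maxRepeatsPerSuiteRun());
        System.out.println("total executions: " + totalExecutions());
        System.out.println("total minutes: " + totalMinutes());
        System.out.println("total days (sequential): " + totalDays());
    }
}
```

With these inputs, 20 repeats stay below the 30-repeat ceiling of a single suite run, and 50 queued runs give the 1000 executions discussed in the thread.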

-- 
--
Maxim Muzafarov

Re: Unknown known issue on cache rebalancing delayed

Posted by Anton Vinogradov <av...@apache.org>.
Maxim,
20 is not 1k :)
Also, you forgot to check GridCacheRebalancingAsyncSelfTest

I'm not sure we should have exactly 1k runs, but 20 is definitely not
enough.

Roman,
I suggest using the IDEA "run until failure" feature and running the test
locally (on your PC) while you're not using it.
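
For readers without that IDE option, the idea of repeating a test until it fails can be sketched as a plain loop. This is a minimal illustration, not how IDEA implements it; `RunUntilFailure`, `firstFailure` and the simulated flaky task are hypothetical names.

```java
import java.util.OptionalInt;

/** Minimal "repeat until failure" loop, similar in spirit to the IDE feature. */
final class RunUntilFailure {
    /**
     * Runs the task up to maxRuns times and stops at the first run that throws.
     *
     * @return the 1-based index of the first failing run, or empty if every run passed.
     */
    static OptionalInt firstFailure(Runnable task, int maxRuns) {
        for (int run = 1; run <= maxRuns; run++) {
            try {
                task.run();
            } catch (RuntimeException | Error e) {
                System.err.println("Run " + run + " failed: " + e);
                return OptionalInt.of(run);
            }
        }
        return OptionalInt.empty();
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for a flaky test body: fails on the 3rd invocation.
        int[] calls = {0};
        Runnable flaky = () -> {
            if (++calls[0] == 3)
                throw new IllegalStateException("simulated rare failure");
        };
        System.out.println(firstFailure(flaky, 1_000));
    }
}
```

Note that a deadlock shows up as a hang rather than an exception, so a loop like this would also need a per-run timeout to catch the case discussed in this thread.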


Re: Unknown known issue on cache rebalancing delayed

Posted by Maxim Muzafarov <ma...@gmail.com>.
Roman, Anton,

I've already created an additional PR [2] and run it on TC [1].
Please follow the results there.

[1]
https://ci.ignite.apache.org/viewType.html?buildTypeId=IgniteTests24Java8_Cache8&tab=buildTypeStatusDiv&branch_IgniteTests24Java8=pull%2F4676%2Fhead
[2] https://github.com/apache/ignite/pull/4676/files



-- 
--
Maxim Muzafarov

Re: Unknown known issue on cache rebalancing delayed

Posted by Roman Shtykh <rs...@yahoo.com.INVALID>.
Anton,
Thank you. I would like to recheck it. How can this (1_000 runs) be done in TC?



Re: Unknown known issue on cache rebalancing delayed

Posted by Anton Vinogradov <av...@apache.org>.
Roman,

I see you uncommented this line.
I do not remember the details of the deadlock, but I remember it was an
extremely rare case.
I found and "fixed" it a few days before the merge, during a week of 24x7
sanity checks :)

So, I propose at least 1_000 runs of these tests before we keep this
uncommented.




Re: Unknown known issue on cache rebalancing delayed

Posted by Maxim Muzafarov <ma...@gmail.com>.
Roman,

I recently worked on rebalance improvements and haven't found any problems
with delayed cache rebalance.
I agree with you: let's uncomment this and remove the scary comment. Will you
create a ticket for it?

If any problems appear, we can easily detect a deadlock with the newly
configured `FailureHandler`.
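
Ignite's `FailureHandler` is not shown here. As a loosely related illustration, the JVM itself can report classic lock deadlocks through `ThreadMXBean`. The sketch below uses only the standard library; `DeadlockProbe` is a hypothetical name, and this is a generic stand-in rather than the mechanism the message refers to.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

/** JVM-level deadlock probe; a generic stand-in, not Ignite's FailureHandler. */
final class DeadlockProbe {
    /** Returns IDs of threads deadlocked on monitors or ownable synchronizers (empty if none). */
    static long[] deadlockedThreadIds() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        long[] ids = threads.findDeadlockedThreads(); // null when no deadlock is found
        return ids == null ? new long[0] : ids;
    }

    public static void main(String[] args) {
        long[] ids = deadlockedThreadIds();
        System.out.println(ids.length == 0
            ? "no deadlock detected"
            : ids.length + " deadlocked threads");
    }
}
```

A probe like this only sees threads blocked on locks. A logical hang, such as a future that never completes, would not be reported by it.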

-- 
--
Maxim Muzafarov

Re: Unknown known issue on cache rebalancing delayed

Posted by Roman Shtykh <rs...@yahoo.com.INVALID>.
Hi Maxim,

I have some issues with a cluster with rebalance delay enabled, but I need to check more -- if I find it's related, I'll share. I just wanted to confirm with someone working on rebalancing that it's not an issue anymore. We should remove that comment then, it looks scary :)

--
Roman Shtykh 


Re: Unknown known issue on cache rebalancing delayed

Posted by Maxim Muzafarov <ma...@gmail.com>.
Hello Roman,

Did you face a real issue with delayed rebalance, or is it just for
your personal interest?
If you did, please share the details and we will try to help you.

As for this comment, I don't think it is still relevant. That change was made
in 2015. Much has changed within the rebalance process since then. I've
uncommented it, rechecked with that cache configuration, and haven't seen any
failed tests or issues.

Probably the problem was that a cache in SYNC mode does not start until it
loads all data from other nodes. But currently delayed rebalance works the
same way as IgniteCache#rebalance(), so you can set `setRebalanceDelay` to
`-1` and call it manually to check.
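
The manual check suggested above might look roughly like the fragment below. It is a sketch against the Ignite 2.x API from the time of this thread and needs the Ignite library on the classpath. The cache name and class name are hypothetical, and later Ignite versions may deprecate these methods.

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.CacheRebalanceMode;
import org.apache.ignite.configuration.CacheConfiguration;

public class ManualRebalanceCheck {
    public static void main(String[] args) {
        // "checkCache" is a hypothetical cache name used only for this sketch.
        CacheConfiguration<Integer, String> ccfg = new CacheConfiguration<>("checkCache");
        ccfg.setRebalanceMode(CacheRebalanceMode.SYNC);
        ccfg.setRebalanceDelay(-1); // -1: rebalancing is not started automatically

        // Starts a node with the default configuration; a real check would
        // use a multi-node cluster so that there is data to rebalance.
        try (Ignite ignite = Ignition.start()) {
            IgniteCache<Integer, String> cache = ignite.getOrCreateCache(ccfg);

            // Trigger rebalancing by hand and wait for the returned future.
            cache.rebalance().get();
        }
    }
}
```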

-- 
--
Maxim Muzafarov