Posted to user@ignite.apache.org by Akash Shinde <ak...@gmail.com> on 2020/06/06 13:15:37 UTC

Countdown latch issue with 2.6.0

Hi,
Issue: The countdown latch gets reinitialized to its original value (4) when
one or more (but not all) nodes go down (partition loss happened).

We are using Ignite's distributed countdown latch to make sure that cache
loading is completed on all server nodes. We do this to make sure that our
Kafka consumers start only after cache loading is complete on all server
nodes. This is the basic criterion that must be fulfilled before actual
processing starts.


We have 4 server nodes, and the countdown latch is initialized to 4. We use
the "cache.loadCache" method to start the cache loading. When each server
completes cache loading, it reduces the count by 1 using the countDown
method. So when all the nodes complete cache loading, the count reaches
zero, and at that point we start Kafka consumers on all server nodes.
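
For reference, here is a simplified sketch of the pattern. The cache name
"partitionCache", the latch name "cacheLoadLatch", and startKafkaConsumers()
are placeholders, not our actual production code:

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.IgniteCountDownLatch;

public class CacheLoadCoordinator {
    public static void awaitClusterLoad(Ignite ignite) {
        // Every server node gets (or creates) the same distributed latch,
        // initialized to the number of server nodes (4 in this setup).
        IgniteCountDownLatch latch =
            ignite.countDownLatch("cacheLoadLatch", 4, false, true);

        // Trigger cache loading (we use the cache.loadCache method).
        IgniteCache<Long, String> cache = ignite.cache("partitionCache");
        cache.loadCache(null);

        // Signal that this node has finished loading.
        latch.countDown();

        // Block until all 4 nodes have counted down, then start consumers.
        latch.await();
        startKafkaConsumers();
    }

    private static void startKafkaConsumers() {
        // Placeholder for the application-specific Kafka consumer startup.
    }
}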

But we saw weird behavior in our production environment: 3 of the server
nodes were shut down at the same time while 1 node stayed alive. When this
happened, the countdown was reinitialized to its original value, i.e. 4.
I am not able to reproduce this in our dev environment.
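
For illustration, here is a minimal single-JVM sketch of the scenario I am
trying to reproduce (the instance names, the latch name, and the single-JVM
setup are simplifications, not our actual environment):

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCountDownLatch;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;

public class LatchResetRepro {
    public static void main(String[] args) {
        // Start 4 server nodes in the same JVM.
        Ignite[] nodes = new Ignite[4];
        for (int i = 0; i < 4; i++)
            nodes[i] = Ignition.start(
                new IgniteConfiguration().setIgniteInstanceName("node-" + i));

        // Create the shared latch with an initial count of 4.
        IgniteCountDownLatch latch =
            nodes[0].countDownLatch("cacheLoadLatch", 4, false, true);

        // Nodes 1-3 count down and are then stopped (sequentially here;
        // in production all three went down at roughly the same time).
        for (int i = 1; i < 4; i++)
            nodes[i].countDownLatch("cacheLoadLatch", 4, false, true).countDown();
        for (int i = 1; i < 4; i++)
            Ignition.stop("node-" + i, true);

        // Expected: 1. In production we instead observed the original value, 4.
        System.out.println("Count seen by surviving node: " + latch.count());

        Ignition.stop("node-0", true);
    }
}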

Is this a bug: when one or more (but not all) nodes go down, the count
reinitializes back to its original value?

Thanks,
Akash

Re: Countdown latch issue with 2.6.0

Posted by Alexandr Shapkin <le...@gmail.com>.
Hi,

Have you tried to reproduce this behaviour with the latest release, 2.8.1?
If I remember correctly, there were some fixes regarding partition counter
mismatches, for example:
https://issues.apache.org/jira/browse/IGNITE-10078



-----
Alex Shapkin
--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/

Re: Countdown latch issue with 2.6.0

Posted by Akash Shinde <ak...@gmail.com>.
Hi, I have created a Jira issue for this:
https://issues.apache.org/jira/browse/IGNITE-13132

Thanks,
Akash

On Sun, Jun 7, 2020 at 9:29 AM Akash Shinde <ak...@gmail.com> wrote:

> Can someone please help me with this issue?
>
> On Sat, Jun 6, 2020 at 6:45 PM Akash Shinde <ak...@gmail.com> wrote:
>
>> Hi,
>> Issue: The countdown latch gets reinitialized to its original value (4)
>> when one or more (but not all) nodes go down (partition loss happened).
>>
>> We are using Ignite's distributed countdown latch to make sure that cache
>> loading is completed on all server nodes. We do this to make sure that our
>> Kafka consumers start only after cache loading is complete on all server
>> nodes. This is the basic criterion that must be fulfilled before actual
>> processing starts.
>>
>> We have 4 server nodes, and the countdown latch is initialized to 4. We
>> use the "cache.loadCache" method to start the cache loading. When each
>> server completes cache loading, it reduces the count by 1 using the
>> countDown method. So when all the nodes complete cache loading, the count
>> reaches zero, and at that point we start Kafka consumers on all server
>> nodes.
>>
>> But we saw weird behavior in our production environment: 3 of the server
>> nodes were shut down at the same time while 1 node stayed alive. When this
>> happened, the countdown was reinitialized to its original value, i.e. 4.
>> I am not able to reproduce this in our dev environment.
>>
>> Is this a bug: when one or more (but not all) nodes go down, the count
>> reinitializes back to its original value?
>>
>> Thanks,
>> Akash
>>
>

Re: Countdown latch issue with 2.6.0

Posted by Akash Shinde <ak...@gmail.com>.
Can someone please help me with this issue?

On Sat, Jun 6, 2020 at 6:45 PM Akash Shinde <ak...@gmail.com> wrote:

> Hi,
> Issue: The countdown latch gets reinitialized to its original value (4)
> when one or more (but not all) nodes go down (partition loss happened).
>
> We are using Ignite's distributed countdown latch to make sure that cache
> loading is completed on all server nodes. We do this to make sure that our
> Kafka consumers start only after cache loading is complete on all server
> nodes. This is the basic criterion that must be fulfilled before actual
> processing starts.
>
> We have 4 server nodes, and the countdown latch is initialized to 4. We use
> the "cache.loadCache" method to start the cache loading. When each server
> completes cache loading, it reduces the count by 1 using the countDown
> method. So when all the nodes complete cache loading, the count reaches
> zero, and at that point we start Kafka consumers on all server nodes.
>
> But we saw weird behavior in our production environment: 3 of the server
> nodes were shut down at the same time while 1 node stayed alive. When this
> happened, the countdown was reinitialized to its original value, i.e. 4.
> I am not able to reproduce this in our dev environment.
>
> Is this a bug: when one or more (but not all) nodes go down, the count
> reinitializes back to its original value?
>
> Thanks,
> Akash
>