Posted to user@ignite.apache.org by Akash Shinde <ak...@gmail.com> on 2019/01/03 12:16:25 UTC

nodes getting disconnected from cluster

Hi,

I am getting a "Timed out waiting for message delivery receipt" WARN message
in my logs.
I am sure it is not caused by a long GC pause: I have checked the memory
utilization and it is very low.

I also checked the connectivity between the two nodes between which the
timeout is happening.
The bandwidth is shown below.

[ ID] Interval       Transfer     Bandwidth
[  4]  0.0-10.1 sec   855 MBytes   708 Mbits/sec

I often get the following message in my logs. Is it because the two nodes
are not able to communicate within the given time limit?

*ERROR:*
 Blocked system-critical thread has been detected. This can lead to
cluster-wide undefined behaviour [threadName=tcp-disco-msg-worker,
blockedFor=14s]

I have also attached a log snippet. Can someone please help narrow down
the issue?

Thanks,
Akash
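
One way to see how often and for how long system-critical threads are reported as blocked is to scan the node logs for the error quoted above. The following is a minimal sketch, not an official Ignite tool. The pattern is built only from the single message quoted in this thread, so other Ignite versions may format it differently, and the function name find_blocked_threads is hypothetical.

```python
import re

# Pattern derived from the one error message quoted in this thread.
# Assumption: other Ignite versions may word or format it differently.
BLOCKED_RE = re.compile(
    r"Blocked system-critical thread has been detected\."
    r".*?\[threadName=(?P<thread>[^,\]]+),\s*blockedFor=(?P<secs>\d+)s\]",
    re.DOTALL,  # the message may be wrapped across several lines
)


def find_blocked_threads(log_text):
    """Return a list of (thread_name, blocked_seconds) found in log_text."""
    return [
        (m.group("thread"), int(m.group("secs")))
        for m in BLOCKED_RE.finditer(log_text)
    ]


if __name__ == "__main__":
    sample = (
        "Blocked system-critical thread has been detected. This can lead to\n"
        "cluster-wide undefined behaviour [threadName=tcp-disco-msg-worker,\n"
        "blockedFor=14s]\n"
    )
    for thread, secs in find_blocked_threads(sample):
        print(f"{thread} blocked for {secs}s")
```

Running it over the full log of each node and comparing the reported durations with the configured failure detection timeout may help show whether the discovery worker is stalled for long enough to trigger segmentation.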

Re: nodes getting disconnected from cluster

Posted by Aat <as...@gmail.com>.
It works.
Thanks



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/

Re: nodes getting disconnected from cluster

Posted by Ilya Kasnacheev <il...@gmail.com>.
Hello!

I don't see any lengthy GC pauses, yet one node was segmented. It is
unclear what exactly caused this.

Can you try increasing failureDetectionTimeout to 2 minutes (120000 ms) and
retrying? Please attach the logs if the failure happens again.

Regards,
-- 
Ilya Kasnacheev


On Tue, Jan 8, 2019 at 17:33, Akash Shinde <ak...@gmail.com> wrote:

> Hi Evgenii,
>
> I am starting 7 Ignite nodes on 7 VMs. To narrow down the problem, I
> started only two server nodes on two VMs, core03 and core04. Initially
> these VMs were on different VHS, so we moved both onto the same VHS (to
> avoid network issues) and checked the network bandwidth using iperf. The
> network bandwidth is now 6.7 Gbps. I then started one client node from a
> laptop just to check the cluster status.
>
> Even after doing this I am facing the same problem: the nodes get
> segmented during data loading.
>
> I have attached the logs for the two server nodes. They also contain the
> GC logs.
>
>
> Thanks,
> Akash

Re: nodes getting disconnected from cluster

Posted by Akash Shinde <ak...@gmail.com>.
Hi Evgenii,

I am starting 7 Ignite nodes on 7 VMs. To narrow down the problem, I
started only two server nodes on two VMs, core03 and core04. Initially
these VMs were on different VHS, so we moved both onto the same VHS (to
avoid network issues) and checked the network bandwidth using iperf. The
network bandwidth is now 6.7 Gbps. I then started one client node from a
laptop just to check the cluster status.

Even after doing this I am facing the same problem: the nodes get
segmented during data loading.

I have attached the logs for the two server nodes. They also contain the
GC logs.


Thanks,
Akash
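
iperf reports throughput, which does not show how long a single small exchange between the nodes takes. As a rough complementary check, the sketch below times repeated TCP connects to a node's discovery port while the data load is running. It is only an illustration under stated assumptions: the port 47500 is the discovery port that appears elsewhere in this thread, the host name in the usage note is a placeholder, and connect_latency_ms and probe are hypothetical helper names, not part of Ignite.

```python
import socket
import time


def connect_latency_ms(host, port, timeout=5.0):
    """Time a single TCP connect to host:port and return milliseconds."""
    start = time.perf_counter()
    # The connection is opened and closed immediately; no data is sent.
    with socket.create_connection((host, port), timeout=timeout):
        pass
    return (time.perf_counter() - start) * 1000.0


def probe(host, port, attempts=20, pause=0.5):
    """Return one latency sample per attempt; None marks a failed connect."""
    samples = []
    for _ in range(attempts):
        try:
            samples.append(connect_latency_ms(host, port))
        except OSError:
            # Covers refused connections, unreachable hosts, and timeouts.
            samples.append(None)
        time.sleep(pause)
    return samples
```

For example, `probe("core04", 47500)` run from core03 during the data load would return a list of connect times in milliseconds. Occasional very large values or None entries would point at the network or the remote host rather than at Ignite itself. The short-lived connections may show up as warnings in the target node's log.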

On Tue, Jan 8, 2019 at 6:00 AM Evgenii Zhuravlev <e....@gmail.com>
wrote:

> Hi,
>
> Can you share logs from all nodes, especially from node
> qagmscore02/10.114.113.53:47500?
>
> Evgenii

Re: nodes getting disconnected from cluster

Posted by Evgenii Zhuravlev <e....@gmail.com>.
Hi,

Can you share logs from all nodes, especially from node
qagmscore02/10.114.113.53:47500?

Evgenii

On Mon, Jan 7, 2019 at 08:14, Akash Shinde <ak...@gmail.com> wrote:

> Hi,
> Could someone please help me with this issue?
>
> Thanks,
> Akash

Re: nodes getting disconnected from cluster

Posted by Akash Shinde <ak...@gmail.com>.
Hi,
Could someone please help me with this issue?

Thanks,
Akash
