Posted to user@ignite.apache.org by Akash Shinde <ak...@gmail.com> on 2019/01/03 12:16:25 UTC
nodes getting disconnected from cluster
Hi,
I am getting a "Timed out waiting for message delivery receipt" WARN message
in my logs.
But I am sure that it is not caused by a long GC pause. I have
checked the memory utilization and it is very low.
I also checked the connectivity between the two nodes between which the
timeout is happening.
The bandwidth is as shown below.
[ ID]  Interval        Transfer      Bandwidth
[  4]  0.0-10.1 sec    855 MBytes    708 Mbits/sec
I often get the following message in my logs. Is it because the two nodes are
not able to communicate within the given time limit?
*ERROR:*
Blocked system-critical thread has been detected. This can lead to
cluster-wide undefined behaviour [threadName=tcp-disco-msg-worker,
blockedFor=14s]
I have also attached a log snippet. Can someone please help narrow down
the issue?
Thanks,
Akash
Re: nodes getting disconnected from cluster
Posted by Aat <as...@gmail.com>.
It works.
Thanks
Re: nodes getting disconnected from cluster
Posted by Ilya Kasnacheev <il...@gmail.com>.
Hello!
I don't see any lengthy GC pauses, yet one node was segmented. It is
unclear what exactly caused this.
Can you try increasing failureDetectionTimeout to 2 minutes (120000) and
retrying? Please attach logs if it fails again.
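For reference, a minimal sketch of setting this programmatically. It assumes the node is started from Java code with ignite-core on the classpath and otherwise default settings; the class name is only illustrative. If the nodes are configured through Spring XML instead, the same failureDetectionTimeout property can be set on the IgniteConfiguration bean.

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;

// Illustrative class name; not taken from the attached logs or configuration.
public class FailureDetectionTimeoutExample {
    public static void main(String[] args) {
        IgniteConfiguration cfg = new IgniteConfiguration();

        // 2 minutes, in milliseconds, as suggested above.
        cfg.setFailureDetectionTimeout(120_000L);

        // Ignite is AutoCloseable, so the node is stopped when the block exits.
        try (Ignite ignite = Ignition.start(cfg)) {
            System.out.println("failureDetectionTimeout = "
                + ignite.configuration().getFailureDetectionTimeout() + " ms");
        }
    }
}
```

The value needs to be set on every server node for the test to be meaningful.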
Regards,
--
Ilya Kasnacheev
On Tue, Jan 8, 2019 at 17:33, Akash Shinde <ak...@gmail.com> wrote:
> Hi Evgenii,
>
> I am starting 7 Ignite nodes on 7 VMs. But to narrow down the problem I
> started only two server nodes on two VMs, core03 and core04. Initially
> these VMs were on different VHS, so we moved them onto the same VHS (to
> avoid network issues) and checked the network bandwidth using iperf. The
> network bandwidth is now 6.7 Gbps. I then started one client node from a
> laptop just to check the cluster status.
>
> But even after doing this I am facing the same problem. The nodes get
> segmented during data loading.
>
> I have attached the logs for the two server nodes. They also contain GC logs.
>
>
> Thanks,
> Akash
Re: nodes getting disconnected from cluster
Posted by Akash Shinde <ak...@gmail.com>.
Hi Evgenii,
I am starting 7 Ignite nodes on 7 VMs. But to narrow down the problem I
started only two server nodes on two VMs, core03 and core04. Initially
these VMs were on different VHS, so we moved them onto the same VHS (to
avoid network issues) and checked the network bandwidth using iperf. The
network bandwidth is now 6.7 Gbps. I then started one client node from a
laptop just to check the cluster status.
But even after doing this I am facing the same problem. The nodes get
segmented during data loading.
I have attached the logs for the two server nodes. They also contain GC logs.
Thanks,
Akash
On Tue, Jan 8, 2019 at 6:00 AM Evgenii Zhuravlev <e....@gmail.com>
wrote:
> Hi,
>
> Can you share logs from all nodes, especially from node
> qagmscore02/10.114.113.53:47500?
>
> Evgenii
Re: nodes getting disconnected from cluster
Posted by Evgenii Zhuravlev <e....@gmail.com>.
Hi,
Can you share logs from all nodes, especially from node
qagmscore02/10.114.113.53:47500?
Evgenii
On Mon, Jan 7, 2019 at 08:14, Akash Shinde <ak...@gmail.com> wrote:
> Hi,
> Could someone please help me with this issue?
>
> Thanks,
> Akash
Re: nodes getting disconnected from cluster
Posted by Akash Shinde <ak...@gmail.com>.
Hi,
Could someone please help me with this issue?
Thanks,
Akash