
Posted to dev@ignite.apache.org by Vladimir Steshin <vl...@gmail.com> on 2020/06/04 20:23:56 UTC

Question: network issues of single node.

     Hi, Igniters.


     I would like to ask how one node can be unable to connect to another
while the rest of the cluster can. This case is covered in [1]. In short:
node 3 can't connect to nodes 4 and 5 but can connect to node 1. At the same
time, node 2 can connect to node 4. Questions:

1) Is this a real case? Where did this problem come from?

2) If node 3 can't connect to 4 and 5, does it mean node 2 can't connect
to 4 (and 5) either?

Sergey, Dmitry, maybe you can shed some light (I see you in [1])? I'm
participating in [2] and came across this backward connection check.
An answer would help us a lot.

Thanks!

[1] https://issues.apache.org/jira/browse/IGNITE-7163

[2] https://cwiki.apache.org/confluence/display/IGNITE/IEP-45%3A+Crash+Recovery+Speed-Up

Re: Question: network issues of single node.

Posted by Sergey Chugunov <se...@gmail.com>.

Of course, I meant that ticket [1] increased cluster stability in
blinking-network situations.

[1] https://issues.apache.org/jira/browse/IGNITE-7163

Re: Question: network issues of single node.

Posted by Sergey Chugunov <se...@gmail.com>.

Vladimir,

Adding to what Alexey has said, I remember that cases of short-term network
issues (a blinking network) were also a driver for this improvement. They are
indeed hard to reproduce but have been seen in real-world setups and have
proven to increase cluster stability.

Re: Question: network issues of single node.

Posted by Alexey Goncharuk <al...@gmail.com>.

Vladimir,

Such behavior can be caused by an erroneous firewall configuration (I
can't find a link, but I remember that quite a large number of major
incidents are caused by an incorrect configuration change). If such a case
can be detected, we prefer that Ignite shut down some of the nodes rather
than leave the whole cluster hanging while it waits for a connection.
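
To make the scenario from the original question concrete, here is a toy sketch of a backward connection check. It is not Ignite code and does not claim to mirror what [1] implements. The class and method names are hypothetical, and the ring order 1 -> 2 -> 3 -> 4 -> 5 -> 1 is an assumption made only for illustration. A node that cannot reach its successors asks the first node it can reach to probe them. If that node still sees them, the complaining node is treated as the broken one.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of a discovery ring with a backward connection check (not Ignite code). */
final class BackwardCheckSketch {
    /** Builds reach[a][b]: true when node a can open a connection to node b. Index 0 is unused. */
    static boolean[][] topologyFromThread() {
        int size = 6; // nodes 1..5
        boolean[][] reach = new boolean[size][size];
        for (int a = 1; a < size; a++)
            for (int b = 1; b < size; b++)
                reach[a][b] = true;
        // The case from the question: node 3 cannot connect to nodes 4 and 5.
        reach[3][4] = false;
        reach[3][5] = false;
        return reach;
    }

    /** Node that is 'steps' positions after 'node' in the assumed ring 1 -> 2 -> ... -> n -> 1. */
    static int successor(int node, int steps, int n) {
        return (node - 1 + steps) % n + 1;
    }

    /** Successors that 'from' fails to reach before it finds one it can connect to. */
    static List<Integer> unreachableSuccessors(boolean[][] reach, int from) {
        int n = reach.length - 1;
        List<Integer> skipped = new ArrayList<>();
        for (int step = 1; step < n; step++) {
            int next = successor(from, step, n);
            if (reach[from][next])
                break;
            skipped.add(next);
        }
        return skipped;
    }

    /** True when the backward check suggests that 'from' itself has the network problem. */
    static boolean shouldStopItself(boolean[][] reach, int from) {
        int n = reach.length - 1;
        List<Integer> skipped = unreachableSuccessors(reach, from);
        if (skipped.isEmpty())
            return false; // the direct successor is reachable, nothing to check
        if (skipped.size() == n - 1)
            return true; // 'from' cannot reach anyone at all
        // The first reachable node probes the nodes that 'from' gave up on.
        int acceptor = successor(from, skipped.size() + 1, n);
        for (int node : skipped)
            if (reach[acceptor][node])
                return true; // the skipped node is alive, so 'from' is the odd one out
        return false;
    }
}
```

With this topology, unreachableSuccessors(reach, 3) gives [4, 5] and shouldStopItself(reach, 3) is true, because node 1 can still reach node 4. For node 2 the result is false, which also shows that node 3 losing nodes 4 and 5 says nothing about node 2's connectivity.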

Re: Question: network issues of single node.

Posted by Denis Magda <dm...@apache.org>.

I finally got your question.

Back in 2017-2018, there was a Discovery SPI stabilization effort. The
networking component could fail in various hard-to-reproduce scenarios,
affecting cluster availability and consistency. That ticket reminds me of
those notorious issues that would fire once a week or month under specific
configuration settings. So, I would not touch the code that fixes the issue
unless @Alexey Goncharuk <al...@gmail.com> or @Sergey Chugunov
<sc...@gridgain.com> confirms that it's safe to do so. Also, there should
be a test for this scenario.

-
Denis


Re: Question: network issues of single node.

Posted by Vladimir Steshin <vl...@gmail.com>.

Denis,

I don't have any nodes that fail to interconnect. This case is simulated
in IgniteDiscoveryMassiveNodeFailTest.testMassiveFailSelfKill(),
introduced in [1].

I'm asking whether it is a real problem or a hypothetical one. Where was it
encountered? Which network configurations or issues could cause it?

[1] https://issues.apache.org/jira/browse/IGNITE-7163

Re: Question: network issues of single node.

Posted by Denis Magda <dm...@apache.org>.

Vladimir,

I suggest sharing the log files from the nodes that are unable to
interconnect so that the community can check them for potential issues.
Instead of sharing the logs from all 5 nodes, try starting a two-node
cluster with the nodes that fail to discover each other and attach the logs
from those.

-
Denis


Re: Question: network issues of single node.

Posted by Vladimir Steshin <vl...@gmail.com>.

Denis, hi.

     Sorry, I didn't catch your idea. Are you saying this can happen and
suggesting an experiment? I'm not describing a hypothetical case. It is
already implemented in [1]. I'm asking whether it is real and where it was
encountered.


Re: Question: network issues of single node.

Posted by Denis Magda <dm...@apache.org>.

Vladimir,

Please do the following experiment. Start a 2-node cluster by booting node 3
and, for instance, node 5. According to your description, those won't be
able to interconnect. Attach the log files from both nodes for analysis.
This should be a networking issue.

-
Denis

