Posted to user@ignite.apache.org by Ray Zhang <ra...@broadcom.com> on 2022/05/26 01:04:09 UTC

Issue with establishing the baseline after nodes restart

Hi all,

I have been having a hard time finding a proper procedure to re-establish
the baseline of a 3-node Ignite cluster after all of its nodes restart.
I am running Ignite 2.8.1.

The baseline command has basically not been functional at all since all the
nodes in the Ignite cluster went through a rolling restart.

If I check the "top" view in Visor, the output shows the 3 expected nodes
that form the cluster, and all caches are in good status.

The output of the baseline command is included below. It seems to suggest
that one of the nodes can no longer be found, and the same output is
returned when the command is run on any of the nodes in the cluster.

My question is: how can I reset the baseline state? In the current state,
no matter which command I run with *control.sh --baseline*, whether to add,
remove or set a new baseline, it always returns the same error. I have also
tried to 'deactivate' and then 'reactivate' the cluster, and restarted the
nodes a number of times, but none of these steps worked.

Is this a known bug? Any suggestions to get out of this state?

/usr/share/apache-ignite/bin/control.sh --baseline
Control utility [ver. 2.8.1#20200521-sha1:86422096]
2020 Copyright(C) Apache Software Foundation
User: root
Time: 2022-05-26T00:51:26.874
Command [BASELINE] started
Arguments: --baseline
--------------------------------------------------------------------------------
Failed to execute baseline command='collect'
Node with id=b23787ee-dbc2-4407-93a5-d5f92ff450ad not found
Check arguments. Node with id=b23787ee-dbc2-4407-93a5-d5f92ff450ad not found
Command [BASELINE] finished with code: 1
Control utility has completed execution at: 2022-05-26T00:51:27.161
Execution time: 287 ms

Thanks.

Ray

-- 
This electronic communication and the information and any files transmitted 
with it, or attached to it, are confidential and are intended solely for 
the use of the individual or entity to whom it is addressed and may contain 
information that is confidential, legally privileged, protected by privacy 
laws, or otherwise restricted from disclosure to anyone else. If you are 
not the intended recipient or the person responsible for delivering the 
e-mail to the intended recipient, you are hereby notified that any use, 
copying, distributing, dissemination, forwarding, printing, or copying of 
this e-mail is strictly prohibited. If you received this e-mail in error, 
please return the e-mail to the sender, delete it from your computer, and 
destroy any printed copy of it.

Re: Issue with establishing the baseline after nodes restart

Posted by Ray Zhang <ra...@broadcom.com>.
Thanks for the information, Veena.

This is what I figured out too. Here is what I did to find the offending
client: I enabled REST API access on the Ignite servers and then queried
the node information using the node id mentioned in the error message. The
response includes the host name or IP of the node that is having the issue,
and restarting that node resolves that particular error. As you said,
another similar error may then pop up, and the process needs to be repeated
until all of those clients have been restarted.

curl 'localhost:8081/ignite?cmd=node&id=PUT_NODE_ID_HERE' 2>/dev/null \
  | jq '.response|.consistentId+.tcpHostNames[0]'
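
For anyone repeating this, the lookup can be scripted. What follows is a
rough sketch, not a tested tool. It assumes the error text has the same
"Node with id=... not found" shape as the control.sh output earlier in this
thread, and that the REST endpoint is reachable at localhost:8081 as in the
command above. The names extract_missing_node_id, lookup_node and REST_URL
are made up for illustration.

```shell
# Sketch only: helper names are hypothetical, not part of Ignite.

# Read control.sh output on stdin and print the first node id that
# appears in a "Node with id=<uuid> not found" message, if any.
extract_missing_node_id() {
  grep -oE 'Node with id=[0-9a-fA-F-]{36}' | head -n 1 | cut -d= -f2
}

# Query the REST API for that node and print its consistent id and
# first host name, separated by a space.
# REST_URL is an assumption; adjust host and port to your setup.
lookup_node() {
  node_id="$1"
  rest_url="${REST_URL:-localhost:8081/ignite}"
  curl -s "${rest_url}?cmd=node&id=${node_id}" \
    | jq -r '.response | "\(.consistentId) \(.tcpHostNames[0])"'
}
```

Usage would be along the lines of piping the output of control.sh --baseline
into extract_missing_node_id and passing the result to lookup_node. Depending
on whether the error goes to stdout or stderr, a 2>&1 may be needed on the
control.sh side.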

Ray

On Tue, Jun 21, 2022 at 6:08 AM Veena Mithare <ve...@gmail.com>
wrote:

> Hello,
>
> Not sure if you are still facing this issue.
>
> When we faced this issue, we would find out which client node the
> baseline command was reporting as not found, and restart that client node.
>
> Sometimes, after restarting the client node, the baseline command would say
> the same thing about another client node, so we would need to restart
> that client node as well, and so on.
>
> Once all the nodes that the baseline command mentions have been restarted,
> the baseline command works correctly.
>
> regards,
> Veena.

Re: Issue with establishing the baseline after nodes restart

Posted by Veena Mithare <ve...@gmail.com>.
Hello,

Not sure if you are still facing this issue.

When we faced this issue, we would find out which client node the
baseline command was reporting as not found, and restart that client node.

Sometimes, after restarting the client node, the baseline command would say
the same thing about another client node, so we would need to restart
that client node as well, and so on.

Once all the nodes that the baseline command mentions have been restarted,
the baseline command works correctly.
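
The repeat-until-clean procedure above could be sketched as a small shell
loop. This is only an illustration under assumptions: the control.sh path is
the one from the original post, the error text matches the
"Node with id=... not found" shape shown there, and the restart itself is
left to the operator, since how a client node gets restarted depends on the
deployment. BASELINE_CMD, next_missing_node and restart_until_clean are
hypothetical names.

```shell
# Sketch only: names below are hypothetical, not part of Ignite.
# Command used to print the baseline; override to suit your install.
BASELINE_CMD="${BASELINE_CMD:-/usr/share/apache-ignite/bin/control.sh --baseline}"

# Run the baseline command once and print the id of the node it
# reports as "not found"; prints nothing if no such error appears.
# $BASELINE_CMD is left unquoted on purpose so it splits into words.
next_missing_node() {
  $BASELINE_CMD </dev/null 2>&1 \
    | grep -oE 'Node with id=[0-9a-fA-F-]{36}' \
    | head -n 1 \
    | cut -d= -f2
}

# Keep asking the operator to restart the reported client node until
# the baseline command stops complaining, or max_rounds is reached.
restart_until_clean() {
  max_rounds="${1:-10}"
  round=1
  while [ "$round" -le "$max_rounds" ]; do
    # "|| true" tolerates a non-zero status when nothing matches.
    node_id="$(next_missing_node || true)"
    if [ -z "$node_id" ]; then
      echo "No missing node reported after $((round - 1)) restart(s)."
      return 0
    fi
    echo "Round $round: restart the client node with id=$node_id"
    printf 'Press Enter once it has been restarted... ' >&2
    read -r reply
    round=$((round + 1))
  done
  echo "Gave up after $max_rounds rounds." >&2
  return 1
}
```

Calling restart_until_clean 5 would stop after at most five rounds. The cap
is there so the loop cannot spin forever if the error has a different cause.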

regards,
Veena.
