Posted to issues@kudu.apache.org by "Attila Bukor (JIRA)" <ji...@apache.org> on 2017/07/27 14:23:00 UTC

[jira] [Created] (KUDU-2080) Masters stuck in a bad state when not starting them together on initial deployment

Attila Bukor created KUDU-2080:
----------------------------------

             Summary: Masters stuck in a bad state when not starting them together on initial deployment
                 Key: KUDU-2080
                 URL: https://issues.apache.org/jira/browse/KUDU-2080
             Project: Kudu
          Issue Type: Bug
          Components: master
    Affects Versions: 1.3.0
            Reporter: Attila Bukor


When masters are started separately on the first run, they cannot connect to each other while trying to write the consensus data, so the write fails.

{code}
I0726 14:15:22.894768 55240 consensus_peers.cc:503] Retrying to get permanent uuid for remote peer: member_type: VOTER last_known_addr { host: "master1.example.com" port: 7051 } attempt: 10
W0726 14:15:22.895084 55240 consensus_peers.cc:493] Error getting permanent uuid from config peer master1.example.com:7051: Network error: Client connection negotiation failed: client connection to 10.1.0.1:7051: connect: Connection refused (error 111)
I0726 14:15:36.235213 55240 consensus_peers.cc:503] Retrying to get permanent uuid for remote peer: member_type: VOTER last_known_addr { host: "master1.example.com" port: 7051 } attempt: 11
W0726 14:15:36.235498 55240 consensus_peers.cc:493] Error getting permanent uuid from config peer master1.example.com:7051: Network error: Client connection negotiation failed: client connection to 10.1.0.1:7051: connect: Connection refused (error 111)
E0726 14:15:36.235572 55240 master.cc:171] Master@0.0.0.0:7051: Unable to init master catalog manager: Timed out: Unable to initialize catalog manager: Failed to initialize sys tables async: Failed to create new distributed Raft config: Unable to resolve UUID for peer member_type: VOTER last_known_addr { host: "master1.example.com" port: 7051 }: Getting permanent uuid from master1.example.com:7051 timed out after 30000 ms.: Network error: Client connection negotiation failed: client connection to 10.1.0.1:7051: connect: Connection refused (error 111)
F0726 14:15:36.235663 55079 master_main.cc:71] Check failed: _s.ok() Bad status: Timed out: Unable to initialize catalog manager: Failed to initialize sys tables async: Failed to create new distributed Raft config: Unable to resolve UUID for peer member_type: VOTER last_known_addr { host: "master1.example.com" port: 7051 }: Getting permanent uuid from master1.example.com:7051 timed out after 30000 ms.: Network error: Client connection negotiation failed: client connection to 10.1.0.1:7051: connect: Connection refused (error 111)
{code}
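
The log above shows a peer's RPC port refusing connections until the 30000 ms timeout expires. As a rough diagnostic sketch (not part of Kudu), the following checks which master addresses are not yet accepting TCP connections. The helper name `unreachable_peers` and the peer list passed to it are hypothetical; only the host and port in the usage line come from the log.

```python
import socket


def unreachable_peers(peers, timeout_s=2.0):
    """Return the (host, port) pairs that do not accept a TCP connection.

    `peers` is an iterable of (host, port) tuples. This only tests that
    something is listening on the port; it does not speak Kudu's RPC protocol.
    """
    down = []
    for host, port in peers:
        try:
            # A successful connect means the port is open; close it right away.
            with socket.create_connection((host, port), timeout=timeout_s):
                pass
        except OSError:
            # Covers connection refused, timeouts, and name resolution errors.
            down.append((host, port))
    return down
```

For example, `unreachable_peers([("master1.example.com", 7051)])` would return that pair for as long as master1 is not listening.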

After this, the tablet-meta is present but the consensus-meta is missing, and startup keeps failing until every master's data directory is emptied and all masters are started again at the same time (similar to KUDU-1186):

{code}
I0726 14:20:52.455219 58429 sys_catalog.cc:128] Verifying existing consensus state
E0726 14:20:52.455294 58429 master.cc:171] Master@0.0.0.0:7051: Unable to init master catalog manager: Not found: Unable to initialize catalog manager: Failed to initialize sys tables async: Unable to load consensus metadata for tablet 00000000000000000000000000000000: /data/kudu/master/data/consensus-meta/00000000000000000000000000000000: No such file or directory (error 2)
F0726 14:20:52.455400 58268 master_main.cc:71] Check failed: _s.ok() Bad status: Not found: Unable to initialize catalog manager: Failed to initialize sys tables async: Unable to load consensus metadata for tablet 00000000000000000000000000000000: /data/kudu/master/data/consensus-meta/00000000000000000000000000000000: No such file or directory (error 2)
{code}
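
The bad state described above (tablet-meta present, consensus-meta missing) can be detected with a simple file check. This is a minimal sketch, not a Kudu tool: the function name `is_half_initialized` is hypothetical, and it assumes the `tablet-meta` directory sits next to the `consensus-meta` directory shown in the log, which the report does not state explicitly.

```python
from pathlib import Path

# Tablet ID of the master's system catalog, as it appears in the log above.
SYS_CATALOG_TABLET_ID = "0" * 32


def is_half_initialized(metadata_root, tablet_id=SYS_CATALOG_TABLET_ID):
    """Return True if tablet metadata exists but consensus metadata does not.

    Assumption: both files live under `metadata_root`, in `tablet-meta/` and
    `consensus-meta/` respectively, each named after the tablet ID.
    """
    root = Path(metadata_root)
    tablet_meta = root / "tablet-meta" / tablet_id
    consensus_meta = root / "consensus-meta" / tablet_id
    return tablet_meta.exists() and not consensus_meta.exists()
```

With the paths from the log, this would be called as `is_half_initialized("/data/kudu/master/data")`.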



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)