Posted to user@hbase.apache.org by Manuel Sopena Ballesteros <ma...@garvan.org.au> on 2020/03/20 23:45:28 UTC

regionserver can't connect to master

Dear HBase community,

I am having an issue with my Ambari HBase deployment: the RegionServer is not able to connect to the Master.

HBase Master log:
2020-03-21 02:36:53,614 INFO [Thread-16] master.ServerManager: Waiting on regionserver count=0; waited=3174901ms, expecting min=1 server(s), max=NO_LIMIT server(s), timeout=30000ms, lastChange=-3174901ms
2020-03-21 02:36:54,287 WARN [master/gl-hdp-ctrl03:16000] assignment.AssignmentManager: No servers available; cannot place 1 unassigned regions.
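
For readers skimming the master log above, here is a minimal Python sketch that pulls the key=value fields out of the ServerManager line and converts the wait time to minutes. The helper names `parse_fields` and `to_ms` are illustrative and are not part of HBase; the only input is the log line quoted in this message.

```python
import re

# Master log line copied from the message above.
LINE = (
    "2020-03-21 02:36:53,614 INFO [Thread-16] master.ServerManager: "
    "Waiting on regionserver count=0; waited=3174901ms, expecting min=1 server(s), "
    "max=NO_LIMIT server(s), timeout=30000ms, lastChange=-3174901ms"
)

def parse_fields(line):
    """Collect the key=value pairs of a log line into a dict of strings."""
    return dict(re.findall(r"(\w+)=(-?\w+)", line))

def to_ms(value):
    """Convert a value such as '3174901ms' to an integer number of milliseconds."""
    return int(value[:-2]) if value.endswith("ms") else int(value)

fields = parse_fields(LINE)
waited_minutes = to_ms(fields["waited"]) / 60_000
print(f"regionserver count: {fields['count']}")
print(f"waited: about {waited_minutes:.1f} minutes")
```

Read this way, the line says the count was still 0 after roughly 53 minutes of waiting, far beyond the 30000 ms timeout shown in the same line.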

HBase RegionServer log:
Caused by: java.net.ConnectException: Call to gl-hdp-ctrl03.local/192.168.20.248:16000 failed on connection exception: org.apache.hbase.thirdparty.io.netty.channel.ConnectTimeoutException: connection timed out: gl-hdp-ctrl03.local/192.168.20.248:16000
at org.apache.hadoop.hbase.ipc.IPCUtil.wrapException(IPCUtil.java:166)
at org.apache.hadoop.hbase.ipc.AbstractRpcClient.onCallFinished(AbstractRpcClient.java:390)
at org.apache.hadoop.hbase.ipc.AbstractRpcClient.access$100(AbstractRpcClient.java:95)
at org.apache.hadoop.hbase.ipc.AbstractRpcClient$3.run(AbstractRpcClient.java:410)
at org.apache.hadoop.hbase.ipc.AbstractRpcClient$3.run(AbstractRpcClient.java:406)
at org.apache.hadoop.hbase.ipc.Call.callComplete(Call.java:103)
at org.apache.hadoop.hbase.ipc.Call.setException(Call.java:118)


Connectivity test from the RegionServer to the Master:
$ telnet gl-hdp-ctrl03.local 16000
Trying 192.168.20.248...
Connected to gl-hdp-ctrl03.local.
Escape character is '^]'.
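
The telnet test only shows that the TCP handshake succeeds from an interactive shell. A minimal Python sketch of the same check with an explicit timeout is given here, so it can be run repeatedly from the RegionServer host. The function name `can_connect` and the 5-second default are assumptions made for illustration; this is not what the HBase RPC client does internally.

```python
import socket

def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to (host, port) opens within `timeout` seconds."""
    try:
        # create_connection resolves the host name and tries each returned address.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False

def report(host, ports, timeout=5.0):
    """Print one line per port saying whether it accepted a connection."""
    for port in ports:
        state = "open" if can_connect(host, port, timeout) else "unreachable"
        print(f"{host}:{port} {state}")
```

For example, `report("gl-hdp-ctrl03.local", [16000])` would repeat the check above. A successful result only shows that the port accepts connections; it does not prove that the RegionServer process itself can complete an RPC call.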

Any idea why the RegionServer can't connect?

Thank you very much
NOTICE
Please consider the environment before printing this email. This message and any attachments are intended for the addressee named and may contain legally privileged/confidential/copyright information. If you are not the intended recipient, you should not read, use, disclose, copy or distribute this communication. If you have received this message in error please notify us at once by return email and then delete both messages. We accept no liability for the distribution of viruses or similar in electronic communications. This notice should not be removed.

Re: regionserver can't connect to master

Posted by Viraj Jasani <vj...@apache.org>.
Thanks, Manuel.

OK, so the connection to port 16000 from the RegionServer looks good. At this point, all that is left is to start the HM and RS again and debug further. I hope port 16010 is accessible as well, for the UI.

By the way, the API response you provided lists all services present in the blueprint, not necessarily only the running ones: it includes HBase even though the HM and RS are down.
Anyway, it is recommended to bring up the cluster with a stable version: https://downloads.apache.org/hbase/stable/
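
On the point about the API response: a rough Python sketch of how running services could be separated from merely installed ones. It assumes the same Ambari endpoint is queried with `?fields=ServiceInfo/state` so that each item carries a `state` value, and that a running service reports `STARTED`; both are assumptions to check against the Ambari version in use. The sample payload is hypothetical: the service names come from this thread, the states are invented for illustration.

```python
import json

def running_services(payload, running_state="STARTED"):
    """Return the names of services whose ServiceInfo.state equals `running_state`."""
    names = []
    for item in payload.get("items", []):
        info = item.get("ServiceInfo", {})
        if info.get("state") == running_state:
            names.append(info.get("service_name"))
    return names

# Hypothetical response body; the states are made up, not taken from the cluster.
sample = json.loads("""
{
  "items": [
    {"ServiceInfo": {"cluster_name": "Grandline", "service_name": "HDFS", "state": "STARTED"}},
    {"ServiceInfo": {"cluster_name": "Grandline", "service_name": "HBASE", "state": "INSTALLED"}},
    {"ServiceInfo": {"cluster_name": "Grandline", "service_name": "ZOOKEEPER", "state": "STARTED"}}
  ]
}
""")

print(running_services(sample))  # ['HDFS', 'ZOOKEEPER']
```

A plain listing without the `state` field, like the one in this thread, cannot distinguish these cases, which is why HBASE appears there even though the HM and RS are down.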


On 2020/03/23 05:14:58, Manuel Sopena Ballesteros <ma...@garvan.org.au> wrote: 

Re: regionserver can't connect to master

Posted by Manuel Sopena Ballesteros <ma...@garvan.org.au>.
Hi Jasani,


> Which HBase version are you using?

[luffy@gl-hdp-ctrl03 ~]$ hbase version
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/hdp/3.1.0.0-78/phoenix/phoenix-5.0.0.3.1.0.0-78-server.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/3.1.0.0-78/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
HBase 2.0.2.3.1.0.0-78
Source code repository git://ctr-e138-1518143905142-586755-01-000023.hwx.site/grid/0/jenkins/workspace/HDP-parallel-centos7/SOURCES/hbase revision=
Compiled by jenkins on Thu Dec  6 12:27:45 UTC 2018
From source with checksum 015c34650c163b249d16fc7e496a030e


> You are bringing up a fresh cluster and not doing an upgrade, right?

Yes, this is a fresh cluster that I am deploying through Ambari blueprints (I always reset Ambari to factory settings before deploying the blueprint).

> Has Ambari successfully brought up the NameNodes and DataNodes?

I think so.

[cid:0cef77dd-f616-45ef-8214-e0bb0006b665]


> How many components are already running so far?

{
  "href" : "http://10.0.1.245:8080/api/v1/clusters/Grandline/services",
  "items" : [
    {
      "href" : "http://10.0.1.245:8080/api/v1/clusters/Grandline/services/AMBARI_METRICS",
      "ServiceInfo" : {
        "cluster_name" : "Grandline",
        "service_name" : "AMBARI_METRICS"
      }
    },
    {
      "href" : "http://10.0.1.245:8080/api/v1/clusters/Grandline/services/HBASE",
      "ServiceInfo" : {
        "cluster_name" : "Grandline",
        "service_name" : "HBASE"
      }
    },
    {
      "href" : "http://10.0.1.245:8080/api/v1/clusters/Grandline/services/HDFS",
      "ServiceInfo" : {
        "cluster_name" : "Grandline",
        "service_name" : "HDFS"
      }
    },
    {
      "href" : "http://10.0.1.245:8080/api/v1/clusters/Grandline/services/HIVE",
      "ServiceInfo" : {
        "cluster_name" : "Grandline",
        "service_name" : "HIVE"
      }
    },
    {
      "href" : "http://10.0.1.245:8080/api/v1/clusters/Grandline/services/MAPREDUCE2",
      "ServiceInfo" : {
        "cluster_name" : "Grandline",
        "service_name" : "MAPREDUCE2"
      }
    },
    {
      "href" : "http://10.0.1.245:8080/api/v1/clusters/Grandline/services/SMARTSENSE",
      "ServiceInfo" : {
        "cluster_name" : "Grandline",
        "service_name" : "SMARTSENSE"
      }
    },
    {
      "href" : "http://10.0.1.245:8080/api/v1/clusters/Grandline/services/SPARK2",
      "ServiceInfo" : {
        "cluster_name" : "Grandline",
        "service_name" : "SPARK2"
      }
    },
    {
      "href" : "http://10.0.1.245:8080/api/v1/clusters/Grandline/services/TEZ",
      "ServiceInfo" : {
        "cluster_name" : "Grandline",
        "service_name" : "TEZ"
      }
    },
    {
      "href" : "http://10.0.1.245:8080/api/v1/clusters/Grandline/services/YARN",
      "ServiceInfo" : {
        "cluster_name" : "Grandline",
        "service_name" : "YARN"
      }
    },
    {
      "href" : "http://10.0.1.245:8080/api/v1/clusters/Grandline/services/ZEPPELIN",
      "ServiceInfo" : {
        "cluster_name" : "Grandline",
        "service_name" : "ZEPPELIN"
      }
    },
    {
      "href" : "http://10.0.1.245:8080/api/v1/clusters/Grandline/services/ZOOKEEPER",
      "ServiceInfo" : {
        "cluster_name" : "Grandline",
        "service_name" : "ZOOKEEPER"
      }
    }
  ]
}


> Are they connected (e.g. NN and DN), with only the RS having trouble connecting to the HM?

Yes, this is my understanding.

> Although telnet looks fine, can you also try "nc -zv gl-hdp-ctrl03.local 16000" from the RS, just to double-check?

$ nc -zv gl-hdp-ctrl03.local 16000
Ncat: Version 7.50 ( https://nmap.org/ncat )
Ncat: Connected to 192.168.20.248:16000.
Ncat: 0 bytes sent, 0 bytes received in 0.02 seconds.

Thank you

________________________________
From: Viraj Jasani <vj...@apache.org>
Sent: Sunday, 22 March 2020 2:47:09 AM
To: user@hbase.apache.org
Subject: Re: regionserver can't connect to master

Which HBase version are you using? You are bringing up a fresh cluster and not doing an upgrade, right? Has Ambari successfully brought up the NameNodes and DataNodes? How many components are already running so far? Are they connected (e.g. NN and DN), with only the RS having trouble connecting to the HM? Although telnet looks fine, can you also try "nc -zv gl-hdp-ctrl03.local 16000" from the RS, just to double-check?
Thanks


Re: regionserver can't connect to master

Posted by Viraj Jasani <vj...@apache.org>.
Which HBase version are you using? You are bringing up a fresh cluster and not doing an upgrade, right? Has Ambari successfully brought up the NameNodes and DataNodes? How many components are already running so far? Are they connected (e.g. NN and DN), with only the RS having trouble connecting to the HM? Although telnet looks fine, can you also try "nc -zv gl-hdp-ctrl03.local 16000" from the RS, just to double-check?
Thanks
