Posted to user@hbase.apache.org by Manjeet Singh <ma...@gmail.com> on 2016/10/20 05:15:16 UTC
HBase cluster not coming up after one region server went down
Hi All,
Can anyone help me figure out the root cause? I have a 4-node cluster and one data node went down; I don't understand why my HBase Master is not able to come up.
I have the below log:
ERROR: org.apache.hadoop.hbase.ipc.ServerNotRunningYetException: Server is not running yet
        at org.apache.hadoop.hbase.master.HMaster.checkServiceStarted(HMaster.java:2296)
        at org.apache.hadoop.hbase.master.MasterRpcServices.isMasterRunning(MasterRpcServices.java:936)
        at org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java:55654)
        at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2170)
        at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:109)
        at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:133)
        at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:108)
        at java.lang.Thread.run(Thread.java:745)
Thanks
Manjeet
--
luv all
Re: HBase cluster not coming up after one region server went down
Posted by Manjeet Singh <ma...@gmail.com>.
Hi Dima,
One of my nodes crashed because of an HDD failure. I have only 4 nodes, and I had configured ZooKeeper on 3 of them (since the ensemble should have an odd number of members); the crashed node was one of those ZooKeeper nodes. Because of this my whole cluster went down, and by mistake, after removing the crashed node, I also ran HDFS balancing, which caused further problems.
After deleting the table, removing the crashed node, and configuring ZooKeeper on another node, my cluster is now up.
Thanks, Dima, for your reply. I may have more questions; I will ask later.
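As an aside on the "odd number" rule mentioned above: a ZooKeeper ensemble stays available only while a strict majority of its voting servers are up, so a healthy 3-node ensemble should tolerate exactly one failed server. A quick sketch of the arithmetic (illustrative, not taken from the thread):

```python
# Majority-quorum arithmetic for a ZooKeeper ensemble:
# an ensemble of n voting servers needs a strict majority
# (n // 2 + 1) of them up to keep serving requests.

def majority(n: int) -> int:
    """Smallest number of servers that forms a quorum."""
    return n // 2 + 1

def failure_tolerance(n: int) -> int:
    """How many servers can fail before quorum is lost."""
    return n - majority(n)

for n in (3, 4, 5):
    print(f"{n} servers: quorum={majority(n)}, "
          f"tolerates {failure_tolerance(n)} failure(s)")
```

This is also why even-sized ensembles are discouraged: 4 servers tolerate only the same single failure that 3 do, while adding one more voter to lose.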
On 21 Oct 2016 04:08, "Dima Spivak" <di...@apache.org> wrote:
> It can be lots of things, Manjeet. You've gotta do a bit of troubleshooting
> yourself first; a long dump of your machine specs doesn't change that.
>
> Can you describe what happened before/after the node went down? The log
> just says server isn't running, so we can't tell much from that alone.
>
> -Dima
>
> On Wed, Oct 19, 2016 at 10:53 PM, Manjeet Singh <
> manjeet.chandhok@gmail.com>
> wrote:
>
> > I want to add few more points
> > [...]
>
Re: HBase cluster not coming up after one region server went down
Posted by Dima Spivak <di...@apache.org>.
It can be lots of things, Manjeet. You've gotta do a bit of troubleshooting
yourself first; a long dump of your machine specs doesn't change that.
Can you describe what happened before/after the node went down? The log
just says server isn't running, so we can't tell much from that alone.
-Dima
On Wed, Oct 19, 2016 at 10:53 PM, Manjeet Singh <ma...@gmail.com>
wrote:
> I want to add few more points
> [...]
Re: HBase cluster not coming up after one region server went down
Posted by Manjeet Singh <ma...@gmail.com>.
I want to add a few more points.
Below is my cluster configuration:
Node-1 (Name Node):
    CPU: 2 x 6-core (12 cores total)
    Disks: 6 x 300 GB; OS (RAID-1): 300 GB; DATA: single 900 GB RAID-10
    Total RAM: 96 GB
    Components: HBase Master, HDFS NameNode, ZooKeeper Server, Spark History
    Server, Phoenix, HDFS Balancer, Spark Gateway, MySQL
    YARN (MR2 included): JobHistory Server, ResourceManager

Node-2 (Data Node, Spark Node):
    CPU: 2 x 6-core (12 cores total)
    Disks: 6 x 300 GB; OS (RAID-1): 300 GB; DATA: 6 x 300 GB individual RAID-0
    Total RAM: 80 GB
    Components: HDFS DataNode, HBase RegionServer, ZooKeeper Server, Spark,
    HBase Master
    YARN (MR2 included): NodeManager

Node-3 (Data Node, Spark Node):
    CPU: 2 x 6-core (12 cores total)
    Disks: 6 x 300 GB; OS (RAID-1): 300 GB; DATA: 6 x 300 GB individual RAID-0
    Total RAM: 80 GB
    Components: HDFS DataNode, HBase RegionServer, ZooKeeper Server, Spark
    YARN (MR2 included): NodeManager

Node-4 (Data Node, Spark Node):
    CPU: 2 x 6-core (12 cores total)
    Disks: 8 x 300 GB; OS (RAID-1): 300 GB; DATA: 6 x 300 GB individual RAID-0
    Total RAM: 80 GB
    Components: HDFS DataNode, HBase RegionServer, Spark
    YARN (MR2 included): NodeManager
I noticed that HBase was taking more time on reads, so I changed the properties below to improve read performance:
Property Name                               Original value   Changed value
hfile.block.cache.size                      0.4              0.6
hbase.regionserver.global.memstore.size     0.4              0.2
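For reference, these two properties would normally be set in hbase-site.xml on the RegionServers (a sketch, not copied from the thread):

```xml
<!-- hbase-site.xml (RegionServer side) -->
<property>
  <name>hfile.block.cache.size</name>
  <!-- fraction of heap given to the read-side block cache -->
  <value>0.6</value>
</property>
<property>
  <name>hbase.regionserver.global.memstore.size</name>
  <!-- fraction of heap given to write-side memstores -->
  <value>0.2</value>
</property>
```

Note that, as far as I recall, HBase of this era refuses to start a RegionServer if these two fractions sum to more than 0.8, so 0.6 + 0.2 sits exactly at that limit and leaves very little heap for everything else.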
Below is some more information. I have a Spark ETL job on the same cluster; after running the job I recorded the parameters below:
Parameter                                                            Value
Number of pipelines                                                  2 (Kafka)
Raw size of Kafka messages                                           21 GB
Data rate                                                            1 MB/sec per pipeline
Size of aggregated data in HBase                                     2.6 GB (Snappy, after major compaction)
Batch duration                                                       30 sec
Sliding window duration                                              900 sec (15 minutes)
CPU utilization                                                      63.2%
Number of executors                                                  3 per pipeline
Allocated RAM                                                        3 GB per pipeline
Cluster network I/O                                                  3.2 MB/sec
Cluster disk I/O                                                     3.5 MB/sec
Max time (highest peak) for Spark ETL to process 900 MB of data
for Domain                                                           2 hours
Max time (highest peak) for Spark ETL to process 900 MB of data
for Application                                                      30 minutes
Total time for the Kafka simulator to push the data into Kafka       6 hours
Total time for Spark ETL to process all the data                     7 hours
Number of SQL queries                                                10
Number of profiles                                                   9
Number of rows in HBase                                              11015719
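Two quick figures that can be derived from the table above (the arithmetic is mine, not from the thread):

```python
# Figures reported above.
raw_kafka_gb = 21.0     # raw size of Kafka messages
stored_hbase_gb = 2.6   # aggregated size in HBase (Snappy + major compaction)
batch_s = 30            # micro-batch duration
window_s = 900          # sliding window duration

# Aggregation plus Snappy compression shrinks the data roughly 8x.
reduction = raw_kafka_gb / stored_hbase_gb
print(f"size reduction: {reduction:.1f}x")

# Each 900 s window spans 30 micro-batches of 30 s.
print(f"batches per window: {window_s // batch_s}")
```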
Thanks
Manjeet
On Thu, Oct 20, 2016 at 10:45 AM, Manjeet Singh <ma...@gmail.com>
wrote:
> Hi All
> [...]
--
luv all