Posted to user@hbase.apache.org by Bill Sanchez <bi...@gmail.com> on 2013/12/04 00:45:16 UTC

HBase Large Load Issue

Hello,

I am seeking some advice on my HBase issue.  I am trying to configure a
system that will eventually load and store approximately 50GB-80GB of data
daily.  The data consists of files that are roughly 3MB-5MB each, with some
reaching 20MB and some as small as 1MB.  The load job performs roughly
20,000 puts against a single table that is pre-split into 20 regions across
20 region servers.  During the first load I see some splitting (ending with
around 50 regions), and in subsequent loads the region count climbs much
higher.
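
For context, the load loop is essentially the sketch below (simplified,
against the 0.94 client API; the table, family, and qualifier names are
placeholders, not my real schema):

import java.io.IOException;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class FileLoader {
    // Writes each (row key -> file bytes) entry as a single Put.
    public static void load(Map<String, byte[]> files) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "file_store");   // placeholder name
        try {
            table.setAutoFlush(false);                   // buffer puts client-side
            table.setWriteBufferSize(64L * 1024 * 1024); // room for multi-MB values
            for (Map.Entry<String, byte[]> e : files.entrySet()) {
                Put put = new Put(Bytes.toBytes(e.getKey()));
                put.setWriteToWAL(false);                // WAL off, per note 3 below
                put.add(Bytes.toBytes("d"), Bytes.toBytes("content"), e.getValue());
                table.put(put);
            }
            table.flushCommits();                        // push remaining buffered puts
        } finally {
            table.close();
        }
    }
}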

After running similarly sized loads 4 or 5 times, I start to see the
following behavior, which I cannot explain.  The table in question has
VERSIONS=1, and some (but not all) of these test loads reuse the same data.
Below is a summary of the behavior along with a few of the configuration
settings I have tried so far.

Environment:

HBase 0.94.13-security with Kerberos enabled
Zookeeper 3.4.5
Hadoop 1.0.4

Symptoms:

1.  Requests per second fall to 0 for all region servers
2.  Log files show socket timeout exceptions while waiting for scans of META
3.  Region servers sometimes end up marked as dead
4.  Once HBase reaches this broken state, some regions remain stuck in
transition indefinitely
5.  All of these issues seem to happen around the time of major compaction
events

This issue seems sensitive to hbase.rpc.timeout: I increased it
significantly, but that only lengthened the time until the socket timeout
exceptions appear.

A few notes:

1.  I don't see heavy GC activity in the GC log.
2.  Snappy compression was originally enabled, but turning it off as a test
made no noticeable difference.
3.  The WAL is disabled for the table involved in the load.
4.  TeraSort appears to run normally on HDFS.
5.  The HBase randomWrite and randomRead tests appear to run normally on
this cluster (although randomWrite writes values nowhere near 3MB-5MB).
6.  Ganglia is available in my environment.

Settings already altered:

1.  hbase.rpc.timeout=900000 (I realize this may be too high)
2.  hbase.regionserver.handler.count=100
3.  ipc.server.max.callqueue.size=10737418240
4.  hbase.regionserver.lease.period=900000
5.  hbase.hregion.majorcompaction=0 (I have been manually compacting
between loads, with no difference in behavior; see the sketch after this
list)
6.  hbase.hregion.memstore.flush.size=268435456
7.  dfs.datanode.max.xcievers=131072
8.  dfs.datanode.handler.count=100
9.  ipc.server.listen.queue.size=256
10.  -Xmx16384m -XX:+UseConcMarkSweepGC -XX:+UseMembar -verbose:gc
-XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/logs/gc.log -Xms16384m
-XX:PrintFLSStatistics=1 -XX:+CMSParallelRemarkEnabled
-XX:CMSInitiatingOccupancyFraction=70 -XX:+UseParNewGC
11. I have tried other GC settings but they don't seem to have any real
impact on GC performance in this case
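
For what it's worth, the manual major compaction between loads is just the
0.94 admin call sketched below (the table name is illustrative):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class ManualCompact {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);
        try {
            // The request is asynchronous: the cluster queues the major
            // compaction, so I watch the region server UI to see when the
            // StoreFile counts actually drop.
            admin.majorCompact("file_store");  // illustrative table name
        } finally {
            admin.close();
        }
    }
}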

Any advice is appreciated.

Thanks

Re: HBase Large Load Issue

Posted by Geovanie Marquez <ge...@gmail.com>.
What is your distributed hardware/service configuration? Where do your
masters and slaves run, and what are the specs of each?

You have automatic major compaction set to zero, yet the issues appear
around major compaction events, so are you running manual compactions
during a heavy put operation?



Re: HBase Large Load Issue

Posted by Jean-Marc Spaggiari <je...@spaggiari.org>.
As Vladimir says: do you really need to store the files in HBase? 20MB per
cell is pretty big. Can you not store the files themselves in HDFS and keep
only their paths in HBase? (A minimal sketch of that pattern is below.)
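
Something like this, perhaps (a sketch only; the HDFS layout, table name,
and column names are made up for illustration):

import java.io.File;
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class HdfsPointerLoad {
    // Copies the file into HDFS, then stores only its path in HBase.
    public static void store(String rowKey, File local) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        FileSystem fs = FileSystem.get(conf);
        Path dst = new Path("/data/files/" + local.getName());  // made-up layout
        fs.copyFromLocalFile(new Path(local.getAbsolutePath()), dst);

        HTable table = new HTable(conf, "file_index");          // made-up table
        try {
            Put put = new Put(Bytes.toBytes(rowKey));
            put.add(Bytes.toBytes("d"), Bytes.toBytes("path"),
                    Bytes.toBytes(dst.toString()));
            table.put(put);
        } finally {
            table.close();
        }
    }
}

That way HBase only carries small cells, and compactions stop rewriting
gigabytes of file content.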

Do you have the logs from when the servers died? Any GC pauses?
JM


2013/12/3 Vladimir Rodionov <vr...@carrieriq.com>

> >>Any advice is appreciated.
>
> Do not store your files in HBase, store only references.

RE: HBase Large Load Issue

Posted by Vladimir Rodionov <vr...@carrieriq.com>.
>>Any advice is appreciated.

Do not store your files in HBase, store only references.
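
The read path is then just a lookup plus an HDFS open, along these lines
(a sketch only; the table and column names are illustrative):

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class HdfsPointerRead {
    // Resolves a row key to the HDFS path stored in HBase, then opens the file.
    public static FSDataInputStream open(String rowKey) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "file_index");  // illustrative name
        try {
            Result r = table.get(new Get(Bytes.toBytes(rowKey)));
            byte[] path = r.getValue(Bytes.toBytes("d"), Bytes.toBytes("path"));
            return FileSystem.get(conf).open(new Path(Bytes.toString(path)));
        } finally {
            table.close();
        }
    }
}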

Best regards,
Vladimir Rodionov
Principal Platform Engineer
Carrier IQ, www.carrieriq.com
e-mail: vrodionov@carrieriq.com

