Posted to user@hbase.apache.org by Rahul Ravindran <ra...@yahoo.com> on 2013/06/04 20:48:52 UTC

Scan + Gets are disk bound

Hi,

We are relatively new to HBase and are hitting a roadblock with scan performance. I searched the mailing list archives and applied several of the recommendations there, but they did not help much, so I am hoping I am missing something you could point me towards. Thanks in advance.

We are writing and reading data almost continuously: a stream of data is written into an HBase table, and we then run a time-based MR job on top of this table. We were backed up, so about 1.5 TB of data had been loaded into the table when we began running time-based scan MR jobs over 10-minute intervals (the startTime and endTime of each scan are 10 minutes apart). Most 10-minute intervals had about 100 GB of data to process.

The main purpose of our workflow is to eliminate duplicates from this table. The table has maxVersions = 5. We use TableInputFormat to perform the time-based scan so that we get data locality. In the mapper, we check whether a previous version of the row exists with a timestamp earlier than that of the input row. If not, we emit that row.
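
A minimal plain-Java sketch of the per-row check described above. It assumes the mapper has already read back the timestamps of the stored versions of the row (how it reads them, for example with a Get limited to earlier timestamps, is not shown). DedupCheck and shouldEmit are hypothetical names, and no HBase classes are used:

```java
import java.util.List;

// Plain-Java model of the duplicate check; no HBase classes involved.
// "storedVersions" stands in for whatever the mapper reads back for the row
// (hypothetical). Only versions the table still retains (VERSIONS => 5 here)
// can be seen by a check like this.
final class DedupCheck {
    private DedupCheck() {}

    /**
     * Returns true if the version being processed should be emitted, i.e. no
     * stored version of the same row has an earlier timestamp.
     */
    static boolean shouldEmit(List<Long> storedVersions, long inputTs) {
        for (long ts : storedVersions) {
            if (ts < inputTs) {
                return false; // an earlier copy exists: treat this one as a duplicate
            }
        }
        return true; // no earlier copy: this is the first occurrence
    }

    public static void main(String[] args) {
        System.out.println(shouldEmit(List.of(1_000L), 1_000L));        // true
        System.out.println(shouldEmit(List.of(1_000L, 900L), 1_000L)); // false
    }
}
```

Since duplicates are expected to be rare, most calls would see only the input row's own version and return true.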

We looked at https://issues.apache.org/jira/browse/HBASE-4683 and therefore turned off the block cache for this table, expecting that the block index and bloom filter would still be cached in the block cache. We expect duplicates to be rare, so we hope most of these checks will be answered by the bloom filter alone. Unfortunately, performance is very slow because we are disk bound. Looking at jstack output, most of the time we appear to be hitting disk for the block index. We ran a major compaction and retried; performance improved somewhat, but not by much. We are processing data at about 2 MB per second.
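
To put the 2 MB/s figure in context, here is the back-of-the-envelope arithmetic as a small Java sketch. It assumes 2 MB/s is the aggregate rate for the whole job and that 1 GB = 1024 MB; BacklogMath is a hypothetical name:

```java
// Back-of-the-envelope check of the figures above: ~100 GB per
// 10-minute window, processed at ~2 MB/s (assumed to be the aggregate rate).
final class BacklogMath {
    private BacklogMath() {}

    static final double MB_PER_GB = 1024.0;

    /** Seconds needed to process dataGb gigabytes at mbPerSec megabytes per second. */
    static double secondsToProcess(double dataGb, double mbPerSec) {
        return dataGb * MB_PER_GB / mbPerSec;
    }

    /** Throughput (MB/s) needed to finish dataGb gigabytes within windowSec seconds. */
    static double requiredMbPerSec(double dataGb, double windowSec) {
        return dataGb * MB_PER_GB / windowSec;
    }

    public static void main(String[] args) {
        double secs = secondsToProcess(100, 2);           // 51,200 s, about 14.2 h
        double needed = requiredMbPerSec(100, 10 * 60);   // ~170.7 MB/s
        System.out.printf("%.0f s (~%.1f h) per window, need ~%.0f MB/s%n",
                secs, secs / 3600, needed);
    }
}
```

Under those assumptions, one 10-minute window of about 100 GB takes roughly 51,200 s (about 14 hours), around 85 times longer than the window itself; keeping up would need roughly 170 MB/s.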

We are using CDH 4.2.1 (HBase 0.94.2 and HDFS 2.0.0) on 8 DataNodes/RegionServers, each with 32 cores, 4x1TB disks and 60 GB RAM. HBase runs with a 30 GB heap, the memstore capped at 3 GB, and flush thresholds of 0.15 and 0.2. The block cache is 0.5 of the heap (15 GB). We use SNAPPY compression for our tables.


A couple of questions:
	* Is the performance of the time-based scan bad after a major compaction?

	* What can we do to alleviate being disk bound? The typical answer, adding more RAM, does not seem to have helped, unless we are missing some other config setting.



Below are some of the metrics from a RegionServer web UI:

requestsPerSecond=5895, numberOfOnlineRegions=60, numberOfStores=60, numberOfStorefiles=209,
storefileIndexSizeMB=6, rootIndexSizeKB=7131, totalStaticIndexSizeKB=415995, totalStaticBloomSizeKB=2514675,
memstoreSizeMB=0, mbInMemoryWithoutWAL=0, numberOfPutsWithoutWAL=0, readRequestsCount=30589690, writeRequestsCount=0,
compactionQueueSize=0, flushQueueSize=0, usedHeapMB=2688, maxHeapMB=30672,
blockCacheSizeMB=1604.86, blockCacheFreeMB=13731.24, blockCacheCount=11817, blockCacheHitCount=27592222, blockCacheMissCount=25373411,
blockCacheEvictedCount=7112, blockCacheHitRatio=52%, blockCacheHitCachingRatio=72%, hdfsBlocksLocalityIndex=91, slowHLogAppendCount=0,
fsReadLatencyHistogramMean=15409428.56, fsReadLatencyHistogramCount=1559927, fsReadLatencyHistogramMedian=230609.5,
fsReadLatencyHistogram75th=280094.75, fsReadLatencyHistogram95th=9574280.4, fsReadLatencyHistogram99th=100981301.2, fsReadLatencyHistogram999th=511591146.03,
fsPreadLatencyHistogramMean=3895616.6, fsPreadLatencyHistogramCount=420000, fsPreadLatencyHistogramMedian=954552,
fsPreadLatencyHistogram75th=8723662.5, fsPreadLatencyHistogram95th=11159637.65, fsPreadLatencyHistogram99th=37763281.57, fsPreadLatencyHistogram999th=273192813.91,
fsWriteLatencyHistogramMean=6124343.91, fsWriteLatencyHistogramCount=1140000, fsWriteLatencyHistogramMedian=374379,
fsWriteLatencyHistogram75th=431395.75, fsWriteLatencyHistogram95th=576853.8, fsWriteLatencyHistogram99th=1034159.75, fsWriteLatencyHistogram999th=5687910.29
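
Some of these counters can be cross-checked by hand. A small Java sketch (CacheMetrics is a hypothetical name) that recomputes the hit ratio from blockCacheHitCount and blockCacheMissCount and converts the static index and bloom sizes from KB:

```java
// Recomputes a few derived values from the raw RegionServer counters above.
final class CacheMetrics {
    private CacheMetrics() {}

    /** Fraction of block cache lookups that were served from the cache. */
    static double hitRatio(long hits, long misses) {
        return (double) hits / (hits + misses);
    }

    static double kbToMb(long kb) {
        return kb / 1024.0;
    }

    public static void main(String[] args) {
        System.out.printf("hit ratio: %.1f%%%n", 100 * hitRatio(27_592_222L, 25_373_411L));
        System.out.printf("static index: %.0f MB%n", kbToMb(415_995L));   // totalStaticIndexSizeKB
        System.out.printf("static bloom: %.0f MB%n", kbToMb(2_514_675L)); // totalStaticBloomSizeKB
    }
}
```

The hit ratio comes out at about 52%, matching blockCacheHitRatio, and the static index and bloom data come to roughly 406 MB and 2.4 GB respectively.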



key size: 20 bytes 

Table description:
{NAME => 'foo', FAMILIES => [{NAME => 'f', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW',
 REPLICATION_SCOPE => '0', COMPRESSION => 'SNAPPY', VERSIONS => '5', TTL => '2592000',
 MIN_VERSIONS => '0', KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536',
 ENCODE_ON_DISK => 'true', IN_MEMORY => 'false', BLOCKCACHE => 'false'}]}

Re: Scan + Gets are disk bound

Posted by Rahul Ravindran <ra...@yahoo.com>.
Thanks for that confirmation. This is what we hypothesized as well.

So, if we depend on time-range scans, do we need to avoid major compactions completely and rely only on minor compactions? Is there any downside? We do have a TTL set on all rows in the table.
~Rahul.


________________________________
 From: Anoop John <an...@gmail.com>
To: user@hbase.apache.org; Rahul Ravindran <ra...@yahoo.com> 
Cc: anil gupta <an...@gmail.com> 
Sent: Tuesday, June 4, 2013 10:44 PM
Subject: Re: Scan + Gets are disk bound
 

When you set a time range on a Scan, some files can be skipped based on the
max/min timestamp values in each file. That said, if you major compact and then
scan by time range, I don't think you will get any advantage.



-Anoop-


Re: Scan + Gets are disk bound

Posted by Anoop John <an...@gmail.com>.
When you set a time range on a Scan, some files can be skipped based on the
max/min timestamp values in each file. That said, if you major compact and then
scan by time range, I don't think you will get any advantage.
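
A simplified model of this skipping, assuming each store file is summarized only by its minimum and maximum cell timestamp and that the scan range is half-open [start, end), as with Scan.setTimeRange(minStamp, maxStamp). This is a plain-Java illustration, not HBase's actual code; StoreFileRange and filesToRead are hypothetical names, and the example assumes successive flushes cover disjoint time spans:

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of time-range pruning of store files (not HBase's code).
final class TimeRangePruning {
    private TimeRangePruning() {}

    /** Hypothetical summary of one store file: smallest and largest cell timestamp. */
    static final class StoreFileRange {
        final String name;
        final long minTs;
        final long maxTs;

        StoreFileRange(String name, long minTs, long maxTs) {
            this.name = name;
            this.minTs = minTs;
            this.maxTs = maxTs;
        }
    }

    /** Names of the files a scan over [start, end) still has to open. */
    static List<String> filesToRead(List<StoreFileRange> files, long start, long end) {
        List<String> toRead = new ArrayList<>();
        for (StoreFileRange f : files) {
            // A file can hold matching cells only if its [minTs, maxTs]
            // interval overlaps the half-open scan range [start, end).
            if (f.maxTs >= start && f.minTs < end) {
                toRead.add(f.name);
            }
        }
        return toRead;
    }

    public static void main(String[] args) {
        List<StoreFileRange> beforeMajor = List.of(
                new StoreFileRange("flush-1", 0, 599),
                new StoreFileRange("flush-2", 600, 1_199),
                new StoreFileRange("flush-3", 1_200, 1_799));
        List<StoreFileRange> afterMajor = List.of(new StoreFileRange("major", 0, 1_799));
        System.out.println(filesToRead(beforeMajor, 600, 1_200)); // [flush-2]
        System.out.println(filesToRead(afterMajor, 600, 1_200));  // [major]
    }
}
```

Before the major compaction, a scan for one interval opens only the file whose time span overlaps it; afterwards a single file spans every timestamp, so nothing can be skipped. Because the row keys here are MD5-prefixed, every file covers the whole key space, which leaves time as the only dimension for pruning.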



-Anoop-

On Wed, Jun 5, 2013 at 10:11 AM, Rahul Ravindran <ra...@yahoo.com> wrote:

> Our row keys do not contain time. By time-based scans, I mean an MR job over
> the HBase table where the Scan object has no startRow or endRow but has a
> startTime and endTime.
>
> Our row key format is <MD5 of UUID>+UUID, so we expect good distribution.
> We pre-split the table initially to prevent hotspotting.
> ~Rahul.

Re: Scan + Gets are disk bound

Posted by Rahul Ravindran <ra...@yahoo.com>.
Our row keys do not contain time. By time-based scans, I mean an MR job over the HBase table where the Scan object has no startRow or endRow but has a startTime and endTime.

Our row key format is <MD5 of UUID>+UUID, so we expect good distribution. We pre-split the table initially to prevent hotspotting.
~Rahul.


________________________________
 From: anil gupta <an...@gmail.com>
To: user@hbase.apache.org; Rahul Ravindran <ra...@yahoo.com> 
Sent: Tuesday, June 4, 2013 9:31 PM
Subject: Re: Scan + Gets are disk bound
 








Re: Scan + Gets are disk bound

Posted by anil gupta <an...@gmail.com>.
On Tue, Jun 4, 2013 at 11:48 AM, Rahul Ravindran <ra...@yahoo.com> wrote:

>   We are using CDH 4.2.1 HBase 0.94.2 and HDFS 2.0.0 running with 8
> datanodes/regionservers(each with 32 cores, 4x1TB disks and 60 GB RAM).

Anil: You don't have the right balance between disk, CPU and RAM. You have
plenty of CPU and RAM but very few disks. Usually it's better to have a
disk/CPU-core ratio near 0.6-0.8. Yours is around 0.13. This seems to be the
biggest reason for your problem.
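
The ratio arithmetic as a small Java sketch (DiskRatio is a hypothetical name): 4 disks over 32 cores is 0.125, rounded to 0.13 above. The 0.6-0.8 target is the rule of thumb given here, not a documented HBase requirement:

```java
// Arithmetic behind the disk-to-core rule of thumb above.
final class DiskRatio {
    private DiskRatio() {}

    /** Disks per CPU core on one node. */
    static double ratio(int disks, int cores) {
        return (double) disks / cores;
    }

    /** Smallest whole number of disks that reaches targetRatio for the given core count. */
    static int disksForRatio(double targetRatio, int cores) {
        return (int) Math.ceil(targetRatio * cores);
    }

    public static void main(String[] args) {
        System.out.println(ratio(4, 32));            // 0.125
        System.out.println(disksForRatio(0.6, 32));  // 20
        System.out.println(disksForRatio(0.8, 32));  // 26
    }
}
```

With 32 cores per node, reaching 0.6-0.8 under this rule of thumb would mean roughly 20 to 26 disks per node.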

> HBase is running with 30 GB Heap size, memstore values being capped at 3
> GB and flush thresholds being 0.15 and 0.2. Blockcache is at 0.5 of total
> heap size(15 GB). We are using SNAPPY for our tables.
>
>
> A couple of questions:
>         * Is the performance of the time-based scan bad after a major
> compaction?
>
Anil: In general, time-based scans (I am assuming you have built your row key
on the timestamp) are not good for HBase because of region hotspotting.
Have you tried setting scanner caching to a higher number?
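
Scanner caching controls how many rows each scanner RPC returns (Scan.setCaching(int) on the client, or the hbase.client.scanner.caching property). A rough sketch of its effect on round trips, assuming a single scanner over a known number of rows and ignoring region boundaries and any other limits; ScannerCachingMath and rpcCount are hypothetical names:

```java
// Rough model: with caching = N, each scanner RPC returns up to N rows,
// so fetching R rows takes about ceil(R / N) round trips. Region boundaries
// and other server-side limits are ignored here.
final class ScannerCachingMath {
    private ScannerCachingMath() {}

    /** Approximate number of scanner RPCs needed to fetch `rows` rows. */
    static long rpcCount(long rows, int caching) {
        if (caching <= 0) {
            throw new IllegalArgumentException("caching must be positive");
        }
        return (rows + caching - 1) / caching; // ceiling division
    }
}
```

Fewer round trips cut per-RPC overhead, but the same blocks still have to be read from disk, so raising caching may not by itself help a job that is disk bound.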

>
>         * What can we do to help alleviate being disk bound? The typical
> answer of adding more RAM does not seem to have helped, or we are missing
> some other config
>
Anil: Try adding more disks to your machines.

-- 
Thanks & Regards,
Anil Gupta

Re: Scan + Gets are disk bound

Posted by Rahul Ravindran <ra...@yahoo.com>.
Thanks for the approach you suggested, Asaf. This is definitely very promising. Our use case is that we have a raw stream of events which may contain duplicates. After our HBase + MR processing, we emit a de-duplicated stream for later processing. Let me see if I understand your approach correctly:
	* During major compaction, we emit only the earliest event. I understand this.
	* Between major compactions, the scan would need to return only the earliest event. However, we could no longer take advantage of the time-range scan, since we would also need to consider previously compacted files (an earlier duplicate could exist in a previously major-compacted HFile, in which case we must skip returning this row in the scan). Would this mean the scan has to be a full table scan, or that we perform an exists() call in the preScan hook for an earlier version of the row?
Thanks,
~Rahul.


________________________________
 From: Asaf Mesika <as...@gmail.com>
To: "user@hbase.apache.org" <us...@hbase.apache.org>; Rahul Ravindran <ra...@yahoo.com> 
Sent: Tuesday, June 4, 2013 10:51 PM
Subject: Re: Scan + Gets are disk bound
 




On Tuesday, June 4, 2013, Rahul Ravindran  wrote:

>Our workflow was to primarily eliminate duplicates from this table. We have  maxVersions = 5 for the table. We use TableInputFormat to perform the time-based scan to ensure data locality. In the mapper, we check if there exists a previous version of the row in a time period earlier to the timestamp of the input row. If not, we emit that row.
If I understand correctly, for a row key R, column family F and column qualifier C, if you have two values with timestamps 13:00 and 13:02, you want to remove the value associated with 13:02.

The best way to do this is to write a simple RegionObserver coprocessor that hooks into the compaction process (preCompact, for instance). In there, for any given R, F, C, simply emit only the value with the earliest timestamp (the last one, since timestamps are ordered descending), and that's it.
It's a very efficient approach, since you are "riding" on top of an existing process that reads the values anyway, so you are not paying the price of reading them again in your MR job.
Also, between major compactions, you can implement the preScan hook in the region observer so that you pick up only the earliest timestamp value, achieving the same result for your client even though you haven't removed those values yet.

I've implemented this for delayed aggregation of counters, and it works great in production.
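
A plain-Java sketch of the filtering step described above, assuming cells arrive in HBase's scan order (grouped by row, family and qualifier, newest timestamp first). Cell and keepEarliest are hypothetical names; a real coprocessor would apply this logic, using HBase's own types, to the cells flowing through the scanner it wraps in preCompact:

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java model of "keep only the earliest version per column".
final class EarliestVersionFilter {
    private EarliestVersionFilter() {}

    /** Hypothetical stand-in for an HBase cell: coordinates plus timestamp. */
    static final class Cell {
        final String row;
        final String family;
        final String qualifier;
        final long ts;

        Cell(String row, String family, String qualifier, long ts) {
            this.row = row;
            this.family = family;
            this.qualifier = qualifier;
            this.ts = ts;
        }

        boolean sameColumnAs(Cell other) {
            return row.equals(other.row)
                    && family.equals(other.family)
                    && qualifier.equals(other.qualifier);
        }
    }

    /**
     * Keeps only the earliest version of each (row, family, qualifier).
     * Input must be in scan order: grouped by column, newest timestamp first,
     * so the earliest version is the last cell of each group.
     */
    static List<Cell> keepEarliest(List<Cell> sortedCells) {
        List<Cell> kept = new ArrayList<>();
        for (int i = 0; i < sortedCells.size(); i++) {
            Cell current = sortedCells.get(i);
            boolean lastOfGroup = i + 1 == sortedCells.size()
                    || !sortedCells.get(i + 1).sameColumnAs(current);
            if (lastOfGroup) {
                kept.add(current); // earliest timestamp for this column
            }
        }
        return kept;
    }
}
```

Because versions within a column are sorted newest first, one pass with a single cell of look-ahead is enough; no buffering of a whole row is needed.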

 

>We looked at https://issues.apache.org/jira/browse/HBASE-4683 and hence turned off block cache for this table with the expectation that the block index and bloom filter will be cached in the block cache. We expect duplicates to be rare and hence hope for most of these checks to be fulfilled by the bloom filter. Unfortunately, we notice very slow performance on account of being disk bound. Looking at jstack, we notice that most of the time, we appear to be hitting disk for the block index. We performed a major compaction and retried and performance improved some, but not by much. We are processing data at about 2 MB per second.
>
>  We are using CDH 4.2.1 HBase 0.94.2 and HDFS 2.0.0 running with 8 datanodes/regionservers(each with 32 cores, 4x1TB disks and 60 GB RAM). HBase is running with 30 GB Heap size, memstore values being capped at 3 GB and flush thresholds being 0.15 and 0.2. Blockcache is at 0.5 of total heap size(15 GB). We are using SNAPPY for our tables.
>
>
>A couple of questions:
>        * Is the performance of the time-based scan bad after a major compaction?
>
>        * What can we do to help alleviate being disk bound? The typical answer of adding more RAM does not seem to have helped, or we are missing some other config.
>
>
>
>Below are some of the metrics from a Regionserver webUI:
>
>requestsPerSecond=5895, numberOfOnlineRegions=60, numberOfStores=60, numberOfStorefiles=209, storefileIndexSizeMB=6, rootIndexSizeKB=7131, totalStaticIndexSizeKB=415995, totalStaticBloomSizeKB=2514675, memstoreSizeMB=0, mbInMemoryWithoutWAL=0, numberOfPutsWithoutWAL=0, readRequestsCount=30589690, writeRequestsCount=0, compactionQueueSize=0, flushQueueSize=0, usedHeapMB=2688, maxHeapMB=30672, blockCacheSizeMB=1604.86, blockCacheFreeMB=13731.24, blockCacheCount=11817, blockCacheHitCount=27592222, blockCacheMissCount=25373411, blockCacheEvictedCount=7112, blockCacheHitRatio=52%, blockCacheHitCachingRatio=72%, hdfsBlocksLocalityIndex=91, slowHLogAppendCount=0, fsReadLatencyHistogramMean=15409428.56, fsReadLatencyHistogramCount=1559927, fsReadLatencyHistogramMedian=230609.5, fsReadLatencyHistogram75th=280094.75, fsReadLatencyHistogram95th=9574280.4, fsReadLatencyHistogram99th=100981301.2, fsReadLatencyHistogram999th=511591146.03,
> fsPreadLatencyHistogramMean=3895616.6, fsPreadLatencyHistogramCount=420000, fsPreadLatencyHistogramMedian=954552, fsPreadLatencyHistogram75th=8723662.5, fsPreadLatencyHistogram95th=11159637.65, fsPreadLatencyHistogram99th=37763281.57, fsPreadLatencyHistogram999th=273192813.91, fsWriteLatencyHistogramMean=6124343.91, fsWriteLatencyHistogramCount=1140000, fsWriteLatencyHistogramMedian=374379, fsWriteLatencyHistogram75th=431395.75, fsWriteLatencyHistogram95th=576853.8, fsWriteLatencyHistogram99th=1034159.75, fsWriteLatencyHistogram999th=5687910.29
>
>
>
>key size: 20 bytes 
>
>Table description:
>{NAME => 'foo', FAMILIES => [{NAME => 'f', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', COMPRESSION => 'SNAPPY', VERSIONS => '5', TTL => '2592000', MIN_VERSIONS => '0', KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536', ENCODE_ON_DISK => 'true', IN_MEMORY => 'false', BLOCKCACHE => 'false'}]}
>ENABLED: true
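As a quick sanity check on the RegionServer metrics quoted above, the sketch below recomputes a few of them. It assumes the fs*LatencyHistogram values are in nanoseconds (an assumption, since the dump does not state units) and uses 1024-based KB-to-GB conversion; the class and method names are hypothetical.

```java
class MetricsArithmeticSketch {

    // Fraction of block cache lookups that were hits.
    static double hitRatio(long hits, long misses) {
        return (double) hits / (hits + misses);
    }

    // Assumption: the fs*LatencyHistogram values are nanoseconds.
    static double nanosToMillis(double nanos) {
        return nanos / 1_000_000.0;
    }

    // 1 GB = 1024 * 1024 KB.
    static double kbToGb(long kb) {
        return kb / (1024.0 * 1024.0);
    }

    public static void main(String[] args) {
        // Values copied from the quoted RegionServer web UI metrics.
        System.out.printf("block cache hit ratio: %.1f%%%n",
                100 * hitRatio(27_592_222L, 25_373_411L));
        System.out.printf("fsRead median: %.2f ms, 99th: %.1f ms, 99.9th: %.1f ms%n",
                nanosToMillis(230_609.5),
                nanosToMillis(100_981_301.2),
                nanosToMillis(511_591_146.03));
        // totalStaticBloomSizeKB + totalStaticIndexSizeKB
        System.out.printf("static bloom + index size: %.2f GB%n",
                kbToGb(2_514_675L + 415_995L));
    }
}
```

The hit ratio works out to about 52%, matching the reported blockCacheHitRatio. Under the nanosecond assumption, the median fs read is roughly 0.23 ms while the 99th percentile is about 101 ms, and the static bloom and index data together come to roughly 2.8 GB per RegionServer, against a block cache of about 15 GB.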

Re: Scan + Gets are disk bound

Posted by Asaf Mesika <as...@gmail.com>.
On Tuesday, June 4, 2013, Rahul Ravindran wrote:

> Hi,
>
> We are relatively new to HBase, and we are hitting a roadblock on our scan
> performance. I searched through the email archives and applied a bunch of
> the recommendations there, but they did not improve much. So, I am hoping I
> am missing something which you could guide me towards. Thanks in advance.
>
> We are currently writing data and reading in an almost continuous mode
> (stream of data written into an HBase table and then we run a time-based MR
> on top of this Table). We currently were backed up and about 1.5 TB of data
> was loaded into the table and we began performing time-based scan MRs in 10
> minute time intervals (startTime and endTime interval is 10 minutes). Most
> of the 10 minute interval had about 100 GB of data to process.
>
> Our workflow was to primarily eliminate duplicates from this table. We
> have maxVersions = 5 for the table. We use TableInputFormat to perform the
> time-based scan to ensure data locality. In the mapper, we check if there
> exists a previous version of the row in a time period earlier than the
> timestamp of the input row. If not, we emit that row.

If I understand correctly: for a row key R, column family F and column
qualifier C, if you have two values with timestamps 13:00 and 13:02, you
want to remove the value associated with 13:02.

The best way to do this is to write a simple RegionObserver coprocessor
that hooks into the compaction process (via preCompact, for instance).
There, for any given R, F, C, simply emit only the earliest timestamp
value (the last one, since timestamps are ordered descending), and
that's it.
It's a very effective approach, since you are "riding" on top of an
existing process that reads the values anyway, so you don't pay the
cost of reading them again in your MR job.
Also, between major compactions, you can implement the scanner hooks in
the region observer (preScannerOpen / postScannerNext) so that you pick
up only the earliest timestamp value. This gives your client the same
result, even though those values haven't been removed yet.
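For the read-time variant, the same "keep the earliest version" rule is applied to whatever versions a scan returns for a row. The sketch below shows only that per-row collapse; the nested-map input (qualifier to timestamp to value) is a simplified, hypothetical stand-in for one row and one column family, and hooking it into the region observer's scanner callbacks is left out.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

class ReadTimeEarliestSketch {

    // For each qualifier, keep only the version with the smallest timestamp.
    // Input: qualifier -> (timestamp -> value), a hypothetical simplification.
    static NavigableMap<String, Map.Entry<Long, String>> earliestPerQualifier(
            NavigableMap<String, NavigableMap<Long, String>> versions) {
        NavigableMap<String, Map.Entry<Long, String>> result = new TreeMap<>();
        for (Map.Entry<String, NavigableMap<Long, String>> column : versions.entrySet()) {
            // TreeMap<Long, ...> sorts ascending, so firstEntry() is the earliest version.
            Map.Entry<Long, String> earliest = column.getValue().firstEntry();
            if (earliest != null) {
                result.put(column.getKey(), earliest);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Timestamps simplified to 1300 / 1302 to mirror 13:00 / 13:02.
        NavigableMap<Long, String> c = new TreeMap<>();
        c.put(1302L, "dup");
        c.put(1300L, "orig");
        NavigableMap<String, NavigableMap<Long, String>> row = new TreeMap<>();
        row.put("C", c);
        System.out.println(earliestPerQualifier(row));
    }
}
```

Unlike the compaction-time sketch, this one works on an already-materialized row, which is closer to what a scanner-side hook sees when it post-processes results.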

I've implemented this for delayed aggregation of counters, and it works
great in production.




> We looked at https://issues.apache.org/jira/browse/HBASE-4683 and hence
> turned off block cache for this table with the expectation that the block
> index and bloom filter will be cached in the block cache. We expect
> duplicates to be rare and hence hope for most of these checks to be
> fulfilled by the bloom filter. Unfortunately, we notice very slow
> performance on account of being disk bound. Looking at jstack, we notice
> that most of the time, we appear to be hitting disk for the block index. We
> performed a major compaction and retried and performance improved some, but
> not by much. We are processing data at about 2 MB per second.
>
>   We are using CDH 4.2.1 HBase 0.94.2 and HDFS 2.0.0 running with 8
> datanodes/regionservers (each with 32 cores, 4x1TB disks and 60 GB RAM).
> HBase is running with 30 GB Heap size, memstore values being capped at 3 GB
> and flush thresholds being 0.15 and 0.2. Blockcache is at 0.5 of total heap
> size (15 GB). We are using SNAPPY for our tables.
>
>
> A couple of questions:
>         * Is the performance of the time-based scan bad after a major
> compaction?
>
>         * What can we do to help alleviate being disk bound? The typical
> answer of adding more RAM does not seem to have helped, or we are missing
> some other config.
>
>
>
> Below are some of the metrics from a Regionserver webUI:
>
> requestsPerSecond=5895, numberOfOnlineRegions=60, numberOfStores=60,
> numberOfStorefiles=209, storefileIndexSizeMB=6, rootIndexSizeKB=7131,
> totalStaticIndexSizeKB=415995, totalStaticBloomSizeKB=2514675,
> memstoreSizeMB=0, mbInMemoryWithoutWAL=0, numberOfPutsWithoutWAL=0,
> readRequestsCount=30589690, writeRequestsCount=0, compactionQueueSize=0,
> flushQueueSize=0, usedHeapMB=2688, maxHeapMB=30672,
> blockCacheSizeMB=1604.86, blockCacheFreeMB=13731.24, blockCacheCount=11817,
> blockCacheHitCount=27592222, blockCacheMissCount=25373411,
> blockCacheEvictedCount=7112, blockCacheHitRatio=52%,
> blockCacheHitCachingRatio=72%, hdfsBlocksLocalityIndex=91,
> slowHLogAppendCount=0, fsReadLatencyHistogramMean=15409428.56,
> fsReadLatencyHistogramCount=1559927, fsReadLatencyHistogramMedian=230609.5,
> fsReadLatencyHistogram75th=280094.75, fsReadLatencyHistogram95th=9574280.4,
> fsReadLatencyHistogram99th=100981301.2,
> fsReadLatencyHistogram999th=511591146.03,
>  fsPreadLatencyHistogramMean=3895616.6,
> fsPreadLatencyHistogramCount=420000, fsPreadLatencyHistogramMedian=954552,
> fsPreadLatencyHistogram75th=8723662.5,
> fsPreadLatencyHistogram95th=11159637.65,
> fsPreadLatencyHistogram99th=37763281.57,
> fsPreadLatencyHistogram999th=273192813.91,
> fsWriteLatencyHistogramMean=6124343.91,
> fsWriteLatencyHistogramCount=1140000, fsWriteLatencyHistogramMedian=374379,
> fsWriteLatencyHistogram75th=431395.75,
> fsWriteLatencyHistogram95th=576853.8,
> fsWriteLatencyHistogram99th=1034159.75,
> fsWriteLatencyHistogram999th=5687910.29
>
>
>
> key size: 20 bytes
>
> Table description:
> {NAME => 'foo', FAMILIES => [{NAME => 'f', DATA_BLOCK_ENCODING => 'NONE',
> BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', COMPRESSION => 'SNAPPY',
> VERSIONS => '5', TTL => '2592000', MIN_VERSIONS => '0',
> KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536', ENCODE_ON_DISK => 'true',
> IN_MEMORY => 'false', BLOCKCACHE => 'false'}]}
> ENABLED: true