Posted to user@hbase.apache.org by "Mathews, Jacob 1. (Nokia - IN/Bangalore)" <ja...@nokia.com> on 2021/08/06 07:08:14 UTC
Hbase export is very slow - help needed
Hi HBase team,
We are trying to use the HBase Export tool described here: http://hbase.apache.org/book.html#export
But it runs sequentially, row by row, as seen from the logs.
We tried many options of the HBase Export tool, but all of them took a long time.
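For reference, the kind of invocation we have been trying looks roughly like the following (the output path and the -D option values here are illustrative, not our exact command):

```shell
# Export one table to HDFS; compression is passed via -D properties
hbase org.apache.hadoop.hbase.mapreduce.Export \
  -D mapreduce.output.fileoutputformat.compress=true \
  -D mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.GzipCodec \
  tsdb /backup/tsdb
```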
Backup folder contents size:
bash-4.2$ du -kh
16K ./tsdb-tree
16K ./tsdb-meta
60M ./tsdb-uid
5.9G ./tsdb
6.0G .
It took around 104 minutes for 6 GB of compressed data.
Is there a way we can parallelise this and improve the export time?
Below are the charts from HBase.
Export with single regionserver:
[inline image omitted: image001.jpg]
Export with two regionservers:
[inline image omitted: image002.jpg]
Scaling up the HBase RegionServers also did not help; the export still runs sequentially.
Thanks
Jacob Mathews
Re: Hbase export is very slow - help needed
Posted by Stack <st...@duboce.net>.
Can you host your images elsewhere and instead add links here only? Our
mailing list doesn't allow images inline.
Is the export distributed, or running in series? Does upping the logging levels tell you where the time is spent (or does thread-dumping the export task)?
Thanks,
S
On Fri, Aug 6, 2021 at 7:53 AM Mathews, Jacob 1. (Nokia - IN/Bangalore) <
jacob.1.mathews@nokia.com> wrote:
Re: Hbase export is very slow - help needed
Posted by Jacob Mathews <ja...@gmail.com>.
Hi Josh,
Thanks for your response.
Can you shed some light on how to tweak the concurrency on the YARN side? Is it accomplished by settings in yarn-site.xml or mapred-site.xml?
I was also not able to identify any property that can be passed to the HBase Export tool to increase the number of Mappers per region. Any suggestions on this would also be very helpful.
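(One thing I came across but have not verified on our version: TableInputFormatBase appears to have an auto-balance setting that can turn a large region into more than one input split. The property name below is from memory and should be checked against the TableInputFormatBase shipped with your HBase version before relying on it:)

```shell
hbase org.apache.hadoop.hbase.mapreduce.Export \
  -D hbase.mapreduce.input.autobalance=true \
  tsdb /backup/tsdb
```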
Here is a portion of the logs from the HBase Export tool. Perhaps you will see something in it that needs to be fixed.
----------------------------
2021-10-04 04:22:03,822 INFO [main] mapreduce.RegionSizeCalculator: Calculating region sizes for table "tsdb".
2021-10-04 04:22:04,410 INFO [main] zookeeper.ReadOnlyZKClient: Close zookeeper connection 0x0b25b095 to altiplano-zookeeper:2181
2021-10-04 04:22:04,413 INFO [ReadOnlyZKClient-altiplano-zookeeper:2181@0x0b25b095] zookeeper.ZooKeeper: Session: 0x100000683990011 closed
2021-10-04 04:22:04,413 INFO [ReadOnlyZKClient-altiplano-zookeeper:2181@0x0b25b095-EventThread] zookeeper.ClientCnxn: EventThread shut down for session: 0x100000683990011
2021-10-04 04:22:04,463 INFO [main] mapreduce.JobSubmitter: number of splits:2
2021-10-04 04:22:04,471 INFO [main] Configuration.deprecation: io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum
2021-10-04 04:22:04,620 INFO [main] mapreduce.JobSubmitter: Submitting tokens for job: job_1633098453918_0001
2021-10-04 04:22:05,399 INFO [main] impl.YarnClientImpl: Submitted application application_1633098453918_0001
2021-10-04 04:22:05,427 INFO [main] mapreduce.Job: The url to track the job: http://k8s-infra-altiplano-hdfs-yarn-rm:8088/proxy/application_1633098453918_0001/
2021-10-04 04:22:05,427 INFO [main] mapreduce.Job: Running job: job_1633098453918_0001
2021-10-04 04:22:12,535 INFO [main] mapreduce.Job: Job job_1633098453918_0001 running in uber mode : false
2021-10-04 04:22:12,536 INFO [main] mapreduce.Job: map 0% reduce 0%
2021-10-04 04:35:42,713 INFO [main] mapreduce.Job: map 50% reduce 0%
2021-10-04 04:46:26,053 INFO [main] mapreduce.Job: map 100% reduce 0%
2021-10-04 04:46:27,103 INFO [main] mapreduce.Job: Job job_1633098453918_0001 completed successfully
2021-10-04 04:46:27,216 INFO [main] mapreduce.Job: slots (ms)=0
Total time spent by all map tasks (ms)=2900104
Total vcore-milliseconds taken by all map tasks=2900104
Total megabyte-milliseconds taken by all map tasks=2969706496
Map-Reduce Framework
Map input records=149787553
Map output records=149787553
Input split bytes=454
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=10706
CPU time spent (ms)=999190
Physical memory (bytes) snapshot=538783744
Virtual memory (bytes) snapshot=4150628352
Total committed heap usage (bytes)=134217728
HBase Counters
BYTES_IN_REMOTE_RESULTS=13097220466
BYTES_IN_RESULTS=13097220466
MILLIS_BETWEEN_NEXTS=2250916
NOT_SERVING_REGION_EXCEPTION=0
NUM_SCANNER_RESTARTS=0
NUM_SCAN_RESULTS_STALE=0
REGIONS_SCANNED=2
REMOTE_RPC_CALLS=1497876
REMOTE_RPC_RETRIES=0
ROWS_FILTERED=0
ROWS_SCANNED=149787553
RPC_CALLS=1497876
RPC_RETRIES=0
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=17139600166
------------------------------
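Doing some quick arithmetic on the counters above (a sketch; the numbers are copied from the log output):

```python
# Back-of-the-envelope numbers from the MapReduce counters in the log above.
bytes_in_results = 13_097_220_466   # BYTES_IN_RESULTS
map_task_ms = 2_900_104             # Total time spent by all map tasks (ms)
rpc_calls = 1_497_876               # RPC_CALLS
rows_scanned = 149_787_553          # ROWS_SCANNED

# Aggregate scan throughput over the time the mappers were running
throughput_mb_s = bytes_in_results / (map_task_ms / 1000) / 1e6

# How many rows each scanner RPC returned on average
rows_per_rpc = rows_scanned / rpc_calls

print(f"scan throughput across all mappers: {throughput_mb_s:.1f} MB/s")  # ~4.5 MB/s
print(f"rows fetched per RPC: {rows_per_rpc:.0f}")                        # ~100
```

So each RPC returns almost exactly 100 rows, which looks like a default scanner-caching value. Would raising hbase.client.scanner.caching for the Export job cut down the ~1.5 million round trips?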
Thanks
Jacob Mathews
On 2021/08/19 18:34:31, Josh Elser <el...@apache.org> wrote:
Re: Hbase export is very slow - help needed
Posted by Josh Elser <el...@apache.org>.
Export is a MapReduce job, and HBase will only configure a maximum of
one Mapper per Region in the table being scanned.
If you have multiple regions for your tsdb table, then it's possible
that you need to tweak the concurrency on the YARN side so that you
have multiple Mappers running in parallel.
Sounds like looking at the YARN Application log and UI is your next best
bet.
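If YARN does turn out to be the limit, the usual knobs are the per-node container resources and the per-map-task requirements. A minimal sketch with illustrative values (in yarn-site.xml and mapred-site.xml respectively):

```xml
<!-- yarn-site.xml: resources each NodeManager offers to containers -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>8192</value>
</property>
<property>
  <name>yarn.nodemanager.resource.cpu-vcores</name>
  <value>8</value>
</property>

<!-- mapred-site.xml: what each map task requests -->
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>2048</value>
</property>
<property>
  <name>mapreduce.map.cpu.vcores</name>
  <value>1</value>
</property>
```

With those example numbers a single node could run up to four 2 GB map tasks concurrently, assuming the scheduler's maximum-allocation limits permit it.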
On 8/18/21 4:52 AM, Nguyen, Tai Van (EXT - VN) wrote:
Re: Hbase export is very slow - help needed
Posted by "Nguyen, Tai Van (EXT - VN)" <ta...@nokia.com>.
Hi HBase Team
The images can be seen here:
* Export with single regionserver: https://imgur.com/86wSUMV
* Export with two regionservers: https://imgur.com/a/XMovlZx
The log shows the export times:
root@solaltiplano-track4-master:~/hbase-exporting/latest# cat hbase_export_compress_default.log | grep export
Starting hbase export at Fri Jun 11 12:22:46 UTC 2021
tsdb table exported in 6279 seconds
tsdb-meta table exported in 6 seconds
tsdb-tree table exported in 7 seconds
tsdb-uid table exported in 90 seconds
Ending hbase export at Fri Jun 11 14:09:08 UTC 2021
Thanks,
Tai