Posted to user@hbase.apache.org by "Mathews, Jacob 1. (Nokia - IN/Bangalore)" <ja...@nokia.com> on 2021/08/06 07:08:14 UTC

Hbase export is very slow - help needed

Hi HBase team,

We are trying to use the HBase Export tool described here: http://hbase.apache.org/book.html#export
However, the export runs sequentially, row by row, as seen in the logs.
We tried many of the Export options, but all of them took a long time.

Backup folder contents size:

bash-4.2$ du -kh
16K     ./tsdb-tree
16K     ./tsdb-meta
60M     ./tsdb-uid
5.9G    ./tsdb
6.0G    .

The export took around 104 minutes for 6 GB of compressed data.

Is there a way to parallelise this and improve the export time?

Below are the charts from HBase.

Export with single regionserver:

[cid:image001.jpg@01D78ABF.E9E6C280]

Export with two regionservers:

[cid:image002.jpg@01D78ABF.E9E6C280]

Scaling out the HBase region servers did not help either; the export still runs sequentially.

Thanks
Jacob Mathews

Re: Hbase export is very slow - help needed

Posted by Stack <st...@duboce.net>.
Can you host your images elsewhere and add only the links here? Our
mailing list doesn't allow inline images.

Is the export distributed or running in series? Does raising the log level,
or taking thread dumps of the export task, tell you where the time is spent?

Thanks,
S


Re: Hbase export is very slow - help needed

Posted by Jacob Mathews <ja...@gmail.com>.
Hi Josh,

Thanks for your response.

Can you shed some light on how to tweak the concurrency on the YARN side? Is it accomplished by some settings in yarn-site.xml or mapred-site.xml?

I was also not able to identify any property that can be passed to the HBase Export tool to increase the number of Mappers per region. Any suggestions on this would also be very helpful.

Here is a portion of the log from the HBase Export tool. Perhaps you can see something in it that needs to be fixed.

----------------------------
2021-10-04 04:22:03,822 INFO  [main] mapreduce.RegionSizeCalculator: Calculating region sizes for table "tsdb".
2021-10-04 04:22:04,410 INFO  [main] zookeeper.ReadOnlyZKClient: Close zookeeper connection 0x0b25b095 to altiplano-zookeeper:2181
2021-10-04 04:22:04,413 INFO  [ReadOnlyZKClient-altiplano-zookeeper:2181@0x0b25b095] zookeeper.ZooKeeper: Session: 0x100000683990011 closed
2021-10-04 04:22:04,413 INFO  [ReadOnlyZKClient-altiplano-zookeeper:2181@0x0b25b095-EventThread] zookeeper.ClientCnxn: EventThread shut down for session: 0x100000683990011
2021-10-04 04:22:04,463 INFO  [main] mapreduce.JobSubmitter: number of splits:2
2021-10-04 04:22:04,471 INFO  [main] Configuration.deprecation: io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum
2021-10-04 04:22:04,620 INFO  [main] mapreduce.JobSubmitter: Submitting tokens for job: job_1633098453918_0001
2021-10-04 04:22:05,399 INFO  [main] impl.YarnClientImpl: Submitted application application_1633098453918_0001
2021-10-04 04:22:05,427 INFO  [main] mapreduce.Job: The url to track the job: http://k8s-infra-altiplano-hdfs-yarn-rm:8088/proxy/application_1633098453918_0001/
2021-10-04 04:22:05,427 INFO  [main] mapreduce.Job: Running job: job_1633098453918_0001
2021-10-04 04:22:12,535 INFO  [main] mapreduce.Job: Job job_1633098453918_0001 running in uber mode : false
2021-10-04 04:22:12,536 INFO  [main] mapreduce.Job:  map 0% reduce 0%
2021-10-04 04:35:42,713 INFO  [main] mapreduce.Job:  map 50% reduce 0%
2021-10-04 04:46:26,053 INFO  [main] mapreduce.Job:  map 100% reduce 0%
2021-10-04 04:46:27,103 INFO  [main] mapreduce.Job: Job job_1633098453918_0001 completed successfully
2021-10-04 04:46:27,216 INFO  [main] mapreduce.Job: slots (ms)=0
		Total time spent by all map tasks (ms)=2900104
		Total vcore-milliseconds taken by all map tasks=2900104
		Total megabyte-milliseconds taken by all map tasks=2969706496
	Map-Reduce Framework
		Map input records=149787553
		Map output records=149787553
		Input split bytes=454
		Spilled Records=0
		Failed Shuffles=0
		Merged Map outputs=0
		GC time elapsed (ms)=10706
		CPU time spent (ms)=999190
		Physical memory (bytes) snapshot=538783744
		Virtual memory (bytes) snapshot=4150628352
		Total committed heap usage (bytes)=134217728
	HBase Counters
		BYTES_IN_REMOTE_RESULTS=13097220466
		BYTES_IN_RESULTS=13097220466
		MILLIS_BETWEEN_NEXTS=2250916
		NOT_SERVING_REGION_EXCEPTION=0
		NUM_SCANNER_RESTARTS=0
		NUM_SCAN_RESULTS_STALE=0
		REGIONS_SCANNED=2
		REMOTE_RPC_CALLS=1497876
		REMOTE_RPC_RETRIES=0
		ROWS_FILTERED=0
		ROWS_SCANNED=149787553
		RPC_CALLS=1497876
		RPC_RETRIES=0
	File Input Format Counters 
		Bytes Read=0
	File Output Format Counters 
		Bytes Written=17139600166

------------------------------
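
The counters in the log above allow a rough check of how parallel the job actually was. The following is only illustrative arithmetic on those numbers, not output from any HBase tool: the variable names are made up for this sketch, the wall-clock window is taken from the "map 0%" and "map 100%" lines, and reading "about 100 rows per RPC" as a scanner caching value of 100 is an assumption that the log itself does not state.

```python
from datetime import datetime

# Timestamps of the "map 0%" and "map 100%" lines in the log above.
FMT = "%Y-%m-%d %H:%M:%S,%f"
map_start = datetime.strptime("2021-10-04 04:22:12,536", FMT)
map_end = datetime.strptime("2021-10-04 04:46:26,053", FMT)
wall_s = (map_end - map_start).total_seconds()

# Counters copied from the log above.
total_map_ms = 2_900_104          # Total time spent by all map tasks (ms)
megabyte_ms = 2_969_706_496       # Total megabyte-milliseconds, all map tasks
rows_scanned = 149_787_553        # ROWS_SCANNED
rpc_calls = 1_497_876             # RPC_CALLS
bytes_in_results = 13_097_220_466 # BYTES_IN_RESULTS

# Average number of map tasks running at the same time.
avg_parallel_mappers = (total_map_ms / 1000) / wall_s

# Memory size of each map container, derived from the two time counters.
map_container_mb = megabyte_ms / total_map_ms

# Rows returned per scanner RPC (assumed to reflect scanner caching).
rows_per_rpc = rows_scanned / rpc_calls

# Aggregate scan throughput over the wall-clock window.
rows_per_s = rows_scanned / wall_s
mb_per_s = bytes_in_results / wall_s / 1e6

print(f"wall clock:            {wall_s:.0f} s")
print(f"avg parallel mappers:  {avg_parallel_mappers:.2f}")
print(f"map container size:    {map_container_mb:.0f} MB")
print(f"rows per RPC:          {rows_per_rpc:.1f}")
print(f"throughput:            {rows_per_s:,.0f} rows/s, {mb_per_s:.1f} MB/s")
```

If this arithmetic is right, the two map tasks ran side by side for almost the whole job (about 2.0 mappers on average over roughly 24 minutes), each in a 1024 MB container. That would suggest the job was limited by having only two splits rather than by the mappers running one after the other.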

Thanks
Jacob Mathews

On 2021/08/19 18:34:31, Josh Elser <el...@apache.org> wrote: 
> Export is a MapReduce job, and HBase will only configure a maximum of 
> one Mapper per Region in the table being scanned.
> 
> If you have multiple regions for your tsdb table, then it's possible 
> that you need to tweak the concurrency on the YARN side such that you 
> have multiple Mappers running in parallel?
> 
> Sounds like looking at the YARN Application log and UI is your next best 
> bet.
> 

Re: Hbase export is very slow - help needed

Posted by Josh Elser <el...@apache.org>.
Export is a MapReduce job, and HBase will only configure a maximum of 
one Mapper per Region in the table being scanned.

If you have multiple regions for your tsdb table, then it's possible 
that you need to tweak the concurrency on the YARN side such that you 
have multiple Mappers running in parallel?

Sounds like looking at the YARN Application log and UI is your next best 
bet.
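
To illustrate the point about one Mapper per Region: the number of map tasks that can run at once is bounded both by the number of regions and by what YARN can allocate. The sketch below is a deliberately simplified, memory-only model and not YARN's actual scheduler; it ignores vcores, the ApplicationMaster container, and queue limits. `yarn.nodemanager.resource.memory-mb` (yarn-site.xml) and `mapreduce.map.memory.mb` (mapred-site.xml) are real property names, but the function name and the node sizes in the usage lines are hypothetical.

```python
def max_concurrent_mappers(regions, node_memory_mb, map_memory_mb):
    """Upper bound on map tasks running at once in a memory-only model.

    regions        -- number of regions in the table (one split each)
    node_memory_mb -- per-NodeManager memory available to containers,
                      i.e. yarn.nodemanager.resource.memory-mb per node
    map_memory_mb  -- memory requested per map task,
                      i.e. mapreduce.map.memory.mb
    """
    if map_memory_mb <= 0:
        raise ValueError("map_memory_mb must be positive")
    # Containers that fit on each node, summed over the cluster.
    containers = sum(mem // map_memory_mb for mem in node_memory_mb)
    # The job never has more map tasks than regions.
    return min(regions, containers)


# Hypothetical cluster: two NodeManagers with 8 GB each, 1 GB per mapper.
print(max_concurrent_mappers(2, [8192, 8192], 1024))    # 2
print(max_concurrent_mappers(32, [8192, 8192], 1024))   # 16
# A single small node can serialise the job even with two regions.
print(max_concurrent_mappers(2, [1536], 1024))          # 1
```

In this model, adding YARN capacity beyond the number of regions does not raise the bound; only more regions would. Conversely, a cluster that can fit only one map container at a time would run the mappers in series.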

On 8/18/21 4:52 AM, Nguyen, Tai Van (EXT - VN) wrote:
> Hi HBase Team
> 
> Image can see here :
> 
>   * Export with single regionserver: https://imgur.com/86wSUMV
>     <https://imgur.com/86wSUMV>
>   * Export with two regionservers: https://imgur.com/a/XMovlZx
>     <https://imgur.com/a/XMovlZx>
> 
> Log show about time was:
> 
>     root@solaltiplano-track4-master:~/hbase-exporting/latest# cat hbase_export_compress_default.log | grep export
>     Starting hbase export at Fri Jun 11 12:22:46 UTC 2021
>     tsdb table exported in  6279 seconds
>     tsdb-meta table exported in  6 seconds
>     tsdb-tree table exported in  7 seconds
>     tsdb-uid table exported in  90 seconds
>     Ending hbase export at Fri Jun 11 14:09:08 UTC 2021
> 
> 
>   *
> 
> 
> Thanks,
> Tai
> 
> 
> ------------------------------------------------------------------------
> *From:* Mathews, Jacob 1. (Nokia - IN/Bangalore) <ja...@nokia.com>
> *Sent:* Monday, August 16, 2021 6:47 PM
> *To:* Nguyen, Tai Van (EXT - VN) <ta...@nokia.com>
> *Subject:* FW: Hbase export is very slow - help needed
> 
> *From:*Mathews, Jacob 1. (Nokia - IN/Bangalore)
> *Sent:* Friday, August 6, 2021 12:38 PM
> *To:* user@hbase.apache.org
> *Subject:* Hbase export is very slow - help needed
> 
> Hi HBase team,
> 
> We are trying to use Hbase export mentioned here: 
> http://hbase.apache.org/book.html#export 
> <http://hbase.apache.org/book.html#export>
> 
> But it is happening sequentially row by row as seen from the logs.
> 
> we tried many options of the Hbase export, but all were taking long time.
> 
> Backup folder contents size:
> 
> bash-4.2$ du -kh
> 
> 16K         ./tsdb-tree
> 
> 16K         ./tsdb-meta
> 
> 60M       ./tsdb-uid
> 
> 5.9G       ./tsdb
> 
> 6.0G       .
> 
> took around 104 minutes for 6gb compressed data.
> 
> Is there a way we can parallelise this and improve the export time.
> 
> Below are the charts from Hbase .
> 
> Export with single regionserver:
> 
> Export with two regionservers:
> 
> Scaling the HBase Region server also did not help, the export still 
> happens sequentially.
> 
> Thanks
> 
> Jacob Mathews
> 

Re: Hbase export is very slow - help needed

Posted by "Nguyen, Tai Van (EXT - VN)" <ta...@nokia.com>.
Hi HBase Team

The images can be seen here:

  *   Export with single regionserver: https://imgur.com/86wSUMV
  *   Export with two regionservers: https://imgur.com/a/XMovlZx

The log shows the following timings:

root@solaltiplano-track4-master:~/hbase-exporting/latest# cat hbase_export_compress_default.log | grep export
Starting hbase export at Fri Jun 11 12:22:46 UTC 2021
tsdb table exported in  6279 seconds
tsdb-meta table exported in  6 seconds
tsdb-tree table exported in  7 seconds
tsdb-uid table exported in  90 seconds
Ending hbase export at Fri Jun 11 14:09:08 UTC 2021
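
As a quick cross-check of these figures, the per-table durations can be summed and compared with the start and end timestamps. This is only a sketch: the numbers are copied from the log above, and the variable names are illustrative.

```python
from datetime import datetime

# Per-table export durations in seconds, from the log above.
durations_s = {"tsdb": 6279, "tsdb-meta": 6, "tsdb-tree": 7, "tsdb-uid": 90}

# Start and end of the whole export run, from the log above.
FMT = "%a %b %d %H:%M:%S UTC %Y"
start = datetime.strptime("Fri Jun 11 12:22:46 UTC 2021", FMT)
end = datetime.strptime("Fri Jun 11 14:09:08 UTC 2021", FMT)

wall_s = (end - start).total_seconds()
sum_s = sum(durations_s.values())
tsdb_share = durations_s["tsdb"] / sum_s

print(f"wall clock:      {wall_s:.0f} s")
print(f"sum of tables:   {sum_s} s")
print(f"tsdb share:      {tsdb_share:.1%}")
print(f"tsdb duration:   {durations_s['tsdb'] / 60:.1f} min")
```

The sum of the per-table times equals the wall-clock span (6382 s), which suggests that the four tables were exported one after another. The tsdb table alone accounts for about 98% of the total, roughly 105 minutes.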



Thanks,
Tai

