Posted to issues@hbase.apache.org by "Xu Cang (Jira)" <ji...@apache.org> on 2020/03/20 00:10:00 UTC

[jira] [Comment Edited] (HBASE-21394) Restore snapshot in parallel

    [ https://issues.apache.org/jira/browse/HBASE-21394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17063006#comment-17063006 ] 

Xu Cang edited comment on HBASE-21394 at 3/20/20, 12:09 AM:
------------------------------------------------------------

While debugging a snapshot-related issue, I found this JIRA.

From my observation, RestoreSnapshotHelper#restoreHdfsRegions() always iterates over all regions and opens all HFiles for the table, and it does so from every mapper.

So suppose we have 500 mappers scanning a snapshot of the table: all 500 mappers iterate over all regions/HFiles. (Even though the input splits for the mappers are correct, this scanner makes every mapper scan all regions.) Was this the same symptom you saw, and is it by design? (BTW, I am using branch-1 code and haven't tried the parallel improvements.)
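
As a rough illustration of the cost described above, here is a back-of-the-envelope model. It is not HBase code: the class and method names are hypothetical, and the one-region-per-mapper layout and the figure of 4 HFiles per region are assumptions made only for the example. Only the 500 mappers come from the scenario above.

```java
public class RestoreCostModel {

    // Behavior described above: every mapper walks every region
    // and opens every HFile during restore.
    public static long opensWhenEveryMapperRestoresAll(int mappers, int regions, int hfilesPerRegion) {
        return (long) mappers * regions * hfilesPerRegion;
    }

    // Hypothetical alternative for comparison: each HFile is opened once
    // overall, because each mapper touches only the regions in its own split.
    public static long opensWhenMapperRestoresOwnSplit(int regions, int hfilesPerRegion) {
        return (long) regions * hfilesPerRegion;
    }

    public static void main(String[] args) {
        int mappers = 500;        // from the scenario above
        int regions = 500;        // assumption: one region per mapper
        int hfilesPerRegion = 4;  // assumption: illustrative only

        long all = opensWhenEveryMapperRestoresAll(mappers, regions, hfilesPerRegion);
        long own = opensWhenMapperRestoresOwnSplit(regions, hfilesPerRegion);

        System.out.println("every mapper restores all regions: " + all + " HFile opens");
        System.out.println("each mapper restores its own split: " + own + " HFile opens");
        System.out.println("redundancy factor: " + (all / own));
    }
}
```

Under these assumptions the redundant work grows linearly with the number of mappers, so the redundancy factor equals the mapper count.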

 

[~openinx] 

 

Thanks! 



> Restore snapshot in parallel
> ----------------------------
>
>                 Key: HBASE-21394
>                 URL: https://issues.apache.org/jira/browse/HBASE-21394
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Zheng Hu
>            Assignee: Zheng Hu
>            Priority: Major
>             Fix For: 3.0.0, 2.2.0, 2.1.2
>
>
> Our MapReduce/Spark jobs depend heavily on SnapshotScanner. Restoring a big table for SnapshotScanner takes hours.
> Restoring the snapshot in parallel would help a lot.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)