Posted to issues@hbase.apache.org by "Mikhail Antonov (JIRA)" <ji...@apache.org> on 2017/06/06 10:06:18 UTC

[jira] [Commented] (HBASE-18090) Improve TableSnapshotInputFormat to allow more multiple mappers per region

    [ https://issues.apache.org/jira/browse/HBASE-18090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16038521#comment-16038521 ] 

Mikhail Antonov commented on HBASE-18090:
-----------------------------------------

Thanks [~tedyu@apache.org] and [~easyliangjob] for the reviews! I'll address the comments shortly.

I made my patch off branch-1.3, so I'm not sure why you couldn't apply it locally. Merge conflicts?

I found an issue with the current patch. If we try to open the same region from several tasks, we hit a race in this code:

{code}
	at org.apache.hadoop.fs.FileSystem.createNewFile(FileSystem.java:1154)
	at org.apache.hadoop.hbase.wal.WALSplitter.writeRegionSequenceIdFile(WALSplitter.java:740)
	at org.apache.hadoop.hbase.regionserver.HRegion.initializeRegionInternals(HRegion.java:876)
	at org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:802)
	at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:6708)
	at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:6669)
	at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:6640)
	at org.apache.hadoop.hbase.client.ClientSideRegionScanner.<init>(ClientSideRegionScanner.java:60)
{code}

Why do we need to go through this code path if we know the region is in read-only mode?
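
To illustrate the kind of race I mean, here is a minimal standalone sketch. It does not use the Hadoop or HBase APIs; it uses plain java.nio as a stand-in, and the class name, method names and marker file name are all hypothetical. Several tasks try to create the same marker file at once, only one of them can succeed, and the others have to be prepared to see "already exists" instead of failing.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in for several tasks opening the same region.
public class MarkerFileRace {

    // Exclusive create: returns true only for the caller that actually
    // created the file. Files.createFile throws FileAlreadyExistsException
    // when another caller got there first; we treat that as "lost the race".
    static boolean createExclusively(Path marker) throws IOException {
        try {
            Files.createFile(marker);
            return true;
        } catch (FileAlreadyExistsException e) {
            return false;
        }
    }

    // Runs `tasks` concurrent attempts to create the same marker file and
    // counts how many of them actually created it.
    static int countWinners(Path marker, int tasks) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(tasks);
        try {
            List<Callable<Boolean>> attempts = new ArrayList<>();
            for (int i = 0; i < tasks; i++) {
                attempts.add(() -> createExclusively(marker));
            }
            int winners = 0;
            for (Future<Boolean> f : pool.invokeAll(attempts)) {
                if (f.get()) {
                    winners++;
                }
            }
            return winners;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        Path dir = Files.createTempDirectory("region-open-race");
        Path marker = dir.resolve("seqid.marker"); // hypothetical file name
        System.out.println("tasks that created the marker: " + countWinners(marker, 8));
    }
}
```

In the sketch the losing tasks simply carry on. Whether that is acceptable for the sequence id file, or whether a read-only open should skip the write entirely, is the open question above.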



> Improve TableSnapshotInputFormat to allow more multiple mappers per region
> --------------------------------------------------------------------------
>
>                 Key: HBASE-18090
>                 URL: https://issues.apache.org/jira/browse/HBASE-18090
>             Project: HBase
>          Issue Type: Improvement
>          Components: mapreduce
>    Affects Versions: 1.4.0
>            Reporter: Mikhail Antonov
>         Attachments: HBASE-18090-branch-1.3-v1.patch
>
>
> TableSnapshotInputFormat runs one map task per region in the table snapshot. This places an unnecessary restriction: the region layout of the original table needs to take into account the processing resources available to the MR job. Allowing multiple mappers per region (assuming a reasonably even key distribution) would be useful.
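
As a rough illustration of what "multiple mappers per region" could mean, the sketch below cuts one region's key range into equal sub-ranges, one per mapper. It is not taken from the attached patch; the class and method names are hypothetical. It assumes both boundary keys are non-empty with start < end, and that keys are spread reasonably evenly over the range. Keys are treated as unsigned big-endian integers, right-padded with zero bytes to a common width.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of HBase.
public class KeyRangeSplitter {

    // Splits [start, end) into `parts` contiguous sub-ranges. Each element of
    // the result is a {subStart, subEnd} pair; the first pair begins at
    // `start` and the last pair ends at `end`.
    static List<byte[][]> split(byte[] start, byte[] end, int parts) {
        int width = Math.max(start.length, end.length);
        // Arrays.copyOf pads on the right with zeros, which preserves
        // lexicographic ordering of the byte keys.
        BigInteger lo = new BigInteger(1, Arrays.copyOf(start, width));
        BigInteger hi = new BigInteger(1, Arrays.copyOf(end, width));
        BigInteger span = hi.subtract(lo);

        List<byte[][]> ranges = new ArrayList<>();
        byte[] prev = start;
        for (int i = 1; i <= parts; i++) {
            byte[] next;
            if (i == parts) {
                next = end;
            } else {
                BigInteger offset = span.multiply(BigInteger.valueOf(i))
                                        .divide(BigInteger.valueOf(parts));
                next = toFixedWidth(lo.add(offset), width);
            }
            ranges.add(new byte[][] {prev, next});
            prev = next;
        }
        return ranges;
    }

    // Converts a non-negative value to exactly `width` bytes, dropping the
    // extra sign byte BigInteger may add and left-padding with zeros.
    static byte[] toFixedWidth(BigInteger v, int width) {
        byte[] raw = v.toByteArray();
        byte[] out = new byte[width];
        int copy = Math.min(raw.length, width);
        System.arraycopy(raw, raw.length - copy, out, width - copy, copy);
        return out;
    }
}
```

For example, splitting the range 0x00 to 0x10 four ways yields the boundaries 0x00, 0x04, 0x08, 0x0c, 0x10. If the range is narrower than the number of parts, some sub-ranges come out empty, so a real implementation would need to handle that case, along with the empty start and end keys of the first and last regions.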


