You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hbase.apache.org by "Hadoop QA (JIRA)" <ji...@apache.org> on 2014/11/27 03:24:13 UTC

[jira] [Commented] (HBASE-12590) A solution for data skew in HBase-Mapreduce Job

    [ https://issues.apache.org/jira/browse/HBASE-12590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14227150#comment-14227150 ] 

Hadoop QA commented on HBASE-12590:
-----------------------------------

{color:red}-1 overall{color}.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12683981/HBase-12590-v1.patch
  against master branch at commit f0d95e7f11403d67b4fc3f1fd4ef048047b6842a.
  ATTACHMENT ID: 12683981

    {color:green}+1 @author{color}.  The patch does not contain any @author tags.

    {color:green}+1 tests included{color}.  The patch appears to include 6 new or modified tests.

    {color:red}-1 javac{color}.  The patch appears to cause mvn compile goal to fail.

    Compilation errors resume:
    [ERROR] COMPILATION ERROR : 
[ERROR] /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.java:[45,48] cannot find symbol
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.2:compile (default-compile) on project hbase-server: Compilation failure
[ERROR] /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.java:[45,48] cannot find symbol
[ERROR] symbol:   class HLog
[ERROR] location: package org.apache.hadoop.hbase.regionserver.wal
[ERROR] -> [Help 1]
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
[ERROR] 
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn <goals> -rf :hbase-server
    

Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/11848//console

This message is automatically generated.

> A solution for data skew in HBase-Mapreduce Job 
> ------------------------------------------------
>
>                 Key: HBASE-12590
>                 URL: https://issues.apache.org/jira/browse/HBASE-12590
>             Project: HBase
>          Issue Type: Improvement
>          Components: mapreduce
>    Affects Versions: 2.0.0
>            Reporter: Weichen Ye
>         Attachments: A Solution for Data Skew in HBase-MapReduce Job.pdf, HBase-12590-v1.patch
>
>
> 1, Motivation
> In production environment, data skew is a very common case. A HBase table always contains a lot of small regions and several large regions. Small regions waste a lot of computing resources. If we use a job to scan a table with 3000 small regions, we need a job with 3000 mappers. Large regions always block the job. If in a 100-region table, one region is far larger then the other 99 regions. When we run a job with the table as input, 99 mappers will be completed very quickly, and we need to wait for the last mapper for a long time.
> 2, Configuration
> Add two new configuration. 
> hbase.mapreduce.split.autobalance = true means enabling the “auto balance” in HBase-MapReduce jobs. The default value is false. 
> hbase.mapreduce.split.targetsize = 1073741824 (default 1GB). The target size of mapreduce splits. 
> If a region size is large than the target size, cut the region into two split.If the sum of several small continuous region size less than the target size, combine these regions into one split.
> Example:
> In attachment
> Welcome to the Review Board.
> https://reviews.apache.org/r/28494/diff/#



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)