Posted to issues@hbase.apache.org by "Yi Liang (JIRA)" <ji...@apache.org> on 2016/12/21 01:26:58 UTC

[jira] [Commented] (HBASE-5401) PerformanceEvaluation generates 10x the number of expected mappers

    [ https://issues.apache.org/jira/browse/HBASE-5401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15765781#comment-15765781 ] 

Yi Liang commented on HBASE-5401:
---------------------------------

I have used this command and also encountered this issue. For example, when I run:
hbase org.apache.hadoop.hbase.PerformanceEvaluation --rows=m randomWrite n

If we use --nomapred, this creates n threads (clients), and each thread writes m/n rows into HBase.
If we use the default MapReduce mode, this creates 10*n mappers, and each mapper puts m/(n*10) rows into HBase.
   I think the static field {code}static int TASKS_PER_CLIENT = 10{code} here is unnecessary:
   1. If users want more mappers, they can simply increase the number of clients. With the *10 multiplier in place, users can only get 10, 20, 30... mappers as they vary the client count, which is not flexible.
   2. TASKS_PER_CLIENT = 10 is hardcoded and invisible to the user; sometimes a user may want just 5 mappers for their job, but the current code will create 50.
   3. When <nclients> = 5, it means 5 threads but 50 mappers, which is a little inconsistent. (I do not mean that a mapper is the same as a thread, but it is better to keep the two counts the same.)
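To make the arithmetic above concrete, here is a simplified sketch of how the split count falls out of the multiplier. This is not the actual HBase source, just an illustration; the field name TASKS_PER_CLIENT matches the constant discussed above, and the method names are hypothetical:

{code}
// Simplified sketch (not the exact PerformanceEvaluation code) of how the
// hardcoded multiplier inflates the number of map tasks.
public class SplitSketch {
    static int TASKS_PER_CLIENT = 10; // the hardcoded multiplier in question

    // MapReduce path: TASKS_PER_CLIENT splits are generated per asked-for client.
    static int numSplits(int numClients) {
        return TASKS_PER_CLIENT * numClients;
    }

    // Each split (mapper) then covers totalRows / (numClients * 10) rows.
    static int rowsPerSplit(int totalRows, int numClients) {
        return totalRows / numSplits(numClients);
    }

    public static void main(String[] args) {
        // e.g. --rows=1000000 randomWrite 5 -> 50 mappers, 20000 rows each
        System.out.println(numSplits(5));             // 50
        System.out.println(rowsPerSplit(1000000, 5)); // 20000
    }
}
{code}

With the multiplier removed, numSplits(n) would simply be n, matching the --nomapred thread count.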

What do you guys think?

> PerformanceEvaluation generates 10x the number of expected mappers
> ------------------------------------------------------------------
>
>                 Key: HBASE-5401
>                 URL: https://issues.apache.org/jira/browse/HBASE-5401
>             Project: HBase
>          Issue Type: Bug
>          Components: test
>    Affects Versions: 2.0.0
>            Reporter: Oliver Meyn
>             Fix For: 2.0.0
>
>         Attachments: HBASE-5401-V1.patch
>
>
> With a command line like 'hbase org.apache.hadoop.hbase.PerformanceEvaluation randomWrite 10' there are 100 mappers spawned, rather than the expected 10.  The culprit appears to be the outer loop in writeInputFile which sets up 10 splits for every "asked-for client".  I think the fix is just to remove that outer loop.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)