Posted to issues@hbase.apache.org by "Hadoop QA (JIRA)" <ji...@apache.org> on 2015/10/26 19:53:27 UTC
[jira] [Commented] (HBASE-14696) allowPartialResults in mapreduce Mappers

    [ https://issues.apache.org/jira/browse/HBASE-14696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14974825#comment-14974825 ] 

Hadoop QA commented on HBASE-14696:
-----------------------------------

{color:red}-1 overall{color}.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12768753/14696-branch-1-v1.txt
  against branch-1 branch at commit 899857609c7c2c2b0fd3fa72d3bd585cb2703e0a.
  ATTACHMENT ID: 12768753

    {color:green}+1 @author{color}.  The patch does not contain any @author tags.

    {color:red}-1 tests included{color}.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no new tests are needed for this patch.
                        Also please list what manual steps were performed to verify this patch.

    {color:green}+1 hadoop versions{color}. The patch compiles with all supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.6.1 2.7.0 2.7.1)

    {color:green}+1 javac{color}.  The applied patch does not increase the total number of javac compiler warnings.

    {color:green}+1 protoc{color}.  The applied patch does not increase the total number of protoc compiler warnings.

    {color:green}+1 javadoc{color}.  The javadoc tool did not generate any warning messages.

    {color:green}+1 checkstyle{color}.  The applied patch does not increase the total number of checkstyle errors.

    {color:green}+1 findbugs{color}.  The patch does not introduce any  new Findbugs (version 2.0.3) warnings.

    {color:green}+1 release audit{color}.  The applied patch does not increase the total number of release audit warnings.

    {color:green}+1 lineLengths{color}.  The patch does not introduce lines longer than 100.

    {color:green}+1 site{color}.  The mvn post-site goal succeeds with this patch.

    {color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/16232//testReport/
Release Findbugs (version 2.0.3) warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/16232//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/16232//artifact/patchprocess/checkstyle-aggregate.html

Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/16232//console

This message is automatically generated.

> allowPartialResults in mapreduce Mappers
> ----------------------------------------
>
>                 Key: HBASE-14696
>                 URL: https://issues.apache.org/jira/browse/HBASE-14696
>             Project: HBase
>          Issue Type: Improvement
>          Components: mapreduce
>    Affects Versions: 2.0.0, 1.1.0
>            Reporter: Mindaugas Kairys
>            Assignee: Ted Yu
>         Attachments: 14696-branch-1-v1.txt, 14696-v1.txt
>
>
> It is currently impossible to get partial results in mapreduce mapper jobs.
> Even when setAllowPartialResults(true) is set on the scan, jobs still fail with OOME on large rows.
> The reason is that the Scan field allowPartialResults is lost during job creation:
>   1. The user creates a Job and passes a Scan object to TableMapReduceUtil.initTableMapperJob(table_name, scanObj,...), which puts the result of TableMapReduceUtil.convertScanToString(scanObj) into the job config.
>   2. When the job starts, the method TableInputFormat.setConf retrieves the scan string from the config and converts it back to a Scan object by calling TableMapReduceUtil.convertStringToScan, which yields a Scan object whose allowPartialResults field is always false.
> As an experiment, I modified TableInputFormat.setConf() to force all scans to allow partial results. After this change all jobs succeeded with no more OOME, and I also noticed that mappers began to receive partial results (Result.isPartial()).
> My use case is very simple: I have large rows and expect a mapper to receive them in parts, that is, to get the same rowid several times with different key/value records.
> This would save me from implementing my own result partitioning solution, because the large number of key values in a single large row could then be returned transparently.
> On the other hand, if a Scan object can return several records for the same rowid (partial results), perhaps the mapper should do the same.
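
The round trip described in the issue can be illustrated with a small toy model. This is only a sketch and not HBase code: MiniScan, encodeLossy, encodeFull and decode are hypothetical names invented for illustration, and the string format is made up. The issue does not say which side of the real conversion drops the field, so the sketch arbitrarily places the loss in the encoder. It shows how a flag that is never written to the job config string comes back with its default value.

```java
/** Toy model only: none of these types are HBase classes (all names are hypothetical). */
public class ScanRoundTrip {

    /** Hypothetical stand-in for a scan with two settings and their defaults. */
    public static final class MiniScan {
        public int caching = 100;
        public boolean allowPartialResults = false;
    }

    /** Lossy encoder: writes only caching, so the flag never reaches the config string. */
    public static String encodeLossy(MiniScan scan) {
        return "caching=" + scan.caching;
    }

    /** Complete encoder: writes every field. */
    public static String encodeFull(MiniScan scan) {
        return "caching=" + scan.caching
            + ";allowPartialResults=" + scan.allowPartialResults;
    }

    /** Decoder: any field missing from the string keeps its default value. */
    public static MiniScan decode(String encoded) {
        MiniScan scan = new MiniScan();
        for (String pair : encoded.split(";")) {
            String[] kv = pair.split("=", 2);
            if (kv[0].equals("caching")) {
                scan.caching = Integer.parseInt(kv[1]);
            } else if (kv[0].equals("allowPartialResults")) {
                scan.allowPartialResults = Boolean.parseBoolean(kv[1]);
            }
        }
        return scan;
    }

    public static void main(String[] args) {
        MiniScan scan = new MiniScan();
        scan.allowPartialResults = true;

        // The flag set by the caller is silently reset by the lossy round trip.
        System.out.println(decode(encodeLossy(scan)).allowPartialResults);
        // With every field serialized, the flag survives.
        System.out.println(decode(encodeFull(scan)).allowPartialResults);
    }
}
```

In this model the fix is to serialize the missing field. Whether the actual patch takes that approach is not stated in this message.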



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)