You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hbase.apache.org by "Hadoop QA (JIRA)" <ji...@apache.org> on 2015/04/26 05:52:38 UTC

[jira] [Commented] (HBASE-13356) HBase should provide an InputFormat supporting multiple scans in mapreduce jobs over snapshots

    [ https://issues.apache.org/jira/browse/HBASE-13356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14512859#comment-14512859 ] 

Hadoop QA commented on HBASE-13356:
-----------------------------------

{color:red}-1 overall{color}.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12728190/HBASE-13356.patch
  against master branch at commit cd83d39fb4f50db901b699ba5470b5f709c95c69.
  ATTACHMENT ID: 12728190

    {color:green}+1 @author{color}.  The patch does not contain any @author tags.

    {color:green}+1 tests included{color}.  The patch appears to include 12 new or modified tests.

    {color:green}+1 hadoop versions{color}. The patch compiles with all supported hadoop versions (2.4.1 2.5.2 2.6.0)

    {color:green}+1 javac{color}.  The applied patch does not increase the total number of javac compiler warnings.

    {color:green}+1 protoc{color}.  The applied patch does not increase the total number of protoc compiler warnings.

    {color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 4 warning messages.

                {color:red}-1 checkstyle{color}.  The applied patch generated 1965 checkstyle errors (more than the master's current 1900 errors).

    {color:green}+1 findbugs{color}.  The patch does not introduce any  new Findbugs (version 2.0.3) warnings.

    {color:red}-1 release audit{color}.  The applied patch generated 7 release audit warnings (more than the master's current 0 warnings).

    {color:red}-1 lineLengths{color}.  The patch introduces the following lines longer than 100:
    + * MultiTableSnapshotInputFormat generalizes {@link org.apache.hadoop.hbase.mapred.TableSnapshotInputFormat}
+ * allowing a MapReduce job to run over one or more table snapshots, with one or more scans configured for each.
+ * Internally, the input format delegates to {@link org.apache.hadoop.hbase.mapreduce.TableSnapshotInputFormat}
+ * and thus has the same performance advantages; see {@link org.apache.hadoop.hbase.mapreduce.TableSnapshotInputFormat} for
+ * Usage is similar to TableSnapshotInputFormat, with the following exception: initMultiTableSnapshotMapperJob takes in a map
+ * from snapshot name to a collection of scans. For each snapshot in the map, each corresponding scan will be applied;
+ * the overall dataset for the job is defined by the concatenation of the regions and tables included in each snapshot/scan
+ * {@link org.apache.hadoop.hbase.mapred.TableMapReduceUtil#initMultiTableSnapshotMapperJob(Map, Class, Class, Class, JobConf, boolean, Path)}
+ * Internally, this input format restores each snapshot into a subdirectory of the given tmp directory. Input splits and
+ * record readers are created as described in {@link org.apache.hadoop.hbase.mapreduce.TableSnapshotInputFormat}

  {color:green}+1 site{color}.  The mvn site goal succeeds with this patch.

    {color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/13816//testReport/
Release audit warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/13816//artifact/patchprocess/patchReleaseAuditWarnings.txt
Release Findbugs (version 2.0.3) 	warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/13816//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/13816//artifact/patchprocess/checkstyle-aggregate.html

                Javadoc warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/13816//artifact/patchprocess/patchJavadocWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/13816//console

This message is automatically generated.

> HBase should provide an InputFormat supporting multiple scans in mapreduce jobs over snapshots
> ----------------------------------------------------------------------------------------------
>
>                 Key: HBASE-13356
>                 URL: https://issues.apache.org/jira/browse/HBASE-13356
>             Project: HBase
>          Issue Type: New Feature
>          Components: mapreduce
>            Reporter: Andrew Mains
>            Assignee: Andrew Mains
>            Priority: Minor
>         Attachments: HBASE-13356.patch
>
>
> Currently, HBase supports the pushing of multiple scans to mapreduce jobs over live tables (via MultiTableInputFormat) but only supports a single scan for mapreduce jobs over table snapshots. It would be handy to support multiple scans over snapshots as well, probably through another input format (MultiTableSnapshotInputFormat?). To mimic the functionality present in MultiTableInputFormat, the new input format would likely have to take in the names of all snapshots used in addition to the scans.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)