Posted to issues@tez.apache.org by "Bikas Saha (JIRA)" <ji...@apache.org> on 2014/08/07 20:20:12 UTC

[jira] [Commented] (TEZ-1386) Users should not need to setup TezGroupedInputFormat to enable grouping

    [ https://issues.apache.org/jira/browse/TEZ-1386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14089557#comment-14089557 ] 

Bikas Saha commented on TEZ-1386:
---------------------------------

The patch looks fine overall. My main concern is that things are a little mixed up with respect to the input formats on the consumer side. The conf says that the input format is FOO, but the input splits are actually TezGroupedSplits. As long as we go through MRReader we are fine. But anyone who is not using MRReader, or who reads the input format config for some other reason, is going to be surprised by the inconsistency. It seems like a leaky abstraction.
Off the top of my head, an alternative could be that when grouping is turned on, our internal grouping code changes the input format to TezGroupedFormat. Then, once grouping is done, the input format and the split format are consistent for any downstream component, which does not need to know whether grouping has happened. There is no need for any help from Tez thereafter.
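
A rough sketch of that alternative, to make the idea concrete. It is not taken from the patch: a plain Map stands in for the real configuration object, and the two key names and the helper applyGrouping are hypothetical, chosen only for illustration. The point is that the grouping step itself rewrites the input format entry and remembers the original, so the conf and the splits agree afterwards.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only; does not use or mirror the actual Tez classes.
class GroupingConfSketch {
    // Hypothetical keys, not real Tez or MapReduce property names.
    static final String INPUT_FORMAT_KEY = "sketch.input.format.class";
    static final String WRAPPED_FORMAT_KEY = "sketch.grouped.wrapped.format.class";
    // Name as written in the comment above; used here as an opaque label.
    static final String GROUPED_FORMAT = "TezGroupedFormat";

    /**
     * Returns a copy of conf. When grouping is enabled, the copy names the
     * grouped format as the input format and keeps the original format under
     * a separate key, so downstream readers see a format that matches the splits.
     */
    static Map<String, String> applyGrouping(Map<String, String> conf, boolean groupingEnabled) {
        Map<String, String> out = new HashMap<>(conf);
        String current = out.get(INPUT_FORMAT_KEY);
        // Skip when grouping is off, or when the conf was already rewritten.
        if (groupingEnabled && !GROUPED_FORMAT.equals(current)) {
            out.put(WRAPPED_FORMAT_KEY, current);
            out.put(INPUT_FORMAT_KEY, GROUPED_FORMAT);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(INPUT_FORMAT_KEY, "FOO");

        Map<String, String> grouped = applyGrouping(conf, true);
        System.out.println("input format:   " + grouped.get(INPUT_FORMAT_KEY));
        System.out.println("wrapped format: " + grouped.get(WRAPPED_FORMAT_KEY));
    }
}
```

With something along these lines, a consumer that reads the input format from the conf gets the grouped format whenever the splits are grouped, and FOO otherwise, without a separate check for whether grouping happened.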

> Users should not need to setup TezGroupedInputFormat to enable grouping
> -----------------------------------------------------------------------
>
>                 Key: TEZ-1386
>                 URL: https://issues.apache.org/jira/browse/TEZ-1386
>             Project: Apache Tez
>          Issue Type: Improvement
>            Reporter: Siddharth Seth
>            Assignee: Siddharth Seth
>         Attachments: TEZ-1386.1.txt
>
>
> To enable grouping via Tez, users should not need to change the underlying InputFormat. A simple enable / disable option should be sufficient.
> MRInputConfigurer does this.
> Many of the methods in MRHelpers, however, require an InputFormat to be specified. The main objective of this JIRA is to get rid of this requirement in favor of a simple enableGrouping flag.
> Marking this as a blocker for TEZ-1347, since it should simplify the set of APIs required. Also, making all the changes in TEZ-1347 would just lead to a very large patch.



--
This message was sent by Atlassian JIRA
(v6.2#6252)