Posted to issues@tez.apache.org by "Gopal V (JIRA)" <ji...@apache.org> on 2015/01/26 19:32:35 UTC

[jira] [Comment Edited] (TEZ-1993) Implement a pluggable InputSizeEstimator for grouping fairly

    [ https://issues.apache.org/jira/browse/TEZ-1993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14292192#comment-14292192 ] 

Gopal V edited comment on TEZ-1993 at 1/26/15 6:32 PM:
-------------------------------------------------------

No, because inheritance is not shimmable. You need a visitor pattern here instead.

And because of that this cannot apply to any other InputFormat that generates FileSplit (which is not going to be a sub-class of TezInputSplit).
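
To make the point about inheritance concrete, here is a minimal sketch of an estimator that lives outside the split class hierarchy. It is a simplified external-dispatch (strategy-style) variant in the spirit of the visitor approach, not a full double-dispatch visitor, and it is not the code in the attached patch. Every name in it (DemoFileSplit, DemoInputSizeEstimator, RawLengthEstimator, ColumnarAwareEstimator) is hypothetical, and the ".orc" suffix check with a fixed expansion factor is an assumed stand-in for whatever a real format-aware estimator would read from file metadata.

```java
// Hypothetical stand-in for a split class that we cannot modify or subclass,
// for example one produced by a third-party InputFormat.
final class DemoFileSplit {
    private final String path;
    private final long length;

    DemoFileSplit(String path, long length) {
        this.path = path;
        this.length = length;
    }

    String getPath() { return path; }

    long getLength() { return length; }
}

// Hypothetical plug-in point: the estimator inspects a split from the outside,
// so the split does not have to extend any framework-specific base class.
interface DemoInputSizeEstimator {
    long estimateSize(Object split);
}

// Baseline behaviour: report the on-disk length of the split.
final class RawLengthEstimator implements DemoInputSizeEstimator {
    @Override
    public long estimateSize(Object split) {
        if (split instanceof DemoFileSplit) {
            return ((DemoFileSplit) split).getLength();
        }
        return 0L; // unknown split type: no estimate available
    }
}

// Format-aware behaviour: scale the on-disk length by an assumed expansion
// factor when the file looks columnar (assumption: detected by suffix).
final class ColumnarAwareEstimator implements DemoInputSizeEstimator {
    private final long expansionFactor;

    ColumnarAwareEstimator(long expansionFactor) {
        this.expansionFactor = expansionFactor;
    }

    @Override
    public long estimateSize(Object split) {
        if (split instanceof DemoFileSplit) {
            DemoFileSplit fileSplit = (DemoFileSplit) split;
            long factor = fileSplit.getPath().endsWith(".orc") ? expansionFactor : 1L;
            return fileSplit.getLength() * factor;
        }
        return 0L; // unknown split type: no estimate available
    }
}
```

Because the estimator is chosen by configuration rather than by the split's superclass, the same grouping code could work with any InputFormat that emits a plain file split.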


was (Author: gopalv):
No, because inheritance is not shimmable.

And because of that this cannot apply to any other InputFormat that generates FileSplit (which is not going to be a sub-class of TezInputSplit).

> Implement a pluggable InputSizeEstimator for grouping fairly
> ------------------------------------------------------------
>
>                 Key: TEZ-1993
>                 URL: https://issues.apache.org/jira/browse/TEZ-1993
>             Project: Apache Tez
>          Issue Type: Bug
>    Affects Versions: 0.7.0
>            Reporter: Gopal V
>            Assignee: Gopal V
>         Attachments: TEZ-1993.1.patch
>
>
> Split grouping currently uses the file size of each split, i.e. the exact number of bytes the split occupies at rest on HDFS.
> That measure is a poor proxy for the work involved with columnar formats, and it is skewed badly when some of the data is highly compressible.
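
As an illustration of the skew described in the issue, the sketch below packs splits into groups with a simple greedy loop whose size function is a parameter. It is not Tez's actual grouping algorithm; GroupingSketch and its group method are hypothetical names, and the split sizes used in the usage note are made-up numbers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToLongFunction;

final class GroupingSketch {
    // Greedily packs splits, in order, into groups whose summed size stays
    // within targetSize. A single split larger than targetSize still gets a
    // group of its own.
    static <T> List<List<T>> group(List<T> splits, ToLongFunction<T> sizeOf, long targetSize) {
        List<List<T>> groups = new ArrayList<>();
        List<T> current = new ArrayList<>();
        long currentSize = 0L;
        for (T split : splits) {
            long size = sizeOf.applyAsLong(split);
            // Close the current group if adding this split would overflow it.
            if (!current.isEmpty() && currentSize + size > targetSize) {
                groups.add(current);
                current = new ArrayList<>();
                currentSize = 0L;
            }
            current.add(split);
            currentSize += size;
        }
        if (!current.isEmpty()) {
            groups.add(current);
        }
        return groups;
    }
}
```

For example, take four splits of 100 bytes each on disk, two of which decode to 800 bytes. Grouping on the on-disk size with a target of 400 puts all four into one group. Grouping on the decoded estimate with a target of 1000 splits them into two groups, which is the kind of difference a pluggable estimator is meant to expose.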


