Posted to issues@tez.apache.org by "Bikas Saha (JIRA)" <ji...@apache.org> on 2014/03/01 20:50:20 UTC

[jira] [Commented] (TEZ-873) Allow MRInputLegacy to expose the individual input split

    [ https://issues.apache.org/jira/browse/TEZ-873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13917173#comment-13917173 ] 

Bikas Saha commented on TEZ-873:
--------------------------------

I see the following annotations on MRInputLegacy.getOldRecordReader() that are missing from getNewRecordReader(). One of them is a compiler-warning suppression. Is it not needed there?
{code}
@SuppressWarnings("rawtypes")
@Private
{code}
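Whether the suppression is needed depends on the method signature. javac's -Xlint:rawtypes reports uses of a generic type without type arguments, so a method that returns a raw RecordReader needs @SuppressWarnings("rawtypes") to compile warning-free, while one that returns a parameterized type does not. A minimal illustration, assuming java.util.List as a self-contained stand-in for RecordReader (the Hadoop @Private annotation is left out for the same reason):

```java
import java.util.ArrayList;
import java.util.List;

public class RawTypesDemo {

    // Raw return type (List stands in for RecordReader): -Xlint:rawtypes would
    // flag this signature and the raw "new ArrayList()", so the method-level
    // suppression is what keeps the build warning-free.
    @SuppressWarnings("rawtypes")
    static List rawList() {
        return new ArrayList();
    }

    // Fully parameterized: nothing for -Xlint:rawtypes to report,
    // so no suppression is needed here.
    static List<String> typedList() {
        return new ArrayList<>();
    }

    public static void main(String[] args) {
        System.out.println(rawList().size() + " " + typedList().size()); // prints "0 0"
    }
}
```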

Could you also describe an example of using the newly added API? Currently most of the methods in MRInputLegacy are marked @Private because they are meant to be used only by the internal MapProcessor/ReduceProcessor. Do you expect these methods to be used beyond that, in which case the @Private annotation should be removed? [~sseth], opinions?
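The kind of usage example asked for here might look like the following processor-side code, which mirrors the MR FileSplit idiom from the issue description. It is a sketch only: InputSplit, FileSplit and SplitSource are hypothetical minimal stand-ins (SplitSource plays the role of the proposed MRInput.getInputSplit()), not the real Tez or Hadoop classes, so that the snippet compiles on its own.

```java
public class SplitFileNameExample {

    /** Hypothetical minimal split abstraction (not Hadoop's InputSplit). */
    interface InputSplit { }

    /** Hypothetical stand-in for a file-backed split, modelled on MR's FileSplit. */
    static final class FileSplit implements InputSplit {
        private final String path;

        FileSplit(String path) { this.path = path; }

        /** Returns the last path component, like Hadoop's Path.getName(). */
        String getFileName() {
            int slash = path.lastIndexOf('/');
            return slash < 0 ? path : path.substring(slash + 1);
        }
    }

    /** Hypothetical stand-in for the proposed MRInput.getInputSplit() accessor. */
    interface SplitSource {
        InputSplit getInputSplit();
    }

    /** Processor-side logic: the same cast-and-read idiom MR users write today. */
    static String currentFileName(SplitSource input) {
        InputSplit split = input.getInputSplit();
        if (split instanceof FileSplit) {
            return ((FileSplit) split).getFileName();
        }
        return null; // not file-backed, so there is no file name to report
    }

    public static void main(String[] args) {
        SplitSource input = () -> new FileSplit("hdfs://nn/data/part-00000");
        System.out.println(currentFileName(input)); // prints part-00000
    }
}
```

The instanceof check keeps the processor safe when the input is not file-based; a plain cast, as in the MR one-liner, would throw ClassCastException instead.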

> Allow MRInputLegacy to expose the individual input split
> --------------------------------------------------------
>
>                 Key: TEZ-873
>                 URL: https://issues.apache.org/jira/browse/TEZ-873
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Mohammad Kamrul Islam
>            Assignee: Mohammad Kamrul Islam
>         Attachments: TEZ-873.1.patch, TEZ-873.2.patch
>
>
> Currently there is no way to get the InputSplit from a TezProcessor. In the current MR framework, the file name can be obtained through FileSplit. For example, a common use is to get the file name in a map task:
> String fileName = ((FileSplit) context.getInputSplit()).getPath().getName();
> InputSplit also carries other metadata that existing MR users may rely on.
> This JIRA proposes adding APIs that expose the InputSplit: TezGroupedSplit.getWrapperSplit() and MRInput.getInputSplit().
> Although MRInputLegacy provides an API to get the InputSplit, it has a few issues:
> * Without TezGroupedSplit.getWrapperSplit(), it is unusable.
> * Since it is needed in various use cases, I propose moving it from MRInputLegacy to MRInput.
> * The APIs are currently named getNewInputSplit() and getOldInputSplit(). These should be merged into a single getInputSplit(), with the new/old API distinction handled internally.
> Please give your feedback.
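The merge proposed in the last bullet can be sketched as a single accessor that tracks internally which API produced the split. This is a hedged illustration, not the attached patch: OldApiSplit, NewApiSplit, UnifiedSplitAccessor, forOldApi() and forNewApi() are hypothetical names, and returning Object is an assumption motivated by the fact that Hadoop's two InputSplit base types (the mapred interface and the mapreduce abstract class) are unrelated.

```java
public class UnifiedSplitAccessor {

    // Hypothetical stand-ins for an old-API (mapred) and a new-API (mapreduce)
    // split; like Hadoop's two InputSplit base types, they share no supertype.
    static final class OldApiSplit {
        final String path;
        OldApiSplit(String path) { this.path = path; }
    }

    static final class NewApiSplit {
        final String path;
        NewApiSplit(String path) { this.path = path; }
    }

    private final boolean useNewApi;
    private final OldApiSplit oldSplit; // non-null only when the old API is in use
    private final NewApiSplit newSplit; // non-null only when the new API is in use

    private UnifiedSplitAccessor(boolean useNewApi, OldApiSplit oldSplit, NewApiSplit newSplit) {
        this.useNewApi = useNewApi;
        this.oldSplit = oldSplit;
        this.newSplit = newSplit;
    }

    static UnifiedSplitAccessor forOldApi(OldApiSplit split) {
        return new UnifiedSplitAccessor(false, split, null);
    }

    static UnifiedSplitAccessor forNewApi(NewApiSplit split) {
        return new UnifiedSplitAccessor(true, null, split);
    }

    /**
     * Single accessor replacing getOldInputSplit()/getNewInputSplit(): the
     * old/new decision stays inside this class. The return type is Object
     * because the two split hierarchies have no common supertype.
     */
    public Object getInputSplit() {
        return useNewApi ? newSplit : oldSplit;
    }

    public static void main(String[] args) {
        Object split = forNewApi(new NewApiSplit("/data/part-00001")).getInputSplit();
        // Callers cast to the type they expect, as in the MR FileSplit idiom.
        if (split instanceof NewApiSplit) {
            System.out.println(((NewApiSplit) split).path); // prints /data/part-00001
        }
    }
}
```

Returning Object keeps call sites close to the existing MR idiom of casting context.getInputSplit() to the expected type; a typed overload per API would be an alternative, at the cost of reintroducing the old/new split in the public surface.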



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)