Posted to dev@crunch.apache.org by "Josh Wills (JIRA)" <ji...@apache.org> on 2013/01/15 18:58:13 UTC

[jira] [Commented] (CRUNCH-143) CrunchInputSplit should be public

    [ https://issues.apache.org/jira/browse/CRUNCH-143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13554054#comment-13554054 ] 

Josh Wills commented on CRUNCH-143:
-----------------------------------

It's possible right now, just hacky: you end up doing ((MapContext) getContext()).getInputSplit() in the DoFn. But I'd be happy to make information about the input data currently being processed easier for the client to access. Thoughts on what the API should look like? Very MapReduce-y, or should we wrap it in some kind of abstraction that would also be valid for (say) in-memory pipelines?
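For the second option (an abstraction that also works for in-memory pipelines), here is a minimal plain-Java sketch of one possible shape. Everything in it (`InputDescriptor`, `getSourcePath`, `MapReduceInputDescriptor`, `InMemoryInputDescriptor`, `describe`) is hypothetical and not part of the Crunch API. The MapReduce implementation simply takes the path as a constructor argument instead of extracting it from a real input split, so the sketch runs without Hadoop on the classpath.

```java
import java.util.Optional;

public class InputSplitInfoSketch {

    // Hypothetical engine-neutral view of the input currently being processed.
    // Not part of Crunch; shown only to illustrate the API question.
    public interface InputDescriptor {
        // Identifier (e.g. an HDFS path) of the source being read, if one exists.
        Optional<String> getSourcePath();
    }

    // Hypothetical MapReduce-backed implementation. A real one would derive the
    // path from the task's input split; here it is passed in directly.
    public static final class MapReduceInputDescriptor implements InputDescriptor {
        private final String path;

        public MapReduceInputDescriptor(String path) {
            this.path = path;
        }

        @Override
        public Optional<String> getSourcePath() {
            return Optional.of(path);
        }
    }

    // Hypothetical in-memory implementation: an in-memory collection has no
    // backing file, so the path is reported as absent rather than null.
    public static final class InMemoryInputDescriptor implements InputDescriptor {
        @Override
        public Optional<String> getSourcePath() {
            return Optional.empty();
        }
    }

    // Stand-in for DoFn logic that behaves the same under either engine.
    public static String describe(InputDescriptor input) {
        return input.getSourcePath()
                .map(p -> "reading " + p)
                .orElse("reading in-memory data");
    }

    public static void main(String[] args) {
        System.out.println(describe(new MapReduceInputDescriptor("hdfs://namenode/data/part-00000")));
        System.out.println(describe(new InMemoryInputDescriptor()));
    }
}
```

Returning `Optional` lets client code handle engines that have no notion of a file path without casting the context. A MapReduce implementation would presumably still need to unwrap the split Crunch hands to the task, which is the motivation for making CrunchInputSplit public.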
                
> CrunchInputSplit should be public
> ---------------------------------
>
>                 Key: CRUNCH-143
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-143
>             Project: Crunch
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.4.0
>            Reporter: Dave Beech
>            Assignee: Josh Wills
>            Priority: Minor
>
> Similar to MAPREDUCE-2226: it's currently not possible to access the underlying input split details, for instance the path on HDFS. 
> Is there a nice way to make this information available from DoFn instances while staying within the Crunch abstraction?
> Also, MAPREDUCE-4923 might be applicable to CrunchInputSplit.
