Posted to dev@crunch.apache.org by "Josh Wills (JIRA)" <ji...@apache.org> on 2018/02/02 23:40:00 UTC

[jira] [Resolved] (CRUNCH-663) Expose Record-level File Path to Processing Functions

     [ https://issues.apache.org/jira/browse/CRUNCH-663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Josh Wills resolved CRUNCH-663.
-------------------------------
       Resolution: Fixed
    Fix Version/s: 1.0.0

Pushed to master. Thanks again Ben!

> Expose Record-level File Path to Processing Functions
> -----------------------------------------------------
>
>                 Key: CRUNCH-663
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-663
>             Project: Crunch
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Ben Roling
>            Assignee: Josh Wills
>            Priority: Major
>             Fix For: 1.0.0
>
>         Attachments: CRUNCH-663-v2.patch, CRUNCH-663.patch
>
>
> We have some processing pipelines where we want to know the file path that each record being processed came from.  It would be nice if this could be exposed to the DoFns in our pipelines.
>  
> This same desire was expressed a little over a year ago on the mailing list:
> [http://mail-archives.apache.org/mod_mbox/crunch-user/201611.mbox/%3CCAG-tO+Y42KRFiocg1RJT4qFcyvkPjFSfZa4z=wk34AriP4weTw@mail.gmail.com%3E]
>  
> Unfortunately, that thread dead-ended.
>  
> I will use the comments section and a patch to propose a simple, albeit slightly hacky, solution.  An alternative would be to create a new Source that provides a PCollection<Pair<Path, Record>>, but I'm not sure how much effort that would take.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)