Posted to dev@crunch.apache.org by "Josh Wills (JIRA)" <ji...@apache.org> on 2015/01/03 21:45:34 UTC

[jira] [Resolved] (CRUNCH-483) Scrunch .map does not allow mapping to a PCollection[(A,B)]

     [ https://issues.apache.org/jira/browse/CRUNCH-483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Josh Wills resolved CRUNCH-483.
-------------------------------
       Resolution: Fixed
    Fix Version/s: 0.12.0

Back from vacation and slowly putting myself back to work. Thanks for this one, David!

> Scrunch .map does not allow mapping to a PCollection[(A,B)]
> -----------------------------------------------------------
>
>                 Key: CRUNCH-483
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-483
>             Project: Crunch
>          Issue Type: Bug
>          Components: Scrunch
>    Affects Versions: 0.11.0
>            Reporter: David Whiting
>            Priority: Minor
>             Fix For: 0.12.0
>
>         Attachments: 0001-Add-asPCollection-method-to-PTable-and-corresponding.patch
>
>
> When using Scrunch PCollections and attempting to map to a pair of values, the keyvalue implicit function in CanParallelTransform will "upgrade" the result to a PTable[K, V]. This is often the desired behaviour, but because Scrunch PTable is not an extension of Scrunch PCollection, there are cases where this is not what is wanted.
> Concrete example from music land: I am trying to count the number of plays for each track in each country. I want to do this:
> trackPlayedMessage.map(tpm => (tpm.track, tpm.country)).count()
> However, because the keyvalue implicit CanParallelTransform is substituted, what I get is a PTable and not a PCollection, so I cannot call .count().
> There are a number of possible remedies that I'm happy to have a go at, but I'd like some input as to which would be best:
> - Make PTable[K,V] a real extension of PCollection[(K, V)] (analogous to how it works in Crunch)
> - Add an "asPCollection" method to PTable which "downgrades" the PTable[K, V] to a PCollection[(K, V)].
> - Make mapToTable and flatMapToTable distinct from map and flatMap to make the choice explicit (warning: breaks existing API).
> - Expose an equivalent to LowPriorityParallelTransforms.single to be invoked explicitly to get a collection instead of a table using .map(fn)(implicitly, single)
> - Something else
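
The behaviour described above can be reproduced in miniature without Scrunch. The sketch below is only an illustration under stated assumptions: Coll, Table, Transform, LowPriorityTransforms, asColl, Play and Demo are hypothetical stand-ins for PCollection, PTable, CanParallelTransform, LowPriorityParallelTransforms and the proposed asPCollection, not the real Scrunch classes or signatures. It shows why mapping to a pair selects the table-producing implicit, and how two of the listed remedies (a "downgrade" method, and passing the low-priority instance explicitly) might look.

```scala
// Simplified stand-ins, NOT the real Scrunch classes.
final case class Coll[A](items: Seq[A]) {
  // The result type To is chosen by whichever implicit Transform is found.
  def map[B, To](f: A => B)(implicit t: Transform[B, To]): To =
    t.wrap(items.map(f))

  // Number of occurrences of each distinct element.
  def count(): Map[A, Long] =
    items.groupBy(identity).map { case (k, v) => k -> v.size.toLong }
}

// Deliberately not a subtype of Coll, mirroring the situation in the report.
final case class Table[K, V](pairs: Seq[(K, V)]) {
  // Hypothetical "downgrade", in the spirit of the proposed asPCollection.
  def asColl: Coll[(K, V)] = Coll(pairs)
}

trait Transform[El, To] { def wrap(items: Seq[El]): To }

trait LowPriorityTransforms {
  // Fallback: any element type maps to a plain collection.
  implicit def single[B]: Transform[B, Coll[B]] =
    new Transform[B, Coll[B]] {
      def wrap(items: Seq[B]): Coll[B] = Coll(items)
    }
}

object Transform extends LowPriorityTransforms {
  // Preferred for pairs because it is defined in the derived object.
  implicit def keyvalue[K, V]: Transform[(K, V), Table[K, V]] =
    new Transform[(K, V), Table[K, V]] {
      def wrap(items: Seq[(K, V)]): Table[K, V] = Table(items)
    }
}

final case class Play(track: String, country: String)

object Demo {
  val plays: Coll[Play] =
    Coll(Seq(Play("t1", "SE"), Play("t1", "SE"), Play("t2", "US")))

  // keyvalue is selected, so the result is a Table, which has no count().
  val table = plays.map(p => (p.track, p.country))

  // Remedy sketch 1: downgrade the table back to a collection of pairs.
  val viaDowngrade: Map[(String, String), Long] = table.asColl.count()

  // Remedy sketch 2: pass the low-priority instance explicitly.
  val viaSingle: Map[(String, String), Long] =
    plays.map(p => (p.track, p.country))(Transform.single).count()

  def main(args: Array[String]): Unit = {
    println(viaDowngrade)
    println(viaSingle)
  }
}
```

Both remedies keep the default "pairs become a table" behaviour and only add an opt-out. The name of the attached patch suggests the asPCollection route was the one taken, but the message itself does not say so explicitly.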



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)