Posted to dev@crunch.apache.org by "Kiyan Ahmadizadeh (JIRA)" <ji...@apache.org> on 2012/09/05 23:07:08 UTC

[jira] [Updated] (CRUNCH-57) Add a length function to PCollection

     [ https://issues.apache.org/jira/browse/CRUNCH-57?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kiyan Ahmadizadeh updated CRUNCH-57:
------------------------------------

    Attachment: CRUNCH-57.patch

This patch provides an implementation. It adds a static length() method to Aggregate.java, which PCollectionImpl and MemCollection call to implement the length() method declared in the PCollection interface. Aggregate#length closely follows the implementation of Aggregate#count, with a few tweaks.

This functionality is exposed in Scala by adding a length function to the PCollectionLike trait.

Integration tests are added for this functionality. 
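
For readers without the patch at hand, here is a rough, self-contained sketch of the general idea in plain Java. It is not the code from CRUNCH-57.patch and does not use the Crunch API. The class name LengthSketch, the constant key, and the in-memory maps are assumptions made for illustration only. The sketch shows how a length computation can mirror a count computation: count pairs each element with 1 and sums per distinct element, while length pairs every element with the same key so that a single sum covers the whole collection.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical illustration only; not the Crunch Aggregate class.
public class LengthSketch {

    // Count-style aggregation: pair each element with 1L and
    // sum the ones separately for every distinct element.
    public static <T> Map<T, Long> count(Iterable<T> elements) {
        Map<T, Long> counts = new HashMap<>();
        for (T element : elements) {
            counts.merge(element, 1L, Long::sum);
        }
        return counts;
    }

    // Length-style aggregation: pair every element with the same
    // constant key (an assumption of this sketch), so all the ones
    // are summed into a single total.
    public static <T> long length(Iterable<T> elements) {
        final int constantKey = 1;
        Map<Integer, Long> totals = new HashMap<>();
        for (T ignored : elements) {
            totals.merge(constantKey, 1L, Long::sum);
        }
        // An empty input never touches the map, so default to 0.
        return totals.getOrDefault(constantKey, 0L);
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("a", "b", "a", "c");

        // The use case from the issue: how many elements matched a filter?
        List<String> onlyA = words.stream()
                .filter(w -> w.equals("a"))
                .collect(Collectors.toList());

        System.out.println(length(words)); // prints 4
        System.out.println(length(onlyA)); // prints 2
    }
}
```

In a distributed setting the per-key sum would run as a combine step rather than over an in-memory map, but the shape of the computation is the same.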
                
> Add a length function to PCollection
> ------------------------------------
>
>                 Key: CRUNCH-57
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-57
>             Project: Crunch
>          Issue Type: New Feature
>          Components: Core
>    Affects Versions: 0.3.0
>            Reporter: Kiyan Ahmadizadeh
>            Assignee: Josh Wills
>         Attachments: CRUNCH-57.patch
>
>
> Sometimes it's useful and interesting to compute the number of elements in a PCollection.
>  
> For example, suppose there was an initial PCollection that was then filtered into another.  If I'm interested in how many elements of the original PCollection matched the filter, I'll have to write extra code to compute this.
> PCollections should have a length method that, when called, computes the number of elements in the PCollection and returns the result. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira