Posted to dev@crunch.apache.org by "Kiyan Ahmadizadeh (JIRA)" <ji...@apache.org> on 2012/09/12 01:16:08 UTC
[jira] [Updated] (CRUNCH-58) Implement PObject in Crunch/Scrunch
[ https://issues.apache.org/jira/browse/CRUNCH-58?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Kiyan Ahmadizadeh updated CRUNCH-58:
------------------------------------
Attachment: CRUNCH-58.patch
This commit adds PObjects to Crunch. A PObject encapsulates a singleton
value produced from a distributed computation. The changes in this commit
include:
1. Adding a PObject interface to the Java code base.
2. Adding an abstract class PObjectImpl that implements a PObject backed by
a PCollection. Concrete subclasses implement the PObjectImpl#process method
to transform an iterable obtained from materializing the backing PCollection
into the singleton value encapsulated by the PObject.
3. Adding concrete subclasses of PObjectImpl that a) use the first element of
the backing PCollection as the PObject value, b) use a Java collection
containing the elements of the backing PCollection as the PObject value, and
c) use a Java Map containing the mappings defined by Pairs in the backing
PCollection as the PObject value.
4. Modifying min() and max() on PCollection to return PObjects.
5. Adding an asCollection method to PCollection<S> that returns a
PObject<Collection<S>> of the PCollection's elements.
6. Adding an asMap method to PTable<K, V> that returns a PObject<Map<K,V>>
of the PTable's elements.
7. Adding PObject to the Scala code base and modifying min() and max()
in Scala's PCollection to return PObjects.
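Items 1 to 3 can be sketched in plain Java without Hadoop. This is a hypothetical, self-contained illustration, not the code in CRUNCH-58.patch: a `Supplier<Iterable<S>>` stands in for materializing the backing PCollection, and the `getValue()` accessor, the minimal `Pair` class, and the subclass names `FirstElementPObject`, `CollectionPObject` and `MapPObject` are all assumptions. Each subclass overrides only `process`, mirroring how the description says concrete subclasses turn the materialized iterable into the singleton value.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.function.Supplier;

// Minimal key/value holder (hypothetical; not Crunch's own Pair class).
final class Pair<K, V> {
  final K first;
  final V second;
  private Pair(K first, V second) { this.first = first; this.second = second; }
  static <K, V> Pair<K, V> of(K first, V second) { return new Pair<>(first, second); }
}

// A container for a singleton value produced by a computation.
interface PObject<T> {
  T getValue();
}

// Backed by a collection whose materialization is deferred until getValue()
// is first called; the Supplier stands in for materializing a PCollection.
abstract class PObjectImpl<S, T> implements PObject<T> {
  private final Supplier<Iterable<S>> materializer;
  private T value;
  private boolean computed = false;

  protected PObjectImpl(Supplier<Iterable<S>> materializer) { this.materializer = materializer; }

  @Override
  public final T getValue() {
    if (!computed) { // materialize and process at most once
      value = process(materializer.get());
      computed = true;
    }
    return value;
  }

  // Subclasses turn the materialized elements into the singleton value.
  protected abstract T process(Iterable<S> elements);
}

// a) The first element of the backing collection.
class FirstElementPObject<S> extends PObjectImpl<S, S> {
  FirstElementPObject(Supplier<Iterable<S>> m) { super(m); }

  @Override
  protected S process(Iterable<S> elements) {
    Iterator<S> it = elements.iterator();
    if (!it.hasNext()) throw new NoSuchElementException("backing collection is empty");
    return it.next();
  }
}

// b) A Java collection holding every element, in iteration order.
class CollectionPObject<S> extends PObjectImpl<S, Collection<S>> {
  CollectionPObject(Supplier<Iterable<S>> m) { super(m); }

  @Override
  protected Collection<S> process(Iterable<S> elements) {
    List<S> result = new ArrayList<>();
    for (S element : elements) result.add(element);
    return result;
  }
}

// c) A Java map built from the pairs; a later pair overwrites an earlier
// pair that has the same key.
class MapPObject<K, V> extends PObjectImpl<Pair<K, V>, Map<K, V>> {
  MapPObject(Supplier<Iterable<Pair<K, V>>> m) { super(m); }

  @Override
  protected Map<K, V> process(Iterable<Pair<K, V>> elements) {
    Map<K, V> result = new LinkedHashMap<>();
    for (Pair<K, V> pair : elements) result.put(pair.first, pair.second);
    return result;
  }
}

class PObjectDemo {
  public static void main(String[] args) {
    List<Integer> data = List.of(7, 3, 9);
    List<Pair<String, Integer>> pairs = List.of(Pair.of("a", 1), Pair.of("b", 2));
    PObject<Integer> first = new FirstElementPObject<>(() -> data);
    PObject<Collection<Integer>> all = new CollectionPObject<>(() -> data);
    PObject<Map<String, Integer>> asMap = new MapPObject<>(() -> pairs);
    System.out.println(first.getValue()); // 7
    System.out.println(all.getValue());   // [7, 3, 9]
    System.out.println(asMap.getValue()); // {a=1, b=2}
  }
}
```

In this sketch, an asCollection or asMap method (items 5 and 6) would amount to wrapping the collection in `CollectionPObject` or `MapPObject`. Caching the value after the first `getValue()` call is a choice made here, and the sketch is not thread-safe.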
Tests have been added for PObjectImpl and its concrete subclasses. Tests
for the new asCollection and asMap methods have also been added. Existing
tests were modified to accommodate changes to min() and max().
> Implement PObject in Crunch/Scrunch
> -----------------------------------
>
> Key: CRUNCH-58
> URL: https://issues.apache.org/jira/browse/CRUNCH-58
> Project: Crunch
> Issue Type: New Feature
> Affects Versions: 0.3.0
> Reporter: Kiyan Ahmadizadeh
> Assignee: Kiyan Ahmadizadeh
> Attachments: CRUNCH-58.patch
>
>
> FlumeJava has the concept of a PObject<T>, a container for a singleton of type T. It is meant to represent the result of a distributed computation that yields a singleton value (for example, the max, min, and length methods on PCollection<T>). Generally speaking, the result of any computation that combines or reduces a PCollection into a singleton value could be represented by a PObject.
> Like PCollection, a PObject defers distributed computation until its value is actually used.
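The deferral described in the issue can be illustrated with a small Java sketch. `LazyPObject`, `LazyPObjectDemo`, `max` and the `JOB_RUNS` counter are hypothetical names, and `Collections.max` over a local list stands in for a distributed computation. The sketch only shows that building the PObject runs nothing and that the work happens when `getValue()` is first called; caching the result for later calls is an extra design choice, not something the issue specifies.

```java
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical lazy container: the computation runs on the first getValue()
// call and its result is cached for later calls.
final class LazyPObject<T> {
  private final Supplier<T> computation;
  private T value;
  private boolean computed = false;

  LazyPObject(Supplier<T> computation) { this.computation = computation; }

  T getValue() {
    if (!computed) {
      value = computation.get();
      computed = true;
    }
    return value;
  }
}

class LazyPObjectDemo {
  // Counts how many times the stand-in "job" has actually run.
  static final AtomicInteger JOB_RUNS = new AtomicInteger();

  // Hypothetical max(): builds the deferred value but runs nothing yet.
  static LazyPObject<Integer> max(List<Integer> data) {
    return new LazyPObject<>(() -> {
      JOB_RUNS.incrementAndGet();
      return Collections.max(data); // local stand-in for a distributed job
    });
  }

  public static void main(String[] args) {
    LazyPObject<Integer> m = max(List.of(4, 11, 2));
    System.out.println(JOB_RUNS.get()); // 0: nothing has run yet
    System.out.println(m.getValue());   // 11: the job runs here
    System.out.println(m.getValue());   // 11: cached, no second run
    System.out.println(JOB_RUNS.get()); // 1
  }
}
```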
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira