Posted to dev@crunch.apache.org by "Josh Wills (JIRA)" <ji...@apache.org> on 2013/04/06 20:49:16 UTC

[jira] [Updated] (CRUNCH-129) Cache the Iterable values for each key when a groupByKey op has multiple children

     [ https://issues.apache.org/jira/browse/CRUNCH-129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Josh Wills updated CRUNCH-129:
------------------------------

    Summary: Cache the Iterable values for each key when a groupByKey op has multiple children  (was: AvroGroupedTableType is not compatible with by())
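
    The new summary is about how grouped values are consumed. In Hadoop MapReduce, the reducer's values for a key are generally exposed as an Iterable that supports only a single pass. A groupByKey op with several child operations therefore cannot hand the same Iterable to each child unless the values are cached first. The sketch below is a plain-Java model under that assumption, not Crunch code. OneShotIterable, cacheValues and runChildren are hypothetical names. They stand in for the reducer's value stream, for whatever caching Crunch ends up adopting, and for two child ops that each read every value.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class CachedValuesSketch {

  /** Hypothetical stand-in for a reducer's value stream: iterator() may be called only once. */
  static final class OneShotIterable<T> implements Iterable<T> {
    private final Iterator<T> source;
    private boolean consumed = false;

    OneShotIterable(Iterator<T> source) {
      this.source = source;
    }

    @Override
    public Iterator<T> iterator() {
      if (consumed) {
        throw new IllegalStateException("values for this key were already iterated");
      }
      consumed = true;
      return source;
    }
  }

  /** Copies the values into a list in one pass, so any number of children can re-iterate them. */
  static <T> List<T> cacheValues(Iterable<T> values) {
    List<T> cached = new ArrayList<>();
    for (T value : values) {
      cached.add(value);
    }
    return cached;
  }

  /** Two hypothetical "children" of the same key: one counts values, one concatenates them. */
  static String runChildren(Iterable<String> values) {
    int count = 0;
    for (String ignored : values) { // first child: first full pass
      count++;
    }
    StringBuilder joined = new StringBuilder();
    for (String value : values) { // second child: needs a second full pass
      joined.append(value);
    }
    return count + ":" + joined;
  }

  public static void main(String[] args) {
    Iterable<String> raw = new OneShotIterable<>(Arrays.asList("a", "b", "c").iterator());
    // Caching first lets both children see every value; passing raw directly would throw.
    System.out.println(runChildren(cacheValues(raw))); // prints 3:abc
  }
}
```

    The trade-off in this model is memory. The cached list holds every value for the key, which is why a real implementation would presumably cache only when a grouped op actually has more than one child.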
    
> Cache the Iterable values for each key when a groupByKey op has multiple children
> ---------------------------------------------------------------------------------
>
>                 Key: CRUNCH-129
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-129
>             Project: Crunch
>          Issue Type: Bug
>            Reporter: Jonathan Natkins
>
> Given a simple Avro pipeline like this:
>     PGroupedTable<String, MyAvroObject> processedData = data.parallelDo(new DoFn<String, Pair<String, MyAvroObject>>() {
>       public void process(String line, Emitter<Pair<String, MyAvroObject>> emitter) {
>         String key = getKey(line);
>         MyAvroObject value = convertToAvroObject(line);
>         emitter.emit(Pair.of(key, value));
>       }
>     }, Avros.tableOf(Avros.strings(), Avros.specifics(MyAvroObject.class)))
>     .groupByKey(3);
>     // by() on the grouped table is the call that fails (see the trace below)
>     PTable<MyAvroGroup, Pair<String, Iterable<MyAvroObject>>> groupedData =
>         processedData.by(new MapFn<Pair<String, Iterable<MyAvroObject>>, MyAvroGroup>() {
>             @Override
>             public MyAvroGroup map(Pair<String, Iterable<MyAvroObject>> input) {
>               MyAvroGroup group = new MyAvroGroup();
>               group.objects = Lists.<MyAvroObject>newArrayList();
>              
>               for (MyAvroObject obj : input.second()) {
>                 group.objects.add(obj);
>               }
>              
>               return group;
>             }
>           },
>           Avros.specifics(MyAvroGroup.class));
> The following exception is thrown when the by() call runs (the MemCollection frame indicates the in-memory MemPipeline was in use):
> 12/12/10 14:11:07 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> Exception in thread "main" java.lang.ClassCastException: org.apache.crunch.types.avro.AvroGroupedTableType cannot be cast to org.apache.crunch.types.avro.AvroType
>     at org.apache.crunch.types.avro.Avros.tableOf(Avros.java:608)
>     at org.apache.crunch.types.avro.AvroTypeFamily.tableOf(AvroTypeFamily.java:135)
>     at org.apache.crunch.impl.mem.collect.MemCollection.by(MemCollection.java:222)
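
The trace suggests a mechanism. by() appears to build its output type by passing the collection's own PType to the type family's tableOf(). For a PGroupedTable, that PType is the grouped table type. Avros.tableOf then casts its arguments to AvroType, and AvroGroupedTableType is a PType but not an AvroType, so the cast fails. The sketch below models only that shape in plain Java. PTypeModel, AvroTypeModel, GroupedTypeModel and tableOf are hypothetical simplified names, not Crunch's actual classes or signatures.

```java
public class TableOfCastSketch {

  /** Hypothetical stand-in for a PType: anything that can describe its serialized form. */
  interface PTypeModel {
    String describe();
  }

  /** Hypothetical stand-in for an Avro-backed type, the only kind tableOf can combine. */
  static class AvroTypeModel implements PTypeModel {
    private final String schema;

    AvroTypeModel(String schema) {
      this.schema = schema;
    }

    @Override
    public String describe() {
      return schema;
    }
  }

  /** Hypothetical stand-in for a grouped-table type: a PTypeModel, but NOT an AvroTypeModel. */
  static class GroupedTypeModel implements PTypeModel {
    @Override
    public String describe() {
      return "grouped";
    }
  }

  /** Mirrors the failing pattern: both arguments are cast unconditionally. */
  static String tableOf(PTypeModel key, PTypeModel value) {
    AvroTypeModel avroKey = (AvroTypeModel) key;
    AvroTypeModel avroValue = (AvroTypeModel) value; // ClassCastException for GroupedTypeModel
    return "table<" + avroKey.describe() + ", " + avroValue.describe() + ">";
  }

  public static void main(String[] args) {
    // Plain Avro key and value types combine fine.
    System.out.println(tableOf(new AvroTypeModel("string"), new AvroTypeModel("MyAvroObject")));
    try {
      // A by()-style call on a grouped collection passes the grouped type as the value type.
      tableOf(new AvroTypeModel("MyAvroGroup"), new GroupedTypeModel());
    } catch (ClassCastException e) {
      System.out.println("ClassCastException, as in the report");
    }
  }
}
```

This matches the report: the tableOf call that builds the ungrouped input table succeeds, while the one triggered by by() on the grouped collection fails at the cast.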

--
This message is automatically generated by JIRA.