Posted to dev@crunch.apache.org by "Gabriel Reid (JIRA)" <ji...@apache.org> on 2013/04/11 16:33:16 UTC

[jira] [Commented] (CRUNCH-129) Cache the Iterable values for each key when a groupByKey op has multiple children

    [ https://issues.apache.org/jira/browse/CRUNCH-129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13628972#comment-13628972 ] 

Gabriel Reid commented on CRUNCH-129:
-------------------------------------

[~joshwills] are the title and the description of this issue talking about the same thing? It seems like the ClassCastException in the description is more of a planner (?) issue, whereas the caching of the iterables for multiple children is more of an execution issue.

Or is the ClassCastException just covering up the real iterable issue that would come up if the code could get to the point of actually using the iterable?
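
For illustration, here is a minimal sketch of the execution-side concern named in the issue title. It assumes the grouped values are backed by a single-pass iterator, as Hadoop reducer values are. The class SinglePassIterableSketch and its helpers (SinglePassIterable, count, cache) are hypothetical names made up for this example and are not part of Crunch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class SinglePassIterableSketch {

    /**
     * Hypothetical stand-in for the values of one key on the reduce side:
     * every call to iterator() returns the same underlying iterator,
     * so the values can only be walked once.
     */
    static final class SinglePassIterable<T> implements Iterable<T> {
        private final Iterator<T> source;

        SinglePassIterable(Iterator<T> source) {
            this.source = source;
        }

        @Override
        public Iterator<T> iterator() {
            return source;
        }
    }

    /** Plays the role of one child operation that consumes the values. */
    static <T> int count(Iterable<T> values) {
        int n = 0;
        for (T ignored : values) {
            n++;
        }
        return n;
    }

    /** Copies the values into a list so several children can iterate them. */
    static <T> List<T> cache(Iterable<T> values) {
        List<T> copy = new ArrayList<>();
        for (T value : values) {
            copy.add(value);
        }
        return copy;
    }

    public static void main(String[] args) {
        Iterable<String> uncached =
            new SinglePassIterable<>(Arrays.asList("a", "b", "c").iterator());
        System.out.println(count(uncached)); // first child sees 3 values
        System.out.println(count(uncached)); // second child sees 0 values

        List<String> cached =
            cache(new SinglePassIterable<>(Arrays.asList("a", "b", "c").iterator()));
        System.out.println(count(cached)); // 3
        System.out.println(count(cached)); // 3
    }
}
```

Caching costs memory proportional to the number of values per key, so a real fix would probably only apply it when the groupByKey op actually has more than one child.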
                
> Cache the Iterable values for each key when a groupByKey op has multiple children
> ---------------------------------------------------------------------------------
>
>                 Key: CRUNCH-129
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-129
>             Project: Crunch
>          Issue Type: Bug
>            Reporter: Jonathan Natkins
>
> Given a simple Avro pipeline like this:
>     PGroupedTable<String, MyAvroObject> processedData = data.parallelDo(new DoFn<String, Pair<String, MyAvroObject>>() {
>       public void process(String line, Emitter<Pair<String, MyAvroObject>> emitter) {
>         String key = getKey(line);
>         MyAvroObject value = convertToAvroObject(line);
>         emitter.emit(Pair.of(key, value));
>       }
>     }, Avros.tableOf(Avros.strings(), Avros.specifics(MyAvroObject.class)))
>     .groupByKey(3);
>     PTable<MyAvroGroup, Pair<String, Iterable<MyAvroObject>>> groupedData =
>         processedData.by(new MapFn<Pair<String, Iterable<MyAvroObject>>, MyAvroGroup>() {
>           @Override
>           public MyAvroGroup map(Pair<String, Iterable<MyAvroObject>> input) {
>             MyAvroGroup group = new MyAvroGroup();
>             group.objects = Lists.<MyAvroObject>newArrayList();
>
>             for (MyAvroObject obj : input.second()) {
>               group.objects.add(obj);
>             }
>
>             return group;
>           }
>         },
>         Avros.specifics(MyAvroGroup.class));
> An exception is thrown when the by() code is run:
> 12/12/10 14:11:07 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> Exception in thread "main" java.lang.ClassCastException: org.apache.crunch.types.avro.AvroGroupedTableType cannot be cast to org.apache.crunch.types.avro.AvroType
>     at org.apache.crunch.types.avro.Avros.tableOf(Avros.java:608)
>     at org.apache.crunch.types.avro.AvroTypeFamily.tableOf(AvroTypeFamily.java:135)
>     at org.apache.crunch.impl.mem.collect.MemCollection.by(MemCollection.java:222)
