Posted to dev@crunch.apache.org by "Josh Wills (JIRA)" <ji...@apache.org> on 2013/04/06 20:49:16 UTC
[jira] [Updated] (CRUNCH-129) Cache the Iterable values for each key when a groupByKey op has multiple children
[ https://issues.apache.org/jira/browse/CRUNCH-129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Josh Wills updated CRUNCH-129:
------------------------------
Summary: Cache the Iterable values for each key when a groupByKey op has multiple children (was: AvroGroupedTableType is not compatible with by())
> Cache the Iterable values for each key when a groupByKey op has multiple children
> ---------------------------------------------------------------------------------
>
> Key: CRUNCH-129
> URL: https://issues.apache.org/jira/browse/CRUNCH-129
> Project: Crunch
> Issue Type: Bug
> Reporter: Jonathan Natkins
>
> Given a simple Avro pipeline like this:
> PGroupedTable<String, MyAvroObject> processedData = data.parallelDo(new DoFn<String, Pair<String, MyAvroObject>>() {
>     public void process(String line, Emitter<Pair<String, MyAvroObject>> emitter) {
>         String key = getKey(line);
>         MyAvroObject value = convertToAvroObject(line);
>         emitter.emit(Pair.of(key, value));
>     }
> }, Avros.tableOf(Avros.strings(), Avros.specifics(MyAvroObject.class)))
>     .groupByKey(3);
> PTable<MyAvroGroup, Pair<String, Iterable<MyAvroObject>>> groupedData =
>     processedData.by(new MapFn<Pair<String, Iterable<MyAvroObject>>, MyAvroGroup>() {
>         @Override
>         public MyAvroGroup map(Pair<String, Iterable<MyAvroObject>> input) {
>             MyAvroGroup group = new MyAvroGroup();
>             group.objects = Lists.<MyAvroObject>newArrayList();
>
>             for (MyAvroObject obj : input.second()) {
>                 group.objects.add(obj);
>             }
>
>             return group;
>         }
>     },
>     Avros.specifics(MyAvroGroup.class));
> An exception is thrown when the by() code is run:
> 12/12/10 14:11:07 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> Exception in thread "main" java.lang.ClassCastException: org.apache.crunch.types.avro.AvroGroupedTableType cannot be cast to org.apache.crunch.types.avro.AvroType
>     at org.apache.crunch.types.avro.Avros.tableOf(Avros.java:608)
>     at org.apache.crunch.types.avro.AvroTypeFamily.tableOf(AvroTypeFamily.java:135)
>     at org.apache.crunch.impl.mem.collect.MemCollection.by(MemCollection.java:222)
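
The updated summary asks for each key's Iterable values to be cached when a groupByKey op has multiple children. The following is a minimal, self-contained Java sketch of that general idea. It is not Crunch's actual fix and does not use the Crunch API. The names SinglePassIterable, cache, and count are hypothetical. The sketch also assumes that the grouped values for a key can be traversed only once, which is why a second consumer would otherwise see nothing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class CachedValuesSketch {

    /**
     * Hypothetical stand-in for the values of one key: every call to
     * iterator() hands back the same underlying iterator, so a second
     * traversal finds it already exhausted (an assumption of this sketch).
     */
    static final class SinglePassIterable<T> implements Iterable<T> {
        private final Iterator<T> source;

        SinglePassIterable(Iterator<T> source) {
            this.source = source;
        }

        @Override
        public Iterator<T> iterator() {
            return source;
        }
    }

    /** Copies the values into a list so that several consumers can each iterate them. */
    static <T> List<T> cache(Iterable<T> values) {
        List<T> cached = new ArrayList<>();
        for (T value : values) {
            cached.add(value);
        }
        return cached;
    }

    /** Stand-in for a child operation: it simply counts the values it sees. */
    static int count(Iterable<?> values) {
        int n = 0;
        for (Object ignored : values) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        Iterable<String> values =
            new SinglePassIterable<>(Arrays.asList("a", "b", "c").iterator());

        // Cache once, then let each "child" iterate the cached copy.
        List<String> cached = cache(values);
        System.out.println(count(cached)); // first child sees 3 values
        System.out.println(count(cached)); // second child also sees 3 values

        // The original single-pass source is now exhausted.
        System.out.println(count(values)); // prints 0
    }
}
```

Copying the values into a list trades memory for repeatability: every value for a key is held in memory at once, which may matter for keys with very many values.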