Posted to dev@crunch.apache.org by David Whiting <da...@gmail.com> on 2014/09/30 14:14:05 UTC

Scrunch: Using Avro-generated Java enums in compound types

We have encountered a strange problem in our Scrunch code when attempting
to serialize Java enum types (as generated by Avro).

Basically, if you create an Avro schema with an enum-typed field, a Java
enum class is generated for that field. When a Scrunch pipeline uses that
enum inside a compound type as an intermediate value, the job fails when
spilling to disk because the ReflectDatumWriter cannot instantiate the
enum type.

Inspecting the implicit PTypeH parameter passed to the offending function
(flatMap to a 4-tuple in this case), we see that it resolves to
quads(records[MyEnumType], strings, strings, strings). The records call is
implemented by the PTypeFamily (AvroTypeFamily in this case), which
delegates to containers and then reflects, which in turn delegates to
Avro's standard reflection support. I would expect this to have no problem
with an enum type, but for some reason it is trying to instantiate it
instead of using it as an enum.
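
As a rough illustration of why "instantiate it" cannot work for an enum,
here is a minimal stdlib-only Java sketch. It does not reproduce the Avro or
Crunch code path; MyEnumType, MyRecord and canInstantiate are hypothetical
names made up for this example. It only shows the general Java constraint
that an enum class has no no-arg constructor and cannot be created
reflectively, whereas an ordinary record-like class can.

```java
import java.lang.reflect.Constructor;

public class EnumReflectionDemo {
    // Hypothetical stand-in for an Avro-generated enum class.
    enum MyEnumType { FOO, BAR }

    // Hypothetical stand-in for an ordinary class with a no-arg constructor.
    static class MyRecord {
        MyRecord() {}
    }

    /** Returns true if the class can be created reflectively through a no-arg constructor. */
    static boolean canInstantiate(Class<?> type) {
        try {
            // Enum constructors are compiled as (String name, int ordinal),
            // so this lookup fails for an enum with NoSuchMethodException.
            Constructor<?> ctor = type.getDeclaredConstructor();
            ctor.setAccessible(true);
            ctor.newInstance();
            return true;
        } catch (ReflectiveOperationException | IllegalArgumentException e) {
            // Constructor.newInstance also rejects enum classes outright
            // with an IllegalArgumentException.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("record instantiable: " + canInstantiate(MyRecord.class));
        System.out.println("enum instantiable:   " + canInstantiate(MyEnumType.class));
    }
}
```

Any serialization path that treats the enum as a generic reflected record
and tries to construct a fresh instance would hit this constraint, which
would be consistent with the failure described above.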

Is there some special case for Java enums missing in PTypeH, or have I
maybe done something else wrong somewhere?

Re: Scrunch: Using Avro-generated Java enums in compound types

Posted by Josh Wills <jo...@gmail.com>.
I think that adding the Java enum support directly is the way to go; I'm
trying to avoid hacking Avro reflection where I can. Here's the patch,
building off of the enum PType support in Java Crunch:

https://issues.apache.org/jira/browse/CRUNCH-472
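
For readers who do not want to open the patch, the general idea behind
handling enums directly can be sketched without any Crunch dependency. This
is an assumption about the approach, not the contents of the CRUNCH-472
patch: the enum is written as its constant name and read back by looking the
name up on the enum class, so nothing ever has to construct an enum
instance. EnumStringMapping, encode and decode are hypothetical names used
only for this sketch.

```java
public class EnumStringMapping {
    // Hypothetical stand-in for an Avro-generated enum class.
    enum MyEnumType { FOO, BAR }

    /** Output side: represent the enum value by its constant name. */
    static String encode(Enum<?> value) {
        return value.name();
    }

    /** Input side: look up the existing constant instead of creating an instance. */
    static <T extends Enum<T>> T decode(Class<T> type, String name) {
        return Enum.valueOf(type, name);
    }

    public static void main(String[] args) {
        String wire = encode(MyEnumType.BAR);
        MyEnumType back = decode(MyEnumType.class, wire);
        System.out.println(wire + " round-trips to the same constant: " + (back == MyEnumType.BAR));
    }
}
```

Because decode returns the existing constant, identity comparisons (==)
still hold after a round trip, which is what code using the enum normally
expects.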

On Tue, Sep 30, 2014 at 5:14 AM, David Whiting <da...@gmail.com>
wrote:

> We have encountered a strange problem in our Scrunch code when attempting
> to serialize Java enum types (as generated by Avro).
>
> Basically, if you create an Avro schema with an enum-typed field, a Java
> enum class is generated for that field. When a Scrunch pipeline uses that
> enum inside a compound type as an intermediate value, the job fails when
> spilling to disk because the ReflectDatumWriter cannot instantiate the
> enum type.
>
> Inspecting the implicit PTypeH parameter passed to the offending function
> (flatMap to a 4-tuple in this case), we see that it resolves to
> quads(records[MyEnumType], strings, strings, strings). The records call is
> implemented by the PTypeFamily (AvroTypeFamily in this case), which
> delegates to containers and then reflects, which in turn delegates to
> Avro's standard reflection support. I would expect this to have no problem
> with an enum type, but for some reason it is trying to instantiate it
> instead of using it as an enum.
>
> Is there some special case for Java enums missing in PTypeH, or have I
> maybe done something else wrong somewhere?
>