Posted to dev@mrunit.apache.org by Dan Filimon <da...@gmail.com> on 2013/07/04 17:25:11 UTC

Re: Partial Avro MapReduce job

It turns out that what I said is wrong. Also CC'ing dev@mrunit

In AvroJob [1] there are config constants for a MapReduce's input and
output keys and values. The input key and input value are the things coming
into the mapper and the output key and output value are the things coming
out of the reducer.
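For reference, AvroJob exposes static helper methods for these config constants; a minimal sketch of wiring them up (the schemas here are illustrative placeholders, not the ones from this thread, and the sketch assumes the avro-mapred 1.7.x API):

```java
import org.apache.avro.Schema;
import org.apache.avro.mapreduce.AvroJob;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class AvroJobSetup {
    public static void main(String[] args) throws Exception {
        Job job = new Job(new Configuration());
        // Schema of the AvroKey going INTO the mapper:
        AvroJob.setInputKeySchema(job, Schema.create(Schema.Type.STRING));
        // Schemas of the AvroKey/AvroValue coming OUT of the reducer:
        AvroJob.setOutputKeySchema(job, Schema.create(Schema.Type.INT));
        AvroJob.setOutputValueSchema(job, Schema.create(Schema.Type.DOUBLE));
    }
}
```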

Serialization is configured in AvroSerialization [2] and the mapper outputs
and reducer inputs are the things referred to by the key/value
reader/writer schemas.
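Those reader/writer schemas have matching static setters on AvroSerialization; a hedged sketch of setting them on a bare Configuration (the string schema is just an example):

```java
import org.apache.avro.Schema;
import org.apache.avro.hadoop.io.AvroSerialization;
import org.apache.hadoop.conf.Configuration;

public class ShuffleSerialization {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Adds AvroSerialization to io.serializations without clobbering
        // the existing Writable serialization:
        AvroSerialization.addToConfiguration(conf);
        // Schemas for the AvroKey the mapper emits / the reducer consumes:
        AvroSerialization.setKeyWriterSchema(conf, Schema.create(Schema.Type.STRING));
        AvroSerialization.setKeyReaderSchema(conf, Schema.create(Schema.Type.STRING));
    }
}
```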

Oddly, using the same settings doesn't work in MRUnit. From what I've seen
in the code, MRUnit needs serialization to be set up before the Map phase
and after the Reduce phase, whereas AvroJob only configures it for the
shuffling phase.
If this is indeed the case, MRUnit won't work with Avro at all unless the
types of the keys and values overlap.
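For context, the workaround that usually gets passed around for MRUnit tests is to register the serialization and writer schemas by hand on the driver's Configuration; a sketch, assuming a mapper consuming AvroKey<CharSequence> (MyMapper and its exact type parameters are hypothetical, not from this thread):

```java
import org.apache.avro.Schema;
import org.apache.avro.hadoop.io.AvroSerialization;
import org.apache.avro.mapred.AvroKey;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mrunit.mapreduce.MapDriver;

public class AvroMapperTest {
    public void setUp() {
        MapDriver<AvroKey<CharSequence>, NullWritable, IntWritable, IntWritable>
            driver = MapDriver.newMapDriver(new MyMapper()); // MyMapper is hypothetical
        Configuration conf = driver.getConfiguration();
        // Keep the existing Writable serialization and add the Avro one:
        conf.setStrings("io.serializations",
            conf.get("io.serializations"),
            AvroSerialization.class.getName());
        // Writer schema for the AvroKey fed to the mapper:
        conf.setStrings("avro.serialization.key.writer.schema",
            Schema.create(Schema.Type.STRING).toString());
    }
}
```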

Does this make sense to anyone else?

Thanks!

[1]
http://grepcode.com/file/repo1.maven.org/maven2/org.apache.avro/avro-mapred/1.7.4/org/apache/avro/mapreduce/AvroJob.java?av=f
[2]
http://grepcode.com/file/repo1.maven.org/maven2/org.apache.avro/avro-mapred/1.7.4/org/apache/avro/hadoop/io/AvroSerialization.java?av=f


On Thu, Jul 4, 2013 at 6:00 PM, Hien Luu <hl...@yahoo.com> wrote:

> It is possible to set all these Avro-related configurations in MRUnit, so
> I am not sure a patch is necessary. What would be nice to have is
> documentation pointing out which configurations are needed. Surprisingly,
> I couldn't find a FAQ link on http://mrunit.apache.org/
>
> Hien
>   ------------------------------
>  *From:* Dan Filimon <da...@gmail.com>
> *To:* user@mrunit.apache.org; Hien Luu <hl...@yahoo.com>
> *Sent:* Wednesday, July 3, 2013 11:28 PM
>
> *Subject:* Re: Partial Avro MapReduce job
>
> Well, from what I understand, all of these are set with setStrings() in
> the Configuration, meaning they can take multiple values.
>
> - io.serializations has to also include AvroSerialization.class.getName()
> in addition to the existing serialization (which is for Writables).
> - avro.serialization.key.writer.schema and
> avro.serialization.value.writer.schema also have multiple options and the
> values here should contain the schemas of the different datums inside the
> AvroKeys and AvroValues.
>
> So, if you have a job that has 3 types of AvroKeys: AvroKey<Integer>,
> AvroKey<CustomRecord>, AvroKey<List<Double>>, the value of
> avro.serialization.key.writer.schema should be an array of Strings with the
> respective schemas of Integer, CustomRecord and List<Double>.
> It should be the same for AvroValues.
>
> What do you think about adding support for configuring these in MRUnit?
> I could come up with a patch... :)
>
>
> On Wed, Jul 3, 2013 at 6:48 PM, Hien Luu <hl...@yahoo.com> wrote:
>
> Good to hear you got it working.
>
> I need to find out more information about these properties:
>
> *io.serializations*, *avro.serialization.key.writer.schema*, and
> *avro.serialization.value.writer.schema*
> Hien
>
>   ------------------------------
>  *From:* Dan Filimon <da...@gmail.com>
> *To:* user@mrunit.apache.org; Hien Luu <hl...@yahoo.com>
> *Sent:* Wednesday, July 3, 2013 2:06 AM
>
> *Subject:* Re: Partial Avro MapReduce job
>
> Yes, thank you! That worked. I also needed to register the Avro
> serialization like here:
> https://github.com/Lab41/etl-by-example/wiki/Testing
>
>
> On Tue, Jul 2, 2013 at 9:57 PM, Hien Luu <hl...@yahoo.com> wrote:
>
> Hi Dan,
>
> I think the following setting is needed to solve the issue that you ran
> into:
>
>
> mapDriver.getConfiguration().setStrings(
>     "avro.serialization.key.writer.schema",
>     <the JSON Avro schema for your AvroKey>);
>
> I have been trying to find more documentation about the property
> "avro.serialization.key.writer.schema".
>
> Hien
>
>
>   ------------------------------
>  *From:* Dan Filimon <da...@gmail.com>
> *To:* user@mrunit.apache.org; Hien Luu <hl...@yahoo.com>
> *Sent:* Tuesday, July 2, 2013 9:17 AM
> *Subject:* Re: Partial Avro MapReduce job
>
> Hi Hien!
>
> I saw that answer but it seems like it's for a different kind of
> exception. Did you really have the same problem?
> Thanks!
>
>
> On Tue, Jul 2, 2013 at 6:39 PM, Hien Luu <hl...@yahoo.com> wrote:
>
> I ran into the same issue; the answer is at:
> http://stackoverflow.com/questions/15230482/mrunit-with-avro-nullpointerexception-in-serialization
>
> It would be great if this answer were added to the MRUnit FAQ or to one of
> the tutorials.
>
> Hien
>
>   ------------------------------
>  *From:* Dan Filimon <da...@gmail.com>
> *To:* user@mrunit.apache.org
> *Sent:* Tuesday, July 2, 2013 7:47 AM
> *Subject:* Partial Avro MapReduce job
>
> Hi!
>
> I've been looking online for a way of testing my job and came across what
> seemed to be some promising leads. [1]
>
> I can't find anything exactly suited to my case: I'm consuming a custom
> AvroKey and outputting IntWritable, VectorWritable from the mapper.
>
> I'm getting the following error:
> java.lang.IllegalStateException: No applicable class implementing
> Serialization in conf at io.serializations for class
> org.apache.avro.mapred.AvroKey
>
> And worryingly, .withConfiguration() is now deprecated in MRUnit, so [1]
> doesn't seem to be 100% up to date.
> Any ideas?
>
> Thanks!
>
> [1] https://cwiki.apache.org/confluence/display/MRUNIT/MRUnit+with+Avro