Posted to user@avro.apache.org by Markus Weimer <we...@yahoo-inc.com> on 2011/04/19 01:43:05 UTC

Keys between Mapper and Reducer in AvroJobs

Hi,

another question about writing Hadoop jobs using Avro. I want to implement a basic shuffle and file aggregation: mappers emit their input with random keys, and reducers just write their input to disk; the number of reducers then determines how many files I get in the result. The mapred documentation on jobs where both input and output are Avro says:

> Subclass AvroMapper and specify this as your job's mapper with [...]

However, AvroMapper only seems to support input and output values, not keys. Did I miss the obvious here?
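For context, the shuffle I have in mind leans on Hadoop's default hash partitioning of map-output keys, which is independent of Avro itself. A minimal stdlib-only sketch of that mechanism (class and method names here are hypothetical, and the partition function mirrors Hadoop's HashPartitioner rather than any Avro API):

```java
import java.util.Random;

// Sketch of the random-key shuffle idea: each mapper tags a record with a
// random key, and the partitioner sends it to one of numReducers partitions.
// With enough records, the partitions (and hence output files) end up
// roughly evenly filled.
public class ShuffleSketch {

    // Mirrors Hadoop's default HashPartitioner: force the hash non-negative,
    // then take it modulo the reducer count.
    static int partitionFor(int randomKey, int numReducers) {
        return (randomKey & Integer.MAX_VALUE) % numReducers;
    }

    public static void main(String[] args) {
        int numReducers = 4;
        int[] counts = new int[numReducers];
        Random rng = new Random();

        // Simulate 10,000 mapper records, each assigned a random key.
        for (int i = 0; i < 10_000; i++) {
            counts[partitionFor(rng.nextInt(), numReducers)]++;
        }

        // Each counter corresponds to one reducer, i.e. one output file.
        for (int c : counts) {
            System.out.println(c);
        }
    }
}
```

The masking with `Integer.MAX_VALUE` matters because `nextInt()` can return negative values, and a negative modulus would yield an invalid partition index.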

Thanks,

Markus

PS: Ideally, I'd implement the shuffle without ever deserializing the data, which should be possible. But that is the next step.