
Posted to common-user@hadoop.apache.org by Jørn Schou-Rode <js...@malamute.dk> on 2010/01/31 22:23:21 UTC

Using set/list data types for intermediate keys

What are the options for using sets/lists as keys in the output from the
mapper?

My initial idea was to use ArrayWritable as key type, but that is not
allowed, as the class does not implement WritableComparable. Do I need
to define a custom class, or is there some other set-like class in the
Hadoop libraries that can act as key?

Thanks in advance.

/Jørn

RE: Using set/list data types for intermediate keys

Posted by "Jones, Nick" <ni...@amd.com>.

Jørn,
I found it fairly quick and simple to write a dedicated class that
implements WritableComparable for the intermediate dataset.  I needed two
keys for every value to make sure each reducer received the right data.  The
class just held two longs internally and implemented the methods that
WritableComparable requires.
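
For illustration, a minimal sketch of what such a two-long key class might
look like. The class name LongPairKey and its field names are hypothetical,
not taken from the actual job. To keep the sketch runnable without Hadoop on
the classpath it implements only java.lang.Comparable; in a real job the
class would be public and would instead be declared as implementing
org.apache.hadoop.io.WritableComparable<LongPairKey>, which requires the
same write, readFields and compareTo methods.

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

// Hypothetical composite key holding two longs.
// Assumption: in a Hadoop job this would be declared as
//   public class LongPairKey implements WritableComparable<LongPairKey>
final class LongPairKey implements Comparable<LongPairKey> {
    private long first;
    private long second;

    // No-argument constructor: Hadoop instantiates key objects reflectively
    // and then fills them in through readFields().
    public LongPairKey() {}

    public LongPairKey(long first, long second) {
        this.first = first;
        this.second = second;
    }

    public long getFirst() { return first; }
    public long getSecond() { return second; }

    // Serialize both fields in a fixed order.
    public void write(DataOutput out) throws IOException {
        out.writeLong(first);
        out.writeLong(second);
    }

    // Deserialize in the same order that write() used.
    public void readFields(DataInput in) throws IOException {
        first = in.readLong();
        second = in.readLong();
    }

    // Sort by the first long, then by the second.
    @Override
    public int compareTo(LongPairKey other) {
        int c = Long.compare(first, other.first);
        return c != 0 ? c : Long.compare(second, other.second);
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof LongPairKey)) return false;
        LongPairKey k = (LongPairKey) o;
        return first == k.first && second == k.second;
    }

    // Kept consistent with equals(); the default hash partitioner uses
    // hashCode() to decide which reducer receives a key.
    @Override
    public int hashCode() {
        return 31 * Long.hashCode(first) + Long.hashCode(second);
    }
}
```

The same pattern should extend to a list-valued key: write the element count
followed by the elements, read them back in the same order, and compare
element by element.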

It might also be worthwhile looking into the cloud9 library for insights or
implementation: http://www.umiacs.umd.edu/~jimmylin/cloud9/docs/index.html

Nick Jones
