Posted to common-user@hadoop.apache.org by Rahul Sood <rs...@yahoo-inc.com> on 2008/09/05 14:35:21 UTC
contrib join package
Hi,
Is there any detailed documentation on the
org.apache.hadoop.contrib.utils.join package? I have a simple join task
with two input datasets, each containing tab-separated records.
Set1: Record format = field1\tfield2\tfield3\tfield4\tfield5
Set2: Record format = field1\tfield2\tfield3
Join criterion: Set1.field1 = Set2.field1
Output: Set2.field2\tSet1.field2\tSet1.field3\tSet1.field4
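To make the expected result concrete, here is a minimal in-memory sketch of the inner join described above. It is plain Java and does not use the contrib join classes or any Hadoop API; the class name JoinSketch and the method join are hypothetical names chosen for illustration. It assumes every record is well formed (5 fields in Set1, 3 in Set2) and that one of the two sets fits in memory.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: an in-memory hash join, not a MapReduce job.
class JoinSketch {

    // Inner join on Set1.field1 = Set2.field1.
    // Emits Set2.field2 \t Set1.field2 \t Set1.field3 \t Set1.field4.
    static List<String> join(List<String> set1, List<String> set2) {
        // Index Set2 records by the join key (field1).
        Map<String, List<String[]>> set2ByKey = new HashMap<>();
        for (String line : set2) {
            String[] fields = line.split("\t", -1);
            set2ByKey.computeIfAbsent(fields[0], k -> new ArrayList<>()).add(fields);
        }

        List<String> output = new ArrayList<>();
        for (String line : set1) {
            String[] a = line.split("\t", -1);
            List<String[]> matches = set2ByKey.get(a[0]);
            if (matches == null) {
                continue; // inner join: Set1 records without a match are dropped
            }
            // One output record per matching (Set1, Set2) pair.
            for (String[] b : matches) {
                output.add(String.join("\t", b[1], a[1], a[2], a[3]));
            }
        }
        return output;
    }

    public static void main(String[] args) {
        List<String> set1 = Arrays.asList("k1\ta2\ta3\ta4\ta5", "k2\tb2\tb3\tb4\tb5");
        List<String> set2 = Arrays.asList("k1\tx2\tx3", "k3\ty2\ty3");
        for (String record : join(set1, set2)) {
            System.out.println(record);
        }
    }
}
```

In a reduce-side join the same pairing would be done per key in the reducer, with each record tagged by the dataset it came from; an outer join would emit the unmatched records instead of dropping them.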
The org.apache.hadoop.contrib.utils.join package contains DataJoinMapperBase
and DataJoinReducerBase abstract classes, and a TaggedMapOutput class which
should be the base class for the mapper output values. But there aren't any
examples showing how these classes should be used to implement inner or
outer joins in a generic manner.
If anybody has used this package and would like to share their experience,
please let me know.
Thanks,
Rahul Sood
rsood@yahoo-inc.com
Re: contrib join package
Posted by Owen O'Malley <om...@apache.org>.
Please look at the examples in the source directory:
http://svn.apache.org/repos/asf/hadoop/core/trunk/src/contrib/data_join/src/examples/org/apache/hadoop/contrib/utils/join/
Unfortunately, I don't know of any other documentation on it.
-- Owen