Posted to general@hadoop.apache.org by abc xyz <fa...@yahoo.com> on 2010/07/03 18:33:39 UTC
Partitioned Datasets Map/Reduce
Hello everyone,
I have written a custom partitioner for partitioning datasets. I want to
partition two datasets using the same partitioner and then, in the next
MapReduce job, have each mapper handle the same partition from both
sources and perform some operation on them, such as a join. How can I ensure that
one mapper gets the splits that correspond to the same partition from both
sources?
Any help would be highly appreciated.
Re: Partitioned Datasets Map/Reduce
Posted by abc xyz <fa...@yahoo.com>.
Well, I want to do some experimentation with Hadoop. I need to partition two
datasets using the same partitioning function and then, in the next job, take the
same partition from both datasets and apply some operation in the mapper. But
how can I ensure that one mapper gets the same partition from both sources?
________________________________
From: Hemanth Yamijala <yh...@gmail.com>
To: general@hadoop.apache.org
Sent: Tue, July 6, 2010 5:40:49 AM
Subject: Re: Partitioned Datasets Map/Reduce
Hi,
> I have written my custom partitioner for partitioning datasets. I want to
> partition two datasets using the same partitioner and then in the next
> mapreduce job, I want each mapper to handle the same partition from the two
> sources and perform some function such as joining etc. How can I ensure that
> one mapper gets the split that corresponds to same partition from both the
> sources?
>
Not really an answer to your specific question, but have you taken a
look at Pig (http://hadoop.apache.org/pig), which is suitable for
operations like joining data sets?
Re: Partitioned Datasets Map/Reduce
Posted by Hemanth Yamijala <yh...@gmail.com>.
Hi,
> I have written my custom partitioner for partitioning datasets. I want to
> partition two datasets using the same partitioner and then in the next
> mapreduce job, I want each mapper to handle the same partition from the two
> sources and perform some function such as joining etc. How can I ensure that
> one mapper gets the split that corresponds to same partition from both the
> sources?
>
Not really an answer to your specific question, but have you taken a
look at Pig (http://hadoop.apache.org/pig), which is suitable for
operations like joining data sets?
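
On the original question, the underlying idea can be sketched outside Hadoop.
If both datasets are split with the same partitioning function and the same
number of partitions, then partition i of the first dataset can only share keys
with partition i of the second, so a worker that is handed both can join them
locally. The plain-Java sketch below is only an illustration of that idea under
those assumptions. The class name, the method names, and the hash-based
partitioner are hypothetical stand-ins, not the poster's partitioner and not a
Hadoop API.

```java
import java.util.ArrayList;
import java.util.List;

class CoPartitionSketch {
    // Hypothetical partitioner. The same function and the same numPartitions
    // must be used for both datasets, otherwise partitions do not line up.
    static int partitionFor(String key, int numPartitions) {
        // Mask the sign bit so the result is never negative.
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    // Split records of the form {key, value} into numPartitions buckets.
    static List<List<String[]>> partition(List<String[]> records, int numPartitions) {
        List<List<String[]>> parts = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            parts.add(new ArrayList<>());
        }
        for (String[] r : records) {
            parts.get(partitionFor(r[0], numPartitions)).add(r);
        }
        return parts;
    }

    // Stand-in for one mapper: it receives partition i of both datasets
    // and joins them on the key with a simple nested loop.
    static List<String> joinPartition(List<String[]> left, List<String[]> right) {
        List<String> out = new ArrayList<>();
        for (String[] l : left) {
            for (String[] r : right) {
                if (l[0].equals(r[0])) {
                    out.add(l[0] + "," + l[1] + "," + r[1]);
                }
            }
        }
        return out;
    }

    // Pair partition i of dataset a with partition i of dataset b.
    static List<String> joinAll(List<String[]> a, List<String[]> b, int numPartitions) {
        List<List<String[]>> partsA = partition(a, numPartitions);
        List<List<String[]>> partsB = partition(b, numPartitions);
        List<String> out = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            out.addAll(joinPartition(partsA.get(i), partsB.get(i)));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> a = new ArrayList<>();
        a.add(new String[] {"k1", "a1"});
        a.add(new String[] {"k2", "a2"});
        List<String[]> b = new ArrayList<>();
        b.add(new String[] {"k2", "b2"});
        b.add(new String[] {"k3", "b3"});
        for (String row : joinAll(a, b, 4)) {
            System.out.println(row);
        }
    }
}
```

In Hadoop itself, the map-side join support in org.apache.hadoop.mapred.join
(CompositeInputFormat) is built around this kind of alignment. As far as I
know it also expects each partition to be sorted by key and the two inputs to
have the same number of partitions, so its documentation is worth checking
before relying on it.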