
Posted to general@hadoop.apache.org by Oded Rosen <od...@legolas-media.com> on 2010/05/09 23:08:37 UTC

MultipleInputs in 0.20

From what I've learned from different sites around the web (the Hadoop wiki,
Cloudera <http://www.cloudera.com/blog/2009/05/what%E2%80%99s-new-in-hadoop-core-020/>,
mailing list archives, etc.), the MultipleInputs class that was available in
the 0.18-0.19 versions of Hadoop was not ported to the new 0.20 API.
(Neither was MultipleOutputs, but that's another story.)

I wanted to know if there is a way around this: to use two different paths
with two different input formats (sequence file, text file) as sources to the
same job, with a special mapper for each input type, using the Hadoop 0.20 API.
I think that writing a new job against the 0.19 API only means more trouble
later, when it's officially deprecated.

I saw there is a JIRA issue open for this (MAPREDUCE-1170
<https://issues.apache.org/jira/browse/MAPREDUCE-1170>), with a patch marked
as "Won't fix".
If someone out there can help me with this, I will be most thankful.

Cheers,
-- 
Oded

Re: MultipleInputs in 0.20

Posted by Ted Yu <yu...@gmail.com>.

Please refer to MAPREDUCE-1743.

Another option is to duplicate the MultipleInputs and DelegatingInputFormat
classes and slightly modify TaggedInputSplit (as I suggested earlier).
That way you can use your own (functional) version :-)
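For readers unfamiliar with how those three classes cooperate, here is a simplified, Hadoop-free model of the delegation idea, in plain Java. Every type and method in it (TaggedInputsModel, Format, RecordMapper, TaggedSplit, getSplits, runMap) is a hypothetical stand-in, not a Hadoop class: addInputPath records a format and a mapper per input path, getSplits produces one split per path "tagged" with both, and runMap reads each split with its own format and hands every record to the mapper it was tagged with. It does not reproduce the real classes or the specific TaggedInputSplit change mentioned above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hadoop-free model of how MultipleInputs, DelegatingInputFormat and
// TaggedInputSplit cooperate. Every type here is a hypothetical stand-in,
// NOT a Hadoop class.
public class TaggedInputsModel {

    // Stand-in for an InputFormat: turns the raw content of a split into records.
    public interface Format extends Function<String, List<String>> {}

    // Stand-in for a Mapper: turns one record into one output value.
    public interface RecordMapper extends Function<String, String> {}

    // Stand-in for TaggedInputSplit: the underlying split plus the
    // format and mapper that were registered for its input path.
    public static final class TaggedSplit {
        final String raw;
        final Format format;
        final RecordMapper mapper;

        TaggedSplit(String raw, Format format, RecordMapper mapper) {
            this.raw = raw;
            this.format = format;
            this.mapper = mapper;
        }
    }

    // Path -> format and path -> mapper, filled by addInputPath.
    private final Map<String, Format> formats = new LinkedHashMap<>();
    private final Map<String, RecordMapper> mappers = new LinkedHashMap<>();

    // Analogue of registering (path, input format, mapper) on a job.
    public void addInputPath(String path, Format format, RecordMapper mapper) {
        formats.put(path, format);
        mappers.put(path, mapper);
    }

    // Analogue of a delegating getSplits(): one split per path,
    // tagged with that path's format and mapper.
    public List<TaggedSplit> getSplits(Map<String, String> fileContents) {
        List<TaggedSplit> splits = new ArrayList<>();
        for (String path : formats.keySet()) {
            splits.add(new TaggedSplit(fileContents.get(path), formats.get(path), mappers.get(path)));
        }
        return splits;
    }

    // Analogue of a delegating mapper: each split is read with its own
    // format, and every record goes to the mapper the split was tagged with.
    public static List<String> runMap(List<TaggedSplit> splits) {
        List<String> out = new ArrayList<>();
        for (TaggedSplit split : splits) {
            for (String record : split.format.apply(split.raw)) {
                out.add(split.mapper.apply(record));
            }
        }
        return out;
    }

    public static List<String> demo() {
        TaggedInputsModel job = new TaggedInputsModel();
        // Toy "text" input: one record per line.
        job.addInputPath("/in/text", raw -> Arrays.asList(raw.split("\n")), r -> "text:" + r);
        // Toy "sequence" input: comma-separated key=value records; the mapper keeps the key.
        job.addInputPath("/in/seq", raw -> Arrays.asList(raw.split(",")), r -> "seq:" + r.split("=")[0]);

        Map<String, String> files = new LinkedHashMap<>();
        files.put("/in/text", "a\nb");
        files.put("/in/seq", "k1=v1,k2=v2");
        return runMap(job.getSplits(files));
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [text:a, text:b, seq:k1, seq:k2]
    }
}
```

The text split's records reach only the text mapper and the sequence split's records only the sequence mapper, which is the per-path dispatch the original question asks for. In Hadoop the tagged split also has to be serialized and shipped to the map task, a concern this in-memory model ignores.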

On Sun, May 9, 2010 at 2:08 PM, Oded Rosen <od...@legolas-media.com> wrote:

Re: MultipleInputs in 0.20

Posted by Amareshwari Sri Ramadasu <am...@yahoo-inc.com>.

MultipleInputs has been ported to the new API in branch 0.21 through https://issues.apache.org/jira/browse/MAPREDUCE-369

-Amareshwari

On 5/10/10 2:38 AM, "Oded Rosen" <od...@legolas-media.com> wrote: