Posted to common-user@hadoop.apache.org by Mark Vigeant <ma...@riskmetrics.com> on 2009/11/02 16:23:58 UTC

Multiple Input Paths

Hey, quick question:

I'm writing a program that parses data from 2 different files and puts the data into a table. Currently I have 2 different map functions and so I submit 2 separate jobs to the job client. Would it be more efficient to add both paths to the same mapper and only submit one job? Thanks a lot!

Mark Vigeant
RiskMetrics Group, Inc.

RE: Multiple Input Paths

Posted by Mark Vigeant <ma...@riskmetrics.com>.
Yes, the structure is similar. They're both XML log files documenting the same set of data, just in different ways.

That's a really cool idea though, to combine them. How exactly would I go about doing that?

-----Original Message-----
From: L [mailto:architect@galatea.com] 
Sent: Monday, November 02, 2009 10:27 AM
To: common-user@hadoop.apache.org
Subject: Re: Multiple Input Paths

Mark,

Is the structure of both files the same? It makes even more sense to 
combine the files, if you can, as I have seen a considerable speed up 
when I've done that (at least when I've had small files to deal with).

Lajos


RE: Multiple Input Paths

Posted by Mark Vigeant <ma...@riskmetrics.com>.
Ok, thank you very much Amogh, I will redesign my program.

-----Original Message-----
From: Amogh Vasekar [mailto:amogh@yahoo-inc.com] 
Sent: Monday, November 02, 2009 11:45 AM
To: common-user@hadoop.apache.org
Subject: Re: Multiple Input Paths

Mark,
Setup for a MapReduce job consumes a considerable amount of time and resources, so a single job is preferred if possible.
You can add multiple input paths to your job, and if you need different processing logic depending on which input is being consumed, you can read the map.input.file parameter in your mapper to decide.

Amogh
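
A minimal sketch of the dispatch Amogh describes, written in plain Java so it runs without a Hadoop installation. The class name InputSourceDispatch, the Source enum, and the directory names format_a and format_b are hypothetical and not from the thread. In a mapper written against the old org.apache.hadoop.mapred API, the path string would presumably be read with job.get("map.input.file") inside configure(JobConf) and kept in a field for map() to consult.

```java
public class InputSourceDispatch {

    /** Hypothetical labels for the two XML log layouts mentioned in the question. */
    public enum Source { FORMAT_A, FORMAT_B, UNKNOWN }

    /**
     * Decide which parsing logic applies, given the path of the file the
     * current map task is reading (the value of map.input.file).
     * Assumption: each layout lives under its own directory.
     */
    public static Source sourceFor(String inputFile) {
        if (inputFile == null) {
            return Source.UNKNOWN;
        }
        if (inputFile.contains("/format_a/")) {
            return Source.FORMAT_A;
        }
        if (inputFile.contains("/format_b/")) {
            return Source.FORMAT_B;
        }
        return Source.UNKNOWN;
    }

    /** Stand-in for the body of map(): one branch per input layout. */
    public static String handle(String inputFile, String record) {
        switch (sourceFor(inputFile)) {
            case FORMAT_A:
                return "A:" + record;   // parse the first layout here
            case FORMAT_B:
                return "B:" + record;   // parse the second layout here
            default:
                return "skipped";       // input from an unexpected path
        }
    }

    public static void main(String[] args) {
        System.out.println(handle("hdfs://namenode/logs/format_a/part-00000.xml", "<row/>"));
        System.out.println(handle("hdfs://namenode/logs/format_b/part-00000.xml", "<row/>"));
    }
}
```

With this shape, both directories would be added to the same job (for example with FileInputFormat.addInputPath(conf, path) once per directory in the old API), and the single mapper picks its branch from the file path rather than needing a second job.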

