Posted to mapreduce-user@hadoop.apache.org by Cornelio Iñigo <co...@gmail.com> on 2012/03/16 17:59:09 UTC

Best way to pass values between jobs

Hi

I have 5 dependent jobs. I'm running them with JobControl, and jobs 2 and 3
run at the same time (there is no dependency between them). Each job produces
output that is the input for the following job; this output contains things
like strings, string matrices (2D arrays of strings), and boolean values. I'm
putting all of these as the value of my key-value pairs. For the 2D arrays I
have to put each element, so in other words I put all my information into a
concatenated string and this is the value of my key-value pairs...

concatenated string -> (string, booleanValue, Matrix1Index00,
Matrix1Index01, ..., Matrix2Index00, Matrix2Index01, ...)

At the moment I'm using the input/output files that Hadoop produces as the
input/output for my jobs, so as I said before I have to rebuild the set of
matrices (2D arrays) each time from this value (the concatenated string).
My question is:
For multiple jobs, is there a different way (besides files) to pass my
key-value pairs from one job to another?
Or what is the best way to do this task?

Thanks
Corne
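
For reference, a rough, self-contained sketch of the packing and unpacking
described above; the comma delimiter, the field order, and the out-of-band
matrix dimensions are illustrative assumptions, not details taken from the post:

// Sketch of flattening a string, a boolean, and one 2D string array into a
// single delimited value, and of rebuilding the matrix on the other side.
public class PackedValue {

  static String pack(String s, boolean b, String[][] m) {
    StringBuilder sb = new StringBuilder();
    sb.append(s).append(',').append(b);
    for (String[] row : m) {
      for (String cell : row) {
        sb.append(',').append(cell);
      }
    }
    return sb.toString();
  }

  // Rebuilding requires knowing the matrix dimensions out of band, and breaks
  // if any cell (or the leading string) happens to contain the delimiter.
  static String[][] unpackMatrix(String packed, int rows, int cols) {
    String[] fields = packed.split(",");
    String[][] m = new String[rows][cols];
    int i = 2;  // skip the leading string and boolean fields
    for (int r = 0; r < rows; r++) {
      for (int c = 0; c < cols; c++) {
        m[r][c] = fields[i++];
      }
    }
    return m;
  }
}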

Re: Best way to pass values between jobs

Posted by madhu phatak <ph...@gmail.com>.
Hi,
Rather than using a String, use Writables to store the matrix information and
save it in a SequenceFile. If you want to share this info with another job,
put the sequence file in the distributed cache and then read it in the other job.
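
A minimal sketch of what such a Writable might look like; the class name,
field names, and fixed-dimension handling are illustrative assumptions, not
details from this thread:

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.Writable;

// Hypothetical value class holding the string, the boolean, and one 2D string
// matrix; a second matrix can be added the same way.
public class MatrixWritable implements Writable {

  private String label;
  private boolean flag;
  private String[][] matrix;

  public MatrixWritable() {}  // Hadoop needs the no-arg constructor for deserialization

  public MatrixWritable(String label, boolean flag, String[][] matrix) {
    this.label = label;
    this.flag = flag;
    this.matrix = matrix;
  }

  @Override
  public void write(DataOutput out) throws IOException {
    out.writeUTF(label);
    out.writeBoolean(flag);
    out.writeInt(matrix.length);                               // rows
    out.writeInt(matrix.length == 0 ? 0 : matrix[0].length);   // cols
    for (String[] row : matrix) {
      for (String cell : row) {
        out.writeUTF(cell);
      }
    }
  }

  @Override
  public void readFields(DataInput in) throws IOException {
    label = in.readUTF();
    flag = in.readBoolean();
    int rows = in.readInt();
    int cols = in.readInt();
    matrix = new String[rows][cols];
    for (int r = 0; r < rows; r++) {
      for (int c = 0; c < cols; c++) {
        matrix[r][c] = in.readUTF();
      }
    }
  }
}

The producing job would then write these values through SequenceFileOutputFormat,
and the next job can either read that path with SequenceFileInputFormat or, as
suggested above, pull the file in through the distributed cache. The job
variables and the path below are placeholders:

// "job" is the producing Job, "nextJob" the consuming one (new mapreduce API).
job.setOutputFormatClass(org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat.class);
job.setOutputValueClass(MatrixWritable.class);

nextJob.setInputFormatClass(org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat.class);
// or ship it as side data:
org.apache.hadoop.filecache.DistributedCache.addCacheFile(
    new org.apache.hadoop.fs.Path("/path/to/previous/output").toUri(),  // placeholder
    nextJob.getConfiguration());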


-- 
https://github.com/zinnia-phatak-dev/Nectar