You are viewing a plain text version of this content. The canonical link for it is here.

Posted to common-user@hadoop.apache.org by some speed <sp...@gmail.com> on 2009/02/15 06:43:06 UTC

Design issue for a problem using Map Reduce

Hello all,

I am trying to implement a Map Reduce Chain to solve a particular statistic
problem. I have come to a point where I have to solve the following type of
equation in Hadoop:

F(t)= A*w(t)*w(t) + B*F(t-1)    ; Given: F(0)=0, A and B are Alpha and Beta
and their values are known.

Now, W is series of numbers (There could be *a million* or more numbers).

So to Solve the equation in terms of Map Reduce, there are basically 2
issues which I can think of:

1) How will I be able to get the value of F(t-1) since it means as each step
i need the value from the previous iteration. And that is not possible while
computing parallely.
2) the w(t) values have to be read and applied in order also ,and, again
that is a prb while computing parallely.

Can some please help me go abt this problem and overcome the issues?

Thanks,

Sharath

Re: Design issue for a problem using Map Reduce

Posted by some speed <sp...@gmail.com>.

Thanks Sagar...That helps to a certain extent.
But is dependency not a common occurrence among equations? Doesn't Hadoop
provide a way to solve such equations in parallel?
Going in for a sequential calculation might prove to be a major performance
degradation given tens of thousands of numbers. Does any one have any ideas
?

Thanks.

On Sun, Feb 15, 2009 at 1:34 AM, Sagar Naik <sn...@attributor.com> wrote:

> Here is one thought
> N maps and 1 Reduce,
> input to map: <t,w(t)>
> output of map <t, w(t)*w(t)>
> I assume t is an integer. So in case of 1 reducer, u will receive
> t0, square(w(0)
> t1, square(w(1)
> t2, square(w(2)
> t3, square(w(3)
> Note this wiil be a sorted series on t.
>
> in reduce
>
> static prevF = 0;
>
> reduce(t, square_w_t)
> {
>  f = square_w_t * A  + B * prevF ;
>  output.collect(t,f)
>  prevF = f
> }
>
> According to me the step of B*F(t-1) is inherently sequential.
> So all we can do is parallelize the a*w(t)*w(t) part.
>
> -Sagar
>
>
> some speed wrote:
>
>> Hello all,
>>
>> I am trying to implement a Map Reduce Chain to solve a particular
>> statistic
>> problem. I have come to a point where I have to solve the following type
>> of
>> equation in Hadoop:
>>
>> F(t)= A*w(t)*w(t) + B*F(t-1)    ; Given: F(0)=0, A and B are Alpha and
>> Beta
>> and their values are known.
>>
>> Now, W is series of numbers (There could be *a million* or more numbers).
>>
>> So to Solve the equation in terms of Map Reduce, there are basically 2
>> issues which I can think of:
>>
>> 1) How will I be able to get the value of F(t-1) since it means as each
>> step
>> i need the value from the previous iteration. And that is not possible
>> while
>> computing parallely.
>> 2) the w(t) values have to be read and applied in order also ,and, again
>> that is a prb while computing parallely.
>>
>> Can some please help me go abt this problem and overcome the issues?
>>
>> Thanks,
>>
>> Sharath
>>
>>
>>
>

Re: Design issue for a problem using Map Reduce

Posted by Sagar Naik <sn...@attributor.com>.

Here is one thought
N maps and 1 Reduce,
input to map: <t,w(t)>
output of map <t, w(t)*w(t)>
I assume t is an integer. So in case of 1 reducer, u will receive
t0, square(w(0)
t1, square(w(1)
t2, square(w(2)
t3, square(w(3)
Note this wiil be a sorted series on t.

in reduce

static prevF = 0;

reduce(t, square_w_t)
{
   f = square_w_t * A  + B * prevF ;
   output.collect(t,f)
   prevF = f
}

According to me the step of B*F(t-1) is inherently sequential.
So all we can do is parallelize the a*w(t)*w(t) part.

-Sagar

some speed wrote:
> Hello all,
>
> I am trying to implement a Map Reduce Chain to solve a particular statistic
> problem. I have come to a point where I have to solve the following type of
> equation in Hadoop:
>
> F(t)= A*w(t)*w(t) + B*F(t-1)    ; Given: F(0)=0, A and B are Alpha and Beta
> and their values are known.
>
> Now, W is series of numbers (There could be *a million* or more numbers).
>
> So to Solve the equation in terms of Map Reduce, there are basically 2
> issues which I can think of:
>
> 1) How will I be able to get the value of F(t-1) since it means as each step
> i need the value from the previous iteration. And that is not possible while
> computing parallely.
> 2) the w(t) values have to be read and applied in order also ,and, again
> that is a prb while computing parallely.
>
> Can some please help me go abt this problem and overcome the issues?
>
> Thanks,
>
> Sharath
>
>