Posted to common-user@hadoop.apache.org by Alexandra Anghelescu <ax...@gmail.com> on 2011/04/22 21:15:10 UTC

providing the same input to more than one Map task

Hi all,

I am trying to perform matrix-vector multiplication using Hadoop.
So I have matrix M in a file, and vector v in another file. How can I make
it so that each Map task will get the whole vector v and a chunk of matrix
M?
Basically I want my map function to output key-value pairs (i,m[i,j]*v[j]),
where i is the row number, and j the column number. And the reduce function
will sum up all the values with the same key i, and that will be the ith
element of my result vector.
Or can you suggest another way to do it?


Thanks,
Alexandra Anghelescu
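
The map/reduce logic Alexandra describes can be sketched in plain Java, outside Hadoop's API, to check the arithmetic: the "map" step emits a pair (i, M[i][j] * v[j]) for every matrix entry, and the "reduce" step sums all values sharing row key i. This is only a single-process illustration of the key/value scheme, not a Hadoop job; the class and method names are made up for the example.

```java
import java.util.HashMap;
import java.util.Map;

public class MatVecSketch {

    public static double[] multiply(double[][] m, double[] v) {
        // "map" phase: for each entry M[i][j], emit key i with value M[i][j] * v[j]
        Map<Integer, Double> sums = new HashMap<>();
        for (int i = 0; i < m.length; i++) {
            for (int j = 0; j < v.length; j++) {
                // "reduce" phase folded in: sum all values that share key i
                sums.merge(i, m[i][j] * v[j], Double::sum);
            }
        }
        // collect the per-key sums into the result vector: result[i] = sum_j M[i][j] * v[j]
        double[] result = new double[m.length];
        for (Map.Entry<Integer, Double> e : sums.entrySet()) {
            result[e.getKey()] = e.getValue();
        }
        return result;
    }

    public static void main(String[] args) {
        double[][] m = {{1, 2}, {3, 4}};
        double[] v = {5, 6};
        double[] r = multiply(m, v);
        System.out.println(r[0] + " " + r[1]); // 17.0 39.0
    }
}
```

In a real job the inner loop would be split across mappers (each mapper sees only a chunk of M's rows, but all of v) and the summation would happen in the reducer keyed on i.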

Re: providing the same input to more than one Map task

Posted by Shi Yu <sh...@uchicago.edu>.
Then what is the main difference between (1) storing the input in a shared 
directory on the cluster and loading it in the configure stage of the 
mappers, and (2) using the distributed cache?

Shi

On 4/25/2011 8:17 AM, Kai Voigt wrote:
> Hi,
>
> I'd use the distributed cache to store the vector on every mapper machine locally.
>
> Kai
>
> Am 22.04.2011 um 21:15 schrieb Alexandra Anghelescu:
>
>> Hi all,
>>
>> I am trying to perform matrix-vector multiplication using Hadoop.
>> So I have matrix M in a file, and vector v in another file. How can I make
>> it so that each Map task will get the whole vector v and a chunk of matrix
>> M?
>> Basically I want my map function to output key-value pairs (i,m[i,j]*v[j]),
>> where i is the row number, and j the column number. And the reduce function
>> will sum up all the values with the same key i, and that will be the ith
>> element of my result vector.
>> Or can you suggest another way to do it?
>>
>>
>> Thanks,
>> Alexandra Anghelescu


Re: providing the same input to more than one Map task

Posted by Kai Voigt <k...@123.org>.
Hi,

I'd use the distributed cache to store the vector on every mapper machine locally.

Kai

Am 22.04.2011 um 21:15 schrieb Alexandra Anghelescu:

> Hi all,
> 
> I am trying to perform matrix-vector multiplication using Hadoop.
> So I have matrix M in a file, and vector v in another file. How can I make
> it so that each Map task will get the whole vector v and a chunk of matrix
> M?
> Basically I want my map function to output key-value pairs (i,m[i,j]*v[j]),
> where i is the row number, and j the column number. And the reduce function
> will sum up all the values with the same key i, and that will be the ith
> element of my result vector.
> Or can you suggest another way to do it?
> 
> 
> Thanks,
> Alexandra Anghelescu

-- 
Kai Voigt
k@123.org
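
The distributed-cache approach Kai describes might look like the following sketch, using the old `org.apache.hadoop.filecache.DistributedCache` API that was current in 2011. The HDFS path and file names are placeholders, and the vector-parsing details are omitted; this shows only where the cache calls go. It also suggests an answer to Shi's question: with the distributed cache, the framework copies the file to each node's local disk once per job, so every task on that node reads a local copy instead of each task pulling the file from HDFS in its configure stage.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;

public class VectorCacheSketch {

    // Driver side: register the vector file (path is a placeholder)
    // before submitting the job; Hadoop ships it to every task node.
    public static void setUpCache(JobConf conf) throws Exception {
        DistributedCache.addCacheFile(new URI("/user/alexandra/vector.txt"), conf);
    }

    // Mapper side: in configure(), read the node-local copy of the
    // vector once per task JVM, before any map() calls arrive.
    public static class MatVecMapper extends MapReduceBase {
        private double[] v;  // the whole vector, held in memory

        @Override
        public void configure(JobConf conf) {
            try {
                Path[] cached = DistributedCache.getLocalCacheFiles(conf);
                BufferedReader in =
                    new BufferedReader(new FileReader(cached[0].toString()));
                // ... parse the local file into v, one element per line ...
                in.close();
            } catch (IOException e) {
                throw new RuntimeException("failed to load cached vector", e);
            }
        }
    }
}
```

Each map() call then sees one chunk of M (its input split) but has all of v available in memory, which is exactly the layout the original question asks for.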