Posted to common-user@hadoop.apache.org by Alexandra Anghelescu <ax...@gmail.com> on 2011/04/22 21:15:10 UTC
providing the same input to more than one Map task
Hi all,
I am trying to perform matrix-vector multiplication using Hadoop.
So I have matrix M in a file, and vector v in another file. How can I make
it so that each Map task will get the whole vector v and a chunk of matrix
M?
Basically I want my map function to output key-value pairs (i,m[i,j]*v[j]),
where i is the row number, and j the column number. And the reduce function
will sum up all the values with the same key i, and that will be the ith
element of my result vector.
Or can you suggest another way to do it?
Thanks,
Alexandra Anghelescu
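[Editor's note: the map/reduce split Alexandra describes — map emits (i, M[i][j]*v[j]), reduce sums all values sharing row key i — can be sketched without a cluster. Below is a minimal local simulation in plain Java; the class and method names are illustrative, not Hadoop API.]

```java
import java.util.*;

public class MatVecSketch {
    // Simulates one MapReduce pass: the map phase emits (i, M[i][j] * v[j])
    // for every matrix cell, the shuffle groups pairs by row index i,
    // and the reduce phase sums each group into the i-th result entry.
    public static double[] multiply(double[][] M, double[] v) {
        // "map" + "shuffle": collect partial products per row key i
        Map<Integer, List<Double>> grouped = new TreeMap<>();
        for (int i = 0; i < M.length; i++) {
            for (int j = 0; j < v.length; j++) {
                grouped.computeIfAbsent(i, k -> new ArrayList<>())
                       .add(M[i][j] * v[j]);  // emitted pair (i, M[i][j]*v[j])
            }
        }
        // "reduce": sum all values with the same key i
        double[] result = new double[M.length];
        for (Map.Entry<Integer, List<Double>> e : grouped.entrySet()) {
            double sum = 0;
            for (double d : e.getValue()) sum += d;
            result[e.getKey()] = sum;
        }
        return result;
    }

    public static void main(String[] args) {
        double[][] M = {{1, 2}, {3, 4}};
        double[] v = {5, 6};
        System.out.println(Arrays.toString(multiply(M, v)));  // [17.0, 39.0]
    }
}
```

In a real job the map input would be one line of M per record (e.g. "i j value" triples), but the key/value flow is exactly the one sketched here.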
Re: providing the same input to more than one Map task
Posted by Shi Yu <sh...@uchicago.edu>.
Then, what is the main difference between (1) storing the input in a shared
directory on the cluster and loading it in the configure stage of the mappers,
and (2) using the distributed cache?
Shi
On 4/25/2011 8:17 AM, Kai Voigt wrote:
> Hi,
>
> I'd use the distributed cache to store the vector on every mapper machine locally.
>
> Kai
>
> Am 22.04.2011 um 21:15 schrieb Alexandra Anghelescu:
>
>> Hi all,
>>
>> I am trying to perform matrix-vector multiplication using Hadoop.
>> So I have matrix M in a file, and vector v in another file. How can I make
>> it so that each Map task will get the whole vector v and a chunk of matrix
>> M?
>> Basically I want my map function to output key-value pairs (i,m[i,j]*v[j]),
>> where i is the row number, and j the column number. And the reduce function
>> will sum up all the values with the same key i, and that will be the ith
>> element of my result vector.
>> Or can you suggest another way to do it?
>>
>>
>> Thanks,
>> Alexandra Anghelescu
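[Editor's note: with either approach Shi contrasts, the mapper-side pattern is the same: parse the vector file once per task, then reuse it for every map call. In the 0.20-era Hadoop API the distributed-cache variant registers the file with DistributedCache.addCacheFile(...) in the driver and locates the local copy via DistributedCache.getLocalCacheFiles(conf) in the mapper's configure/setup method. Below is a local sketch of just that read-once pattern; VectorLoader and its methods are stand-ins, not Hadoop classes.]

```java
import java.io.IOException;
import java.nio.file.*;
import java.util.List;

public class VectorLoader {
    private double[] v;  // held in memory for the lifetime of the task

    // Stand-in for Mapper.configure()/setup(): parse the locally cached
    // vector file (one value per line) exactly once per task.
    public void setup(Path cachedFile) throws IOException {
        List<String> lines = Files.readAllLines(cachedFile);
        v = lines.stream().mapToDouble(Double::parseDouble).toArray();
    }

    // Stand-in for map(): one matrix row in, its dot product with v out.
    public double map(double[] row) {
        double sum = 0;
        for (int j = 0; j < row.length; j++) sum += row[j] * v[j];
        return sum;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("vector", ".txt");
        Files.write(tmp, List.of("5", "6"));   // the shared vector v
        VectorLoader task = new VectorLoader();
        task.setup(tmp);                       // load once, not per record
        System.out.println(task.map(new double[]{1, 2}));  // 17.0
        Files.deleteIfExists(tmp);
    }
}
```

The practical difference between the two approaches is mostly I/O locality: the distributed cache copies the file to each task node's local disk before the job starts, whereas a shared-directory read in configure() hits the shared filesystem from every task.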
Re: providing the same input to more than one Map task
Posted by Kai Voigt <k...@123.org>.
Hi,
I'd use the distributed cache to store the vector on every mapper machine locally.
Kai
Am 22.04.2011 um 21:15 schrieb Alexandra Anghelescu:
> Hi all,
>
> I am trying to perform matrix-vector multiplication using Hadoop.
> So I have matrix M in a file, and vector v in another file. How can I make
> it so that each Map task will get the whole vector v and a chunk of matrix
> M?
> Basically I want my map function to output key-value pairs (i,m[i,j]*v[j]),
> where i is the row number, and j the column number. And the reduce function
> will sum up all the values with the same key i, and that will be the ith
> element of my result vector.
> Or can you suggest another way to do it?
>
>
> Thanks,
> Alexandra Anghelescu
--
Kai Voigt
k@123.org