Posted to dev@hama.apache.org by Apurv Verma <da...@gmail.com> on 2012/01/15 12:17:21 UTC

Hama Doubts

Hi all,
I seem to have overcome my initial Hadoop setup problems. I have some
questions.


   1. I seem to understand the Pi example, here is what I have understood,
   please correct me if I am wrong.
   Each of the BSPPeers does the local computation of Pi and sends it to a
   special BSPPeer which we have chosen as the master node. The choice of
   master node is completely arbitrary. It is from this node that we later
   fetch the results.

   2. I read that a BSP task is composed of a series of supersteps. When we
   call sync(), which flushes all messages to the input queues of the
   intended BSPPeers, does this correspond to the completion of one superstep
   in the whole computation? Most computations have a sync() as the last line
   of the bsp function.

   3. Just as each Map/Reduce task in Hadoop gets an input split, does a
   BSP task also get an input split? If yes, can we use the readNext() method
   of the BSPPeer interface to read the data from files?

   4. How is a matrix going to be represented in the file? Are there any
   papers that describe matrix algorithms on the BSP framework?


Thank you all for the previous help and support !!

--
thanks and regards,

Apurv Verma
B. Tech.(CSE)
IIT- Ropar

Re: Hama Doubts

Posted by Thomas Jungblut <th...@googlemail.com>.

Hi Apurv,

>   1. I seem to understand the Pi example, here is what I have understood,
>   please correct me if I am wrong.
>   Each of the BSPPeers does the local computation of Pi and sends it to a
>   special BSPPeer which we have chosen as the master node. The choice of
>   master node is completely arbitrary. It is from this node that we later
>   fetch the results.


Yes. MapReduce uses the same approach, see the Pi estimator example in Hadoop.
Designing around a master is quite bad in my opinion, but many algorithms have
to use one to keep a globally synced state.
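
A minimal sketch of that pattern, written as a single-process toy model in
Python rather than against the Hama API. The peer names, the run_pi_job
function and its parameters are assumptions made for illustration only.

```python
import random


def local_pi_estimate(iterations, rng):
    """Monte Carlo estimate of pi from one peer's own random samples."""
    inside = 0
    for _ in range(iterations):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / iterations


def run_pi_job(num_peers=4, iterations=20000, seed=42):
    # Hypothetical peer names; the "master" is just one peer all peers agree on.
    peers = ["peer-%d" % i for i in range(num_peers)]
    master = peers[0]
    inbox = {name: [] for name in peers}

    # Superstep 1: every peer (master included) computes locally
    # and sends its estimate to the master.
    for i, _name in enumerate(peers):
        rng = random.Random(seed + i)
        inbox[master].append(local_pi_estimate(iterations, rng))

    # A real job would hit the barrier (sync) here.

    # Superstep 2: only the master averages what it received.
    estimates = inbox[master]
    return sum(estimates) / len(estimates)
```

Which peer plays the master does not matter, as long as every peer picks the
same one; the result is then read from that peer.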

>   2. I read that a BSP Task is composed of a series of supersteps, when we
>   write sync() {which flushes all messages to the input queues of the
>   intended BSPPeers , does this correspond to a completion of one superstep
>   in the whole computation? Most computations have a sync() as the last line
>   in the bsp function.


When you call sync() a superstep ends. When the method returns, you're in a
new superstep.
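
The barrier behaviour can be sketched with a toy model, assuming messages are
buffered on send and only become readable after the barrier. ToyBSP and its
methods are hypothetical names, not the Hama API, and dropping unread messages
at the next barrier is a simplification of this sketch.

```python
class ToyBSP:
    """Toy single-process model of BSP message delivery (not the Hama API)."""

    def __init__(self, peer_names):
        self.outgoing = {name: [] for name in peer_names}  # sent this superstep
        self.inbox = {name: [] for name in peer_names}     # readable this superstep
        self.superstep = 0

    def send(self, to, msg):
        # Buffered only; the receiver cannot see it before the barrier.
        self.outgoing[to].append(msg)

    def sync(self):
        # Barrier: deliver everything sent during the superstep that just ended.
        for name in self.inbox:
            self.inbox[name] = self.outgoing[name]
            self.outgoing[name] = []
        self.superstep += 1


def demo():
    bsp = ToyBSP(["a", "b"])
    bsp.send("b", "hello")
    before = list(bsp.inbox["b"])  # nothing delivered yet
    bsp.sync()
    after = list(bsp.inbox["b"])   # delivered once sync() has returned
    return before, after, bsp.superstep
```

In this model demo() sees an empty inbox before the barrier and the message
after it, which is the sense in which sync() separates two supersteps.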

>   3. Just as in hadoop each Map/Reduce Task gets an input split, does the
>   bsp task also gets an input split. If yes, can we use the readNext() method
>   in BSPPeer interface to obtain the data from files.


Yes. You can also read the input again from the beginning by calling
reOpenInput().
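
One reason to re-read a split is a two-pass computation. The sketch below is a
toy stand-in, assuming a reader that hands out one record at a time and can be
rewound; ToySplitReader, read_next and reopen are hypothetical names and not
the BSPPeer methods themselves.

```python
class ToySplitReader:
    """Toy stand-in for a task's input split (hypothetical, not the Hama API)."""

    def __init__(self, records):
        self._records = list(records)
        self._pos = 0

    def read_next(self):
        # Returns the next record, or None when the split is exhausted.
        if self._pos >= len(self._records):
            return None
        record = self._records[self._pos]
        self._pos += 1
        return record

    def reopen(self):
        # Rewind so the same split can be read again from the start.
        self._pos = 0


def read_all(reader):
    # Drain the reader record by record until it reports exhaustion.
    while True:
        record = reader.read_next()
        if record is None:
            return
        yield record


def mean_and_variance(reader):
    # Pass 1: mean of all records in the split.
    values_seen = 0
    total = 0.0
    for value in read_all(reader):
        total += value
        values_seen += 1
    mean = total / values_seen

    reader.reopen()

    # Pass 2: population variance around the mean from pass 1.
    squared = sum((value - mean) ** 2 for value in read_all(reader))
    return mean, squared / values_seen
```

Without the reopen() call the second pass would see an exhausted reader and
the variance would come out as zero.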

>   4. How is a matrix going to be represented in the file? Are there any
>   papers that describe matrix algorithms on the BSP framework.


Interesting topic.
A dense matrix can be represented as a two-dimensional array; a sparse
matrix could be a hash map from a row or column id to a vector (which
can itself be sparse or dense).

In a SequenceFile I would write a LongWritable row id as the key and a
Vector implementation as the value.
But that is just a naive approach; there are better ones. In the end it
depends on which algorithm you want to code.
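
A minimal sketch of the two representations, in plain Python rather than
Hadoop Writables. The row-id-to-vector dictionary plays the role of the
(row id, vector) records in the file; the function names are assumptions made
for illustration.

```python
def to_sparse_rows(matrix):
    """Convert a dense 2D list into {row_id: {col_id: value}}, skipping zeros."""
    rows = {}
    for i, row in enumerate(matrix):
        vec = {j: v for j, v in enumerate(row) if v != 0.0}
        if vec:  # rows that are entirely zero are not stored at all
            rows[i] = vec
    return rows


def sparse_matvec(rows, x, num_rows):
    """Multiply a row-keyed sparse matrix by a dense vector x."""
    y = [0.0] * num_rows
    for i, vec in rows.items():
        # Each row is independent, so rows can be spread across tasks.
        y[i] = sum(v * x[j] for j, v in vec.items())
    return y


dense = [
    [1.0, 0.0, 2.0],
    [0.0, 0.0, 0.0],
    [0.0, 3.0, 0.0],
]
sparse = to_sparse_rows(dense)
product = sparse_matvec(sparse, [1.0, 1.0, 1.0], len(dense))
```

Keying by row id suits row-wise algorithms such as matrix-vector
multiplication; a column-oriented algorithm would likely want the column id as
the key instead.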

I'm not a paper guy, so maybe others can link you to other cool papers
about this ;)

Greetings,
Thomas



-- 
Thomas Jungblut
Berlin <th...@gmail.com>