Posted to dev@hama.apache.org by Apurv Verma <da...@gmail.com> on 2012/01/15 12:17:21 UTC
Hama Doubts
Hi all,
I seem to have overcome my initial Hadoop setup problems. I have some
questions.
1. I think I understand the Pi example. Here is what I have understood;
please correct me if I am wrong.
Each BSPPeer computes a local estimate of Pi and sends it to a
special BSPPeer that we have chosen as the master node. The choice of
master node is completely arbitrary. It is from this node that we later
fetch the result.
2. I read that a BSP task is composed of a series of supersteps. When we
call sync() (which flushes all messages to the input queues of the
intended BSPPeers), does this mark the completion of one superstep
in the whole computation? Most computations have sync() as the last line
in the bsp function.
3. Just as each Hadoop Map/Reduce task gets an input split, does a
BSP task also get an input split? If so, can we use the readNext() method
of the BSPPeer interface to read the data from files?
4. How is a matrix going to be represented in the file? Are there any
papers that describe matrix algorithms on the BSP framework?
Thank you all for your earlier help and support!
--
thanks and regards,
Apurv Verma
B. Tech.(CSE)
IIT- Ropar
Re: Hama Doubts
Posted by Thomas Jungblut <th...@googlemail.com>.
Hi Apurv,
> 1. I think I understand the Pi example. Here is what I have understood;
> please correct me if I am wrong.
> Each BSPPeer computes a local estimate of Pi and sends it to a
> special BSPPeer that we have chosen as the master node. The choice of
> master node is completely arbitrary. It is from this node that we later
> fetch the result.
Yes. MapReduce uses the same approach; see the PiEstimator example in Hadoop.
Relying on a master is quite a bad design in my opinion, but many algorithms
need one to keep a globally synchronized state.
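To make the pattern concrete, here is a minimal plain-Java sketch of the same communication shape. It does not use Hama's API: the class `PiMasterSketch`, the fixed `MASTER` index, and the in-memory inboxes are hypothetical stand-ins. Each simulated peer computes a Monte Carlo estimate from its own random points, every peer (the master included) sends its estimate to the master, and only the master aggregates after the point where sync() would be called.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Sketch only: mimics the Pi example's message pattern in plain Java (no Hama API).
class PiMasterSketch {
    // Hypothetical: index of the peer acting as master. Any fixed index would do.
    static final int MASTER = 0;

    // Local work on one peer: 4 * (fraction of random points inside the unit quarter circle).
    static double localEstimate(long seed, int samples) {
        Random rnd = new Random(seed);
        int inside = 0;
        for (int i = 0; i < samples; i++) {
            double x = rnd.nextDouble();
            double y = rnd.nextDouble();
            if (x * x + y * y <= 1.0) {
                inside++;
            }
        }
        return 4.0 * inside / samples;
    }

    static double run(int numPeers, int samplesPerPeer) {
        // One inbox per peer; in this pattern only the master's inbox receives anything.
        List<List<Double>> inboxes = new ArrayList<>();
        for (int p = 0; p < numPeers; p++) {
            inboxes.add(new ArrayList<>());
        }
        // First superstep: every peer computes locally and sends its estimate to MASTER.
        for (int p = 0; p < numPeers; p++) {
            inboxes.get(MASTER).add(localEstimate(p, samplesPerPeer));
        }
        // sync() would go here; afterwards only the master reads its inbox and aggregates.
        double sum = 0.0;
        for (double estimate : inboxes.get(MASTER)) {
            sum += estimate;
        }
        return sum / inboxes.get(MASTER).size();
    }

    public static void main(String[] args) {
        System.out.println("Pi is roughly " + run(4, 100_000));
    }
}
```

Because every simulated peer draws the same number of samples, averaging the per-peer estimates gives the same result as pooling all the points on one machine.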
> 2. I read that a BSP task is composed of a series of supersteps. When we
> call sync() (which flushes all messages to the input queues of the
> intended BSPPeers), does this mark the completion of one superstep
> in the whole computation? Most computations have sync() as the last line
> in the bsp function.
When you call sync(), a superstep ends. When the method returns, you are in a
new superstep.
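The superstep boundary can be illustrated with a small thread-based simulation. It assumes nothing about Hama's internals: `SuperstepSketch` and its `send`, `messages`, and `sync` methods are hypothetical. Messages go into per-destination outgoing queues and are moved into the readable inboxes only by the barrier action, which runs once every peer has called sync(). A message sent in superstep 0 therefore becomes visible only in superstep 1.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CyclicBarrier;

// Simulation of BSP supersteps with threads; not Hama's implementation.
class SuperstepSketch {
    // Messages sent during the current superstep, indexed by destination peer.
    private final List<ConcurrentLinkedQueue<String>> outgoing = new ArrayList<>();
    // Messages readable in the current superstep (delivered at the last sync).
    private final List<List<String>> inbox = new ArrayList<>();
    private final CyclicBarrier barrier;

    SuperstepSketch(int numPeers) {
        for (int i = 0; i < numPeers; i++) {
            outgoing.add(new ConcurrentLinkedQueue<>());
            inbox.add(new ArrayList<>());
        }
        // Runs once per superstep, after every peer has arrived: flush outgoing into inboxes.
        barrier = new CyclicBarrier(numPeers, () -> {
            for (int i = 0; i < numPeers; i++) {
                inbox.get(i).clear();
                String msg;
                while ((msg = outgoing.get(i).poll()) != null) {
                    inbox.get(i).add(msg);
                }
            }
        });
    }

    void send(int to, String msg) {
        outgoing.get(to).add(msg);
    }

    List<String> messages(int peer) {
        return inbox.get(peer);
    }

    // Ends the current superstep; returns once every peer has called sync().
    void sync() throws InterruptedException, BrokenBarrierException {
        barrier.await();
    }

    // Each peer sends one message to its right-hand neighbour, syncs, then reads its inbox.
    static String[] ringExchange(int n) {
        SuperstepSketch bsp = new SuperstepSketch(n);
        String[] received = new String[n];
        Thread[] threads = new Thread[n];
        for (int p = 0; p < n; p++) {
            final int me = p;
            threads[p] = new Thread(() -> {
                try {
                    // Superstep 0: computation and sends; nothing has been delivered yet.
                    bsp.send((me + 1) % n, "hello from " + me);
                    bsp.sync(); // superstep 0 ends here
                    // Superstep 1: messages sent during superstep 0 are now readable.
                    received[me] = String.join(",", bsp.messages(me));
                } catch (InterruptedException | BrokenBarrierException e) {
                    throw new IllegalStateException(e);
                }
            });
            threads[p].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return received;
    }

    public static void main(String[] args) {
        for (String s : ringExchange(4)) {
            System.out.println(s);
        }
    }
}
```

`CyclicBarrier` guarantees that the barrier action happens-before each thread returns from `await()`, which is what makes the freshly delivered inboxes safely visible to every peer in the next superstep.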
> 3. Just as each Hadoop Map/Reduce task gets an input split, does a
> BSP task also get an input split? If so, can we use the readNext() method
> of the BSPPeer interface to read the data from files?
Yes. You can also read the input again from the beginning by
calling reOpenInput().
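Rereading the input is handy for multi-pass algorithms. Here is a sketch of a two-pass computation (mean first, then variance). `InMemorySplit` is a hypothetical in-memory stand-in for a peer's input split: its method names echo `readNext()` and `reOpenInput()`, but the signatures shown are illustrative and are not Hama's actual ones.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a peer's input split (not Hama's classes or signatures).
class InMemorySplit<K, V> {
    private final List<Map.Entry<K, V>> records;
    private int cursor = 0;

    InMemorySplit(List<Map.Entry<K, V>> records) {
        this.records = records;
    }

    // Returns the next record, or null once the split is exhausted.
    Map.Entry<K, V> readNext() {
        return cursor < records.size() ? records.get(cursor++) : null;
    }

    // Rewinds to the first record so the split can be read again from the beginning.
    void reOpenInput() {
        cursor = 0;
    }
}

class TwoPassVariance {
    // Builds a split whose keys are simply record indices (hypothetical layout).
    static InMemorySplit<Long, Double> fromValues(double... values) {
        List<Map.Entry<Long, Double>> records = new ArrayList<>();
        for (int i = 0; i < values.length; i++) {
            records.add(new AbstractMap.SimpleEntry<>((long) i, values[i]));
        }
        return new InMemorySplit<>(records);
    }

    // Pass 1 computes the mean; pass 2, after reOpenInput(), computes the population variance.
    static double variance(InMemorySplit<Long, Double> split) {
        double sum = 0.0;
        long count = 0;
        Map.Entry<Long, Double> record;
        while ((record = split.readNext()) != null) {
            sum += record.getValue();
            count++;
        }
        double mean = sum / count;

        split.reOpenInput();
        double squares = 0.0;
        while ((record = split.readNext()) != null) {
            double d = record.getValue() - mean;
            squares += d * d;
        }
        return squares / count;
    }

    public static void main(String[] args) {
        System.out.println(variance(fromValues(1.0, 2.0, 3.0, 4.0, 5.0))); // prints 2.0
    }
}
```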
> 4. How is a matrix going to be represented in the file? Are there any
> papers that describe matrix algorithms on the BSP framework?
Interesting topic.
A dense matrix can be represented as a two-dimensional array; a sparse
matrix could be a hash map from a row or column ID to a vector (which
can itself be sparse or dense).
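A small Java sketch of both layouts, computing the same matrix-vector product y = A x. The class and method names are hypothetical; the sparse form is a `HashMap` from row ID to a sparse row (column ID to value), one of the variants described above, and only non-zero entries are stored.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative dense vs. sparse (row ID -> sparse row) matrix layouts.
class MatrixRepresentations {
    // Dense: every entry is stored, including zeros.
    static double[] multiplyDense(double[][] a, double[] x) {
        double[] y = new double[a.length];
        for (int i = 0; i < a.length; i++) {
            double s = 0.0;
            for (int j = 0; j < x.length; j++) {
                s += a[i][j] * x[j];
            }
            y[i] = s;
        }
        return y;
    }

    // Sparse: row ID -> (column ID -> value). Rows with no entries contribute 0.
    static double[] multiplySparse(Map<Integer, Map<Integer, Double>> rows, int numRows, double[] x) {
        double[] y = new double[numRows];
        for (Map.Entry<Integer, Map<Integer, Double>> row : rows.entrySet()) {
            double s = 0.0;
            for (Map.Entry<Integer, Double> e : row.getValue().entrySet()) {
                s += e.getValue() * x[e.getKey()];
            }
            y[row.getKey()] = s;
        }
        return y;
    }

    // Converts a dense matrix into the sparse layout, keeping only non-zero entries.
    static Map<Integer, Map<Integer, Double>> toSparse(double[][] a) {
        Map<Integer, Map<Integer, Double>> rows = new HashMap<>();
        for (int i = 0; i < a.length; i++) {
            for (int j = 0; j < a[i].length; j++) {
                if (a[i][j] != 0.0) {
                    rows.computeIfAbsent(i, k -> new HashMap<>()).put(j, a[i][j]);
                }
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        double[][] a = {{2, 0, 0}, {0, 0, 3}, {1, 0, 4}};
        double[] x = {1, 2, 3};
        System.out.println(java.util.Arrays.toString(multiplyDense(a, x)));
        System.out.println(java.util.Arrays.toString(multiplySparse(toSparse(a), a.length, x)));
    }
}
```

Both calls in `main` compute the same vector, [2.0, 9.0, 13.0]; the sparse version simply skips the zeros, which pays off when most entries are zero.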
In a SequenceFile I would write the row ID as a LongWritable key and a
Vector implementation as the value.
But that is just a naive approach; there are better ones. In the end it
depends on which algorithm you want to implement.
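The key/value layout can be sketched without Hadoop on the classpath. This is not the SequenceFile format: it only writes each record as a row ID (via `writeLong`, which is how `LongWritable` serializes its value) followed by a length-prefixed array of doubles, using the `DataOutput`/`DataInput` primitives that Hadoop `Writable`s are built on. `RowRecordSketch`, its `Row` class, and the leading record count are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical (row ID, dense row vector) records; not the SequenceFile on-disk format.
class RowRecordSketch {
    static final class Row {
        final long rowId;       // key, like a LongWritable
        final double[] values;  // value, like a dense Vector
        Row(long rowId, double[] values) {
            this.rowId = rowId;
            this.values = values;
        }
    }

    static byte[] writeRows(List<Row> rows) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(rows.size());             // hypothetical record-count header
            for (Row r : rows) {
                out.writeLong(r.rowId);            // key
                out.writeInt(r.values.length);     // value: length prefix...
                for (double v : r.values) {
                    out.writeDouble(v);            // ...then the entries
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static List<Row> readRows(byte[] data) {
        List<Row> rows = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int count = in.readInt();
            for (int i = 0; i < count; i++) {
                long rowId = in.readLong();
                double[] values = new double[in.readInt()];
                for (int j = 0; j < values.length; j++) {
                    values[j] = in.readDouble();
                }
                rows.add(new Row(rowId, values));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return rows;
    }
}
```

A dense value is the simplest choice here; a sparse row would instead write (column ID, value) pairs, which is the kind of algorithm-dependent trade-off mentioned above.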
I'm not a paper guy, so maybe others can link you to other cool papers
about this ;)
Greetings,
Thomas
--
Thomas Jungblut
Berlin <th...@gmail.com>