Posted to user@hama.apache.org by Praveen Sripati <pr...@gmail.com> on 2012/04/05 02:51:29 UTC

K-Means on BSP

http://codingwiththomas.blogspot.in/2011/12/k-means-clustering-with-bsp-intuition.html

> Now we are going to broadcast each of this computed averages to the other
> tasks. Then we are going to sync so all messages can be delivered.
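To make the quoted step concrete, here is a minimal plain-Python simulation of
one broadcast iteration. It does not use the Hama API, and every function name
is hypothetical. It also assumes that each task sends per-center sums and
counts rather than bare averages, so that the merged result is weighted
correctly; the blog post may do this differently.

```python
# Plain-Python simulation of one broadcast step; no Hama API is used.
# All names are hypothetical and points are 1-D for brevity.

def local_partials(points, centers):
    """Assign each point to its nearest center; return per-center (sum, count)."""
    partials = [(0.0, 0) for _ in centers]
    for p in points:
        nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
        s, c = partials[nearest]
        partials[nearest] = (s + p, c + 1)
    return partials


def merge_partials(all_partials, old_centers):
    """Combine every task's partials into new centers (done by each task after the sync)."""
    new_centers = []
    for i, old in enumerate(old_centers):
        total = sum(task[i][0] for task in all_partials)
        count = sum(task[i][1] for task in all_partials)
        # A center that received no points keeps its old position.
        new_centers.append(total / count if count else old)
    return new_centers


def broadcast_step(task_points, centers):
    """One iteration: every task computes partials, receives all of them, and merges."""
    all_partials = [local_partials(pts, centers) for pts in task_points]
    # After the barrier every task holds the same messages,
    # so every task computes the same new centers.
    return [merge_partials(all_partials, centers) for _ in task_points]


if __name__ == "__main__":
    print(broadcast_step([[1.0, 2.0], [9.0, 11.0], [3.0]], [0.0, 10.0]))
```

Because every task merges the same set of messages, all tasks end the
iteration with identical centers and no extra coordination is needed.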

Instead of sending the computed averages to all the tasks, we could send
them to a master task, let the master do all the computation, and have it
send the result back to all the tasks. This reduces the number of messages
from n*n to 2*n and leaves less computation for the non-master tasks.
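The two message counts can be compared with a short sketch. The function names
are hypothetical, and the counts follow the n*n and 2*n figures used above (so
each task is counted as sending to itself as well).

```python
def broadcast_messages(n):
    """Each of the n tasks sends its update to all n tasks."""
    return n * n


def master_messages(n):
    """n updates go to the master, then n results come back."""
    return 2 * n


if __name__ == "__main__":
    for n in (2, 4, 8, 16, 32):
        print(n, broadcast_messages(n), master_messages(n))
```

The counts are equal at n = 2 and diverge from there: at 32 tasks the
broadcast sends 1024 messages, the master variant 64.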

Also, can we have a dedicated master task, or make one of the BSP tasks the
master task?

Thanks,
Praveen

Re: K-Means on BSP

Posted by Thomas Jungblut <th...@googlemail.com>.

This is really not helpful, since the centers must be kept in RAM on each
task.
So the delta updates of each task would have to be sent to a master,
combined there, and sent back.
And what are the other machines doing while the master task calculates?
This is simply a waste of resources, and I very much doubt that it will be
faster.

And FYI, n^2 is not exponential growth; it is quadratic.
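A quick way to see the difference is to tabulate n^2 next to a truly
exponential function such as 2^n. This is only an illustration; the function
names are hypothetical.

```python
def quadratic(n):
    """Growth of the all-to-all message count."""
    return n ** 2


def exponential(n):
    """A genuinely exponential function, for comparison."""
    return 2 ** n


if __name__ == "__main__":
    for n in (4, 8, 16, 32):
        print(n, quadratic(n), exponential(n))
```

Doubling n multiplies n^2 by four, whereas it squares 2^n.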

On 5 April 2012 16:45, Praveen Sripati <pr...@gmail.com> wrote:

> As the # of tasks increases, the # of messages increases exponentially. So,
> it might be useful to consider/benchmark sending the messages to the
> master, let it compute and send the data back to all the tasks.
>
> Praveen



-- 
Thomas Jungblut
Berlin <th...@gmail.com>

Re: K-Means on BSP

Posted by Praveen Sripati <pr...@gmail.com>.

As the number of tasks increases, the number of messages increases
exponentially. So it might be useful to benchmark sending the messages to
the master, letting it compute, and sending the data back to all the tasks.

Praveen

On Thu, Apr 5, 2012 at 10:51 AM, Thomas Jungblut <thomas.jungblut@googlemail.com> wrote:

> The sync() is much more expensive than sending n^2 messages.
> But it would be very interesting to benchmark both against each other.
> Another interesting thing would be to know how HAMA-546 [1] could be used
> to distribute the centers to multiple tasks.
>
> [1] https://issues.apache.org/jira/browse/HAMA-546

Re: K-Means on BSP

Posted by Thomas Jungblut <th...@googlemail.com>.

The sync() is much more expensive than sending n^2 messages.
But it would be very interesting to benchmark both against each other.
Another interesting thing would be to know how HAMA-546 [1] could be used
to distribute the centers to multiple tasks.

[1] https://issues.apache.org/jira/browse/HAMA-546
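The trade-off between barriers and messages can be sketched with a toy cost
model. It assumes that messages are only delivered after a barrier, so the
master variant needs two sync() calls per iteration (tasks to master, then
master to tasks) while the broadcast needs one. The cost units and all
function names are hypothetical, and the model ignores the master's compute
time and network contention; it is no substitute for the benchmark suggested
above.

```python
def iteration_cost(syncs, messages, sync_cost, message_cost):
    """Total cost of one k-means iteration in arbitrary units."""
    return syncs * sync_cost + messages * message_cost


def broadcast_cost(n, sync_cost, message_cost):
    # One superstep: all-to-all send, then a single barrier.
    return iteration_cost(1, n * n, sync_cost, message_cost)


def master_cost(n, sync_cost, message_cost):
    # Two supersteps: tasks -> master, barrier, master -> tasks, barrier.
    return iteration_cost(2, 2 * n, sync_cost, message_cost)


def break_even_tasks(sync_cost, message_cost, limit=10_000):
    """Smallest n at which the master variant is cheaper in this model, or None."""
    for n in range(1, limit + 1):
        if master_cost(n, sync_cost, message_cost) < broadcast_cost(n, sync_cost, message_cost):
            return n
    return None


if __name__ == "__main__":
    # Hypothetical numbers: a barrier costs as much as 1000 messages.
    print(break_even_tasks(sync_cost=1000, message_cost=1))
```

With a barrier priced at 1000 messages, the master variant only wins from 33
tasks upward in this model; the more expensive the barrier is relative to a
message, the later the break-even point.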

On 5 April 2012 02:51, Praveen Sripati <pr...@gmail.com> wrote:

>
> http://codingwiththomas.blogspot.in/2011/12/k-means-clustering-with-bsp-intuition.html
>
> > Now we are going to broadcast each of this computed averages to the other
> > tasks. Then we are going to sync so all messages can be delivered.
>
> Instead of sending the computed averages to all the tasks, we could send
> them to a master task and let the master task do all the computations and
> send it back to all the nodes. This way we are decreasing the number of
> messages from n*n to 2*n and less computation on the non-master tasks.
>
> Also, can we have a dedicated master task or make one of the bsp task as a
> master task?
>
> Thanks,
> Praveen
>



-- 
Thomas Jungblut
Berlin <th...@gmail.com>