Posted to user@hama.apache.org by Praveen Sripati <pr...@gmail.com> on 2012/04/05 12:37:25 UTC

# of BSP task slots and # of InputSplits

Hi,

If the number of InputSplits is more than the number of BSP task slots
available in the cluster (which is the case in most scenarios), how is
this handled in Hama? Will the tasks run in multiple iterations, storing
the intermediate messages in HDFS? For example, let's say there are 100
InputSplits and 10 BSP slots. Then it would require 10 iterations of 10
BSP tasks to complete the job.

Praveen

Re: # of BSP task slots and # of InputSplits

Posted by Suraj Menon <su...@apache.org>.
> Again, as I mentioned earlier, I don't
> want to get everything fixed in the next release, but would like to get a
> thought process and a discussion going to make Hama better.

Keep them coming. :)

Thanks,
Suraj

On Fri, Apr 6, 2012 at 12:27 AM, Praveen Sripati
<pr...@gmail.com> wrote:

> > So when the BSP job is in progress, if free slots come into the
> > picture, it is too late for the currently running BSP jobs to consider
> > these new slots.
>
> It's clear that, due to the limitations of either the BSP model or the
> Hama framework, the free slots that become available after the job has
> been launched can't be used. For the same reason, I think there will be
> more data locality for the tasks in Hadoop than in Hama.
>
> It would be good to evaluate this when multiple jobs run in a cluster and
> resources get freed as other jobs complete.
>
> > The Hama job designer would already have set the optimum number of
> > tasks for his jobs. This rule applies to the Hadoop job designer as
> > well.
>
> In Hadoop the user sets the number of reducers, while the number of
> mappers is not explicitly set by the user; it is determined by the number
> of InputSplits. Also, the parallelization is determined by the
> limitations of the Hadoop framework, the h/w resources, and how much of
> the cluster capacity has been assigned.
>
> For tera sort (with an HDFS block size of 128 MB), around 8,000 map tasks
> need to be spawned (1 TB / 128 MB per block is roughly 8,000 splits).
> According to the Yahoo tera sort document
> (sortbenchmark.org/YahooHadoop.pdf), the maximum number of tasks spawned
> was around 1500-1600 map and reduce tasks.
>
> > I feel optimizing algorithms on both frameworks is an iterative
> > process.
>
> Couldn't agree more. Since I am new to BSP and Hama, I am trying to
> figure out how things work and why some decisions were made. Again, as I
> mentioned earlier, I don't want to get everything fixed in the next
> release, but would like to get a thought process and a discussion going
> to make Hama better.
>
> On Fri, Apr 6, 2012 at 2:32 AM, Suraj Menon <me...@gmail.com> wrote:
>
> > First, I would like to explain a BSP task's lifecycle. In its first
> > superstep, a set of BSP tasks run in parallel. They are frequently
> > called peers. The count of peers can be specified by the Hama job
> > designer or be decided by the Hama job scheduler based on the number
> > of input splits or the availability of free slots. In the first
> > superstep, all the peers typically would be reading from HDFS, reading
> > from a socket stream, or making up their own inputs for the next
> > superstep. The outputs generated in the first superstep are then given
> > to the intended peers. Every peer then enters the sync barrier. This
> > gives a global synchronization state for all peers, where each one of
> > them can be sure that the destination peers received the messages sent
> > to them. In the subsequent supersteps, every peer reads the messages
> > that were sent to it by other peers in the previous superstep. This
> > repeats till the maximum allowed superstep count specified by the job
> > designer is reached or till the desired results are computed.
> >
> > > But, the number of free slots changes constantly (as other
> > > jobs get completed and started). The free slots could be assigned to
> > > the BSP job in question at a later stage also, similar to an MR job.
> >
> > > Adding additional tasks might make the job complete faster.
> >
> > So this is where we have to understand that the lifecycle of a mapper
> > or a reducer is equivalent to a single superstep of a BSP task. A BSP
> > task stays on the same machine until the job completes, unlike a
> > mapper or reducer task.
> >
> > So when the BSP job is in progress, if free slots come into the
> > picture, it is too late for the currently running BSP jobs to consider
> > these new slots. The Hama job designer would already have set the
> > optimum number of tasks for his jobs. This rule applies to the Hadoop
> > job designer as well. We can't have 1,000,000 mappers and 1,000,000
> > reducers, as explained on their wiki. A careful designer would specify
> > his input partitions and the partitioner for reduce such that they are
> > optimal and won't exhaust the cluster resources. Having more BSP tasks
> > in parallel need not make things faster. I feel optimizing algorithms
> > on both frameworks is an iterative process. In Hama, new free slots
> > would be considered if a job is scheduled after the new slots come
> > into the picture. If some of the jobs don't get scheduled because of
> > the non-availability of slots, this is a good indication that either
> > the current tasks are taking too many slots or your cluster is not big
> > enough to carry out all the jobs in parallel.
> >
> > Hadoop has some very good functionality implemented by some very good
> > engineers, and we do try to get our insights from them, if not reuse
> > their code ;). But the requirements change in certain circumstances
> > when we have to implement the BSP model. We may have to tweak a lot
> > with HDFS too in the future.
> >
> > To add to this, I remember Chiahung mentioning the Ciel project, which
> > schedules tasks dynamically as needed, and I thought it would be cool
> > to have this feature in Hama, especially for real-time computation.
> > Vaguely thinking, we could start tasks with input from checkpointed
> > data. We don't have this feature in our immediate roadmap. We are
> > trying to get some important features out first.
> >
> >
> > -Suraj
> >
>

Re: # of BSP task slots and # of InputSplits

Posted by Praveen Sripati <pr...@gmail.com>.
> So when the BSP job is in progress, if free slots come into the picture,
> it is too late for the currently running BSP jobs to consider these new
> slots.

It's clear that, due to the limitations of either the BSP model or the
Hama framework, the free slots that become available after the job has
been launched can't be used. For the same reason, I think there will be
more data locality for the tasks in Hadoop than in Hama.

It would be good to evaluate this when multiple jobs run in a cluster and
resources get freed as other jobs complete.

> The Hama job designer would already have set the optimum number of tasks
> for his jobs. This rule applies to the Hadoop job designer as well.

In Hadoop the user sets the number of reducers, while the number of
mappers is not explicitly set by the user; it is determined by the number
of InputSplits. Also, the parallelization is determined by the limitations
of the Hadoop framework, the h/w resources, and how much of the cluster
capacity has been assigned.

For tera sort (with an HDFS block size of 128 MB), around 8,000 map tasks
need to be spawned (1 TB / 128 MB per block is roughly 8,000 splits).
According to the Yahoo tera sort document
(sortbenchmark.org/YahooHadoop.pdf), the maximum number of tasks spawned
was around 1500-1600 map and reduce tasks.

> I feel optimizing algorithms on both frameworks is an iterative process.

Couldn't agree more. Since I am new to BSP and Hama, I am trying to figure
out how things work and why some decisions were made. Again, as I
mentioned earlier, I don't want to get everything fixed in the next
release, but would like to get a thought process and a discussion going to
make Hama better.

On Fri, Apr 6, 2012 at 2:32 AM, Suraj Menon <me...@gmail.com> wrote:

> First, I would like to explain a BSP task's lifecycle. In its first
> superstep, a set of BSP tasks run in parallel. They are frequently called
> peers. The count of peers can be specified by the Hama job designer or be
> decided by the Hama job scheduler based on the number of input splits or
> the availability of free slots. In the first superstep, all the peers
> typically would be reading from HDFS, reading from a socket stream, or
> making up their own inputs for the next superstep. The outputs generated
> in the first superstep are then given to the intended peers. Every peer
> then enters the sync barrier. This gives a global synchronization state
> for all peers, where each one of them can be sure that the destination
> peers received the messages sent to them. In the subsequent supersteps,
> every peer reads the messages that were sent to it by other peers in the
> previous superstep. This repeats till the maximum allowed superstep count
> specified by the job designer is reached or till the desired results are
> computed.
>
> > But, the number of free slots changes constantly (as other
> > jobs get completed and started). The free slots could be assigned to
> > the BSP job in question at a later stage also, similar to an MR job.
>
> > Adding additional tasks might make the job complete faster.
>
> So this is where we have to understand that the lifecycle of a mapper or
> a reducer is equivalent to a single superstep of a BSP task. A BSP task
> stays on the same machine until the job completes, unlike a mapper or
> reducer task.
>
> So when the BSP job is in progress, if free slots come into the picture,
> it is too late for the currently running BSP jobs to consider these new
> slots. The Hama job designer would already have set the optimum number of
> tasks for his jobs. This rule applies to the Hadoop job designer as well.
> We can't have 1,000,000 mappers and 1,000,000 reducers, as explained on
> their wiki. A careful designer would specify his input partitions and the
> partitioner for reduce such that they are optimal and won't exhaust the
> cluster resources. Having more BSP tasks in parallel need not make things
> faster. I feel optimizing algorithms on both frameworks is an iterative
> process. In Hama, new free slots would be considered if a job is
> scheduled after the new slots come into the picture. If some of the jobs
> don't get scheduled because of the non-availability of slots, this is a
> good indication that either the current tasks are taking too many slots
> or your cluster is not big enough to carry out all the jobs in parallel.
>
> Hadoop has some very good functionality implemented by some very good
> engineers, and we do try to get our insights from them, if not reuse
> their code ;). But the requirements change in certain circumstances when
> we have to implement the BSP model. We may have to tweak a lot with HDFS
> too in the future.
>
> To add to this, I remember Chiahung mentioning the Ciel project, which
> schedules tasks dynamically as needed, and I thought it would be cool to
> have this feature in Hama, especially for real-time computation. Vaguely
> thinking, we could start tasks with input from checkpointed data. We
> don't have this feature in our immediate roadmap. We are trying to get
> some important features out first.
>
>
> -Suraj
>

Re: # of BSP task slots and # of InputSplits

Posted by Suraj Menon <me...@gmail.com>.
First, I would like to explain a BSP task's lifecycle. In its first
superstep, a set of BSP tasks run in parallel. They are frequently called
peers. The count of peers can be specified by the Hama job designer or be
decided by the Hama job scheduler based on the number of input splits or
the availability of free slots. In the first superstep, all the peers
typically would be reading from HDFS, reading from a socket stream, or
making up their own inputs for the next superstep. The outputs generated
in the first superstep are then given to the intended peers. Every peer
then enters the sync barrier. This gives a global synchronization state
for all peers, where each one of them can be sure that the destination
peers received the messages sent to them. In the subsequent supersteps,
every peer reads the messages that were sent to it by other peers in the
previous superstep. This repeats till the maximum allowed superstep count
specified by the job designer is reached or till the desired results are
computed.
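
To make this concrete, below is a minimal sketch of that lifecycle using
Hama's Java BSP API. Treat it as an illustration rather than code from
this thread: the class name ExamplePeer, the Text-everywhere types, and
the broadcast pattern are all made up for the example.

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hama.bsp.BSP;
    import org.apache.hama.bsp.BSPPeer;
    import org.apache.hama.bsp.sync.SyncException;

    // Type parameters: input key/value, output key/value, message type.
    public class ExamplePeer extends BSP<LongWritable, Text, Text, Text, Text> {
      @Override
      public void bsp(BSPPeer<LongWritable, Text, Text, Text, Text> peer)
          throws IOException, SyncException, InterruptedException {
        // Superstep 1: each peer reads its own input split and sends
        // messages to the intended peers (here: broadcast to everyone).
        LongWritable key = new LongWritable();
        Text value = new Text();
        while (peer.readNext(key, value)) {
          for (String other : peer.getAllPeerNames()) {
            peer.send(other, new Text(value));
          }
        }
        // The sync barrier: when sync() returns, every peer knows the
        // messages it sent have been delivered to their destinations.
        peer.sync();

        // Superstep 2: consume the messages sent in the previous
        // superstep; real jobs loop send/sync/consume until done.
        Text msg;
        while ((msg = peer.getCurrentMessage()) != null) {
          // process msg ...
        }
      }
    }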

> But, the number of free slots changes constantly (as other
> jobs get completed and started). The free slots could be assigned to the
> BSP job in question at a later stage also, similar to an MR job.

> Adding additional tasks might make the job complete faster.

So this is where we have to understand that the lifecycle of a mapper or a
reducer is equivalent to a single superstep of a BSP task. A BSP task
stays on the same machine until the job completes, unlike a mapper or
reducer task.

So when the BSP job is in progress, if free slots come into the picture,
it is too late for the currently running BSP jobs to consider these new
slots. The Hama job designer would already have set the optimum number of
tasks for his jobs. This rule applies to the Hadoop job designer as well.
We can't have 1,000,000 mappers and 1,000,000 reducers, as explained on
their wiki. A careful designer would specify his input partitions and the
partitioner for reduce such that they are optimal and won't exhaust the
cluster resources. Having more BSP tasks in parallel need not make things
faster. I feel optimizing algorithms on both frameworks is an iterative
process. In Hama, new free slots would be considered if a job is scheduled
after the new slots come into the picture. If some of the jobs don't get
scheduled because of the non-availability of slots, this is a good
indication that either the current tasks are taking too many slots or your
cluster is not big enough to carry out all the jobs in parallel.
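
To put that in code: the task count is fixed on the BSPJob at submission
time. A rough sketch of the usual idiom (ExamplePeer is the hypothetical
class from the sketch above; the client calls are Hama's 0.x API as I
remember it):

    import org.apache.hama.HamaConfiguration;
    import org.apache.hama.bsp.BSPJob;
    import org.apache.hama.bsp.BSPJobClient;
    import org.apache.hama.bsp.ClusterStatus;

    public class Submit {
      public static void main(String[] args) throws Exception {
        HamaConfiguration conf = new HamaConfiguration();
        BSPJob job = new BSPJob(conf, ExamplePeer.class);
        job.setJobName("lifecycle example");
        job.setBspClass(ExamplePeer.class);

        // Ask the BSPMaster how many task slots the cluster has right
        // now. Whatever number is set here stays fixed for the lifetime
        // of the job; slots freed later are never added to a running job.
        BSPJobClient client = new BSPJobClient(conf);
        ClusterStatus cluster = client.getClusterStatus(true);
        job.setNumBspTask(cluster.getMaxTasks());

        job.waitForCompletion(true);
      }
    }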

Hadoop has some very good functionality implemented by some very good
engineers, and we do try to get our insights from them, if not reuse their
code ;). But the requirements change in certain circumstances when we have
to implement the BSP model. We may have to tweak a lot with HDFS too in
the future.

To add to this, I remember Chiahung mentioning the Ciel project, which
schedules tasks dynamically as needed, and I thought it would be cool to
have this feature in Hama, especially for real-time computation. Vaguely
thinking, we could start tasks with input from checkpointed data. We don't
have this feature in our immediate roadmap. We are trying to get some
important features out first.


-Suraj

Re: # of BSP task slots and # of InputSplits

Posted by Praveen Sripati <pr...@gmail.com>.
> How often do the slots in your Hadoop cluster change?

The total # of slots (free and busy) changes only when more nodes are
added to the cluster. But the number of free slots changes constantly (as
other jobs get completed and started). The free slots could be assigned to
the BSP job in question at a later stage also, similar to an MR job.

> I don't think the BSP model needs to use additional tasks,

Adding additional tasks might make the job complete faster.

> This is really not how BSP works.

Let's define it :)

Praveen

On Thu, Apr 5, 2012 at 8:59 PM, Thomas Jungblut <
thomas.jungblut@googlemail.com> wrote:

> How often do the slots in your Hadoop cluster change? In mine they change
> once every 2 months, when I get a hardware upgrade or a failure occurs.
> I don't think the BSP model needs to use additional tasks, and it is not
> helpful to schedule just a chunk of the tasks at a time.
> This is really not how BSP works.
>
On 5 April 2012 at 16:36, Praveen Sripati <pr...@gmail.com> wrote:
>
> > So, if more slots become available after the initial 10 slots, then
> > Hama can't use them because the assignment has already been done. That
> > looks like an inefficient use of the cluster. Hadoop is able to use the
> > additional slots effectively.
> >
> > It would be nice to evaluate the Hadoop way for Hama as well.
> >
> > Praveen
> >
> > On Thu, Apr 5, 2012 at 4:42 PM, Thomas Jungblut <
> > thomas.jungblut@googlemail.com> wrote:
> >
> > > No, this kind of execution is so Hadoop-y.
> > > It will use the maximum number of slots (10 in your case) and try to
> > > fit the input to the 10 slots, e.g. by assigning multiple
> > > files/blocks to a single task.
> > >
> > > On 5 April 2012 at 12:37, Praveen Sripati
> > > <praveensripati@gmail.com> wrote:
> > >
> > > > Hi,
> > > >
> > > > If the number of InputSplits is more than the number of BSP task
> > > > slots available in the cluster (which is the case in most
> > > > scenarios), how is this handled in Hama? Will the tasks run in
> > > > multiple iterations, storing the intermediate messages in HDFS? For
> > > > example, let's say there are 100 InputSplits and 10 BSP slots. Then
> > > > it would require 10 iterations of 10 BSP tasks to complete the job.
> > > >
> > > > Praveen
> > > >
> > >
> > >
> > >
> > > --
> > > Thomas Jungblut
> > > Berlin <th...@gmail.com>
> > >
> >
>
>
>
> --
> Thomas Jungblut
> Berlin <th...@gmail.com>
>

Re: # of BSP task slots and # of InputSplits

Posted by Thomas Jungblut <th...@googlemail.com>.
How often do the slots in your Hadoop cluster change? In mine they change
once every 2 months, when I get a hardware upgrade or a failure occurs.
I don't think the BSP model needs to use additional tasks, and it is not
helpful to schedule just a chunk of the tasks at a time.
This is really not how BSP works.

On 5 April 2012 at 16:36, Praveen Sripati <pr...@gmail.com> wrote:

> So, if more slots become available after the initial 10 slots, then Hama
> can't use them because the assignment has already been done. That looks
> like an inefficient use of the cluster. Hadoop is able to use the
> additional slots effectively.
>
> It would be nice to evaluate the Hadoop way for Hama as well.
>
> Praveen
>
> On Thu, Apr 5, 2012 at 4:42 PM, Thomas Jungblut <
> thomas.jungblut@googlemail.com> wrote:
>
> > No, this kind of execution is so Hadoop-y.
> > It will use the maximum number of slots (10 in your case) and try to
> > fit the input to the 10 slots, e.g. by assigning multiple files/blocks
> > to a single task.
> >
> > On 5 April 2012 at 12:37, Praveen Sripati
> > <praveensripati@gmail.com> wrote:
> >
> > > Hi,
> > >
> > > If the number of InputSplits is more than the number of BSP task
> > > slots available in the cluster (which is the case in most scenarios),
> > > how is this handled in Hama? Will the tasks run in multiple
> > > iterations, storing the intermediate messages in HDFS? For example,
> > > let's say there are 100 InputSplits and 10 BSP slots. Then it would
> > > require 10 iterations of 10 BSP tasks to complete the job.
> > >
> > > Praveen
> > >
> >
> >
> >
> > --
> > Thomas Jungblut
> > Berlin <th...@gmail.com>
> >
>



-- 
Thomas Jungblut
Berlin <th...@gmail.com>

Re: # of BSP task slots and # of InputSplits

Posted by Praveen Sripati <pr...@gmail.com>.
So, if more slots become available after the initial 10 slots, then Hama
can't use them because the assignment has already been done. That looks
like an inefficient use of the cluster. Hadoop is able to use the
additional slots effectively.

It would be nice to evaluate the Hadoop way for Hama as well.

Praveen

On Thu, Apr 5, 2012 at 4:42 PM, Thomas Jungblut <
thomas.jungblut@googlemail.com> wrote:

> No, this kind of execution is so Hadoop-y.
> It will use the maximum number of slots (10 in your case) and try to fit
> the input to the 10 slots, e.g. by assigning multiple files/blocks to a
> single task.
>
On 5 April 2012 at 12:37, Praveen Sripati <pr...@gmail.com> wrote:
>
> > Hi,
> >
> > If the number of InputSplits is more than the number of BSP task slots
> > available in the cluster (which is the case in most scenarios), how is
> > this handled in Hama? Will the tasks run in multiple iterations,
> > storing the intermediate messages in HDFS? For example, let's say there
> > are 100 InputSplits and 10 BSP slots. Then it would require 10
> > iterations of 10 BSP tasks to complete the job.
> >
> > Praveen
> >
>
>
>
> --
> Thomas Jungblut
> Berlin <th...@gmail.com>
>

Re: # of BSP task slots and # of InputSplits

Posted by Thomas Jungblut <th...@googlemail.com>.
No, this kind of execution is so Hadoop-y.
It will use the maximum number of slots (10 in your case) and try to fit
the input to the 10 slots, e.g. by assigning multiple files/blocks to a
single task.
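
If it helps to visualize, the effect is roughly the following. This is a
toy sketch of the idea, not Hama's actual scheduler code: the 100 splits
are dealt out across the 10 tasks, so each task works through its share of
splits in one run of the job instead of the framework launching 10 waves
of 10 tasks.

    import java.util.ArrayList;
    import java.util.List;

    public class SplitPacking {
      // Deal numSplits input splits across numSlots tasks round-robin.
      static List<List<Integer>> assign(int numSplits, int numSlots) {
        List<List<Integer>> perTask = new ArrayList<>();
        for (int t = 0; t < numSlots; t++) {
          perTask.add(new ArrayList<>());
        }
        for (int s = 0; s < numSplits; s++) {
          perTask.get(s % numSlots).add(s);
        }
        return perTask;
      }

      public static void main(String[] args) {
        // 100 splits over 10 slots: each task handles 10 splits.
        List<List<Integer>> plan = assign(100, 10);
        System.out.println("task 0 handles splits " + plan.get(0));
      }
    }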

On 5 April 2012 at 12:37, Praveen Sripati <pr...@gmail.com> wrote:

> Hi,
>
> If the number of InputSplits is more than the number of BSP task slots
> available in the cluster (which is the case in most scenarios), how is
> this handled in Hama? Will the tasks run in multiple iterations, storing
> the intermediate messages in HDFS? For example, let's say there are 100
> InputSplits and 10 BSP slots. Then it would require 10 iterations of 10
> BSP tasks to complete the job.
>
> Praveen
>



-- 
Thomas Jungblut
Berlin <th...@gmail.com>