Posted to dev@hama.apache.org by "Thomas Jungblut (JIRA)" <ji...@apache.org> on 2012/10/19 20:07:11 UTC

[jira] [Comment Edited] (HAMA-613) Scheduler kills job too silently when out of slots

    [ https://issues.apache.org/jira/browse/HAMA-613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13480189#comment-13480189 ] 

Thomas Jungblut edited comment on HAMA-613 at 10/19/12 6:05 PM:
----------------------------------------------------------------

There is a small bug.

bq. if (maxTasks < job.getNumBspTask()) {

Say a job has 10 tasks and maxTasks (the maximum number of tasks in the cluster minus the number of running tasks) is 10 as well, for example when no job is running and we have 10 slots. This will fail.

Proposing to change this to > instead of lower than.

A more descriptive variable name would also be better suited:

{noformat}
    int availableSlots = clusterStatus.getMaxTasks() - clusterStatus.getTasks();
    if (availableSlots > job.getNumBspTask()) {
      LOG.error("Job failed! No more task slots available");
      System.exit(-1);
    }
{noformat}
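
To sanity-check the comparison at the boundary, here is a minimal standalone sketch. It is not Hama code: the class and method names are hypothetical, and it only assumes that each check returns true when the job would be rejected. It evaluates the three candidate operators for the 10 slots / 10 tasks case described above, the 3 slots / 4 tasks case from the issue description, and a case with spare slots.

```java
public class SlotCheckDemo {

    // Hypothetical helper: reject when fewer slots are free than tasks requested.
    static boolean rejectsWithLessThan(int availableSlots, int requestedTasks) {
        return availableSlots < requestedTasks;
    }

    // Hypothetical helper: reject when free slots do not exceed the requested tasks.
    static boolean rejectsWithLessOrEqual(int availableSlots, int requestedTasks) {
        return availableSlots <= requestedTasks;
    }

    // Hypothetical helper: reject when more slots are free than tasks requested.
    static boolean rejectsWithGreaterThan(int availableSlots, int requestedTasks) {
        return availableSlots > requestedTasks;
    }

    public static void main(String[] args) {
        // Each pair is {availableSlots, requestedTasks}.
        int[][] cases = { {10, 10}, {3, 4}, {10, 4} };
        for (int[] c : cases) {
            System.out.printf("slots=%d tasks=%d  <:%b  <=:%b  >:%b%n",
                c[0], c[1],
                rejectsWithLessThan(c[0], c[1]),
                rejectsWithLessOrEqual(c[0], c[1]),
                rejectsWithGreaterThan(c[0], c[1]));
        }
    }
}
```

Running main prints one line per case, so the behaviour of each operator can be compared directly for the exact-fit case and for the over-subscribed case before settling on the condition in the patch.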

                
      was (Author: thomas.jungblut):
    There is a small bug.

bq. if (maxTasks < job.getNumBspTask()) {

Say a job has 10 tasks and maxTasks (the maximum number of tasks in the cluster minus the number of running tasks) is 10 as well, for example when no job is running and we have 10 slots. This will fail.

Proposing to change this to <= instead of lower than.
                  
> Scheduler kills job too silently when out of slots
> --------------------------------------------------
>
>                 Key: HAMA-613
>                 URL: https://issues.apache.org/jira/browse/HAMA-613
>             Project: Hama
>          Issue Type: Bug
>          Components: bsp core
>    Affects Versions: 0.5.0
>            Reporter: Thomas Jungblut
>            Assignee: Yuesheng Hu
>            Priority: Blocker
>             Fix For: 0.6.0
>
>         Attachments: HAMA-613.patch
>
>
> If, for example, a user submits two text files as input, they will sometimes be split into 4 chunks.
> This usually exceeds the number of tasks that are available in the cluster (an out-of-the-box installation has just 3 tasks configured).
> Two main questions come to mind:
> -Why are two text files split into 4 tasks if the BSPJobClient should check whether this exceeds the number of available task slots?
> -Why does the client schedule the job if it knows that there are not enough slots available?
> Of course this should yield a less cryptic error message. Actually, there is currently no error message at all, which constantly confuses users.
> This is a blocker for 0.6.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira