Posted to issues@airavata.apache.org by "Eroma (Jira)" <ji...@apache.org> on 2020/03/26 18:45:00 UTC

[jira] [Updated] (AIRAVATA-2941) Experiments fail to submit jobs to HPC cluster queues due to queue reaching the max job limit per user.

     [ https://issues.apache.org/jira/browse/AIRAVATA-2941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eroma updated AIRAVATA-2941:
----------------------------
    Labels: gsoc2020  (was: )

> Experiments fail to submit jobs to HPC cluster queues due to queue reaching the max job limit per user.
> -------------------------------------------------------------------------------------------------------
>
>                 Key: AIRAVATA-2941
>                 URL: https://issues.apache.org/jira/browse/AIRAVATA-2941
>             Project: Airavata
>          Issue Type: Bug
>          Components: GFac, helix implementation
>    Affects Versions: 0.18
>         Environment: https://staging.ultrascan.scigap.org & https://ultrascan.scigap.org/ 
>            Reporter: Eroma
>            Assignee: Shameera
>            Priority: Major
>              Labels: gsoc2020
>             Fix For: 0.18
>
>
> Currently, experiments fail in the following sequence:
>  # The HPC queue reaches its maximum job limit.
>  # The job submission is rejected, the HPC cluster returns a job submission response [1], and Airavata tags the experiment as FAILED.
>  # The only option for the gateway user is to submit the experiment again.
> The required fix is for Airavata to have internal queues, or some other way to hold such experiments until the HPC queue can accept jobs again, instead of marking the experiment as FAILED.
>  
> [1]
> This example is from Stampede2:
> -----------------------------------------------------------------
> Welcome to the Stampede2 Supercomputer
> -----------------------------------------------------------------
> No reservation for this job
> --> Verifying valid submit host (login3)...OK
> --> Verifying valid jobname...OK
> --> Enforcing max jobs per user...FAILED
> [*] Too many simultaneous jobs in queue.
> --> Max job limits for us3 = 50 jobs
>  
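
The requested behaviour (hold the experiment instead of failing it when the queue is full) could be sketched roughly as follows. This is only an illustration, not Airavata code: the names classify_response, SubmitOutcome and HoldQueue, the "HELD" state, and the submit callable are all hypothetical, and the match patterns are taken solely from the Stampede2 response quoted above, so other clusters would need their own patterns.

```python
import re
from collections import deque
from enum import Enum


class SubmitOutcome(Enum):
    SUBMITTED = "submitted"
    QUEUE_FULL = "queue_full"   # rejected only because the job limit was hit
    FAILED = "failed"           # rejected for any other reason


# Assumption: these patterns come from the Stampede2 response in [1].
QUEUE_FULL_PATTERNS = [
    re.compile(r"Too many simultaneous jobs in queue", re.IGNORECASE),
    re.compile(r"Enforcing max jobs per user\.\.\.FAILED", re.IGNORECASE),
]


def classify_response(response: str, accepted: bool) -> SubmitOutcome:
    """Separate a 'queue is full' rejection from a genuine failure."""
    if accepted:
        return SubmitOutcome.SUBMITTED
    if any(p.search(response) for p in QUEUE_FULL_PATTERNS):
        return SubmitOutcome.QUEUE_FULL
    return SubmitOutcome.FAILED


class HoldQueue:
    """Hypothetical internal queue that holds experiments until the HPC queue has room."""

    def __init__(self, submit):
        # submit(experiment_id) -> (accepted: bool, response: str); a stand-in
        # for whatever actually submits the job to the cluster.
        self._submit = submit
        self._held = deque()
        self.states = {}

    def _attempt(self, experiment_id) -> SubmitOutcome:
        accepted, response = self._submit(experiment_id)
        return classify_response(response, accepted)

    def launch(self, experiment_id) -> SubmitOutcome:
        outcome = self._attempt(experiment_id)
        if outcome is SubmitOutcome.QUEUE_FULL:
            # Hold the experiment instead of tagging it FAILED.
            self._held.append(experiment_id)
            self.states[experiment_id] = "HELD"
        elif outcome is SubmitOutcome.SUBMITTED:
            self.states[experiment_id] = "SUBMITTED"
        else:
            self.states[experiment_id] = "FAILED"
        return outcome

    def retry_held(self) -> None:
        """Retry held experiments in FIFO order; stop once the queue is full again."""
        while self._held:
            experiment_id = self._held[0]
            outcome = self._attempt(experiment_id)
            if outcome is SubmitOutcome.QUEUE_FULL:
                break
            self._held.popleft()
            self.states[experiment_id] = (
                "SUBMITTED" if outcome is SubmitOutcome.SUBMITTED else "FAILED"
            )
```

In such a design, retry_held() would be called periodically or whenever one of the user's jobs completes. Retrying in FIFO order and stopping at the first rejection preserves submission order and avoids repeatedly hitting a login node that is already refusing jobs.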



--
This message was sent by Atlassian Jira
(v8.3.4#803005)