Posted to issues@airavata.apache.org by "Eroma (JIRA)" <ji...@apache.org> on 2017/05/16 19:20:04 UTC

[jira] [Updated] (AIRAVATA-2378) Jobs failing at execution of squeue command due to response of 'Invalid job ID'

     [ https://issues.apache.org/jira/browse/AIRAVATA-2378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eroma updated AIRAVATA-2378:
----------------------------
    Issue Type: Sub-task  (was: Bug)
        Parent: AIRAVATA-2386

> Jobs failing at execution of squeue command due to response of 'Invalid job ID'
> -------------------------------------------------------------------------------
>
>                 Key: AIRAVATA-2378
>                 URL: https://issues.apache.org/jira/browse/AIRAVATA-2378
>             Project: Airavata
>          Issue Type: Sub-task
>          Components: Airavata System, PGA PHP Web Gateway
>    Affects Versions: 0.17
>         Environment: https://ultrascan.scigap.org/
>            Reporter: Eroma
>            Assignee: Suresh Marru
>             Fix For: 0.17
>
>
> When a job is submitted and the cluster returns a job ID, gfac executes the squeue command. When squeue returns the queued job details, gfac reports the gateway user details to the XSEDE machines and adds the job ID to the monitoring map.
> Intermittently, the SSH session validation takes longer than usual after the job submission, so by the time the squeue command is executed the job is no longer in the queue and squeue returns an error [1].
> [1]
> 2017-05-02 06:27:48,047 [pool-7-thread-15] ERROR o.a.a.g.i.t.DefaultJobSubmissionTask process_id=PROCESS_c7e404ed-0822-404a-8f04-6b09e9ba8ece, token_id=75918c63-30fd-4548-a8d3-7f3a41b185ae, experiment_id=US3-AIRA_740b0ad6-62c4-42dc-9eed-f12b92a6b98b, gateway_id=Ultrascan_Production - Error occurred while submitting the job
> org.apache.airavata.gfac.core.GFacException: Error running command squeue -j 9119082  on remote cluster. StandardError: slurm_load_jobs error: Invalid job id specified
>         at org.apache.airavata.gfac.impl.HPCRemoteCluster.throwExceptionOnError(HPCRemoteCluster.java:298)
>         at org.apache.airavata.gfac.impl.HPCRemoteCluster.getJobStatus(HPCRemoteCluster.java:233)
>         at org.apache.airavata.gfac.impl.task.DefaultJobSubmissionTask.verifyJobSubmissionByJobId(DefaultJobSubmissionTask.java:302)
>         at org.apache.airavata.gfac.impl.task.DefaultJobSubmissionTask.execute(DefaultJobSubmissionTask.java:157)
>         at org.apache.airavata.gfac.impl.GFacEngineImpl.executeTask(GFacEngineImpl.java:814)
>         at org.apache.airavata.gfac.impl.GFacEngineImpl.executeJobSubmission(GFacEngineImpl.java:510)
>         at org.apache.airavata.gfac.impl.GFacEngineImpl.executeTaskListFrom(GFacEngineImpl.java:386)
>         at org.apache.airavata.gfac.impl.GFacEngineImpl.executeProcess(GFacEngineImpl.java:286)
>         at org.apache.airavata.gfac.impl.GFacWorker.executeProcess(GFacWorker.java:227)
>         at org.apache.airavata.gfac.impl.GFacWorker.run(GFacWorker.java:86)
>         at org.apache.airavata.common.logging.MDCUtil.lambda$wrapWithMDC$0(MDCUtil.java:40)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
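
One way to picture the race described above is to separate "the job has already left the queue" from a genuine squeue failure. The sketch below is only an illustration and is not taken from the Airavata code base: the enum SqueueOutcome and its classify method are hypothetical names. It assumes the caller has the exit code, standard output, and standard error of the squeue call, and that standard output carries no header line (for example because squeue was run with --noheader). The only string taken from the report is the "Invalid job id specified" message in the log above.

```java
// Hypothetical sketch, not Airavata code: classify the result of `squeue -j <jobId>`.
enum SqueueOutcome {
    IN_QUEUE,      // squeue listed the job
    NOT_IN_QUEUE,  // Slurm no longer knows the job; it may already have finished
    FAILED;        // any other error, which should still be reported

    // Message text copied from the StandardError in the log above.
    static final String INVALID_JOB_ID = "Invalid job id specified";

    // Assumption: stdout has no header line (e.g. squeue was run with --noheader).
    static SqueueOutcome classify(int exitCode, String stdout, String stderr) {
        if (stderr != null && stderr.contains(INVALID_JOB_ID)) {
            // The job ID was valid at submission time but has left the queue.
            return NOT_IN_QUEUE;
        }
        if (exitCode == 0) {
            boolean listed = stdout != null && !stdout.trim().isEmpty();
            return listed ? IN_QUEUE : NOT_IN_QUEUE;
        }
        return FAILED;
    }
}
```

With a classification like this, a caller could treat NOT_IN_QUEUE after a successful submission as a reason to check the job's final state by other means, instead of failing the whole submission task. Whether that is the right behaviour for gfac is a design decision this sketch does not settle.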



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)