Posted to issues@airavata.apache.org by "Dimuthu Upeksha (JIRA)" <ji...@apache.org> on 2018/05/08 20:32:00 UTC

[jira] [Resolved] (AIRAVATA-2736) Job submitted and running in HPC while the experiment is tagged as FAILED

     [ https://issues.apache.org/jira/browse/AIRAVATA-2736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dimuthu Upeksha resolved AIRAVATA-2736.
---------------------------------------
    Resolution: Fixed

> Job submitted and running in HPC while the experiment is tagged as FAILED
> -------------------------------------------------------------------------
>
>                 Key: AIRAVATA-2736
>                 URL: https://issues.apache.org/jira/browse/AIRAVATA-2736
>             Project: Airavata
>          Issue Type: Bug
>          Components: helix implementation
>    Affects Versions: 0.18
>         Environment: http://149.165.168.248:8008/ - Helix test env
>            Reporter: Eroma
>            Assignee: Dimuthu Upeksha
>            Priority: Major
>             Fix For: 0.18
>
>
> # Submitted an experiment, which then submitted the job.
>  # The job ID is returned and the status is ACTIVE.
>  # Due to a ZooKeeper connection issue, the experiment is marked FAILED.
>  # The job is still running on the HPC system.
>  # Airavata is not waiting for job monitoring because the task status was not updated in ZooKeeper.
>  # Error in the log: see [1].
>  # Experiment ID: SLM001-AmberSander-BR2_5ed5a19f-ab44-4eba-afb7-1feafaf0bbdd
> [1]
> org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /monitoring/2159926/lock
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>     at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:778)
>     at org.apache.curator.framework.imps.CreateBuilderImpl$11.call(CreateBuilderImpl.java:696)
>     at org.apache.curator.framework.imps.CreateBuilderImpl$11.call(CreateBuilderImpl.java:679)
>     at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107)
>     at org.apache.curator.framework.imps.CreateBuilderImpl.pathInForeground(CreateBuilderImpl.java:676)
>     at org.apache.curator.framework.imps.CreateBuilderImpl.protectedPathInForeground(CreateBuilderImpl.java:453)
>     at org.apache.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:443)
>     at org.apache.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:44)
>     at org.apache.airavata.helix.impl.task.submission.JobSubmissionTask.createMonitoringNode(JobSubmissionTask.java:83)
>     at org.apache.airavata.helix.impl.task.submission.DefaultJobSubmissionTask.onRun(DefaultJobSubmissionTask.java:144)
>     at org.apache.airavata.helix.impl.task.AiravataTask.onRun(AiravataTask.java:264)
>     at org.apache.airavata.helix.core.AbstractTask.run(AbstractTask.java:74)
>     at org.apache.helix.task.TaskRunner.run(TaskRunner.java:70)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:748)
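
The trace above shows the ConnectionLoss surfacing from createMonitoringNode, a step that runs after the job has already been handed to the scheduler. As an illustration only, and not the fix that was applied for this issue, the following is a minimal sketch of retrying such a post-submission step on a transient connection error instead of failing immediately. The class MonitoringNodeRetry, the exception TransientConnectionException, and the helper callWithRetry are hypothetical names introduced here; they are not Airavata, Curator, or ZooKeeper APIs, and the simulated failure merely stands in for a real ConnectionLoss.

```java
import java.util.concurrent.Callable;

public class MonitoringNodeRetry {

    /** Hypothetical stand-in for a transient coordination-service error such as ConnectionLoss. */
    public static class TransientConnectionException extends Exception {
        public TransientConnectionException(String message) {
            super(message);
        }
    }

    /**
     * Calls the action up to maxAttempts times. Only TransientConnectionException
     * triggers a retry; any other exception propagates immediately. The wait
     * between attempts grows linearly (baseDelayMillis * attempt number).
     */
    public static <T> T callWithRetry(Callable<T> action, int maxAttempts, long baseDelayMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        TransientConnectionException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (TransientConnectionException e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(baseDelayMillis * attempt);
                }
            }
        }
        // All attempts failed with a transient error; surface the last one.
        throw last;
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated step: fails twice with a transient error, then succeeds.
        String path = callWithRetry(() -> {
            if (++calls[0] < 3) {
                throw new TransientConnectionException("ConnectionLoss (simulated)");
            }
            return "/monitoring/2159926/lock";
        }, 5, 10L);
        System.out.println("created " + path + " after " + calls[0] + " attempts");
    }
}
```

Whether a retry of this kind is appropriate depends on how the real task reports status. The point of the sketch is only that a transient failure after a successful submission can be handled separately from the submission itself, so that a job which is already running is not left unmonitored.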



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)