Posted to users@airavata.apache.org by Jarett DeAngelis <ja...@bioteam.net> on 2017/06/23 20:28:48 UTC

Job stuck in "launched," "submitted" status

Hi gang,

Working on our Airavata deployment (still build 16) again and have encountered an issue where after submitting a job to Slurm, it gets stuck in the “LAUNCHED” state, appearing to have sent the job to Slurm because it says “SUBMITTED” underneath, but it just stays that way forever. If you look at RabbitMQ there is a message sitting in the queue. Our first thought was that it was the email account we’re using for job tracking, but that is functioning fine. Where should I be looking for answers?

Thanks,
Jarett

Re: Job stuck in "launched," "submitted" status

Posted by Jarett DeAngelis <ja...@bioteam.net>.
As I said, now I have one job in LAUNCHED and another in EXECUTING. Slurm job IDs are getting created but there appears to be no “callback” when they’re finished.

One odd thing that’s happening can be seen in the log.

2017-06-26 12:50:20,901 [Thread-21] ERROR org.apache.airavata.gfac.monitor.email.EmailBasedMonitor  - [EJM]: Error parsing email message =====================================>
org.apache.airavata.common.exception.AiravataException: [EJM]: Couldn't identify Resource job manager type from address Google <no...@accounts.google.com>
	at org.apache.airavata.gfac.monitor.email.EmailBasedMonitor.getJobMonitorType(EmailBasedMonitor.java:181)
	at org.apache.airavata.gfac.monitor.email.EmailBasedMonitor.parse(EmailBasedMonitor.java:165)
	at org.apache.airavata.gfac.monitor.email.EmailBasedMonitor.processMessages(EmailBasedMonitor.java:265)
	at org.apache.airavata.gfac.monitor.email.EmailBasedMonitor.run(EmailBasedMonitor.java:234)
	at java.lang.Thread.run(Thread.java:745)
2017-06-26 12:50:20,903 [Thread-21] ERROR org.apache.airavata.gfac.monitor.email.EmailBasedMonitor  - FROM: Google <no...@accounts.google.com>
2017-06-26 12:50:20,903 [Thread-21] ERROR org.apache.airavata.gfac.monitor.email.EmailBasedMonitor  - TO: ars.scinet@gmail.com
2017-06-26 12:50:20,903 [Thread-21] ERROR org.apache.airavata.gfac.monitor.email.EmailBasedMonitor  - SUBJECT: Someone has your password
2017-06-26 12:50:36,459 [Thread-21] INFO  org.apache.airavata.gfac.monitor.email.EmailBasedMonitor  - [EJM]: 1 job/s in job monitor map
2017-06-26 12:50:36,756 [Thread-21] INFO  org.apache.airavata.gfac.monitor.email.EmailBasedMonitor  - [EJM]: Retrieving unseen emails
2017-06-26 12:50:37,383 [Thread-21] INFO  org.apache.airavata.gfac.monitor.email.EmailBasedMonitor  - [EJM]: 104 new email/s received
2017-06-26 12:51:06,350 [Thread-21] ERROR org.apache.airavata.gfac.monitor.email.EmailBasedMonitor  - [EJM]: Error parsing email message =====================================>
org.apache.airavata.common.exception.AiravataException: [EJM]: Couldn't identify Resource job manager type from address Google <no...@accounts.google.com>
	at org.apache.airavata.gfac.monitor.email.EmailBasedMonitor.getJobMonitorType(EmailBasedMonitor.java:181)
	at org.apache.airavata.gfac.monitor.email.EmailBasedMonitor.parse(EmailBasedMonitor.java:165)
	at org.apache.airavata.gfac.monitor.email.EmailBasedMonitor.processMessages(EmailBasedMonitor.java:265)
	at org.apache.airavata.gfac.monitor.email.EmailBasedMonitor.run(EmailBasedMonitor.java:234)
	at java.lang.Thread.run(Thread.java:745)
2017-06-26 12:51:06,350 [Thread-21] ERROR org.apache.airavata.gfac.monitor.email.EmailBasedMonitor  - FROM: Google <no...@accounts.google.com>
2017-06-26 12:51:06,350 [Thread-21] ERROR org.apache.airavata.gfac.monitor.email.EmailBasedMonitor  - TO: ars.scinet@gmail.com
2017-06-26 12:51:06,350 [Thread-21] ERROR org.apache.airavata.gfac.monitor.email.EmailBasedMonitor  - SUBJECT: Someone has your password
2017-06-26 12:51:22,007 [Thread-21] INFO  org.apache.airavata.gfac.monitor.email.EmailBasedMonitor  - [EJM]: 1 job/s in job monitor map
2017-06-26 12:51:22,445 [Thread-21] INFO  org.apache.airavata.gfac.monitor.email.EmailBasedMonitor  - [EJM]: Retrieving unseen emails
2017-06-26 12:51:23,071 [Thread-21] INFO  org.apache.airavata.gfac.monitor.email.EmailBasedMonitor  - [EJM]: 104 new email/s received
2017-06-26 12:51:51,038 [Thread-21] ERROR org.apache.airavata.gfac.monitor.email.EmailBasedMonitor  - [EJM]: Error parsing email message =====================================>
org.apache.airavata.common.exception.AiravataException: [EJM]: Couldn't identify Resource job manager type from address Google <no...@accounts.google.com>
	at org.apache.airavata.gfac.monitor.email.EmailBasedMonitor.getJobMonitorType(EmailBasedMonitor.java:181)
	at org.apache.airavata.gfac.monitor.email.EmailBasedMonitor.parse(EmailBasedMonitor.java:165)
	at org.apache.airavata.gfac.monitor.email.EmailBasedMonitor.processMessages(EmailBasedMonitor.java:265)
	at org.apache.airavata.gfac.monitor.email.EmailBasedMonitor.run(EmailBasedMonitor.java:234)
	at java.lang.Thread.run(Thread.java:745)
2017-06-26 12:51:51,039 [Thread-21] ERROR org.apache.airavata.gfac.monitor.email.EmailBasedMonitor  - FROM: Google <no...@accounts.google.com>
2017-06-26 12:51:51,039 [Thread-21] ERROR org.apache.airavata.gfac.monitor.email.EmailBasedMonitor  - TO: ars.scinet@gmail.com
2017-06-26 12:51:51,039 [Thread-21] ERROR org.apache.airavata.gfac.monitor.email.EmailBasedMonitor  - SUBJECT: Someone has your password

This email from Google to the effect of “someone has your password” is not found anywhere in the account, so I don’t know why the monitor keeps “reading” it over and over again. It does seem to be able to access the email account without any difficulty.
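In the log above, the monitor retrieves the same 104 unseen emails on every polling cycle and fails on the same Google message each time. You can tally the rejected senders from the log to see which ones fail and how often. This is a minimal shell sketch. It assumes the exception text keeps the exact wording shown above; `count_unparsed_senders` and the log path in the usage line are hypothetical:

```shell
#!/bin/bash
# Hypothetical helper: read an Airavata GFac log on stdin and print, for each
# sender the email monitor could not map to a job manager type, how many
# times it failed, most frequent first.
count_unparsed_senders() {
  sed -n "s/.*Couldn't identify Resource job manager type from address //p" |
    sort | uniq -c | sort -rn
}

# Usage (the log path is an assumption; use wherever your GFac server logs):
#   count_unparsed_senders < /path/to/airavata-server.log
```

If one sender dominates the output, the monitor is probably re-reading the same unparseable message on every cycle.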

J

> On Jun 26, 2017, at 9:35 AM, Eroma Abeysinghe <er...@gmail.com> wrote:
> 
> Hi Jarett,
> 
> Did you do a recent upgrade of Airavata and the PGA? If not, please upgrade to the latest production release. From the information you have provided, it could be an issue with the GFac server reading from the RabbitMQ queue. But you said that although the experiment is in LAUNCHED, the job is in SUBMITTED. Does your email account contain unread emails for this job? When did the experiment last complete, and have any changes been made to the server machines, etc., since then? 
> 
> Hi Jeff,
> Yours is slightly different since it's in EXECUTING. From the information you have provided, I think your issue could be with email monitoring. Do you have unread emails for the jobs in EXECUTING in your mailbox? If you do, then check the gfac-config.yaml in the Airavata bin folder and make sure it processes emails from Comet.
> 
> Hope this info helps with further investigation.
> 
> Thanks,
> Eroma
> 
> On Fri, Jun 23, 2017 at 4:56 PM, Sale, Jeff <esale@ucsd.edu> wrote:
> I have a similar issue. I have been working with the Airavata support folks, Eroma, Supun, and Marcus, for the past few weeks trying to get Gaussian jobs to run on Comet. They have been super helpful, and it appears I am now able to run jobs to completion according to the Gaussian.log file in the scratch directory on Comet, but when I browse to the Experiment on the PGA, the stdout and stderr files never appear as links in Outputs and the job status is perpetually "EXECUTING".
> 
> I seem to recall Supun saying this was something they were aware of and are working to resolve, but I could be wrong about this.
> 
> Jeff
> 
> ________________________________________
> From: Jarett DeAngelis [jarett@bioteam.net]
> Sent: Friday, June 23, 2017 1:28 PM
> To: users@airavata.apache.org
> Subject: Job stuck in "launched," "submitted" status
> 
> Hi gang,
> 
> Working on our Airavata deployment (still build 16) again and have encountered an issue where after submitting a job to Slurm, it gets stuck in the “LAUNCHED” state, appearing to have sent the job to Slurm because it says “SUBMITTED” underneath, but it just stays that way forever. If you look at RabbitMQ there is a message sitting in the queue. Our first thought was that it was the email account we’re using for job tracking, but that is functioning fine. Where should I be looking for answers?
> 
> Thanks,
> Jarett
> 
> 
> 
> -- 
> Thank You,
> Best Regards,
> Eroma


Re: Job stuck in "launched," "submitted" status

Posted by Jarett DeAngelis <ja...@bioteam.net>.
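#!/bin/bash
# Resource request: partition "short", 1 node, 20 tasks
#SBATCH -p short
#SBATCH -N 1
#SBATCH -n 20
# Slurm mails job state changes (--mail-type=ALL) to both addresses;
# the Airavata email monitor polls the ars.scinet@gmail.com inbox
#SBATCH --mail-user=ars.scinet@gmail.com,jarett@bioteam.net
#SBATCH --mail-type=ALL
#SBATCH -t 1:0:00
#SBATCH -J A1002898126
#SBATCH -o /home/pga.submit/PGA_files/PROCESS_bc3d3c5f-30b9-48ab-bef7-e92fa8065f7a/MUSCLE.stdout
#SBATCH -e /home/pga.submit/PGA_files/PROCESS_bc3d3c5f-30b9-48ab-bef7-e92fa8065f7a/MUSCLE.stderr

module load muscle

# Run the MUSCLE alignment in the process working directory
cd /home/pga.submit/PGA_files/PROCESS_bc3d3c5f-30b9-48ab-bef7-e92fa8065f7a

muscle -in inputFile -out outFile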




Re: Job stuck in "launched," "submitted" status

Posted by Eroma Abeysinghe <er...@gmail.com>.
Hi Jarett,

Did you do a recent upgrade of Airavata and the PGA? If not, please upgrade
to the latest production release. From the information you have provided,
it could be an issue with the GFac server reading from the RabbitMQ queue.
But you said that although the experiment is in LAUNCHED, the job is in
SUBMITTED. Does your email account contain unread emails for this job? When
did the experiment last complete, and have any changes been made to the
server machines, etc., since then?

Hi Jeff,
Yours is slightly different since it's in EXECUTING. From the information
you have provided, I think your issue could be with email monitoring. Do
you have unread emails for the jobs in EXECUTING in your mailbox? If you
do, then check the gfac-config.yaml in the Airavata bin folder and make
sure it processes emails from Comet.
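The "[EJM]: Couldn't identify Resource job manager type from address ..." errors elsewhere in this thread suggest that the monitor picks a parser based on the sender's address. If that is right, scheduler mail from an address not listed in gfac-config.yaml is never matched to a job. The shell sketch below illustrates that lookup. It is only an illustration: the function name and both sender addresses are hypothetical placeholders, and GFac reads the real mapping from gfac-config.yaml, not from a script like this:

```shell
#!/bin/bash
# Hypothetical lookup: map a scheduler notification sender to a job manager
# type, which the GFac email monitor must do before it can parse the message.
# The addresses below are placeholders; the real sender address of each
# cluster's scheduler mail (e.g. Comet's) belongs in gfac-config.yaml.
job_manager_for_sender() {
  local from="$1"
  local addr="${from##*<}"   # keep the part after "<" if a display name is present
  addr="${addr%>}"           # drop the trailing ">"
  case "$addr" in
    slurm@cluster.example.org) echo "SLURM" ;;
    pbs@cluster.example.org)   echo "PBS" ;;
    *) return 1 ;;           # unknown sender: no parser applies
  esac
}
```

Mail from any sender outside the list falls through to the default branch. That is the situation the "Couldn't identify Resource job manager type" error in Jarett's log describes.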

Hope this info helps with further investigation.

Thanks,
Eroma

On Fri, Jun 23, 2017 at 4:56 PM, Sale, Jeff <es...@ucsd.edu> wrote:

> I have a similar issue. I have been working with the Airavata support
> folks, Eroma, Supun, and Marcus, for the past few weeks trying to get
> Gaussian jobs to run on Comet. They have been super helpful, and it appears
> I am now able to run jobs to completion according to the Gaussian.log file
> in the scratch directory on Comet, but when I browse to the Experiment on
> the PGA, the stdout and stderr files never appear as links in Outputs and
> the job status is perpetually "EXECUTING".
>
> I seem to recall Supun saying this was something they were aware of and
> are working to resolve, but I could be wrong about this.
>
> Jeff
>
> ________________________________________
> From: Jarett DeAngelis [jarett@bioteam.net]
> Sent: Friday, June 23, 2017 1:28 PM
> To: users@airavata.apache.org
> Subject: Job stuck in "launched," "submitted" status
>
> Hi gang,
>
> Working on our Airavata deployment (still build 16) again and have
> encountered an issue where after submitting a job to Slurm, it gets stuck
> in the “LAUNCHED” state, appearing to have sent the job to Slurm because it
> says “SUBMITTED” underneath, but it just stays that way forever. If you
> look at RabbitMQ there is a message sitting in the queue. Our first thought
> was that it was the email account we’re using for job tracking, but that is
> functioning fine. Where should I be looking for answers?
>
> Thanks,
> Jarett
>



-- 
Thank You,
Best Regards,
Eroma

RE: Job stuck in "launched," "submitted" status

Posted by "Sale, Jeff" <es...@ucsd.edu>.
I have a similar issue. I have been working with the Airavata support folks, Eroma, Supun, and Marcus, for the past few weeks trying to get Gaussian jobs to run on Comet. They have been super helpful, and it appears I am now able to run jobs to completion according to the Gaussian.log file in the scratch directory on Comet, but when I browse to the Experiment on the PGA, the stdout and stderr files never appear as links in Outputs and the job status is perpetually "EXECUTING".

I seem to recall Supun saying this was something they were aware of and are working to resolve, but I could be wrong about this.

Jeff

________________________________________
From: Jarett DeAngelis [jarett@bioteam.net]
Sent: Friday, June 23, 2017 1:28 PM
To: users@airavata.apache.org
Subject: Job stuck in "launched," "submitted" status

Hi gang,

Working on our Airavata deployment (still build 16) again and have encountered an issue where after submitting a job to Slurm, it gets stuck in the “LAUNCHED” state, appearing to have sent the job to Slurm because it says “SUBMITTED” underneath, but it just stays that way forever. If you look at RabbitMQ there is a message sitting in the queue. Our first thought was that it was the email account we’re using for job tracking, but that is functioning fine. Where should I be looking for answers?

Thanks,
Jarett