Posted to users@airavata.apache.org by "Sale, Jeff" <es...@ucsd.edu> on 2017/06/26 15:50:20 UTC

Jobs stuck in "EXECUTING" mode appears to be resolved

Thanks, Eroma. Yes, my gfac-config.yaml file appears to be configured properly. I am still using the test.airavata@gmail.com email and the emails are being sent to it correctly. My airavata-server.properties file appears to also be configured properly with that email.

However, I went ahead and made one seemingly minor change: I added the test airavata gmail account to the airavata-server.properties file under **** Monitoring module Configuration ******. It may have been previously set to the default airavata user and xxx password by me when I sent you the file a few weeks ago. Sorry about that, if that in fact was the key change.
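A leftover default such as the "xxx" password mentioned above is easy to miss by eye. The following is a minimal sketch, not part of Airavata, that scans simple key=value text for values that still look like placeholders. The key names in SAMPLE are hypothetical and only illustrate the idea; the real keys in airavata-server.properties may differ.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def find_placeholders(props, placeholders=("xxx", "changeme", "")):
    """Return the sorted keys whose values still look like unedited defaults."""
    return sorted(k for k, v in props.items() if v.lower() in placeholders)


# Hypothetical fragment: these key names are assumptions for illustration only.
SAMPLE = """
# ---- monitoring section (illustrative) ----
monitor.email.address=test.airavata@gmail.com
monitor.email.password=xxx
"""

print(find_placeholders(parse_properties(SAMPLE)))
```

To check a real file, read it with open(path).read() and pass the text to parse_properties. The parser only handles plain key=value lines, so it is a quick sanity check rather than a full Java properties reader.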

Once I made that change the job completed and the Gaussian.log, .stderr, and .stdout files were successfully created locally. The latter two were empty, but the .log file seems to imply that the job ran successfully, tho' I am not a Gaussian person.

Also, having made that change, my server is now getting a bunch of [ERROR] outputs to the console from what appear to be previous failed jobs which used the same test airavata email account. I'm guessing I can ignore these but I'm not sure. Any thoughts? Also, the links in the Experiment view to the Gaussian.stderr, .stdout, .log files are still broken, but I can view them if I click the "Open" link for the Storage.

Next, I will go ahead and change the email account to my own test gmail account and see what happens. Thanks!

Jeff


________________________________
From: Eroma Abeysinghe [eroma.abeysinghe@gmail.com]
Sent: Monday, June 26, 2017 6:35 AM
To: users@airavata.apache.org
Subject: Re: Job stuck in "launched," "submitted" status

Hi Jarett,

Did you do a recent upgrade of Airavata and PGA? If not, please do so with the latest production. From the information you have provided, it could be an issue with the gfac server reading from the RabbitMQ queue. But you said that although the experiment is in LAUNCHED, the job is in SUBMITTED. So does your email account contain unread emails for this job? When was the last time an experiment completed, and were any changes made to the server machines, etc., between then and now?

Hi Jeff,
Yours is slightly different since it's in EXECUTING. With the information you have provided, I think your issue could be with email monitoring. Do you have unread emails for the jobs in EXECUTING in your email box? If you do, then you need to check your gfac-config.yaml in the airavata bin folder and make sure it processes emails from Comet.
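One way to answer the "unread emails" question without opening a mail client is to count UNSEEN messages over IMAP. This is a rough sketch using Python's standard imaplib. The host, account, and folder are assumptions you would replace with your own, and it only reports a count; it does not check whether Airavata's monitor would accept those messages.

```python
import imaplib


def count_ids(search_data):
    """Count message ids in the data part of an IMAP SEARCH response."""
    if not search_data or not search_data[0]:
        return 0
    return len(search_data[0].split())


def unread_count(host, user, password, folder="INBOX"):
    """Log in over IMAP/SSL and count UNSEEN messages in `folder`."""
    conn = imaplib.IMAP4_SSL(host)
    try:
        conn.login(user, password)
        # Open read-only so that checking does not mark any mail as seen.
        conn.select(folder, readonly=True)
        status, data = conn.search(None, "UNSEEN")
        if status != "OK":
            raise RuntimeError("IMAP search failed: %r" % (data,))
        return count_ids(data)
    finally:
        conn.logout()


# Example call (assumed values; supply your own account and password):
# print(unread_count("imap.gmail.com", "test.airavata@gmail.com", "..."))
```

A count that keeps growing while jobs stay in EXECUTING would be consistent with the monitor not processing the job emails, which is the case described above.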

Hope this info helps with further investigation.

Thanks,
Eroma

On Fri, Jun 23, 2017 at 4:56 PM, Sale, Jeff <es...@ucsd.edu> wrote:
I have a similar issue. I have been working with the Airavata support folks, Eroma, Supun, and Marcus for the past few weeks trying to get Gaussian jobs to run on Comet. They have been super helpful, and it appears I am now able to run jobs to completion according to the Gaussian.log file in the scratch directory on Comet, but when I browse to the Experiment on the PGA the stdout and stderr files never appear as a link in Outputs and the job status is perpetually in  "EXECUTING".

I seem to recall Supun saying this was something they were aware of and are working to resolve, but I could be wrong about this.

Jeff

________________________________________
From: Jarett DeAngelis [jarett@bioteam.net]
Sent: Friday, June 23, 2017 1:28 PM
To: users@airavata.apache.org
Subject: Job stuck in "launched," "submitted" status

Hi gang,

Working on our Airavata deployment (still build 16) again and have encountered an issue where after submitting a job to Slurm, it gets stuck in the “LAUNCHED” state, appearing to have sent the job to Slurm because it says “SUBMITTED” underneath, but it just stays that way forever. If you look at RabbitMQ there is a message sitting in the queue. Our first thought was that it was the email account we’re using for job tracking, but that is functioning fine. Where should I be looking for answers?
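For the "message sitting in the queue" observation, it can help to see exactly which queue holds the backlog. The sketch below shells out to rabbitmqctl list_queues name messages and parses the tab-separated rows. It assumes rabbitmqctl is on the PATH and that you have permission to run it; the queue names in SAMPLE are made up for illustration.

```python
import subprocess


def parse_list_queues(output):
    """Parse `rabbitmqctl list_queues name messages` output into {queue: depth}."""
    depths = {}
    for line in output.splitlines():
        parts = line.split("\t")
        # Data rows are "<name>\t<count>"; banner and header lines are skipped.
        if len(parts) == 2 and parts[1].strip().isdigit():
            depths[parts[0]] = int(parts[1])
    return depths


def backlog():
    """Return only the queues that currently hold at least one message."""
    out = subprocess.run(
        ["rabbitmqctl", "list_queues", "name", "messages"],
        check=True, capture_output=True, text=True,
    ).stdout
    return {q: n for q, n in parse_list_queues(out).items() if n > 0}


# Hypothetical sample output; the queue names are assumptions.
SAMPLE = "Listing queues for vhost / ...\nname\tmessages\nlaunch_queue\t1\nstatus_queue\t0\n"
print(parse_list_queues(SAMPLE))
```

Knowing which queue is non-empty narrows down which consumer is not reading from it.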

Thanks,
Jarett



--
Thank You,
Best Regards,
Eroma

Re: Jobs stuck in "EXECUTING" mode appears to be resolved

Posted by "Pierce, Marlon" <ma...@iu.edu>.
Hi Jarett, can you send a screen shot of the admin console where you get this information? I think what you are seeing is correct behavior.  Airavata distinguishes experiments from jobs, so these have different states. The experiment state is an internal state within Airavata, while the job state is (in this case) the state of the job on the remote queuing system.
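The distinction between the two states can be pictured with a small model. This is only an illustrative sketch, not Airavata's actual data model; it lists just the state names that appear in this thread, and the real system has more.

```python
from dataclasses import dataclass
from enum import Enum


class ExperimentState(Enum):
    """Internal Airavata-side states mentioned in this thread (not exhaustive)."""
    LAUNCHED = "LAUNCHED"
    EXECUTING = "EXECUTING"


class JobState(Enum):
    """Remote queuing-system states mentioned in this thread (not exhaustive)."""
    SUBMITTED = "SUBMITTED"
    QUEUED = "QUEUED"


@dataclass
class StatusView:
    """One row of a status display: the two states are tracked independently."""
    experiment: ExperimentState
    job: JobState

    def describe(self):
        return "experiment=%s, job=%s" % (self.experiment.value, self.job.value)


# An experiment can be EXECUTING while its job is still waiting in the queue.
print(StatusView(ExperimentState.EXECUTING, JobState.QUEUED).describe())
```

Seen this way, an EXECUTING experiment with a QUEUED job is two separate facts reported side by side rather than a contradiction.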

 

Marlon

From: Jarett DeAngelis <ja...@bioteam.net>
Reply-To: "users@airavata.apache.org" <us...@airavata.apache.org>
Date: Monday, June 26, 2017 at 12:54 PM
To: "users@airavata.apache.org" <us...@airavata.apache.org>
Subject: Re: Jobs stuck in "EXECUTING" mode appears to be resolved

Confusingly, while I still have no emails in the account, I’ve submitted another job and this one is in EXECUTING status, with a Slurm status of QUEUED. Not sure why there’s an inconsistency there, but it too appears not to be updating.

Jarett


Re: Jobs stuck in "EXECUTING" mode appears to be resolved

Posted by Jarett DeAngelis <ja...@bioteam.net>.
Confusingly, while I still have no emails in the account, I’ve submitted another job and this one is in EXECUTING status, with a Slurm status of QUEUED. Not sure why there’s an inconsistency there, but it too appears not to be updating.

Jarett




Re: Jobs stuck in "EXECUTING" mode appears to be resolved

Posted by "Pamidighantam, Sudhakar V" <sp...@illinois.edu>.
Jeff:
I can check the gaussian.log to make sure it completed correctly if you can provide a link to the file.

Thanks,
Sudhakar.


Re: Jobs stuck in "EXECUTING" mode appears to be resolved

Posted by Jarett DeAngelis <ja...@bioteam.net>.
Hi Eroma,

I have no emails in my Airavata account for this job at all. The last time an email went into/out of it was the end of 2016, which is also the last time this instance of Airavata was tested or used.

Jarett
