Posted to dev@airavata.apache.org by Raminder Singh <ra...@gmail.com> on 2014/09/19 16:56:24 UTC

Partial results of an application run

Hi Dev,

Currently we are not moving partial results (stdout/stderr and other files) to the gateway in case of application failure (i.e., the application failed to produce output). This can be fixed, but the question is whether we want it to be the default behavior or based on some user flag in the experiment. To make it work properly, we need user input providing regular expressions or other details about the files required in case of failure. Any suggestions on these changes in the Application Catalog and the Airavata API? We also need API functions to get the job working directory and other details. I created a JIRA for this.
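
To make the regular-expression idea concrete, here is a minimal sketch in Python. It is not Airavata code: the function name, the `transfer_partial_results` flag, and the default patterns are all hypothetical and only illustrate how a per-experiment flag and user-supplied patterns could decide which files are staged back after a failure.

```python
import re
from typing import Iterable, List

# Hypothetical default: with no user patterns, fall back to anything that
# looks like a stdout/stderr file.
DEFAULT_PATTERNS = [r".*stdout.*", r".*stderr.*"]


def select_files_on_failure(
    file_names: Iterable[str],
    patterns: Iterable[str] = (),
    transfer_partial_results: bool = True,
) -> List[str]:
    """Return the file names that should be staged back after a failed job."""
    # The flag stands in for "default behavior vs. user flag in the experiment".
    if not transfer_partial_results:
        return []
    chosen = list(patterns) or DEFAULT_PATTERNS
    compiled = [re.compile(p, re.IGNORECASE) for p in chosen]
    # Keep a name if any pattern matches the whole name.
    return [n for n in file_names if any(rx.fullmatch(n) for rx in compiled)]


working_dir = ["job.stdout", "job.stderr", "restart.0001.chk",
               "snapshot_005.hdf5", "run.log"]
print(select_files_on_failure(working_dir, patterns=[r".*\.log", r".*\.std(out|err)"]))
```

Matching on the whole file name keeps a pattern such as `.*\.log` from accidentally picking up large files that merely contain "log" somewhere in the name.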

https://issues.apache.org/jira/browse/AIRAVATA-1449

Thanks
Raminder

Re: Partial results of an application run

Posted by Raminderjeet Singh <ra...@indiana.edu>.

The size of those intermediate files was in gigabytes, and some of them
were removed on successful completion, e.g. the restart files used for
checkpointing the application. For the Gadget run, the user was manually
managing the data transfer using Globus Online. The reason was that it used
to take a week or more to transfer all the data out of Stampede to SLAC. I
think those large transfers need to be managed outside Airavata, but we can
discuss options based on use cases.

I agree, data management is a new thread, as the transfer options can be
different. Airavata can do GridFTP or scp file transfers at this point, but
Globus Online can be an option in the future.

Thanks
Raminder


On Fri, Sep 19, 2014 at 5:40 PM, Miller, Mark <mm...@sdsc.edu> wrote:

> Aha, very interesting. I was not thinking about the full span of
> possibilities. You make a good point.
> In fact, CIPRES has a special system that manages files above a certain
> size, because otherwise they just gum up our whole app and bring the server
> down.
> These are regular result files.
>
> I agree there may be cases where the Gateway needs/wants to refuse
> certain files, such as the example you mentioned.
> For us it is possible, but a lot more work, to specify each file that needs
> to be returned. But for apps with one or two codes that are well understood
> it would be very little effort.
>
> I am wondering if the large files you mentioned are produced by Gadget
> only in the failure case?
> And how will/does Airavata manage files that are very large in the
> success case?
> And how large a file can we manage in Airavata?
>
> But now that's a whole new thread I fear....
>
>
> Mark
>
>
> -----Original Message-----
> From: Raminder Singh [mailto:raminderjsingh@gmail.com]
> Sent: Friday, September 19, 2014 2:28 PM
> To: dev@airavata.apache.org
> Subject: Re: Partial results of an application run
>
> Thanks Mark for the feedback. I agree with you that SciGap should provide
> all the information to the gateway to debug a job. The follow-up question
> is: should we make this the default behavior or a gateway-configured
> behavior? I have worked with codes like Gadget in the past which produce
> huge intermediate files. In that case, we don't want to transfer all the
> files. I would suggest that the gateway provide details about the files it
> needs to debug a job failure: if not the actual intermediate output names,
> then some regular expression to find them and transfer them to the gateway
> server.
>
> Another way of dealing with this would be to transfer intermediate files on
> demand. Gateway admins or users get the list of intermediate files in the
> working folder, and Airavata transfers only the files the user selects.
> Just an idea.
>
> Thanks
> Raminder
>
> On Sep 19, 2014, at 12:18 PM, Marlon Pierce <ma...@iu.edu> wrote:
>
> > Great feedback, Mark--
> >
> > Marlon
> >
> > On 9/19/14, 12:03 PM, Miller, Mark wrote:
> >> Hi Raminder,
> >>
> >> If I understand the issue, I have a comment.
> >>
> >> The stdout/stderr files are absolutely critical to a gateway runner and
> >> to every end user for debugging issues, even simple ones like a typo
> >> that breaks the infile formatting.
> >> There are two levels of benefit: first, the savvy user reads these docs
> >> and solves their own problem more quickly, if it is on their side, and
> >> notifies me of the message when it isn't on their side. This saves time
> >> for everyone.
> >> Second, when the non-savvy user reports an issue but can't figure out
> >> what's wrong, this is the first place I look to identify the issue. It
> >> also makes it possible to debug at all across a reasonably large user
> >> population. Removing these files takes away the levers both users and
> >> gateway owners have to manage issues. If I understand the issue
> >> correctly, I don't think that SciGap wants to inherit the job of
> >> debugging job runs for all its constituent Gateways, and not passing
> >> these files along would do just that.
> >> I strongly feel, again assuming I understand correctly, that all the
> >> files available from every failed job should be passed along to the
> >> Gateway by SciGap. If the Gateway owners wish to debug every user issue
> >> on their own, they can pass only certain files along to the user.
> >>
> >> In our time with CIPRES I think I have used almost every file we or the
> >> user's job has created to debug an issue at one time or another.
> >> Sometimes the absence of a file alone tells me what the issue is. On
> >> the other hand, many codes produce both STDOUT as a file and stdout.txt
> >> as a file.
> >> If SciGap wanted to be responsible for eliminating that ambiguity,
> >> that might be fine. But delivering both copies puts the effort back on
> >> the Gateway developer to decide how to handle it, and perhaps that is
> >> again the best solution. And it requires no effort on the part of
> >> SciGap, which already has many things to take care of.
> >>
> >> So my vote is to return everything to the Gateway for every job.
> >>
> >> The exception to this might be SciGap-created files that do not have
> >> any relevance to the Gateway. I still think they would be of benefit to
> >> the Gateway developer (at least) because then they can report the issue
> >> to SciGap directly and explicitly.
> >> That means the admin on the SciGap side does not have to look up the
> >> job, maneuver to the directory, and open files to find the error
> >> message. It conserves many keystrokes/clicks. If we have 50 client
> >> Gateways, we will be grateful when a User reports the SciGap error
> >> message and job number, rather than just saying they have an issue.
> >>
> >> Those are my thoughts.
> >>
> >> Best,
> >> Mark
> >>
> >>
> >> From: Raminder Singh [mailto:raminderjsingh@gmail.com]
> >> Sent: Friday, September 19, 2014 7:56 AM
> >> To: dev@airavata.apache.org
> >> Subject: Partial results of an application run
> >>
> >> Hi Dev,
> >>
> >> Currently we are not moving partial results (stdout/stderr and other
> >> files) to the gateway in case of application failure (i.e., the
> >> application failed to produce output). This can be fixed, but the
> >> question is whether we want it to be the default behavior or based on
> >> some user flag in the experiment. To make it work properly, we need
> >> user input providing regular expressions or other details about the
> >> files required in case of failure. Any suggestions on these changes in
> >> the Application Catalog and the Airavata API? We also need API
> >> functions to get the job working directory and other details. I created
> >> a JIRA for this.
> >>
> >> https://issues.apache.org/jira/browse/AIRAVATA-1449
> >>
> >> Thanks
> >> Raminder
> >>
> >
>
>

RE: Partial results of an application run

Posted by "Miller, Mark" <mm...@sdsc.edu>.

Aha, very interesting. I was not thinking about the full span of possibilities. You make a good point.
In fact, CIPRES has a special system that manages files above a certain size, because otherwise they just gum up our whole app and bring the server down.
These are regular result files.

I agree there may be cases where the Gateway needs/wants to refuse certain files, such as the example you mentioned.
For us it is possible, but a lot more work, to specify each file that needs to be returned. But for apps with one or two codes that are well understood it would be very little effort.

I am wondering if the large files you mentioned are produced by Gadget only in the failure case?
And how will/does Airavata manage files that are very large in the success case?
And how large a file can we manage in Airavata?

But now that's a whole new thread I fear....


Mark


Re: Partial results of an application run

Posted by Raminder Singh <ra...@gmail.com>.

Thanks Mark for the feedback. I agree with you that SciGap should provide all the information to the gateway to debug a job. The follow-up question is: should we make this the default behavior or a gateway-configured behavior? I have worked with codes like Gadget in the past which produce huge intermediate files. In that case, we don’t want to transfer all the files. I would suggest that the gateway provide details about the files it needs to debug a job failure: if not the actual intermediate output names, then some regular expression to find them and transfer them to the gateway server.

Another way of dealing with this would be to transfer intermediate files on demand. Gateway admins or users get the list of intermediate files in the working folder, and Airavata transfers only the files the user selects. Just an idea.
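
A rough sketch of the on-demand idea, in Python and purely for illustration: the two function names are hypothetical, a local directory stands in for the remote job working directory, and `shutil.copy2` stands in for a real GridFTP or scp transfer. None of this is an existing Airavata API.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Dict, Iterable, List


def list_intermediate_files(working_dir: Path) -> Dict[str, int]:
    """Map each regular file in the working directory to its size in bytes."""
    return {p.name: p.stat().st_size
            for p in sorted(working_dir.iterdir()) if p.is_file()}


def transfer_selected(working_dir: Path, selection: Iterable[str],
                      gateway_dir: Path) -> List[str]:
    """Copy only the selected files; names not in the listing are skipped."""
    available = list_intermediate_files(working_dir)
    gateway_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    for name in selection:
        if name in available:
            # Placeholder for the actual transfer mechanism (assumption).
            shutil.copy2(working_dir / name, gateway_dir / name)
            moved.append(name)
    return moved


with tempfile.TemporaryDirectory() as tmp:
    work = Path(tmp) / "work"
    work.mkdir()
    (work / "job.stderr").write_bytes(b"error: bad input\n")
    (work / "restart.chk").write_bytes(b"\0" * 1024)
    # Step 1: show the user what is there, with sizes.
    print(list_intermediate_files(work))
    # Step 2: move only what the user picked.
    print(transfer_selected(work, ["job.stderr"], Path(tmp) / "gateway"))
```

Returning sizes with the listing lets the gateway warn users before they request a multi-gigabyte file.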

Thanks
Raminder

Re: Partial results of an application run

Posted by Marlon Pierce <ma...@iu.edu>.

Great feedback, Mark--

Marlon

RE: Partial results of an application run

Posted by "Miller, Mark" <mm...@sdsc.edu>.

Hi Raminder,

If I understand the issue, I have a comment.

The stdout/stderr files are absolutely critical to a gateway runner and to every end user for debugging issues, even simple ones like a typo that breaks the infile formatting.
There are two levels of benefit: first, the savvy user reads these docs and solves their own problem more quickly, if it is on their side, and notifies me of the message when it isn't on their side. This saves time for everyone.
Second, when the non-savvy user reports an issue but can't figure out what's wrong, this is the first place I look to identify the issue. It also makes it possible to debug at all across a reasonably large user population. Removing these files takes away the levers both users and gateway owners have to manage issues. If I understand the issue correctly, I don't think that SciGap wants to inherit the job of debugging job runs for all its constituent Gateways, and not passing these files along would do just that.
I strongly feel, again assuming I understand correctly, that all the files available from every failed job should be passed along to the Gateway by SciGap. If the Gateway owners wish to debug every user issue on their own, they can pass only certain files along to the user.

In our time with CIPRES I think I have used almost every file we or the user's job has created to debug an issue at one time or another.
Sometimes the absence of a file alone tells me what the issue is. On the other hand, many codes produce both STDOUT as a file and stdout.txt as a file.
If SciGap wanted to be responsible for eliminating that ambiguity, that might be fine. But delivering both copies puts the effort back on the Gateway developer to decide how to handle it, and perhaps that is again the best solution. And it requires no effort on the part of SciGap, which already has many things to take care of.

So my vote is to return everything to the Gateway for every job.

The exception to this might be SciGap-created files that do not have any relevance to the Gateway. I still think they would be of benefit to the Gateway developer (at least) because then they can report the issue to SciGap directly and explicitly.
That means the admin on the SciGap side does not have to look up the job, maneuver to the directory, and open files to find the error message. It conserves many keystrokes/clicks. If we have 50 client Gateways, we will be grateful when a User reports the SciGap error message and job number, rather than just saying they have an issue.

Those are my thoughts.

Best,
Mark

