Posted to dev@airavata.apache.org by Raminderjeet Singh <ra...@gmail.com> on 2012/08/27 16:11:03 UTC

Output Array

Hi Dev, 

I am working on an astronomy (One Degree Imager, ODI) workflow where we have a requirement for For-Each. An application inside the workflow needs to be run once for every output, and the previous application in the workflow produces multiple outputs. For-Each is designed to work on an array of outputs and to run multiple instances of the same application, one for each output in the array. Our current implementation only deals with URIArray, when outputs are written to the outputData folder. In this case the outputs are also written there, but there are different outputs in that folder, so my recommendation to the application developer was to write the output parameter multiple times to the stdout file [see below]. GFAC can read the stdout file for the repeated output parameter and generate an output array for the For-Each node. We have StringArray, DoubleArray, BooleanArray, etc., and we are not using any of those. I am thinking of handling this as StringArray if there are no objections. Suggestions?

Application stdout

CallPipeList=/scratch/01437/ogce/newrun/Lonestar_application_Mon_Aug_27_09_50_27_EDT_2012_ee954a79-d9ff-4757-8c0a-f6946d60b0e1/outputData/CallPipeList1
CallPipeList=/scratch/01437/ogce/newrun/Lonestar_application_Mon_Aug_27_09_50_27_EDT_2012_ee954a79-d9ff-4757-8c0a-f6946d60b0e1/outputData/CallPipeList2
CallPipeList=/scratch/01437/ogce/newrun/Lonestar_application_Mon_Aug_27_09_50_27_EDT_2012_ee954a79-d9ff-4757-8c0a-f6946d60b0e1/outputData/CallPipeList3
processLogs=/scratch/01437/ogce/newrun//Lonestar_application_Mon_Aug_27_09_50_27_EDT_2012_ee954a79-d9ff-4757-8c0a-f6946d60b0e1/outputData/processLogs.txt
outList=/scratch/01437/ogce/newrun//Lonestar_application_Mon_Aug_27_09_50_27_EDT_2012_ee954a79-d9ff-4757-8c0a-f6946d60b0e1/outputData/outList.txt
dataLogs=/scratch/01437/ogce/newrun//Lonestar_application_Mon_Aug_27_09_50_27_EDT_2012_ee954a79-d9ff-4757-8c0a-f6946d60b0e1/outputData/dataLogs.txt
parent=
exitStatus=2 NOPIPE 6 t HALT "subpipeline not available"
exitBB=/scratch/01437/ogce/newrun//Lonestar_application_Mon_Aug_27_09_50_27_EDT_2012_ee954a79-d9ff-4757-8c0a-f6946d60b0e1/outputData/exitBB.txt
restoreBB=/scratch/01437/ogce/newrun//Lonestar_application_Mon_Aug_27_09_50_27_EDT_2012_ee954a79-d9ff-4757-8c0a-f6946d60b0e1/outputData/restoreBB.txt
datasets=/scratch/01437/ogce/newrun//Lonestar_application_Mon_Aug_27_09_50_27_EDT_2012_ee954a79-d9ff-4757-8c0a-f6946d60b0e1/outputData/datasets.txt
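
To make the proposal concrete, here is a minimal sketch (in Python, for brevity) of how repeated name=value lines in stdout could be grouped into arrays. It is not GFAC code: the function names `parse_stdout_outputs` and `as_string_array` are hypothetical, and it assumes every output line has the form name=value with the name before the first "=".

```python
from collections import defaultdict


def parse_stdout_outputs(stdout_text):
    """Group name=value lines from application stdout by parameter name.

    A parameter that is written several times (such as CallPipeList above)
    ends up with several values, in the order they were written.
    """
    grouped = defaultdict(list)
    for line in stdout_text.splitlines():
        # Split on the first "=" only, so values may themselves contain "=".
        name, sep, value = line.partition("=")
        name = name.strip()
        if not sep or not name:
            continue  # skip lines that are not name=value pairs
        grouped[name].append(value.strip())
    return dict(grouped)


def as_string_array(outputs, name):
    """Return every value recorded for one parameter as a list of strings."""
    return list(outputs.get(name, []))
```

With the stdout above, `as_string_array(parse_stdout_outputs(text), "CallPipeList")` would hold the three CallPipeList paths as plain strings, which is what a For-Each node would iterate over.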

Thanks
Raminder


Re: Output Array

Posted by Suresh Marru <sm...@apache.org>.
On Aug 27, 2012, at 11:02 AM, Raminderjeet Singh <ra...@gmail.com> wrote:

> I thought about that. The challenge is how we distinguish between file paths and strings. My idea is that, in the same-host case, if the output parameter appears multiple times and the user has defined it as StringArray, we can create a string array and iterate For-Each over it. The case you mention for URIArray needs to be handled as a separate requirement, where we read the output paths from stdout and the output handler then validates the outputs in the directory path and creates the URIArray. We should not create URIs without validating that the outputs exist.

I see now: the stdout entries are not distinguished. I raised a JIRA (AIRAVATA-547). So enhancing (or implementing) StringArray for multiple stdout name-value pairs makes sense. However, we need to make sure we do not add any implicit inferences, like prefixing URLs and so on. I do not think that's the intent here, but I am just mentioning that in Airavata we all need to ensure that special cases are handled by explicit mechanisms; that way we do not confuse future contributors.

Suresh

> Thanks
> Raminder
> 
> On Aug 27, 2012, at 10:43 AM, Suresh Marru wrote:
> 
>> Hi Raman,
>> 
>> On Aug 27, 2012, at 10:11 AM, Raminderjeet Singh <ra...@gmail.com> wrote:
>> 
>>> Hi Dev, 
>>> 
>>> I am working on an astronomy (One Degree Imager, ODI) workflow where we have a requirement for For-Each. An application inside the workflow needs to be run once for every output, and the previous application in the workflow produces multiple outputs. For-Each is designed to work on an array of outputs and to run multiple instances of the same application, one for each output in the array. Our current implementation only deals with URIArray, when outputs are written to the outputData folder. In this case the outputs are also written there, but there are different outputs in that folder, so my recommendation to the application developer was to write the output parameter multiple times to the stdout file [see below]. GFAC can read the stdout file for the repeated output parameter and generate an output array for the For-Each node. We have StringArray, DoubleArray, BooleanArray, etc., and we are not using any of those. I am thinking of handling this as StringArray if there are no objections. Suggestions?
>> 
>> This is a good use case. StringArray will work, but what if a user actually wants to output strings? As an example, if the output is "/scratch/01437/ogce/newrun/Lonestar_application_Mon_Aug_27_09_50_27_EDT_2012_ee954a79-d9ff-4757-8c0a-f6946d60b0e1/outputData/CallPipeList1", it will not make sense to a subsequent service executing on a different host unless we prefix it with the right file transfer protocol and such. But what if the user really wanted to send this path as a string? How do we distinguish these two cases?
>> 
>> I suggest modifying URIArray itself to first look in stdout for multiple occurrences of a name-value pair before looking in the outputData folder. This way all the other semantics of URI handling are preserved, and we preserve the semantic difference between output types.
>> 
>> Suresh
>> 
>>> Application stdout
>>> 
>>> CallPipeList=/scratch/01437/ogce/newrun/Lonestar_application_Mon_Aug_27_09_50_27_EDT_2012_ee954a79-d9ff-4757-8c0a-f6946d60b0e1/outputData/CallPipeList1
>>> CallPipeList=/scratch/01437/ogce/newrun/Lonestar_application_Mon_Aug_27_09_50_27_EDT_2012_ee954a79-d9ff-4757-8c0a-f6946d60b0e1/outputData/CallPipeList2
>>> CallPipeList=/scratch/01437/ogce/newrun/Lonestar_application_Mon_Aug_27_09_50_27_EDT_2012_ee954a79-d9ff-4757-8c0a-f6946d60b0e1/outputData/CallPipeList3
>>> processLogs=/scratch/01437/ogce/newrun//Lonestar_application_Mon_Aug_27_09_50_27_EDT_2012_ee954a79-d9ff-4757-8c0a-f6946d60b0e1/outputData/processLogs.txt
>>> outList=/scratch/01437/ogce/newrun//Lonestar_application_Mon_Aug_27_09_50_27_EDT_2012_ee954a79-d9ff-4757-8c0a-f6946d60b0e1/outputData/outList.txt
>>> dataLogs=/scratch/01437/ogce/newrun//Lonestar_application_Mon_Aug_27_09_50_27_EDT_2012_ee954a79-d9ff-4757-8c0a-f6946d60b0e1/outputData/dataLogs.txt
>>> parent=
>>> exitStatus=2 NOPIPE 6 t HALT "subpipeline not available"
>>> exitBB=/scratch/01437/ogce/newrun//Lonestar_application_Mon_Aug_27_09_50_27_EDT_2012_ee954a79-d9ff-4757-8c0a-f6946d60b0e1/outputData/exitBB.txt
>>> restoreBB=/scratch/01437/ogce/newrun//Lonestar_application_Mon_Aug_27_09_50_27_EDT_2012_ee954a79-d9ff-4757-8c0a-f6946d60b0e1/outputData/restoreBB.txt
>>> datasets=/scratch/01437/ogce/newrun//Lonestar_application_Mon_Aug_27_09_50_27_EDT_2012_ee954a79-d9ff-4757-8c0a-f6946d60b0e1/outputData/datasets.txt
>>> 
>>> Thanks
>>> Raminder
>>> 
>> 
> 


Re: Output Array

Posted by Raminderjeet Singh <ra...@gmail.com>.
I thought about that. The challenge is how we distinguish between file paths and strings. My idea is that, in the same-host case, if the output parameter appears multiple times and the user has defined it as StringArray, we can create a string array and iterate For-Each over it. The case you mention for URIArray needs to be handled as a separate requirement, where we read the output paths from stdout and the output handler then validates the outputs in the directory path and creates the URIArray. We should not create URIs without validating that the outputs exist.
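
As a rough illustration of the validation step for the URIArray case, the sketch below (Python, not Airavata code) only builds URIs for paths that exist inside the expected output directory and fails loudly otherwise. The function name `build_uri_array` and the choice of `file://` URIs are assumptions made for the example.

```python
from pathlib import Path


def build_uri_array(paths, output_dir):
    """Turn paths read from stdout into URIs, but only after validating them.

    Every path must exist and must live under output_dir. If any path fails
    the check, no URI array is produced at all.
    """
    out_dir = Path(output_dir).resolve()
    uris = []
    missing = []
    for raw in paths:
        candidate = Path(raw).resolve()
        # Accept the path only if it exists inside the output directory.
        if candidate.exists() and out_dir in candidate.parents:
            uris.append(candidate.as_uri())  # file:// URI, an assumption here
        else:
            missing.append(raw)
    if missing:
        raise FileNotFoundError(
            "outputs not found under %s: %s" % (out_dir, ", ".join(missing))
        )
    return uris
```

Raising on the first bad set of paths keeps the handling explicit: a For-Each node never receives a URI for an output that was announced in stdout but never written.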

Thanks
Raminder

On Aug 27, 2012, at 10:43 AM, Suresh Marru wrote:

> Hi Raman,
> 
> On Aug 27, 2012, at 10:11 AM, Raminderjeet Singh <ra...@gmail.com> wrote:
> 
>> Hi Dev, 
>> 
>> I am working on an astronomy (One Degree Imager, ODI) workflow where we have a requirement for For-Each. An application inside the workflow needs to be run once for every output, and the previous application in the workflow produces multiple outputs. For-Each is designed to work on an array of outputs and to run multiple instances of the same application, one for each output in the array. Our current implementation only deals with URIArray, when outputs are written to the outputData folder. In this case the outputs are also written there, but there are different outputs in that folder, so my recommendation to the application developer was to write the output parameter multiple times to the stdout file [see below]. GFAC can read the stdout file for the repeated output parameter and generate an output array for the For-Each node. We have StringArray, DoubleArray, BooleanArray, etc., and we are not using any of those. I am thinking of handling this as StringArray if there are no objections. Suggestions?
> 
> This is a good use case. StringArray will work, but what if a user actually wants to output strings? As an example, if the output is "/scratch/01437/ogce/newrun/Lonestar_application_Mon_Aug_27_09_50_27_EDT_2012_ee954a79-d9ff-4757-8c0a-f6946d60b0e1/outputData/CallPipeList1", it will not make sense to a subsequent service executing on a different host unless we prefix it with the right file transfer protocol and such. But what if the user really wanted to send this path as a string? How do we distinguish these two cases?
> 
> I suggest modifying URIArray itself to first look in stdout for multiple occurrences of a name-value pair before looking in the outputData folder. This way all the other semantics of URI handling are preserved, and we preserve the semantic difference between output types.
> 
> Suresh
> 
>> Application stdout
>> 
>> CallPipeList=/scratch/01437/ogce/newrun/Lonestar_application_Mon_Aug_27_09_50_27_EDT_2012_ee954a79-d9ff-4757-8c0a-f6946d60b0e1/outputData/CallPipeList1
>> CallPipeList=/scratch/01437/ogce/newrun/Lonestar_application_Mon_Aug_27_09_50_27_EDT_2012_ee954a79-d9ff-4757-8c0a-f6946d60b0e1/outputData/CallPipeList2
>> CallPipeList=/scratch/01437/ogce/newrun/Lonestar_application_Mon_Aug_27_09_50_27_EDT_2012_ee954a79-d9ff-4757-8c0a-f6946d60b0e1/outputData/CallPipeList3
>> processLogs=/scratch/01437/ogce/newrun//Lonestar_application_Mon_Aug_27_09_50_27_EDT_2012_ee954a79-d9ff-4757-8c0a-f6946d60b0e1/outputData/processLogs.txt
>> outList=/scratch/01437/ogce/newrun//Lonestar_application_Mon_Aug_27_09_50_27_EDT_2012_ee954a79-d9ff-4757-8c0a-f6946d60b0e1/outputData/outList.txt
>> dataLogs=/scratch/01437/ogce/newrun//Lonestar_application_Mon_Aug_27_09_50_27_EDT_2012_ee954a79-d9ff-4757-8c0a-f6946d60b0e1/outputData/dataLogs.txt
>> parent=
>> exitStatus=2 NOPIPE 6 t HALT "subpipeline not available"
>> exitBB=/scratch/01437/ogce/newrun//Lonestar_application_Mon_Aug_27_09_50_27_EDT_2012_ee954a79-d9ff-4757-8c0a-f6946d60b0e1/outputData/exitBB.txt
>> restoreBB=/scratch/01437/ogce/newrun//Lonestar_application_Mon_Aug_27_09_50_27_EDT_2012_ee954a79-d9ff-4757-8c0a-f6946d60b0e1/outputData/restoreBB.txt
>> datasets=/scratch/01437/ogce/newrun//Lonestar_application_Mon_Aug_27_09_50_27_EDT_2012_ee954a79-d9ff-4757-8c0a-f6946d60b0e1/outputData/datasets.txt
>> 
>> Thanks
>> Raminder
>> 
> 


Re: Output Array

Posted by Suresh Marru <sm...@apache.org>.
Hi Raman,

On Aug 27, 2012, at 10:11 AM, Raminderjeet Singh <ra...@gmail.com> wrote:

> Hi Dev, 
> 
> I am working on an astronomy (One Degree Imager, ODI) workflow where we have a requirement for For-Each. An application inside the workflow needs to be run once for every output, and the previous application in the workflow produces multiple outputs. For-Each is designed to work on an array of outputs and to run multiple instances of the same application, one for each output in the array. Our current implementation only deals with URIArray, when outputs are written to the outputData folder. In this case the outputs are also written there, but there are different outputs in that folder, so my recommendation to the application developer was to write the output parameter multiple times to the stdout file [see below]. GFAC can read the stdout file for the repeated output parameter and generate an output array for the For-Each node. We have StringArray, DoubleArray, BooleanArray, etc., and we are not using any of those. I am thinking of handling this as StringArray if there are no objections. Suggestions?

This is a good use case. StringArray will work, but what if a user actually wants to output strings? As an example, if the output is "/scratch/01437/ogce/newrun/Lonestar_application_Mon_Aug_27_09_50_27_EDT_2012_ee954a79-d9ff-4757-8c0a-f6946d60b0e1/outputData/CallPipeList1", it will not make sense to a subsequent service executing on a different host unless we prefix it with the right file transfer protocol and such. But what if the user really wanted to send this path as a string? How do we distinguish these two cases?

I suggest modifying URIArray itself to first look in stdout for multiple occurrences of a name-value pair before looking in the outputData folder. This way all the other semantics of URI handling are preserved, and we preserve the semantic difference between output types.
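
The lookup order could be sketched roughly as follows (Python, purely illustrative). The names `values_from_stdout` and `resolve_uri_array` are hypothetical, and the fallback of listing every entry in the outputData folder is an assumption about the current behaviour, not a description of it.

```python
import os


def values_from_stdout(name, stdout_text):
    """Collect the non-empty values of every name=value line for one parameter."""
    values = []
    for line in stdout_text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == name and value.strip():
            values.append(value.strip())
    return values


def resolve_uri_array(name, stdout_text, output_dir):
    """Prefer values announced in stdout; otherwise fall back to the folder."""
    from_stdout = values_from_stdout(name, stdout_text)
    if from_stdout:
        return from_stdout
    # Assumed fallback: treat every entry in the output folder as an output.
    return sorted(
        os.path.join(output_dir, entry) for entry in os.listdir(output_dir)
    )
```

Either branch returns plain paths; any URI-specific handling (validation, transfer protocol) would still happen afterwards in one place, so the output type keeps its meaning.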

Suresh

> Application stdout
> 
> CallPipeList=/scratch/01437/ogce/newrun/Lonestar_application_Mon_Aug_27_09_50_27_EDT_2012_ee954a79-d9ff-4757-8c0a-f6946d60b0e1/outputData/CallPipeList1
> CallPipeList=/scratch/01437/ogce/newrun/Lonestar_application_Mon_Aug_27_09_50_27_EDT_2012_ee954a79-d9ff-4757-8c0a-f6946d60b0e1/outputData/CallPipeList2
> CallPipeList=/scratch/01437/ogce/newrun/Lonestar_application_Mon_Aug_27_09_50_27_EDT_2012_ee954a79-d9ff-4757-8c0a-f6946d60b0e1/outputData/CallPipeList3
> processLogs=/scratch/01437/ogce/newrun//Lonestar_application_Mon_Aug_27_09_50_27_EDT_2012_ee954a79-d9ff-4757-8c0a-f6946d60b0e1/outputData/processLogs.txt
> outList=/scratch/01437/ogce/newrun//Lonestar_application_Mon_Aug_27_09_50_27_EDT_2012_ee954a79-d9ff-4757-8c0a-f6946d60b0e1/outputData/outList.txt
> dataLogs=/scratch/01437/ogce/newrun//Lonestar_application_Mon_Aug_27_09_50_27_EDT_2012_ee954a79-d9ff-4757-8c0a-f6946d60b0e1/outputData/dataLogs.txt
> parent=
> exitStatus=2 NOPIPE 6 t HALT "subpipeline not available"
> exitBB=/scratch/01437/ogce/newrun//Lonestar_application_Mon_Aug_27_09_50_27_EDT_2012_ee954a79-d9ff-4757-8c0a-f6946d60b0e1/outputData/exitBB.txt
> restoreBB=/scratch/01437/ogce/newrun//Lonestar_application_Mon_Aug_27_09_50_27_EDT_2012_ee954a79-d9ff-4757-8c0a-f6946d60b0e1/outputData/restoreBB.txt
> datasets=/scratch/01437/ogce/newrun//Lonestar_application_Mon_Aug_27_09_50_27_EDT_2012_ee954a79-d9ff-4757-8c0a-f6946d60b0e1/outputData/datasets.txt
> 
> Thanks
> Raminder
>