Posted to architecture@airavata.apache.org by Sachith Withana <sw...@gmail.com> on 2014/03/14 16:25:46 UTC

Experiment Summary retrieval

Hi all,

Almost all gateways need to retrieve summaries of all their experiments.
The fields required differ from gateway to gateway.

For example:
CIPRES requires: Experiment name and status
Some gateways require the Experiment name with the inputs only.

But right now the getAllExperiments() method returns the list of all the
Experiments with all the experiment-related attributes filled in (the whole
Experiment Model).

Getting whole Experiment objects from the API is costly compared with
getting only the few attributes required.

Any suggestions on how we could achieve this?

One suggestion would be a getAllExperiments method that takes a list of the
required Experiment fields as a parameter and returns only those fields.
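
A minimal sketch of that suggestion, written in Python for brevity. The
Experiment fields, the in-memory store and the get_all_experiments signature
are all assumptions made for illustration; none of them is the Airavata API.

```python
from dataclasses import dataclass, field, fields


@dataclass
class Experiment:
    # Hypothetical subset of an experiment model, for illustration only.
    experiment_id: str
    name: str
    status: str
    inputs: dict = field(default_factory=dict)
    outputs: dict = field(default_factory=dict)


def get_all_experiments(store, required_fields=None):
    """Return one dict per experiment holding only the requested fields.

    With required_fields=None every field is returned, which mirrors the
    return-everything behaviour described above.
    """
    known = [f.name for f in fields(Experiment)]
    wanted = known if required_fields is None else list(required_fields)
    unknown = [name for name in wanted if name not in known]
    if unknown:
        raise ValueError(f"unknown experiment fields: {unknown}")
    return [{name: getattr(exp, name) for name in wanted} for exp in store]


# Made-up sample data standing in for the experiment registry.
store = [
    Experiment("exp-1", "alignment run", "COMPLETED", {"sequence": "seq.fasta"}),
    Experiment("exp-2", "tree search", "RUNNING", {"matrix": "data.nex"}),
]

# A CIPRES-style summary: name and status only.
summaries = get_all_experiments(store, ["name", "status"])
```

The sketch only shows the shape of the call. To save the retrieval cost as
well, a real service would probably need to push the field selection down
into the registry query instead of trimming fully loaded objects.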

-- 
Thanks,
Sachith Withana

Re: Experiment Summary retrieval

Posted by Lahiru Gunathilake <gl...@gmail.com>.

Hi All,

I think this is a good idea. From Airavata's point of view, it makes sense
to hand over the static data to some other party (static data is the data
generated during job submission and monitoring, which is not going to
change). Airavata shouldn't be burdened with that provenance-type data. For
Airavata's runtime we only need this data until the job completes.

Based on the use case of the science gateway, they can arrange their data
and make the data retrieval process more reliable. My idea is that the
Airavata server or runtime should not be burdened with provenance data; if
Airavata needs this feature, we might have to separate it from the Airavata
runtime (I mean job submission and monitoring).
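
One way to picture that separation, as a rough Python sketch. The class
names, the set of terminal states and the record layout are all hypothetical
and only illustrate the idea of handing static data to a separate party once
a job finishes.

```python
# Assumed terminal job states; the real state model may differ.
TERMINAL_STATES = {"COMPLETED", "FAILED", "CANCELLED"}


class ProvenanceArchive:
    """Stand-in for the separate party that keeps static job data."""

    def __init__(self):
        self._records = {}

    def store(self, job_id, record):
        self._records[job_id] = record

    def get(self, job_id):
        return self._records[job_id]


class RuntimeStore:
    """Holds job records only while the job is still active."""

    def __init__(self, archive):
        self._active = {}
        self._archive = archive

    def update(self, job_id, status, details):
        self._active[job_id] = {"status": status, **details}
        if status in TERMINAL_STATES:
            # The record will not change any more: hand it over and forget it.
            self._archive.store(job_id, self._active.pop(job_id))

    def active_jobs(self):
        return dict(self._active)


archive = ProvenanceArchive()
runtime = RuntimeStore(archive)
runtime.update("job-1", "RUNNING", {"host": "hpc-a"})
runtime.update("job-2", "RUNNING", {"host": "hpc-b"})
runtime.update("job-1", "COMPLETED", {"host": "hpc-a", "exit_code": 0})
```

After the last call the runtime store holds only job-2, and the completed
record for job-1 lives in the archive, where a gateway could query it without
touching the runtime.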

Regards
Lahiru


On Fri, Mar 14, 2014 at 12:11 PM, Borries Demeler <demeler@biochem.uthscsa.edu> wrote:

> I am not sure if this relates at all to the getAllExperiments() question,
> but I thought it may help to see how we deal with retrieving expt. data
> from the gateway in UltraScan.
>
> In the case of UltraScan we have the HPC output very much separated from
> the information of the results that are useful to the user. All data are
> tar.gzipped into a single archive which is transported back out and
> automatically parsed into a relational database that is part of the
> UltraScan software, and not a function of the gateway. The user will then
> access the database with secondary software and create a type of
> meta-analysis and visualization of the results that are ultimately useful
> to the user. Having said this, while data are being calculated on the HPC
> resource, the user can monitor the process of the calculation by
> reviewing a queue viewer that shows all pending and active jobs and their
> actual state. This state includes a lot of information keeping the user
> apprised of the state. This information is continually being sent via UDP
> from the HPC resource and monitored by a daemon on our backend LIMS
> server; again, this is semi-independent from the gateway software. This
> may be very specialized for our use case, but it is one example of how
> the problem can be solved.
>
> -borries
>
> On Fri, Mar 14, 2014 at 11:37:02AM -0400, Saminda Wijeratne wrote:
> > Thanks for identifying the problem, Sachith. To be precise, there are
> > potentially many subset variations of the fields of interest in the
> > Experiment Data, which different gateways will want for the same or
> > different use cases. We cannot scale by providing a different function
> > for each of these scenarios.



-- 
System Analyst Programmer
PTI Lab
Indiana University

Re: Experiment Summary retrieval

Posted by Borries Demeler <de...@biochem.uthscsa.edu>.

Yes, any vncviewer should work - realvnc, tightvnc, tigervnc, chicken of the vnc (mac), 
and there are others. All of them should be free. To test it out, please try to connect
to the login screen:

vncviewer alamo.uthscsa.edu:1
password: SciGap

-b.
On Fri, Mar 14, 2014 at 07:16:12PM +0000, Miller, Mark wrote:
> Borries,
> 
> I haven't used this vnc thing before.
> Do I just need the free client mentioned on this page?
> http://www.realvnc.com/products/vnc/
> 
> Mark
> 
> -----Original Message-----
> From: Borries Demeler [mailto:demeler@biochem.uthscsa.edu] 
> Sent: Friday, March 14, 2014 11:48 AM
> To: architecture@airavata.apache.org
> Subject: Re: Experiment Summary retrieval
> 
> Marlon,
> would it be possible to put me on the agenda for about 30 mins. during our next telecon?
> Everyone should have a vnc viewer client on their computer to follow along.
> 
> Thanks, -borries
> 
> On Fri, Mar 14, 2014 at 06:42:46PM +0000, Miller, Mark wrote:
> > I think that would be useful, personally. 
> > 
> > -----Original Message-----
> > From: Borries Demeler [mailto:demeler@biochem.uthscsa.edu]
> > Sent: Friday, March 14, 2014 11:37 AM
> > To: architecture@airavata.apache.org
> > Subject: Re: Experiment Summary retrieval
> > 
> > 
> > Mark, Lahiru and others:
> > 
> > If you guys aren't averse to yet another online conference, I would be happy to give an on-screen demo of our system. It is a bit difficult to explain by email how this rather complicated system is organized, but it may be worthwhile to get some insight into one possible solution. I wouldn't say that the gateway is completely separated from the task of processing output data, but the task is abstracted to a very basic and uncomplicated level, and the more complex issues are then handled in a custom fashion by our software. This way we get robust communications on the one hand and fine-grained, customized data handling for the specific needs of the user on the other. I would probably need about 20-30 mins of your time. Perhaps during the next SciGap meeting?
> > It would be best if everyone had a vnc viewer on their desktop to follow along.
> > 
> > -Borries
> > 
> > 
> > On Fri, Mar 14, 2014 at 04:49:15PM +0000, Miller, Mark wrote:
> > > Very interesting, Borries. So if I read that right, delivery of results for your Gateway takes place through a completely separate system.
> > > 
> > > I think the RESTful access we provide will be similar in some ways. We will be passing results directly to the user's application, which will be in effect like the UltraScan package, I think. In that case, what the user extracts from the data will be up to their application. Of course we will track the relevant usage parameters just as we do currently on our side.
> > > 
> > > Access to intermediate results for ongoing jobs is also critically important for us, and very much outside of the main Gateway software. But we don't currently support access to the status of a list of running jobs. Being able to provide users with a window into the status of all their tasks is certainly a nice feature, though, and we would likely consume it and present it to users if it were easy to access.
> > > 
> > > Mark

Re: Experiment Summary retrieval

Posted by Borries Demeler <de...@biochem.uthscsa.edu>.

GH doesn't support Slackware Linux - already checked it out.

-b.
On Fri, Mar 14, 2014 at 04:13:52PM -0400, Suresh Marru wrote:
> Hi Borries,
> 
> I think it is worth considering a Google Hangout screen presentation instead of VNC. Hangouts has good support for multiple operating systems, and you can check here whether your system is supported - https://support.google.com/plus/answer/1216376?hl=en
> 
> Suresh

Re: Experiment Summary retrieval

Posted by Suresh Marru <sm...@apache.org>.

Hi Borries,

I think it is worth considering a Google Hangout screen presentation instead of VNC. Hangouts has good support for multiple operating systems, and you can check here whether your system is supported - https://support.google.com/plus/answer/1216376?hl=en

Suresh


RE: Experiment Summary retrieval

Posted by "Miller, Mark" <mm...@sdsc.edu>.

Borries,

I haven't used this vnc thing before.
Do I just need the free client mentioned on this page?
http://www.realvnc.com/products/vnc/

Mark


Re: Experiment Summary retrieval

Posted by Borries Demeler <de...@biochem.uthscsa.edu>.

Marlon,
would it be possible to put me on the agenda for about 30 mins. during our next telecon?
Everyone should have a vnc viewer client on their computer to follow along.

Thanks, -borries

> > 
> > -borries
> > 
> > On Fri, Mar 14, 2014 at 11:37:02AM -0400, Saminda Wijeratne wrote:
> > > Thanks for identifying the problem Sachith. To be precise 
> > > potentially there are many subset variations of interested fields in 
> > > the Experiment Data which different gateways will be interested in 
> > > the same usecase or different usecase. We cannot scale by providing 
> > > different functions for each of these scenarios.
> > > 
> > > 
> > > On Fri, Mar 14, 2014 at 11:25 AM, Sachith Withana <sw...@gmail.com>wrote:
> > > 
> > > > Hi all,
> > > >
> > > > Almost all gateways have a requirement of retrieving the 
> > > > experiment summaries of all the experiments . The fields that are 
> > > > required differ based on the gateway.
> > > >
> > > > For Example:
> > > > CIPRES requires : Experiment name and status Some gateways 
> > > > requires the Experiment name with the inputs only.
> > > >
> > > > But right now the getAllExperiments() method returns the list of 
> > > > all the Experiments with all the experiment related attributes 
> > > > filled ( the whole Experiment Model).
> > > >
> > > > It's costly to get the whole Experiment objects from the API 
> > > > rather than getting the required few attributes.
> > > >
> > > > Any suggestions on how we could achieve this?
> > > >
> > > > One suggestion would be to have a getAllExperiments method with 
> > > > the parameters as a list of required fields of the Experiments and 
> > > > return only those fields.
> > > >
> > > > --
> > > > Thanks,
> > > > Sachith Withana
> > > >

RE: Experiment Summary retrieval

Posted by "Miller, Mark" <mm...@sdsc.edu>.

I think that would be useful, personally. 

-----Original Message-----
From: Borries Demeler [mailto:demeler@biochem.uthscsa.edu] 
Sent: Friday, March 14, 2014 11:37 AM
To: architecture@airavata.apache.org
Subject: Re: Experiment Summary retrieval


Mark, Lahiru and others:

If you guys aren't averse to yet another online conference, I would be happy to give an on-screen demo of our system. It is a bit difficult to explain per email how this rather complicated system is organized, but it may be worthwhile to get some insights of one possible solution to how this can be handled. I wouldn't say that the gateway is completely separated from the task of processing output data, but the task is abstracted to a very basic and uncomplicated level, and then the more complex issues are handled in a custom fashion by our software. This way we can on the one hand have robust communications and on the other fine-grained and customized data handling for the specific needs of the user. I would probably need about 20-30 mins of your time. Perhaps during the next SciGap meeting?
It would be best if everyone had a vnc viewer on their desktop to follow along.

-Borries


On Fri, Mar 14, 2014 at 04:49:15PM +0000, Miller, Mark wrote:
> Very interesting Borries. So if I read that right, delivery of results for your Gateway takes place by a completely separate system.
> 
> I think the RESTful access we provide will be similar in some ways.  We will be passing results directly to the user's application, which will be in effect like the UltraScan package, I think. In that case, what the user extracts from the data will be up to their application. Of course we will track the relevant usage parameters just as we do currently on our side.
> 
> The access to intermediate results for ongoing jobs is also critically important for us as well, and very much outside of the main Gateway software. But we don't currently support access to status of a list of running jobs. Although the idea of being able to provide users with a window into the status of all their tasks is certainly a nice feature, and we would likely consume it and present it to users if it was easy to access.
> 
> Mark
> 
> 
> -----Original Message-----
> From: Borries Demeler [mailto:demeler@biochem.uthscsa.edu]
> Sent: Friday, March 14, 2014 9:12 AM
> To: architecture@airavata.apache.org
> Subject: Re: Experiment Summary retrieval
> 
> I am not sure if this relates at all to the getAllExperiments() question, but I thought it may help to see how we deal with retrieving expt. data from the gateway in UltraScan.
> 
> In the case of UltraScan we have the HPC output very much separated from the information of the results that are useful to the user. All data are getting tar.gzipped into a single archive which is transported back out and automatically parsed into a relational database that is part of the UltraScan software, and not a function of the gateway. The user will then access the database with secondary software and create a type of meta-analysis and visualization of the results that ultimately are then useful to the user. Having said this, while data are being calculated on the HPC resource, the user can monitor the process of the calculation by reviewing a queue viewer that shows all pending and active jobs and their actual state. This state include a lot of information keeping the user apprised of the state. This information is continually being sent via UDP from the HPC resource and monitored by a daemon on our backend LIMS server, again, this is semi-independent from the gateway software. This may be very specialized for our use case, but it is one example of how the problem can be solved.
> 
> -borries
> 
> On Fri, Mar 14, 2014 at 11:37:02AM -0400, Saminda Wijeratne wrote:
> > Thanks for identifying the problem Sachith. To be precise 
> > potentially there are many subset variations of interested fields in 
> > the Experiment Data which different gateways will be interested in 
> > the same usecase or different usecase. We cannot scale by providing 
> > different functions for each of these scenarios.
> > 
> > 
> > On Fri, Mar 14, 2014 at 11:25 AM, Sachith Withana <sw...@gmail.com>wrote:
> > 
> > > Hi all,
> > >
> > > Almost all gateways have a requirement of retrieving the 
> > > experiment summaries of all the experiments . The fields that are 
> > > required differ based on the gateway.
> > >
> > > For Example:
> > > CIPRES requires : Experiment name and status Some gateways 
> > > requires the Experiment name with the inputs only.
> > >
> > > But right now the getAllExperiments() method returns the list of 
> > > all the Experiments with all the experiment related attributes 
> > > filled ( the whole Experiment Model).
> > >
> > > It's costly to get the whole Experiment objects from the API 
> > > rather than getting the required few attributes.
> > >
> > > Any suggestions on how we could achieve this?
> > >
> > > One suggestion would be to have a getAllExperiments method with 
> > > the parameters as a list of required fields of the Experiments and 
> > > return only those fields.
> > >
> > > --
> > > Thanks,
> > > Sachith Withana
> > >

Re: Experiment Summary retrieval

Posted by Borries Demeler <de...@biochem.uthscsa.edu>.

Mark, Lahiru and others:

If you guys aren't averse to yet another online conference, I would be happy to
give an on-screen demo of our system. It is a bit difficult to explain by email
how this rather complicated system is organized, but it may be worthwhile to get
some insight into one possible solution for how this can be handled. I wouldn't say
that the gateway is completely separated from the task of processing output data,
but the task is abstracted to a very basic and uncomplicated level, and then the
more complex issues are handled in a custom fashion by our software. This way we
can have robust communications on the one hand and fine-grained, customized data
handling for the specific needs of the user on the other. I would probably need
about 20-30 minutes of your time. Perhaps during the next SciGap meeting?
It would be best if everyone had a VNC viewer on their desktop to follow along.

-Borries


On Fri, Mar 14, 2014 at 04:49:15PM +0000, Miller, Mark wrote:
> Very interesting Borries. So if I read that right, delivery of results for your Gateway takes place by a completely separate system.
> 
> I think the RESTful access we provide will be similar in some ways.  We will be passing results directly to the user's application, which will be in effect like the UltraScan package, I think. In that case, what the user extracts from the data will be up to their application. Of course we will track the relevant usage parameters just as we do currently on our side.
> 
> The access to intermediate results for ongoing jobs is also critically important for us as well, and very much outside of the main Gateway software. But we don't currently support access to status of a list of running jobs. Although the idea of being able to provide users with a window into the status of all their tasks is certainly a nice feature, and we would likely consume it and present it to users if it was easy to access.
> 
> Mark
> 
> 
> -----Original Message-----
> From: Borries Demeler [mailto:demeler@biochem.uthscsa.edu] 
> Sent: Friday, March 14, 2014 9:12 AM
> To: architecture@airavata.apache.org
> Subject: Re: Experiment Summary retrieval
> 
> I am not sure if this relates at all to the getAllExperiments() question, but I thought it may help to see how we deal with retrieving expt. data from the gateway in UltraScan.
> 
> In the case of UltraScan we have the HPC output very much separated from the information of the results that are useful to the user. All data are getting tar.gzipped into a single archive which is transported back out and automatically parsed into a relational database that is part of the UltraScan software, and not a function of the gateway. The user will then access the database with secondary software and create a type of meta-analysis and visualization of the results that ultimately are then useful to the user. Having said this, while data are being calculated on the HPC resource, the user can monitor the process of the calculation by reviewing a queue viewer that shows all pending and active jobs and their actual state. This state include a lot of information keeping the user apprised of the state. This information is continually being sent via UDP from the HPC resource and monitored by a daemon on our backend LIMS server, again, this is semi-independent from the gateway software. This may be very specialized for our use case, but it is one example of how the problem can be solved.
> 
> -borries
> 
> On Fri, Mar 14, 2014 at 11:37:02AM -0400, Saminda Wijeratne wrote:
> > Thanks for identifying the problem Sachith. To be precise potentially 
> > there are many subset variations of interested fields in the 
> > Experiment Data which different gateways will be interested in the 
> > same usecase or different usecase. We cannot scale by providing 
> > different functions for each of these scenarios.
> > 
> > 
> > On Fri, Mar 14, 2014 at 11:25 AM, Sachith Withana <sw...@gmail.com>wrote:
> > 
> > > Hi all,
> > >
> > > Almost all gateways have a requirement of retrieving the experiment 
> > > summaries of all the experiments . The fields that are required  
> > > differ based on the gateway.
> > >
> > > For Example:
> > > CIPRES requires : Experiment name and status Some gateways requires 
> > > the Experiment name with the inputs only.
> > >
> > > But right now the getAllExperiments() method returns the list of all 
> > > the Experiments with all the experiment related attributes filled ( 
> > > the whole Experiment Model).
> > >
> > > It's costly to get the whole Experiment objects from the API rather 
> > > than getting the required few attributes.
> > >
> > > Any suggestions on how we could achieve this?
> > >
> > > One suggestion would be to have a getAllExperiments method with the 
> > > parameters as a list of required fields of the Experiments and 
> > > return only those fields.
> > >
> > > --
> > > Thanks,
> > > Sachith Withana
> > >

RE: Experiment Summary retrieval

Posted by "Miller, Mark" <mm...@sdsc.edu>.

Very interesting, Borries. So if I read that right, delivery of results for your Gateway is handled by a completely separate system.

I think the RESTful access we provide will be similar in some ways.  We will be passing results directly to the user's application, which will be in effect like the UltraScan package, I think. In that case, what the user extracts from the data will be up to their application. Of course we will track the relevant usage parameters just as we do currently on our side.

Access to intermediate results for ongoing jobs is also critically important for us, and it is very much outside of the main Gateway software. We don't currently support access to the status of a list of running jobs, though. Being able to provide users with a window into the status of all their tasks would certainly be a nice feature, and we would likely consume it and present it to users if it were easy to access.

Mark

-----Original Message-----
From: Borries Demeler [mailto:demeler@biochem.uthscsa.edu] 
Sent: Friday, March 14, 2014 9:12 AM
To: architecture@airavata.apache.org
Subject: Re: Experiment Summary retrieval

I am not sure if this relates at all to the getAllExperiments() question, but I thought it may help to see how we deal with retrieving expt. data from the gateway in UltraScan.

In the case of UltraScan we have the HPC output very much separated from the information of the results that are useful to the user. All data are getting tar.gzipped into a single archive which is transported back out and automatically parsed into a relational database that is part of the UltraScan software, and not a function of the gateway. The user will then access the database with secondary software and create a type of meta-analysis and visualization of the results that ultimately are then useful to the user. Having said this, while data are being calculated on the HPC resource, the user can monitor the process of the calculation by reviewing a queue viewer that shows all pending and active jobs and their actual state. This state include a lot of information keeping the user apprised of the state. This information is continually being sent via UDP from the HPC resource and monitored by a daemon on our backend LIMS server, again, this is semi-independent from the gateway software. This may be very specialized for our use case, but it is one example of how the problem can be solved.

-borries

On Fri, Mar 14, 2014 at 11:37:02AM -0400, Saminda Wijeratne wrote:
> Thanks for identifying the problem Sachith. To be precise potentially 
> there are many subset variations of interested fields in the 
> Experiment Data which different gateways will be interested in the 
> same usecase or different usecase. We cannot scale by providing 
> different functions for each of these scenarios.
> 
> 
> On Fri, Mar 14, 2014 at 11:25 AM, Sachith Withana <sw...@gmail.com>wrote:
> 
> > Hi all,
> >
> > Almost all gateways have a requirement of retrieving the experiment 
> > summaries of all the experiments . The fields that are required  
> > differ based on the gateway.
> >
> > For Example:
> > CIPRES requires : Experiment name and status Some gateways requires 
> > the Experiment name with the inputs only.
> >
> > But right now the getAllExperiments() method returns the list of all 
> > the Experiments with all the experiment related attributes filled ( 
> > the whole Experiment Model).
> >
> > It's costly to get the whole Experiment objects from the API rather 
> > than getting the required few attributes.
> >
> > Any suggestions on how we could achieve this?
> >
> > One suggestion would be to have a getAllExperiments method with the 
> > parameters as a list of required fields of the Experiments and 
> > return only those fields.
> >
> > --
> > Thanks,
> > Sachith Withana
> >

Re: Experiment Summary retrieval

Posted by Borries Demeler <de...@biochem.uthscsa.edu>.

I am not sure if this relates at all to the getAllExperiments() question,
but I thought it may help to see how we deal with retrieving experiment data from
the gateway in UltraScan.

In the case of UltraScan we have the HPC output very much separated from the
information about the results that is useful to the user. All data are
tar-gzipped into a single archive, which is transported back out and automatically
parsed into a relational database that is part of the UltraScan software, and not
a function of the gateway. The user then accesses the database with secondary
software and creates a type of meta-analysis and visualization of the results
that is ultimately useful to the user. Having said this, while data are
being calculated on the HPC resource, the user can monitor the progress of the
calculation in a queue viewer that shows all pending and active jobs
and their actual state. This state includes a lot of information that keeps the
user apprised of what is happening. This information is continually sent via UDP
from the HPC resource and monitored by a daemon on our backend LIMS server;
again, this is semi-independent from the gateway software. This may be very
specialized for our use case, but it is one example of how the problem can be
solved.

-borries
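
[Editor's sketch] The status path described above (the HPC resource sends state updates over UDP, a daemon on the backend keeps a per-job view for a queue viewer) might look roughly like the following. This is only an illustration under stated assumptions: the JSON message layout, the field names job_id, state, and detail, and the function names are all hypothetical and are not taken from UltraScan or Airavata.

```python
import json
import socket


def apply_status(queue_view, datagram):
    """Merge one status datagram (bytes) into the in-memory queue view.

    Assumed (hypothetical) payload: a UTF-8 JSON object with "job_id",
    "state", and an optional "detail" string.
    """
    message = json.loads(datagram.decode("utf-8"))
    queue_view[message["job_id"]] = {
        "state": message["state"],
        "detail": message.get("detail", ""),
    }
    return queue_view


def run_daemon(sock, queue_view, max_messages):
    """Read up to max_messages datagrams from an already bound UDP socket."""
    for _ in range(max_messages):
        datagram, _sender = sock.recvfrom(4096)
        apply_status(queue_view, datagram)
    return queue_view


def open_status_socket(host="127.0.0.1", port=0):
    """Bind a UDP socket; port 0 lets the operating system pick a free port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock
```

A queue viewer would then only read from queue_view, so the gateway itself never has to be asked for job state. Since UDP does not guarantee delivery or ordering, a real daemon would probably also need sequence numbers or timestamps in each message.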

On Fri, Mar 14, 2014 at 11:37:02AM -0400, Saminda Wijeratne wrote:
> Thanks for identifying the problem Sachith. To be precise potentially there
> are many subset variations of interested fields in the Experiment Data
> which different gateways will be interested in the same usecase or
> different usecase. We cannot scale by providing different functions for
> each of these scenarios.
> 
> 
> On Fri, Mar 14, 2014 at 11:25 AM, Sachith Withana <sw...@gmail.com>wrote:
> 
> > Hi all,
> >
> > Almost all gateways have a requirement of retrieving the experiment
> > summaries of all the experiments . The fields that are required  differ
> > based on the gateway.
> >
> > For Example:
> > CIPRES requires : Experiment name and status
> > Some gateways requires the Experiment name with the inputs only.
> >
> > But right now the getAllExperiments() method returns the list of all the
> > Experiments with all the experiment related attributes filled ( the whole
> > Experiment Model).
> >
> > It's costly to get the whole Experiment objects from the API rather than
> > getting the required few attributes.
> >
> > Any suggestions on how we could achieve this?
> >
> > One suggestion would be to have a getAllExperiments method with the
> > parameters as a list of required fields of the Experiments and return only
> > those fields.
> >
> > --
> > Thanks,
> > Sachith Withana
> >

Re: Experiment Summary retrieval

Posted by Saminda Wijeratne <sa...@gmail.com>.

Thanks for identifying the problem, Sachith. To be precise, there are
potentially many different subsets of fields in the Experiment Data that
different gateways will be interested in, whether for the same use case or for
different ones. We cannot scale by providing a different function for
each of these scenarios.
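
[Editor's sketch] One way to avoid a separate function per gateway is a single retrieval call that takes the list of wanted fields, as suggested in the original post. The snippet below is a minimal illustration only: the function name get_experiment_summaries, the field names, and the sample records are assumptions made for the example and are not the actual Airavata API or Experiment Model.

```python
# Hypothetical field names; the real Experiment Model may differ.
ALLOWED_FIELDS = {"name", "status", "inputs", "outputs", "owner"}

# Hypothetical sample data standing in for stored experiments.
EXPERIMENTS = [
    {"name": "exp-a", "status": "COMPLETED", "inputs": ["a.fasta"],
     "outputs": ["a.tree"], "owner": "alice"},
    {"name": "exp-b", "status": "RUNNING", "inputs": ["b.fasta"],
     "outputs": [], "owner": "bob"},
]


def get_experiment_summaries(experiments, fields):
    """Return one dict per experiment containing only the requested fields."""
    unknown = set(fields) - ALLOWED_FIELDS
    if unknown:
        raise ValueError("unknown fields: " + ", ".join(sorted(unknown)))
    return [{field: exp[field] for field in fields} for exp in experiments]


# A CIPRES-style gateway asks for name and status only.
cipres_view = get_experiment_summaries(EXPERIMENTS, ["name", "status"])

# Another gateway asks for name and inputs only.
inputs_view = get_experiment_summaries(EXPERIMENTS, ["name", "inputs"])
```

The projection here happens in memory for brevity. To actually save cost, the field list would have to be pushed down to the registry query so that unneeded attributes are never loaded.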

On Fri, Mar 14, 2014 at 11:25 AM, Sachith Withana <sw...@gmail.com>wrote:

> Hi all,
>
> Almost all gateways have a requirement of retrieving the experiment
> summaries of all the experiments . The fields that are required  differ
> based on the gateway.
>
> For Example:
> CIPRES requires : Experiment name and status
> Some gateways requires the Experiment name with the inputs only.
>
> But right now the getAllExperiments() method returns the list of all the
> Experiments with all the experiment related attributes filled ( the whole
> Experiment Model).
>
> It's costly to get the whole Experiment objects from the API rather than
> getting the required few attributes.
>
> Any suggestions on how we could achieve this?
>
> One suggestion would be to have a getAllExperiments method with the
> parameters as a list of required fields of the Experiments and return only
> those fields.
>
> --
> Thanks,
> Sachith Withana
>

Re: Experiment Summary retrieval

Posted by Lahiru Gunathilake <gl...@gmail.com>.

On Fri, Mar 14, 2014 at 11:25 AM, Sachith Withana <sw...@gmail.com>wrote:

> Hi all,
>
> Almost all gateways have a requirement of retrieving the experiment
> summaries of all the experiments . The fields that are required  differ
> based on the gateway.
>
> For Example:
> CIPRES requires : Experiment name and status
> Some gateways requires the Experiment name with the inputs only.
>
> But right now the getAllExperiments() method returns the list of all the
> Experiments with all the experiment related attributes filled ( the whole
> Experiment Model).
>
> It's costly to get the whole Experiment objects from the API rather than
> getting the required few attributes.
>
> Any suggestions on how we could achieve this?
>
> One suggestion would be to have a getAllExperiments method with the
> parameters as a list of required fields of the Experiments and return only
> those fields.
>
+1. We should make this on-demand: unless the number of experiments is very
low, there is no way a gateway would want to get all the experiments in one go.
So if the number of experiments is high, the API should allow retrieving a few
at a time.
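
[Editor's sketch] Retrieving "a few at a time" is commonly done with an offset and a limit. The following is a minimal illustration under assumptions: the function names and the offset/limit parameters are hypothetical and do not describe an existing Airavata method.

```python
def get_experiments_page(experiments, offset=0, limit=20):
    """Return (page, next_offset); next_offset is None after the last page."""
    if offset < 0 or limit <= 0:
        raise ValueError("offset must be >= 0 and limit must be > 0")
    page = experiments[offset:offset + limit]
    end = offset + len(page)
    next_offset = end if end < len(experiments) else None
    return page, next_offset


def iter_all_experiments(experiments, limit=20):
    """Walk every page in order, yielding one experiment at a time."""
    offset = 0
    while offset is not None:
        page, offset = get_experiments_page(experiments, offset, limit)
        yield from page
```

A gateway that shows a list view would call get_experiments_page once per screen and only request the next page when the user asks for it.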

> --
> Thanks,
> Sachith Withana
>



-- 
System Analyst Programmer
PTI Lab
Indiana University