Posted to dev@airavata.apache.org by Saminda Wijeratne <sa...@gmail.com> on 2014/03/18 19:55:30 UTC

Retrieving Experiment Summaries

For performance reasons, a gateway should request only a subset of an
experiment's data from the Airavata server to compile a summary view of the
experiment for the scientist. Based on my current experience, I feel the
following data is required to compile a general summary.

  - Exp ID/Name
  - Status
  - Project
  - Owner/Creation time

We have been seeing a direct relationship between the number of experiment
data records and the turnaround time. Thus we may need some paging when
requesting the experiment data.
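
A minimal sketch of the paging idea, assuming an offset/limit style request. ExperimentSummary and SummaryPager are hypothetical names used only for illustration and are not part of the Airavata API; the summary fields simply mirror the list above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical summary type carrying only the fields needed for a summary view.
final class ExperimentSummary {
    final String id;
    final String name;
    final String status;
    final String project;
    final String owner;
    final long creationTime; // epoch milliseconds (assumed representation)

    ExperimentSummary(String id, String name, String status,
                      String project, String owner, long creationTime) {
        this.id = id;
        this.name = name;
        this.status = status;
        this.project = project;
        this.owner = owner;
        this.creationTime = creationTime;
    }
}

// Hypothetical helper: returns one page of an already ordered result list.
final class SummaryPager {
    static <T> List<T> page(List<T> all, int offset, int limit) {
        if (offset < 0 || limit < 0) {
            throw new IllegalArgumentException("offset and limit must be non-negative");
        }
        // Clamp both ends so a page past the end yields an empty list.
        int start = Math.min(offset, all.size());
        int end = (int) Math.min((long) start + limit, all.size());
        return new ArrayList<>(all.subList(start, end));
    }
}
```

In a real server the slicing would more likely happen in the database query, so that only the requested page is ever loaded.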

wdyt? Your thoughts are welcome.

(Using JIRA [1] to track the status of this task)

A detailed discussion on the topic is on the Architecture mailing list [2].

Regards,
Saminda

1. https://issues.apache.org/jira/browse/AIRAVATA-995
2. http://www.mail-archive.com/architecture@airavata.apache.org/msg00080.html

Re: Retrieving Experiment Summaries

Posted by Sachith Withana <sw...@gmail.com>.
Saminda,

This is a really good implementation. As you said, this can be a very powerful extension of the API.

Shouldn’t we also provide a default experiment summary object (used if no fields are set) containing only the experiment name, status, submit date and last status update? These are essentially what most gateways would require.
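
One way the default could work, as a rough sketch: if the caller requests no fields, fall back to a fixed default list. The class name and the field identifiers below (including Experiment.SubmitDate and Experiment.LastStatusUpdate) are assumptions made for illustration only.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper; the field identifiers are illustrative, not Airavata names.
final class SummaryFields {
    static final List<String> DEFAULT = Collections.unmodifiableList(Arrays.asList(
            "Experiment.Name",
            "Experiment.Status",
            "Experiment.SubmitDate",
            "Experiment.LastStatusUpdate"));

    // Returns the requested fields, or the default summary fields if none were set.
    static List<String> resolve(List<String> requested) {
        return (requested == null || requested.isEmpty()) ? DEFAULT : requested;
    }
}
```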


On Mar 20, 2014, at 8:51 PM, Saminda Wijeratne <sa...@gmail.com> wrote:

> Emre, that looks exactly like what we want. Thanks for pointing it out. We were targeting for a simple version of "query" for 0.12, but after looking at mongodb docs I can see that if we plan ahead this can be a very powerful extension to the API with regards to data retrieval in future. 
> 
> Terri, you have a very valid point there. Right now its an implied "and". We are trying to keep the data model for SearchQuery as simple as possible. My current mode of thinking is hirarchical filter fields such as,
> 
> {Experiment.Project, "manhattan"}      
> =>  Experiment.Project=="manhattan"
> 
> {{Experiment.Project, "manhattan"}, {Experiment.Created, "03-19-2014",">"}}     
>  =>  Experiment.Project=="manhattan" && Experiment.Created >03-19-2014
> 
> {{Experiment.Project, "manhattan"}, {{Experiment.Status, "done","OR"},{Experiment.Status, "failed"}}, {Experiment.Created, "03-19-2014",">"}}      
> =>  Experiment.Project=="manhattan" && (Experiment.Status=="done" || Experiment.Status=="failed") && Experiment.Created >03-19-2014
> 
> Loosely defined, following could be the grammar of a filter criteria,
> 
> <FIELD_CRITERIA> = {<FIELD_CRITERIA>[,<FIELD_CRITERIA>*]}
> <FIELD_CRITERIA> = {<Field_Name> , <Field_Value> [,Field_Comparison][,Logical_Operator_With_Next_Criteria]}
> 
> wdyt?
> 
> Saminda
> 
> 
> 


Re: Retrieving Experiment Summaries

Posted by Saminda Wijeratne <sa...@gmail.com>.
Emre, that looks exactly like what we want. Thanks for pointing it out. We
were targeting a simple version of "query" for 0.12, but after looking at
the MongoDB docs I can see that, if we plan ahead, this can be a very
powerful extension to the API with regard to data retrieval in the future.

Terri, you have a very valid point there. Right now it's an implied "and".
We are trying to keep the data model for SearchQuery as simple as possible.
My current thinking is hierarchical filter fields such as:

{Experiment.Project, "manhattan"}
=>  Experiment.Project=="manhattan"

{{Experiment.Project, "manhattan"}, {Experiment.Created, "03-19-2014",">"}}
=>  Experiment.Project=="manhattan" && Experiment.Created >03-19-2014

{{Experiment.Project, "manhattan"}, {{Experiment.Status, "done","OR"},{Experiment.Status, "failed"}}, {Experiment.Created, "03-19-2014",">"}}
=>  Experiment.Project=="manhattan" && (Experiment.Status=="done" || Experiment.Status=="failed") && Experiment.Created >03-19-2014

Loosely defined, the following could be the grammar of a filter criterion:

<FIELD_CRITERIA> = {<FIELD_CRITERIA>[,<FIELD_CRITERIA>*]}
<FIELD_CRITERIA> = {<Field_Name> , <Field_Value> [,Field_Comparison][,Logical_Operator_With_Next_Criteria]}
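
To make the nesting concrete, here is a small sketch of how such hierarchical criteria could be held as a tree and rendered into a boolean expression. All class names are hypothetical and nothing here is part of the Airavata API. It assumes that each criterion carries the operator joining it with the next sibling and that the default is the implied "and".

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical base type for one node of the filter tree.
abstract class Criteria {
    String logicalOp = "&&"; // operator joining this node with the NEXT sibling

    // Marks this node as OR-ed with the sibling that follows it.
    Criteria or() {
        logicalOp = "||";
        return this;
    }

    abstract String render();
}

// Leaf: {Field_Name, Field_Value [, Field_Comparison]}
final class FieldCriteria extends Criteria {
    final String field;
    final String value;
    final String comparison;

    FieldCriteria(String field, String value, String comparison) {
        this.field = field;
        this.value = value;
        this.comparison = comparison;
    }

    @Override
    String render() {
        return field + comparison + "\"" + value + "\"";
    }
}

// Group: a list of criteria; nested groups are parenthesized when rendered.
final class GroupCriteria extends Criteria {
    final List<Criteria> children = new ArrayList<>();

    GroupCriteria add(Criteria c) {
        children.add(c);
        return this;
    }

    @Override
    String render() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < children.size(); i++) {
            Criteria c = children.get(i);
            sb.append(c instanceof GroupCriteria ? "(" + c.render() + ")" : c.render());
            if (i < children.size() - 1) {
                sb.append(" ").append(c.logicalOp).append(" ");
            }
        }
        return sb.toString();
    }
}
```

Building the third example from the tree (project, then a nested done-or-failed group, then the creation date) renders the same shape of expression as shown above, with the values quoted.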

wdyt?

Saminda



On Thu, Mar 20, 2014 at 8:27 PM, Schwartz, Terri <te...@sdsc.edu> wrote:

>  Hi Saminda,
>
> I'm working on a similar issue in the cipres rest api.  I'm curious about
> the syntax for filter criteria and how general purpose it will be.  Is
> there an implied 'and' between the individual criterion?  Can more
> elaborate boolean expressions be used?
>
> Terri

RE: Retrieving Experiment Summaries

Posted by "Schwartz, Terri" <te...@sdsc.edu>.
Hi Saminda,

I'm working on a similar issue in the CIPRES REST API.  I'm curious about the syntax for filter criteria and how general purpose it will be.  Is there an implied 'and' between the individual criteria?  Can more elaborate boolean expressions be used?

Terri
________________________________
From: Saminda Wijeratne [samindaw@gmail.com]
Sent: Thursday, March 20, 2014 2:40 PM
To: dev
Subject: Re: Retrieving Experiment Summaries

In an offline discussion with Chathuri, we came up with a simple way for gateway developers to specify retrieving a filtered set of experiment data based on the requirements of the gateway user.

eg:

SearchQuery query =
new SearchQuery({Experiment.Name, Experiment.Status}, {{Experiment.Owner,"bob"},{Experiment.Project,"manhattan"}{Experiment.Created,"03-19-2014",">"})
List<Experiment> experiments = thriftAPI.getExperiments(query);

Sample syntax
sq = new SearchQuery(<list of fields that needs to be filled>, <list of filter criteria for the data>)

Further more the SearchQuery will have the capability to specify paging (eg; experiments from 11 to 20).

wdyt?

Saminda




Re: Retrieving Experiment Summaries

Posted by Emre Brookes <em...@biochem.uthscsa.edu>.
You might want to have a look at MongoDB's find(), distinct() and
aggregate() methods for retrieving data.
The layout is similar but different, and it provides some nice JSON such as
{ Experiment.Owner : { $in : ["bob","alice", ... ] } }
and { Experiment.Runtime : { $gt : 100 } } etc., so that logical operators
could be incorporated.
I just wanted to point this out as a possible defined interface.
It may or may not be worth the effort to provide some subset of these
features.
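
As a rough illustration of what supporting a small subset might involve, the sketch below evaluates $in-style and $gt-style checks against a single record held as a map. It is plain Java and does not use the MongoDB driver; OperatorSketch and its method names are hypothetical.

```java
import java.util.List;
import java.util.Map;

// Hypothetical evaluator for two operator styles, applied to one record.
final class OperatorSketch {
    // "$in" style: the field's value must equal one of the candidates.
    static boolean in(Map<String, Object> record, String field, List<?> candidates) {
        return candidates.contains(record.get(field));
    }

    // "$gt" style: the field must be numeric and strictly greater than the bound.
    // Missing or non-numeric fields never match.
    static boolean gt(Map<String, Object> record, String field, double bound) {
        Object v = record.get(field);
        return v instanceof Number && ((Number) v).doubleValue() > bound;
    }
}
```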

-E.


Saminda Wijeratne wrote:
> In an offline discussion with Chathuri, we came up with a simple way 
> for gateway developers to specify retrieving a filtered set of 
> experiment data based on the requirements of the gateway user.
>
> eg:
>
> SearchQuery query =
> new SearchQuery({Experiment.Name, Experiment.Status}, 
> {{Experiment.Owner,"bob"},{Experiment.Project,"manhattan"}{Experiment.Created,"03-19-2014",">"})
> List<Experiment> experiments = thriftAPI.getExperiments(query);
>
> /Sample syntax/
> sq = new SearchQuery(<list of fields that needs to be filled>, <list 
> of filter criteria for the data>)
>
> Further more the SearchQuery will have the capability to specify 
> paging (eg; experiments from 11 to 20).
>
> wdyt?
>
> Saminda
>
>
>


Re: Retrieving Experiment Summaries

Posted by Saminda Wijeratne <sa...@gmail.com>.
In an offline discussion with Chathuri, we came up with a simple way for
gateway developers to retrieve a filtered set of experiment data based on
the requirements of the gateway user.

e.g.:

SearchQuery query =
new SearchQuery({Experiment.Name, Experiment.Status},
{{Experiment.Owner,"bob"},{Experiment.Project,"manhattan"},{Experiment.Created,"03-19-2014",">"}});
List<Experiment> experiments = thriftAPI.getExperiments(query);

*Sample syntax*
sq = new SearchQuery(<list of fields that need to be filled>, <list of
filter criteria for the data>)

Furthermore, the SearchQuery will have the capability to specify paging
(e.g., experiments from 11 to 20).
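
Since Java has no brace literals like the ones in the example, here is one way the same query could be expressed in compilable form. This is only a sketch: SearchQuerySketch, where() and page() are hypothetical names, the filters are assumed to be combined with "and", and the page range is assumed to be 1-based and inclusive.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the proposed SearchQuery; not the Airavata API.
final class SearchQuerySketch {
    final List<String> fields;                        // fields to fill in each result
    final List<String[]> filters = new ArrayList<>(); // each entry: {field, comparison, value}
    int from = 1;                                     // first result, 1-based inclusive (assumed)
    int to = Integer.MAX_VALUE;                       // last result, inclusive (assumed)

    SearchQuerySketch(String... fields) {
        this.fields = Arrays.asList(fields);
    }

    // Adds one filter criterion and returns this query for chaining.
    SearchQuerySketch where(String field, String comparison, String value) {
        filters.add(new String[] {field, comparison, value});
        return this;
    }

    // Restricts the results to the range from..to.
    SearchQuerySketch page(int from, int to) {
        if (from < 1 || to < from) {
            throw new IllegalArgumentException("invalid page range");
        }
        this.from = from;
        this.to = to;
        return this;
    }

    // Maximum number of results the requested range can hold.
    long pageSize() {
        return (long) to - from + 1;
    }
}
```

With this shape the example above would read: new SearchQuerySketch("Experiment.Name", "Experiment.Status").where("Experiment.Owner", "==", "bob").where("Experiment.Project", "==", "manhattan").where("Experiment.Created", ">", "03-19-2014").page(11, 20).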

wdyt?

Saminda



On Tue, Mar 18, 2014 at 3:04 PM, Lahiru Gunathilake <gl...@gmail.com>wrote:

>  On Tue, Mar 18, 2014 at 2:55 PM, Saminda Wijeratne <sa...@gmail.com>wrote:
>
>> For performance issues a gateway should only request a subset of data of
>> an experiment from Airavata server to compile a summary view of the
>> experiment to the scientist. Based on my current experience I feel the
>> following data is required to compile a general summary.
>>
>>   - Exp ID/Name
>>   - Status
>>   - Project
>>   - Owner/Creation time
>>
> +1, We can show minimum data and give detailed view on-demand. But I think
> we need to support experiment search based on some criteria and develop an
> index for each search criteria, because if I ran jobs for 6 months and I
> would never want to get all my experiments, even thought we make it super
> fast will minimum data.
>
> ex: I want to search the experiments I ran last week, or with some text
> base search.
>
> We can use the above solution Saminda suggested in searching too.
>
> Lahiru
>
>>
>> We have seeing a direct relationship between the number of experiment
>> data records and the turnaround time. Thus we may need some paging when
>> requesting the experiment data.
>>
>> wdyt? Your thoughts are welcome.
>>
>> (Using JIRA [1] to track the status of this task)
>>
>> A detailed discussion on the topic is on the Architecture mailing list
>> [2].
>>
>> Regards,
>> Saminda
>>
>> 1. https://issues.apache.org/jira/browse/AIRAVATA-995
>> 2.
>> http://www.mail-archive.com/architecture@airavata.apache.org/msg00080.html
>>
>
>
>
> --
> System Analyst Programmer
> PTI Lab
> Indiana University
>

Re: Retrieving Experiment Summaries

Posted by Lahiru Gunathilake <gl...@gmail.com>.
 On Tue, Mar 18, 2014 at 2:55 PM, Saminda Wijeratne <sa...@gmail.com>wrote:

> For performance issues a gateway should only request a subset of data of
> an experiment from Airavata server to compile a summary view of the
> experiment to the scientist. Based on my current experience I feel the
> following data is required to compile a general summary.
>
>   - Exp ID/Name
>   - Status
>   - Project
>   - Owner/Creation time
>
+1. We can show minimum data and give a detailed view on demand. But I think
we need to support experiment search based on some criteria and develop an
index for each search criterion, because if I ran jobs for 6 months, I
would never want to get all my experiments, even though we make it super
fast with minimum data.

e.g., I want to search for the experiments I ran last week, or use some
text-based search.

We can use the above solution Saminda suggested in searching too.

Lahiru

>
> We have seeing a direct relationship between the number of experiment data
> records and the turnaround time. Thus we may need some paging when
> requesting the experiment data.
>
> wdyt? Your thoughts are welcome.
>
> (Using JIRA [1] to track the status of this task)
>
> A detailed discussion on the topic is on the Architecture mailing list [2].
>
> Regards,
> Saminda
>
> 1. https://issues.apache.org/jira/browse/AIRAVATA-995
> 2.
> http://www.mail-archive.com/architecture@airavata.apache.org/msg00080.html
>



-- 
System Analyst Programmer
PTI Lab
Indiana University