You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@airavata.apache.org by Saminda Wijeratne <sa...@gmail.com> on 2013/05/21 17:04:24 UTC

Persisting GFac job data

It has being apparent more and more that saving the data related to
executing a jobs from the GFac can be useful for many reasons such as,

debugging
retrying
to make smart decisions on reliability/cost etc.
statistical analysis

Thus we thought of saving the data related to GFac jobs in the registry in
order to facilitate feature such as above in the future.

However a GFac job is potentially any sort of computing resource access
(GRAM/UNICORE/EC2 etc.). Therefore we need to come up with a generalized
data structure that can hold the data of any type of resource. Following
are the suggested data to save for a single GFac job execution,

*experiment id, workflow instance id, node id* - pinpoint the node
execution
*service, host, application description ids *- pinpoint the descriptors
responsible
*local job id* - the unique job id retrieved/generated per execution
[PRIMARY KEY]
*job data* - data related executing the job (eg: the rsl in GRAM)
*submitted, completed time*
*completed status* - whether the job was successfull or ran in to errors
etc.
*metadata* - custom field to add anything user wants

Your feedback is most welcome. The API related changes will also be
discussed once we have a proper data structure. We are hoping to implement
this within next few days.

Thanks,
Saminda

Re: Persisting GFac job data

Posted by Saminda Wijeratne <sa...@gmail.com>.

+1. Can we use the registry migration tool to copy the data from the
Gram_data table to GFac_job_data table? we can keep null to the new fields.
and then delete the gram_data table? I can deprecate the functions related
to gram table in the registry API and change the implementation of it to
point to gfac_job_data table.


On Tue, May 21, 2013 at 11:14 AM, Chathuri Wimalasena
<ka...@gmail.com>wrote:

> If we are going to get rid of the Gram_Data table later, we should find
> possible ways to migrate data from that table to new table (GFac_Job_Data).
> Since Gram_Data table does not have all the details that are specified in
> the new table, there is no way we can retrieve  submitted and completed
> time related information.
>
> Regards,
> Chathuri
>
>
> On Tue, May 21, 2013 at 11:04 AM, Saminda Wijeratne <samindaw@gmail.com
> >wrote:
>
> > It has being apparent more and more that saving the data related to
> > executing a jobs from the GFac can be useful for many reasons such as,
> >
> > debugging
> > retrying
> > to make smart decisions on reliability/cost etc.
> > statistical analysis
> >
> > Thus we thought of saving the data related to GFac jobs in the registry
> in
> > order to facilitate feature such as above in the future.
> >
> > However a GFac job is potentially any sort of computing resource access
> > (GRAM/UNICORE/EC2 etc.). Therefore we need to come up with a generalized
> > data structure that can hold the data of any type of resource. Following
> > are the suggested data to save for a single GFac job execution,
> >
> > *experiment id, workflow instance id, node id* - pinpoint the node
> > execution
> > *service, host, application description ids *- pinpoint the descriptors
> > responsible
> > *local job id* - the unique job id retrieved/generated per execution
> > [PRIMARY KEY]
> > *job data* - data related executing the job (eg: the rsl in GRAM)
> > *submitted, completed time*
> > *completed status* - whether the job was successfull or ran in to errors
> > etc.
> > *metadata* - custom field to add anything user wants
> >
> > Your feedback is most welcome. The API related changes will also be
> > discussed once we have a proper data structure. We are hoping to
> implement
> > this within next few days.
> >
> > Thanks,
> > Saminda
> >
>

Re: Persisting GFac job data

Posted by Chathuri Wimalasena <ka...@gmail.com>.

If we are going to get rid of the Gram_Data table later, we should find
possible ways to migrate data from that table to new table (GFac_Job_Data).
Since Gram_Data table does not have all the details that are specified in
the new table, there is no way we can retrieve  submitted and completed
time related information.

Regards,
Chathuri


On Tue, May 21, 2013 at 11:04 AM, Saminda Wijeratne <sa...@gmail.com>wrote:

> It has being apparent more and more that saving the data related to
> executing a jobs from the GFac can be useful for many reasons such as,
>
> debugging
> retrying
> to make smart decisions on reliability/cost etc.
> statistical analysis
>
> Thus we thought of saving the data related to GFac jobs in the registry in
> order to facilitate feature such as above in the future.
>
> However a GFac job is potentially any sort of computing resource access
> (GRAM/UNICORE/EC2 etc.). Therefore we need to come up with a generalized
> data structure that can hold the data of any type of resource. Following
> are the suggested data to save for a single GFac job execution,
>
> *experiment id, workflow instance id, node id* - pinpoint the node
> execution
> *service, host, application description ids *- pinpoint the descriptors
> responsible
> *local job id* - the unique job id retrieved/generated per execution
> [PRIMARY KEY]
> *job data* - data related executing the job (eg: the rsl in GRAM)
> *submitted, completed time*
> *completed status* - whether the job was successfull or ran in to errors
> etc.
> *metadata* - custom field to add anything user wants
>
> Your feedback is most welcome. The API related changes will also be
> discussed once we have a proper data structure. We are hoping to implement
> this within next few days.
>
> Thanks,
> Saminda
>

Re: Persisting GFac job data

Posted by Danushka Menikkumbura <da...@gmail.com>.

Ideally what we are trying to achieve here is similar to what log4j does.
In essence, any entity should be able to define its persistence agent
object and call its methods (add, get, modify, delete) as it wishes. Thus,
it could be WFI, GFac or any individual entity.

In OSGi world for an instance, it would be a service where any entity could
subscribe to and make use of. Even without being in an OSGi environment,
implementing a service layer is not that hard.

Thanks,
Danushka


On Wed, May 22, 2013 at 2:17 AM, Amila Jayasekara
<th...@gmail.com>wrote:

> I think that should be handled at a more upper layer like Workflow
> Interpretter or GFac. In FT perspective it is better if providers are
> stateless. One reason is we dont have control over some providers and and
> there will be many places writing to disk if we implement the persistence
> logic at provider level.
>
> Thanks
> Amila
>
>
> On Tue, May 21, 2013 at 4:39 PM, Saminda Wijeratne <samindaw@gmail.com
> >wrote:
>
> > On Tue, May 21, 2013 at 4:36 PM, Amila Jayasekara
> > <th...@gmail.com>wrote:
> >
> > > On Tue, May 21, 2013 at 3:51 PM, Saminda Wijeratne <samindaw@gmail.com
> > > >wrote:
> > >
> > > > Thanks for the feedback Amila. a few comments inline
> > > >
> > > >
> > > > On Tue, May 21, 2013 at 12:29 PM, Amila Jayasekara
> > > > <th...@gmail.com>wrote:
> > > >
> > > > > Hi Saminda,
> > > > >
> > > > > Great suggestion. Also +1 for Dhanushka's proposal to have
> > > > > serialize/de-serilized data.
> > > > > Few suggestions,
> > > > > 1. In addition to successful/error statuses we need other status
> for
> > > > nodes
> > > > > & workflows
> > > > > and workflows.
> > > > > E . g :-
> > > > >    node - started, submitted, in-progress, failed, successful etc
> ...
> > > > >
> > > > Sorry if I was too vague. Yes we have more fine-grain statuses for
> > > workflow
> > > > and node[1]. We will have a much fine-grained level of granuality
> for a
> > > > GFacJob status.
> > > >     public static enum GFacJobStatus{
> > > >         SUBMITTED, //job is submitted, possibly waiting to start
> > > executing
> > > >         EXECUTING, //submitted job is being executed
> > > >         CANCELLED, //job was cancelled
> > > >         PAUSED, //job was paused
> > > >         WAITING_FOR_DATA, // job is waiting for data to continue
> > > executing
> > > >         FAILED, // error occurred while job was executing and the job
> > > > stopped
> > > >         FINISHED, // job completed successfully
> > > >         UNKNOWN // unknown status. lookup the metadata for more
> > details.
> > > >     }
> > > >
> > > >
> > > > 2. This data will be useful in implementing FT and Load Balancing in
> > each
> > > > > component. Sometime back we had discussions to make GFac stateless.
> > So
> > > > who
> > > > > is going to populate this data structure and persist it ?
> > > > >
> > > > That is a very good question... :). This summer is going to be a long
> > > > one... ;)
> > > >
> > >
> > > What I meant is which component is doing persistence ? (GFac or WF
> > > Interpretter). Not the actual person who is going to implement it :).
> > >
> > hih hih....
> > Well its going to be whatever the provider respondible for managing the
> job
> > lifecycle. For example GRAMProvider should be responsible for recording
> all
> > the data relating to the GRAM jobs its working with.
> >
> > >
> > >
> > > >
> > > > 1.
> > > >
> > > >
> > >
> >
> https://svn.apache.org/repos/asf/airavata/trunk/modules/workflow-model/workflow-model-core/src/main/java/org/apache/airavata/workflow/model/graph/Node.java
> > > >
> > > > >
> > > > > Thanks
> > > > > Amila
> > > > >
> > > > >
> > > > > On Tue, May 21, 2013 at 11:39 AM, Saminda Wijeratne <
> > > samindaw@gmail.com
> > > > > >wrote:
> > > > >
> > > > > > Thats is an excellent idea. We can have the job data field to be
> > the
> > > > > > designated GFac job serialized data. The whatever GFacProvider
> > should
> > > > > > adhere to it.
> > > > > >
> > > > > > I'm still inclined to have the rest of the fields to ease of
> > querying
> > > > for
> > > > > > the required data. For example if we wanted all attempts on
> > executing
> > > > > for a
> > > > > > particular node of a workflow or if we wanted to know which
> > > application
> > > > > > descriptions are faster in execution or more reliable etc. we can
> > let
> > > > the
> > > > > > query language deal with it. wdyt?
> > > > > >
> > > > > >
> > > > > > On Tue, May 21, 2013 at 11:24 AM, Danushka Menikkumbura <
> > > > > > danushka.menikkumbura@gmail.com> wrote:
> > > > > >
> > > > > > > Saminda,
> > > > > > >
> > > > > > > I think the data container does not need to have a generic
> > format.
> > > We
> > > > > can
> > > > > > > have a base class that facilitate object
> > > > serialization/deserialization
> > > > > > and
> > > > > > > let specific meta data structure implement them as required. We
> > get
> > > > the
> > > > > > > Registry API to serialize objects and save them in a meta data
> > > table
> > > > > > (with
> > > > > > > just two columns?) and to deserialize as they are loaded off
> the
> > > > > > registry.
> > > > > > >
> > > > > > > Danushka
> > > > > > >
> > > > > > >
> > > > > > > On Tue, May 21, 2013 at 8:34 PM, Saminda Wijeratne <
> > > > samindaw@gmail.com
> > > > > > > >wrote:
> > > > > > >
> > > > > > > > It has being apparent more and more that saving the data
> > related
> > > to
> > > > > > > > executing a jobs from the GFac can be useful for many reasons
> > > such
> > > > > as,
> > > > > > > >
> > > > > > > > debugging
> > > > > > > > retrying
> > > > > > > > to make smart decisions on reliability/cost etc.
> > > > > > > > statistical analysis
> > > > > > > >
> > > > > > > > Thus we thought of saving the data related to GFac jobs in
> the
> > > > > registry
> > > > > > > in
> > > > > > > > order to facilitate feature such as above in the future.
> > > > > > > >
> > > > > > > > However a GFac job is potentially any sort of computing
> > resource
> > > > > access
> > > > > > > > (GRAM/UNICORE/EC2 etc.). Therefore we need to come up with a
> > > > > > generalized
> > > > > > > > data structure that can hold the data of any type of
> resource.
> > > > > > Following
> > > > > > > > are the suggested data to save for a single GFac job
> execution,
> > > > > > > >
> > > > > > > > *experiment id, workflow instance id, node id* - pinpoint the
> > > node
> > > > > > > > execution
> > > > > > > > *service, host, application description ids *- pinpoint the
> > > > > descriptors
> > > > > > > > responsible
> > > > > > > > *local job id* - the unique job id retrieved/generated per
> > > > execution
> > > > > > > > [PRIMARY KEY]
> > > > > > > > *job data* - data related executing the job (eg: the rsl in
> > GRAM)
> > > > > > > > *submitted, completed time*
> > > > > > > > *completed status* - whether the job was successfull or ran
> in
> > to
> > > > > > errors
> > > > > > > > etc.
> > > > > > > > *metadata* - custom field to add anything user wants
> > > > > > > >
> > > > > > > > Your feedback is most welcome. The API related changes will
> > also
> > > be
> > > > > > > > discussed once we have a proper data structure. We are hoping
> > to
> > > > > > > implement
> > > > > > > > this within next few days.
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Saminda
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: Persisting GFac job data

Posted by Saminda Wijeratne <sa...@gmail.com>.

On Wed, May 22, 2013 at 10:11 AM, Raminder Singh
<ra...@gmail.com>wrote:

> These look good to me.  Can you please explain usage of
> getGFacJobsFromDescriptors method?  How is this different from getting the
> descriptors from registry? and who should register this data?
>
Seems the function name is confusing.
This is to filter all the gfac jobs saved in the registry using the
descriptors they correspond to. For example if we want to decide whats the
fastest result computational resource to execute for Gaussian Service
descriptor, we can make a calculated decision from looking at the past
records by filtering all the gfac jobs that corresponds to this service
descriptor and looking at the duration of execution to see which gfac job
took least time to complete. As you can see its not a common usecase for
everyday user.

>
> Also a typo in updateGFacJobMetadta method name.
>
oops... Thanks Raminder.

>
> Thanks
> Raminder
>
> On May 21, 2013, at 11:28 PM, Saminda Wijeratne wrote:
>
> > Following API functions are added for the ProvenanceManager[2],
> >
> > boolean isGFacJobExists(String gfacJobId)
> > void addGFacJob(GFacJob job)
> > void updateGFacJob(GFacJob job)
> > void updateGFacJobStatus(String gfacJobId, GFacJobStatus status)
> > void updateGFacJobData(String gfacJobId, String jobdata)
> > void updateGFacJobSubmittedTime(String gfacJobId, Date submitted)
> > void updateGFacJobCompletedTime(String gfacJobId, Date completed)
> > void updateGFacJobMetadta(String gfacJobId, String metadata)
> > GFacJob getGFacJob(String gfacJobId)
> > List<GFacJob> getGFacJobsForDescriptors(String serviceDescriptionId,
> String
> > hostDescriptionId, String applicationDescriptionId)
> > List<GFacJob> getGFacJobs(String experimentId, String
> workflowExecutionId,
> > String nodeId)
> >
> > Thoughts are welcome!!!
> >
> >
> > 2.
> >
> https://svn.apache.org/repos/asf/airavata/trunk/modules/airavata-client/src/main/java/org/apache/airavata/client/api/ProvenanceManager.java
> >
> >
> > On Tue, May 21, 2013 at 5:04 PM, Saminda Wijeratne <samindaw@gmail.com
> >wrote:
> >
> >> But I thought the providers are part of the GFac (not as a separate
> >> service). If not then the providers should report to GFac. Orelse there
> is
> >> no way the GFac knows what status to update which data to update etc.
> Does
> >> the current GFac implementation support this?
> >>
> >>
> >> On Tue, May 21, 2013 at 4:47 PM, Amila Jayasekara <
> thejaka.amila@gmail.com
> >>> wrote:
> >>
> >>> I think that should be handled at a more upper layer like Workflow
> >>> Interpretter or GFac. In FT perspective it is better if providers are
> >>> stateless. One reason is we dont have control over some providers and
> and
> >>> there will be many places writing to disk if we implement the
> persistence
> >>> logic at provider level.
> >>>
> >>> Thanks
> >>> Amila
> >>>
> >>>
> >>> On Tue, May 21, 2013 at 4:39 PM, Saminda Wijeratne <samindaw@gmail.com
> >>>> wrote:
> >>>
> >>>> On Tue, May 21, 2013 at 4:36 PM, Amila Jayasekara
> >>>> <th...@gmail.com>wrote:
> >>>>
> >>>>> On Tue, May 21, 2013 at 3:51 PM, Saminda Wijeratne <
> >>> samindaw@gmail.com
> >>>>>> wrote:
> >>>>>
> >>>>>> Thanks for the feedback Amila. a few comments inline
> >>>>>>
> >>>>>>
> >>>>>> On Tue, May 21, 2013 at 12:29 PM, Amila Jayasekara
> >>>>>> <th...@gmail.com>wrote:
> >>>>>>
> >>>>>>> Hi Saminda,
> >>>>>>>
> >>>>>>> Great suggestion. Also +1 for Dhanushka's proposal to have
> >>>>>>> serialize/de-serilized data.
> >>>>>>> Few suggestions,
> >>>>>>> 1. In addition to successful/error statuses we need other status
> >>> for
> >>>>>> nodes
> >>>>>>> & workflows
> >>>>>>> and workflows.
> >>>>>>> E . g :-
> >>>>>>>   node - started, submitted, in-progress, failed, successful etc
> >>> ...
> >>>>>>>
> >>>>>> Sorry if I was too vague. Yes we have more fine-grain statuses for
> >>>>> workflow
> >>>>>> and node[1]. We will have a much fine-grained level of granuality
> >>> for a
> >>>>>> GFacJob status.
> >>>>>>    public static enum GFacJobStatus{
> >>>>>>        SUBMITTED, //job is submitted, possibly waiting to start
> >>>>> executing
> >>>>>>        EXECUTING, //submitted job is being executed
> >>>>>>        CANCELLED, //job was cancelled
> >>>>>>        PAUSED, //job was paused
> >>>>>>        WAITING_FOR_DATA, // job is waiting for data to continue
> >>>>> executing
> >>>>>>        FAILED, // error occurred while job was executing and the
> >>> job
> >>>>>> stopped
> >>>>>>        FINISHED, // job completed successfully
> >>>>>>        UNKNOWN // unknown status. lookup the metadata for more
> >>>> details.
> >>>>>>    }
> >>>>>>
> >>>>>>
> >>>>>> 2. This data will be useful in implementing FT and Load Balancing in
> >>>> each
> >>>>>>> component. Sometime back we had discussions to make GFac
> >>> stateless.
> >>>> So
> >>>>>> who
> >>>>>>> is going to populate this data structure and persist it ?
> >>>>>>>
> >>>>>> That is a very good question... :). This summer is going to be a
> >>> long
> >>>>>> one... ;)
> >>>>>>
> >>>>>
> >>>>> What I meant is which component is doing persistence ? (GFac or WF
> >>>>> Interpretter). Not the actual person who is going to implement it :).
> >>>>>
> >>>> hih hih....
> >>>> Well its going to be whatever the provider respondible for managing
> the
> >>> job
> >>>> lifecycle. For example GRAMProvider should be responsible for
> recording
> >>> all
> >>>> the data relating to the GRAM jobs its working with.
> >>>>
> >>>>>
> >>>>>
> >>>>>>
> >>>>>> 1.
> >>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>>
> https://svn.apache.org/repos/asf/airavata/trunk/modules/workflow-model/workflow-model-core/src/main/java/org/apache/airavata/workflow/model/graph/Node.java
> >>>>>>
> >>>>>>>
> >>>>>>> Thanks
> >>>>>>> Amila
> >>>>>>>
> >>>>>>>
> >>>>>>> On Tue, May 21, 2013 at 11:39 AM, Saminda Wijeratne <
> >>>>> samindaw@gmail.com
> >>>>>>>> wrote:
> >>>>>>>
> >>>>>>>> Thats is an excellent idea. We can have the job data field to be
> >>>> the
> >>>>>>>> designated GFac job serialized data. The whatever GFacProvider
> >>>> should
> >>>>>>>> adhere to it.
> >>>>>>>>
> >>>>>>>> I'm still inclined to have the rest of the fields to ease of
> >>>> querying
> >>>>>> for
> >>>>>>>> the required data. For example if we wanted all attempts on
> >>>> executing
> >>>>>>> for a
> >>>>>>>> particular node of a workflow or if we wanted to know which
> >>>>> application
> >>>>>>>> descriptions are faster in execution or more reliable etc. we
> >>> can
> >>>> let
> >>>>>> the
> >>>>>>>> query language deal with it. wdyt?
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On Tue, May 21, 2013 at 11:24 AM, Danushka Menikkumbura <
> >>>>>>>> danushka.menikkumbura@gmail.com> wrote:
> >>>>>>>>
> >>>>>>>>> Saminda,
> >>>>>>>>>
> >>>>>>>>> I think the data container does not need to have a generic
> >>>> format.
> >>>>> We
> >>>>>>> can
> >>>>>>>>> have a base class that facilitate object
> >>>>>> serialization/deserialization
> >>>>>>>> and
> >>>>>>>>> let specific meta data structure implement them as required.
> >>> We
> >>>> get
> >>>>>> the
> >>>>>>>>> Registry API to serialize objects and save them in a meta data
> >>>>> table
> >>>>>>>> (with
> >>>>>>>>> just two columns?) and to deserialize as they are loaded off
> >>> the
> >>>>>>>> registry.
> >>>>>>>>>
> >>>>>>>>> Danushka
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On Tue, May 21, 2013 at 8:34 PM, Saminda Wijeratne <
> >>>>>> samindaw@gmail.com
> >>>>>>>>>> wrote:
> >>>>>>>>>
> >>>>>>>>>> It has being apparent more and more that saving the data
> >>>> related
> >>>>> to
> >>>>>>>>>> executing a jobs from the GFac can be useful for many
> >>> reasons
> >>>>> such
> >>>>>>> as,
> >>>>>>>>>>
> >>>>>>>>>> debugging
> >>>>>>>>>> retrying
> >>>>>>>>>> to make smart decisions on reliability/cost etc.
> >>>>>>>>>> statistical analysis
> >>>>>>>>>>
> >>>>>>>>>> Thus we thought of saving the data related to GFac jobs in
> >>> the
> >>>>>>> registry
> >>>>>>>>> in
> >>>>>>>>>> order to facilitate feature such as above in the future.
> >>>>>>>>>>
> >>>>>>>>>> However a GFac job is potentially any sort of computing
> >>>> resource
> >>>>>>> access
> >>>>>>>>>> (GRAM/UNICORE/EC2 etc.). Therefore we need to come up with a
> >>>>>>>> generalized
> >>>>>>>>>> data structure that can hold the data of any type of
> >>> resource.
> >>>>>>>> Following
> >>>>>>>>>> are the suggested data to save for a single GFac job
> >>> execution,
> >>>>>>>>>>
> >>>>>>>>>> *experiment id, workflow instance id, node id* - pinpoint
> >>> the
> >>>>> node
> >>>>>>>>>> execution
> >>>>>>>>>> *service, host, application description ids *- pinpoint the
> >>>>>>> descriptors
> >>>>>>>>>> responsible
> >>>>>>>>>> *local job id* - the unique job id retrieved/generated per
> >>>>>> execution
> >>>>>>>>>> [PRIMARY KEY]
> >>>>>>>>>> *job data* - data related executing the job (eg: the rsl in
> >>>> GRAM)
> >>>>>>>>>> *submitted, completed time*
> >>>>>>>>>> *completed status* - whether the job was successfull or ran
> >>> in
> >>>> to
> >>>>>>>> errors
> >>>>>>>>>> etc.
> >>>>>>>>>> *metadata* - custom field to add anything user wants
> >>>>>>>>>>
> >>>>>>>>>> Your feedback is most welcome. The API related changes will
> >>>> also
> >>>>> be
> >>>>>>>>>> discussed once we have a proper data structure. We are
> >>> hoping
> >>>> to
> >>>>>>>>> implement
> >>>>>>>>>> this within next few days.
> >>>>>>>>>>
> >>>>>>>>>> Thanks,
> >>>>>>>>>> Saminda
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>>
> >>
> >>
>
>

Re: Persisting GFac job data

Posted by Raminder Singh <ra...@gmail.com>.

These look good to me.  Can you please explain usage of getGFacJobsFromDescriptors method?  How is this different from getting the descriptors from registry? and who should register this data?

Also a typo in updateGFacJobMetadta method name. 

Thanks
Raminder
 
On May 21, 2013, at 11:28 PM, Saminda Wijeratne wrote:

> Following API functions are added for the ProvenanceManager[2],
> 
> boolean isGFacJobExists(String gfacJobId)
> void addGFacJob(GFacJob job)
> void updateGFacJob(GFacJob job)
> void updateGFacJobStatus(String gfacJobId, GFacJobStatus status)
> void updateGFacJobData(String gfacJobId, String jobdata)
> void updateGFacJobSubmittedTime(String gfacJobId, Date submitted)
> void updateGFacJobCompletedTime(String gfacJobId, Date completed)
> void updateGFacJobMetadta(String gfacJobId, String metadata)
> GFacJob getGFacJob(String gfacJobId)
> List<GFacJob> getGFacJobsForDescriptors(String serviceDescriptionId, String
> hostDescriptionId, String applicationDescriptionId)
> List<GFacJob> getGFacJobs(String experimentId, String workflowExecutionId,
> String nodeId)
> 
> Thoughts are welcome!!!
> 
> 
> 2.
> https://svn.apache.org/repos/asf/airavata/trunk/modules/airavata-client/src/main/java/org/apache/airavata/client/api/ProvenanceManager.java
> 
> 
> On Tue, May 21, 2013 at 5:04 PM, Saminda Wijeratne <sa...@gmail.com>wrote:
> 
>> But I thought the providers are part of the GFac (not as a separate
>> service). If not then the providers should report to GFac. Orelse there is
>> no way the GFac knows what status to update which data to update etc. Does
>> the current GFac implementation support this?
>> 
>> 
>> On Tue, May 21, 2013 at 4:47 PM, Amila Jayasekara <thejaka.amila@gmail.com
>>> wrote:
>> 
>>> I think that should be handled at a more upper layer like Workflow
>>> Interpretter or GFac. In FT perspective it is better if providers are
>>> stateless. One reason is we dont have control over some providers and and
>>> there will be many places writing to disk if we implement the persistence
>>> logic at provider level.
>>> 
>>> Thanks
>>> Amila
>>> 
>>> 
>>> On Tue, May 21, 2013 at 4:39 PM, Saminda Wijeratne <samindaw@gmail.com
>>>> wrote:
>>> 
>>>> On Tue, May 21, 2013 at 4:36 PM, Amila Jayasekara
>>>> <th...@gmail.com>wrote:
>>>> 
>>>>> On Tue, May 21, 2013 at 3:51 PM, Saminda Wijeratne <
>>> samindaw@gmail.com
>>>>>> wrote:
>>>>> 
>>>>>> Thanks for the feedback Amila. a few comments inline
>>>>>> 
>>>>>> 
>>>>>> On Tue, May 21, 2013 at 12:29 PM, Amila Jayasekara
>>>>>> <th...@gmail.com>wrote:
>>>>>> 
>>>>>>> Hi Saminda,
>>>>>>> 
>>>>>>> Great suggestion. Also +1 for Dhanushka's proposal to have
>>>>>>> serialize/de-serilized data.
>>>>>>> Few suggestions,
>>>>>>> 1. In addition to successful/error statuses we need other status
>>> for
>>>>>> nodes
>>>>>>> & workflows
>>>>>>> and workflows.
>>>>>>> E . g :-
>>>>>>>   node - started, submitted, in-progress, failed, successful etc
>>> ...
>>>>>>> 
>>>>>> Sorry if I was too vague. Yes we have more fine-grain statuses for
>>>>> workflow
>>>>>> and node[1]. We will have a much fine-grained level of granuality
>>> for a
>>>>>> GFacJob status.
>>>>>>    public static enum GFacJobStatus{
>>>>>>        SUBMITTED, //job is submitted, possibly waiting to start
>>>>> executing
>>>>>>        EXECUTING, //submitted job is being executed
>>>>>>        CANCELLED, //job was cancelled
>>>>>>        PAUSED, //job was paused
>>>>>>        WAITING_FOR_DATA, // job is waiting for data to continue
>>>>> executing
>>>>>>        FAILED, // error occurred while job was executing and the
>>> job
>>>>>> stopped
>>>>>>        FINISHED, // job completed successfully
>>>>>>        UNKNOWN // unknown status. lookup the metadata for more
>>>> details.
>>>>>>    }
>>>>>> 
>>>>>> 
>>>>>> 2. This data will be useful in implementing FT and Load Balancing in
>>>> each
>>>>>>> component. Sometime back we had discussions to make GFac
>>> stateless.
>>>> So
>>>>>> who
>>>>>>> is going to populate this data structure and persist it ?
>>>>>>> 
>>>>>> That is a very good question... :). This summer is going to be a
>>> long
>>>>>> one... ;)
>>>>>> 
>>>>> 
>>>>> What I meant is which component is doing persistence ? (GFac or WF
>>>>> Interpretter). Not the actual person who is going to implement it :).
>>>>> 
>>>> hih hih....
>>>> Well its going to be whatever the provider respondible for managing the
>>> job
>>>> lifecycle. For example GRAMProvider should be responsible for recording
>>> all
>>>> the data relating to the GRAM jobs its working with.
>>>> 
>>>>> 
>>>>> 
>>>>>> 
>>>>>> 1.
>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>> https://svn.apache.org/repos/asf/airavata/trunk/modules/workflow-model/workflow-model-core/src/main/java/org/apache/airavata/workflow/model/graph/Node.java
>>>>>> 
>>>>>>> 
>>>>>>> Thanks
>>>>>>> Amila
>>>>>>> 
>>>>>>> 
>>>>>>> On Tue, May 21, 2013 at 11:39 AM, Saminda Wijeratne <
>>>>> samindaw@gmail.com
>>>>>>>> wrote:
>>>>>>> 
>>>>>>>> Thats is an excellent idea. We can have the job data field to be
>>>> the
>>>>>>>> designated GFac job serialized data. The whatever GFacProvider
>>>> should
>>>>>>>> adhere to it.
>>>>>>>> 
>>>>>>>> I'm still inclined to have the rest of the fields to ease of
>>>> querying
>>>>>> for
>>>>>>>> the required data. For example if we wanted all attempts on
>>>> executing
>>>>>>> for a
>>>>>>>> particular node of a workflow or if we wanted to know which
>>>>> application
>>>>>>>> descriptions are faster in execution or more reliable etc. we
>>> can
>>>> let
>>>>>> the
>>>>>>>> query language deal with it. wdyt?
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Tue, May 21, 2013 at 11:24 AM, Danushka Menikkumbura <
>>>>>>>> danushka.menikkumbura@gmail.com> wrote:
>>>>>>>> 
>>>>>>>>> Saminda,
>>>>>>>>> 
>>>>>>>>> I think the data container does not need to have a generic
>>>> format.
>>>>> We
>>>>>>> can
>>>>>>>>> have a base class that facilitate object
>>>>>> serialization/deserialization
>>>>>>>> and
>>>>>>>>> let specific meta data structure implement them as required.
>>> We
>>>> get
>>>>>> the
>>>>>>>>> Registry API to serialize objects and save them in a meta data
>>>>> table
>>>>>>>> (with
>>>>>>>>> just two columns?) and to deserialize as they are loaded off
>>> the
>>>>>>>> registry.
>>>>>>>>> 
>>>>>>>>> Danushka
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> On Tue, May 21, 2013 at 8:34 PM, Saminda Wijeratne <
>>>>>> samindaw@gmail.com
>>>>>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>>> It has being apparent more and more that saving the data
>>>> related
>>>>> to
>>>>>>>>>> executing a jobs from the GFac can be useful for many
>>> reasons
>>>>> such
>>>>>>> as,
>>>>>>>>>> 
>>>>>>>>>> debugging
>>>>>>>>>> retrying
>>>>>>>>>> to make smart decisions on reliability/cost etc.
>>>>>>>>>> statistical analysis
>>>>>>>>>> 
>>>>>>>>>> Thus we thought of saving the data related to GFac jobs in
>>> the
>>>>>>> registry
>>>>>>>>> in
>>>>>>>>>> order to facilitate feature such as above in the future.
>>>>>>>>>> 
>>>>>>>>>> However a GFac job is potentially any sort of computing
>>>> resource
>>>>>>> access
>>>>>>>>>> (GRAM/UNICORE/EC2 etc.). Therefore we need to come up with a
>>>>>>>> generalized
>>>>>>>>>> data structure that can hold the data of any type of
>>> resource.
>>>>>>>> Following
>>>>>>>>>> are the suggested data to save for a single GFac job
>>> execution,
>>>>>>>>>> 
>>>>>>>>>> *experiment id, workflow instance id, node id* - pinpoint
>>> the
>>>>> node
>>>>>>>>>> execution
>>>>>>>>>> *service, host, application description ids *- pinpoint the
>>>>>>> descriptors
>>>>>>>>>> responsible
>>>>>>>>>> *local job id* - the unique job id retrieved/generated per
>>>>>> execution
>>>>>>>>>> [PRIMARY KEY]
>>>>>>>>>> *job data* - data related executing the job (eg: the rsl in
>>>> GRAM)
>>>>>>>>>> *submitted, completed time*
>>>>>>>>>> *completed status* - whether the job was successfull or ran
>>> in
>>>> to
>>>>>>>> errors
>>>>>>>>>> etc.
>>>>>>>>>> *metadata* - custom field to add anything user wants
>>>>>>>>>> 
>>>>>>>>>> Your feedback is most welcome. The API related changes will
>>>> also
>>>>> be
>>>>>>>>>> discussed once we have a proper data structure. We are
>>> hoping
>>>> to
>>>>>>>>> implement
>>>>>>>>>> this within next few days.
>>>>>>>>>> 
>>>>>>>>>> Thanks,
>>>>>>>>>> Saminda
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>> 
>> 
>>

Re: Persisting GFac job data

Posted by Saminda Wijeratne <sa...@gmail.com>.

I'm thinking of injecting code to save GFac job data from our Airavata code
itself. So far it seems only at the Provider level this is possible since
the required data is only available at that point.
For instance as I see, to record gram data I need to have code in the
execute function in GramProvider class where jobid becomes available along
with other data. Is it really the place or is there a generic place where
we can do this for all providers?


On Thu, May 30, 2013 at 2:23 PM, Saminda Wijeratne <sa...@gmail.com>wrote:

> I updated the names in both the API function sets for GFacJobData and
> GFacJobErrorData in ExecutionManager and ProvenanceManager. If anyone has
> being using these functions outside of the Airavata trunk please update
> your code to reflect this change. Basically change "GFac" to "Application"
>
> Thanks,
> Saminda
>
>
> On Wed, May 29, 2013 at 5:19 PM, Saminda Wijeratne <sa...@gmail.com>wrote:
>
>> Hi Guys,
>>
>> Since there is no objection for the suggested name pattern (
>> addApplicationJob(...)) and since we are sort of running short of time
>> we are going ahead of with that name.
>>
>> If you a much better suggestion please respond within today or tomorrow
>> so that we can incorporate any changes for 0.8 release without delay.
>>
>> Thanks,
>> Saminda
>>
>>
>> On Wed, May 22, 2013 at 11:02 PM, Saminda Wijeratne <sa...@gmail.com>wrote:
>>
>>> Application. But in our case we may have to use both. eg:
>>> addApplicationJob(...) or addApplicationSubmission(...). The name addApplication(...)
>>> is misleading I think. wdyt?
>>>
>>>
>>> On Wed, May 22, 2013 at 1:43 PM, Amila Jayasekara <
>>> thejaka.amila@gmail.com> wrote:
>>>
>>>> What is more familiar ? "Application" or "Job" ?
>>>>
>>>> Thanks
>>>> Amila
>>>>
>>>>
>>>> On Wed, May 22, 2013 at 11:28 AM, Saminda Wijeratne <samindaw@gmail.com
>>>> >wrote:
>>>>
>>>> > On Wed, May 22, 2013 at 11:22 AM, Amila Jayasekara
>>>> > <th...@gmail.com>wrote:
>>>> >
>>>> > > I am bit concerned about the names. Are we assuming that API users
>>>> has
>>>> > > knowledge about GFac ?
>>>> > > OR else we can just remove "GFac" substring and have method names
>>>> like
>>>> > > "void
>>>> > > updateJobMetadta(..)"
>>>> > >
>>>> > You have a point there Amila. Perhaps we can name them as
>>>> "Application"
>>>> > rather than GFac since we already have the notion of an application
>>>> > descriptor in the API. wdyt?
>>>> >
>>>> >
>>>> > > Thanks
>>>> > > Amila
>>>> > >
>>>> > >
>>>> > > On Tue, May 21, 2013 at 11:28 PM, Saminda Wijeratne <
>>>> samindaw@gmail.com
>>>> > > >wrote:
>>>> > >
>>>> > > > Following API functions are added for the ProvenanceManager[2],
>>>> > > >
>>>> > > > boolean isGFacJobExists(String gfacJobId)
>>>> > > > void addGFacJob(GFacJob job)
>>>> > > > void updateGFacJob(GFacJob job)
>>>> > > > void updateGFacJobStatus(String gfacJobId, GFacJobStatus status)
>>>> > > > void updateGFacJobData(String gfacJobId, String jobdata)
>>>> > > > void updateGFacJobSubmittedTime(String gfacJobId, Date submitted)
>>>> > > > void updateGFacJobCompletedTime(String gfacJobId, Date completed)
>>>> > > > void updateGFacJobMetadta(String gfacJobId, String metadata)
>>>> > > > GFacJob getGFacJob(String gfacJobId)
>>>> > > > List<GFacJob> getGFacJobsForDescriptors(String
>>>> serviceDescriptionId,
>>>> > > String
>>>> > > > hostDescriptionId, String applicationDescriptionId)
>>>> > > > List<GFacJob> getGFacJobs(String experimentId, String
>>>> > > workflowExecutionId,
>>>> > > > String nodeId)
>>>> > > >
>>>> > > > Thoughts are welcome!!!
>>>> > > >
>>>> > > >
>>>> > > > 2.
>>>> > > >
>>>> > > >
>>>> > >
>>>> >
>>>> https://svn.apache.org/repos/asf/airavata/trunk/modules/airavata-client/src/main/java/org/apache/airavata/client/api/ProvenanceManager.java
>>>> > > >
>>>> > > >
>>>> > > > On Tue, May 21, 2013 at 5:04 PM, Saminda Wijeratne <
>>>> samindaw@gmail.com
>>>> > > > >wrote:
>>>> > > >
>>>> > > > > But I thought the providers are part of the GFac (not as a
>>>> separate
>>>> > > > > service). If not then the providers should report to GFac.
>>>> Orelse
>>>> > there
>>>> > > > is
>>>> > > > > no way the GFac knows what status to update which data to
>>>> update etc.
>>>> > > > Does
>>>> > > > > the current GFac implementation support this?
>>>> > > > >
>>>> > > > >
>>>> > > > > On Tue, May 21, 2013 at 4:47 PM, Amila Jayasekara <
>>>> > > > thejaka.amila@gmail.com
>>>> > > > > > wrote:
>>>> > > > >
>>>> > > > >> I think that should be handled at a more upper layer like
>>>> Workflow
>>>> > > > >> Interpretter or GFac. In FT perspective it is better if
>>>> providers
>>>> > are
>>>> > > > >> stateless. One reason is we dont have control over some
>>>> providers
>>>> > and
>>>> > > > and
>>>> > > > >> there will be many places writing to disk if we implement the
>>>> > > > persistence
>>>> > > > >> logic at provider level.
>>>> > > > >>
>>>> > > > >> Thanks
>>>> > > > >> Amila
>>>> > > > >>
>>>> > > > >>
>>>> > > > >> On Tue, May 21, 2013 at 4:39 PM, Saminda Wijeratne <
>>>> > > samindaw@gmail.com
>>>> > > > >> >wrote:
>>>> > > > >>
>>>> > > > >> > On Tue, May 21, 2013 at 4:36 PM, Amila Jayasekara
>>>> > > > >> > <th...@gmail.com>wrote:
>>>> > > > >> >
>>>> > > > >> > > On Tue, May 21, 2013 at 3:51 PM, Saminda Wijeratne <
>>>> > > > >> samindaw@gmail.com
>>>> > > > >> > > >wrote:
>>>> > > > >> > >
>>>> > > > >> > > > Thanks for the feedback Amila. a few comments inline
>>>> > > > >> > > >
>>>> > > > >> > > >
>>>> > > > >> > > > On Tue, May 21, 2013 at 12:29 PM, Amila Jayasekara
>>>> > > > >> > > > <th...@gmail.com>wrote:
>>>> > > > >> > > >
>>>> > > > >> > > > > Hi Saminda,
>>>> > > > >> > > > >
>>>> > > > >> > > > > Great suggestion. Also +1 for Dhanushka's proposal to
>>>> have
>>>> > > > >> > > > > serialize/de-serilized data.
>>>> > > > >> > > > > Few suggestions,
>>>> > > > >> > > > > 1. In addition to successful/error statuses we need
>>>> other
>>>> > > status
>>>> > > > >> for
>>>> > > > >> > > > nodes
>>>> > > > >> > > > > & workflows
>>>> > > > >> > > > > and workflows.
>>>> > > > >> > > > > E . g :-
>>>> > > > >> > > > >    node - started, submitted, in-progress, failed,
>>>> > successful
>>>> > > > etc
>>>> > > > >> ...
>>>> > > > >> > > > >
>>>> > > > >> > > > Sorry if I was too vague. Yes we have more fine-grain
>>>> statuses
>>>> > > for
>>>> > > > >> > > workflow
>>>> > > > >> > > > and node[1]. We will have a much fine-grained level of
>>>> > > granuality
>>>> > > > >> for a
>>>> > > > >> > > > GFacJob status.
>>>> > > > >> > > >     public static enum GFacJobStatus{
>>>> > > > >> > > >         SUBMITTED, //job is submitted, possibly waiting
>>>> to
>>>> > start
>>>> > > > >> > > executing
>>>> > > > >> > > >         EXECUTING, //submitted job is being executed
>>>> > > > >> > > >         CANCELLED, //job was cancelled
>>>> > > > >> > > >         PAUSED, //job was paused
>>>> > > > >> > > >         WAITING_FOR_DATA, // job is waiting for data to
>>>> > continue
>>>> > > > >> > > executing
>>>> > > > >> > > >         FAILED, // error occurred while job was
>>>> executing and
>>>> > > the
>>>> > > > >> job
>>>> > > > >> > > > stopped
>>>> > > > >> > > >         FINISHED, // job completed successfully
>>>> > > > >> > > >         UNKNOWN // unknown status. lookup the metadata
>>>> for
>>>> > more
>>>> > > > >> > details.
>>>> > > > >> > > >     }
>>>> > > > >> > > >
>>>> > > > >> > > >
>>>> > > > >> > > > 2. This data will be useful in implementing FT and Load
>>>> > > Balancing
>>>> > > > in
>>>> > > > >> > each
>>>> > > > >> > > > > component. Sometime back we had discussions to make
>>>> GFac
>>>> > > > >> stateless.
>>>> > > > >> > So
>>>> > > > >> > > > who
>>>> > > > >> > > > > is going to populate this data structure and persist
>>>> it ?
>>>> > > > >> > > > >
>>>> > > > >> > > > That is a very good question... :). This summer is going
>>>> to
>>>> > be a
>>>> > > > >> long
>>>> > > > >> > > > one... ;)
>>>> > > > >> > > >
>>>> > > > >> > >
>>>> > > > >> > > What I meant is which component is doing persistence ?
>>>> (GFac or
>>>> > WF
>>>> > > > >> > > Interpretter). Not the actual person who is going to
>>>> implement
>>>> > it
>>>> > > > :).
>>>> > > > >> > >
>>>> > > > >> > hih hih....
>>>> > > > >> > Well its going to be whatever the provider respondible for
>>>> > managing
>>>> > > > the
>>>> > > > >> job
>>>> > > > >> > lifecycle. For example GRAMProvider should be responsible for
>>>> > > > recording
>>>> > > > >> all
>>>> > > > >> > the data relating to the GRAM jobs its working with.
>>>> > > > >> >
>>>> > > > >> > >
>>>> > > > >> > >
>>>> > > > >> > > >
>>>> > > > >> > > > 1.
>>>> > > > >> > > >
>>>> > > > >> > > >
>>>> > > > >> > >
>>>> > > > >> >
>>>> > > > >>
>>>> > > >
>>>> > >
>>>> >
>>>> https://svn.apache.org/repos/asf/airavata/trunk/modules/workflow-model/workflow-model-core/src/main/java/org/apache/airavata/workflow/model/graph/Node.java
>>>> > > > >> > > >
>>>> > > > >> > > > >
>>>> > > > >> > > > > Thanks
>>>> > > > >> > > > > Amila
>>>> > > > >> > > > >
>>>> > > > >> > > > >
>>>> > > > >> > > > > On Tue, May 21, 2013 at 11:39 AM, Saminda Wijeratne <
>>>> > > > >> > > samindaw@gmail.com
>>>> > > > >> > > > > >wrote:
>>>> > > > >> > > > >
>>>> > > > >> > > > > > Thats is an excellent idea. We can have the job data
>>>> field
>>>> > > to
>>>> > > > be
>>>> > > > >> > the
>>>> > > > >> > > > > > designated GFac job serialized data. The whatever
>>>> > > GFacProvider
>>>> > > > >> > should
>>>> > > > >> > > > > > adhere to it.
>>>> > > > >> > > > > >
>>>> > > > >> > > > > > I'm still inclined to have the rest of the fields to
>>>> ease
>>>> > of
>>>> > > > >> > querying
>>>> > > > >> > > > for
>>>> > > > >> > > > > > the required data. For example if we wanted all
>>>> attempts
>>>> > on
>>>> > > > >> > executing
>>>> > > > >> > > > > for a
>>>> > > > >> > > > > > particular node of a workflow or if we wanted to know
>>>> > which
>>>> > > > >> > > application
>>>> > > > >> > > > > > descriptions are faster in execution or more
>>>> reliable etc.
>>>> > > we
>>>> > > > >> can
>>>> > > > >> > let
>>>> > > > >> > > > the
>>>> > > > >> > > > > > query language deal with it. wdyt?
>>>> > > > >> > > > > >
>>>> > > > >> > > > > >
>>>> > > > >> > > > > > On Tue, May 21, 2013 at 11:24 AM, Danushka
>>>> Menikkumbura <
>>>> > > > >> > > > > > danushka.menikkumbura@gmail.com> wrote:
>>>> > > > >> > > > > >
>>>> > > > >> > > > > > > Saminda,
>>>> > > > >> > > > > > >
>>>> > > > >> > > > > > > I think the data container does not need to have a
>>>> > generic
>>>> > > > >> > format.
>>>> > > > >> > > We
>>>> > > > >> > > > > can
>>>> > > > >> > > > > > > have a base class that facilitate object
>>>> > > > >> > > > serialization/deserialization
>>>> > > > >> > > > > > and
>>>> > > > >> > > > > > > let specific meta data structure implement them as
>>>> > > required.
>>>> > > > >> We
>>>> > > > >> > get
>>>> > > > >> > > > the
>>>> > > > >> > > > > > > Registry API to serialize objects and save them in
>>>> a
>>>> > meta
>>>> > > > data
>>>> > > > >> > > table
>>>> > > > >> > > > > > (with
>>>> > > > >> > > > > > > just two columns?) and to deserialize as they are
>>>> loaded
>>>> > > off
>>>> > > > >> the
>>>> > > > >> > > > > > registry.
>>>> > > > >> > > > > > >
>>>> > > > >> > > > > > > Danushka
>>>> > > > >> > > > > > >
>>>> > > > >> > > > > > >
>>>> > > > >> > > > > > > On Tue, May 21, 2013 at 8:34 PM, Saminda Wijeratne
>>>> <
>>>> > > > >> > > > samindaw@gmail.com
>>>> > > > >> > > > > > > >wrote:
>>>> > > > >> > > > > > >
>>>> > > > >> > > > > > > > It has being apparent more and more that saving
>>>> the
>>>> > data
>>>> > > > >> > related
>>>> > > > >> > > to
>>>> > > > >> > > > > > > > executing a jobs from the GFac can be useful for
>>>> many
>>>> > > > >> reasons
>>>> > > > >> > > such
>>>> > > > >> > > > > as,
>>>> > > > >> > > > > > > >
>>>> > > > >> > > > > > > > debugging
>>>> > > > >> > > > > > > > retrying
>>>> > > > >> > > > > > > > to make smart decisions on reliability/cost etc.
>>>> > > > >> > > > > > > > statistical analysis
>>>> > > > >> > > > > > > >
>>>> > > > >> > > > > > > > Thus we thought of saving the data related to
>>>> GFac
>>>> > jobs
>>>> > > in
>>>> > > > >> the
>>>> > > > >> > > > > registry
>>>> > > > >> > > > > > > in
>>>> > > > >> > > > > > > > order to facilitate feature such as above in the
>>>> > future.
>>>> > > > >> > > > > > > >
>>>> > > > >> > > > > > > > However a GFac job is potentially any sort of
>>>> > computing
>>>> > > > >> > resource
>>>> > > > >> > > > > access
>>>> > > > >> > > > > > > > (GRAM/UNICORE/EC2 etc.). Therefore we need to
>>>> come up
>>>> > > > with a
>>>> > > > >> > > > > > generalized
>>>> > > > >> > > > > > > > data structure that can hold the data of any
>>>> type of
>>>> > > > >> resource.
>>>> > > > >> > > > > > Following
>>>> > > > >> > > > > > > > are the suggested data to save for a single GFac
>>>> job
>>>> > > > >> execution,
>>>> > > > >> > > > > > > >
>>>> > > > >> > > > > > > > *experiment id, workflow instance id, node id* -
>>>> > > pinpoint
>>>> > > > >> the
>>>> > > > >> > > node
>>>> > > > >> > > > > > > > execution
>>>> > > > >> > > > > > > > *service, host, application description ids *-
>>>> > pinpoint
>>>> > > > the
>>>> > > > >> > > > > descriptors
>>>> > > > >> > > > > > > > responsible
>>>> > > > >> > > > > > > > *local job id* - the unique job id
>>>> retrieved/generated
>>>> > > per
>>>> > > > >> > > > execution
>>>> > > > >> > > > > > > > [PRIMARY KEY]
>>>> > > > >> > > > > > > > *job data* - data related executing the job (eg:
>>>> the
>>>> > rsl
>>>> > > > in
>>>> > > > >> > GRAM)
>>>> > > > >> > > > > > > > *submitted, completed time*
>>>> > > > >> > > > > > > > *completed status* - whether the job was
>>>> successfull
>>>> > or
>>>> > > > ran
>>>> > > > >> in
>>>> > > > >> > to
>>>> > > > >> > > > > > errors
>>>> > > > >> > > > > > > > etc.
>>>> > > > >> > > > > > > > *metadata* - custom field to add anything user
>>>> wants
>>>> > > > >> > > > > > > >
>>>> > > > >> > > > > > > > Your feedback is most welcome. The API related
>>>> changes
>>>> > > > will
>>>> > > > >> > also
>>>> > > > >> > > be
>>>> > > > >> > > > > > > > discussed once we have a proper data structure.
>>>> We are
>>>> > > > >> hoping
>>>> > > > >> > to
>>>> > > > >> > > > > > > implement
>>>> > > > >> > > > > > > > this within next few days.
>>>> > > > >> > > > > > > >
>>>> > > > >> > > > > > > > Thanks,
>>>> > > > >> > > > > > > > Saminda
>>>> > > > >> > > > > > > >
>>>> > > > >> > > > > > >
>>>> > > > >> > > > > >
>>>> > > > >> > > > >
>>>> > > > >> > > >
>>>> > > > >> > >
>>>> > > > >> >
>>>> > > > >>
>>>> > > > >
>>>> > > > >
>>>> > > >
>>>> > >
>>>> >
>>>>
>>>
>>>
>>
>

Re: Persisting GFac job data

Posted by Saminda Wijeratne <sa...@gmail.com>.

I updated the names in both the API function sets for GFacJobData and
GFacJobErrorData in ExecutionManager and ProvenanceManager. If anyone has
being using these functions outside of the Airavata trunk please update
your code to reflect this change. Basically change "GFac" to "Application"

Thanks,
Saminda


On Wed, May 29, 2013 at 5:19 PM, Saminda Wijeratne <sa...@gmail.com>wrote:

> Hi Guys,
>
> Since there is no objection for the suggested name pattern (
> addApplicationJob(...)) and since we are sort of running short of time we
> are going ahead of with that name.
>
> If you a much better suggestion please respond within today or tomorrow so
> that we can incorporate any changes for 0.8 release without delay.
>
> Thanks,
> Saminda
>
>
> On Wed, May 22, 2013 at 11:02 PM, Saminda Wijeratne <sa...@gmail.com>wrote:
>
>> Application. But in our case we may have to use both. eg:
>> addApplicationJob(...) or addApplicationSubmission(...). The name addApplication(...)
>> is misleading I think. wdyt?
>>
>>
>> On Wed, May 22, 2013 at 1:43 PM, Amila Jayasekara <
>> thejaka.amila@gmail.com> wrote:
>>
>>> What is more familiar ? "Application" or "Job" ?
>>>
>>> Thanks
>>> Amila
>>>
>>>
>>> On Wed, May 22, 2013 at 11:28 AM, Saminda Wijeratne <samindaw@gmail.com
>>> >wrote:
>>>
>>> > On Wed, May 22, 2013 at 11:22 AM, Amila Jayasekara
>>> > <th...@gmail.com>wrote:
>>> >
>>> > > I am bit concerned about the names. Are we assuming that API users
>>> has
>>> > > knowledge about GFac ?
>>> > > OR else we can just remove "GFac" substring and have method names
>>> like
>>> > > "void
>>> > > updateJobMetadta(..)"
>>> > >
>>> > You have a point there Amila. Perhaps we can name them as "Application"
>>> > rather than GFac since we already have the notion of an application
>>> > descriptor in the API. wdyt?
>>> >
>>> >
>>> > > Thanks
>>> > > Amila
>>> > >
>>> > >
>>> > > On Tue, May 21, 2013 at 11:28 PM, Saminda Wijeratne <
>>> samindaw@gmail.com
>>> > > >wrote:
>>> > >
>>> > > > Following API functions are added for the ProvenanceManager[2],
>>> > > >
>>> > > > boolean isGFacJobExists(String gfacJobId)
>>> > > > void addGFacJob(GFacJob job)
>>> > > > void updateGFacJob(GFacJob job)
>>> > > > void updateGFacJobStatus(String gfacJobId, GFacJobStatus status)
>>> > > > void updateGFacJobData(String gfacJobId, String jobdata)
>>> > > > void updateGFacJobSubmittedTime(String gfacJobId, Date submitted)
>>> > > > void updateGFacJobCompletedTime(String gfacJobId, Date completed)
>>> > > > void updateGFacJobMetadta(String gfacJobId, String metadata)
>>> > > > GFacJob getGFacJob(String gfacJobId)
>>> > > > List<GFacJob> getGFacJobsForDescriptors(String
>>> serviceDescriptionId,
>>> > > String
>>> > > > hostDescriptionId, String applicationDescriptionId)
>>> > > > List<GFacJob> getGFacJobs(String experimentId, String
>>> > > workflowExecutionId,
>>> > > > String nodeId)
>>> > > >
>>> > > > Thoughts are welcome!!!
>>> > > >
>>> > > >
>>> > > > 2.
>>> > > >
>>> > > >
>>> > >
>>> >
>>> https://svn.apache.org/repos/asf/airavata/trunk/modules/airavata-client/src/main/java/org/apache/airavata/client/api/ProvenanceManager.java
>>> > > >
>>> > > >
>>> > > > On Tue, May 21, 2013 at 5:04 PM, Saminda Wijeratne <
>>> samindaw@gmail.com
>>> > > > >wrote:
>>> > > >
>>> > > > > But I thought the providers are part of the GFac (not as a
>>> separate
>>> > > > > service). If not then the providers should report to GFac. Orelse
>>> > there
>>> > > > is
>>> > > > > no way the GFac knows what status to update which data to update
>>> etc.
>>> > > > Does
>>> > > > > the current GFac implementation support this?
>>> > > > >
>>> > > > >
>>> > > > > On Tue, May 21, 2013 at 4:47 PM, Amila Jayasekara <
>>> > > > thejaka.amila@gmail.com
>>> > > > > > wrote:
>>> > > > >
>>> > > > >> I think that should be handled at a more upper layer like
>>> Workflow
>>> > > > >> Interpretter or GFac. In FT perspective it is better if
>>> providers
>>> > are
>>> > > > >> stateless. One reason is we dont have control over some
>>> providers
>>> > and
>>> > > > and
>>> > > > >> there will be many places writing to disk if we implement the
>>> > > > persistence
>>> > > > >> logic at provider level.
>>> > > > >>
>>> > > > >> Thanks
>>> > > > >> Amila
>>> > > > >>
>>> > > > >>
>>> > > > >> On Tue, May 21, 2013 at 4:39 PM, Saminda Wijeratne <
>>> > > samindaw@gmail.com
>>> > > > >> >wrote:
>>> > > > >>
>>> > > > >> > On Tue, May 21, 2013 at 4:36 PM, Amila Jayasekara
>>> > > > >> > <th...@gmail.com>wrote:
>>> > > > >> >
>>> > > > >> > > On Tue, May 21, 2013 at 3:51 PM, Saminda Wijeratne <
>>> > > > >> samindaw@gmail.com
>>> > > > >> > > >wrote:
>>> > > > >> > >
>>> > > > >> > > > Thanks for the feedback Amila. a few comments inline
>>> > > > >> > > >
>>> > > > >> > > >
>>> > > > >> > > > On Tue, May 21, 2013 at 12:29 PM, Amila Jayasekara
>>> > > > >> > > > <th...@gmail.com>wrote:
>>> > > > >> > > >
>>> > > > >> > > > > Hi Saminda,
>>> > > > >> > > > >
>>> > > > >> > > > > Great suggestion. Also +1 for Dhanushka's proposal to
>>> have
>>> > > > >> > > > > serialize/de-serilized data.
>>> > > > >> > > > > Few suggestions,
>>> > > > >> > > > > 1. In addition to successful/error statuses we need
>>> other
>>> > > status
>>> > > > >> for
>>> > > > >> > > > nodes
>>> > > > >> > > > > & workflows
>>> > > > >> > > > > and workflows.
>>> > > > >> > > > > E . g :-
>>> > > > >> > > > >    node - started, submitted, in-progress, failed,
>>> > successful
>>> > > > etc
>>> > > > >> ...
>>> > > > >> > > > >
>>> > > > >> > > > Sorry if I was too vague. Yes we have more fine-grain
>>> statuses
>>> > > for
>>> > > > >> > > workflow
>>> > > > >> > > > and node[1]. We will have a much fine-grained level of
>>> > > granuality
>>> > > > >> for a
>>> > > > >> > > > GFacJob status.
>>> > > > >> > > >     public static enum GFacJobStatus{
>>> > > > >> > > >         SUBMITTED, //job is submitted, possibly waiting to
>>> > start
>>> > > > >> > > executing
>>> > > > >> > > >         EXECUTING, //submitted job is being executed
>>> > > > >> > > >         CANCELLED, //job was cancelled
>>> > > > >> > > >         PAUSED, //job was paused
>>> > > > >> > > >         WAITING_FOR_DATA, // job is waiting for data to
>>> > continue
>>> > > > >> > > executing
>>> > > > >> > > >         FAILED, // error occurred while job was executing
>>> and
>>> > > the
>>> > > > >> job
>>> > > > >> > > > stopped
>>> > > > >> > > >         FINISHED, // job completed successfully
>>> > > > >> > > >         UNKNOWN // unknown status. lookup the metadata for
>>> > more
>>> > > > >> > details.
>>> > > > >> > > >     }
>>> > > > >> > > >
>>> > > > >> > > >
>>> > > > >> > > > 2. This data will be useful in implementing FT and Load
>>> > > Balancing
>>> > > > in
>>> > > > >> > each
>>> > > > >> > > > > component. Sometime back we had discussions to make GFac
>>> > > > >> stateless.
>>> > > > >> > So
>>> > > > >> > > > who
>>> > > > >> > > > > is going to populate this data structure and persist it
>>> ?
>>> > > > >> > > > >
>>> > > > >> > > > That is a very good question... :). This summer is going
>>> to
>>> > be a
>>> > > > >> long
>>> > > > >> > > > one... ;)
>>> > > > >> > > >
>>> > > > >> > >
>>> > > > >> > > What I meant is which component is doing persistence ?
>>> (GFac or
>>> > WF
>>> > > > >> > > Interpretter). Not the actual person who is going to
>>> implement
>>> > it
>>> > > > :).
>>> > > > >> > >
>>> > > > >> > hih hih....
>>> > > > >> > Well its going to be whatever the provider respondible for
>>> > managing
>>> > > > the
>>> > > > >> job
>>> > > > >> > lifecycle. For example GRAMProvider should be responsible for
>>> > > > recording
>>> > > > >> all
>>> > > > >> > the data relating to the GRAM jobs its working with.
>>> > > > >> >
>>> > > > >> > >
>>> > > > >> > >
>>> > > > >> > > >
>>> > > > >> > > > 1.
>>> > > > >> > > >
>>> > > > >> > > >
>>> > > > >> > >
>>> > > > >> >
>>> > > > >>
>>> > > >
>>> > >
>>> >
>>> https://svn.apache.org/repos/asf/airavata/trunk/modules/workflow-model/workflow-model-core/src/main/java/org/apache/airavata/workflow/model/graph/Node.java
>>> > > > >> > > >
>>> > > > >> > > > >
>>> > > > >> > > > > Thanks
>>> > > > >> > > > > Amila
>>> > > > >> > > > >
>>> > > > >> > > > >
>>> > > > >> > > > > On Tue, May 21, 2013 at 11:39 AM, Saminda Wijeratne <
>>> > > > >> > > samindaw@gmail.com
>>> > > > >> > > > > >wrote:
>>> > > > >> > > > >
>>> > > > >> > > > > > Thats is an excellent idea. We can have the job data
>>> field
>>> > > to
>>> > > > be
>>> > > > >> > the
>>> > > > >> > > > > > designated GFac job serialized data. The whatever
>>> > > GFacProvider
>>> > > > >> > should
>>> > > > >> > > > > > adhere to it.
>>> > > > >> > > > > >
>>> > > > >> > > > > > I'm still inclined to have the rest of the fields to
>>> ease
>>> > of
>>> > > > >> > querying
>>> > > > >> > > > for
>>> > > > >> > > > > > the required data. For example if we wanted all
>>> attempts
>>> > on
>>> > > > >> > executing
>>> > > > >> > > > > for a
>>> > > > >> > > > > > particular node of a workflow or if we wanted to know
>>> > which
>>> > > > >> > > application
>>> > > > >> > > > > > descriptions are faster in execution or more reliable
>>> etc.
>>> > > we
>>> > > > >> can
>>> > > > >> > let
>>> > > > >> > > > the
>>> > > > >> > > > > > query language deal with it. wdyt?
>>> > > > >> > > > > >
>>> > > > >> > > > > >
>>> > > > >> > > > > > On Tue, May 21, 2013 at 11:24 AM, Danushka
>>> Menikkumbura <
>>> > > > >> > > > > > danushka.menikkumbura@gmail.com> wrote:
>>> > > > >> > > > > >
>>> > > > >> > > > > > > Saminda,
>>> > > > >> > > > > > >
>>> > > > >> > > > > > > I think the data container does not need to have a
>>> > generic
>>> > > > >> > format.
>>> > > > >> > > We
>>> > > > >> > > > > can
>>> > > > >> > > > > > > have a base class that facilitate object
>>> > > > >> > > > serialization/deserialization
>>> > > > >> > > > > > and
>>> > > > >> > > > > > > let specific meta data structure implement them as
>>> > > required.
>>> > > > >> We
>>> > > > >> > get
>>> > > > >> > > > the
>>> > > > >> > > > > > > Registry API to serialize objects and save them in a
>>> > meta
>>> > > > data
>>> > > > >> > > table
>>> > > > >> > > > > > (with
>>> > > > >> > > > > > > just two columns?) and to deserialize as they are
>>> loaded
>>> > > off
>>> > > > >> the
>>> > > > >> > > > > > registry.
>>> > > > >> > > > > > >
>>> > > > >> > > > > > > Danushka
>>> > > > >> > > > > > >
>>> > > > >> > > > > > >
>>> > > > >> > > > > > > On Tue, May 21, 2013 at 8:34 PM, Saminda Wijeratne <
>>> > > > >> > > > samindaw@gmail.com
>>> > > > >> > > > > > > >wrote:
>>> > > > >> > > > > > >
>>> > > > >> > > > > > > > It has being apparent more and more that saving
>>> the
>>> > data
>>> > > > >> > related
>>> > > > >> > > to
>>> > > > >> > > > > > > > executing a jobs from the GFac can be useful for
>>> many
>>> > > > >> reasons
>>> > > > >> > > such
>>> > > > >> > > > > as,
>>> > > > >> > > > > > > >
>>> > > > >> > > > > > > > debugging
>>> > > > >> > > > > > > > retrying
>>> > > > >> > > > > > > > to make smart decisions on reliability/cost etc.
>>> > > > >> > > > > > > > statistical analysis
>>> > > > >> > > > > > > >
>>> > > > >> > > > > > > > Thus we thought of saving the data related to GFac
>>> > jobs
>>> > > in
>>> > > > >> the
>>> > > > >> > > > > registry
>>> > > > >> > > > > > > in
>>> > > > >> > > > > > > > order to facilitate feature such as above in the
>>> > future.
>>> > > > >> > > > > > > >
>>> > > > >> > > > > > > > However a GFac job is potentially any sort of
>>> > computing
>>> > > > >> > resource
>>> > > > >> > > > > access
>>> > > > >> > > > > > > > (GRAM/UNICORE/EC2 etc.). Therefore we need to
>>> come up
>>> > > > with a
>>> > > > >> > > > > > generalized
>>> > > > >> > > > > > > > data structure that can hold the data of any type
>>> of
>>> > > > >> resource.
>>> > > > >> > > > > > Following
>>> > > > >> > > > > > > > are the suggested data to save for a single GFac
>>> job
>>> > > > >> execution,
>>> > > > >> > > > > > > >
>>> > > > >> > > > > > > > *experiment id, workflow instance id, node id* -
>>> > > pinpoint
>>> > > > >> the
>>> > > > >> > > node
>>> > > > >> > > > > > > > execution
>>> > > > >> > > > > > > > *service, host, application description ids *-
>>> > pinpoint
>>> > > > the
>>> > > > >> > > > > descriptors
>>> > > > >> > > > > > > > responsible
>>> > > > >> > > > > > > > *local job id* - the unique job id
>>> retrieved/generated
>>> > > per
>>> > > > >> > > > execution
>>> > > > >> > > > > > > > [PRIMARY KEY]
>>> > > > >> > > > > > > > *job data* - data related executing the job (eg:
>>> the
>>> > rsl
>>> > > > in
>>> > > > >> > GRAM)
>>> > > > >> > > > > > > > *submitted, completed time*
>>> > > > >> > > > > > > > *completed status* - whether the job was
>>> successfull
>>> > or
>>> > > > ran
>>> > > > >> in
>>> > > > >> > to
>>> > > > >> > > > > > errors
>>> > > > >> > > > > > > > etc.
>>> > > > >> > > > > > > > *metadata* - custom field to add anything user
>>> wants
>>> > > > >> > > > > > > >
>>> > > > >> > > > > > > > Your feedback is most welcome. The API related
>>> changes
>>> > > > will
>>> > > > >> > also
>>> > > > >> > > be
>>> > > > >> > > > > > > > discussed once we have a proper data structure.
>>> We are
>>> > > > >> hoping
>>> > > > >> > to
>>> > > > >> > > > > > > implement
>>> > > > >> > > > > > > > this within next few days.
>>> > > > >> > > > > > > >
>>> > > > >> > > > > > > > Thanks,
>>> > > > >> > > > > > > > Saminda
>>> > > > >> > > > > > > >
>>> > > > >> > > > > > >
>>> > > > >> > > > > >
>>> > > > >> > > > >
>>> > > > >> > > >
>>> > > > >> > >
>>> > > > >> >
>>> > > > >>
>>> > > > >
>>> > > > >
>>> > > >
>>> > >
>>> >
>>>
>>
>>
>

Re: Persisting GFac job data

Posted by Saminda Wijeratne <sa...@gmail.com>.

Hi Guys,

Since there is no objection for the suggested name pattern (
addApplicationJob(...)) and since we are sort of running short of time we
are going ahead of with that name.

If you a much better suggestion please respond within today or tomorrow so
that we can incorporate any changes for 0.8 release without delay.

Thanks,
Saminda


On Wed, May 22, 2013 at 11:02 PM, Saminda Wijeratne <sa...@gmail.com>wrote:

> Application. But in our case we may have to use both. eg:
> addApplicationJob(...) or addApplicationSubmission(...). The name addApplication(...)
> is misleading I think. wdyt?
>
>
> On Wed, May 22, 2013 at 1:43 PM, Amila Jayasekara <thejaka.amila@gmail.com
> > wrote:
>
>> What is more familiar ? "Application" or "Job" ?
>>
>> Thanks
>> Amila
>>
>>
>> On Wed, May 22, 2013 at 11:28 AM, Saminda Wijeratne <samindaw@gmail.com
>> >wrote:
>>
>> > On Wed, May 22, 2013 at 11:22 AM, Amila Jayasekara
>> > <th...@gmail.com>wrote:
>> >
>> > > I am bit concerned about the names. Are we assuming that API users has
>> > > knowledge about GFac ?
>> > > OR else we can just remove "GFac" substring and have method names like
>> > > "void
>> > > updateJobMetadta(..)"
>> > >
>> > You have a point there Amila. Perhaps we can name them as "Application"
>> > rather than GFac since we already have the notion of an application
>> > descriptor in the API. wdyt?
>> >
>> >
>> > > Thanks
>> > > Amila
>> > >
>> > >
>> > > On Tue, May 21, 2013 at 11:28 PM, Saminda Wijeratne <
>> samindaw@gmail.com
>> > > >wrote:
>> > >
>> > > > Following API functions are added for the ProvenanceManager[2],
>> > > >
>> > > > boolean isGFacJobExists(String gfacJobId)
>> > > > void addGFacJob(GFacJob job)
>> > > > void updateGFacJob(GFacJob job)
>> > > > void updateGFacJobStatus(String gfacJobId, GFacJobStatus status)
>> > > > void updateGFacJobData(String gfacJobId, String jobdata)
>> > > > void updateGFacJobSubmittedTime(String gfacJobId, Date submitted)
>> > > > void updateGFacJobCompletedTime(String gfacJobId, Date completed)
>> > > > void updateGFacJobMetadta(String gfacJobId, String metadata)
>> > > > GFacJob getGFacJob(String gfacJobId)
>> > > > List<GFacJob> getGFacJobsForDescriptors(String serviceDescriptionId,
>> > > String
>> > > > hostDescriptionId, String applicationDescriptionId)
>> > > > List<GFacJob> getGFacJobs(String experimentId, String
>> > > workflowExecutionId,
>> > > > String nodeId)
>> > > >
>> > > > Thoughts are welcome!!!
>> > > >
>> > > >
>> > > > 2.
>> > > >
>> > > >
>> > >
>> >
>> https://svn.apache.org/repos/asf/airavata/trunk/modules/airavata-client/src/main/java/org/apache/airavata/client/api/ProvenanceManager.java
>> > > >
>> > > >
>> > > > On Tue, May 21, 2013 at 5:04 PM, Saminda Wijeratne <
>> samindaw@gmail.com
>> > > > >wrote:
>> > > >
>> > > > > But I thought the providers are part of the GFac (not as a
>> separate
>> > > > > service). If not then the providers should report to GFac. Orelse
>> > there
>> > > > is
>> > > > > no way the GFac knows what status to update which data to update
>> etc.
>> > > > Does
>> > > > > the current GFac implementation support this?
>> > > > >
>> > > > >
>> > > > > On Tue, May 21, 2013 at 4:47 PM, Amila Jayasekara <
>> > > > thejaka.amila@gmail.com
>> > > > > > wrote:
>> > > > >
>> > > > >> I think that should be handled at a more upper layer like
>> Workflow
>> > > > >> Interpretter or GFac. In FT perspective it is better if providers
>> > are
>> > > > >> stateless. One reason is we dont have control over some providers
>> > and
>> > > > and
>> > > > >> there will be many places writing to disk if we implement the
>> > > > persistence
>> > > > >> logic at provider level.
>> > > > >>
>> > > > >> Thanks
>> > > > >> Amila
>> > > > >>
>> > > > >>
>> > > > >> On Tue, May 21, 2013 at 4:39 PM, Saminda Wijeratne <
>> > > samindaw@gmail.com
>> > > > >> >wrote:
>> > > > >>
>> > > > >> > On Tue, May 21, 2013 at 4:36 PM, Amila Jayasekara
>> > > > >> > <th...@gmail.com>wrote:
>> > > > >> >
>> > > > >> > > On Tue, May 21, 2013 at 3:51 PM, Saminda Wijeratne <
>> > > > >> samindaw@gmail.com
>> > > > >> > > >wrote:
>> > > > >> > >
>> > > > >> > > > Thanks for the feedback Amila. a few comments inline
>> > > > >> > > >
>> > > > >> > > >
>> > > > >> > > > On Tue, May 21, 2013 at 12:29 PM, Amila Jayasekara
>> > > > >> > > > <th...@gmail.com>wrote:
>> > > > >> > > >
>> > > > >> > > > > Hi Saminda,
>> > > > >> > > > >
>> > > > >> > > > > Great suggestion. Also +1 for Dhanushka's proposal to
>> have
>> > > > >> > > > > serialize/de-serilized data.
>> > > > >> > > > > Few suggestions,
>> > > > >> > > > > 1. In addition to successful/error statuses we need other
>> > > status
>> > > > >> for
>> > > > >> > > > nodes
>> > > > >> > > > > & workflows
>> > > > >> > > > > and workflows.
>> > > > >> > > > > E . g :-
>> > > > >> > > > >    node - started, submitted, in-progress, failed,
>> > successful
>> > > > etc
>> > > > >> ...
>> > > > >> > > > >
>> > > > >> > > > Sorry if I was too vague. Yes we have more fine-grain
>> statuses
>> > > for
>> > > > >> > > workflow
>> > > > >> > > > and node[1]. We will have a much fine-grained level of
>> > > granuality
>> > > > >> for a
>> > > > >> > > > GFacJob status.
>> > > > >> > > >     public static enum GFacJobStatus{
>> > > > >> > > >         SUBMITTED, //job is submitted, possibly waiting to
>> > start
>> > > > >> > > executing
>> > > > >> > > >         EXECUTING, //submitted job is being executed
>> > > > >> > > >         CANCELLED, //job was cancelled
>> > > > >> > > >         PAUSED, //job was paused
>> > > > >> > > >         WAITING_FOR_DATA, // job is waiting for data to
>> > continue
>> > > > >> > > executing
>> > > > >> > > >         FAILED, // error occurred while job was executing
>> and
>> > > the
>> > > > >> job
>> > > > >> > > > stopped
>> > > > >> > > >         FINISHED, // job completed successfully
>> > > > >> > > >         UNKNOWN // unknown status. lookup the metadata for
>> > more
>> > > > >> > details.
>> > > > >> > > >     }
>> > > > >> > > >
>> > > > >> > > >
>> > > > >> > > > 2. This data will be useful in implementing FT and Load
>> > > Balancing
>> > > > in
>> > > > >> > each
>> > > > >> > > > > component. Sometime back we had discussions to make GFac
>> > > > >> stateless.
>> > > > >> > So
>> > > > >> > > > who
>> > > > >> > > > > is going to populate this data structure and persist it ?
>> > > > >> > > > >
>> > > > >> > > > That is a very good question... :). This summer is going to
>> > be a
>> > > > >> long
>> > > > >> > > > one... ;)
>> > > > >> > > >
>> > > > >> > >
>> > > > >> > > What I meant is which component is doing persistence ? (GFac
>> or
>> > WF
>> > > > >> > > Interpretter). Not the actual person who is going to
>> implement
>> > it
>> > > > :).
>> > > > >> > >
>> > > > >> > hih hih....
>> > > > >> > Well its going to be whatever the provider respondible for
>> > managing
>> > > > the
>> > > > >> job
>> > > > >> > lifecycle. For example GRAMProvider should be responsible for
>> > > > recording
>> > > > >> all
>> > > > >> > the data relating to the GRAM jobs its working with.
>> > > > >> >
>> > > > >> > >
>> > > > >> > >
>> > > > >> > > >
>> > > > >> > > > 1.
>> > > > >> > > >
>> > > > >> > > >
>> > > > >> > >
>> > > > >> >
>> > > > >>
>> > > >
>> > >
>> >
>> https://svn.apache.org/repos/asf/airavata/trunk/modules/workflow-model/workflow-model-core/src/main/java/org/apache/airavata/workflow/model/graph/Node.java
>> > > > >> > > >
>> > > > >> > > > >
>> > > > >> > > > > Thanks
>> > > > >> > > > > Amila
>> > > > >> > > > >
>> > > > >> > > > >
>> > > > >> > > > > On Tue, May 21, 2013 at 11:39 AM, Saminda Wijeratne <
>> > > > >> > > samindaw@gmail.com
>> > > > >> > > > > >wrote:
>> > > > >> > > > >
>> > > > >> > > > > > Thats is an excellent idea. We can have the job data
>> field
>> > > to
>> > > > be
>> > > > >> > the
>> > > > >> > > > > > designated GFac job serialized data. The whatever
>> > > GFacProvider
>> > > > >> > should
>> > > > >> > > > > > adhere to it.
>> > > > >> > > > > >
>> > > > >> > > > > > I'm still inclined to have the rest of the fields to
>> ease
>> > of
>> > > > >> > querying
>> > > > >> > > > for
>> > > > >> > > > > > the required data. For example if we wanted all
>> attempts
>> > on
>> > > > >> > executing
>> > > > >> > > > > for a
>> > > > >> > > > > > particular node of a workflow or if we wanted to know
>> > which
>> > > > >> > > application
>> > > > >> > > > > > descriptions are faster in execution or more reliable
>> etc.
>> > > we
>> > > > >> can
>> > > > >> > let
>> > > > >> > > > the
>> > > > >> > > > > > query language deal with it. wdyt?
>> > > > >> > > > > >
>> > > > >> > > > > >
>> > > > >> > > > > > On Tue, May 21, 2013 at 11:24 AM, Danushka
>> Menikkumbura <
>> > > > >> > > > > > danushka.menikkumbura@gmail.com> wrote:
>> > > > >> > > > > >
>> > > > >> > > > > > > Saminda,
>> > > > >> > > > > > >
>> > > > >> > > > > > > I think the data container does not need to have a
>> > generic
>> > > > >> > format.
>> > > > >> > > We
>> > > > >> > > > > can
>> > > > >> > > > > > > have a base class that facilitate object
>> > > > >> > > > serialization/deserialization
>> > > > >> > > > > > and
>> > > > >> > > > > > > let specific meta data structure implement them as
>> > > required.
>> > > > >> We
>> > > > >> > get
>> > > > >> > > > the
>> > > > >> > > > > > > Registry API to serialize objects and save them in a
>> > meta
>> > > > data
>> > > > >> > > table
>> > > > >> > > > > > (with
>> > > > >> > > > > > > just two columns?) and to deserialize as they are
>> loaded
>> > > off
>> > > > >> the
>> > > > >> > > > > > registry.
>> > > > >> > > > > > >
>> > > > >> > > > > > > Danushka
>> > > > >> > > > > > >
>> > > > >> > > > > > >
>> > > > >> > > > > > > On Tue, May 21, 2013 at 8:34 PM, Saminda Wijeratne <
>> > > > >> > > > samindaw@gmail.com
>> > > > >> > > > > > > >wrote:
>> > > > >> > > > > > >
>> > > > >> > > > > > > > It has being apparent more and more that saving the
>> > data
>> > > > >> > related
>> > > > >> > > to
>> > > > >> > > > > > > > executing a jobs from the GFac can be useful for
>> many
>> > > > >> reasons
>> > > > >> > > such
>> > > > >> > > > > as,
>> > > > >> > > > > > > >
>> > > > >> > > > > > > > debugging
>> > > > >> > > > > > > > retrying
>> > > > >> > > > > > > > to make smart decisions on reliability/cost etc.
>> > > > >> > > > > > > > statistical analysis
>> > > > >> > > > > > > >
>> > > > >> > > > > > > > Thus we thought of saving the data related to GFac
>> > jobs
>> > > in
>> > > > >> the
>> > > > >> > > > > registry
>> > > > >> > > > > > > in
>> > > > >> > > > > > > > order to facilitate feature such as above in the
>> > future.
>> > > > >> > > > > > > >
>> > > > >> > > > > > > > However a GFac job is potentially any sort of
>> > computing
>> > > > >> > resource
>> > > > >> > > > > access
>> > > > >> > > > > > > > (GRAM/UNICORE/EC2 etc.). Therefore we need to come
>> up
>> > > > with a
>> > > > >> > > > > > generalized
>> > > > >> > > > > > > > data structure that can hold the data of any type
>> of
>> > > > >> resource.
>> > > > >> > > > > > Following
>> > > > >> > > > > > > > are the suggested data to save for a single GFac
>> job
>> > > > >> execution,
>> > > > >> > > > > > > >
>> > > > >> > > > > > > > *experiment id, workflow instance id, node id* -
>> > > pinpoint
>> > > > >> the
>> > > > >> > > node
>> > > > >> > > > > > > > execution
>> > > > >> > > > > > > > *service, host, application description ids *-
>> > pinpoint
>> > > > the
>> > > > >> > > > > descriptors
>> > > > >> > > > > > > > responsible
>> > > > >> > > > > > > > *local job id* - the unique job id
>> retrieved/generated
>> > > per
>> > > > >> > > > execution
>> > > > >> > > > > > > > [PRIMARY KEY]
>> > > > >> > > > > > > > *job data* - data related executing the job (eg:
>> the
>> > rsl
>> > > > in
>> > > > >> > GRAM)
>> > > > >> > > > > > > > *submitted, completed time*
>> > > > >> > > > > > > > *completed status* - whether the job was
>> successfull
>> > or
>> > > > ran
>> > > > >> in
>> > > > >> > to
>> > > > >> > > > > > errors
>> > > > >> > > > > > > > etc.
>> > > > >> > > > > > > > *metadata* - custom field to add anything user
>> wants
>> > > > >> > > > > > > >
>> > > > >> > > > > > > > Your feedback is most welcome. The API related
>> changes
>> > > > will
>> > > > >> > also
>> > > > >> > > be
>> > > > >> > > > > > > > discussed once we have a proper data structure. We
>> are
>> > > > >> hoping
>> > > > >> > to
>> > > > >> > > > > > > implement
>> > > > >> > > > > > > > this within next few days.
>> > > > >> > > > > > > >
>> > > > >> > > > > > > > Thanks,
>> > > > >> > > > > > > > Saminda
>> > > > >> > > > > > > >
>> > > > >> > > > > > >
>> > > > >> > > > > >
>> > > > >> > > > >
>> > > > >> > > >
>> > > > >> > >
>> > > > >> >
>> > > > >>
>> > > > >
>> > > > >
>> > > >
>> > >
>> >
>>
>
>

Re: Persisting GFac job data

Posted by Saminda Wijeratne <sa...@gmail.com>.

Application. But in our case we may have to use both. eg:
addApplicationJob(...) or addApplicationSubmission(...). The name
addApplication(...)
is misleading I think. wdyt?


On Wed, May 22, 2013 at 1:43 PM, Amila Jayasekara
<th...@gmail.com>wrote:

> What is more familiar ? "Application" or "Job" ?
>
> Thanks
> Amila
>
>
> On Wed, May 22, 2013 at 11:28 AM, Saminda Wijeratne <samindaw@gmail.com
> >wrote:
>
> > On Wed, May 22, 2013 at 11:22 AM, Amila Jayasekara
> > <th...@gmail.com>wrote:
> >
> > > I am bit concerned about the names. Are we assuming that API users has
> > > knowledge about GFac ?
> > > OR else we can just remove "GFac" substring and have method names like
> > > "void
> > > updateJobMetadta(..)"
> > >
> > You have a point there Amila. Perhaps we can name them as "Application"
> > rather than GFac since we already have the notion of an application
> > descriptor in the API. wdyt?
> >
> >
> > > Thanks
> > > Amila
> > >
> > >
> > > On Tue, May 21, 2013 at 11:28 PM, Saminda Wijeratne <
> samindaw@gmail.com
> > > >wrote:
> > >
> > > > Following API functions are added for the ProvenanceManager[2],
> > > >
> > > > boolean isGFacJobExists(String gfacJobId)
> > > > void addGFacJob(GFacJob job)
> > > > void updateGFacJob(GFacJob job)
> > > > void updateGFacJobStatus(String gfacJobId, GFacJobStatus status)
> > > > void updateGFacJobData(String gfacJobId, String jobdata)
> > > > void updateGFacJobSubmittedTime(String gfacJobId, Date submitted)
> > > > void updateGFacJobCompletedTime(String gfacJobId, Date completed)
> > > > void updateGFacJobMetadta(String gfacJobId, String metadata)
> > > > GFacJob getGFacJob(String gfacJobId)
> > > > List<GFacJob> getGFacJobsForDescriptors(String serviceDescriptionId,
> > > String
> > > > hostDescriptionId, String applicationDescriptionId)
> > > > List<GFacJob> getGFacJobs(String experimentId, String
> > > workflowExecutionId,
> > > > String nodeId)
> > > >
> > > > Thoughts are welcome!!!
> > > >
> > > >
> > > > 2.
> > > >
> > > >
> > >
> >
> https://svn.apache.org/repos/asf/airavata/trunk/modules/airavata-client/src/main/java/org/apache/airavata/client/api/ProvenanceManager.java
> > > >
> > > >
> > > > On Tue, May 21, 2013 at 5:04 PM, Saminda Wijeratne <
> samindaw@gmail.com
> > > > >wrote:
> > > >
> > > > > But I thought the providers are part of the GFac (not as a separate
> > > > > service). If not then the providers should report to GFac. Orelse
> > there
> > > > is
> > > > > no way the GFac knows what status to update which data to update
> etc.
> > > > Does
> > > > > the current GFac implementation support this?
> > > > >
> > > > >
> > > > > On Tue, May 21, 2013 at 4:47 PM, Amila Jayasekara <
> > > > thejaka.amila@gmail.com
> > > > > > wrote:
> > > > >
> > > > >> I think that should be handled at a more upper layer like Workflow
> > > > >> Interpretter or GFac. In FT perspective it is better if providers
> > are
> > > > >> stateless. One reason is we dont have control over some providers
> > and
> > > > and
> > > > >> there will be many places writing to disk if we implement the
> > > > persistence
> > > > >> logic at provider level.
> > > > >>
> > > > >> Thanks
> > > > >> Amila
> > > > >>
> > > > >>
> > > > >> On Tue, May 21, 2013 at 4:39 PM, Saminda Wijeratne <
> > > samindaw@gmail.com
> > > > >> >wrote:
> > > > >>
> > > > >> > On Tue, May 21, 2013 at 4:36 PM, Amila Jayasekara
> > > > >> > <th...@gmail.com>wrote:
> > > > >> >
> > > > >> > > On Tue, May 21, 2013 at 3:51 PM, Saminda Wijeratne <
> > > > >> samindaw@gmail.com
> > > > >> > > >wrote:
> > > > >> > >
> > > > >> > > > Thanks for the feedback Amila. a few comments inline
> > > > >> > > >
> > > > >> > > >
> > > > >> > > > On Tue, May 21, 2013 at 12:29 PM, Amila Jayasekara
> > > > >> > > > <th...@gmail.com>wrote:
> > > > >> > > >
> > > > >> > > > > Hi Saminda,
> > > > >> > > > >
> > > > >> > > > > Great suggestion. Also +1 for Dhanushka's proposal to have
> > > > >> > > > > serialize/de-serilized data.
> > > > >> > > > > Few suggestions,
> > > > >> > > > > 1. In addition to successful/error statuses we need other
> > > status
> > > > >> for
> > > > >> > > > nodes
> > > > >> > > > > & workflows
> > > > >> > > > > and workflows.
> > > > >> > > > > E . g :-
> > > > >> > > > >    node - started, submitted, in-progress, failed,
> > successful
> > > > etc
> > > > >> ...
> > > > >> > > > >
> > > > >> > > > Sorry if I was too vague. Yes we have more fine-grain
> statuses
> > > for
> > > > >> > > workflow
> > > > >> > > > and node[1]. We will have a much fine-grained level of
> > > granuality
> > > > >> for a
> > > > >> > > > GFacJob status.
> > > > >> > > >     public static enum GFacJobStatus{
> > > > >> > > >         SUBMITTED, //job is submitted, possibly waiting to
> > start
> > > > >> > > executing
> > > > >> > > >         EXECUTING, //submitted job is being executed
> > > > >> > > >         CANCELLED, //job was cancelled
> > > > >> > > >         PAUSED, //job was paused
> > > > >> > > >         WAITING_FOR_DATA, // job is waiting for data to
> > continue
> > > > >> > > executing
> > > > >> > > >         FAILED, // error occurred while job was executing
> and
> > > the
> > > > >> job
> > > > >> > > > stopped
> > > > >> > > >         FINISHED, // job completed successfully
> > > > >> > > >         UNKNOWN // unknown status. lookup the metadata for
> > more
> > > > >> > details.
> > > > >> > > >     }
> > > > >> > > >
> > > > >> > > >
> > > > >> > > > 2. This data will be useful in implementing FT and Load
> > > Balancing
> > > > in
> > > > >> > each
> > > > >> > > > > component. Sometime back we had discussions to make GFac
> > > > >> stateless.
> > > > >> > So
> > > > >> > > > who
> > > > >> > > > > is going to populate this data structure and persist it ?
> > > > >> > > > >
> > > > >> > > > That is a very good question... :). This summer is going to
> > be a
> > > > >> long
> > > > >> > > > one... ;)
> > > > >> > > >
> > > > >> > >
> > > > >> > > What I meant is which component is doing persistence ? (GFac
> or
> > WF
> > > > >> > > Interpretter). Not the actual person who is going to implement
> > it
> > > > :).
> > > > >> > >
> > > > >> > hih hih....
> > > > >> > Well its going to be whatever the provider respondible for
> > managing
> > > > the
> > > > >> job
> > > > >> > lifecycle. For example GRAMProvider should be responsible for
> > > > recording
> > > > >> all
> > > > >> > the data relating to the GRAM jobs its working with.
> > > > >> >
> > > > >> > >
> > > > >> > >
> > > > >> > > >
> > > > >> > > > 1.
> > > > >> > > >
> > > > >> > > >
> > > > >> > >
> > > > >> >
> > > > >>
> > > >
> > >
> >
> https://svn.apache.org/repos/asf/airavata/trunk/modules/workflow-model/workflow-model-core/src/main/java/org/apache/airavata/workflow/model/graph/Node.java
> > > > >> > > >
> > > > >> > > > >
> > > > >> > > > > Thanks
> > > > >> > > > > Amila
> > > > >> > > > >
> > > > >> > > > >
> > > > >> > > > > On Tue, May 21, 2013 at 11:39 AM, Saminda Wijeratne <
> > > > >> > > samindaw@gmail.com
> > > > >> > > > > >wrote:
> > > > >> > > > >
> > > > >> > > > > > Thats is an excellent idea. We can have the job data
> field
> > > to
> > > > be
> > > > >> > the
> > > > >> > > > > > designated GFac job serialized data. The whatever
> > > GFacProvider
> > > > >> > should
> > > > >> > > > > > adhere to it.
> > > > >> > > > > >
> > > > >> > > > > > I'm still inclined to have the rest of the fields to
> ease
> > of
> > > > >> > querying
> > > > >> > > > for
> > > > >> > > > > > the required data. For example if we wanted all attempts
> > on
> > > > >> > executing
> > > > >> > > > > for a
> > > > >> > > > > > particular node of a workflow or if we wanted to know
> > which
> > > > >> > > application
> > > > >> > > > > > descriptions are faster in execution or more reliable
> etc.
> > > we
> > > > >> can
> > > > >> > let
> > > > >> > > > the
> > > > >> > > > > > query language deal with it. wdyt?
> > > > >> > > > > >
> > > > >> > > > > >
> > > > >> > > > > > On Tue, May 21, 2013 at 11:24 AM, Danushka Menikkumbura
> <
> > > > >> > > > > > danushka.menikkumbura@gmail.com> wrote:
> > > > >> > > > > >
> > > > >> > > > > > > Saminda,
> > > > >> > > > > > >
> > > > >> > > > > > > I think the data container does not need to have a
> > generic
> > > > >> > format.
> > > > >> > > We
> > > > >> > > > > can
> > > > >> > > > > > > have a base class that facilitate object
> > > > >> > > > serialization/deserialization
> > > > >> > > > > > and
> > > > >> > > > > > > let specific meta data structure implement them as
> > > required.
> > > > >> We
> > > > >> > get
> > > > >> > > > the
> > > > >> > > > > > > Registry API to serialize objects and save them in a
> > meta
> > > > data
> > > > >> > > table
> > > > >> > > > > > (with
> > > > >> > > > > > > just two columns?) and to deserialize as they are
> loaded
> > > off
> > > > >> the
> > > > >> > > > > > registry.
> > > > >> > > > > > >
> > > > >> > > > > > > Danushka
> > > > >> > > > > > >
> > > > >> > > > > > >
> > > > >> > > > > > > On Tue, May 21, 2013 at 8:34 PM, Saminda Wijeratne <
> > > > >> > > > samindaw@gmail.com
> > > > >> > > > > > > >wrote:
> > > > >> > > > > > >
> > > > >> > > > > > > > It has being apparent more and more that saving the
> > data
> > > > >> > related
> > > > >> > > to
> > > > >> > > > > > > > executing a jobs from the GFac can be useful for
> many
> > > > >> reasons
> > > > >> > > such
> > > > >> > > > > as,
> > > > >> > > > > > > >
> > > > >> > > > > > > > debugging
> > > > >> > > > > > > > retrying
> > > > >> > > > > > > > to make smart decisions on reliability/cost etc.
> > > > >> > > > > > > > statistical analysis
> > > > >> > > > > > > >
> > > > >> > > > > > > > Thus we thought of saving the data related to GFac
> > jobs
> > > in
> > > > >> the
> > > > >> > > > > registry
> > > > >> > > > > > > in
> > > > >> > > > > > > > order to facilitate feature such as above in the
> > future.
> > > > >> > > > > > > >
> > > > >> > > > > > > > However a GFac job is potentially any sort of
> > computing
> > > > >> > resource
> > > > >> > > > > access
> > > > >> > > > > > > > (GRAM/UNICORE/EC2 etc.). Therefore we need to come
> up
> > > > with a
> > > > >> > > > > > generalized
> > > > >> > > > > > > > data structure that can hold the data of any type of
> > > > >> resource.
> > > > >> > > > > > Following
> > > > >> > > > > > > > are the suggested data to save for a single GFac job
> > > > >> execution,
> > > > >> > > > > > > >
> > > > >> > > > > > > > *experiment id, workflow instance id, node id* -
> > > pinpoint
> > > > >> the
> > > > >> > > node
> > > > >> > > > > > > > execution
> > > > >> > > > > > > > *service, host, application description ids *-
> > pinpoint
> > > > the
> > > > >> > > > > descriptors
> > > > >> > > > > > > > responsible
> > > > >> > > > > > > > *local job id* - the unique job id
> retrieved/generated
> > > per
> > > > >> > > > execution
> > > > >> > > > > > > > [PRIMARY KEY]
> > > > >> > > > > > > > *job data* - data related executing the job (eg: the
> > rsl
> > > > in
> > > > >> > GRAM)
> > > > >> > > > > > > > *submitted, completed time*
> > > > >> > > > > > > > *completed status* - whether the job was successfull
> > or
> > > > ran
> > > > >> in
> > > > >> > to
> > > > >> > > > > > errors
> > > > >> > > > > > > > etc.
> > > > >> > > > > > > > *metadata* - custom field to add anything user wants
> > > > >> > > > > > > >
> > > > >> > > > > > > > Your feedback is most welcome. The API related
> changes
> > > > will
> > > > >> > also
> > > > >> > > be
> > > > >> > > > > > > > discussed once we have a proper data structure. We
> are
> > > > >> hoping
> > > > >> > to
> > > > >> > > > > > > implement
> > > > >> > > > > > > > this within next few days.
> > > > >> > > > > > > >
> > > > >> > > > > > > > Thanks,
> > > > >> > > > > > > > Saminda
> > > > >> > > > > > > >
> > > > >> > > > > > >
> > > > >> > > > > >
> > > > >> > > > >
> > > > >> > > >
> > > > >> > >
> > > > >> >
> > > > >>
> > > > >
> > > > >
> > > >
> > >
> >
>

Re: Persisting GFac job data

Posted by Amila Jayasekara <th...@gmail.com>.

What is more familiar ? "Application" or "Job" ?

Thanks
Amila


On Wed, May 22, 2013 at 11:28 AM, Saminda Wijeratne <sa...@gmail.com>wrote:

> On Wed, May 22, 2013 at 11:22 AM, Amila Jayasekara
> <th...@gmail.com>wrote:
>
> > I am bit concerned about the names. Are we assuming that API users has
> > knowledge about GFac ?
> > OR else we can just remove "GFac" substring and have method names like
> > "void
> > updateJobMetadta(..)"
> >
> You have a point there Amila. Perhaps we can name them as "Application"
> rather than GFac since we already have the notion of an application
> descriptor in the API. wdyt?
>
>
> > Thanks
> > Amila
> >
> >
> > On Tue, May 21, 2013 at 11:28 PM, Saminda Wijeratne <samindaw@gmail.com
> > >wrote:
> >
> > > Following API functions are added for the ProvenanceManager[2],
> > >
> > > boolean isGFacJobExists(String gfacJobId)
> > > void addGFacJob(GFacJob job)
> > > void updateGFacJob(GFacJob job)
> > > void updateGFacJobStatus(String gfacJobId, GFacJobStatus status)
> > > void updateGFacJobData(String gfacJobId, String jobdata)
> > > void updateGFacJobSubmittedTime(String gfacJobId, Date submitted)
> > > void updateGFacJobCompletedTime(String gfacJobId, Date completed)
> > > void updateGFacJobMetadta(String gfacJobId, String metadata)
> > > GFacJob getGFacJob(String gfacJobId)
> > > List<GFacJob> getGFacJobsForDescriptors(String serviceDescriptionId,
> > String
> > > hostDescriptionId, String applicationDescriptionId)
> > > List<GFacJob> getGFacJobs(String experimentId, String
> > workflowExecutionId,
> > > String nodeId)
> > >
> > > Thoughts are welcome!!!
> > >
> > >
> > > 2.
> > >
> > >
> >
> https://svn.apache.org/repos/asf/airavata/trunk/modules/airavata-client/src/main/java/org/apache/airavata/client/api/ProvenanceManager.java
> > >
> > >
> > > On Tue, May 21, 2013 at 5:04 PM, Saminda Wijeratne <samindaw@gmail.com
> > > >wrote:
> > >
> > > > But I thought the providers are part of the GFac (not as a separate
> > > > service). If not then the providers should report to GFac. Orelse
> there
> > > is
> > > > no way the GFac knows what status to update which data to update etc.
> > > Does
> > > > the current GFac implementation support this?
> > > >
> > > >
> > > > On Tue, May 21, 2013 at 4:47 PM, Amila Jayasekara <
> > > thejaka.amila@gmail.com
> > > > > wrote:
> > > >
> > > >> I think that should be handled at a more upper layer like Workflow
> > > >> Interpretter or GFac. In FT perspective it is better if providers
> are
> > > >> stateless. One reason is we dont have control over some providers
> and
> > > and
> > > >> there will be many places writing to disk if we implement the
> > > persistence
> > > >> logic at provider level.
> > > >>
> > > >> Thanks
> > > >> Amila
> > > >>
> > > >>
> > > >> On Tue, May 21, 2013 at 4:39 PM, Saminda Wijeratne <
> > samindaw@gmail.com
> > > >> >wrote:
> > > >>
> > > >> > On Tue, May 21, 2013 at 4:36 PM, Amila Jayasekara
> > > >> > <th...@gmail.com>wrote:
> > > >> >
> > > >> > > On Tue, May 21, 2013 at 3:51 PM, Saminda Wijeratne <
> > > >> samindaw@gmail.com
> > > >> > > >wrote:
> > > >> > >
> > > >> > > > Thanks for the feedback Amila. a few comments inline
> > > >> > > >
> > > >> > > >
> > > >> > > > On Tue, May 21, 2013 at 12:29 PM, Amila Jayasekara
> > > >> > > > <th...@gmail.com>wrote:
> > > >> > > >
> > > >> > > > > Hi Saminda,
> > > >> > > > >
> > > >> > > > > Great suggestion. Also +1 for Dhanushka's proposal to have
> > > >> > > > > serialize/de-serilized data.
> > > >> > > > > Few suggestions,
> > > >> > > > > 1. In addition to successful/error statuses we need other
> > status
> > > >> for
> > > >> > > > nodes
> > > >> > > > > & workflows
> > > >> > > > > and workflows.
> > > >> > > > > E . g :-
> > > >> > > > >    node - started, submitted, in-progress, failed,
> successful
> > > etc
> > > >> ...
> > > >> > > > >
> > > >> > > > Sorry if I was too vague. Yes we have more fine-grain statuses
> > for
> > > >> > > workflow
> > > >> > > > and node[1]. We will have a much fine-grained level of
> > granuality
> > > >> for a
> > > >> > > > GFacJob status.
> > > >> > > >     public static enum GFacJobStatus{
> > > >> > > >         SUBMITTED, //job is submitted, possibly waiting to
> start
> > > >> > > executing
> > > >> > > >         EXECUTING, //submitted job is being executed
> > > >> > > >         CANCELLED, //job was cancelled
> > > >> > > >         PAUSED, //job was paused
> > > >> > > >         WAITING_FOR_DATA, // job is waiting for data to
> continue
> > > >> > > executing
> > > >> > > >         FAILED, // error occurred while job was executing and
> > the
> > > >> job
> > > >> > > > stopped
> > > >> > > >         FINISHED, // job completed successfully
> > > >> > > >         UNKNOWN // unknown status. lookup the metadata for
> more
> > > >> > details.
> > > >> > > >     }
> > > >> > > >
> > > >> > > >
> > > >> > > > 2. This data will be useful in implementing FT and Load
> > Balancing
> > > in
> > > >> > each
> > > >> > > > > component. Sometime back we had discussions to make GFac
> > > >> stateless.
> > > >> > So
> > > >> > > > who
> > > >> > > > > is going to populate this data structure and persist it ?
> > > >> > > > >
> > > >> > > > That is a very good question... :). This summer is going to
> be a
> > > >> long
> > > >> > > > one... ;)
> > > >> > > >
> > > >> > >
> > > >> > > What I meant is which component is doing persistence ? (GFac or
> WF
> > > >> > > Interpretter). Not the actual person who is going to implement
> it
> > > :).
> > > >> > >
> > > >> > hih hih....
> > > >> > Well its going to be whatever the provider respondible for
> managing
> > > the
> > > >> job
> > > >> > lifecycle. For example GRAMProvider should be responsible for
> > > recording
> > > >> all
> > > >> > the data relating to the GRAM jobs its working with.
> > > >> >
> > > >> > >
> > > >> > >
> > > >> > > >
> > > >> > > > 1.
> > > >> > > >
> > > >> > > >
> > > >> > >
> > > >> >
> > > >>
> > >
> >
> https://svn.apache.org/repos/asf/airavata/trunk/modules/workflow-model/workflow-model-core/src/main/java/org/apache/airavata/workflow/model/graph/Node.java
> > > >> > > >
> > > >> > > > >
> > > >> > > > > Thanks
> > > >> > > > > Amila
> > > >> > > > >
> > > >> > > > >
> > > >> > > > > On Tue, May 21, 2013 at 11:39 AM, Saminda Wijeratne <
> > > >> > > samindaw@gmail.com
> > > >> > > > > >wrote:
> > > >> > > > >
> > > >> > > > > > Thats is an excellent idea. We can have the job data field
> > to
> > > be
> > > >> > the
> > > >> > > > > > designated GFac job serialized data. The whatever
> > GFacProvider
> > > >> > should
> > > >> > > > > > adhere to it.
> > > >> > > > > >
> > > >> > > > > > I'm still inclined to have the rest of the fields to ease
> of
> > > >> > querying
> > > >> > > > for
> > > >> > > > > > the required data. For example if we wanted all attempts
> on
> > > >> > executing
> > > >> > > > > for a
> > > >> > > > > > particular node of a workflow or if we wanted to know
> which
> > > >> > > application
> > > >> > > > > > descriptions are faster in execution or more reliable etc.
> > we
> > > >> can
> > > >> > let
> > > >> > > > the
> > > >> > > > > > query language deal with it. wdyt?
> > > >> > > > > >
> > > >> > > > > >
> > > >> > > > > > On Tue, May 21, 2013 at 11:24 AM, Danushka Menikkumbura <
> > > >> > > > > > danushka.menikkumbura@gmail.com> wrote:
> > > >> > > > > >
> > > >> > > > > > > Saminda,
> > > >> > > > > > >
> > > >> > > > > > > I think the data container does not need to have a
> generic
> > > >> > format.
> > > >> > > We
> > > >> > > > > can
> > > >> > > > > > > have a base class that facilitate object
> > > >> > > > serialization/deserialization
> > > >> > > > > > and
> > > >> > > > > > > let specific meta data structure implement them as
> > required.
> > > >> We
> > > >> > get
> > > >> > > > the
> > > >> > > > > > > Registry API to serialize objects and save them in a
> meta
> > > data
> > > >> > > table
> > > >> > > > > > (with
> > > >> > > > > > > just two columns?) and to deserialize as they are loaded
> > off
> > > >> the
> > > >> > > > > > registry.
> > > >> > > > > > >
> > > >> > > > > > > Danushka
> > > >> > > > > > >
> > > >> > > > > > >
> > > >> > > > > > > On Tue, May 21, 2013 at 8:34 PM, Saminda Wijeratne <
> > > >> > > > samindaw@gmail.com
> > > >> > > > > > > >wrote:
> > > >> > > > > > >
> > > >> > > > > > > > It has being apparent more and more that saving the
> data
> > > >> > related
> > > >> > > to
> > > >> > > > > > > > executing a jobs from the GFac can be useful for many
> > > >> reasons
> > > >> > > such
> > > >> > > > > as,
> > > >> > > > > > > >
> > > >> > > > > > > > debugging
> > > >> > > > > > > > retrying
> > > >> > > > > > > > to make smart decisions on reliability/cost etc.
> > > >> > > > > > > > statistical analysis
> > > >> > > > > > > >
> > > >> > > > > > > > Thus we thought of saving the data related to GFac
> jobs
> > in
> > > >> the
> > > >> > > > > registry
> > > >> > > > > > > in
> > > >> > > > > > > > order to facilitate feature such as above in the
> future.
> > > >> > > > > > > >
> > > >> > > > > > > > However a GFac job is potentially any sort of
> computing
> > > >> > resource
> > > >> > > > > access
> > > >> > > > > > > > (GRAM/UNICORE/EC2 etc.). Therefore we need to come up
> > > with a
> > > >> > > > > > generalized
> > > >> > > > > > > > data structure that can hold the data of any type of
> > > >> resource.
> > > >> > > > > > Following
> > > >> > > > > > > > are the suggested data to save for a single GFac job
> > > >> execution,
> > > >> > > > > > > >
> > > >> > > > > > > > *experiment id, workflow instance id, node id* -
> > pinpoint
> > > >> the
> > > >> > > node
> > > >> > > > > > > > execution
> > > >> > > > > > > > *service, host, application description ids *-
> pinpoint
> > > the
> > > >> > > > > descriptors
> > > >> > > > > > > > responsible
> > > >> > > > > > > > *local job id* - the unique job id retrieved/generated
> > per
> > > >> > > > execution
> > > >> > > > > > > > [PRIMARY KEY]
> > > >> > > > > > > > *job data* - data related executing the job (eg: the
> rsl
> > > in
> > > >> > GRAM)
> > > >> > > > > > > > *submitted, completed time*
> > > >> > > > > > > > *completed status* - whether the job was successfull
> or
> > > ran
> > > >> in
> > > >> > to
> > > >> > > > > > errors
> > > >> > > > > > > > etc.
> > > >> > > > > > > > *metadata* - custom field to add anything user wants
> > > >> > > > > > > >
> > > >> > > > > > > > Your feedback is most welcome. The API related changes
> > > will
> > > >> > also
> > > >> > > be
> > > >> > > > > > > > discussed once we have a proper data structure. We are
> > > >> hoping
> > > >> > to
> > > >> > > > > > > implement
> > > >> > > > > > > > this within next few days.
> > > >> > > > > > > >
> > > >> > > > > > > > Thanks,
> > > >> > > > > > > > Saminda
> > > >> > > > > > > >
> > > >> > > > > > >
> > > >> > > > > >
> > > >> > > > >
> > > >> > > >
> > > >> > >
> > > >> >
> > > >>
> > > >
> > > >
> > >
> >
>

Re: Persisting GFac job data

Posted by Saminda Wijeratne <sa...@gmail.com>.

On Wed, May 22, 2013 at 11:22 AM, Amila Jayasekara
<th...@gmail.com>wrote:

> I am bit concerned about the names. Are we assuming that API users has
> knowledge about GFac ?
> OR else we can just remove "GFac" substring and have method names like
> "void
> updateJobMetadta(..)"
>
You have a point there Amila. Perhaps we can name them as "Application"
rather than GFac since we already have the notion of an application
descriptor in the API. wdyt?


> Thanks
> Amila
>
>
> On Tue, May 21, 2013 at 11:28 PM, Saminda Wijeratne <samindaw@gmail.com
> >wrote:
>
> > Following API functions are added for the ProvenanceManager[2],
> >
> > boolean isGFacJobExists(String gfacJobId)
> > void addGFacJob(GFacJob job)
> > void updateGFacJob(GFacJob job)
> > void updateGFacJobStatus(String gfacJobId, GFacJobStatus status)
> > void updateGFacJobData(String gfacJobId, String jobdata)
> > void updateGFacJobSubmittedTime(String gfacJobId, Date submitted)
> > void updateGFacJobCompletedTime(String gfacJobId, Date completed)
> > void updateGFacJobMetadta(String gfacJobId, String metadata)
> > GFacJob getGFacJob(String gfacJobId)
> > List<GFacJob> getGFacJobsForDescriptors(String serviceDescriptionId,
> String
> > hostDescriptionId, String applicationDescriptionId)
> > List<GFacJob> getGFacJobs(String experimentId, String
> workflowExecutionId,
> > String nodeId)
> >
> > Thoughts are welcome!!!
> >
> >
> > 2.
> >
> >
> https://svn.apache.org/repos/asf/airavata/trunk/modules/airavata-client/src/main/java/org/apache/airavata/client/api/ProvenanceManager.java
> >
> >
> > On Tue, May 21, 2013 at 5:04 PM, Saminda Wijeratne <samindaw@gmail.com
> > >wrote:
> >
> > > But I thought the providers are part of the GFac (not as a separate
> > > service). If not then the providers should report to GFac. Orelse there
> > is
> > > no way the GFac knows what status to update which data to update etc.
> > Does
> > > the current GFac implementation support this?
> > >
> > >
> > > On Tue, May 21, 2013 at 4:47 PM, Amila Jayasekara <
> > thejaka.amila@gmail.com
> > > > wrote:
> > >
> > >> I think that should be handled at a more upper layer like Workflow
> > >> Interpretter or GFac. In FT perspective it is better if providers are
> > >> stateless. One reason is we dont have control over some providers and
> > and
> > >> there will be many places writing to disk if we implement the
> > persistence
> > >> logic at provider level.
> > >>
> > >> Thanks
> > >> Amila
> > >>
> > >>
> > >> On Tue, May 21, 2013 at 4:39 PM, Saminda Wijeratne <
> samindaw@gmail.com
> > >> >wrote:
> > >>
> > >> > On Tue, May 21, 2013 at 4:36 PM, Amila Jayasekara
> > >> > <th...@gmail.com>wrote:
> > >> >
> > >> > > On Tue, May 21, 2013 at 3:51 PM, Saminda Wijeratne <
> > >> samindaw@gmail.com
> > >> > > >wrote:
> > >> > >
> > >> > > > Thanks for the feedback Amila. a few comments inline
> > >> > > >
> > >> > > >
> > >> > > > On Tue, May 21, 2013 at 12:29 PM, Amila Jayasekara
> > >> > > > <th...@gmail.com>wrote:
> > >> > > >
> > >> > > > > Hi Saminda,
> > >> > > > >
> > >> > > > > Great suggestion. Also +1 for Dhanushka's proposal to have
> > >> > > > > serialize/de-serilized data.
> > >> > > > > Few suggestions,
> > >> > > > > 1. In addition to successful/error statuses we need other
> status
> > >> for
> > >> > > > nodes
> > >> > > > > & workflows
> > >> > > > > and workflows.
> > >> > > > > E . g :-
> > >> > > > >    node - started, submitted, in-progress, failed, successful
> > etc
> > >> ...
> > >> > > > >
> > >> > > > Sorry if I was too vague. Yes we have more fine-grain statuses
> for
> > >> > > workflow
> > >> > > > and node[1]. We will have a much fine-grained level of
> granuality
> > >> for a
> > >> > > > GFacJob status.
> > >> > > >     public static enum GFacJobStatus{
> > >> > > >         SUBMITTED, //job is submitted, possibly waiting to start
> > >> > > executing
> > >> > > >         EXECUTING, //submitted job is being executed
> > >> > > >         CANCELLED, //job was cancelled
> > >> > > >         PAUSED, //job was paused
> > >> > > >         WAITING_FOR_DATA, // job is waiting for data to continue
> > >> > > executing
> > >> > > >         FAILED, // error occurred while job was executing and
> the
> > >> job
> > >> > > > stopped
> > >> > > >         FINISHED, // job completed successfully
> > >> > > >         UNKNOWN // unknown status. lookup the metadata for more
> > >> > details.
> > >> > > >     }
> > >> > > >
> > >> > > >
> > >> > > > 2. This data will be useful in implementing FT and Load
> Balancing
> > in
> > >> > each
> > >> > > > > component. Sometime back we had discussions to make GFac
> > >> stateless.
> > >> > So
> > >> > > > who
> > >> > > > > is going to populate this data structure and persist it ?
> > >> > > > >
> > >> > > > That is a very good question... :). This summer is going to be a
> > >> long
> > >> > > > one... ;)
> > >> > > >
> > >> > >
> > >> > > What I meant is which component is doing persistence ? (GFac or WF
> > >> > > Interpretter). Not the actual person who is going to implement it
> > :).
> > >> > >
> > >> > hih hih....
> > >> > Well its going to be whatever the provider respondible for managing
> > the
> > >> job
> > >> > lifecycle. For example GRAMProvider should be responsible for
> > recording
> > >> all
> > >> > the data relating to the GRAM jobs its working with.
> > >> >
> > >> > >
> > >> > >
> > >> > > >
> > >> > > > 1.
> > >> > > >
> > >> > > >
> > >> > >
> > >> >
> > >>
> >
> https://svn.apache.org/repos/asf/airavata/trunk/modules/workflow-model/workflow-model-core/src/main/java/org/apache/airavata/workflow/model/graph/Node.java
> > >> > > >
> > >> > > > >
> > >> > > > > Thanks
> > >> > > > > Amila
> > >> > > > >
> > >> > > > >
> > >> > > > > On Tue, May 21, 2013 at 11:39 AM, Saminda Wijeratne <
> > >> > > samindaw@gmail.com
> > >> > > > > >wrote:
> > >> > > > >
> > >> > > > > > Thats is an excellent idea. We can have the job data field
> to
> > be
> > >> > the
> > >> > > > > > designated GFac job serialized data. The whatever
> GFacProvider
> > >> > should
> > >> > > > > > adhere to it.
> > >> > > > > >
> > >> > > > > > I'm still inclined to have the rest of the fields to ease of
> > >> > querying
> > >> > > > for
> > >> > > > > > the required data. For example if we wanted all attempts on
> > >> > executing
> > >> > > > > for a
> > >> > > > > > particular node of a workflow or if we wanted to know which
> > >> > > application
> > >> > > > > > descriptions are faster in execution or more reliable etc.
> we
> > >> can
> > >> > let
> > >> > > > the
> > >> > > > > > query language deal with it. wdyt?
> > >> > > > > >
> > >> > > > > >
> > >> > > > > > On Tue, May 21, 2013 at 11:24 AM, Danushka Menikkumbura <
> > >> > > > > > danushka.menikkumbura@gmail.com> wrote:
> > >> > > > > >
> > >> > > > > > > Saminda,
> > >> > > > > > >
> > >> > > > > > > I think the data container does not need to have a generic
> > >> > format.
> > >> > > We
> > >> > > > > can
> > >> > > > > > > have a base class that facilitate object
> > >> > > > serialization/deserialization
> > >> > > > > > and
> > >> > > > > > > let specific meta data structure implement them as
> required.
> > >> We
> > >> > get
> > >> > > > the
> > >> > > > > > > Registry API to serialize objects and save them in a meta
> > data
> > >> > > table
> > >> > > > > > (with
> > >> > > > > > > just two columns?) and to deserialize as they are loaded
> off
> > >> the
> > >> > > > > > registry.
> > >> > > > > > >
> > >> > > > > > > Danushka
> > >> > > > > > >
> > >> > > > > > >
> > >> > > > > > > On Tue, May 21, 2013 at 8:34 PM, Saminda Wijeratne <
> > >> > > > samindaw@gmail.com
> > >> > > > > > > >wrote:
> > >> > > > > > >
> > >> > > > > > > > It has being apparent more and more that saving the data
> > >> > related
> > >> > > to
> > >> > > > > > > > executing a jobs from the GFac can be useful for many
> > >> reasons
> > >> > > such
> > >> > > > > as,
> > >> > > > > > > >
> > >> > > > > > > > debugging
> > >> > > > > > > > retrying
> > >> > > > > > > > to make smart decisions on reliability/cost etc.
> > >> > > > > > > > statistical analysis
> > >> > > > > > > >
> > >> > > > > > > > Thus we thought of saving the data related to GFac jobs
> in
> > >> the
> > >> > > > > registry
> > >> > > > > > > in
> > >> > > > > > > > order to facilitate feature such as above in the future.
> > >> > > > > > > >
> > >> > > > > > > > However a GFac job is potentially any sort of computing
> > >> > resource
> > >> > > > > access
> > >> > > > > > > > (GRAM/UNICORE/EC2 etc.). Therefore we need to come up
> > with a
> > >> > > > > > generalized
> > >> > > > > > > > data structure that can hold the data of any type of
> > >> resource.
> > >> > > > > > Following
> > >> > > > > > > > are the suggested data to save for a single GFac job
> > >> execution,
> > >> > > > > > > >
> > >> > > > > > > > *experiment id, workflow instance id, node id* -
> pinpoint
> > >> the
> > >> > > node
> > >> > > > > > > > execution
> > >> > > > > > > > *service, host, application description ids *- pinpoint
> > the
> > >> > > > > descriptors
> > >> > > > > > > > responsible
> > >> > > > > > > > *local job id* - the unique job id retrieved/generated
> per
> > >> > > > execution
> > >> > > > > > > > [PRIMARY KEY]
> > >> > > > > > > > *job data* - data related executing the job (eg: the rsl
> > in
> > >> > GRAM)
> > >> > > > > > > > *submitted, completed time*
> > >> > > > > > > > *completed status* - whether the job was successfull or
> > ran
> > >> in
> > >> > to
> > >> > > > > > errors
> > >> > > > > > > > etc.
> > >> > > > > > > > *metadata* - custom field to add anything user wants
> > >> > > > > > > >
> > >> > > > > > > > Your feedback is most welcome. The API related changes
> > will
> > >> > also
> > >> > > be
> > >> > > > > > > > discussed once we have a proper data structure. We are
> > >> hoping
> > >> > to
> > >> > > > > > > implement
> > >> > > > > > > > this within next few days.
> > >> > > > > > > >
> > >> > > > > > > > Thanks,
> > >> > > > > > > > Saminda
> > >> > > > > > > >
> > >> > > > > > >
> > >> > > > > >
> > >> > > > >
> > >> > > >
> > >> > >
> > >> >
> > >>
> > >
> > >
> >
>

Re: Persisting GFac job data

Posted by Amila Jayasekara <th...@gmail.com>.

I am bit concerned about the names. Are we assuming that API users has
knowledge about GFac ?
OR else we can just remove "GFac" substring and have method names like "void
updateJobMetadta(..)"

Thanks
Amila


On Tue, May 21, 2013 at 11:28 PM, Saminda Wijeratne <sa...@gmail.com>wrote:

> Following API functions are added for the ProvenanceManager[2],
>
> boolean isGFacJobExists(String gfacJobId)
> void addGFacJob(GFacJob job)
> void updateGFacJob(GFacJob job)
> void updateGFacJobStatus(String gfacJobId, GFacJobStatus status)
> void updateGFacJobData(String gfacJobId, String jobdata)
> void updateGFacJobSubmittedTime(String gfacJobId, Date submitted)
> void updateGFacJobCompletedTime(String gfacJobId, Date completed)
> void updateGFacJobMetadta(String gfacJobId, String metadata)
> GFacJob getGFacJob(String gfacJobId)
> List<GFacJob> getGFacJobsForDescriptors(String serviceDescriptionId, String
> hostDescriptionId, String applicationDescriptionId)
> List<GFacJob> getGFacJobs(String experimentId, String workflowExecutionId,
> String nodeId)
>
> Thoughts are welcome!!!
>
>
> 2.
>
> https://svn.apache.org/repos/asf/airavata/trunk/modules/airavata-client/src/main/java/org/apache/airavata/client/api/ProvenanceManager.java
>
>
> On Tue, May 21, 2013 at 5:04 PM, Saminda Wijeratne <samindaw@gmail.com
> >wrote:
>
> > But I thought the providers are part of the GFac (not as a separate
> > service). If not then the providers should report to GFac. Orelse there
> is
> > no way the GFac knows what status to update which data to update etc.
> Does
> > the current GFac implementation support this?
> >
> >
> > On Tue, May 21, 2013 at 4:47 PM, Amila Jayasekara <
> thejaka.amila@gmail.com
> > > wrote:
> >
> >> I think that should be handled at a more upper layer like Workflow
> >> Interpretter or GFac. In FT perspective it is better if providers are
> >> stateless. One reason is we dont have control over some providers and
> and
> >> there will be many places writing to disk if we implement the
> persistence
> >> logic at provider level.
> >>
> >> Thanks
> >> Amila
> >>
> >>
> >> On Tue, May 21, 2013 at 4:39 PM, Saminda Wijeratne <samindaw@gmail.com
> >> >wrote:
> >>
> >> > On Tue, May 21, 2013 at 4:36 PM, Amila Jayasekara
> >> > <th...@gmail.com>wrote:
> >> >
> >> > > On Tue, May 21, 2013 at 3:51 PM, Saminda Wijeratne <
> >> samindaw@gmail.com
> >> > > >wrote:
> >> > >
> >> > > > Thanks for the feedback Amila. a few comments inline
> >> > > >
> >> > > >
> >> > > > On Tue, May 21, 2013 at 12:29 PM, Amila Jayasekara
> >> > > > <th...@gmail.com>wrote:
> >> > > >
> >> > > > > Hi Saminda,
> >> > > > >
> >> > > > > Great suggestion. Also +1 for Dhanushka's proposal to have
> >> > > > > serialize/de-serilized data.
> >> > > > > Few suggestions,
> >> > > > > 1. In addition to successful/error statuses we need other status
> >> for
> >> > > > nodes
> >> > > > > & workflows
> >> > > > > and workflows.
> >> > > > > E . g :-
> >> > > > >    node - started, submitted, in-progress, failed, successful
> etc
> >> ...
> >> > > > >
> >> > > > Sorry if I was too vague. Yes we have more fine-grain statuses for
> >> > > workflow
> >> > > > and node[1]. We will have a much fine-grained level of granuality
> >> for a
> >> > > > GFacJob status.
> >> > > >     public static enum GFacJobStatus{
> >> > > >         SUBMITTED, //job is submitted, possibly waiting to start
> >> > > executing
> >> > > >         EXECUTING, //submitted job is being executed
> >> > > >         CANCELLED, //job was cancelled
> >> > > >         PAUSED, //job was paused
> >> > > >         WAITING_FOR_DATA, // job is waiting for data to continue
> >> > > executing
> >> > > >         FAILED, // error occurred while job was executing and the
> >> job
> >> > > > stopped
> >> > > >         FINISHED, // job completed successfully
> >> > > >         UNKNOWN // unknown status. lookup the metadata for more
> >> > details.
> >> > > >     }
> >> > > >
> >> > > >
> >> > > > 2. This data will be useful in implementing FT and Load Balancing
> in
> >> > each
> >> > > > > component. Sometime back we had discussions to make GFac
> >> stateless.
> >> > So
> >> > > > who
> >> > > > > is going to populate this data structure and persist it ?
> >> > > > >
> >> > > > That is a very good question... :). This summer is going to be a
> >> long
> >> > > > one... ;)
> >> > > >
> >> > >
> >> > > What I meant is which component is doing persistence ? (GFac or WF
> >> > > Interpretter). Not the actual person who is going to implement it
> :).
> >> > >
> >> > hih hih....
> >> > Well its going to be whatever the provider respondible for managing
> the
> >> job
> >> > lifecycle. For example GRAMProvider should be responsible for
> recording
> >> all
> >> > the data relating to the GRAM jobs its working with.
> >> >
> >> > >
> >> > >
> >> > > >
> >> > > > 1.
> >> > > >
> >> > > >
> >> > >
> >> >
> >>
> https://svn.apache.org/repos/asf/airavata/trunk/modules/workflow-model/workflow-model-core/src/main/java/org/apache/airavata/workflow/model/graph/Node.java
> >> > > >
> >> > > > >
> >> > > > > Thanks
> >> > > > > Amila
> >> > > > >
> >> > > > >
> >> > > > > On Tue, May 21, 2013 at 11:39 AM, Saminda Wijeratne <
> >> > > samindaw@gmail.com
> >> > > > > >wrote:
> >> > > > >
> >> > > > > > Thats is an excellent idea. We can have the job data field to
> be
> >> > the
> >> > > > > > designated GFac job serialized data. The whatever GFacProvider
> >> > should
> >> > > > > > adhere to it.
> >> > > > > >
> >> > > > > > I'm still inclined to have the rest of the fields to ease of
> >> > querying
> >> > > > for
> >> > > > > > the required data. For example if we wanted all attempts on
> >> > executing
> >> > > > > for a
> >> > > > > > particular node of a workflow or if we wanted to know which
> >> > > application
> >> > > > > > descriptions are faster in execution or more reliable etc. we
> >> can
> >> > let
> >> > > > the
> >> > > > > > query language deal with it. wdyt?
> >> > > > > >
> >> > > > > >
> >> > > > > > On Tue, May 21, 2013 at 11:24 AM, Danushka Menikkumbura <
> >> > > > > > danushka.menikkumbura@gmail.com> wrote:
> >> > > > > >
> >> > > > > > > Saminda,
> >> > > > > > >
> >> > > > > > > I think the data container does not need to have a generic
> >> > format.
> >> > > We
> >> > > > > can
> >> > > > > > > have a base class that facilitate object
> >> > > > serialization/deserialization
> >> > > > > > and
> >> > > > > > > let specific meta data structure implement them as required.
> >> We
> >> > get
> >> > > > the
> >> > > > > > > Registry API to serialize objects and save them in a meta
> data
> >> > > table
> >> > > > > > (with
> >> > > > > > > just two columns?) and to deserialize as they are loaded off
> >> the
> >> > > > > > registry.
> >> > > > > > >
> >> > > > > > > Danushka
> >> > > > > > >
> >> > > > > > >
> >> > > > > > > On Tue, May 21, 2013 at 8:34 PM, Saminda Wijeratne <
> >> > > > samindaw@gmail.com
> >> > > > > > > >wrote:
> >> > > > > > >
> >> > > > > > > > It has being apparent more and more that saving the data
> >> > related
> >> > > to
> >> > > > > > > > executing a jobs from the GFac can be useful for many
> >> reasons
> >> > > such
> >> > > > > as,
> >> > > > > > > >
> >> > > > > > > > debugging
> >> > > > > > > > retrying
> >> > > > > > > > to make smart decisions on reliability/cost etc.
> >> > > > > > > > statistical analysis
> >> > > > > > > >
> >> > > > > > > > Thus we thought of saving the data related to GFac jobs in
> >> the
> >> > > > > registry
> >> > > > > > > in
> >> > > > > > > > order to facilitate feature such as above in the future.
> >> > > > > > > >
> >> > > > > > > > However a GFac job is potentially any sort of computing
> >> > resource
> >> > > > > access
> >> > > > > > > > (GRAM/UNICORE/EC2 etc.). Therefore we need to come up
> with a
> >> > > > > > generalized
> >> > > > > > > > data structure that can hold the data of any type of
> >> resource.
> >> > > > > > Following
> >> > > > > > > > are the suggested data to save for a single GFac job
> >> execution,
> >> > > > > > > >
> >> > > > > > > > *experiment id, workflow instance id, node id* - pinpoint
> >> the
> >> > > node
> >> > > > > > > > execution
> >> > > > > > > > *service, host, application description ids *- pinpoint
> the
> >> > > > > descriptors
> >> > > > > > > > responsible
> >> > > > > > > > *local job id* - the unique job id retrieved/generated per
> >> > > > execution
> >> > > > > > > > [PRIMARY KEY]
> >> > > > > > > > *job data* - data related executing the job (eg: the rsl
> in
> >> > GRAM)
> >> > > > > > > > *submitted, completed time*
> >> > > > > > > > *completed status* - whether the job was successfull or
> ran
> >> in
> >> > to
> >> > > > > > errors
> >> > > > > > > > etc.
> >> > > > > > > > *metadata* - custom field to add anything user wants
> >> > > > > > > >
> >> > > > > > > > Your feedback is most welcome. The API related changes
> will
> >> > also
> >> > > be
> >> > > > > > > > discussed once we have a proper data structure. We are
> >> hoping
> >> > to
> >> > > > > > > implement
> >> > > > > > > > this within next few days.
> >> > > > > > > >
> >> > > > > > > > Thanks,
> >> > > > > > > > Saminda
> >> > > > > > > >
> >> > > > > > >
> >> > > > > >
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> >
> >
>

Re: Persisting GFac job data

Posted by Saminda Wijeratne <sa...@gmail.com>.

Following API functions are added for the ProvenanceManager[2],

boolean isGFacJobExists(String gfacJobId)
void addGFacJob(GFacJob job)
void updateGFacJob(GFacJob job)
void updateGFacJobStatus(String gfacJobId, GFacJobStatus status)
void updateGFacJobData(String gfacJobId, String jobdata)
void updateGFacJobSubmittedTime(String gfacJobId, Date submitted)
void updateGFacJobCompletedTime(String gfacJobId, Date completed)
void updateGFacJobMetadta(String gfacJobId, String metadata)
GFacJob getGFacJob(String gfacJobId)
List<GFacJob> getGFacJobsForDescriptors(String serviceDescriptionId, String
hostDescriptionId, String applicationDescriptionId)
List<GFacJob> getGFacJobs(String experimentId, String workflowExecutionId,
String nodeId)

Thoughts are welcome!!!


2.
https://svn.apache.org/repos/asf/airavata/trunk/modules/airavata-client/src/main/java/org/apache/airavata/client/api/ProvenanceManager.java


On Tue, May 21, 2013 at 5:04 PM, Saminda Wijeratne <sa...@gmail.com>wrote:

> But I thought the providers are part of the GFac (not as a separate
> service). If not then the providers should report to GFac. Orelse there is
> no way the GFac knows what status to update which data to update etc. Does
> the current GFac implementation support this?
>
>
> On Tue, May 21, 2013 at 4:47 PM, Amila Jayasekara <thejaka.amila@gmail.com
> > wrote:
>
>> I think that should be handled at a more upper layer like Workflow
>> Interpretter or GFac. In FT perspective it is better if providers are
>> stateless. One reason is we dont have control over some providers and and
>> there will be many places writing to disk if we implement the persistence
>> logic at provider level.
>>
>> Thanks
>> Amila
>>
>>
>> On Tue, May 21, 2013 at 4:39 PM, Saminda Wijeratne <samindaw@gmail.com
>> >wrote:
>>
>> > On Tue, May 21, 2013 at 4:36 PM, Amila Jayasekara
>> > <th...@gmail.com>wrote:
>> >
>> > > On Tue, May 21, 2013 at 3:51 PM, Saminda Wijeratne <
>> samindaw@gmail.com
>> > > >wrote:
>> > >
>> > > > Thanks for the feedback Amila. a few comments inline
>> > > >
>> > > >
>> > > > On Tue, May 21, 2013 at 12:29 PM, Amila Jayasekara
>> > > > <th...@gmail.com>wrote:
>> > > >
>> > > > > Hi Saminda,
>> > > > >
>> > > > > Great suggestion. Also +1 for Dhanushka's proposal to have
>> > > > > serialize/de-serilized data.
>> > > > > Few suggestions,
>> > > > > 1. In addition to successful/error statuses we need other status
>> for
>> > > > nodes
>> > > > > & workflows
>> > > > > and workflows.
>> > > > > E . g :-
>> > > > >    node - started, submitted, in-progress, failed, successful etc
>> ...
>> > > > >
>> > > > Sorry if I was too vague. Yes we have more fine-grain statuses for
>> > > workflow
>> > > > and node[1]. We will have a much fine-grained level of granuality
>> for a
>> > > > GFacJob status.
>> > > >     public static enum GFacJobStatus{
>> > > >         SUBMITTED, //job is submitted, possibly waiting to start
>> > > executing
>> > > >         EXECUTING, //submitted job is being executed
>> > > >         CANCELLED, //job was cancelled
>> > > >         PAUSED, //job was paused
>> > > >         WAITING_FOR_DATA, // job is waiting for data to continue
>> > > executing
>> > > >         FAILED, // error occurred while job was executing and the
>> job
>> > > > stopped
>> > > >         FINISHED, // job completed successfully
>> > > >         UNKNOWN // unknown status. lookup the metadata for more
>> > details.
>> > > >     }
>> > > >
>> > > >
>> > > > 2. This data will be useful in implementing FT and Load Balancing in
>> > each
>> > > > > component. Sometime back we had discussions to make GFac
>> stateless.
>> > So
>> > > > who
>> > > > > is going to populate this data structure and persist it ?
>> > > > >
>> > > > That is a very good question... :). This summer is going to be a
>> long
>> > > > one... ;)
>> > > >
>> > >
>> > > What I meant is which component is doing persistence ? (GFac or WF
>> > > Interpretter). Not the actual person who is going to implement it :).
>> > >
>> > hih hih....
>> > Well its going to be whatever the provider respondible for managing the
>> job
>> > lifecycle. For example GRAMProvider should be responsible for recording
>> all
>> > the data relating to the GRAM jobs its working with.
>> >
>> > >
>> > >
>> > > >
>> > > > 1.
>> > > >
>> > > >
>> > >
>> >
>> https://svn.apache.org/repos/asf/airavata/trunk/modules/workflow-model/workflow-model-core/src/main/java/org/apache/airavata/workflow/model/graph/Node.java
>> > > >
>> > > > >
>> > > > > Thanks
>> > > > > Amila
>> > > > >
>> > > > >
>> > > > > On Tue, May 21, 2013 at 11:39 AM, Saminda Wijeratne <
>> > > samindaw@gmail.com
>> > > > > >wrote:
>> > > > >
>> > > > > > Thats is an excellent idea. We can have the job data field to be
>> > the
>> > > > > > designated GFac job serialized data. The whatever GFacProvider
>> > should
>> > > > > > adhere to it.
>> > > > > >
>> > > > > > I'm still inclined to have the rest of the fields to ease of
>> > querying
>> > > > for
>> > > > > > the required data. For example if we wanted all attempts on
>> > executing
>> > > > > for a
>> > > > > > particular node of a workflow or if we wanted to know which
>> > > application
>> > > > > > descriptions are faster in execution or more reliable etc. we
>> can
>> > let
>> > > > the
>> > > > > > query language deal with it. wdyt?
>> > > > > >
>> > > > > >
>> > > > > > On Tue, May 21, 2013 at 11:24 AM, Danushka Menikkumbura <
>> > > > > > danushka.menikkumbura@gmail.com> wrote:
>> > > > > >
>> > > > > > > Saminda,
>> > > > > > >
>> > > > > > > I think the data container does not need to have a generic
>> > format.
>> > > We
>> > > > > can
>> > > > > > > have a base class that facilitate object
>> > > > serialization/deserialization
>> > > > > > and
>> > > > > > > let specific meta data structure implement them as required.
>> We
>> > get
>> > > > the
>> > > > > > > Registry API to serialize objects and save them in a meta data
>> > > table
>> > > > > > (with
>> > > > > > > just two columns?) and to deserialize as they are loaded off
>> the
>> > > > > > registry.
>> > > > > > >
>> > > > > > > Danushka
>> > > > > > >
>> > > > > > >
>> > > > > > > On Tue, May 21, 2013 at 8:34 PM, Saminda Wijeratne <
>> > > > samindaw@gmail.com
>> > > > > > > >wrote:
>> > > > > > >
>> > > > > > > > It has being apparent more and more that saving the data
>> > related
>> > > to
>> > > > > > > > executing a jobs from the GFac can be useful for many
>> reasons
>> > > such
>> > > > > as,
>> > > > > > > >
>> > > > > > > > debugging
>> > > > > > > > retrying
>> > > > > > > > to make smart decisions on reliability/cost etc.
>> > > > > > > > statistical analysis
>> > > > > > > >
>> > > > > > > > Thus we thought of saving the data related to GFac jobs in
>> the
>> > > > > registry
>> > > > > > > in
>> > > > > > > > order to facilitate feature such as above in the future.
>> > > > > > > >
>> > > > > > > > However a GFac job is potentially any sort of computing
>> > resource
>> > > > > access
>> > > > > > > > (GRAM/UNICORE/EC2 etc.). Therefore we need to come up with a
>> > > > > > generalized
>> > > > > > > > data structure that can hold the data of any type of
>> resource.
>> > > > > > Following
>> > > > > > > > are the suggested data to save for a single GFac job
>> execution,
>> > > > > > > >
>> > > > > > > > *experiment id, workflow instance id, node id* - pinpoint
>> the
>> > > node
>> > > > > > > > execution
>> > > > > > > > *service, host, application description ids *- pinpoint the
>> > > > > descriptors
>> > > > > > > > responsible
>> > > > > > > > *local job id* - the unique job id retrieved/generated per
>> > > > execution
>> > > > > > > > [PRIMARY KEY]
>> > > > > > > > *job data* - data related executing the job (eg: the rsl in
>> > GRAM)
>> > > > > > > > *submitted, completed time*
>> > > > > > > > *completed status* - whether the job was successfull or ran
>> in
>> > to
>> > > > > > errors
>> > > > > > > > etc.
>> > > > > > > > *metadata* - custom field to add anything user wants
>> > > > > > > >
>> > > > > > > > Your feedback is most welcome. The API related changes will
>> > also
>> > > be
>> > > > > > > > discussed once we have a proper data structure. We are
>> hoping
>> > to
>> > > > > > > implement
>> > > > > > > > this within next few days.
>> > > > > > > >
>> > > > > > > > Thanks,
>> > > > > > > > Saminda
>> > > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>>
>
>

Re: Persisting GFac job data

Posted by Saminda Wijeratne <sa...@gmail.com>.

But I thought the providers are part of the GFac (not as a separate
service). If not then the providers should report to GFac. Orelse there is
no way the GFac knows what status to update which data to update etc. Does
the current GFac implementation support this?


On Tue, May 21, 2013 at 4:47 PM, Amila Jayasekara
<th...@gmail.com>wrote:

> I think that should be handled at a more upper layer like Workflow
> Interpretter or GFac. In FT perspective it is better if providers are
> stateless. One reason is we dont have control over some providers and and
> there will be many places writing to disk if we implement the persistence
> logic at provider level.
>
> Thanks
> Amila
>
>
> On Tue, May 21, 2013 at 4:39 PM, Saminda Wijeratne <samindaw@gmail.com
> >wrote:
>
> > On Tue, May 21, 2013 at 4:36 PM, Amila Jayasekara
> > <th...@gmail.com>wrote:
> >
> > > On Tue, May 21, 2013 at 3:51 PM, Saminda Wijeratne <samindaw@gmail.com
> > > >wrote:
> > >
> > > > Thanks for the feedback Amila. a few comments inline
> > > >
> > > >
> > > > On Tue, May 21, 2013 at 12:29 PM, Amila Jayasekara
> > > > <th...@gmail.com>wrote:
> > > >
> > > > > Hi Saminda,
> > > > >
> > > > > Great suggestion. Also +1 for Dhanushka's proposal to have
> > > > > serialize/de-serilized data.
> > > > > Few suggestions,
> > > > > 1. In addition to successful/error statuses we need other status
> for
> > > > nodes
> > > > > & workflows
> > > > > and workflows.
> > > > > E . g :-
> > > > >    node - started, submitted, in-progress, failed, successful etc
> ...
> > > > >
> > > > Sorry if I was too vague. Yes we have more fine-grain statuses for
> > > workflow
> > > > and node[1]. We will have a much fine-grained level of granuality
> for a
> > > > GFacJob status.
> > > >     public static enum GFacJobStatus{
> > > >         SUBMITTED, //job is submitted, possibly waiting to start
> > > executing
> > > >         EXECUTING, //submitted job is being executed
> > > >         CANCELLED, //job was cancelled
> > > >         PAUSED, //job was paused
> > > >         WAITING_FOR_DATA, // job is waiting for data to continue
> > > executing
> > > >         FAILED, // error occurred while job was executing and the job
> > > > stopped
> > > >         FINISHED, // job completed successfully
> > > >         UNKNOWN // unknown status. lookup the metadata for more
> > details.
> > > >     }
> > > >
> > > >
> > > > 2. This data will be useful in implementing FT and Load Balancing in
> > each
> > > > > component. Sometime back we had discussions to make GFac stateless.
> > So
> > > > who
> > > > > is going to populate this data structure and persist it ?
> > > > >
> > > > That is a very good question... :). This summer is going to be a long
> > > > one... ;)
> > > >
> > >
> > > What I meant is which component is doing persistence ? (GFac or WF
> > > Interpretter). Not the actual person who is going to implement it :).
> > >
> > hih hih....
> > Well its going to be whatever the provider respondible for managing the
> job
> > lifecycle. For example GRAMProvider should be responsible for recording
> all
> > the data relating to the GRAM jobs its working with.
> >
> > >
> > >
> > > >
> > > > 1.
> > > >
> > > >
> > >
> >
> https://svn.apache.org/repos/asf/airavata/trunk/modules/workflow-model/workflow-model-core/src/main/java/org/apache/airavata/workflow/model/graph/Node.java
> > > >
> > > > >
> > > > > Thanks
> > > > > Amila
> > > > >
> > > > >
> > > > > On Tue, May 21, 2013 at 11:39 AM, Saminda Wijeratne <
> > > samindaw@gmail.com
> > > > > >wrote:
> > > > >
> > > > > > Thats is an excellent idea. We can have the job data field to be
> > the
> > > > > > designated GFac job serialized data. The whatever GFacProvider
> > should
> > > > > > adhere to it.
> > > > > >
> > > > > > I'm still inclined to have the rest of the fields to ease of
> > querying
> > > > for
> > > > > > the required data. For example if we wanted all attempts on
> > executing
> > > > > for a
> > > > > > particular node of a workflow or if we wanted to know which
> > > application
> > > > > > descriptions are faster in execution or more reliable etc. we can
> > let
> > > > the
> > > > > > query language deal with it. wdyt?
> > > > > >
> > > > > >
> > > > > > On Tue, May 21, 2013 at 11:24 AM, Danushka Menikkumbura <
> > > > > > danushka.menikkumbura@gmail.com> wrote:
> > > > > >
> > > > > > > Saminda,
> > > > > > >
> > > > > > > I think the data container does not need to have a generic
> > format.
> > > We
> > > > > can
> > > > > > > have a base class that facilitate object
> > > > serialization/deserialization
> > > > > > and
> > > > > > > let specific meta data structure implement them as required. We
> > get
> > > > the
> > > > > > > Registry API to serialize objects and save them in a meta data
> > > table
> > > > > > (with
> > > > > > > just two columns?) and to deserialize as they are loaded off
> the
> > > > > > registry.
> > > > > > >
> > > > > > > Danushka
> > > > > > >
> > > > > > >
> > > > > > > On Tue, May 21, 2013 at 8:34 PM, Saminda Wijeratne <
> > > > samindaw@gmail.com
> > > > > > > >wrote:
> > > > > > >
> > > > > > > > It has being apparent more and more that saving the data
> > related
> > > to
> > > > > > > > executing a jobs from the GFac can be useful for many reasons
> > > such
> > > > > as,
> > > > > > > >
> > > > > > > > debugging
> > > > > > > > retrying
> > > > > > > > to make smart decisions on reliability/cost etc.
> > > > > > > > statistical analysis
> > > > > > > >
> > > > > > > > Thus we thought of saving the data related to GFac jobs in
> the
> > > > > registry
> > > > > > > in
> > > > > > > > order to facilitate feature such as above in the future.
> > > > > > > >
> > > > > > > > However a GFac job is potentially any sort of computing
> > resource
> > > > > access
> > > > > > > > (GRAM/UNICORE/EC2 etc.). Therefore we need to come up with a
> > > > > > generalized
> > > > > > > > data structure that can hold the data of any type of
> resource.
> > > > > > Following
> > > > > > > > are the suggested data to save for a single GFac job
> execution,
> > > > > > > >
> > > > > > > > *experiment id, workflow instance id, node id* - pinpoint the
> > > node
> > > > > > > > execution
> > > > > > > > *service, host, application description ids *- pinpoint the
> > > > > descriptors
> > > > > > > > responsible
> > > > > > > > *local job id* - the unique job id retrieved/generated per
> > > > execution
> > > > > > > > [PRIMARY KEY]
> > > > > > > > *job data* - data related executing the job (eg: the rsl in
> > GRAM)
> > > > > > > > *submitted, completed time*
> > > > > > > > *completed status* - whether the job was successfull or ran
> in
> > to
> > > > > > errors
> > > > > > > > etc.
> > > > > > > > *metadata* - custom field to add anything user wants
> > > > > > > >
> > > > > > > > Your feedback is most welcome. The API related changes will
> > also
> > > be
> > > > > > > > discussed once we have a proper data structure. We are hoping
> > to
> > > > > > > implement
> > > > > > > > this within next few days.
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Saminda
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: Persisting GFac job data

Posted by Amila Jayasekara <th...@gmail.com>.

I think that should be handled at a more upper layer like Workflow
Interpretter or GFac. In FT perspective it is better if providers are
stateless. One reason is we dont have control over some providers and and
there will be many places writing to disk if we implement the persistence
logic at provider level.

Thanks
Amila


On Tue, May 21, 2013 at 4:39 PM, Saminda Wijeratne <sa...@gmail.com>wrote:

> On Tue, May 21, 2013 at 4:36 PM, Amila Jayasekara
> <th...@gmail.com>wrote:
>
> > On Tue, May 21, 2013 at 3:51 PM, Saminda Wijeratne <samindaw@gmail.com
> > >wrote:
> >
> > > Thanks for the feedback Amila. a few comments inline
> > >
> > >
> > > On Tue, May 21, 2013 at 12:29 PM, Amila Jayasekara
> > > <th...@gmail.com>wrote:
> > >
> > > > Hi Saminda,
> > > >
> > > > Great suggestion. Also +1 for Dhanushka's proposal to have
> > > > serialize/de-serilized data.
> > > > Few suggestions,
> > > > 1. In addition to successful/error statuses we need other status for
> > > nodes
> > > > & workflows
> > > > and workflows.
> > > > E . g :-
> > > >    node - started, submitted, in-progress, failed, successful etc ...
> > > >
> > > Sorry if I was too vague. Yes we have more fine-grain statuses for
> > workflow
> > > and node[1]. We will have a much fine-grained level of granuality for a
> > > GFacJob status.
> > >     public static enum GFacJobStatus{
> > >         SUBMITTED, //job is submitted, possibly waiting to start
> > executing
> > >         EXECUTING, //submitted job is being executed
> > >         CANCELLED, //job was cancelled
> > >         PAUSED, //job was paused
> > >         WAITING_FOR_DATA, // job is waiting for data to continue
> > executing
> > >         FAILED, // error occurred while job was executing and the job
> > > stopped
> > >         FINISHED, // job completed successfully
> > >         UNKNOWN // unknown status. lookup the metadata for more
> details.
> > >     }
> > >
> > >
> > > 2. This data will be useful in implementing FT and Load Balancing in
> each
> > > > component. Sometime back we had discussions to make GFac stateless.
> So
> > > who
> > > > is going to populate this data structure and persist it ?
> > > >
> > > That is a very good question... :). This summer is going to be a long
> > > one... ;)
> > >
> >
> > What I meant is which component is doing persistence ? (GFac or WF
> > Interpretter). Not the actual person who is going to implement it :).
> >
> hih hih....
> Well its going to be whatever the provider respondible for managing the job
> lifecycle. For example GRAMProvider should be responsible for recording all
> the data relating to the GRAM jobs its working with.
>
> >
> >
> > >
> > > 1.
> > >
> > >
> >
> https://svn.apache.org/repos/asf/airavata/trunk/modules/workflow-model/workflow-model-core/src/main/java/org/apache/airavata/workflow/model/graph/Node.java
> > >
> > > >
> > > > Thanks
> > > > Amila
> > > >
> > > >
> > > > On Tue, May 21, 2013 at 11:39 AM, Saminda Wijeratne <
> > samindaw@gmail.com
> > > > >wrote:
> > > >
> > > > > Thats is an excellent idea. We can have the job data field to be
> the
> > > > > designated GFac job serialized data. The whatever GFacProvider
> should
> > > > > adhere to it.
> > > > >
> > > > > I'm still inclined to have the rest of the fields to ease of
> querying
> > > for
> > > > > the required data. For example if we wanted all attempts on
> executing
> > > > for a
> > > > > particular node of a workflow or if we wanted to know which
> > application
> > > > > descriptions are faster in execution or more reliable etc. we can
> let
> > > the
> > > > > query language deal with it. wdyt?
> > > > >
> > > > >
> > > > > On Tue, May 21, 2013 at 11:24 AM, Danushka Menikkumbura <
> > > > > danushka.menikkumbura@gmail.com> wrote:
> > > > >
> > > > > > Saminda,
> > > > > >
> > > > > > I think the data container does not need to have a generic
> format.
> > We
> > > > can
> > > > > > have a base class that facilitate object
> > > serialization/deserialization
> > > > > and
> > > > > > let specific meta data structure implement them as required. We
> get
> > > the
> > > > > > Registry API to serialize objects and save them in a meta data
> > table
> > > > > (with
> > > > > > just two columns?) and to deserialize as they are loaded off the
> > > > > registry.
> > > > > >
> > > > > > Danushka
> > > > > >
> > > > > >
> > > > > > On Tue, May 21, 2013 at 8:34 PM, Saminda Wijeratne <
> > > samindaw@gmail.com
> > > > > > >wrote:
> > > > > >
> > > > > > > It has being apparent more and more that saving the data
> related
> > to
> > > > > > > executing a jobs from the GFac can be useful for many reasons
> > such
> > > > as,
> > > > > > >
> > > > > > > debugging
> > > > > > > retrying
> > > > > > > to make smart decisions on reliability/cost etc.
> > > > > > > statistical analysis
> > > > > > >
> > > > > > > Thus we thought of saving the data related to GFac jobs in the
> > > > registry
> > > > > > in
> > > > > > > order to facilitate feature such as above in the future.
> > > > > > >
> > > > > > > However a GFac job is potentially any sort of computing
> resource
> > > > access
> > > > > > > (GRAM/UNICORE/EC2 etc.). Therefore we need to come up with a
> > > > > generalized
> > > > > > > data structure that can hold the data of any type of resource.
> > > > > Following
> > > > > > > are the suggested data to save for a single GFac job execution,
> > > > > > >
> > > > > > > *experiment id, workflow instance id, node id* - pinpoint the
> > node
> > > > > > > execution
> > > > > > > *service, host, application description ids *- pinpoint the
> > > > descriptors
> > > > > > > responsible
> > > > > > > *local job id* - the unique job id retrieved/generated per
> > > execution
> > > > > > > [PRIMARY KEY]
> > > > > > > *job data* - data related executing the job (eg: the rsl in
> GRAM)
> > > > > > > *submitted, completed time*
> > > > > > > *completed status* - whether the job was successfull or ran in
> to
> > > > > errors
> > > > > > > etc.
> > > > > > > *metadata* - custom field to add anything user wants
> > > > > > >
> > > > > > > Your feedback is most welcome. The API related changes will
> also
> > be
> > > > > > > discussed once we have a proper data structure. We are hoping
> to
> > > > > > implement
> > > > > > > this within next few days.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Saminda
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: Persisting GFac job data

Posted by Saminda Wijeratne <sa...@gmail.com>.

On Tue, May 21, 2013 at 4:36 PM, Amila Jayasekara
<th...@gmail.com>wrote:

> On Tue, May 21, 2013 at 3:51 PM, Saminda Wijeratne <samindaw@gmail.com
> >wrote:
>
> > Thanks for the feedback Amila. a few comments inline
> >
> >
> > On Tue, May 21, 2013 at 12:29 PM, Amila Jayasekara
> > <th...@gmail.com>wrote:
> >
> > > Hi Saminda,
> > >
> > > Great suggestion. Also +1 for Dhanushka's proposal to have
> > > serialize/de-serilized data.
> > > Few suggestions,
> > > 1. In addition to successful/error statuses we need other status for
> > nodes
> > > & workflows
> > > and workflows.
> > > E . g :-
> > >    node - started, submitted, in-progress, failed, successful etc ...
> > >
> > Sorry if I was too vague. Yes we have more fine-grain statuses for
> workflow
> > and node[1]. We will have a much fine-grained level of granuality for a
> > GFacJob status.
> >     public static enum GFacJobStatus{
> >         SUBMITTED, //job is submitted, possibly waiting to start
> executing
> >         EXECUTING, //submitted job is being executed
> >         CANCELLED, //job was cancelled
> >         PAUSED, //job was paused
> >         WAITING_FOR_DATA, // job is waiting for data to continue
> executing
> >         FAILED, // error occurred while job was executing and the job
> > stopped
> >         FINISHED, // job completed successfully
> >         UNKNOWN // unknown status. lookup the metadata for more details.
> >     }
> >
> >
> > 2. This data will be useful in implementing FT and Load Balancing in each
> > > component. Sometime back we had discussions to make GFac stateless. So
> > who
> > > is going to populate this data structure and persist it ?
> > >
> > That is a very good question... :). This summer is going to be a long
> > one... ;)
> >
>
> What I meant is which component is doing persistence ? (GFac or WF
> Interpretter). Not the actual person who is going to implement it :).
>
hih hih....
Well its going to be whatever the provider respondible for managing the job
lifecycle. For example GRAMProvider should be responsible for recording all
the data relating to the GRAM jobs its working with.

>
>
> >
> > 1.
> >
> >
> https://svn.apache.org/repos/asf/airavata/trunk/modules/workflow-model/workflow-model-core/src/main/java/org/apache/airavata/workflow/model/graph/Node.java
> >
> > >
> > > Thanks
> > > Amila
> > >
> > >
> > > On Tue, May 21, 2013 at 11:39 AM, Saminda Wijeratne <
> samindaw@gmail.com
> > > >wrote:
> > >
> > > > Thats is an excellent idea. We can have the job data field to be the
> > > > designated GFac job serialized data. The whatever GFacProvider should
> > > > adhere to it.
> > > >
> > > > I'm still inclined to have the rest of the fields to ease of querying
> > for
> > > > the required data. For example if we wanted all attempts on executing
> > > for a
> > > > particular node of a workflow or if we wanted to know which
> application
> > > > descriptions are faster in execution or more reliable etc. we can let
> > the
> > > > query language deal with it. wdyt?
> > > >
> > > >
> > > > On Tue, May 21, 2013 at 11:24 AM, Danushka Menikkumbura <
> > > > danushka.menikkumbura@gmail.com> wrote:
> > > >
> > > > > Saminda,
> > > > >
> > > > > I think the data container does not need to have a generic format.
> We
> > > can
> > > > > have a base class that facilitate object
> > serialization/deserialization
> > > > and
> > > > > let specific meta data structure implement them as required. We get
> > the
> > > > > Registry API to serialize objects and save them in a meta data
> table
> > > > (with
> > > > > just two columns?) and to deserialize as they are loaded off the
> > > > registry.
> > > > >
> > > > > Danushka
> > > > >
> > > > >
> > > > > On Tue, May 21, 2013 at 8:34 PM, Saminda Wijeratne <
> > samindaw@gmail.com
> > > > > >wrote:
> > > > >
> > > > > > It has being apparent more and more that saving the data related
> to
> > > > > > executing a jobs from the GFac can be useful for many reasons
> such
> > > as,
> > > > > >
> > > > > > debugging
> > > > > > retrying
> > > > > > to make smart decisions on reliability/cost etc.
> > > > > > statistical analysis
> > > > > >
> > > > > > Thus we thought of saving the data related to GFac jobs in the
> > > registry
> > > > > in
> > > > > > order to facilitate feature such as above in the future.
> > > > > >
> > > > > > However a GFac job is potentially any sort of computing resource
> > > access
> > > > > > (GRAM/UNICORE/EC2 etc.). Therefore we need to come up with a
> > > > generalized
> > > > > > data structure that can hold the data of any type of resource.
> > > > Following
> > > > > > are the suggested data to save for a single GFac job execution,
> > > > > >
> > > > > > *experiment id, workflow instance id, node id* - pinpoint the
> node
> > > > > > execution
> > > > > > *service, host, application description ids *- pinpoint the
> > > descriptors
> > > > > > responsible
> > > > > > *local job id* - the unique job id retrieved/generated per
> > execution
> > > > > > [PRIMARY KEY]
> > > > > > *job data* - data related executing the job (eg: the rsl in GRAM)
> > > > > > *submitted, completed time*
> > > > > > *completed status* - whether the job was successfull or ran in to
> > > > errors
> > > > > > etc.
> > > > > > *metadata* - custom field to add anything user wants
> > > > > >
> > > > > > Your feedback is most welcome. The API related changes will also
> be
> > > > > > discussed once we have a proper data structure. We are hoping to
> > > > > implement
> > > > > > this within next few days.
> > > > > >
> > > > > > Thanks,
> > > > > > Saminda
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: Persisting GFac job data

Posted by Amila Jayasekara <th...@gmail.com>.

On Tue, May 21, 2013 at 3:51 PM, Saminda Wijeratne <sa...@gmail.com>wrote:

> Thanks for the feedback Amila. a few comments inline
>
>
> On Tue, May 21, 2013 at 12:29 PM, Amila Jayasekara
> <th...@gmail.com>wrote:
>
> > Hi Saminda,
> >
> > Great suggestion. Also +1 for Dhanushka's proposal to have
> > serialize/de-serilized data.
> > Few suggestions,
> > 1. In addition to successful/error statuses we need other status for
> nodes
> > & workflows
> > and workflows.
> > E . g :-
> >    node - started, submitted, in-progress, failed, successful etc ...
> >
> Sorry if I was too vague. Yes we have more fine-grain statuses for workflow
> and node[1]. We will have a much fine-grained level of granuality for a
> GFacJob status.
>     public static enum GFacJobStatus{
>         SUBMITTED, //job is submitted, possibly waiting to start executing
>         EXECUTING, //submitted job is being executed
>         CANCELLED, //job was cancelled
>         PAUSED, //job was paused
>         WAITING_FOR_DATA, // job is waiting for data to continue executing
>         FAILED, // error occurred while job was executing and the job
> stopped
>         FINISHED, // job completed successfully
>         UNKNOWN // unknown status. lookup the metadata for more details.
>     }
>
>
> 2. This data will be useful in implementing FT and Load Balancing in each
> > component. Sometime back we had discussions to make GFac stateless. So
> who
> > is going to populate this data structure and persist it ?
> >
> That is a very good question... :). This summer is going to be a long
> one... ;)
>

What I meant is which component is doing persistence ? (GFac or WF
Interpretter). Not the actual person who is going to implement it :).


>
> 1.
>
> https://svn.apache.org/repos/asf/airavata/trunk/modules/workflow-model/workflow-model-core/src/main/java/org/apache/airavata/workflow/model/graph/Node.java
>
> >
> > Thanks
> > Amila
> >
> >
> > On Tue, May 21, 2013 at 11:39 AM, Saminda Wijeratne <samindaw@gmail.com
> > >wrote:
> >
> > > Thats is an excellent idea. We can have the job data field to be the
> > > designated GFac job serialized data. The whatever GFacProvider should
> > > adhere to it.
> > >
> > > I'm still inclined to have the rest of the fields to ease of querying
> for
> > > the required data. For example if we wanted all attempts on executing
> > for a
> > > particular node of a workflow or if we wanted to know which application
> > > descriptions are faster in execution or more reliable etc. we can let
> the
> > > query language deal with it. wdyt?
> > >
> > >
> > > On Tue, May 21, 2013 at 11:24 AM, Danushka Menikkumbura <
> > > danushka.menikkumbura@gmail.com> wrote:
> > >
> > > > Saminda,
> > > >
> > > > I think the data container does not need to have a generic format. We
> > can
> > > > have a base class that facilitate object
> serialization/deserialization
> > > and
> > > > let specific meta data structure implement them as required. We get
> the
> > > > Registry API to serialize objects and save them in a meta data table
> > > (with
> > > > just two columns?) and to deserialize as they are loaded off the
> > > registry.
> > > >
> > > > Danushka
> > > >
> > > >
> > > > On Tue, May 21, 2013 at 8:34 PM, Saminda Wijeratne <
> samindaw@gmail.com
> > > > >wrote:
> > > >
> > > > > It has being apparent more and more that saving the data related to
> > > > > executing a jobs from the GFac can be useful for many reasons such
> > as,
> > > > >
> > > > > debugging
> > > > > retrying
> > > > > to make smart decisions on reliability/cost etc.
> > > > > statistical analysis
> > > > >
> > > > > Thus we thought of saving the data related to GFac jobs in the
> > registry
> > > > in
> > > > > order to facilitate feature such as above in the future.
> > > > >
> > > > > However a GFac job is potentially any sort of computing resource
> > access
> > > > > (GRAM/UNICORE/EC2 etc.). Therefore we need to come up with a
> > > generalized
> > > > > data structure that can hold the data of any type of resource.
> > > Following
> > > > > are the suggested data to save for a single GFac job execution,
> > > > >
> > > > > *experiment id, workflow instance id, node id* - pinpoint the node
> > > > > execution
> > > > > *service, host, application description ids *- pinpoint the
> > descriptors
> > > > > responsible
> > > > > *local job id* - the unique job id retrieved/generated per
> execution
> > > > > [PRIMARY KEY]
> > > > > *job data* - data related executing the job (eg: the rsl in GRAM)
> > > > > *submitted, completed time*
> > > > > *completed status* - whether the job was successfull or ran in to
> > > errors
> > > > > etc.
> > > > > *metadata* - custom field to add anything user wants
> > > > >
> > > > > Your feedback is most welcome. The API related changes will also be
> > > > > discussed once we have a proper data structure. We are hoping to
> > > > implement
> > > > > this within next few days.
> > > > >
> > > > > Thanks,
> > > > > Saminda
> > > > >
> > > >
> > >
> >
>

Re: Persisting GFac job data

Posted by Saminda Wijeratne <sa...@gmail.com>.

Thanks for the feedback Amila. a few comments inline


On Tue, May 21, 2013 at 12:29 PM, Amila Jayasekara
<th...@gmail.com>wrote:

> Hi Saminda,
>
> Great suggestion. Also +1 for Dhanushka's proposal to have
> serialize/de-serilized data.
> Few suggestions,
> 1. In addition to successful/error statuses we need other status for nodes
> & workflows
> and workflows.
> E . g :-
>    node - started, submitted, in-progress, failed, successful etc ...
>
Sorry if I was too vague. Yes we have more fine-grain statuses for workflow
and node[1]. We will have a much fine-grained level of granuality for a
GFacJob status.
    public static enum GFacJobStatus{
        SUBMITTED, //job is submitted, possibly waiting to start executing
        EXECUTING, //submitted job is being executed
        CANCELLED, //job was cancelled
        PAUSED, //job was paused
        WAITING_FOR_DATA, // job is waiting for data to continue executing
        FAILED, // error occurred while job was executing and the job
stopped
        FINISHED, // job completed successfully
        UNKNOWN // unknown status. lookup the metadata for more details.
    }


2. This data will be useful in implementing FT and Load Balancing in each
> component. Sometime back we had discussions to make GFac stateless. So who
> is going to populate this data structure and persist it ?
>
That is a very good question... :). This summer is going to be a long
one... ;)

1.
https://svn.apache.org/repos/asf/airavata/trunk/modules/workflow-model/workflow-model-core/src/main/java/org/apache/airavata/workflow/model/graph/Node.java

>
> Thanks
> Amila
>
>
> On Tue, May 21, 2013 at 11:39 AM, Saminda Wijeratne <samindaw@gmail.com
> >wrote:
>
> > Thats is an excellent idea. We can have the job data field to be the
> > designated GFac job serialized data. The whatever GFacProvider should
> > adhere to it.
> >
> > I'm still inclined to have the rest of the fields to ease of querying for
> > the required data. For example if we wanted all attempts on executing
> for a
> > particular node of a workflow or if we wanted to know which application
> > descriptions are faster in execution or more reliable etc. we can let the
> > query language deal with it. wdyt?
> >
> >
> > On Tue, May 21, 2013 at 11:24 AM, Danushka Menikkumbura <
> > danushka.menikkumbura@gmail.com> wrote:
> >
> > > Saminda,
> > >
> > > I think the data container does not need to have a generic format. We
> can
> > > have a base class that facilitate object serialization/deserialization
> > and
> > > let specific meta data structure implement them as required. We get the
> > > Registry API to serialize objects and save them in a meta data table
> > (with
> > > just two columns?) and to deserialize as they are loaded off the
> > registry.
> > >
> > > Danushka
> > >
> > >
> > > On Tue, May 21, 2013 at 8:34 PM, Saminda Wijeratne <samindaw@gmail.com
> > > >wrote:
> > >
> > > > It has being apparent more and more that saving the data related to
> > > > executing a jobs from the GFac can be useful for many reasons such
> as,
> > > >
> > > > debugging
> > > > retrying
> > > > to make smart decisions on reliability/cost etc.
> > > > statistical analysis
> > > >
> > > > Thus we thought of saving the data related to GFac jobs in the
> registry
> > > in
> > > > order to facilitate feature such as above in the future.
> > > >
> > > > However a GFac job is potentially any sort of computing resource
> access
> > > > (GRAM/UNICORE/EC2 etc.). Therefore we need to come up with a
> > generalized
> > > > data structure that can hold the data of any type of resource.
> > Following
> > > > are the suggested data to save for a single GFac job execution,
> > > >
> > > > *experiment id, workflow instance id, node id* - pinpoint the node
> > > > execution
> > > > *service, host, application description ids *- pinpoint the
> descriptors
> > > > responsible
> > > > *local job id* - the unique job id retrieved/generated per execution
> > > > [PRIMARY KEY]
> > > > *job data* - data related executing the job (eg: the rsl in GRAM)
> > > > *submitted, completed time*
> > > > *completed status* - whether the job was successfull or ran in to
> > errors
> > > > etc.
> > > > *metadata* - custom field to add anything user wants
> > > >
> > > > Your feedback is most welcome. The API related changes will also be
> > > > discussed once we have a proper data structure. We are hoping to
> > > implement
> > > > this within next few days.
> > > >
> > > > Thanks,
> > > > Saminda
> > > >
> > >
> >
>

Re: Persisting GFac job data

Posted by Amila Jayasekara <th...@gmail.com>.

Hi Saminda,

Great suggestion. Also +1 for Dhanushka's proposal to have
serialize/de-serilized data.
Few suggestions,
1. In addition to successful/error statuses we need other status for nodes
and workflows.
E . g :-
   node - started, submitted, in-progress, failed, successful etc ...
2. This data will be useful in implementing FT and Load Balancing in each
component. Sometime back we had discussions to make GFac stateless. So who
is going to populate this data structure and persist it ?

Thanks
Amila


On Tue, May 21, 2013 at 11:39 AM, Saminda Wijeratne <sa...@gmail.com>wrote:

> Thats is an excellent idea. We can have the job data field to be the
> designated GFac job serialized data. The whatever GFacProvider should
> adhere to it.
>
> I'm still inclined to have the rest of the fields to ease of querying for
> the required data. For example if we wanted all attempts on executing for a
> particular node of a workflow or if we wanted to know which application
> descriptions are faster in execution or more reliable etc. we can let the
> query language deal with it. wdyt?
>
>
> On Tue, May 21, 2013 at 11:24 AM, Danushka Menikkumbura <
> danushka.menikkumbura@gmail.com> wrote:
>
> > Saminda,
> >
> > I think the data container does not need to have a generic format. We can
> > have a base class that facilitate object serialization/deserialization
> and
> > let specific meta data structure implement them as required. We get the
> > Registry API to serialize objects and save them in a meta data table
> (with
> > just two columns?) and to deserialize as they are loaded off the
> registry.
> >
> > Danushka
> >
> >
> > On Tue, May 21, 2013 at 8:34 PM, Saminda Wijeratne <samindaw@gmail.com
> > >wrote:
> >
> > > It has being apparent more and more that saving the data related to
> > > executing a jobs from the GFac can be useful for many reasons such as,
> > >
> > > debugging
> > > retrying
> > > to make smart decisions on reliability/cost etc.
> > > statistical analysis
> > >
> > > Thus we thought of saving the data related to GFac jobs in the registry
> > in
> > > order to facilitate feature such as above in the future.
> > >
> > > However a GFac job is potentially any sort of computing resource access
> > > (GRAM/UNICORE/EC2 etc.). Therefore we need to come up with a
> generalized
> > > data structure that can hold the data of any type of resource.
> Following
> > > are the suggested data to save for a single GFac job execution,
> > >
> > > *experiment id, workflow instance id, node id* - pinpoint the node
> > > execution
> > > *service, host, application description ids *- pinpoint the descriptors
> > > responsible
> > > *local job id* - the unique job id retrieved/generated per execution
> > > [PRIMARY KEY]
> > > *job data* - data related executing the job (eg: the rsl in GRAM)
> > > *submitted, completed time*
> > > *completed status* - whether the job was successfull or ran in to
> errors
> > > etc.
> > > *metadata* - custom field to add anything user wants
> > >
> > > Your feedback is most welcome. The API related changes will also be
> > > discussed once we have a proper data structure. We are hoping to
> > implement
> > > this within next few days.
> > >
> > > Thanks,
> > > Saminda
> > >
> >
>

Re: Persisting GFac job data

Posted by Saminda Wijeratne <sa...@gmail.com>.

Thats is an excellent idea. We can have the job data field to be the
designated GFac job serialized data. The whatever GFacProvider should
adhere to it.

I'm still inclined to have the rest of the fields to ease of querying for
the required data. For example if we wanted all attempts on executing for a
particular node of a workflow or if we wanted to know which application
descriptions are faster in execution or more reliable etc. we can let the
query language deal with it. wdyt?


On Tue, May 21, 2013 at 11:24 AM, Danushka Menikkumbura <
danushka.menikkumbura@gmail.com> wrote:

> Saminda,
>
> I think the data container does not need to have a generic format. We can
> have a base class that facilitate object serialization/deserialization and
> let specific meta data structure implement them as required. We get the
> Registry API to serialize objects and save them in a meta data table (with
> just two columns?) and to deserialize as they are loaded off the registry.
>
> Danushka
>
>
> On Tue, May 21, 2013 at 8:34 PM, Saminda Wijeratne <samindaw@gmail.com
> >wrote:
>
> > It has being apparent more and more that saving the data related to
> > executing a jobs from the GFac can be useful for many reasons such as,
> >
> > debugging
> > retrying
> > to make smart decisions on reliability/cost etc.
> > statistical analysis
> >
> > Thus we thought of saving the data related to GFac jobs in the registry
> in
> > order to facilitate feature such as above in the future.
> >
> > However a GFac job is potentially any sort of computing resource access
> > (GRAM/UNICORE/EC2 etc.). Therefore we need to come up with a generalized
> > data structure that can hold the data of any type of resource. Following
> > are the suggested data to save for a single GFac job execution,
> >
> > *experiment id, workflow instance id, node id* - pinpoint the node
> > execution
> > *service, host, application description ids *- pinpoint the descriptors
> > responsible
> > *local job id* - the unique job id retrieved/generated per execution
> > [PRIMARY KEY]
> > *job data* - data related executing the job (eg: the rsl in GRAM)
> > *submitted, completed time*
> > *completed status* - whether the job was successfull or ran in to errors
> > etc.
> > *metadata* - custom field to add anything user wants
> >
> > Your feedback is most welcome. The API related changes will also be
> > discussed once we have a proper data structure. We are hoping to
> implement
> > this within next few days.
> >
> > Thanks,
> > Saminda
> >
>

Re: Persisting GFac job data

Posted by Danushka Menikkumbura <da...@gmail.com>.

Saminda,

I think the data container does not need to have a generic format. We can
have a base class that facilitate object serialization/deserialization and
let specific meta data structure implement them as required. We get the
Registry API to serialize objects and save them in a meta data table (with
just two columns?) and to deserialize as they are loaded off the registry.

Danushka


On Tue, May 21, 2013 at 8:34 PM, Saminda Wijeratne <sa...@gmail.com>wrote:

> It has being apparent more and more that saving the data related to
> executing a jobs from the GFac can be useful for many reasons such as,
>
> debugging
> retrying
> to make smart decisions on reliability/cost etc.
> statistical analysis
>
> Thus we thought of saving the data related to GFac jobs in the registry in
> order to facilitate feature such as above in the future.
>
> However a GFac job is potentially any sort of computing resource access
> (GRAM/UNICORE/EC2 etc.). Therefore we need to come up with a generalized
> data structure that can hold the data of any type of resource. Following
> are the suggested data to save for a single GFac job execution,
>
> *experiment id, workflow instance id, node id* - pinpoint the node
> execution
> *service, host, application description ids *- pinpoint the descriptors
> responsible
> *local job id* - the unique job id retrieved/generated per execution
> [PRIMARY KEY]
> *job data* - data related executing the job (eg: the rsl in GRAM)
> *submitted, completed time*
> *completed status* - whether the job was successfull or ran in to errors
> etc.
> *metadata* - custom field to add anything user wants
>
> Your feedback is most welcome. The API related changes will also be
> discussed once we have a proper data structure. We are hoping to implement
> this within next few days.
>
> Thanks,
> Saminda
>