Posted to users@taverna.apache.org by Stian Soiland-Reyes <so...@cs.manchester.ac.uk> on 2014/11/03 17:22:21 UTC

Taverna Player provenance

Nikos Minadakis and I had a chat about provenance requirements
for running workflows through the Taverna Player - in particular
at the BioVel Portal, http://portal.biovel.eu/


Nikos wants to get this kind of provenance out of a workflow run
in the Player/portal:

* modification events
* who did these modifications
* when
* inputs and outputs

(and presumably which workflow :) )


An example is the Data Refinement Workflow
https://portal.biovel.eu/workflows/641 which has several user
interactions that should be tracked.

Nikos would like to have access to the provenance primarily in
machine-readable form, but preferably also as a human-readable
report. In particular, he would like to mix in his own
provenance-specific schema (CRM digital?).


I described how the Taverna Server can capture provenance of the
details of a workflow run and expose that as a Data Bundle -
https://github.com/taverna/taverna-prov#structure-of-exported-provenance

This is however basically a trace of every step of the workflow - and
would include the user modification as a series of events, like:

1) at 15:42:00 the workflow 1298319283 was started as run 2781721
2) at 15:42:12 in run 2781721, the Interaction service named
"Ask_user_5" in workflow 1298319283 responded with an output value
51231. The value contains "GBIF".
3) at 15:42:53 in run 2781721, the Interaction service named
"run_analysis" in workflow 1298319283 used an input value 51231. The
value contains "GBIF".


Nikos would like to connect these interactions with who was using the
Portal. I checked, and the BioVel Portal is not yet using the version
of Taverna Server that allows capturing/export of Provenance (2.5.4) -
but is planning that upgrade soon.

The Taverna Server does not know who started the run from the
Player/Portal - so the Player would need to inject that additional
provenance afterwards.  (This should be doable within the same bundle
as it is a ZIP file where you can just add files).
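
As a minimal sketch of that "just add files" idea, the Python below appends a
small JSON record to an existing bundle ZIP. The member path
annotations/portal-user.json, the field names startedBy/startedAt and the
function name are all hypothetical; nothing here is an existing Player or
Server API. It also leaves the bundle's manifest untouched, which a conforming
bundle would presumably need updated as well.

```python
import json
import zipfile


def add_user_provenance(bundle_path, user, started_at,
                        member="annotations/portal-user.json"):
    """Append a small JSON record to an existing bundle ZIP.

    The member path and the JSON field names are assumptions made for
    this illustration, not part of any Taverna specification.
    """
    record = {"startedBy": user, "startedAt": started_at}
    # Mode "a" appends to the archive and leaves existing members untouched.
    with zipfile.ZipFile(bundle_path, "a") as bundle:
        if member in bundle.namelist():
            raise ValueError(f"{member} already exists in {bundle_path}")
        bundle.writestr(member, json.dumps(record, indent=2))
    return member
```

The Player could call something like this once it has downloaded the bundle
for a finished run, since that is the point where it knows which portal user
started the run.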

I don't believe the Portal is capturing who is doing which interaction
- so if the run is shared with multiple people that might be something
additional to add.

It might be necessary to mark up or understand the workflow so that the
resulting provenance focuses only on the steps that are 'important'
scientifically.



ACTION Nikos: To respond to this email with a deeper list of
requirements / queries that the provenance should be able to
capture/expose.

ACTION Nikos: Ping back to this thread in a week's time so we won't forget :)

ACTION Stian/Rob: Ask Rob if it is possible to turn on provenance for
workflow runs in the development instance of the portal

ACTION Rob/Alan: Is it possible for the player/portal to know who does
which interaction? Can they tell the server or should it be injected
after the fact?



-- 
Stian Soiland-Reyes, myGrid team
School of Computer Science
The University of Manchester
http://soiland-reyes.com/stian/work/ http://orcid.org/0000-0001-9842-9718

Re: Taverna Player provenance

Posted by alaninmcr <al...@googlemail.com>.
On 02/12/2014 09:45, Nikos Minadakis wrote:
> Hello Alan,
>
> thank you for your answer.
>
> Regarding the versions question, Stian mentioned that BioVel's portal is
> not using the version of Taverna Server that allows capturing/export of
> provenance but is planning that upgrade soon.

That may have been correct when you spoke with Stian, but the portal is 
now using version 2.5.4 of the server. (So his "soon" has happened.) The 
server exposes the provenance as a run bundle; see
http://dev.mygrid.org.uk/wiki/display/tav250/REST+API#RESTAPI-Resource:/runs/{id}/run-bundle
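
A rough sketch of fetching that resource from Python, using only the standard
library. The only part taken from the documentation link above is the
/runs/{id}/run-bundle path; the base URL is whatever REST root your server
uses, and the sketch assumes no authentication is needed, which is unlikely to
hold for a real deployment.

```python
import shutil
import urllib.parse
import urllib.request


def run_bundle_url(rest_base, run_id):
    """Build the URL of the run-bundle resource for one run."""
    run = urllib.parse.quote(str(run_id), safe="")
    return f"{rest_base.rstrip('/')}/runs/{run}/run-bundle"


def download_run_bundle(rest_base, run_id, dest_path):
    """Save the run bundle to dest_path.

    Assumption: the server answers an unauthenticated GET; add
    credentials as your installation requires.
    """
    url = run_bundle_url(rest_base, run_id)
    with urllib.request.urlopen(url) as resp, open(dest_path, "wb") as out:
        shutil.copyfileobj(resp, out)
    return dest_path
```

For example, run_bundle_url("https://example.org/taverna/rest", "2781721")
builds the address for the run number used earlier in this thread; the host
name is a placeholder.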

> So, I am asking if there
> is a time-plan for this.

It is done.

There have been some changes to the RO bundle specification and we will 
soon be releasing Taverna 2.5.1 that conforms to the specification. 
There will also be a new version of the server to accompany the new 
version of Taverna.

> Same for the capturing of who is doing which
> interaction.

That is not currently planned - we would need some well-described use 
cases. You are very welcome to specify use cases and to contribute to 
changes to the interaction activity.

> So, I will wait for Stian's clarifications for the rest of the answers.
>
> Thank you again for your response.
>
> Best Wishes,
>
> Nikos

Alan

Re: Taverna Player provenance

Posted by Nikos Minadakis <mi...@ics.forth.gr>.
Hello Alan,

thank you for your answer.

Regarding the versions question, Stian mentioned that BioVel's portal is 
not using the version of Taverna Server that allows capturing/export of 
provenance but is planning that upgrade soon. So, I am asking if there 
is a time-plan for this. Same for the capturing of who is doing which 
interaction.

So, I will wait for Stian's clarifications for the rest of the answers.

Thank you again for your response.

Best Wishes,

Nikos

On 2014-12-01 18:08, alaninmcr wrote:
> [...]


Re: Taverna Player provenance

Posted by alaninmcr <al...@googlemail.com>.
On 01/12/2014 15:43, Nikos Minadakis wrote:
> Hello Stian and Everyone,
>
> sorry for the delay of my answer but I just found out that I had to send
> this email :P
>
> So, my main questions as you already mentioned are the following:
>
> 1) Can we track the provenance of data by using Taverna?
>
> You already answered to this and added that:
> - Portal is not capturing who is doing which interaction so if the run
> is shared with multiple people that might be something
> additional to add.

Interesting idea. The additional information could be fed back on the 
ATOM feed and then saved by the interaction service.

This requires thought as workflow runs and their interactions will not 
always be done from a portal. In addition to running directly from the 
Taverna Workbench, they could be called from IPython Notebook for example.

> -I checked, and the BioVel Portal is not yet using the version of
> Taverna Server that allows capturing/export of Provenance (2.5.4) -
> but is planning that upgrade soon.

Rob tells me that the portal is running with the 2.5.4 server. The 
provenance may not be exposed in the UI though.

> So, this is fine by me and I would like to ask you if it is possible to
> be more specific for the versions that will support such functionalities.

I'm not quite sure what you are asking.

> 2) Can I use my own provenance Schema in case I don't want to use
> PROV-O? (CRM digital is an example, and we are using it in FORTH)

I doubt that would be possible. Stian can give a more precise answer.

> 3) To be more specific with my requirements.
> I want to use taverna in order to implement a complex scientific
> workflow that supports interactions from different Actors and Institutions.
>
> So let's consider that the workflow consists of 3 steps A-B-C (in
> real life there are many more) and 2 institutions will interact: In1 and In2.
>
> In1 will start with step A. When this step finishes In2 will be informed
> automatically and will continue with Step B. When step B is finished In1
> will be automatically informed in order to continue with step C and
> finish the workflow. Now imagine this carried out by 20 institutions and
> hundreds of steps.
>
> As a result we should be able to know who did what and when. Not only
> for the concrete steps but even for the actions that are done internally
> in each institution (data provenance, runs, etc.). And because of the
> specialization of the actions, PROV-O may not be enough for tracking
> such provenance, so another schema may be used for it.

Additional information could probably be annotated. Stian can give more 
details of that.

> Of course it would be great as a final goal to be able to go back. So,
> to start from the final product of the execution and by following the
> provenance information to be able to do as many steps back as needed and
> to extract the previous results.
>
> My final question is: Is Taverna capable of fulfilling such a
> requirement? And of course if not, to what degree does it support it,
> and what effort does it need to be extended?

The interactions do not currently capture "who did it" - so that would 
not be possible at the moment.

With regard to going upstream and looking at the preceding results, that 
is currently possible. That is how Taverna shows the intermediate 
results. It may require SPARQL queries to obtain the information you 
want. Do you have examples?
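
To make "going upstream" concrete without assuming anything about the exported
RDF, here is a plain-Python stand-in for what such a query would do: start at
a final value and walk back through the processes that generated it and the
values they used. The two dictionaries are hand-written toy data loosely based
on the example trace earlier in this thread (final_report is an invented
name); a real query would run SPARQL over the provenance in the bundle
instead.

```python
from collections import deque

# Toy edges in the spirit of PROV (hypothetical data, not real export):
# value -> process that generated it, and process -> values it used.
GENERATED_BY = {"final_report": "run_analysis", "value_51231": "Ask_user_5"}
USED = {"run_analysis": ["value_51231"], "Ask_user_5": []}


def upstream(value, generated_by, used):
    """Walk back from `value`, breadth-first.

    Returns a list of (process, inputs) pairs, nearest step first.
    """
    steps = []
    seen_values = {value}
    seen_processes = set()
    queue = deque([value])
    while queue:
        current = queue.popleft()
        process = generated_by.get(current)
        if process is None or process in seen_processes:
            continue  # a workflow input, or a step already reported
        seen_processes.add(process)
        inputs = list(used.get(process, []))
        steps.append((process, inputs))
        for v in inputs:
            if v not in seen_values:
                seen_values.add(v)
                queue.append(v)
    return steps
```

Calling upstream("final_report", GENERATED_BY, USED) lists run_analysis first
and then Ask_user_5, which is the "as many steps back as needed" walk Nikos
describes.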

> If I could have an answer by the end of this week it would be great
> since I will report it in a new project's meeting next week.
>
> Thank you,
>
> Nikos

Alan

Re: Taverna Player provenance

Posted by Nikos Minadakis <mi...@ics.forth.gr>.
Hello Stian and Everyone,

sorry for the delay of my answer but I just found out that I had to 
send this email :P

So, my main questions as you already mentioned are the following:

1) Can we track the provenance of data by using Taverna?

You already answered to this and added that:
- Portal is not capturing who is doing which interaction so if the run 
is shared with multiple people that might be something
additional to add.
-I checked, and the BioVel Portal is not yet using the version of 
Taverna Server that allows capturing/export of Provenance (2.5.4) -
but is planning that upgrade soon.

So, this is fine by me and I would like to ask you if it is possible to 
be more specific for the versions that will support such 
functionalities.

2) Can I use my own provenance Schema in case I don't want to use 
PROV-O? (CRM digital is an example, and we are using it in FORTH)

3) To be more specific with my requirements.
I want to use taverna in order to implement a complex scientific 
workflow that supports interactions from different Actors and 
Institutions.

So let's consider that the workflow consists of 3 steps A-B-C (in 
real life there are many more) and 2 institutions will interact: In1 and 
In2.

In1 will start with step A. When this step finishes In2 will be 
informed automatically and will continue with Step B. When step B is 
finished In1 will be automatically informed in order to continue with 
step C and finish the workflow. Now imagine this carried out by 20 
institutions and hundreds of steps.

As a result we should be able to know who did what and when. Not only 
for the concrete steps but even for the actions that are done internally 
in each institution (data provenance, runs, etc.). And because of the 
specialization of the actions, PROV-O may not be enough for tracking 
such provenance, so another schema may be used for it.

Of course it would be great as a final goal to be able to go back. So, 
to start from the final product of the execution and by following the 
provenance information to be able to do as many steps back as needed and 
to extract the previous results.

My final question is: Is Taverna capable of fulfilling such a 
requirement? And of course if not, to what degree does it support it, 
and what effort does it need to be extended?

If I could have an answer by the end of this week it would be great 
since I will report it in a new project's meeting next week.

Thank you,

Nikos
