Posted to users@taverna.apache.org by "Hoeftberger, Johann" <Jo...@DFCI.HARVARD.EDU> on 2014/11/25 17:36:31 UTC

Taverna and SGE

Hello,

I guess the hackers group wasn't the best place for my first posting.
Sorry. I am reposting my question here.

Can Taverna make use of a Sun Grid Engine cluster, and is there a good
explanation of how to do that?

I am looking for a working solution for this setup: Taverna, an SGE
cluster, and bioinformatics workflows.
I would appreciate good suggestions; please let me know your ideas.


Regards,
Johann




The information in this e-mail is intended only for the person to whom it is
addressed. If you believe this e-mail was sent to you in error and the e-mail
contains patient information, please contact the Partners Compliance HelpLine at
http://www.partners.org/complianceline . If the e-mail was sent to you in error
but does not contain patient information, please contact the sender and properly
dispose of the e-mail.

Re: Taverna and SGE

Posted by Stian Soiland-Reyes <so...@cs.manchester.ac.uk>.

You can execute any command line locally or over SSH from the Tool Activity.

http://dev.mygrid.org.uk/wiki/display/tav250/Tool+service

So as SGE has a command line interface, you can use qsub and friends
from the Tool activity. Note that you can use multiple commands within
the Tool service as it is executed as a shell script - and you can
pick up arbitrary files after execution, e.g. example.o1.
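
As a rough illustration of the paragraph above, here is a minimal Python
sketch that builds the kind of qsub line one might paste into the Tool
service script, and works out the name of the output file to pick up
afterwards. The helper names (build_qsub_line, parse_job_id,
default_stdout_file) are made up for this example and are not part of
Taverna or SGE. The qsub options used (-N, -cwd, -b y, -sync y) are common
SGE options, but please check them against the man page of your own
installation.

```python
import re
import shlex


def build_qsub_line(job_name, command):
    """Return one shell line that submits `command` to SGE and waits for it.

    -N       names the job (this also decides the output file name)
    -cwd     runs the job in the current working directory
    -b y     treats the command as a binary, not as a job script file
    -sync y  makes qsub block until the job has finished
    """
    quoted = " ".join(shlex.quote(part) for part in command)
    return f"qsub -N {shlex.quote(job_name)} -cwd -b y -sync y {quoted}"


def parse_job_id(qsub_output):
    """Extract the job id from qsub's usual 'Your job N (...)' message."""
    match = re.search(r"Your job (\d+)", qsub_output)
    if match is None:
        raise ValueError("no job id found in qsub output")
    return int(match.group(1))


def default_stdout_file(job_name, job_id):
    """SGE's default stdout file is <job name>.o<job id>, e.g. example.o1."""
    return f"{job_name}.o{job_id}"
```

With -sync y the script simply blocks until the job is done, so no
separate polling step is needed inside the Tool service. The file named by
default_stdout_file would then be declared as an output file of the tool.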



What is less well known is that the implementation of the Tool Activity
has a pluggable execution point. We have previously used this to
submit jobs to ARC: http://dx.doi.org/10.1093/bioinformatics/btn095 --
but it should be possible to implement support for SGE in the same
way, and thus avoid having to write the qsub commands - this could
become a plugin for Taverna.

See for instance how SSH was implemented:

https://github.com/taverna/taverna-external-tool-activity/tree/maintenance/src/main/java/de/uni_luebeck/inb/knowarc/usecases/invocation/ssh


The UI implications of these, however, vary a lot between grid
environments, as they all require various things to be set up for the
client tools to work, e.g. security credentials, certificates, shell
variables, etc. Some don't have Java bindings, so you would basically
still have to execute job submission commands from within the Java
implementation.

On 26 November 2014 at 15:14, Hoeftberger, Johann
<Jo...@dfci.harvard.edu> wrote:
> Hello Alan,
>
> thank you for your answer.
>
> On 11/26/2014 08:55 AM, alaninmcr wrote:
>> On 25/11/2014 16:36, Hoeftberger, Johann wrote:
>>> Can Taverna make use of a Sun Grid Engine Cluster and is there a good
>>> explanation how to do that?
>>
>> What do you mean by "make use of" ? Do you want to have a workflow be
>> run on a node in the cluster or to have individual steps within a
>> workflow run be executed on nodes in the cluster?
>
> I think both.
> The most important need for me seems to be to have the ability that
> individual steps within a workflow run on our SGE cluster transparently
> for the user which uses the workflow. So those workflow steps should be
> executed on nodes of the SGE cluster, the whole handshake for the usage
> of those steps should be done by the Taverna system itself.
>
> If that's possible, and I hope I can find a solution for that approach,
> I don't see limitation to create whole workflows running on the SGE grid
> too, except the "administration" of those workflows which should run on
> a separated local PC where the Taverna GUI is running.
>
> Or maybe later a commandline tool for the same purpose running on a
> local PC AND/OR running on the SGE Grid cluster as cronjob.
>
>
>>> I am looking for a working solution for this setting Taverna, SGE
>>> Cluster, Bioinformatic workflows.
>>> I appreciate good suggestions, let me know your ideas, please.
>>
>> I think it depends exactly what you are trying to do.
>
> My aim is to build Taverna workflows which can make use of our SGE grid
> for the computational demanding parts transparently (invisible) in the
> background. Depending on the concrete workflow this could mean that all
> workflow steps should be executed on the SGE grid.
>
> I couldn't find any SGE Taverna solution so far, it seems to me SGE
> isn't supported by Taverna!?
>
>
> Regards,
> Johann
>
>



-- 
Stian Soiland-Reyes, myGrid team
School of Computer Science
The University of Manchester
http://soiland-reyes.com/stian/work/ http://orcid.org/0000-0001-9842-9718

Re: Taverna and SGE

Posted by Marlon Pierce <ma...@iu.edu>.

Now is an excellent time to get involved, with Taverna going through 
Apache incubation.  Demonstrating community contributions and welcoming 
new committers is part of the process.

Marlon

On 11/27/14, 8:41 PM, Stian Soiland-Reyes wrote:
> We would welcome any effort in adding an SGE integration. Please join
> the developer list if you want to discuss this in detail.
> http://mail-archives.apache.org/mod_mbox/taverna-dev/
>
> There's already interest in generalizing a "Execute this step
> elsewhere" mechanism (not just for Tool activity) - so if we think of
> it like this rather than just for SGE then it might get more interest
> (where say SGE would just be a configuration that said "submit=qsub %1
> --cluster=%2"  or something).
>
>
> There are some intermediary solutions as well - although technically
> Taverna would be polling SGE, the ENGINE and workFLOW are not -
> the Tool Activity would be a single step in the workflow, and the
> polling is just part of its configuration.
>
> Now your users might not all want to type all this qsub and so on for
> every executable they want to run - so there are two faster solutions
> I can think of:
>
>
> a) Describe the possible executables (e.g. blast) using the "Usecase"
> XML that can be imported into "Available Services". Inside the
> description here, the commands could all include the
> submit-poll-retrieve logic. From the Available Services users can then
> drag-drop commands into the workflow and won't see the qsub etc.
> unless they need to customize the parameters.
> http://dev.mygrid.org.uk/wiki/display/tav250/Importing+a+set+of+tool+descriptions
>
> b) Create External Tool activities per executable, then package each as
> a Component. These can then be shared on myExperiment as a
> group/"family". Users can add this group of components to the
> workbench for selection and drag-drop into the workflow. Users will
> however no longer be able to easily customize the command line as
> components are "gray boxes".
> http://dev.mygrid.org.uk/wiki/display/tav250/Component+services
>
>
>
> On 26 November 2014 at 18:50, Hoeftberger, Johann
> <Jo...@dfci.harvard.edu> wrote:
>> On 11/26/2014 10:56 AM, alaninmcr wrote:
>>> On 26/11/2014 15:14, Hoeftberger, Johann wrote:
>>>> On 11/26/2014 08:55 AM, alaninmcr wrote:
>>>>> On 25/11/2014 16:36, Hoeftberger, Johann wrote:
>>>>>> Can Taverna make use of a Sun Grid Engine Cluster and is there a good
>>>>>> explanation how to do that?
>>>>> What do you mean by "make use of" ? Do you want to have a workflow be
>>>>> run on a node in the cluster or to have individual steps within a
>>>>> workflow run be executed on nodes in the cluster?
>>>> I think both.
>>>> The most important need for me seems to be to have the ability that
>>>> individual steps within a workflow run on our SGE cluster transparently
>>>> for the user which uses the workflow. So those workflow steps should be
>>>> executed on nodes of the SGE cluster, the whole handshake for the usage
>>>> of those steps should be done by the Taverna system itself.
>>> What is the normal interface for "talking" to the SGE?
>> That's the question whose answer I hoped the SGE integration in Taverna
>> would give.
>> (For our current jobs we use qsub and related tools.)
>>
>> It seems currently there doesn't exist a real integration of a SGE Grid
>> Engine in Taverna. And the only available solution is over Tool Service,
>> SSH and qsub commands.
>> Although I don't have any experience with that I see the theoretical
>> approach behind it. But this results in a polling strategy of the Grid
>> results from a workflow engine perspective. I thought / hoped there
>> exists a tighter integrated solution for the connection between Taverna
>> and SGE.
>>
>>
>>> A quick way to try things would be use the tool service to run qsub (and
>>> related) commands to your SGE. Then you could group the choreography of
>>> submit -> poll * -> retrieve results, into a component.
>> Yes, that's the best I could find and you confirm that somehow.
>>
>>
>>> Alternatively, if there is a Java library, you could write an equivalent
>>> Beanshell.
>> I also find very interesting the "Tool Activity pluggable execution
>> point" Alan mentioned in his last post. But this would take a much
>> bigger development approach I guess. I have to figure out what fits best
>> in our situation.
>>
>> Thank you all for your suggestions so far.
>>
>>
>> Regards,
>> Johann
>>
>>
>
>

Re: Taverna and SGE

Posted by Stian Soiland-Reyes <so...@cs.manchester.ac.uk>.

We would welcome any effort in adding an SGE integration. Please join
the developer list if you want to discuss this in detail.
http://mail-archives.apache.org/mod_mbox/taverna-dev/

There's already interest in generalizing an "Execute this step
elsewhere" mechanism (not just for the Tool activity) - so if we think
of it like this rather than just for SGE, it might get more interest
(where, say, SGE would just be a configuration that said "submit=qsub %1
--cluster=%2"  or something).
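
To make the configuration idea a bit more concrete, here is a small Python
sketch of how such a "submit=..." line could be turned into a command.
This is only an assumption about how such a mechanism might work; nothing
like it exists in Taverna as far as this thread says. The function names
are hypothetical, and the "--cluster" option is taken as-is from the
example string above and has not been checked against qsub's real options.

```python
import re
import shlex


def parse_config_line(line):
    """Split 'key=template' at the first '=' only."""
    key, _, template = line.partition("=")
    return key.strip(), template.strip()


def expand_template(template, args):
    """Replace %1, %2, ... with the matching entry of `args`.

    The template is split into words first, so a substituted value that
    contains spaces stays a single argument. A placeholder with no
    matching argument raises IndexError.
    """
    def substitute(match):
        return args[int(match.group(1)) - 1]

    return [re.sub(r"%(\d+)", substitute, word)
            for word in shlex.split(template)]
```

For example, expand_template("qsub %1 --cluster=%2", ["job.sh", "main"])
gives the argument list ["qsub", "job.sh", "--cluster=main"], which could
be handed to whatever runs the command. Another execution environment
would then only need a different template line.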


There are some intermediary solutions as well - although technically
Taverna would be polling SGE, the ENGINE and workFLOW are not -
the Tool Activity would be a single step in the workflow, and the
polling is just part of its configuration.

Now your users might not all want to type all this qsub and so on for
every executable they want to run - so there are two faster solutions
I can think of:


a) Describe the possible executables (e.g. blast) using the "Usecase"
XML that can be imported into "Available Services". Inside the
description here, the commands could all include the
submit-poll-retrieve logic. From the Available Services users can then
drag-drop commands into the workflow and won't see the qsub etc.
unless they need to customize the parameters.
http://dev.mygrid.org.uk/wiki/display/tav250/Importing+a+set+of+tool+descriptions

b) Create External Tool activities per executable, then package each as
a Component. These can then be shared on myExperiment as a
group/"family". Users can add this group of components to the
workbench for selection and drag-drop into the workflow. Users will
however no longer be able to easily customize the command line as
components are "gray boxes".
http://dev.mygrid.org.uk/wiki/display/tav250/Component+services



On 26 November 2014 at 18:50, Hoeftberger, Johann
<Jo...@dfci.harvard.edu> wrote:
> On 11/26/2014 10:56 AM, alaninmcr wrote:
>> On 26/11/2014 15:14, Hoeftberger, Johann wrote:
>>> On 11/26/2014 08:55 AM, alaninmcr wrote:
>>>> On 25/11/2014 16:36, Hoeftberger, Johann wrote:
>>>>> Can Taverna make use of a Sun Grid Engine Cluster and is there a good
>>>>> explanation how to do that?
>>>>
>>>> What do you mean by "make use of" ? Do you want to have a workflow be
>>>> run on a node in the cluster or to have individual steps within a
>>>> workflow run be executed on nodes in the cluster?
>>>
>>> I think both.
>>> The most important need for me seems to be to have the ability that
>>> individual steps within a workflow run on our SGE cluster transparently
>>> for the user which uses the workflow. So those workflow steps should be
>>> executed on nodes of the SGE cluster, the whole handshake for the usage
>>> of those steps should be done by the Taverna system itself.
>>
>> What is the normal interface for "talking" to the SGE?
>
> That's the question whose answer I hoped the SGE integration in Taverna
> would give.
> (For our current jobs we use qsub and related tools.)
>
> It seems currently there doesn't exist a real integration of a SGE Grid
> Engine in Taverna. And the only available solution is over Tool Service,
> SSH and qsub commands.
> Although I don't have any experience with that I see the theoretical
> approach behind it. But this results in a polling strategy of the Grid
> results from a workflow engine perspective. I thought / hoped there
> exists a tighter integrated solution for the connection between Taverna
> and SGE.
>
>
>> A quick way to try things would be use the tool service to run qsub (and
>> related) commands to your SGE. Then you could group the choreography of
>> submit -> poll * -> retrieve results, into a component.
>
> Yes, that's the best I could find and you confirm that somehow.
>
>
>> Alternatively, if there is a Java library, you could write an equivalent
>> Beanshell.
>
> I also find very interesting the "Tool Activity pluggable execution
> point" Alan mentioned in his last post. But this would take a much
> bigger development approach I guess. I have to figure out what fits best
> in our situation.
>
> Thank you all for your suggestions so far.
>
>
> Regards,
> Johann
>
>



-- 
Stian Soiland-Reyes, myGrid team
School of Computer Science
The University of Manchester
http://soiland-reyes.com/stian/work/ http://orcid.org/0000-0001-9842-9718

Re: Taverna and SGE

Posted by "Hoeftberger, Johann" <Jo...@DFCI.HARVARD.EDU>.

On 11/26/2014 10:56 AM, alaninmcr wrote:
> On 26/11/2014 15:14, Hoeftberger, Johann wrote:
>> On 11/26/2014 08:55 AM, alaninmcr wrote:
>>> On 25/11/2014 16:36, Hoeftberger, Johann wrote:
>>>> Can Taverna make use of a Sun Grid Engine Cluster and is there a good
>>>> explanation how to do that?
>>>
>>> What do you mean by "make use of" ? Do you want to have a workflow be
>>> run on a node in the cluster or to have individual steps within a
>>> workflow run be executed on nodes in the cluster?
>>
>> I think both.
>> The most important need for me seems to be to have the ability that
>> individual steps within a workflow run on our SGE cluster transparently
>> for the user which uses the workflow. So those workflow steps should be
>> executed on nodes of the SGE cluster, the whole handshake for the usage
>> of those steps should be done by the Taverna system itself.
>
> What is the normal interface for "talking" to the SGE?

That's the question I hoped the SGE integration in Taverna would
answer.
(For our current jobs we use qsub and related tools.)

It seems there is currently no real integration of SGE in Taverna, and
the only available solution is via the Tool Service, SSH and qsub
commands.
Although I don't have any experience with that, I see the theoretical
approach behind it. But from the workflow engine's perspective this
results in a polling strategy for the grid results. I thought / hoped
there was a more tightly integrated solution for connecting Taverna
and SGE.

> A quick way to try things would be use the tool service to run qsub (and
> related) commands to your SGE. Then you could group the choreography of
> submit -> poll * -> retrieve results, into a component.

Yes, that's the best I could find and you confirm that somehow.

> Alternatively, if there is a Java library, you could write an equivalent
> Beanshell.

I also find the "Tool Activity pluggable execution point" Alan mentioned
in his last post very interesting. But I guess this would take a much
bigger development effort. I have to figure out what fits best in our
situation.

Thank you all for your suggestions so far.

Regards,
Johann


Re: Taverna and SGE

Posted by alaninmcr <al...@googlemail.com>.

On 26/11/2014 15:14, Hoeftberger, Johann wrote:
> Hello Alan,

Hello

> thank you for your answer.
>
> On 11/26/2014 08:55 AM, alaninmcr wrote:
>> On 25/11/2014 16:36, Hoeftberger, Johann wrote:
>>> Can Taverna make use of a Sun Grid Engine Cluster and is there a good
>>> explanation how to do that?
>>
>> What do you mean by "make use of" ? Do you want to have a workflow be
>> run on a node in the cluster or to have individual steps within a
>> workflow run be executed on nodes in the cluster?
>
> I think both.
> The most important need for me seems to be to have the ability that
> individual steps within a workflow run on our SGE cluster transparently
> for the user which uses the workflow. So those workflow steps should be
> executed on nodes of the SGE cluster, the whole handshake for the usage
> of those steps should be done by the Taverna system itself.

What is the normal interface for "talking" to the SGE?

A quick way to try things would be to use the tool service to run qsub
(and related) commands against your SGE. Then you could group the
choreography of submit -> poll * -> retrieve results into a component.
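
The submit -> poll * -> retrieve choreography could look roughly like the
Python sketch below. It is only an illustration, not something Taverna
provides: the function names are invented, and it assumes that
"qsub -terse" prints just the job id and that "qstat -j <id>" exits with a
non-zero status once the job is no longer known to SGE. Both should be
verified on the cluster in question. The command runner and the sleep
function are passed in as parameters so that the logic can be tried out
without a cluster.

```python
import subprocess
import time


def run_command(args):
    """Run a command and return (exit code, stdout)."""
    done = subprocess.run(args, capture_output=True, text=True)
    return done.returncode, done.stdout


def submit_poll_retrieve(script, job_name, run=run_command,
                         sleep=time.sleep, interval=30):
    """Submit `script`, wait for it, and return the path of its stdout file."""
    # Submit: -terse asks qsub to print only the job id,
    # -cwd keeps the output files in the current directory.
    code, out = run(["qsub", "-terse", "-cwd", "-N", job_name, script])
    if code != 0:
        raise RuntimeError("qsub failed")
    job_id = out.strip().split(".")[0]  # array jobs print "id.range"

    # Poll: keep asking qstat until it no longer knows the job.
    while run(["qstat", "-j", job_id])[0] == 0:
        sleep(interval)

    # Retrieve: SGE's default stdout file is <job name>.o<job id>.
    return f"{job_name}.o{job_id}"
```

Note that a job that failed also disappears from qstat, so a real
component would probably want to check the job's exit status as well
before using the results.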

Alternatively, if there is a Java library, you could write an equivalent 
Beanshell.

> Regards,
> Johann

Alan

Re: Taverna and SGE

Posted by "Hoeftberger, Johann" <Jo...@DFCI.HARVARD.EDU>.

Hello Alan,

thank you for your answer.

On 11/26/2014 08:55 AM, alaninmcr wrote:
> On 25/11/2014 16:36, Hoeftberger, Johann wrote:
>> Can Taverna make use of a Sun Grid Engine Cluster and is there a good
>> explanation how to do that?
>
> What do you mean by "make use of" ? Do you want to have a workflow be
> run on a node in the cluster or to have individual steps within a
> workflow run be executed on nodes in the cluster?

I think both.
What matters most to me is that individual steps within a workflow can
run on our SGE cluster transparently for the user of the workflow. Those
workflow steps should be executed on nodes of the SGE cluster, and the
whole handshake for using those steps should be done by Taverna itself.

If that's possible, and I hope I can find a solution for that approach,
I don't see any obstacle to running whole workflows on the SGE grid
too, except for the "administration" of those workflows, which should
run on a separate local PC where the Taverna GUI is running.

Or maybe later a command-line tool for the same purpose, running on a
local PC and/or on the SGE cluster as a cron job.

>> I am looking for a working solution for this setting Taverna, SGE
>> Cluster, Bioinformatic workflows.
>> I appreciate good suggestions, let me know your ideas, please.
>
> I think it depends exactly what you are trying to do.

My aim is to build Taverna workflows that can use our SGE grid for the
computationally demanding parts transparently (invisibly) in the
background. Depending on the concrete workflow, this could mean that all
workflow steps are executed on the SGE grid.

I couldn't find any SGE solution for Taverna so far; it seems to me that
SGE isn't supported by Taverna!?

Regards,
Johann


Re: Taverna and SGE

Posted by alaninmcr <al...@googlemail.com>.

On 25/11/2014 16:36, Hoeftberger, Johann wrote:
> Hello,

Hello

> I guess my first posting to the hackers group wasn't the best place for
> my question. Sorry. I repost my question here.
>
> Can Taverna make use of a Sun Grid Engine Cluster and is there a good
> explanation how to do that?

What do you mean by "make use of" ? Do you want to have a workflow be 
run on a node in the cluster or to have individual steps within a 
workflow run be executed on nodes in the cluster?

> I am looking for a working solution for this setting Taverna, SGE
> Cluster, Bioinformatic workflows.
> I appreciate good suggestions, let me know your ideas, please.

I think it depends on exactly what you are trying to do.

> Regards,
> Johann

Alan