Posted to users@taverna.apache.org by "Hoeftberger, Johann" <Jo...@DFCI.HARVARD.EDU> on 2014/11/25 17:36:31 UTC
Taverna and SGE
Hello,
I guess my first posting to the hackers group wasn't the best place for
my question. Sorry. I'm reposting it here.
Can Taverna make use of a Sun Grid Engine cluster, and is there a good
explanation of how to do that?
I am looking for a working solution for this setup: Taverna, an SGE
cluster, and bioinformatics workflows.
I would appreciate any good suggestions; please let me know your ideas.
Regards,
Johann
The information in this e-mail is intended only for the person to whom it is
addressed. If you believe this e-mail was sent to you in error and the e-mail
contains patient information, please contact the Partners Compliance HelpLine at
http://www.partners.org/complianceline . If the e-mail was sent to you in error
but does not contain patient information, please contact the sender and properly
dispose of the e-mail.
Re: Taverna and SGE
Posted by Stian Soiland-Reyes <so...@cs.manchester.ac.uk>.
You can execute any command line, locally or over SSH, from the Tool activity.
http://dev.mygrid.org.uk/wiki/display/tav250/Tool+service
So as SGE has a command line interface, you can use qsub and friends
from the Tool activity. Note that you can use multiple commands within
the Tool service as it is executed as a shell script - and you can
pick up arbitrary files after execution, e.g. example.o1.
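As a minimal sketch of the qsub-from-the-Tool-activity approach, the Python helper below assembles the kind of shell script one might paste into a Tool service: submit the command as an SGE job, poll until it leaves the queue, then print the job's stdout file. It assumes that `qsub -terse` prints only the job ID, that `qstat -j <id>` exits non-zero once SGE no longer knows the job, and that with `-cwd` the stdout file is written as `<job name>.o<job id>` in the working directory (which is where a name like example.o1 comes from); check these against your SGE version. The function name and its parameters are hypothetical.

```python
import shlex


def build_tool_script(command: str, job_name: str, poll_seconds: int = 30) -> str:
    """Return a shell script that runs `command` as an SGE job and waits for it.

    Hypothetical helper, not part of Taverna. Assumes `qsub -terse` prints only
    the job ID and that `qstat -j ID` exits non-zero once the job has left SGE.
    """
    if poll_seconds <= 0:
        raise ValueError("poll_seconds must be positive")
    name = shlex.quote(job_name)
    return "\n".join([
        "#!/bin/sh",
        "set -e",
        # Submit: qsub reads the job script from stdin; -cwd writes output here.
        f"JOB_ID=$(echo {shlex.quote(command)} | qsub -terse -cwd -N {name})",
        # Poll: loop while SGE still knows about the job.
        f'while qstat -j "$JOB_ID" >/dev/null 2>&1; do sleep {poll_seconds}; done',
        # Retrieve: SGE names the stdout file <job name>.o<job id>.
        f'cat {name}.o"$JOB_ID"',
    ]) + "\n"
```

Since the Tool service runs its body as a shell script, the generated text could equally be typed into it by hand; the helper only makes the submit, poll and retrieve stages explicit.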
What is less well known is that the implementation of the Tool activity
has a pluggable execution point. We have previously used this to
submit jobs to ARC: http://dx.doi.org/10.1093/bioinformatics/btn095 --
but it should be possible to implement support for SGE in the same
way, and thus avoid having to write the qsub commands yourself; this
could become a plugin for Taverna.
See for instance how SSH was implemented:
https://github.com/taverna/taverna-external-tool-activity/tree/maintenance/src/main/java/de/uni_luebeck/inb/knowarc/usecases/invocation/ssh
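To illustrate what a pluggable execution point buys, here is a small Python analogue (not Taverna's Java API; `ExecutionBackend`, `FakeBackend` and `run_step` are hypothetical names). The step logic talks only to an abstract backend, so local, SSH or SGE execution become interchangeable plugins; an in-memory fake stands in for a real cluster.

```python
import time
from abc import ABC, abstractmethod


class ExecutionBackend(ABC):
    """Hypothetical plugin interface: where and how a tool step is executed."""

    @abstractmethod
    def submit(self, script: str) -> str:
        """Start the job and return an opaque job ID."""

    @abstractmethod
    def is_finished(self, job_id: str) -> bool:
        """Report whether the job has completed."""

    @abstractmethod
    def fetch_output(self, job_id: str) -> str:
        """Return the job's standard output."""


class FakeBackend(ExecutionBackend):
    """In-memory stand-in that reports a job as finished after a few polls."""

    def __init__(self, polls_needed: int = 2):
        self.polls_needed = polls_needed
        self.jobs = {}

    def submit(self, script):
        job_id = str(len(self.jobs) + 1)
        self.jobs[job_id] = {"script": script, "polls": 0}
        return job_id

    def is_finished(self, job_id):
        job = self.jobs[job_id]
        job["polls"] += 1
        return job["polls"] >= self.polls_needed

    def fetch_output(self, job_id):
        return f"output of: {self.jobs[job_id]['script']}"


def run_step(backend: ExecutionBackend, script: str, poll_seconds: float = 0.0) -> str:
    """Submit, poll until done, then retrieve; identical for every backend."""
    job_id = backend.submit(script)
    while not backend.is_finished(job_id):
        time.sleep(poll_seconds)
    return backend.fetch_output(job_id)
```

An SGE plugin in this picture would implement the same three methods on top of qsub and qstat, leaving `run_step` untouched.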
The UI implications, however, vary a lot between grid environments,
as they all require various things to be set up for the client tools
to work, e.g. security credentials, certificates, shell variables,
etc. Some don't have Java bindings, so you would basically still have
to execute the job submit commands from within the Java
implementation.
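Where no native binding exists, the plugin can still spawn the submit command itself and hand it the environment the client tools expect. A minimal Python sketch, assuming qsub is on the PATH and that `SGE_ROOT` and `SGE_CELL` (the standard SGE installation variables) are all the setup your site needs; real sites may also need credentials or other variables. The function name and the injectable `runner` parameter are hypothetical, the latter so the logic can be exercised without a cluster.

```python
import os
import subprocess
from typing import Callable, Mapping


def submit_with_qsub(
    script_path: str,
    job_name: str,
    sge_env: Mapping[str, str],
    runner: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> str:
    """Submit a job script with qsub and return the job ID it prints.

    `sge_env` carries site-specific settings such as SGE_ROOT and SGE_CELL.
    """
    argv = ["qsub", "-terse", "-N", job_name, script_path]
    env = dict(os.environ)
    env.update(sge_env)  # site settings override the inherited environment
    # check=True raises CalledProcessError if qsub exits non-zero.
    result = runner(argv, env=env, capture_output=True, text=True, check=True)
    job_id = result.stdout.strip()
    if not job_id:
        raise RuntimeError("qsub did not report a job ID")
    return job_id
```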
On 26 November 2014 at 15:14, Hoeftberger, Johann
<Jo...@dfci.harvard.edu> wrote:
> My aim is to build Taverna workflows that can make use of our SGE grid
> for the computationally demanding parts transparently (invisibly) in the
> background. Depending on the concrete workflow, this could mean that all
> workflow steps should be executed on the SGE grid.
>
> I couldn't find any SGE solution for Taverna so far; it seems to me that
> SGE isn't supported by Taverna!?
--
Stian Soiland-Reyes, myGrid team
School of Computer Science
The University of Manchester
http://soiland-reyes.com/stian/work/ http://orcid.org/0000-0001-9842-9718
Re: Taverna and SGE
Posted by Marlon Pierce <ma...@iu.edu>.
Now is an excellent time to get involved, with Taverna going through
Apache incubation. Demonstrating community contributions and welcoming
new committers is part of the process.
Marlon
On 11/27/14, 8:41 PM, Stian Soiland-Reyes wrote:
> We would welcome any effort in adding an SGE integration. Please join
> the developer list if you want to discuss this in detail.
> http://mail-archives.apache.org/mod_mbox/taverna-dev/
Re: Taverna and SGE
Posted by Stian Soiland-Reyes <so...@cs.manchester.ac.uk>.
We would welcome any effort in adding an SGE integration. Please join
the developer list if you want to discuss this in detail.
http://mail-archives.apache.org/mod_mbox/taverna-dev/
There's already interest in generalizing an "Execute this step
elsewhere" mechanism (not just for the Tool activity), so if we think of
it like this rather than just for SGE, it might get more interest
(where, say, SGE would just be a configuration that said "submit=qsub %1
--cluster=%2" or something).
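The "submit=qsub %1 --cluster=%2" idea amounts to a command template with positional placeholders. A minimal Python sketch of expanding such a template: the `%N` syntax is taken from the email, `render_command` is a hypothetical name, and `--cluster` is only the illustrative flag from that example, not a real qsub option (a real SGE template might use `-q` to choose a queue instead). Arguments are substituted verbatim, so a real implementation should also shell-quote them.

```python
import re


def render_command(template: str, *args: str) -> str:
    """Replace %1, %2, ... in `template` with the matching positional argument."""

    def substitute(match: "re.Match") -> str:
        index = int(match.group(1))
        if not 1 <= index <= len(args):
            raise IndexError(
                f"template refers to %{index}, but only {len(args)} argument(s) given"
            )
        return args[index - 1]

    # \d+ matches whole numbers, so %10 is not mistaken for %1 followed by "0".
    return re.sub(r"%(\d+)", substitute, template)
```

Usage: `render_command("qsub -q %2 %1", "job.sh", "all.q")` yields `qsub -q all.q job.sh`, so the same mechanism could describe other grid engines by swapping the template.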
There are some intermediary solutions as well. Although technically
Taverna would be polling SGE, the workflow engine and the workflow are
not: the Tool activity would be a single step in the workflow, and the
polling is just part of its configuration.
Now, your users might not all want to type all these qsub commands and
so on for every executable they want to run, so there are two quicker
solutions I can think of:
a) Describe the possible executables (e.g. blast) using the "Usecase"
XML that can be imported into "Available Services". Inside these
descriptions, the commands could all include the
submit-poll-retrieve logic. From Available Services, users can then
drag and drop commands into the workflow and won't see the qsub etc.
unless they need to customize the parameters.
http://dev.mygrid.org.uk/wiki/display/tav250/Importing+a+set+of+tool+descriptions
b) Create External Tool activities per executable, then package each as
a Component. These can then be shared on myExperiment as a
group/"family". Users can add this group of components to the
workbench for selection and drag and drop them into the workflow. Users
will, however, no longer be able to easily customize the command line,
as components are "gray boxes".
http://dev.mygrid.org.uk/wiki/display/tav250/Component+services
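Both options come down to pre-packaging the qsub plumbing per executable, so that users only see the tool and its parameters. A rough Python sketch of that idea: a hypothetical catalog maps a tool name to a command template plus default parameters, and the wrapper adds the SGE submission around it. The catalog format is invented for illustration and is not the Usecase XML format; the blastp command line is only an example. It also assumes your qsub supports `-sync y`, which makes qsub block until the job finishes instead of needing an explicit polling loop.

```python
import shlex
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class ToolDescription:
    """Hypothetical description of one executable offered to workflow users."""
    template: str  # command line with {placeholders}
    defaults: Dict[str, str] = field(default_factory=dict)


CATALOG = {
    "blast": ToolDescription(
        template="blastp -query {query} -db {db} -out {out}",
        defaults={"db": "nr", "out": "blast.txt"},
    ),
}


def sge_command_for(tool: str, **params: str) -> str:
    """Build the user's command from the catalog and wrap it in a qsub call.

    Users supply only tool parameters; the qsub part stays hidden from them.
    """
    description = CATALOG[tool]
    values = {**description.defaults, **params}
    command = description.template.format(
        **{key: shlex.quote(value) for key, value in values.items()}
    )
    # -sync y: qsub waits for the job to complete before returning.
    return f"echo {shlex.quote(command)} | qsub -sync y -cwd -N {shlex.quote(tool)}"
```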
On 26 November 2014 at 18:50, Hoeftberger, Johann
<Jo...@dfci.harvard.edu> wrote:
> It seems there is currently no real integration of SGE in Taverna,
> and the only available solution is via the Tool service, SSH and qsub
> commands.
> Although I don't have any experience with that, I see the theoretical
> approach behind it. But from the workflow engine's perspective, this
> results in polling the grid for results. I had hoped there was a more
> tightly integrated solution for the connection between Taverna and SGE.
Re: Taverna and SGE
Posted by "Hoeftberger, Johann" <Jo...@DFCI.HARVARD.EDU>.
On 11/26/2014 10:56 AM, alaninmcr wrote:
> On 26/11/2014 15:14, Hoeftberger, Johann wrote:
>> On 11/26/2014 08:55 AM, alaninmcr wrote:
>>> On 25/11/2014 16:36, Hoeftberger, Johann wrote:
>>>> Can Taverna make use of a Sun Grid Engine cluster, and is there a good
>>>> explanation of how to do that?
>>>
>>> What do you mean by "make use of"? Do you want a whole workflow to
>>> run on a node in the cluster, or individual steps within a
>>> workflow run to be executed on nodes in the cluster?
>>
>> I think both.
>> My most important need seems to be the ability to have individual
>> steps within a workflow run on our SGE cluster, transparently to the
>> user running the workflow. Those workflow steps should be executed on
>> nodes of the SGE cluster, and the whole handshake for using those steps
>> should be done by Taverna itself.
>
> What is the normal interface for "talking" to the SGE?
That's the question I had hoped an SGE integration in Taverna would
answer.
(For our current jobs we use qsub and related tools.)
It seems there is currently no real integration of SGE in Taverna,
and the only available solution is via the Tool service, SSH and qsub
commands.
Although I don't have any experience with that, I see the theoretical
approach behind it. But from the workflow engine's perspective, this
results in polling the grid for results. I had hoped there was a more
tightly integrated solution for the connection between Taverna and SGE.
> A quick way to try things would be to use the Tool service to run qsub (and
> related) commands against your SGE. Then you could group the choreography of
> submit -> poll * -> retrieve results into a component.
Yes, that's the best I could find, and you have more or less confirmed it.
> Alternatively, if there is a Java library, you could write an equivalent
> Beanshell.
I also find the "Tool Activity pluggable execution point" that Stian
mentioned in his last post very interesting. But I guess that would
require a much bigger development effort. I have to figure out what fits
our situation best.
Thank you all for your suggestions so far.
Regards,
Johann
Re: Taverna and SGE
Posted by alaninmcr <al...@googlemail.com>.
On 26/11/2014 15:14, Hoeftberger, Johann wrote:
> Hello Alan,
Hello
> thank you for your answer.
>
> On 11/26/2014 08:55 AM, alaninmcr wrote:
>> On 25/11/2014 16:36, Hoeftberger, Johann wrote:
>>> Can Taverna make use of a Sun Grid Engine cluster, and is there a good
>>> explanation of how to do that?
>>
>> What do you mean by "make use of"? Do you want a whole workflow to
>> run on a node in the cluster, or individual steps within a
>> workflow run to be executed on nodes in the cluster?
>
> I think both.
> My most important need seems to be the ability to have individual
> steps within a workflow run on our SGE cluster, transparently to the
> user running the workflow. Those workflow steps should be executed on
> nodes of the SGE cluster, and the whole handshake for using those steps
> should be done by Taverna itself.
What is the normal interface for "talking" to the SGE?
A quick way to try things would be to use the Tool service to run qsub (and
related) commands against your SGE. Then you could group the choreography of
submit -> poll * -> retrieve results into a component.
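In the submit -> poll* -> retrieve choreography most of the wall-clock time is spent in the poll loop. A common refinement, not something proposed in this thread, is to back off between polls so that a long-running job does not hammer qstat. A minimal Python sketch; `wait_with_backoff` and its parameters are hypothetical, and `is_finished` and `sleep` are injected so the loop can be tested without a cluster (in practice `is_finished` might wrap a `qstat -j` call).

```python
import time
from typing import Callable


def wait_with_backoff(
    is_finished: Callable[[], bool],
    initial_delay: float = 5.0,
    max_delay: float = 300.0,
    factor: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll `is_finished` until it returns True; return the number of polls.

    The delay between polls grows geometrically, capped at `max_delay`.
    """
    delay = initial_delay
    polls = 0
    while True:
        polls += 1
        if is_finished():
            return polls
        sleep(delay)
        delay = min(delay * factor, max_delay)
```

The cap keeps the worst-case extra latency after job completion bounded by `max_delay`, which is the main trade-off against fewer scheduler queries.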
Alternatively, if there is a Java library, you could write an equivalent
Beanshell.
> Regards,
> Johann
Alan
Re: Taverna and SGE
Posted by "Hoeftberger, Johann" <Jo...@DFCI.HARVARD.EDU>.
Hello Alan,
thank you for your answer.
On 11/26/2014 08:55 AM, alaninmcr wrote:
> On 25/11/2014 16:36, Hoeftberger, Johann wrote:
>> Can Taverna make use of a Sun Grid Engine cluster, and is there a good
>> explanation of how to do that?
>
> What do you mean by "make use of"? Do you want a whole workflow to
> run on a node in the cluster, or individual steps within a
> workflow run to be executed on nodes in the cluster?
I think both.
My most important need seems to be the ability to have individual
steps within a workflow run on our SGE cluster, transparently to the
user running the workflow. Those workflow steps should be executed on
nodes of the SGE cluster, and the whole handshake for using those steps
should be done by Taverna itself.
If that's possible, and I hope I can find a solution for that approach,
I don't see any limitation to creating whole workflows that run on the
SGE grid too, except for the "administration" of those workflows, which
should run on a separate local PC where the Taverna GUI is running.
Or maybe, later, a command-line tool for the same purpose, running on a
local PC and/or on the SGE grid cluster as a cron job.
>> I am looking for a working solution for this setup: Taverna, an SGE
>> cluster, and bioinformatics workflows.
>> I would appreciate any good suggestions; please let me know your ideas.
>
> I think it depends on exactly what you are trying to do.
My aim is to build Taverna workflows that can make use of our SGE grid
for the computationally demanding parts transparently (invisibly) in the
background. Depending on the concrete workflow, this could mean that all
workflow steps should be executed on the SGE grid.
I couldn't find any SGE solution for Taverna so far; it seems to me that
SGE isn't supported by Taverna!?
Regards,
Johann
Re: Taverna and SGE
Posted by alaninmcr <al...@googlemail.com>.
On 25/11/2014 16:36, Hoeftberger, Johann wrote:
> Hello,
Hello
> I guess my first posting to the hackers group wasn't the best place for
> my question. Sorry. I'm reposting it here.
>
> Can Taverna make use of a Sun Grid Engine cluster, and is there a good
> explanation of how to do that?
What do you mean by "make use of"? Do you want a whole workflow to
run on a node in the cluster, or individual steps within a
workflow run to be executed on nodes in the cluster?
> I am looking for a working solution for this setup: Taverna, an SGE
> cluster, and bioinformatics workflows.
> I would appreciate any good suggestions; please let me know your ideas.
I think it depends on exactly what you are trying to do.
> Regards,
> Johann
Alan