Posted to dev@flink.apache.org by Arvid Heise <ar...@gmail.com> on 2014/11/10 15:02:17 UTC

KNIME Integration of Flink

Dear Flinkler,

For my current project, we want to outsource some performance-critical
parts of a complex KNIME workflow to Flink. Is there already a way to
trigger a Flink workflow from KNIME? If not, we will probably provide a
straightforward way to execute Flink (Scala) programs from KNIME within
the month. The overall goal is to upload the data from the KNIME workflow
to S3, execute Flink on this data, and retrieve the output from S3.
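
A minimal sketch of what the Flink side of that round trip might look
like (the bucket name and paths are placeholders, and I assume the
Hadoop S3 filesystem is already configured for Flink):

    import org.apache.flink.api.scala._

    object KnimeOffload {
      def main(args: Array[String]): Unit = {
        val env = ExecutionEnvironment.getExecutionEnvironment
        // Read the data that the KNIME workflow uploaded beforehand.
        val input = env.readTextFile("s3n://knime-staging/input")
        // Stand-in for the performance-critical part of the workflow.
        val result = input.map(_.toUpperCase)
        // Write the result back so KNIME can retrieve it from S3.
        result.writeAsText("s3n://knime-staging/output")
        env.execute("KNIME offload")
      }
    }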

Since my employer (BfR) has a rather strict firewall setup, I will also add
a feature request for a REST API of the job manager, similar to S3's (
http://docs.aws.amazon.com/AmazonS3/latest/API/sigv4-post-example.html), so
that we can rely entirely on HTTP(S).
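
To illustrate the kind of interaction I have in mind, here is a sketch
of a client-side submission against a hypothetical job manager endpoint
(the URL, port, and upload format are invented for illustration; no such
API exists yet):

    import java.net.{HttpURLConnection, URL}
    import java.nio.file.{Files, Paths}

    object SubmitOverHttps {
      def main(args: Array[String]): Unit = {
        // Hypothetical endpoint; the job manager has no such API today.
        val url = new URL("https://jobmanager.example.com/jobs")
        val conn = url.openConnection().asInstanceOf[HttpURLConnection]
        conn.setRequestMethod("POST")
        conn.setDoOutput(true)
        conn.setRequestProperty("Content-Type", "application/java-archive")
        // Ship the packaged Flink program as the request body.
        val jar = Files.readAllBytes(Paths.get("knime-offload.jar"))
        conn.getOutputStream.write(jar)
        conn.getOutputStream.close()
        println("Job manager answered: " + conn.getResponseCode)
      }
    }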

Lastly, are there any plans for providing out-of-the-box VMs for Amazon
Elastic MapReduce? I saw the blog post
https://flink.incubator.apache.org/news/2014/02/18/amazon-elastic-mapreduce-cloud-yarn.html
which would suffice, but still requires quite a bit of manual work (via
SSH, which does not work here).

Feel free to split the topics into separate threads if you want to discuss
them individually.

Best,

Arvid

Re: KNIME Integration of Flink

Posted by Arvid Heise <ar...@gmail.com>.
Hi Stephan,

Concerning the Amazon images: Amazon provides a facility for adding key
pairs to their storage:
http://docs.aws.amazon.com/gettingstarted/latest/emr/getting-started-emr-setup.html
It then initializes the booted image with those keys (the details are not
completely clear to me). It should be easy to transfer their Hadoop setup
to a Flink setup.

Ciao

Arvid


On Mon, Nov 10, 2014 at 4:23 PM, Stephan Ewen <se...@apache.org> wrote:

> Hi Arvid!
>
> What you describe sounds like a great use case. I am not aware of an
> integration of Flink with KNIME. Triggering programs from other programs
> should work through the Client & PackagedProgram classes:
>
>  -
>
> https://github.com/apache/incubator-flink/blob/master/flink-clients/src/main/java/org/apache/flink/client/program/Client.java
>
>  -
>
> https://github.com/apache/incubator-flink/blob/master/flink-clients/src/main/java/org/apache/flink/client/program/PackagedProgram.java
>
> Triggering programs through HTTPS REST should be possible with a thin layer
> that wraps RPC calls / actor messages in HTTP requests.
>
> Out-of-the-box images for Amazon do not exist with any current version of
> the system. So far, Elastic MapReduce with YARN has proved a simpler setup
> than any VM we had prepared before. How would you skip the SSH key setup
> with prepared images?
>
> Greetings,
> Stephan
>
>
>
> On Mon, Nov 10, 2014 at 3:02 PM, Arvid Heise <ar...@gmail.com>
> wrote:
>
> > Dear Flinkler,
> >
> > For my current project, we want to outsource some performance-critical
> > parts of a complex KNIME workflow to Flink. Is there already a way to
> > trigger a Flink workflow from KNIME? If not, we will probably provide a
> > straightforward way to execute Flink (Scala) programs from KNIME within
> > the month. The overall goal is to upload the data from the KNIME workflow
> > to S3, execute Flink on this data, and retrieve the output from S3.
> >
> > Since my employer (BfR) has a rather strict firewall setup, I will also
> > add a feature request for a REST API of the job manager, similar to S3's (
> > http://docs.aws.amazon.com/AmazonS3/latest/API/sigv4-post-example.html),
> > so that we can rely entirely on HTTP(S).
> >
> > Lastly, are there any plans for providing out-of-the-box VMs for Amazon
> > Elastic MapReduce? I saw the blog post
> > https://flink.incubator.apache.org/news/2014/02/18/amazon-elastic-mapreduce-cloud-yarn.html
> > which would suffice, but still requires quite a bit of manual work (via
> > SSH, which does not work here).
> >
> > Feel free to split the topics into separate threads if you want to
> > discuss them individually.
> >
> > Best,
> >
> > Arvid
> >
>

Re: KNIME Integration of Flink

Posted by Stephan Ewen <se...@apache.org>.
Hi Arvid!

What you describe sounds like a great use case. I am not aware of an
integration of Flink with KNIME. Triggering programs from other programs
should work through the Client & PackagedProgram classes (a rough usage
sketch follows the links):

 -
https://github.com/apache/incubator-flink/blob/master/flink-clients/src/main/java/org/apache/flink/client/program/Client.java

 -
https://github.com/apache/incubator-flink/blob/master/flink-clients/src/main/java/org/apache/flink/client/program/PackagedProgram.java
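
Roughly along these lines, written from memory, so the exact constructor
and method signatures may differ in the version you are on:

    import java.io.File
    import java.net.InetSocketAddress
    import org.apache.flink.client.program.{Client, PackagedProgram}
    import org.apache.flink.configuration.Configuration

    object TriggerFlinkJob {
      def main(args: Array[String]): Unit = {
        // Wrap the user jar; the entry class is taken from its manifest.
        val program = new PackagedProgram(new File("knime-offload.jar"))
        // Point the client at the running job manager.
        val client = new Client(
          new InetSocketAddress("jobmanager-host", 6123),
          new Configuration(), program.getUserCodeClassLoader)
        // Submit with parallelism 4 and block until the job finishes.
        client.run(program, 4, true)
      }
    }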

Triggering programs through HTTPS REST should be possible with a thin layer
that wraps RPC calls / actor messages in HTTP requests.
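
Such a layer could be quite small. Here is a purely illustrative sketch
using the JDK's built-in HTTP server: it only acknowledges the upload,
and the hand-off to the Client would follow the sketch above (TLS would
come from an HttpsServer or a fronting proxy):

    import java.net.InetSocketAddress
    import com.sun.net.httpserver.{HttpExchange, HttpHandler, HttpServer}

    object HttpJobGateway {
      def main(args: Array[String]): Unit = {
        val server = HttpServer.create(new InetSocketAddress(8080), 0)
        server.createContext("/jobs", new HttpHandler {
          override def handle(exchange: HttpExchange): Unit = {
            // A real gateway would store the uploaded jar and hand it
            // to the Client as sketched above; here we only acknowledge.
            val reply = "job accepted\n".getBytes("UTF-8")
            exchange.sendResponseHeaders(202, reply.length)
            exchange.getResponseBody.write(reply)
            exchange.close()
          }
        })
        server.start()
      }
    }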

Out-of-the-box images for Amazon do not exist with any current version of
the system. So far, Elastic MapReduce with YARN has proved a simpler setup
than any VM we had prepared before. How would you skip the SSH key setup
with prepared images?

Greetings,
Stephan



On Mon, Nov 10, 2014 at 3:02 PM, Arvid Heise <ar...@gmail.com> wrote:

> Dear Flinkler,
>
> For my current project, we want to outsource some performance-critical
> parts of a complex KNIME workflow to Flink. Is there already a way to
> trigger a Flink workflow from KNIME? If not, we will probably provide a
> straightforward way to execute Flink (Scala) programs from KNIME within
> the month. The overall goal is to upload the data from the KNIME workflow
> to S3, execute Flink on this data, and retrieve the output from S3.
>
> Since my employer (BfR) has a rather strict firewall setup, I will also add
> a feature request for a REST API of the job manager, similar to S3's (
> http://docs.aws.amazon.com/AmazonS3/latest/API/sigv4-post-example.html), so
> that we can rely entirely on HTTP(S).
>
> Lastly, are there any plans for providing out-of-the-box VMs for Amazon
> Elastic MapReduce? I saw the blog post
> https://flink.incubator.apache.org/news/2014/02/18/amazon-elastic-mapreduce-cloud-yarn.html
> which would suffice, but still requires quite a bit of manual work (via
> SSH, which does not work here).
>
> Feel free to split the topics into separate threads if you want to discuss
> them individually.
>
> Best,
>
> Arvid
>