Posted to dev@flink.apache.org by Raul Kripalani <ra...@apache.org> on 2016/03/30 13:48:08 UTC

Apache Flink <=> Apache Ignite integration

Hello from the Apache Ignite community!

Last year there was an interesting thread [1] about such integration.
Unfortunately there's been little follow-through, so let's try and fix that
in 2016 ;-)

I'm sure a lot has changed in the Flink community, with the recent
graduation and 1.0 release, so I'd like to make a new (updated) list of
synergies and areas of integration I can think of:

+++ *Ignite as a bidirectional Connector* +++

The first and obvious integration point is Ignite as a source and a sink for
Flink. An Ignite contributor has already sent a pull request [2] to serve
as a sink into Ignite Queues, but I feel this integration can be deeper and
more functional. Moreover, it should be hosted in the Flink source tree as
a Connector (like the Kafka or ES connectors). In particular, we could
offer these features:

* As a Flink sink => inject data directly into a cache via a DataStreamer.
* As a Flink source => run a continuous query against one or multiple
caches [4].

+++ *Ignite as a state backend* +++

Either natively [5] or via the IGFS (Ignite Filesystem) interface which can
run as a Hadoop Filesystem [6].

This would allow Flink to store intermediate state in Ignite. I believe
this is what you called "distributed backup for Streaming Operator State"
in the initial exchange, isn't it?

+++ *Ignite as a DataSet API connector* +++

Ability to use Ignite as a source for batch pipelines, by executing Ignite
SQL queries [7] against a cache and feeding the results into a Flink
pipeline. Basically a batch counterpart to the streaming continuous query
idea above.

+++ *Ignite as an execution backend* +++

You already mentioned this in [1] and I think it makes for a perfect
synergy between both projects, through Ignite's Compute API.

Still agree with this? Any changes since last year I should take into
account?

+++ *Ignite as a parameter server* +++

This was in the initial proposal [1], but it's not clear to me. I have
found references to the idea of a Parameter Server in Flink, but only as
proposed ideas. Was this feature finally implemented, or is it in the
future roadmap?

~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This is just a newer, updated proposal from my side, but I'm sure that both
communities can, and will want to, chime in!

Cheers,

[1]
https://mail-archives.apache.org/mod_mbox/flink-dev/201504.mbox/%3CCANC1h_u__KgsdOo2SZ4M=8jf3zOMozS3XbekQ0erJj9p4WF1zg@mail.gmail.com%3E
[2] https://issues.apache.org/jira/browse/IGNITE-813
[3] https://ignite.apache.org/features/streaming.html
[4] http://apacheignite.gridgain.org/v1.5/docs/continuous-queries
[5] https://apacheignite-fs.readme.io/docs/igfs
[6] https://apacheignite-fs.readme.io/docs/file-system
[7] https://apacheignite.readme.io/docs/sql-queries

*Raúl Kripalani*
PMC & Committer @ Apache Ignite, Apache Camel | Integration, Big Data and
Messaging Engineer
http://about.me/raulkripalani | http://www.linkedin.com/in/raulkripalani
Blog: raul.io | twitter: @raulvk <https://twitter.com/raulvk>

Re: Apache Flink <=> Apache Ignite integration

Posted by Anton Vinogradov <av...@gridgain.com>.

Hi All,
I'll review it in the near future.

On Tue, Jul 19, 2016 at 11:53 AM, Denis Magda <dm...@gridgain.com> wrote:

> Hi Saikat,
>
> Thanks for this contribution.
>
> *Anton V.*, please review the following contribution.
>
> —
> Denis
>
> On Jul 16, 2016, at 11:09 PM, Saikat Maitra <sa...@gmail.com>
> wrote:
>
> Hi
>
> I have raised a PR for the following scope.
>
> As a Flink source => run a continuous query against one or multiple
> caches
>
> PR https://github.com/apache/ignite/pull/870
> Jira https://issues.apache.org/jira/browse/IGNITE-3303
>
> Please review and share feedback.
>
> Regards
> Saikat
>
>
> On Mon, Apr 4, 2016 at 8:24 PM, Stephan Ewen <se...@apache.org> wrote:
>
> Hi!
>
>  - Sounds like the having Ignite for snapshots should work pretty much out
> of the box (via the IGFS)
>  - The source and sink connector sounds like the next logical step. Does
> Ignite have a notion of stream partitions and offsets, to build a
> consistent replay around? This should probably have its dedicated issue and
> discussion thread.
>
>  - For Ignite as an execution backend - I am not sure how relevant and
> feasible that is. Many DataStream API features make use of the specific
> Flink runtime. For streaming, the runtime is not as decoupled as for batch.
>  - I think the parameter server integration would not be part of the Flink
> codebase - this is a pretty application specific thing that should be its
> own project and it is actually not tightly coupled to Flink.
>
> Greetings,
> Stephan
>
>
> On Mon, Apr 4, 2016 at 4:35 PM, Robert Metzger <rm...@apache.org>
> wrote:
>
> Hi Raul,
>
> thanks a lot for reaching out to the Flink community.
> I'm really excited to see a Flink connector in Ignite. If you feel that the
> connector would be more suitable for our "connector library" feel free to
> open a JIRA and open a pull request.
>
> Were there requests in the Ignite community to have an integration with
> Flink?
>
> On Thu, Mar 31, 2016 at 5:20 PM, Saikat Maitra <sa...@gmail.com>
> wrote:
>
> Hi ,
>
> I agree with Roman and Raul.
> https://issues.apache.org/jira/browse/IGNITE-813 allows injecting data to
> into cache via Data Streamer. Integrating with Ignite FileSystem for source
> and sink will allow for bidirectional connector. It will also allow easier
> implementation for DataStream transformations over Ignite FileSystem.
>
> Regards
> Saikat
>
> On Thu, Mar 31, 2016 at 2:44 PM, Aljoscha Krettek <aljoscha@apache.org>
> wrote:
>
> Hi,
> it should already be possible to use the Ignite FileSystem to store state
> since we just use the HDFS FileSystem interface for that. Of course, one
> would have to properly set up the jars and paths and everything for Flink
> to pick up the IGFS classes.
>
> Cheers,
> Aljoscha
>
> On Wed, 30 Mar 2016 at 16:50 Raul Kripalani <ra...@apache.org> wrote:
>
> On Wed, Mar 30, 2016 at 2:20 PM, Roman <rs...@yahoo.com.invalid> wrote:
>
> Raul,
>
> Small comment from me.
>
> * As a Flink sink => inject data directly into a cache via a DataStreamer.
>
> After reviews, IGNITE-813 is exactly this functionality.
>
> That's cool, Roman! The idea would be to host these (richer) modules as
> Flink connectors, like they do with others:
>
> https://github.com/apache/flink/tree/master/flink-streaming-connectors
> https://github.com/apache/flink/tree/master/flink-batch-connectors

Re: Apache Flink <=> Apache Ignite integration

Posted by Denis Magda <dm...@gridgain.com>.

Hi Saikat,

Thanks for this contribution.

Anton V., please review the following contribution.

—
Denis

> On Jul 16, 2016, at 11:09 PM, Saikat Maitra <sa...@gmail.com> wrote:
> 
> Hi
> 
> I have raised a PR for the following scope.
> 
> As a Flink source => run a continuous query against one or multiple
> caches
> 
> PR https://github.com/apache/ignite/pull/870
> Jira https://issues.apache.org/jira/browse/IGNITE-3303
> 
> Please review and share feedback.
> 
> Regards
> Saikat
> 
> 
> On Mon, Apr 4, 2016 at 8:24 PM, Stephan Ewen <se...@apache.org> wrote:
> 
>> Hi!
>> 
>>  - Sounds like the having Ignite for snapshots should work pretty much out
>> of the box (via the IGFS)
>>  - The source and sink connector sounds like the next logical step. Does
>> Ignite have a notion of stream partitions and offsets, to build a
>> consistent replay around? This should probably have its dedicated issue and
>> discussion thread.
>> 
>>  - For Ignite as an execution backend - I am not sure how relevant and
>> feasible that is. Many DataStream API features make use of the specific
>> Flink runtime. For streaming, the runtime is not as decoupled as for batch.
>>  - I think the parameter server integration would not be part of the Flink
>> codebase - this is a pretty application specific thing that should be its
>> own project and it is actually not tightly coupled to Flink.
>> 
>> Greetings,
>> Stephan
>> 
>> 
>> On Mon, Apr 4, 2016 at 4:35 PM, Robert Metzger <rm...@apache.org>
>> wrote:
>> 
>>> Hi Raul,
>>> 
>>> thanks a lot for reaching out to the Flink community.
>>> I'm really excited to see a Flink connector in Ignite. If you feel that
>> the
>>> connector would be more suitable for our "connector library" feel free to
>>> open a JIRA and open a pull request.
>>> 
>>> Were there requests in the Ignite community to have an integration with
>>> Flink?
>>> 
>>> 
>>> 
>>> On Thu, Mar 31, 2016 at 5:20 PM, Saikat Maitra <sa...@gmail.com>
>>> wrote:
>>> 
>>>> Hi ,
>>>> 
>>>> I agree with Roman and Raul.
>>>> https://issues.apache.org/jira/browse/IGNITE-813 allows injecting data
>>> to
>>>> into cache via Data Streamer. Integrating with Ignite FileSystem for
>>> source
>>>> and sink will allow for bidirectional connector. It will also allow
>>> easier
>>>> implementation for DataStream transformations over Ignite FileSystem.
>>>> 
>>>> Regards
>>>> Saikat
>>>> 
>>>> On Thu, Mar 31, 2016 at 2:44 PM, Aljoscha Krettek <aljoscha@apache.org
>>> 
>>>> wrote:
>>>> 
>>>>> Hi,
>>>>> it should already be possible to use the Ignite FileSystem to store
>>> state
>>>>> since we just use the HDFS FileSystem interface for that. Of course,
>>> one
>>>>> would have to properly set up the jars and paths and everything for
>>> Flink
>>>>> to pick up the IGFS classes.
>>>>> 
>>>>> Cheers,
>>>>> Aljoscha
>>>>> 
>>>>> On Wed, 30 Mar 2016 at 16:50 Raul Kripalani <ra...@apache.org>
>> wrote:
>>>>> 
>>>>>> On Wed, Mar 30, 2016 at 2:20 PM, Roman <rs...@yahoo.com.invalid>
>>>>> wrote:
>>>>>> 
>>>>>>> Raul,
>>>>>>> 
>>>>>>> Small comment from me.
>>>>>>> 
>>>>>>>> * As a Flink sink => inject data directly into a cache via a
>>>>>> DataStreamer.
>>>>>>> After reviews, IGNITE-813 is exactly this functionality.
>>>>>>> 
>>>>>>> 
>>>>>> That's cool, Roman! The idea would be to host these (richer)
>> modules
>>> as
>>>>>> Flink connectors, like they do with others:
>>>>>> 
>>>>>> 
>>> https://github.com/apache/flink/tree/master/flink-streaming-connectors
>>>>>> https://github.com/apache/flink/tree/master/flink-batch-connectors
>>>>>> 
>>>>> 
>>>> 
>>> 
>>

Re: Apache Flink <=> Apache Ignite integration

Posted by Saikat Maitra <sa...@gmail.com>.

Hi

I have raised a PR for the following scope.

As a Flink source => run a continuous query against one or multiple
caches

PR https://github.com/apache/ignite/pull/870
Jira https://issues.apache.org/jira/browse/IGNITE-3303

Please review and share feedback.

Regards
Saikat


On Mon, Apr 4, 2016 at 8:24 PM, Stephan Ewen <se...@apache.org> wrote:

> Hi!
>
>   - Sounds like the having Ignite for snapshots should work pretty much out
> of the box (via the IGFS)
>   - The source and sink connector sounds like the next logical step. Does
> Ignite have a notion of stream partitions and offsets, to build a
> consistent replay around? This should probably have its dedicated issue and
> discussion thread.
>
>   - For Ignite as an execution backend - I am not sure how relevant and
> feasible that is. Many DataStream API features make use of the specific
> Flink runtime. For streaming, the runtime is not as decoupled as for batch.
>   - I think the parameter server integration would not be part of the Flink
> codebase - this is a pretty application specific thing that should be its
> own project and it is actually not tightly coupled to Flink.
>
> Greetings,
> Stephan
>
>
> On Mon, Apr 4, 2016 at 4:35 PM, Robert Metzger <rm...@apache.org>
> wrote:
>
> > Hi Raul,
> >
> > thanks a lot for reaching out to the Flink community.
> > I'm really excited to see a Flink connector in Ignite. If you feel that
> the
> > connector would be more suitable for our "connector library" feel free to
> > open a JIRA and open a pull request.
> >
> > Were there requests in the Ignite community to have an integration with
> > Flink?
> >
> >
> >
> > On Thu, Mar 31, 2016 at 5:20 PM, Saikat Maitra <sa...@gmail.com>
> > wrote:
> >
> > > Hi ,
> > >
> > > I agree with Roman and Raul.
> > > https://issues.apache.org/jira/browse/IGNITE-813 allows injecting data
> > to
> > > into cache via Data Streamer. Integrating with Ignite FileSystem for
> > source
> > > and sink will allow for bidirectional connector. It will also allow
> > easier
> > > implementation for DataStream transformations over Ignite FileSystem.
> > >
> > > Regards
> > > Saikat
> > >
> > > On Thu, Mar 31, 2016 at 2:44 PM, Aljoscha Krettek <aljoscha@apache.org
> >
> > > wrote:
> > >
> > > > Hi,
> > > > it should already be possible to use the Ignite FileSystem to store
> > state
> > > > since we just use the HDFS FileSystem interface for that. Of course,
> > one
> > > > would have to properly set up the jars and paths and everything for
> > Flink
> > > > to pick up the IGFS classes.
> > > >
> > > > Cheers,
> > > > Aljoscha
> > > >
> > > > On Wed, 30 Mar 2016 at 16:50 Raul Kripalani <ra...@apache.org>
> wrote:
> > > >
> > > > > On Wed, Mar 30, 2016 at 2:20 PM, Roman <rs...@yahoo.com.invalid>
> > > > wrote:
> > > > >
> > > > > > Raul,
> > > > > >
> > > > > > Small comment from me.
> > > > > >
> > > > > > >* As a Flink sink => inject data directly into a cache via a
> > > > > DataStreamer.
> > > > > > After reviews, IGNITE-813 is exactly this functionality.
> > > > > >
> > > > > >
> > > > > That's cool, Roman! The idea would be to host these (richer)
> modules
> > as
> > > > > Flink connectors, like they do with others:
> > > > >
> > > > >
> > https://github.com/apache/flink/tree/master/flink-streaming-connectors
> > > > > https://github.com/apache/flink/tree/master/flink-batch-connectors
> > > > >
> > > >
> > >
> >
>

Re: Apache Flink <=> Apache Ignite integration

Posted by Stephan Ewen <se...@apache.org>.

Hi Raul!

Concerning the source connector and position marker: Great idea!
The FlinkKafkaConsumer uses pretty much the same trick - the
offset-per-partition is used to filter during replays.
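
To make the offset-per-partition idea concrete, here is a minimal sketch in
plain Java. It is not FlinkKafkaConsumer code: the class
PartitionOffsetFilter, the Event type and the method filterReplay are
hypothetical names made up for illustration, and the sketch assumes that
offsets within a partition are ascending and start at 0.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of offset-per-partition replay filtering.
// None of these names come from Flink, Kafka, or Ignite.
final class PartitionOffsetFilter {

    // Illustrative record type: a payload tagged with its partition and offset.
    static final class Event {
        final int partition;
        final long offset;
        final String payload;

        Event(int partition, long offset, String payload) {
            this.partition = partition;
            this.offset = offset;
            this.payload = payload;
        }
    }

    // Highest offset already processed per partition, as restored from a checkpoint.
    private final Map<Integer, Long> lastOffsets;

    PartitionOffsetFilter(Map<Integer, Long> restoredOffsets) {
        this.lastOffsets = new HashMap<>(restoredOffsets);
    }

    // Drops every replayed event at or below the stored offset of its
    // partition and advances the stored offset for each event that passes.
    List<Event> filterReplay(List<Event> replayed) {
        List<Event> fresh = new ArrayList<>();
        for (Event e : replayed) {
            long seen = lastOffsets.getOrDefault(e.partition, -1L);
            if (e.offset > seen) {
                fresh.add(e);
                lastOffsets.put(e.partition, e.offset);
            }
        }
        return fresh;
    }

    public static void main(String[] args) {
        Map<Integer, Long> restored = new HashMap<>();
        restored.put(0, 1L); // partition 0 was processed up to offset 1

        PartitionOffsetFilter filter = new PartitionOffsetFilter(restored);

        List<Event> replayed = new ArrayList<>();
        replayed.add(new Event(0, 0L, "a")); // already processed, dropped
        replayed.add(new Event(0, 1L, "b")); // already processed, dropped
        replayed.add(new Event(0, 2L, "c")); // new for partition 0
        replayed.add(new Event(1, 0L, "d")); // partition 1 has no restored offset

        for (Event e : filter.filterReplay(replayed)) {
            System.out.println(e.partition + ":" + e.offset + " " + e.payload);
        }
    }
}
```

The point of keeping one offset per partition is that the source may replay
more than strictly needed after a failure; the filter then discards whatever
was already covered by the restored checkpoint.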

Greetings,
Stephan



On Tue, Apr 5, 2016 at 2:24 AM, Raul Kripalani <ra...@apache.org> wrote:

> On Mon, Apr 4, 2016 at 3:54 PM, Stephan Ewen <se...@apache.org> wrote:
>
> >
> >   - Sounds like the having Ignite for snapshots should work pretty much
> > out
> > of the box (via the IGFS)
> >   - The source and sink connector sounds like the next logical step. Does
> > Ignite have a notion of stream partitions and offsets, to build a
> > consistent replay around? This should probably have its dedicated issue
> and
> > discussion thread.
> >
> >   - For Ignite as an execution backend - I am not sure how relevant and
> > feasible that is. Many DataStream API features make use of the specific
> > Flink runtime. For streaming, the runtime is not as decoupled as for
> > batch.
> >   - I think the parameter server integration would not be part of the
> > Flink
> > codebase - this is a pretty application specific thing that should be its
> > own project and it is actually not tightly coupled to Flink.
>
>
> Danke, Stephan! I think I'll start with the sink/source connector – reusing
> what's already been committed to our codebase.
>
> With regards to source replayability, I plan to integrate Ignite Continuous
> Queries as a source. If the user's data objects contain an indexed
> ascending numeric or datetime field, we could use such a field as a
> "position marker" by launching the query with the appropriate WHERE filter
> when a replay is demanded.
>
> Do you have similar use cases with existing connectors?
>
> Cheers,
>
> *Raúl Kripalani*
> PMC & Committer @ Apache Ignite, Apache Camel | Integration, Big Data and
> Messaging Engineer
> http://about.me/raulkripalani | http://www.linkedin.com/in/raulkripalani
> Blog: raul.io
> <http://raul.io/?utm_source=email&utm_medium=email&utm_campaign=apache> |
> twitter: @raulvk <https://twitter.com/raulvk>
>

Re: Apache Flink <=> Apache Ignite integration

Posted by Raul Kripalani <ra...@apache.org>.

On Mon, Apr 4, 2016 at 3:54 PM, Stephan Ewen <se...@apache.org> wrote:

>
>   - Sounds like the having Ignite for snapshots should work pretty much
> out
> of the box (via the IGFS)
>   - The source and sink connector sounds like the next logical step. Does
> Ignite have a notion of stream partitions and offsets, to build a
> consistent replay around? This should probably have its dedicated issue and
> discussion thread.
>
>   - For Ignite as an execution backend - I am not sure how relevant and
> feasible that is. Many DataStream API features make use of the specific
> Flink runtime. For streaming, the runtime is not as decoupled as for
> batch.
>   - I think the parameter server integration would not be part of the
> Flink
> codebase - this is a pretty application specific thing that should be its
> own project and it is actually not tightly coupled to Flink.

Danke, Stephan! I think I'll start with the sink/source connector – reusing
what's already been committed to our codebase.

With regard to source replayability, I plan to integrate Ignite Continuous
Queries as a source. If the user's data objects contain an indexed
ascending numeric or datetime field, we could use such a field as a
"position marker" by launching the query with the appropriate WHERE filter
when a replay is demanded.
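
A rough sketch of the "position marker" idea, in plain Java with no Ignite
or Flink dependencies. Everything here is hypothetical and only for
illustration: PositionMarkerSource, Entry and the field seq are invented
names, the in-memory list stands in for a cache, and the loop in poll
emulates a query of the form "WHERE seq > ?" rather than calling any real
query API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical model of a replayable source that uses an ascending field
// as its position marker. Not Ignite or Flink code.
final class PositionMarkerSource {

    // Illustrative data object; `seq` stands in for the indexed ascending field.
    static final class Entry {
        final long seq;
        final String value;

        Entry(long seq, String value) {
            this.seq = seq;
            this.value = value;
        }
    }

    // Highest `seq` emitted so far; -1 means nothing has been emitted yet.
    private long position = -1L;

    // Emulates "SELECT ... WHERE seq > ? ORDER BY seq" over an in-memory list
    // and advances the marker to the last entry returned.
    List<Entry> poll(List<Entry> cache) {
        List<Entry> result = new ArrayList<>();
        for (Entry e : cache) {
            if (e.seq > position) {
                result.add(e);
            }
        }
        result.sort(Comparator.comparingLong((Entry e) -> e.seq));
        if (!result.isEmpty()) {
            position = result.get(result.size() - 1).seq;
        }
        return result;
    }

    // Value a checkpoint would store.
    long snapshotPosition() {
        return position;
    }

    // Called when a replay is demanded: rewind the marker to the checkpointed value.
    void restorePosition(long checkpointed) {
        position = checkpointed;
    }

    public static void main(String[] args) {
        List<Entry> cache = new ArrayList<>();
        cache.add(new Entry(1L, "a"));
        cache.add(new Entry(2L, "b"));

        PositionMarkerSource source = new PositionMarkerSource();
        source.poll(cache);                        // emits a and b
        long checkpoint = source.snapshotPosition();

        cache.add(new Entry(3L, "c"));
        source.poll(cache);                        // emits c

        source.restorePosition(checkpoint);        // failure: replay from checkpoint
        for (Entry e : source.poll(cache)) {
            System.out.println(e.seq + " " + e.value);
        }
    }
}
```

The sketch only works if the marker field never decreases and is unique
enough to resume from; entries that share the marker value of the last
emitted entry would be skipped on replay.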

Do you have similar use cases with existing connectors?

Cheers,

*Raúl Kripalani*
PMC & Committer @ Apache Ignite, Apache Camel | Integration, Big Data and
Messaging Engineer
http://about.me/raulkripalani | http://www.linkedin.com/in/raulkripalani
Blog: raul.io
<http://raul.io/?utm_source=email&utm_medium=email&utm_campaign=apache> |
twitter: @raulvk <https://twitter.com/raulvk>

Re: Apache Flink <=> Apache Ignite integration

Posted by Stephan Ewen <se...@apache.org>.

Hi!

  - Sounds like having Ignite for snapshots should work pretty much out
of the box (via the IGFS)
  - The source and sink connector sounds like the next logical step. Does
Ignite have a notion of stream partitions and offsets, to build a
consistent replay around? This should probably have its own dedicated
issue and discussion thread.

  - For Ignite as an execution backend - I am not sure how relevant and
feasible that is. Many DataStream API features make use of the specific
Flink runtime. For streaming, the runtime is not as decoupled as for batch.
  - I think the parameter server integration would not be part of the Flink
codebase - this is a pretty application-specific thing that should be its
own project and it is actually not tightly coupled to Flink.

Greetings,
Stephan


On Mon, Apr 4, 2016 at 4:35 PM, Robert Metzger <rm...@apache.org> wrote:

> Hi Raul,
>
> thanks a lot for reaching out to the Flink community.
> I'm really excited to see a Flink connector in Ignite. If you feel that the
> connector would be more suitable for our "connector library" feel free to
> open a JIRA and open a pull request.
>
> Were there requests in the Ignite community to have an integration with
> Flink?
>
>
>
> On Thu, Mar 31, 2016 at 5:20 PM, Saikat Maitra <sa...@gmail.com>
> wrote:
>
> > Hi ,
> >
> > I agree with Roman and Raul.
> > https://issues.apache.org/jira/browse/IGNITE-813 allows injecting data
> to
> > into cache via Data Streamer. Integrating with Ignite FileSystem for
> source
> > and sink will allow for bidirectional connector. It will also allow
> easier
> > implementation for DataStream transformations over Ignite FileSystem.
> >
> > Regards
> > Saikat
> >
> > On Thu, Mar 31, 2016 at 2:44 PM, Aljoscha Krettek <al...@apache.org>
> > wrote:
> >
> > > Hi,
> > > it should already be possible to use the Ignite FileSystem to store
> state
> > > since we just use the HDFS FileSystem interface for that. Of course,
> one
> > > would have to properly set up the jars and paths and everything for
> Flink
> > > to pick up the IGFS classes.
> > >
> > > Cheers,
> > > Aljoscha
> > >
> > > On Wed, 30 Mar 2016 at 16:50 Raul Kripalani <ra...@apache.org> wrote:
> > >
> > > > On Wed, Mar 30, 2016 at 2:20 PM, Roman <rs...@yahoo.com.invalid>
> > > wrote:
> > > >
> > > > > Raul,
> > > > >
> > > > > Small comment from me.
> > > > >
> > > > > >* As a Flink sink => inject data directly into a cache via a
> > > > DataStreamer.
> > > > > After reviews, IGNITE-813 is exactly this functionality.
> > > > >
> > > > >
> > > > That's cool, Roman! The idea would be to host these (richer) modules
> as
> > > > Flink connectors, like they do with others:
> > > >
> > > >
> https://github.com/apache/flink/tree/master/flink-streaming-connectors
> > > > https://github.com/apache/flink/tree/master/flink-batch-connectors
> > > >
> > >
> >
>

Re: Apache Flink <=> Apache Ignite integration

Posted by Stephan Ewen <se...@apache.org>.

Hi!

  - Sounds like the having Ignite for snapshots should work pretty much out
of the box (via the IGFS)
  - The source and sink connector sounds like the next logical step. Does
Ignite have a notion of stream partitions and offsets, to build a
consistent replay around? This should probably have its dedicated issue and
discussion thread.

  - For Ignite as an execution backend - I am not sure how relevant and
feasible that is. Many DataStream API features make use of the specific
Flink runtime. For streaming, the runtime is not as decoupled as for batch.
  - I think the parameter server integration would not be part of the Flink
codebase - this is a pretty application-specific thing that should be its
own project, and it is actually not tightly coupled to Flink.

Greetings,
Stephan
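
To make the partition/offset question above concrete, here is a minimal
sketch of what "consistent replay" asks of a source: it must be able to
record, per partition, how far it has read, and later rewind to exactly
that point. Everything in it (PartitionedLog, ReplayableSource, the method
names) is hypothetical and uses only the JDK; it is neither Flink's
checkpointing API nor anything Ignite is known to provide, just an
illustration of the idea.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ReplaySketch {

    /** Hypothetical append-only log split into partitions; a record's offset is its index. */
    static final class PartitionedLog {
        private final List<List<String>> partitions = new ArrayList<>();

        PartitionedLog(int numPartitions) {
            for (int i = 0; i < numPartitions; i++) {
                partitions.add(new ArrayList<>());
            }
        }

        void append(int partition, String record) {
            partitions.get(partition).add(record);
        }

        int size(int partition) {
            return partitions.get(partition).size();
        }

        String read(int partition, int offset) {
            return partitions.get(partition).get(offset);
        }

        int numPartitions() {
            return partitions.size();
        }
    }

    /** Hypothetical source that tracks the next offset to read in each partition. */
    static final class ReplayableSource {
        private final PartitionedLog log;
        private final Map<Integer, Integer> nextOffset = new HashMap<>();

        ReplayableSource(PartitionedLog log) {
            this.log = log;
            for (int p = 0; p < log.numPartitions(); p++) {
                nextOffset.put(p, 0);
            }
        }

        /** Returns all records not yet read, partition by partition, and advances the offsets. */
        List<String> poll() {
            List<String> out = new ArrayList<>();
            for (int p = 0; p < log.numPartitions(); p++) {
                int offset = nextOffset.get(p);
                while (offset < log.size(p)) {
                    out.add(log.read(p, offset));
                    offset++;
                }
                nextOffset.put(p, offset);
            }
            return out;
        }

        /** Takes a copy of the current read positions (the "checkpoint"). */
        Map<Integer, Integer> snapshot() {
            return new HashMap<>(nextOffset);
        }

        /** Rewinds the read positions to a previously taken snapshot. */
        void restore(Map<Integer, Integer> snapshot) {
            nextOffset.clear();
            nextOffset.putAll(snapshot);
        }
    }

    public static void main(String[] args) {
        PartitionedLog log = new PartitionedLog(2);
        log.append(0, "a0");
        log.append(1, "b0");

        ReplayableSource source = new ReplayableSource(log);
        System.out.println(source.poll());            // [a0, b0]

        Map<Integer, Integer> checkpoint = source.snapshot();
        log.append(0, "a1");
        log.append(1, "b1");
        System.out.println(source.poll());            // [a1, b1]

        // Simulated failure: rewind to the checkpoint and read the same records again.
        source.restore(checkpoint);
        System.out.println(source.poll());            // [a1, b1]
    }
}
```

The sketch only works because the log retains records and addresses them
by a stable (partition, offset) pair; whether an Ignite cache or
continuous query can offer an equivalent guarantee is exactly the open
question.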


On Mon, Apr 4, 2016 at 4:35 PM, Robert Metzger <rm...@apache.org> wrote:

> Hi Raul,
>
> thanks a lot for reaching out to the Flink community.
> I'm really excited to see a Flink connector in Ignite. If you feel that the
> connector would be more suitable for our "connector library" feel free to
> open a JIRA and open a pull request.
>
> Were there requests in the Ignite community to have an integration with
> Flink?
>
>
>
> On Thu, Mar 31, 2016 at 5:20 PM, Saikat Maitra <sa...@gmail.com>
> wrote:
>
> > Hi ,
> >
> > I agree with Roman and Raul.
> > https://issues.apache.org/jira/browse/IGNITE-813 allows injecting data
> > into cache via Data Streamer. Integrating with Ignite FileSystem for
> > source and sink will allow for a bidirectional connector. It will also
> > allow easier implementation for DataStream transformations over Ignite
> > FileSystem.
> >
> > Regards
> > Saikat
> >
> > On Thu, Mar 31, 2016 at 2:44 PM, Aljoscha Krettek <al...@apache.org>
> > wrote:
> >
> > > Hi,
> > > it should already be possible to use the Ignite FileSystem to store
> state
> > > since we just use the HDFS FileSystem interface for that. Of course,
> one
> > > would have to properly set up the jars and paths and everything for
> Flink
> > > to pick up the IGFS classes.
> > >
> > > Cheers,
> > > Aljoscha
> > >
> > > On Wed, 30 Mar 2016 at 16:50 Raul Kripalani <ra...@apache.org> wrote:
> > >
> > > > On Wed, Mar 30, 2016 at 2:20 PM, Roman <rs...@yahoo.com.invalid>
> > > wrote:
> > > >
> > > > > Raul,
> > > > >
> > > > > Small comment from me.
> > > > >
> > > > > >* As a Flink sink => inject data directly into a cache via a
> > > > DataStreamer.
> > > > > After reviews, IGNITE-813 is exactly this functionality.
> > > > >
> > > > >
> > > > That's cool, Roman! The idea would be to host these (richer) modules
> as
> > > > Flink connectors, like they do with others:
> > > >
> > > >
> https://github.com/apache/flink/tree/master/flink-streaming-connectors
> > > > https://github.com/apache/flink/tree/master/flink-batch-connectors
> > > >
> > >
> >
>

Re: Apache Flink <=> Apache Ignite integration

Posted by Raul Kripalani <ra...@apache.org>.

On Mon, Apr 4, 2016 at 3:35 PM, Robert Metzger <rm...@apache.org> wrote:

> thanks a lot for reaching out to the Flink community.
> I'm really excited to see a Flink connector in Ignite. If you feel that
> the
> connector would be more suitable for our "connector library" feel free to
> open a JIRA and open a pull request.
>

Will do.

> Were there requests in the Ignite community to have an integration with
> Flink?
>

Actually, there's a little story behind this. I'm personally interested in
reactive programming and I was keen on developing RxJava semantics for
Ignite, e.g. to consider DataStreamers as Observables and to apply
operators e.g. join, debounce, etc. In that exploration, Flink came up as a
synergistic project to integrate with and hence this thread.

Not as exciting as saying: "dude, we had 1000's of requests for this from
users" :) But once it's there, I'm pretty sure people will use it.

Cheers,

*Raúl Kripalani*
PMC & Committer @ Apache Ignite, Apache Camel | Integration, Big Data and
Messaging Engineer
http://about.me/raulkripalani | http://www.linkedin.com/in/raulkripalani
Blog: raul.io
<http://raul.io/?utm_source=email&utm_medium=email&utm_campaign=apache> |
twitter: @raulvk <https://twitter.com/raulvk>

Re: Apache Flink <=> Apache Ignite integration

Posted by Robert Metzger <rm...@apache.org>.

Hi Raul,

thanks a lot for reaching out to the Flink community.
I'm really excited to see a Flink connector in Ignite. If you feel that the
connector would be more suitable for our "connector library" feel free to
open a JIRA and open a pull request.

Were there requests in the Ignite community to have an integration with
Flink?



On Thu, Mar 31, 2016 at 5:20 PM, Saikat Maitra <sa...@gmail.com>
wrote:

> Hi ,
>
> I agree with Roman and Raul.
> https://issues.apache.org/jira/browse/IGNITE-813 allows injecting data
> into cache via Data Streamer. Integrating with Ignite FileSystem for source
> and sink will allow for a bidirectional connector. It will also allow easier
> implementation for DataStream transformations over Ignite FileSystem.
>
> Regards
> Saikat
>
> On Thu, Mar 31, 2016 at 2:44 PM, Aljoscha Krettek <al...@apache.org>
> wrote:
>
> > Hi,
> > it should already be possible to use the Ignite FileSystem to store state
> > since we just use the HDFS FileSystem interface for that. Of course, one
> > would have to properly set up the jars and paths and everything for Flink
> > to pick up the IGFS classes.
> >
> > Cheers,
> > Aljoscha
> >
> > On Wed, 30 Mar 2016 at 16:50 Raul Kripalani <ra...@apache.org> wrote:
> >
> > > On Wed, Mar 30, 2016 at 2:20 PM, Roman <rs...@yahoo.com.invalid>
> > wrote:
> > >
> > > > Raul,
> > > >
> > > > Small comment from me.
> > > >
> > > > >* As a Flink sink => inject data directly into a cache via a
> > > DataStreamer.
> > > > After reviews, IGNITE-813 is exactly this functionality.
> > > >
> > > >
> > > That's cool, Roman! The idea would be to host these (richer) modules as
> > > Flink connectors, like they do with others:
> > >
> > > https://github.com/apache/flink/tree/master/flink-streaming-connectors
> > > https://github.com/apache/flink/tree/master/flink-batch-connectors
> > >
> >
>

Re: Apache Flink <=> Apache Ignite integration

Posted by Saikat Maitra <sa...@gmail.com>.

Hi ,

I agree with Roman and Raul.
https://issues.apache.org/jira/browse/IGNITE-813 allows injecting data
into cache via Data Streamer. Integrating with Ignite FileSystem for source
and sink will allow for a bidirectional connector. It will also allow easier
implementation for DataStream transformations over Ignite FileSystem.

Regards
Saikat

On Thu, Mar 31, 2016 at 2:44 PM, Aljoscha Krettek <al...@apache.org>
wrote:

> Hi,
> it should already be possible to use the Ignite FileSystem to store state
> since we just use the HDFS FileSystem interface for that. Of course, one
> would have to properly set up the jars and paths and everything for Flink
> to pick up the IGFS classes.
>
> Cheers,
> Aljoscha
>
> On Wed, 30 Mar 2016 at 16:50 Raul Kripalani <ra...@apache.org> wrote:
>
> > On Wed, Mar 30, 2016 at 2:20 PM, Roman <rs...@yahoo.com.invalid>
> wrote:
> >
> > > Raul,
> > >
> > > Small comment from me.
> > >
> > > >* As a Flink sink => inject data directly into a cache via a
> > DataStreamer.
> > > After reviews, IGNITE-813 is exactly this functionality.
> > >
> > >
> > That's cool, Roman! The idea would be to host these (richer) modules as
> > Flink connectors, like they do with others:
> >
> > https://github.com/apache/flink/tree/master/flink-streaming-connectors
> > https://github.com/apache/flink/tree/master/flink-batch-connectors
> >
>

Re: Apache Flink <=> Apache Ignite integration

Posted by Aljoscha Krettek <al...@apache.org>.

Hi,
it should already be possible to use the Ignite FileSystem to store state
since we just use the HDFS FileSystem interface for that. Of course, one
would have to properly set up the jars and paths and everything for Flink
to pick up the IGFS classes.

Cheers,
Aljoscha
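
As a rough, untested sketch of the setup described above: with the
filesystem state backend, the checkpoint directory can in principle be any
URI whose scheme Flink can resolve through the Hadoop FileSystem
interface. The keys below are the Flink 1.0-era configuration keys; the
IGFS instance name, host, port and both paths are assumptions for
illustration only. The Hadoop configuration directory would additionally
need a core-site.xml that maps the igfs scheme to Ignite's implementation
(property fs.igfs.impl set to
org.apache.ignite.hadoop.fs.v1.IgniteHadoopFileSystem), and the Ignite
Hadoop jars would have to be on Flink's classpath.

```yaml
# flink-conf.yaml
state.backend: filesystem

# Assumption: an IGFS instance named "igfs" listening on localhost:10500
state.backend.fs.checkpointdir: igfs://igfs@localhost:10500/flink/checkpoints

# Placeholder path: directory containing the core-site.xml described above
fs.hdfs.hadoopconf: /path/to/hadoop/conf
```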

On Wed, 30 Mar 2016 at 16:50 Raul Kripalani <ra...@apache.org> wrote:

> On Wed, Mar 30, 2016 at 2:20 PM, Roman <rs...@yahoo.com.invalid> wrote:
>
> > Raul,
> >
> > Small comment from me.
> >
> > >* As a Flink sink => inject data directly into a cache via a
> DataStreamer.
> > After reviews, IGNITE-813 is exactly this functionality.
> >
> >
> That's cool, Roman! The idea would be to host these (richer) modules as
> Flink connectors, like they do with others:
>
> https://github.com/apache/flink/tree/master/flink-streaming-connectors
> https://github.com/apache/flink/tree/master/flink-batch-connectors
>

Re: Apache Flink <=> Apache Ignite integration

Posted by Raul Kripalani <ra...@apache.org>.

On Wed, Mar 30, 2016 at 2:20 PM, Roman <rs...@yahoo.com.invalid> wrote:

> Raul,
>
> Small comment from me.
>
> >* As a Flink sink => inject data directly into a cache via a DataStreamer.
> After reviews, IGNITE-813 is exactly this functionality.
>
>
That's cool, Roman! The idea would be to host these (richer) modules as
Flink connectors, like they do with others:

https://github.com/apache/flink/tree/master/flink-streaming-connectors
https://github.com/apache/flink/tree/master/flink-batch-connectors

Re: Apache Flink <=> Apache Ignite integration

Posted by Roman <rs...@yahoo.com.INVALID>.

Raul,

Small comment from me.

>* As a Flink sink => inject data directly into a cache via a DataStreamer.
After reviews, IGNITE-813 is exactly this functionality.

-Roman
