Posted to dev@beam.apache.org by Denis Magda <dm...@apache.org> on 2017/12/08 04:46:29 UTC

Apache Ignite as a distributed processing back-ends

Hello Apache Beam fellows!

We at the Apache Ignite community came across your project and would be happy to integrate with it.

In short, Ignite is a distributed database and computational platform that has its own MapReduce-like component:
https://apacheignite.readme.io/docs/compute-grid

The integration will give Beam users the ability to use Ignite as a distributed processing back-end system and database.

How should we proceed? Please share any relevant information.

—
Denis
Ignite PMC

Re: Apache Ignite as a distributed processing back-ends

Posted by Ismaël Mejía <ie...@gmail.com>.
Hello Denis,

This is really great news. I think Ignite can be integrated into Beam
as an IO; in that case Beam developers would read/write their data
from/to Ignite in their data processing pipelines.

You can take a look at some of the existing IOs for ideas and follow
the PTransform style guide:
https://github.com/apache/beam/tree/master/sdks/java/io
https://beam.apache.org/contribute/ptransform-style-guide/
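
To make the IO idea a bit more concrete, here is a minimal sketch in
plain Python of the shape such a connector takes: a read side that
exposes stored entries as a stream of records, and a write side that
buffers records and flushes them in batches. It uses neither the Beam
nor the Ignite API. FakeCache is an in-memory stand-in for a cache, and
read_from_cache / write_to_cache are hypothetical names chosen only for
this illustration.

```python
from typing import Dict, Iterable, Iterator, Tuple

KV = Tuple[str, int]


class FakeCache:
    """In-memory stand-in for a distributed cache (NOT the Ignite API)."""

    def __init__(self) -> None:
        self._data: Dict[str, int] = {}
        self.put_all_calls = 0  # counts batched writes, for illustration

    def put_all(self, entries: Dict[str, int]) -> None:
        self._data.update(entries)
        self.put_all_calls += 1

    def scan(self) -> Iterator[KV]:
        # Sorted only to make the output deterministic in this sketch.
        return iter(sorted(self._data.items()))


def read_from_cache(cache: FakeCache) -> Iterator[KV]:
    """Source side: expose cache entries as a stream of records."""
    yield from cache.scan()


def write_to_cache(records: Iterable[KV], cache: FakeCache,
                   batch_size: int = 2) -> None:
    """Sink side: buffer records and flush them in batches."""
    batch: Dict[str, int] = {}
    for key, value in records:
        batch[key] = value
        if len(batch) >= batch_size:
            cache.put_all(batch)
            batch = {}
    if batch:  # flush the final partial batch
        cache.put_all(batch)


source, sink = FakeCache(), FakeCache()
source.put_all({"a": 1, "b": 2, "c": 3})

# A tiny "pipeline": read from one cache, double each value, write to another.
doubled = ((key, value * 2) for key, value in read_from_cache(source))
write_to_cache(doubled, sink)

result = list(read_from_cache(sink))
print(result)
```

In a real connector the read and write sides would be wrapped as
PTransforms following the style guide above, and the batching policy
would depend on what the target store handles well.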

Note that there is an open JIRA to support a JCache-based connector,
so a good idea would be to implement it and use Ignite as the
reference example (of course you can go the Ignite-native route, but
community-wise this would be neat).
https://issues.apache.org/jira/browse/BEAM-2584

From a quick look at the Compute Grid documentation on the website, it
also seems that it could make sense to integrate Ignite into Beam as a
runner. This requires translating the Beam model into the appropriate
Ignite API. The best reference to start with is:
https://beam.apache.org/contribute/runner-guide/
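
As a rough illustration of what "translating the Beam model" means, the
sketch below (plain Python, no Beam or Ignite dependency) walks a tiny
pipeline built from two Beam-like primitives, a per-element step and a
group-by-key, and executes it on a thread pool that stands in for a
compute grid. ParDo and GroupByKey here are simplified stand-ins for the
Beam concepts of the same name, and ToyRunner is hypothetical. A real
runner would follow the runner guide and also deal with windowing,
triggers, side inputs, and state.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Any, Callable, Iterable, List, Union


@dataclass
class ParDo:
    """Per-element step: fn maps one element to zero or more outputs."""
    fn: Callable[[Any], Iterable[Any]]


@dataclass
class GroupByKey:
    """Shuffle step: groups (key, value) pairs into (key, [values])."""


Step = Union[ParDo, GroupByKey]


class ToyRunner:
    """Hypothetical runner that translates steps into thread-pool work.

    The pool is only a stand-in for a compute grid; no Ignite API is used.
    """

    def __init__(self, workers: int = 4) -> None:
        self.workers = workers

    def run(self, data: List[Any], steps: List[Step]) -> List[Any]:
        with ThreadPoolExecutor(max_workers=self.workers) as pool:
            for step in steps:
                if isinstance(step, ParDo):
                    # "Map" phase: elements are independent, so they can
                    # be fanned out to workers.
                    outputs = pool.map(lambda e, f=step.fn: list(f(e)), data)
                    data = [out for outs in outputs for out in outs]
                elif isinstance(step, GroupByKey):
                    # "Shuffle" phase: a barrier that gathers values by key.
                    groups = defaultdict(list)
                    for key, value in data:
                        groups[key].append(value)
                    data = sorted(groups.items())
                else:
                    raise TypeError(f"unsupported step: {step!r}")
        return data


# Word count expressed with the two primitives.
steps = [
    ParDo(lambda line: ((word, 1) for word in line.split())),
    GroupByKey(),
    ParDo(lambda kv: [(kv[0], sum(kv[1]))]),
]
counts = ToyRunner().run(["to be or", "not to be"], steps)
print(counts)
```

The point of the sketch is the dispatch inside run(): each primitive of
the model is mapped to an execution strategy of the back-end, which is
the part an Ignite runner would implement against the Compute Grid.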

Also, I saw that you have Ignite’s file system (IGFS) with support for
HDFS, so a first quick contribution would be to validate that it works
with Beam and add some documentation on how to use it.

Don’t hesitate to ask questions, create JIRAs, or contact us here or
in the Slack channel if needed.

Best,
Ismaël

On Fri, Dec 8, 2017 at 6:54 AM, Romain Manni-Bucau
<rm...@gmail.com> wrote:
> Hi
>
> This sounds awesome to have an Ignite runner which could compete with
> hazelcast-jet.
>
> The entry point would be https://beam.apache.org/contribute/runner-guide/
> IMHO.
>
> Being on Ignite cluster also opens a lot of doors - reusing the filesystem
> or distributed structures. Very exciting.
>
> On Dec 8, 2017 05:46, "Denis Magda" <dm...@apache.org> wrote:
>>
>> Hello Apache Beam fellows!
>>
>> We at Apache Ignite community came across your project and would be happy
>> to integrate with it.
>>
>> In short, Ignite is a distributed database and computational platform that
>> has its own map-reduce like component:
>> https://apacheignite.readme.io/docs/compute-grid
>>
>> The integration will give Beam users an ability to use Ignite as a
>> distributed processing back-end system and database.
>>
>> How should we proceed? Please share any relevant information.
>>
>> —
>> Denis
>> Ignite PMC

Re: Apache Ignite as a distributed processing back-ends

Posted by Denis Magda <dm...@apache.org>.
Those are valid points. Personally, I would go the Beam-repo way because it guarantees that the integration works as expected with every Beam release. This is, for instance, how Ignite is integrated with Camel, MyBatis, and Zeppelin.

Anyway, here is a ticket. I hope someone from the Ignite community will step in and resolve it soon:
https://issues.apache.org/jira/browse/IGNITE-7198

—
Denis

> On Dec 12, 2017, at 9:52 PM, Romain Manni-Bucau <rm...@gmail.com> wrote:
> 
> Hosting integrations in impl more than beam makes a lot of sense IMHO while
> you can maintain it and follow beam release cycle. It will enable you to
> evolve faster and optimise/adapt it more accurately. If you don't have the
> resources, beam would fit better and guarantee it works with each release.
> 
> My 2 cts
> 
> On Dec 13, 2017 00:47, "Lukasz Cwik" <lc...@google.com> wrote:
> 
>> Having it inside the Apache Beam repo makes sense and I could see it being
>> a good fit as an IO and as a runner.
>> 
>> On Tue, Dec 12, 2017 at 3:29 PM, Denis Magda <dm...@apache.org> wrote:
>> 
>>> Hi Romain,
>>> 
>>> Thanks for the reference. Do you prefer to have the Ignite runner in
>>> Beam’s code base?
>>> 
>>> From what I see, the current runners are hosted there:
>>> https://github.com/apache/beam/tree/master/runners
>>> 
>>> As for Ignite community, we would prefer to hold the integration in your
>>> repo.
>>> 
>>> —
>>> Denis
>>> 
>>> On Dec 7, 2017, at 9:54 PM, Romain Manni-Bucau <rm...@gmail.com>
>>> wrote:
>>> 
>>> Hi
>>> 
>>> This sounds awesome to have an Ignite runner which could compete with
>>> hazelcast-jet.
>>> 
>>> The entry point would be https://beam.apache.org/contribute/runner-guide/
>>> IMHO.
>>> 
>>> Being on Ignite cluster also opens a lot of doors - reusing the filesystem
>>> or distributed structures. Very exciting.
>>>
>>> On Dec 8, 2017 05:46, "Denis Magda" <dm...@apache.org> wrote:
>>> 
>>> Hello Apache Beam fellows!
>>> 
>>> We at Apache Ignite community came across your project and would be happy
>>> to integrate with it.
>>> 
>>> In short, Ignite is a distributed database and computational platform that
>>> has its own map-reduce like component:
>>> https://apacheignite.readme.io/docs/compute-grid
>>> 
>>> The integration will give Beam users an ability to use Ignite as a
>>> distributed processing back-end system and database.
>>> 
>>> How should we proceed? Please share any relevant information.
>>> 
>>> —
>>> Denis
>>> Ignite PMC
>>> 
>>> 
>>> 
>> 


Re: Apache Ignite as a distributed processing back-ends

Posted by Ismaël Mejía <ie...@gmail.com>.
I personally think that having the code as part of Apache Beam is better:

Advantages:
1. Refactorings and new functionality for free. Note that the runner
APIs are internal and still evolving due to new ideas like the
portability API.
2. All of the project's Beam infrastructure for ‘free’ (CI, JIRA,
website, etc.).
3. Awareness, support, and validation from the Beam community.

The only possible issue I see is that your code would be donated to
Apache (in case you want to keep it closed or under your own IP).
What we commonly do is have the runners live in their own branches
until they are mature enough to be merged into master. Note that, for
example, the Storm community hosts its own Beam runner, while all the
other open source runners are hosted in the Beam repo. However, I am
pretty sure most people in the Beam community are unaware of that
runner (or its progress).

On Wed, Dec 13, 2017 at 6:52 AM, Romain Manni-Bucau
<rm...@gmail.com> wrote:
> Hosting integrations in impl more than beam makes a lot of sense IMHO while
> you can maintain it and follow beam release cycle. It will enable you to
> evolve faster and optimise/adapt it more accurately. If you don't have the
> resources, beam would fit better and guarantee it works with each release.
>
> My 2 cts
>
> On Dec 13, 2017 00:47, "Lukasz Cwik" <lc...@google.com> wrote:
>>
>> Having it inside the Apache Beam repo makes sense and I could see it being
>> a good fit as an IO and as a runner.
>>
>> On Tue, Dec 12, 2017 at 3:29 PM, Denis Magda <dm...@apache.org> wrote:
>>>
>>> Hi Romain,
>>>
>>> Thanks for the reference. Do you prefer to have the Ignite runner in
>>> Beam’s code base?
>>>
>>> From what I see, the current runners are hosted there:
>>> https://github.com/apache/beam/tree/master/runners
>>>
>>> As for Ignite community, we would prefer to hold the integration in your
>>> repo.
>>>
>>> —
>>> Denis
>>>
>>> On Dec 7, 2017, at 9:54 PM, Romain Manni-Bucau <rm...@gmail.com>
>>> wrote:
>>>
>>> Hi
>>>
>>> This sounds awesome to have an Ignite runner which could compete with
>>> hazelcast-jet.
>>>
>>> The entry point would be https://beam.apache.org/contribute/runner-guide/
>>> IMHO.
>>>
>>> Being on Ignite cluster also opens a lot of doors - reusing the
>>> filesystem
>>> or distributed structures. Very exciting.
>>>
>>> On Dec 8, 2017 05:46, "Denis Magda" <dm...@apache.org> wrote:
>>>
>>> Hello Apache Beam fellows!
>>>
>>> We at Apache Ignite community came across your project and would be happy
>>> to integrate with it.
>>>
>>> In short, Ignite is a distributed database and computational platform
>>> that
>>> has its own map-reduce like component:
>>> https://apacheignite.readme.io/docs/compute-grid
>>>
>>> The integration will give Beam users an ability to use Ignite as a
>>> distributed processing back-end system and database.
>>>
>>> How should we proceed? Please share any relevant information.
>>>
>>> —
>>> Denis
>>> Ignite PMC
>>>
>>>
>>
>

Re: Apache Ignite as a distributed processing back-ends

Posted by Romain Manni-Bucau <rm...@gmail.com>.
Hosting the integration in the implementation's own project rather than
in Beam makes a lot of sense IMHO, as long as you can maintain it and
follow the Beam release cycle. It will enable you to evolve faster and
optimise/adapt it more accurately. If you don't have the resources,
Beam would fit better and guarantee it works with each release.

My 2 cts

On Dec 13, 2017 00:47, "Lukasz Cwik" <lc...@google.com> wrote:

> Having it inside the Apache Beam repo makes sense and I could see it being
> a good fit as an IO and as a runner.
>
> On Tue, Dec 12, 2017 at 3:29 PM, Denis Magda <dm...@apache.org> wrote:
>
>> Hi Romain,
>>
>> Thanks for the reference. Do you prefer to have the Ignite runner in
>> Beam’s code base?
>>
>> From what I see, the current runners are hosted there:
>> https://github.com/apache/beam/tree/master/runners
>>
>> As for Ignite community, we would prefer to hold the integration in your
>> repo.
>>
>> —
>> Denis
>>
>> On Dec 7, 2017, at 9:54 PM, Romain Manni-Bucau <rm...@gmail.com>
>> wrote:
>>
>> Hi
>>
>> This sounds awesome to have an Ignite runner which could compete with
>> hazelcast-jet.
>>
>> The entry point would be https://beam.apache.org/contribute/runner-guide/
>> IMHO.
>>
>> Being on Ignite cluster also opens a lot of doors - reusing the filesystem
>> or distributed structures. Very exciting.
>>
>> On Dec 8, 2017 05:46, "Denis Magda" <dm...@apache.org> wrote:
>>
>> Hello Apache Beam fellows!
>>
>> We at Apache Ignite community came across your project and would be happy
>> to integrate with it.
>>
>> In short, Ignite is a distributed database and computational platform that
>> has its own map-reduce like component:
>> https://apacheignite.readme.io/docs/compute-grid
>>
>> The integration will give Beam users an ability to use Ignite as a
>> distributed processing back-end system and database.
>>
>> How should we proceed? Please share any relevant information.
>>
>> —
>> Denis
>> Ignite PMC
>>
>>
>>
>

Re: Apache Ignite as a distributed processing back-ends

Posted by Lukasz Cwik <lc...@google.com>.
Having it inside the Apache Beam repo makes sense and I could see it being
a good fit as an IO and as a runner.

On Tue, Dec 12, 2017 at 3:29 PM, Denis Magda <dm...@apache.org> wrote:

> Hi Romain,
>
> Thanks for the reference. Do you prefer to have the Ignite runner in
> Beam’s code base?
>
> From what I see, the current runners are hosted there:
> https://github.com/apache/beam/tree/master/runners
>
> As for Ignite community, we would prefer to hold the integration in your
> repo.
>
> —
> Denis
>
> On Dec 7, 2017, at 9:54 PM, Romain Manni-Bucau <rm...@gmail.com>
> wrote:
>
> Hi
>
> This sounds awesome to have an Ignite runner which could compete with
> hazelcast-jet.
>
> The entry point would be https://beam.apache.org/contribute/runner-guide/
> IMHO.
>
> Being on Ignite cluster also opens a lot of doors - reusing the filesystem
> or distributed structures. Very exciting.
>
> On Dec 8, 2017 05:46, "Denis Magda" <dm...@apache.org> wrote:
>
> Hello Apache Beam fellows!
>
> We at Apache Ignite community came across your project and would be happy
> to integrate with it.
>
> In short, Ignite is a distributed database and computational platform that
> has its own map-reduce like component:
> https://apacheignite.readme.io/docs/compute-grid
>
> The integration will give Beam users an ability to use Ignite as a
> distributed processing back-end system and database.
>
> How should we proceed? Please share any relevant information.
>
> —
> Denis
> Ignite PMC
>
>
>

Re: Apache Ignite as a distributed processing back-ends

Posted by Denis Magda <dm...@apache.org>.
Hi Romain,

Thanks for the reference. Do you prefer to have the Ignite runner in Beam’s code base?

From what I see, the current runners are hosted there: https://github.com/apache/beam/tree/master/runners

As for the Ignite community, we would prefer to host the integration in your repo.

—
Denis

> On Dec 7, 2017, at 9:54 PM, Romain Manni-Bucau <rm...@gmail.com> wrote:
> 
> Hi
> 
> This sounds awesome to have an Ignite runner which could compete with
> hazelcast-jet.
> 
> The entry point would be https://beam.apache.org/contribute/runner-guide/
> IMHO.
> 
> Being on Ignite cluster also opens a lot of doors - reusing the filesystem
> or distributed structures. Very exciting.
>
> On Dec 8, 2017 05:46, "Denis Magda" <dm...@apache.org> wrote:
> 
>> Hello Apache Beam fellows!
>> 
>> We at Apache Ignite community came across your project and would be happy
>> to integrate with it.
>> 
>> In short, Ignite is a distributed database and computational platform that
>> has its own map-reduce like component:
>> https://apacheignite.readme.io/docs/compute-grid
>> 
>> The integration will give Beam users an ability to use Ignite as a
>> distributed processing back-end system and database.
>> 
>> How should we proceed? Please share any relevant information.
>> 
>> —
>> Denis
>> Ignite PMC
>> 


Re: Apache Ignite as a distributed processing back-ends

Posted by Romain Manni-Bucau <rm...@gmail.com>.
Hi

This sounds awesome to have an Ignite runner which could compete with
hazelcast-jet.

The entry point would be https://beam.apache.org/contribute/runner-guide/
IMHO.

Being on an Ignite cluster also opens a lot of doors - reusing the
filesystem or distributed structures. Very exciting.

On Dec 8, 2017 05:46, "Denis Magda" <dm...@apache.org> wrote:

> Hello Apache Beam fellows!
>
> We at Apache Ignite community came across your project and would be happy
> to integrate with it.
>
> In short, Ignite is a distributed database and computational platform that
> has its own map-reduce like component:
> https://apacheignite.readme.io/docs/compute-grid
>
> The integration will give Beam users an ability to use Ignite as a
> distributed processing back-end system and database.
>
> How should we proceed? Please share any relevant information.
>
> —
> Denis
> Ignite PMC
>
