Posted to dev@beam.apache.org by Bobby Evans <bo...@apache.org> on 2016/05/11 20:00:36 UTC

Beam and Storm

I have been trying to get my head wrapped around some of the internals of
beam so I can come up with an architecture/plan for STORM-1757
<https://issues.apache.org/jira/browse/STORM-1757> / BEAM-9
<https://issues.apache.org/jira/browse/BEAM-9>.

I see that there are Sources and Sinks.  Sources can be unbounded, but
there appears to be no equivalent to an unbounded Sink.  What I do find are
things like WriteToBigQuery, which despite some internal complexity ends up
being an idempotent transform producing a PDone.  Is this the intended way
for data to be output from a streaming DAG?

I will likely have more questions as I dig more into state checkpointing,
etc.

Thanks,

Bobby Evans

Re: Beam and Storm

Posted by "Matthias J. Sax" <mj...@apache.org>.
Bobby is right. It is unrelated.

Since your plan is to rewrite your Storm code as Beam code, Flink will
just execute the Beam code (there is no relation to Storm anymore).

In contrast, if you want to run Storm code in Flink without rewriting
your Storm code, you would use Flink's compatibility layer for Storm.
This layer "translates" everything automatically into a Flink
DataStream program.

Be aware that Flink's compatibility layer for Storm is a beta version
lacking some features...

-Matthias



Re: Beam and Storm

Posted by Bobby Evans <bo...@apache.org>.
Amir,

That link is for the Storm compatibility layer that Flink offers; it has
nothing really to do with Beam at all, except that Flink offers support for
both Beam as an API layer and Storm as an API layer.

- Bobby


Re: Beam and Storm

Posted by amir bahmanyari <am...@yahoo.com.INVALID>.
Thanks so much for your response Bobby. I have been reading this link for a while, and researching its details:
Apache Flink 0.10.2 Documentation: Storm Compatibility
<https://ci.apache.org/projects/flink/flink-docs-release-0.10/apis/storm_compatibility.html>

I have not started to port our Storm app to Beam yet. But am going to start soon once I execute a FlinkRunner Beam app in my lab cluster LOL! It's getting there...

Question to the community: Is this link above the latest & the greatest on porting Storm to Beam (Flink or Spark)?
Cheers


Re: Beam and Storm

Posted by Bobby Evans <bo...@apache.org>.
Amir,

I am kind of new at this, so I might be wrong, but in general if your bolts
are stateless then they would translate directly into ParDos, one to one.
The spouts are more complex: you would want to find an existing
unbounded source, or you would have to write one.

I don't know if there are any custom groupings in Beam (there don't
appear to be), but in general a shuffle grouping would become a no-op, and a
fields grouping becomes a GroupByKey transform.
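
To make that concrete, here is a minimal sketch of that mapping, assuming
the Beam Java SDK (the class names and the word/count fields below are
just illustrative, not anything from Storm or Beam):

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.GroupByKey;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;

public class BoltToParDoSketch {
  // A stateless word-splitting bolt maps one to one onto a DoFn.
  static class SplitFn extends DoFn<String, KV<String, Long>> {
    @ProcessElement
    public void processElement(ProcessContext c) {
      for (String word : c.element().split("\\s+")) {
        c.output(KV.of(word, 1L)); // key by the field you grouped on in Storm
      }
    }
  }

  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());
    PCollection<String> lines = p.apply(Create.of("storm to beam", "beam on flink"));
    // shuffle grouping -> nothing needed; fields grouping -> GroupByKey on the key
    PCollection<KV<String, Iterable<Long>>> grouped =
        lines.apply(ParDo.of(new SplitFn()))
             .apply(GroupByKey.<String, Long>create());
    p.run();
  }
}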

If your bolts are stateful, you need to find a way, using windowing and
aggregation, to produce an equivalent result.
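
For example, a stateful counting bolt could roughly become a windowed
Count, something like this sketch (the one-minute window is made up, and
triggers/allowed lateness are left at their defaults):

import org.apache.beam.sdk.transforms.Count;
import org.apache.beam.sdk.transforms.windowing.FixedWindows;
import org.apache.beam.sdk.transforms.windowing.Window;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;
import org.joda.time.Duration;

public class StatefulBoltAsWindowedCount {
  // Given the keyed output of an upstream ParDo, approximate a counting
  // bolt by counting each key per one-minute fixed window.
  static PCollection<KV<String, Long>> countPerWindow(PCollection<String> words) {
    return words
        .apply(Window.<String>into(FixedWindows.of(Duration.standardMinutes(1))))
        .apply(Count.<String>perElement());
  }
}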

If you are using Trident, it is similar: all functions become ParDos.  The
thing here is that for the state you need to find an equivalent output
ParDo, or you need to write one.

- Bobby



Re: Beam and Storm

Posted by amir bahmanyari <am...@yahoo.com.INVALID>.
Is there a guideline on how to port existing Storm code to Beam? Thanks


Re: Beam and Storm

Posted by Bobby Evans <bo...@apache.org>.
Thanks for the quick response.  I'll keep digging.


Re: Beam and Storm

Posted by Dan Halperin <dh...@google.com.INVALID>.
Hi Bobby,

You are correct that unbounded sinks are currently implemented as
PTransforms that interact with an external service in a ParDo. Even bounded
sinks are implemented this way -- look at the internals of the Write
transform -- the sink class itself is just a convenience wrapper for one
common pattern.
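
A minimal sketch of that pattern might look like the following, assuming
the Beam Java SDK; writeToService below stands in for an idempotent call
to a real client (e.g. a BigQuery insert) and is not a Beam API:

import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.PTransform;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.PDone;

// An "unbounded sink": a PTransform wrapping a ParDo that writes each
// element to an external service and then returns PDone.
class WriteToMyService extends PTransform<PCollection<String>, PDone> {
  // Stand-in for the external service's client call.
  static void writeToService(String record) {
    System.out.println("would write: " + record);
  }

  @Override
  public PDone expand(PCollection<String> input) {
    input.apply("WriteRecords", ParDo.of(new DoFn<String, Void>() {
      @ProcessElement
      public void processElement(ProcessContext c) {
        writeToService(c.element());
      }
    }));
    return PDone.in(input.getPipeline());
  }
}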

Thanks,
Dan
