Posted to dev@beam.apache.org by Pulasthi Supun Wickramasinghe <pu...@gmail.com> on 2019/05/13 18:03:17 UTC

Developing a new beam runner for Twister2

Hi All,

I am Pulasthi, a Ph.D. student at Indiana University. We are planning to
develop a Beam runner for our project Twister2 [1] [2]. Twister2 is a big
data framework that supports both batch and stream processing. If you
are interested, you can find more information at [2] or read some of our
publications [3].

I wanted to share our intent and get some guidance from the Beam developer
community before starting on the project. I was planning on going through
the code for the Apache Spark and Apache Flink runners to get a better
understanding of what I need to do. It would be great if I could get any
pointers on how I should approach this project. I am currently reading
through the runner-guide <https://beam.apache.org/contribute/runner-guide/>.

Finally, I assume that I need to create a JIRA issue to track the progress
of this project, right? I can create the issue, but from what I read in
the contribute section I would need some permission to assign it to
myself. I hope someone will be able to help me with that. Looking forward to
working with the Beam community.

[1] https://github.com/DSC-SPIDAL/twister2
[2] https://twister2.gitbook.io/twister2/
[3] https://twister2.gitbook.io/twister2/publications

Best Regards,
Pulasthi
-- 
Pulasthi S. Wickramasinghe
PhD Candidate  | Research Assistant
School of Informatics and Computing | Digital Science Center
Indiana University, Bloomington
cell: 224-386-9035

Re: Developing a new beam runner for Twister2

Posted by Pulasthi Supun Wickramasinghe <pu...@gmail.com>.
Hi All,

Thanks for all the feedback. As suggested, I will start working directly on
a portable runner. I will update the JIRA issue as I make progress.

Best Regards
Pulasthi

On Wed, May 15, 2019 at 8:13 AM Maximilian Michels <mx...@apache.org> wrote:

> +1 Portability is the way forward. If you have to choose between the
> two, go for the portable one. For educational purposes, I'd still
> suggest checking out the "legacy" Runners. Actually, a new Runner could
> implement both Runner styles with most of the code shared between the two.
>
> -Max

-- 
Pulasthi S. Wickramasinghe
PhD Candidate  | Research Assistant
School of Informatics and Computing | Digital Science Center
Indiana University, Bloomington
cell: 224-386-9035

Re: Developing a new beam runner for Twister2

Posted by Maximilian Michels <mx...@apache.org>.
+1 Portability is the way forward. If you have to choose between the 
two, go for the portable one. For educational purposes, I'd still 
suggest checking out the "legacy" Runners. Actually, a new Runner could 
implement both Runner styles with most of the code shared between the two.

-Max

On 15.05.19 11:47, Robert Bradshaw wrote:
> I would strongly suggest that new runners adopt the portability framework
> from the start, which will be more forward compatible and more flexible
> (e.g. supporting other languages). The primary difference is that
> rather than wrapping individual DoFns, one wraps a "fused" bundle of
> DoFns (called an ExecutableStage). As it looks like Twister2 is
> written in Java, you can take advantage of many of the existing Java
> libraries that already do this and are shared among the other Java
> runners.

Re: Developing a new beam runner for Twister2

Posted by Robert Bradshaw <ro...@google.com>.
I would strongly suggest that new runners adopt the portability framework
from the start, which will be more forward compatible and more flexible
(e.g. supporting other languages). The primary difference is that
rather than wrapping individual DoFns, one wraps a "fused" bundle of
DoFns (called an ExecutableStage). As it looks like Twister2 is
written in Java, you can take advantage of many of the existing Java
libraries that already do this and are shared among the other Java
runners.
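
To make the difference concrete, here is a minimal, hypothetical Java sketch.
It does not use any Beam classes: plain java.util.function.Function values
stand in for DoFns, and the names FusionSketch, exampleFns, runUnfused, fuse,
and runFused are invented for illustration. In a real portable runner the
fused stage would be an ExecutableStage executed by the SDK harness rather
than an in-process function, so treat this only as a picture of the idea.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical illustration only; none of these names exist in Beam.
class FusionSketch {

    // Two toy element-wise functions standing in for DoFns.
    // (Real DoFns may emit zero or many outputs per input; simplified here.)
    static List<Function<Integer, Integer>> exampleFns() {
        List<Function<Integer, Integer>> fns = new ArrayList<>();
        fns.add(x -> x + 1);
        fns.add(x -> x * 2);
        return fns;
    }

    // "Legacy" style: each function is wrapped as its own operator, and the
    // engine materializes an intermediate collection between operators.
    static List<Integer> runUnfused(List<Integer> input,
                                    List<Function<Integer, Integer>> fns) {
        List<Integer> current = input;
        for (Function<Integer, Integer> fn : fns) {
            List<Integer> next = new ArrayList<>();
            for (Integer x : current) {
                next.add(fn.apply(x));
            }
            current = next;
        }
        return current;
    }

    // "Fused" style: the chain is composed into one stage up front, so the
    // engine only ever sees (and wraps) a single operator.
    static Function<Integer, Integer> fuse(List<Function<Integer, Integer>> fns) {
        Function<Integer, Integer> stage = Function.identity();
        for (Function<Integer, Integer> fn : fns) {
            stage = stage.andThen(fn);
        }
        return stage;
    }

    // Runs the single fused stage once per input element.
    static List<Integer> runFused(List<Integer> input,
                                  List<Function<Integer, Integer>> fns) {
        Function<Integer, Integer> stage = fuse(fns);
        List<Integer> out = new ArrayList<>();
        for (Integer x : input) {
            out.add(stage.apply(x));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> input = new ArrayList<>();
        input.add(1);
        input.add(2);
        input.add(3);
        System.out.println(runUnfused(input, exampleFns())); // prints [4, 6, 8]
        System.out.println(runFused(input, exampleFns()));   // prints [4, 6, 8]
    }
}
```

Both paths produce the same result; the point is only that in the fused case
the runner has one unit to wrap and schedule instead of one per function.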

On Tue, May 14, 2019 at 7:55 PM Pulasthi Supun Wickramasinghe
<pu...@gmail.com> wrote:
>
> Hi,
>
> Thanks, Kenn and Max, for the information. I will read up a little more and
> discuss with the Twister2 team before deciding which route to take. I
> also created an issue in the Beam JIRA [1], but I cannot assign it to myself.
> Would someone be able to assign the issue to me? Thanks in advance.
>
> [1] https://issues.apache.org/jira/browse/BEAM-7304
>
> Best Regards
> Pulasthi

Re: Developing a new beam runner for Twister2

Posted by Kenneth Knowles <ke...@apache.org>.
I added you to the Jira "Contributors" role, so you should be able to
self-assign the ticket now.

Re: Developing a new beam runner for Twister2

Posted by Pulasthi Supun Wickramasinghe <pu...@gmail.com>.
Hi,

Thanks, Kenn and Max, for the information. I will read up a little more and
discuss with the Twister2 team before deciding which route to take. I
also created an issue in the Beam JIRA [1], but I cannot assign it to myself.
Would someone be able to assign the issue to me? Thanks in advance.

[1] https://issues.apache.org/jira/browse/BEAM-7304

Best Regards
Pulasthi

On Tue, May 14, 2019 at 6:19 AM Maximilian Michels <mx...@apache.org> wrote:

> Hi Pulasthi,
>
> Great to hear you're planning to implement a Twister2 Runner.
>
> If you have limited time, you probably want to decide whether to build a
> "legacy" Java Runner or a portable one. They are not fundamentally
> different but there are some tricky implementation details for the
> portable Runner related to the asynchronous communication with the SDK
> Harness.
>
> If you have enough time, first implementing a "legacy" Runner might be a
> good way to learn the Beam model; creating a portable Runner afterwards
> should then not be hard.
>
> To get an idea of the differences, check out the Flink source code:
> - FlinkStreamingTransformTranslators (Java "legacy")
> - FlinkStreamingPortablePipelineTranslator (portable)
>
> Feel free to ask questions here or on Slack.
>
> Cheers,
> Max

-- 
Pulasthi S. Wickramasinghe
PhD Candidate  | Research Assistant
School of Informatics and Computing | Digital Science Center
Indiana University, Bloomington
cell: 224-386-9035

Re: Developing a new beam runner for Twister2

Posted by Maximilian Michels <mx...@apache.org>.
Hi Pulasthi,

Great to hear you're planning to implement a Twister2 Runner.

If you have limited time, you probably want to decide whether to build a 
"legacy" Java Runner or a portable one. They are not fundamentally 
different but there are some tricky implementation details for the 
portable Runner related to the asynchronous communication with the SDK 
Harness.

If you have enough time, first implementing a "legacy" Runner might be a
good way to learn the Beam model; creating a portable Runner afterwards
should then not be hard.

To get an idea of the differences, check out the Flink source code:
- FlinkStreamingTransformTranslators (Java "legacy")
- FlinkStreamingPortablePipelineTranslator (portable)

Feel free to ask questions here or on Slack.

Cheers,
Max

On 14.05.19 05:11, Kenneth Knowles wrote:
> Welcome! This is very cool to hear about.
> 
> A major caveat about https://beam.apache.org/contribute/runner-guide/ is 
> that it was written when Beam's portability framework was more of a 
> sketch. The conceptual descriptions are mostly fine, but the pointers to 
> Java helper code will lead you to build a "legacy" runner when it is 
> better to build a portable runner from the start*.
> 
> We now have four portable runners in various levels of completeness: 
> Spark, Flink, Samza, and Dataflow. I have added some relevant people to 
> the CC for emphasis. You might also join 
> https://the-asf.slack.com/#beam-portability though I prefer the dev list 
> since it gives visibility to a much greater portion of the community.
> 
> Kenn
> 
> *volunteers welcome to update the guide to emphasize portability first

Re: Developing a new beam runner for Twister2

Posted by Kenneth Knowles <ke...@apache.org>.
Welcome! This is very cool to hear about.

A major caveat about https://beam.apache.org/contribute/runner-guide/ is
that it was written when Beam's portability framework was more of a sketch.
The conceptual descriptions are mostly fine, but the pointers to Java
helper code will lead you to build a "legacy" runner when it is better to
build a portable runner from the start*.

We now have four portable runners in various levels of completeness: Spark,
Flink, Samza, and Dataflow. I have added some relevant people to the CC for
emphasis. You might also join https://the-asf.slack.com/#beam-portability
though I prefer the dev list since it gives visibility to a much greater
portion of the community.

Kenn

*volunteers welcome to update the guide to emphasize portability first
