Posted to dev@beam.apache.org by Kenneth Knowles <ke...@apache.org> on 2023/05/01 17:11:08 UTC

Re: Regarding Project proposal review and feedback

>
> Unless there is a good reason, we should not introduce another Flink
> runner into the Beam codebase. The current one, which is very advanced
> feature-wise (it took years to get there), already has a need for more
> maintainers. If you see weak spots in the current implementation, would you
> consider improving the existing runner instead of trying to write a new one
> from scratch?
>

Agree that we should not introduce a new runner. It would be great to
propose and implement improvements to the current runner. The current
runner does have a pretty complete implementation of the Beam Model
including windowing and triggers.


> (this might be a bit outdated, please correct me if I'm wrong) AFAIK Beam
> SQL is just a DSL over Beam's low-level APIs. This makes introducing new
> runners fairly straightforward because you only need to support a few
> primitive transformations to have a fully working runner (even though it
> might not be optimal performance-wise); everything else is just built on
> top of those. This prevents us from directly translating Beam SQL into
> Flink SQL / Flink Table API (you always need to go through Beam's low-level
> API).
>

One correction: your description of Beam SQL is correct, but the last part
(that you always need to go through Beam's low-level API) is not. The Beam
Model is set up so that any composite transform or any
piece of a graph can be executed in a smart way. All of our runners do this
somewhat, for example lifting combiners before shuffles and
fusion/chaining. You can always throw away a subgraph and do it "your own
way" as long as you match the semantics. You could definitely take a Beam
SqlTransform and implement it directly using Flink SQL or Flink's table
API, as long as you made the behavior match exactly.
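
To make that concrete, here is a toy sketch in plain Python. It is not Beam's
or Flink's actual API: ToyRunner, Primitive, Composite, and native_sql are all
hypothetical names made up for illustration. It only shows the idea that a
runner can either walk a composite down to its primitives or swap the whole
subgraph for its own implementation, provided the output matches.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, List

Rows = List[dict]


@dataclass
class Primitive:
    """A leaf transform that every (toy) runner must know how to execute."""
    name: str
    fn: Callable[[Rows], Rows]


@dataclass
class Composite:
    """A named subgraph built only out of other transforms."""
    name: str
    parts: list


class ToyRunner:
    """Hypothetical runner: executes primitives, unless an override
    is registered for a transform's name, in which case the whole
    subgraph is replaced by the override."""

    def __init__(self, overrides=None):
        self.overrides = overrides or {}

    def run(self, transform, rows: Rows) -> Rows:
        native = self.overrides.get(transform.name)
        if native is not None:
            # Throw away the subgraph and do it "our own way".
            return native(rows)
        if isinstance(transform, Composite):
            for part in transform.parts:
                rows = self.run(part, rows)
            return rows
        return transform.fn(rows)


# Default expansion: filter, group by key, count per group.
def keep_long(rows):
    return [r for r in rows if len(r["word"]) > 3]


def group_by_word(rows):
    groups = {}
    for r in rows:
        groups.setdefault(r["word"], []).append(r)
    return [{"word": w, "rows": rs} for w, rs in groups.items()]


def count_groups(groups):
    return [{"word": g["word"], "n": len(g["rows"])} for g in groups]


sql_like = Composite(
    "SqlTransform",  # stand-in name only; not the real Beam class
    [
        Primitive("Filter", keep_long),
        Primitive("GroupByKey", group_by_word),
        Primitive("Count", count_groups),
    ],
)


# A "native" single-pass implementation with the same semantics.
def native_sql(rows):
    counts = Counter(r["word"] for r in rows if len(r["word"]) > 3)
    return [{"word": w, "n": n} for w, n in counts.items()]


def by_word(result):
    return sorted(result, key=lambda r: r["word"])


rows = [{"word": w} for w in ["beam", "flink", "sql", "beam", "flink", "beam"]]
generic = ToyRunner().run(sql_like, rows)
native = ToyRunner({"SqlTransform": native_sql}).run(sql_like, rows)

# The override is only legitimate because the results agree.
assert by_word(generic) == by_word(native)
```

The final assertion is the important part: the replacement is acceptable only
because it produces the same result as the generic expansion.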

Kenn


>
> Best,
> D.
>
> On Fri, Apr 28, 2023 at 11:29 AM Siddharth Aryan <
> siddhartharyan689@gmail.com> wrote:
>
>> Hello Jeff,
>> Thank you for the idea, as it would allow Beam users to write SQL
>> queries using the Beam SQL API and execute them on the Flink Table API. I
>> will look into it later, as my current focus is to implement an integration
>> between Apache Beam and the Flink DataStream API. While the existing Flink
>> runner is based on the DataStream and Operator APIs, my project aims to
>> create a new runner that specifically utilizes the Flink DataStream API.
>> And thanks for the feedback.
>>
>> Best Regards,
>> Siddharth Aryan
>>
>> On Thu, Apr 27, 2023 at 1:39 PM Jeff Zhang <zj...@gmail.com> wrote:
>>
>>> Same question as David. One idea I have in mind is to integrate the Beam
>>> SQL API with the Flink Table API; this does not exist in the current Flink
>>> runner.
>>>
>>> On Thu, Apr 27, 2023 at 3:46 PM David Morávek <dm...@apache.org> wrote:
>>>
>>>> Hi Siddharth,
>>>>
>>>> Thanks for your interest in the Flink Runner for Beam. Reading through
>>>> the project, one thing that immediately strikes me is that there already is
>>>> a Flink runner based on DataStream and Operator (one level below
>>>> DataStream) API in the code base. Are you aware of this? If yes, how does
>>>> the runner you want to introduce differ from the existing one?
>>>>
>>>> Best,
>>>> D.
>>>>
>>>> On Sun, Apr 2, 2023 at 9:41 PM Svetak Sundhar via dev <
>>>> dev@beam.apache.org> wrote:
>>>>
>>>>> Hi Siddharth,
>>>>> I left some comments as well on the sentiment analysis proposal.
>>>>>
>>>>> Thanks,
>>>>>
>>>>>
>>>>> Svetak Sundhar
>>>>>
>>>>>   Technical Solutions Engineer, Data
>>>>> svetaksundhar@google.com
>>>>>
>>>>>
>>>>>
>>>>> On Sun, Apr 2, 2023 at 1:58 PM Anand Inguva via dev <
>>>>> dev@beam.apache.org> wrote:
>>>>>
>>>>>> I left some comments on the sentiment analysis proposal.
>>>>>>
>>>>>> Thanks,
>>>>>> Anand
>>>>>>
>>>>>> On Thu, Mar 30, 2023 at 9:59 AM Danny McCormick via dev <
>>>>>> dev@beam.apache.org> wrote:
>>>>>>
>>>>>>> Thanks Siddharth! I left some comments on the sentiment analysis
>>>>>>> proposal. I am probably not the best person to comment on the Flink
>>>>>>> DataStream API one, though.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Danny
>>>>>>>
>>>>>>> On Fri, Mar 24, 2023 at 11:53 PM Siddharth Aryan <
>>>>>>> siddhartharyan689@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hello,
>>>>>>>> I am Siddharth Aryan, an undergrad, and I am looking for
>>>>>>>> someone who can help review my proposals and give me feedback on
>>>>>>>> them, which would help me create a good proposal.
>>>>>>>> Here I am attaching both of my project proposals:
>>>>>>>> >Sentimental Analysis Pipeline with the help of Machine Learning:
>>>>>>>>
>>>>>>>> https://docs.google.com/document/d/1U6zcXAWsDCrWlbf14f5VlLqPZFucwXR48tD7mrERW-g/edit?usp=sharing
>>>>>>>>
>>>>>>>> >Integrating Apache Beam with Flink Datastream API:
>>>>>>>>
>>>>>>>> https://docs.google.com/document/d/1sQEe9eVuoHX9QWS9Zj5wVl7MLmfk7QO09pjZOsk-TFY/edit?usp=sharing
>>>>>>>>
>>>>>>>> Best Regards
>>>>>>>> Siddharth Aryan
>>>>>>>>
>>>>>>>> GitHub: https://github.com/nervoussidd
>>>>>>>>
>>>>>>>
>>>
>>> --
>>> Best Regards
>>>
>>> Jeff Zhang
>>>
>>