You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@beam.apache.org by "Urban, Jaroslav" <Ja...@Teradata.com> on 2018/10/17 20:35:45 UTC

Pass-through transform for accessing runner-specific functionality from Beam?

Dear Co-Beamers,

I am curious about the possibility of using Complex Event Processing (CEP) package in Flink from Apache Beam with the Flink Runner.

I have an architectural question.

I know that Apache Beam and its reference model (https://beam.apache.org/documentation/runners/capability-matrix/) does not have any kind of support for complex event processing (aka pattern matching on streams of events and their attributes). There is merely a JIRA ticket suggesting a future development in this direction (https://issues.apache.org/jira/browse/BEAM-3767).

Does anyone know if there is currently some kind of "pass-through" transform in Apache Beam that would make it possible to access the existing CEP functionality in Flink. Or -for that matter- enable accessing runner-specific features in general?

What do you think? Any ideas?

Best regards

Jaroslav Urban
Consultant (DWH, OpenSource, Cloud)
jaroslav.urban@teradata.com<ma...@teradata.com>


Re: Pass-through transform for accessing runner-specific functionality from Beam?

Posted by Thomas Weise <th...@apache.org>.
With the portable runner it is possible to create Flink native transforms
to expose features of Flink.

You can find an example for a source here:
https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/flink/flink_streaming_impulse.py
Another one that uses the Flink Kafka/Kinesis connectors in the Lyft fork:
https://github.com/lyft/beam/blob/release-2.8.0-lyft/sdks/python/custom-source-example.py

Our need was to have a streaming connector that the Python SDK does not
offer. It is a customization and works by adding a transform wrapper to the
SDK you are using (above Python) and then add a translation to the runner
to handle that custom URN. Currently this requires to augment the Flink
runner, although it should not be hard to make it pluggable (i.e. drop an
annotated translator class into the job server class path).

Again, this will only with the portable Flink runner and make the pipeline
non-portable, which is what you were interested in.

Thanks,
Thomas


On Wed, Oct 17, 2018 at 2:44 PM Lukasz Cwik <lc...@google.com> wrote:

> +thw@apache.org has been doing something very similar but using it
> support native Flink IO within Apache Beam within the company he works for.
> Note that the community had a discussion about runner specific extensions
> and is currently leaning[1] towards having support for them for internal
> use cases but not allowing those extensions to be part of Apache Beam
> publicly.
>
> 1:
> https://lists.apache.org/thread.html/38b796c4c49823cf946affdb1a457ddf1d142403803b9c6a32442057@%3Cdev.beam.apache.org%3E
>
> On Wed, Oct 17, 2018 at 1:36 PM Urban, Jaroslav <
> Jaroslav.Urban@teradata.com> wrote:
>
>> Dear Co-Beamers,
>>
>>
>>
>> I am curious about the possibility of using Complex Event Processing
>> (CEP) package in Flink from Apache Beam with the Flink Runner.
>>
>>
>>
>> I have an architectural question.
>>
>>
>>
>> I know that Apache Beam and its reference model (
>> https://beam.apache.org/documentation/runners/capability-matrix/) does
>> not have any kind of support for complex event processing (aka pattern
>> matching on streams of events and their attributes). There is merely a JIRA
>> ticket suggesting a future development in this direction (
>> https://issues.apache.org/jira/browse/BEAM-3767).
>>
>>
>>
>> Does anyone know if there is currently some kind of “pass-through”
>> transform in Apache Beam that would make it possible to access the existing
>> CEP functionality in Flink. Or –for that matter- enable accessing
>> runner-specific features in general?
>>
>>
>>
>> What do you think? Any ideas?
>>
>>
>>
>> Best regards
>>
>>
>>
>> *Jaroslav Urban*
>>
>> Consultant (DWH, OpenSource, Cloud)
>> jaroslav.urban@teradata.com
>>
>>

Re: Pass-through transform for accessing runner-specific functionality from Beam?

Posted by Lukasz Cwik <lc...@google.com>.
+thw@apache.org has been doing something very similar but using it support
native Flink IO within Apache Beam within the company he works for. Note
that the community had a discussion about runner specific extensions and is
currently leaning[1] towards having support for them for internal use cases
but not allowing those extensions to be part of Apache Beam publicly.

1:
https://lists.apache.org/thread.html/38b796c4c49823cf946affdb1a457ddf1d142403803b9c6a32442057@%3Cdev.beam.apache.org%3E

On Wed, Oct 17, 2018 at 1:36 PM Urban, Jaroslav <Ja...@teradata.com>
wrote:

> Dear Co-Beamers,
>
>
>
> I am curious about the possibility of using Complex Event Processing (CEP)
> package in Flink from Apache Beam with the Flink Runner.
>
>
>
> I have an architectural question.
>
>
>
> I know that Apache Beam and its reference model (
> https://beam.apache.org/documentation/runners/capability-matrix/) does
> not have any kind of support for complex event processing (aka pattern
> matching on streams of events and their attributes). There is merely a JIRA
> ticket suggesting a future development in this direction (
> https://issues.apache.org/jira/browse/BEAM-3767).
>
>
>
> Does anyone know if there is currently some kind of “pass-through”
> transform in Apache Beam that would make it possible to access the existing
> CEP functionality in Flink. Or –for that matter- enable accessing
> runner-specific features in general?
>
>
>
> What do you think? Any ideas?
>
>
>
> Best regards
>
>
>
> *Jaroslav Urban*
>
> Consultant (DWH, OpenSource, Cloud)
> jaroslav.urban@teradata.com
>
>