Posted to user@beam.apache.org by James <xu...@gmail.com> on 2017/02/22 01:50:07 UTC

Is it possible that there is a feature which the underlying engine (e.g. Spark) supports, but which can't be expressed using the Beam API?

If there really is such a case, how should we handle it? (We are planning to use Beam as the data processing API, but have this concern.)

Thanks in advance.

Re: Re: Is it possible that there is a feature which the underlying engine (e.g. Spark) supports, but which can't be expressed using the Beam API?

Posted by Reza Rokni <re...@google.com>.
Super nice explanation! We should have this on a blog.

Re: Re: Is it possible that there is a feature which the underlying engine (e.g. Spark) supports, but which can't be expressed using the Beam API?

Posted by Amit Sela <am...@gmail.com>.
If I understand correctly, the use case concerning stop() and getState() in
PipelineRunner is about how to interact with a running pipeline: query its
state and (potentially) stop it, right?
If so, this API is available via PipelineResult
<https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/PipelineResult.java#L37>,
which is returned when executing the pipeline via PipelineRunner#run(Pipeline)
<https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/runners/PipelineRunner.java#L66>.

Re: Re: Is it possible that there is a feature which the underlying engine (e.g. Spark) supports, but which can't be expressed using the Beam API?

Posted by James <xu...@gmail.com>.
Thanks Frances, well explained!

Re: Re: Is it possible that there is a feature which the underlying engine (e.g. Spark) supports, but which can't be expressed using the Beam API?

Posted by Frances Perry <fr...@apache.org>.
That's a great question!

Beam is about building an excellent programming model -- one that's unified
for batch and streaming use cases, enables efficient execution, and is
portable across multiple runtimes.

So Beam is neither the intersection of the functionality of all the engines
(too limited!) nor the union (too much of a kitchen sink!). Instead, Beam
tries to be at the forefront of where data processing is going, both
pushing functionality into and pulling patterns out of the runtime engines.

State [1] is a great example of functionality that existed in various
engines and enabled interesting and common use cases, but wasn't originally
expressible in Beam. We recently expanded the Beam model to include a
version of this functionality according to Beam's design principles [2].
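
As a rough illustration of what that looks like in the Java SDK (a sketch
only; package layout and exact signatures follow the state API described in
[1] and may vary by SDK version, and CountFn is just an illustrative name):

    import org.apache.beam.sdk.state.StateSpec;
    import org.apache.beam.sdk.state.StateSpecs;
    import org.apache.beam.sdk.state.ValueState;
    import org.apache.beam.sdk.transforms.DoFn;
    import org.apache.beam.sdk.transforms.DoFn.ProcessElement;
    import org.apache.beam.sdk.transforms.DoFn.StateId;
    import org.apache.beam.sdk.values.KV;

    // Emits, for each element, how many elements with the same key have
    // been seen so far, using a per-key-and-window state cell.
    class CountFn extends DoFn<KV<String, String>, KV<String, Long>> {

      // Declare a state cell named "count" holding a single Long.
      @StateId("count")
      private final StateSpec<ValueState<Long>> countSpec = StateSpecs.value();

      @ProcessElement
      public void process(ProcessContext c,
                          @StateId("count") ValueState<Long> count) {
        Long current = count.read();  // null the first time a key is seen
        long updated = (current == null ? 0L : current) + 1;
        count.write(updated);
        c.output(KV.of(c.element().getKey(), updated));
      }
    }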

And vice versa, we hope that Beam will influence the roadmaps of various
engines as well. For example, the semantics of Flink's DataStreams were
influenced [3] by the Beam (née Dataflow) model.

This also means that the capabilities will not always be exactly the same
across different Beam runners. That's why we're using the capability matrix
[4] to try to clearly communicate the state of things.

Hope that helps,
Frances

[1] https://beam.apache.org/blog/2017/02/13/stateful-processing.html
[2] https://beam.apache.org/contribute/design-principles/
[3] http://www.zdnet.com/article/going-with-the-stream-unbounded-data-processing-with-apache-flink/
[4] https://beam.apache.org/documentation/runners/capability-matrix/

Re: Is it possible that there is a feature which the underlying engine (e.g. Spark) supports, but which can't be expressed using the Beam API?

Posted by "Tang Jijun (上海_技术部_数据平台_唐觊隽)" <ta...@yhd.com>.
I found a case. After submitting a Spark app, we can stop it or get its state via the JavaStreamingContext. But using the Beam API, we can't stop or get the state of a pipeline. I think stop and getState methods should be added to PipelineRunner.
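
For reference, the Spark side looks roughly like this (a sketch using Spark
Streaming's Java API; the app name and class name are made up):

    import org.apache.spark.SparkConf;
    import org.apache.spark.streaming.Durations;
    import org.apache.spark.streaming.StreamingContextState;
    import org.apache.spark.streaming.api.java.JavaStreamingContext;

    public class StreamingControl {
      public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("streaming-control");
        JavaStreamingContext jssc =
            new JavaStreamingContext(conf, Durations.seconds(1));
        // ... define input streams and transformations here ...
        jssc.start();

        // The context is the handle for lifecycle control:
        StreamingContextState state = jssc.getState();  // INITIALIZED, ACTIVE, or STOPPED
        System.out.println("State: " + state);
        jssc.stop(true);  // true: also stop the underlying SparkContext
      }
    }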

From: James [mailto:xumingmingv@gmail.com]
Sent: February 22, 2017 9:50
To: user@beam.apache.org
Subject: Is it possible that there is a feature which the underlying engine (e.g. Spark) supports, but which can't be expressed using the Beam API?

Is it possible that there is a feature which the underlying engine (e.g. Spark) supports, but which can't be expressed using the Beam API?
If there really is such a case, how should we handle it? (We are planning to use Beam as the data processing API, but have this concern.)

Thanks in advance.