You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@beam.apache.org by amir bahmanyari <am...@yahoo.com> on 2016/07/25 21:59:33 UTC
Example: pass Runner at command line
Hi Colleagues,Is there a simple genetic example where the Runner is passed at the command line, the Beam code sets it in the generic Beam Options.set Runner(), and Pipeline.create() is?No mention of ANY specific Runner in the code like FlinkPipelineOptions .
Thanks.Amir-
Re: Example: pass Runner at command line
Posted by amir bahmanyari <am...@yahoo.com>.
Thanks Ismael et al. Appreciate it sir.Will give it a shot...Cheers
From: Ismaël Mejía <ie...@gmail.com>
To: user@beam.incubator.apache.org
Sent: Tuesday, July 26, 2016 2:25 PM
Subject: Re: Example: pass Runner at command line
Amir, in the repository I sent you, you don't need to recompile, you can use the exact same binary code in every single runner (google dataflow included).
The maven profiles exist just to help you pick the right runtime dependencies. And as Lukasz mentioned you just have to choose the right runner and it works, try it and you will see.
You can even use the same jar with the runner-specific utils (e.g. you can use the spark-submit app with the same jar and the spark runner as dependencies and it will work) but remember that this is not the Beam way.
I agree that the ideal scenario is that all the examples in beam-examples project will be executable in every single runner (that should be a requisite that the runners should pass, of course depending on their capabilities), but this is not currently the case today, however I expect that we will be there soon,
Ismael.
On Tue, Jul 26, 2016 at 9:18 PM, Lukasz Cwik <lc...@google.com> wrote:
I was under the impression that we had several @RunnableOnService integration tests that executed across runners.
Also, doesn't WordCount works on the DirectRunner, Flink and Dataflow? (https://github.com/apache/incubator-beam/blob/master/examples/java/src/main/java/org/apache/beam/examples/WordCount.java)
You still need to pass the "runner" specific options to get them to run like the GCP project / Flink cluster but this does give you the compile once and run the artifact on different runners.
On Tue, Jul 26, 2016 at 2:04 PM, Emanuele Cesena <em...@shopkick.com> wrote:
Hi,
No unfortunately I don’t think there’s currently any such example, although I guess if you take the Flink runner example and run it with the local runner it should work. Probably not with Spark though.
I don’t recall the state issue… but I didn’t have much time lately to explore, so I guess it’s still in the same *state* :)
Best,
> On Jul 25, 2016, at 3:30 PM, amir bahmanyari <am...@yahoo.com> wrote:
>
> Thanks Emanuele,
> Yes, I know these examples exist.
> I thought there has been one put together that addresses the Runner agnostic coding specifically without doing any extra work.
> A true "unified" example.
> Did you solve your State issue? I had the same questions sometime ago.
> For now, I use Redis to persist run-time state. Kinda poor man's way :-) works for now, but doesn't scale as I want it.
> Cheers
>
> From: Emanuele Cesena <em...@shopkick.com>
> To: user@beam.incubator.apache.org; amir bahmanyari <am...@yahoo.com>
> Sent: Monday, July 25, 2016 3:18 PM
> Subject: Re: Example: pass Runner at command line
>
> Hi Amir,
>
> If you’re looking for a runner-independent example, you can find some in examples/.
>
> If you’re looking for runner-independent code that works on Flink, I think you should still wait a few iterations.
>
> Or, you can take code that refers to Flink, and remove the dependencies yourself.
>
> For instance, if you start from this example:
> https://github.com/ecesena/beam-starter/blob/master/src/main/java/com/dataradiant/beam/examples/WordCount.java
>
> You can remove the setRunner by passing -Drunner=FlinkRunner in pom.xml
> You can remove FlinkPipelineOptions but you have to implement yourself some of the getter/setter required by Flink, e.g. jobName:
> https://github.com/apache/incubator-beam/blob/master/runners/flink/runner/src/main/java/org/apache/beam/runners/flink/FlinkPipelineOptions.java
>
> Hope this helps,
>
>
> > On Jul 25, 2016, at 2:59 PM, amir bahmanyari <am...@yahoo.com> wrote:
> >
> > Hi Colleagues,
> > Is there a simple genetic example where the Runner is passed at the command line, the Beam code sets it in the generic Beam Options.set Runner(), and Pipeline.create() is?
> > No mention of ANY specific Runner in the code like FlinkPipelineOptions .
> >
> > Thanks.
> > Amir-
>
>
> --
> Emanuele Cesena, Data Eng.
> http://www.shopkick.com
>
> Il corpo non ha ideali
>
>
>
>
>
>
--
Emanuele Cesena, Data Eng.
http://www.shopkick.com
Il corpo non ha ideali
Re: Example: pass Runner at command line
Posted by Ismaël Mejía <ie...@gmail.com>.
Amir, in the repository I sent you, you don't need to recompile, you can
use the exact same binary code in every single runner (google dataflow
included).
The maven profiles exist just to help you pick the right runtime
dependencies. And as Lukasz mentioned you just have to choose the right
runner and it works, try it and you will see.
You can even use the same jar with the runner-specific utils (e.g. you can
use the spark-submit app with the same jar and the spark runner as
dependencies and it will work) but remember that this is not the Beam way.
I agree that the ideal scenario is that all the examples in beam-examples
project will be executable in every single runner (that should be a
requisite that the runners should pass, of course depending on their
capabilities), but this is not currently the case today, however I expect
that we will be there soon,
Ismael.
On Tue, Jul 26, 2016 at 9:18 PM, Lukasz Cwik <lc...@google.com> wrote:
> I was under the impression that we had several @RunnableOnService
> integration tests that executed across runners.
>
> Also, doesn't WordCount works on the DirectRunner, Flink and Dataflow? (
> https://github.com/apache/incubator-beam/blob/master/examples/java/src/main/java/org/apache/beam/examples/WordCount.java
> )
>
> You still need to pass the "runner" specific options to get them to run
> like the GCP project / Flink cluster but this does give you the compile
> once and run the artifact on different runners.
>
> On Tue, Jul 26, 2016 at 2:04 PM, Emanuele Cesena <em...@shopkick.com>
> wrote:
>
>> Hi,
>>
>> No unfortunately I don’t think there’s currently any such example,
>> although I guess if you take the Flink runner example and run it with the
>> local runner it should work. Probably not with Spark though.
>>
>> I don’t recall the state issue… but I didn’t have much time lately to
>> explore, so I guess it’s still in the same *state* :)
>>
>> Best,
>>
>>
>> > On Jul 25, 2016, at 3:30 PM, amir bahmanyari <am...@yahoo.com>
>> wrote:
>> >
>> > Thanks Emanuele,
>> > Yes, I know these examples exist.
>> > I thought there has been one put together that addresses the Runner
>> agnostic coding specifically without doing any extra work.
>> > A true "unified" example.
>> > Did you solve your State issue? I had the same questions sometime ago.
>> > For now, I use Redis to persist run-time state. Kinda poor man's way
>> :-) works for now, but doesn't scale as I want it.
>> > Cheers
>> >
>> > From: Emanuele Cesena <em...@shopkick.com>
>> > To: user@beam.incubator.apache.org; amir bahmanyari <
>> amirtousa@yahoo.com>
>> > Sent: Monday, July 25, 2016 3:18 PM
>> > Subject: Re: Example: pass Runner at command line
>> >
>> > Hi Amir,
>> >
>> > If you’re looking for a runner-independent example, you can find some
>> in examples/.
>> >
>> > If you’re looking for runner-independent code that works on Flink, I
>> think you should still wait a few iterations.
>> >
>> > Or, you can take code that refers to Flink, and remove the dependencies
>> yourself.
>> >
>> > For instance, if you start from this example:
>> >
>> https://github.com/ecesena/beam-starter/blob/master/src/main/java/com/dataradiant/beam/examples/WordCount.java
>> >
>> > You can remove the setRunner by passing -Drunner=FlinkRunner in pom.xml
>> > You can remove FlinkPipelineOptions but you have to implement yourself
>> some of the getter/setter required by Flink, e.g. jobName:
>> >
>> https://github.com/apache/incubator-beam/blob/master/runners/flink/runner/src/main/java/org/apache/beam/runners/flink/FlinkPipelineOptions.java
>> >
>> > Hope this helps,
>> >
>> >
>> > > On Jul 25, 2016, at 2:59 PM, amir bahmanyari <am...@yahoo.com>
>> wrote:
>> > >
>> > > Hi Colleagues,
>> > > Is there a simple genetic example where the Runner is passed at the
>> command line, the Beam code sets it in the generic Beam Options.set
>> Runner(), and Pipeline.create() is?
>> > > No mention of ANY specific Runner in the code like
>> FlinkPipelineOptions .
>> > >
>> > > Thanks.
>> > > Amir-
>> >
>> >
>> > --
>> > Emanuele Cesena, Data Eng.
>> > http://www.shopkick.com
>> >
>> > Il corpo non ha ideali
>> >
>> >
>> >
>> >
>> >
>> >
>>
>> --
>> Emanuele Cesena, Data Eng.
>> http://www.shopkick.com
>>
>> Il corpo non ha ideali
>>
>>
>>
>>
>>
>
Re: Example: pass Runner at command line
Posted by Lukasz Cwik <lc...@google.com>.
I was under the impression that we had several @RunnableOnService
integration tests that executed across runners.
Also, doesn't WordCount works on the DirectRunner, Flink and Dataflow? (
https://github.com/apache/incubator-beam/blob/master/examples/java/src/main/java/org/apache/beam/examples/WordCount.java
)
You still need to pass the "runner" specific options to get them to run
like the GCP project / Flink cluster but this does give you the compile
once and run the artifact on different runners.
On Tue, Jul 26, 2016 at 2:04 PM, Emanuele Cesena <em...@shopkick.com>
wrote:
> Hi,
>
> No unfortunately I don’t think there’s currently any such example,
> although I guess if you take the Flink runner example and run it with the
> local runner it should work. Probably not with Spark though.
>
> I don’t recall the state issue… but I didn’t have much time lately to
> explore, so I guess it’s still in the same *state* :)
>
> Best,
>
>
> > On Jul 25, 2016, at 3:30 PM, amir bahmanyari <am...@yahoo.com>
> wrote:
> >
> > Thanks Emanuele,
> > Yes, I know these examples exist.
> > I thought there has been one put together that addresses the Runner
> agnostic coding specifically without doing any extra work.
> > A true "unified" example.
> > Did you solve your State issue? I had the same questions sometime ago.
> > For now, I use Redis to persist run-time state. Kinda poor man's way :-)
> works for now, but doesn't scale as I want it.
> > Cheers
> >
> > From: Emanuele Cesena <em...@shopkick.com>
> > To: user@beam.incubator.apache.org; amir bahmanyari <amirtousa@yahoo.com
> >
> > Sent: Monday, July 25, 2016 3:18 PM
> > Subject: Re: Example: pass Runner at command line
> >
> > Hi Amir,
> >
> > If you’re looking for a runner-independent example, you can find some in
> examples/.
> >
> > If you’re looking for runner-independent code that works on Flink, I
> think you should still wait a few iterations.
> >
> > Or, you can take code that refers to Flink, and remove the dependencies
> yourself.
> >
> > For instance, if you start from this example:
> >
> https://github.com/ecesena/beam-starter/blob/master/src/main/java/com/dataradiant/beam/examples/WordCount.java
> >
> > You can remove the setRunner by passing -Drunner=FlinkRunner in pom.xml
> > You can remove FlinkPipelineOptions but you have to implement yourself
> some of the getter/setter required by Flink, e.g. jobName:
> >
> https://github.com/apache/incubator-beam/blob/master/runners/flink/runner/src/main/java/org/apache/beam/runners/flink/FlinkPipelineOptions.java
> >
> > Hope this helps,
> >
> >
> > > On Jul 25, 2016, at 2:59 PM, amir bahmanyari <am...@yahoo.com>
> wrote:
> > >
> > > Hi Colleagues,
> > > Is there a simple genetic example where the Runner is passed at the
> command line, the Beam code sets it in the generic Beam Options.set
> Runner(), and Pipeline.create() is?
> > > No mention of ANY specific Runner in the code like
> FlinkPipelineOptions .
> > >
> > > Thanks.
> > > Amir-
> >
> >
> > --
> > Emanuele Cesena, Data Eng.
> > http://www.shopkick.com
> >
> > Il corpo non ha ideali
> >
> >
> >
> >
> >
> >
>
> --
> Emanuele Cesena, Data Eng.
> http://www.shopkick.com
>
> Il corpo non ha ideali
>
>
>
>
>
Re: Example: pass Runner at command line
Posted by Emanuele Cesena <em...@shopkick.com>.
Hi,
No unfortunately I don’t think there’s currently any such example, although I guess if you take the Flink runner example and run it with the local runner it should work. Probably not with Spark though.
I don’t recall the state issue… but I didn’t have much time lately to explore, so I guess it’s still in the same *state* :)
Best,
> On Jul 25, 2016, at 3:30 PM, amir bahmanyari <am...@yahoo.com> wrote:
>
> Thanks Emanuele,
> Yes, I know these examples exist.
> I thought there has been one put together that addresses the Runner agnostic coding specifically without doing any extra work.
> A true "unified" example.
> Did you solve your State issue? I had the same questions sometime ago.
> For now, I use Redis to persist run-time state. Kinda poor man's way :-) works for now, but doesn't scale as I want it.
> Cheers
>
> From: Emanuele Cesena <em...@shopkick.com>
> To: user@beam.incubator.apache.org; amir bahmanyari <am...@yahoo.com>
> Sent: Monday, July 25, 2016 3:18 PM
> Subject: Re: Example: pass Runner at command line
>
> Hi Amir,
>
> If you’re looking for a runner-independent example, you can find some in examples/.
>
> If you’re looking for runner-independent code that works on Flink, I think you should still wait a few iterations.
>
> Or, you can take code that refers to Flink, and remove the dependencies yourself.
>
> For instance, if you start from this example:
> https://github.com/ecesena/beam-starter/blob/master/src/main/java/com/dataradiant/beam/examples/WordCount.java
>
> You can remove the setRunner by passing -Drunner=FlinkRunner in pom.xml
> You can remove FlinkPipelineOptions but you have to implement yourself some of the getter/setter required by Flink, e.g. jobName:
> https://github.com/apache/incubator-beam/blob/master/runners/flink/runner/src/main/java/org/apache/beam/runners/flink/FlinkPipelineOptions.java
>
> Hope this helps,
>
>
> > On Jul 25, 2016, at 2:59 PM, amir bahmanyari <am...@yahoo.com> wrote:
> >
> > Hi Colleagues,
> > Is there a simple genetic example where the Runner is passed at the command line, the Beam code sets it in the generic Beam Options.set Runner(), and Pipeline.create() is?
> > No mention of ANY specific Runner in the code like FlinkPipelineOptions .
> >
> > Thanks.
> > Amir-
>
>
> --
> Emanuele Cesena, Data Eng.
> http://www.shopkick.com
>
> Il corpo non ha ideali
>
>
>
>
>
>
--
Emanuele Cesena, Data Eng.
http://www.shopkick.com
Il corpo non ha ideali
Re: Example: pass Runner at command line
Posted by amir bahmanyari <am...@yahoo.com>.
Thanks Ismael.I thought/was looking for: compile once and run the same binary for different Runner(s) passing it from the command line.Is this a realistic assumption?The source code is extremely helpful to get on the same page as far as style, logging etc.Have a great day.
From: Ismaël Mejía <ie...@gmail.com>
To: user@beam.incubator.apache.org
Sent: Tuesday, July 26, 2016 12:28 AM
Subject: Re: Example: pass Runner at command line
Hello,
You can look at the repository for GDELT based examples that JB and me are working on, they include maven profiles to run the Pipelines in most of the existing runners (not Gearpump yet but soon). You can take those profiles as an example. I just changed the README so you can see how to execute them with Maven.
https://github.com/jbonofre/beam-samples
Ismaël
On Tue, Jul 26, 2016 at 12:30 AM, amir bahmanyari <am...@yahoo.com> wrote:
Thanks Emanuele,Yes, I know these examples exist. I thought there has been one put together that addresses the Runner agnostic coding specifically without doing any extra work.A true "unified" example.Did you solve your State issue? I had the same questions sometime ago.For now, I use Redis to persist run-time state. Kinda poor man's way :-) works for now, but doesn't scale as I want it.Cheers
From: Emanuele Cesena <em...@shopkick.com>
To: user@beam.incubator.apache.org; amir bahmanyari <am...@yahoo.com>
Sent: Monday, July 25, 2016 3:18 PM
Subject: Re: Example: pass Runner at command line
Hi Amir,
If you’re looking for a runner-independent example, you can find some in examples/.
If you’re looking for runner-independent code that works on Flink, I think you should still wait a few iterations.
Or, you can take code that refers to Flink, and remove the dependencies yourself.
For instance, if you start from this example:
https://github.com/ecesena/beam-starter/blob/master/src/main/java/com/dataradiant/beam/examples/WordCount.java
You can remove the setRunner by passing -Drunner=FlinkRunner in pom.xml
You can remove FlinkPipelineOptions but you have to implement yourself some of the getter/setter required by Flink, e.g. jobName:
https://github.com/apache/incubator-beam/blob/master/runners/flink/runner/src/main/java/org/apache/beam/runners/flink/FlinkPipelineOptions.java
Hope this helps,
> On Jul 25, 2016, at 2:59 PM, amir bahmanyari <am...@yahoo.com> wrote:
>
> Hi Colleagues,
> Is there a simple genetic example where the Runner is passed at the command line, the Beam code sets it in the generic Beam Options.set Runner(), and Pipeline.create() is?
> No mention of ANY specific Runner in the code like FlinkPipelineOptions .
>
> Thanks.
> Amir-
--
Emanuele Cesena, Data Eng.
http://www.shopkick.com
Il corpo non ha ideali
Re: Example: pass Runner at command line
Posted by Ismaël Mejía <ie...@gmail.com>.
Hello,
You can look at the repository for GDELT based examples that JB and me are
working on, they include maven profiles to run the Pipelines in most of the
existing runners (not Gearpump yet but soon). You can take those profiles
as an example. I just changed the README so you can see how to execute them
with Maven.
https://github.com/jbonofre/beam-samples
Ismaël
On Tue, Jul 26, 2016 at 12:30 AM, amir bahmanyari <am...@yahoo.com>
wrote:
> Thanks Emanuele,
> Yes, I know these examples exist.
> I thought there has been one put together that addresses the Runner
> agnostic coding specifically without doing any extra work.
> A true "unified" example.
> Did you solve your State issue? I had the same questions sometime ago.
> For now, I use Redis to persist run-time state. Kinda poor man's way :-)
> works for now, but doesn't scale as I want it.
> Cheers
>
> ------------------------------
> *From:* Emanuele Cesena <em...@shopkick.com>
> *To:* user@beam.incubator.apache.org; amir bahmanyari <am...@yahoo.com>
>
> *Sent:* Monday, July 25, 2016 3:18 PM
> *Subject:* Re: Example: pass Runner at command line
>
> Hi Amir,
>
> If you’re looking for a runner-independent example, you can find some in
> examples/.
>
> If you’re looking for runner-independent code that works on Flink, I think
> you should still wait a few iterations.
>
> Or, you can take code that refers to Flink, and remove the dependencies
> yourself.
>
> For instance, if you start from this example:
>
> https://github.com/ecesena/beam-starter/blob/master/src/main/java/com/dataradiant/beam/examples/WordCount.java
>
> You can remove the setRunner by passing -Drunner=FlinkRunner in pom.xml
> You can remove FlinkPipelineOptions but you have to implement yourself
> some of the getter/setter required by Flink, e.g. jobName:
>
> https://github.com/apache/incubator-beam/blob/master/runners/flink/runner/src/main/java/org/apache/beam/runners/flink/FlinkPipelineOptions.java
>
> Hope this helps,
>
>
> > On Jul 25, 2016, at 2:59 PM, amir bahmanyari <am...@yahoo.com>
> wrote:
> >
> > Hi Colleagues,
> > Is there a simple genetic example where the Runner is passed at the
> command line, the Beam code sets it in the generic Beam Options.set
> Runner(), and Pipeline.create() is?
> > No mention of ANY specific Runner in the code like FlinkPipelineOptions
> .
> >
> > Thanks.
> > Amir-
>
>
> --
> Emanuele Cesena, Data Eng.
> http://www.shopkick.com
>
> Il corpo non ha ideali
>
>
>
>
>
>
>
Re: Example: pass Runner at command line
Posted by amir bahmanyari <am...@yahoo.com>.
Thanks Emanuele,Yes, I know these examples exist. I thought there has been one put together that addresses the Runner agnostic coding specifically without doing any extra work.A true "unified" example.Did you solve your State issue? I had the same questions sometime ago.For now, I use Redis to persist run-time state. Kinda poor man's way :-) works for now, but doesn't scale as I want it.Cheers
From: Emanuele Cesena <em...@shopkick.com>
To: user@beam.incubator.apache.org; amir bahmanyari <am...@yahoo.com>
Sent: Monday, July 25, 2016 3:18 PM
Subject: Re: Example: pass Runner at command line
Hi Amir,
If you’re looking for a runner-independent example, you can find some in examples/.
If you’re looking for runner-independent code that works on Flink, I think you should still wait a few iterations.
Or, you can take code that refers to Flink, and remove the dependencies yourself.
For instance, if you start from this example:
https://github.com/ecesena/beam-starter/blob/master/src/main/java/com/dataradiant/beam/examples/WordCount.java
You can remove the setRunner by passing -Drunner=FlinkRunner in pom.xml
You can remove FlinkPipelineOptions but you have to implement yourself some of the getter/setter required by Flink, e.g. jobName:
https://github.com/apache/incubator-beam/blob/master/runners/flink/runner/src/main/java/org/apache/beam/runners/flink/FlinkPipelineOptions.java
Hope this helps,
> On Jul 25, 2016, at 2:59 PM, amir bahmanyari <am...@yahoo.com> wrote:
>
> Hi Colleagues,
> Is there a simple genetic example where the Runner is passed at the command line, the Beam code sets it in the generic Beam Options.set Runner(), and Pipeline.create() is?
> No mention of ANY specific Runner in the code like FlinkPipelineOptions .
>
> Thanks.
> Amir-
--
Emanuele Cesena, Data Eng.
http://www.shopkick.com
Il corpo non ha ideali
Re: Example: pass Runner at command line
Posted by Emanuele Cesena <em...@shopkick.com>.
Hi Amir,
If you’re looking for a runner-independent example, you can find some in examples/.
If you’re looking for runner-independent code that works on Flink, I think you should still wait a few iterations.
Or, you can take code that refers to Flink, and remove the dependencies yourself.
For instance, if you start from this example:
https://github.com/ecesena/beam-starter/blob/master/src/main/java/com/dataradiant/beam/examples/WordCount.java
You can remove the setRunner by passing -Drunner=FlinkRunner in pom.xml
You can remove FlinkPipelineOptions but you have to implement yourself some of the getter/setter required by Flink, e.g. jobName:
https://github.com/apache/incubator-beam/blob/master/runners/flink/runner/src/main/java/org/apache/beam/runners/flink/FlinkPipelineOptions.java
Hope this helps,
> On Jul 25, 2016, at 2:59 PM, amir bahmanyari <am...@yahoo.com> wrote:
>
> Hi Colleagues,
> Is there a simple genetic example where the Runner is passed at the command line, the Beam code sets it in the generic Beam Options.set Runner(), and Pipeline.create() is?
> No mention of ANY specific Runner in the code like FlinkPipelineOptions .
>
> Thanks.
> Amir-
--
Emanuele Cesena, Data Eng.
http://www.shopkick.com
Il corpo non ha ideali