You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@beam.apache.org by amir bahmanyari <am...@yahoo.com> on 2016/07/25 21:59:33 UTC

Example: pass Runner at command line

Hi Colleagues,Is there a simple genetic example where the Runner is passed at the command line, the Beam code sets it in the generic Beam Options.set Runner(), and Pipeline.create() is?No mention of  ANY specific Runner in the code like FlinkPipelineOptions .
Thanks.Amir-

Re: Example: pass Runner at command line

Posted by amir bahmanyari <am...@yahoo.com>.
Thanks Ismael et al. Appreciate it sir.Will give it a shot...Cheers

      From: Ismaël Mejía <ie...@gmail.com>
 To: user@beam.incubator.apache.org 
 Sent: Tuesday, July 26, 2016 2:25 PM
 Subject: Re: Example: pass Runner at command line
   
Amir, in the repository I sent you, you don't need to recompile, you can use the exact same binary code in every single runner (google dataflow included).

The maven profiles exist just to help you pick the right runtime dependencies. And as Lukasz mentioned you just have to choose the right runner and it works, try it and you will see.

You can even use the same jar with the runner-specific utils (e.g. you can use the spark-submit app with the same jar and the spark runner as dependencies and it will work) but remember that this is not the Beam way.

I agree that the ideal scenario is that all the examples in beam-examples project will be executable in every single runner (that should be a requisite that the runners should pass, of course depending on their capabilities), but this is not currently the case today, however I expect that we will be there  soon,

Ismael.





On Tue, Jul 26, 2016 at 9:18 PM, Lukasz Cwik <lc...@google.com> wrote:

I was under the impression that we had several @RunnableOnService integration tests that executed across runners.
Also, doesn't WordCount works on the DirectRunner, Flink and Dataflow? (https://github.com/apache/incubator-beam/blob/master/examples/java/src/main/java/org/apache/beam/examples/WordCount.java)
You still need to pass the "runner" specific options to get them to run like the GCP project / Flink cluster but this does give you the compile once and run the artifact on different runners.
On Tue, Jul 26, 2016 at 2:04 PM, Emanuele Cesena <em...@shopkick.com> wrote:

Hi,

No unfortunately I don’t think there’s currently any such example, although I guess if you take the Flink runner example and run it with the local runner it should work. Probably not with Spark though.

I don’t recall the state issue… but I didn’t have much time lately to explore, so I guess it’s still in the same *state* :)

Best,


> On Jul 25, 2016, at 3:30 PM, amir bahmanyari <am...@yahoo.com> wrote:
>
> Thanks Emanuele,
> Yes, I know these examples exist.
> I thought there has been one put together that addresses the Runner agnostic coding specifically without doing any extra work.
> A true "unified" example.
> Did you solve your State issue? I had the same questions sometime ago.
> For now, I use Redis to persist run-time state. Kinda poor man's way :-) works for now, but doesn't scale as I want it.
> Cheers
>
> From: Emanuele Cesena <em...@shopkick.com>
> To: user@beam.incubator.apache.org; amir bahmanyari <am...@yahoo.com>
> Sent: Monday, July 25, 2016 3:18 PM
> Subject: Re: Example: pass Runner at command line
>
> Hi Amir,
>
> If you’re looking for a runner-independent example, you can find some in examples/.
>
> If you’re looking for runner-independent code that works on Flink, I think you should still wait a few iterations.
>
> Or, you can take code that refers to Flink, and remove the dependencies yourself.
>
> For instance, if you start from this example:
> https://github.com/ecesena/beam-starter/blob/master/src/main/java/com/dataradiant/beam/examples/WordCount.java
>
> You can remove the setRunner by passing -Drunner=FlinkRunner in pom.xml
> You can remove FlinkPipelineOptions but you have to implement yourself some of the getter/setter required by Flink, e.g. jobName:
> https://github.com/apache/incubator-beam/blob/master/runners/flink/runner/src/main/java/org/apache/beam/runners/flink/FlinkPipelineOptions.java
>
> Hope this helps,
>
>
> > On Jul 25, 2016, at 2:59 PM, amir bahmanyari <am...@yahoo.com> wrote:
> >
> > Hi Colleagues,
> > Is there a simple genetic example where the Runner is passed at the command line, the Beam code sets it in the generic Beam Options.set Runner(), and Pipeline.create() is?
> > No mention of  ANY specific Runner in the code like FlinkPipelineOptions .
> >
> > Thanks.
> > Amir-
>
>
> --
> Emanuele Cesena, Data Eng.
> http://www.shopkick.com
>
> Il corpo non ha ideali
>
>
>
>
>
>

--
Emanuele Cesena, Data Eng.
http://www.shopkick.com

Il corpo non ha ideali










  

Re: Example: pass Runner at command line

Posted by Ismaël Mejía <ie...@gmail.com>.
Amir, in the repository I sent you, you don't need to recompile, you can
use the exact same binary code in every single runner (google dataflow
included).

The maven profiles exist just to help you pick the right runtime
dependencies. And as Lukasz mentioned you just have to choose the right
runner and it works, try it and you will see.

You can even use the same jar with the runner-specific utils (e.g. you can
use the spark-submit app with the same jar and the spark runner as
dependencies and it will work) but remember that this is not the Beam way.

I agree that the ideal scenario is that all the examples in beam-examples
project will be executable in every single runner (that should be a
requisite that the runners should pass, of course depending on their
capabilities), but this is not currently the case today, however I expect
that we will be there  soon,

Ismael.





On Tue, Jul 26, 2016 at 9:18 PM, Lukasz Cwik <lc...@google.com> wrote:

> I was under the impression that we had several @RunnableOnService
> integration tests that executed across runners.
>
> Also, doesn't WordCount works on the DirectRunner, Flink and Dataflow? (
> https://github.com/apache/incubator-beam/blob/master/examples/java/src/main/java/org/apache/beam/examples/WordCount.java
> )
>
> You still need to pass the "runner" specific options to get them to run
> like the GCP project / Flink cluster but this does give you the compile
> once and run the artifact on different runners.
>
> On Tue, Jul 26, 2016 at 2:04 PM, Emanuele Cesena <em...@shopkick.com>
> wrote:
>
>> Hi,
>>
>> No unfortunately I don’t think there’s currently any such example,
>> although I guess if you take the Flink runner example and run it with the
>> local runner it should work. Probably not with Spark though.
>>
>> I don’t recall the state issue… but I didn’t have much time lately to
>> explore, so I guess it’s still in the same *state* :)
>>
>> Best,
>>
>>
>> > On Jul 25, 2016, at 3:30 PM, amir bahmanyari <am...@yahoo.com>
>> wrote:
>> >
>> > Thanks Emanuele,
>> > Yes, I know these examples exist.
>> > I thought there has been one put together that addresses the Runner
>> agnostic coding specifically without doing any extra work.
>> > A true "unified" example.
>> > Did you solve your State issue? I had the same questions sometime ago.
>> > For now, I use Redis to persist run-time state. Kinda poor man's way
>> :-) works for now, but doesn't scale as I want it.
>> > Cheers
>> >
>> > From: Emanuele Cesena <em...@shopkick.com>
>> > To: user@beam.incubator.apache.org; amir bahmanyari <
>> amirtousa@yahoo.com>
>> > Sent: Monday, July 25, 2016 3:18 PM
>> > Subject: Re: Example: pass Runner at command line
>> >
>> > Hi Amir,
>> >
>> > If you’re looking for a runner-independent example, you can find some
>> in examples/.
>> >
>> > If you’re looking for runner-independent code that works on Flink, I
>> think you should still wait a few iterations.
>> >
>> > Or, you can take code that refers to Flink, and remove the dependencies
>> yourself.
>> >
>> > For instance, if you start from this example:
>> >
>> https://github.com/ecesena/beam-starter/blob/master/src/main/java/com/dataradiant/beam/examples/WordCount.java
>> >
>> > You can remove the setRunner by passing -Drunner=FlinkRunner in pom.xml
>> > You can remove FlinkPipelineOptions but you have to implement yourself
>> some of the getter/setter required by Flink, e.g. jobName:
>> >
>> https://github.com/apache/incubator-beam/blob/master/runners/flink/runner/src/main/java/org/apache/beam/runners/flink/FlinkPipelineOptions.java
>> >
>> > Hope this helps,
>> >
>> >
>> > > On Jul 25, 2016, at 2:59 PM, amir bahmanyari <am...@yahoo.com>
>> wrote:
>> > >
>> > > Hi Colleagues,
>> > > Is there a simple genetic example where the Runner is passed at the
>> command line, the Beam code sets it in the generic Beam Options.set
>> Runner(), and Pipeline.create() is?
>> > > No mention of  ANY specific Runner in the code like
>> FlinkPipelineOptions .
>> > >
>> > > Thanks.
>> > > Amir-
>> >
>> >
>> > --
>> > Emanuele Cesena, Data Eng.
>> > http://www.shopkick.com
>> >
>> > Il corpo non ha ideali
>> >
>> >
>> >
>> >
>> >
>> >
>>
>> --
>> Emanuele Cesena, Data Eng.
>> http://www.shopkick.com
>>
>> Il corpo non ha ideali
>>
>>
>>
>>
>>
>

Re: Example: pass Runner at command line

Posted by Lukasz Cwik <lc...@google.com>.
I was under the impression that we had several @RunnableOnService
integration tests that executed across runners.

Also, doesn't WordCount works on the DirectRunner, Flink and Dataflow? (
https://github.com/apache/incubator-beam/blob/master/examples/java/src/main/java/org/apache/beam/examples/WordCount.java
)

You still need to pass the "runner" specific options to get them to run
like the GCP project / Flink cluster but this does give you the compile
once and run the artifact on different runners.

On Tue, Jul 26, 2016 at 2:04 PM, Emanuele Cesena <em...@shopkick.com>
wrote:

> Hi,
>
> No unfortunately I don’t think there’s currently any such example,
> although I guess if you take the Flink runner example and run it with the
> local runner it should work. Probably not with Spark though.
>
> I don’t recall the state issue… but I didn’t have much time lately to
> explore, so I guess it’s still in the same *state* :)
>
> Best,
>
>
> > On Jul 25, 2016, at 3:30 PM, amir bahmanyari <am...@yahoo.com>
> wrote:
> >
> > Thanks Emanuele,
> > Yes, I know these examples exist.
> > I thought there has been one put together that addresses the Runner
> agnostic coding specifically without doing any extra work.
> > A true "unified" example.
> > Did you solve your State issue? I had the same questions sometime ago.
> > For now, I use Redis to persist run-time state. Kinda poor man's way :-)
> works for now, but doesn't scale as I want it.
> > Cheers
> >
> > From: Emanuele Cesena <em...@shopkick.com>
> > To: user@beam.incubator.apache.org; amir bahmanyari <amirtousa@yahoo.com
> >
> > Sent: Monday, July 25, 2016 3:18 PM
> > Subject: Re: Example: pass Runner at command line
> >
> > Hi Amir,
> >
> > If you’re looking for a runner-independent example, you can find some in
> examples/.
> >
> > If you’re looking for runner-independent code that works on Flink, I
> think you should still wait a few iterations.
> >
> > Or, you can take code that refers to Flink, and remove the dependencies
> yourself.
> >
> > For instance, if you start from this example:
> >
> https://github.com/ecesena/beam-starter/blob/master/src/main/java/com/dataradiant/beam/examples/WordCount.java
> >
> > You can remove the setRunner by passing -Drunner=FlinkRunner in pom.xml
> > You can remove FlinkPipelineOptions but you have to implement yourself
> some of the getter/setter required by Flink, e.g. jobName:
> >
> https://github.com/apache/incubator-beam/blob/master/runners/flink/runner/src/main/java/org/apache/beam/runners/flink/FlinkPipelineOptions.java
> >
> > Hope this helps,
> >
> >
> > > On Jul 25, 2016, at 2:59 PM, amir bahmanyari <am...@yahoo.com>
> wrote:
> > >
> > > Hi Colleagues,
> > > Is there a simple genetic example where the Runner is passed at the
> command line, the Beam code sets it in the generic Beam Options.set
> Runner(), and Pipeline.create() is?
> > > No mention of  ANY specific Runner in the code like
> FlinkPipelineOptions .
> > >
> > > Thanks.
> > > Amir-
> >
> >
> > --
> > Emanuele Cesena, Data Eng.
> > http://www.shopkick.com
> >
> > Il corpo non ha ideali
> >
> >
> >
> >
> >
> >
>
> --
> Emanuele Cesena, Data Eng.
> http://www.shopkick.com
>
> Il corpo non ha ideali
>
>
>
>
>

Re: Example: pass Runner at command line

Posted by Emanuele Cesena <em...@shopkick.com>.
Hi,

No unfortunately I don’t think there’s currently any such example, although I guess if you take the Flink runner example and run it with the local runner it should work. Probably not with Spark though.

I don’t recall the state issue… but I didn’t have much time lately to explore, so I guess it’s still in the same *state* :)

Best,


> On Jul 25, 2016, at 3:30 PM, amir bahmanyari <am...@yahoo.com> wrote:
> 
> Thanks Emanuele,
> Yes, I know these examples exist. 
> I thought there has been one put together that addresses the Runner agnostic coding specifically without doing any extra work.
> A true "unified" example.
> Did you solve your State issue? I had the same questions sometime ago.
> For now, I use Redis to persist run-time state. Kinda poor man's way :-) works for now, but doesn't scale as I want it.
> Cheers
> 
> From: Emanuele Cesena <em...@shopkick.com>
> To: user@beam.incubator.apache.org; amir bahmanyari <am...@yahoo.com> 
> Sent: Monday, July 25, 2016 3:18 PM
> Subject: Re: Example: pass Runner at command line
> 
> Hi Amir,
> 
> If you’re looking for a runner-independent example, you can find some in examples/.
> 
> If you’re looking for runner-independent code that works on Flink, I think you should still wait a few iterations.
> 
> Or, you can take code that refers to Flink, and remove the dependencies yourself.
> 
> For instance, if you start from this example:
> https://github.com/ecesena/beam-starter/blob/master/src/main/java/com/dataradiant/beam/examples/WordCount.java
> 
> You can remove the setRunner by passing -Drunner=FlinkRunner in pom.xml
> You can remove FlinkPipelineOptions but you have to implement yourself some of the getter/setter required by Flink, e.g. jobName:
> https://github.com/apache/incubator-beam/blob/master/runners/flink/runner/src/main/java/org/apache/beam/runners/flink/FlinkPipelineOptions.java
> 
> Hope this helps,
> 
> 
> > On Jul 25, 2016, at 2:59 PM, amir bahmanyari <am...@yahoo.com> wrote:
> > 
> > Hi Colleagues,
> > Is there a simple genetic example where the Runner is passed at the command line, the Beam code sets it in the generic Beam Options.set Runner(), and Pipeline.create() is?
> > No mention of  ANY specific Runner in the code like FlinkPipelineOptions .
> > 
> > Thanks.
> > Amir-
> 
> 
> -- 
> Emanuele Cesena, Data Eng.
> http://www.shopkick.com
> 
> Il corpo non ha ideali
> 
> 
> 
> 
> 
> 

-- 
Emanuele Cesena, Data Eng.
http://www.shopkick.com

Il corpo non ha ideali





Re: Example: pass Runner at command line

Posted by amir bahmanyari <am...@yahoo.com>.
Thanks Ismael.I thought/was looking for: compile once and run the same binary for different Runner(s) passing it from the command line.Is this a realistic assumption?The source code is extremely helpful to get on the same page as far as style, logging etc.Have a great day.

      From: Ismaël Mejía <ie...@gmail.com>
 To: user@beam.incubator.apache.org 
 Sent: Tuesday, July 26, 2016 12:28 AM
 Subject: Re: Example: pass Runner at command line
   
Hello,

You can look at the repository for GDELT based examples that JB and me are working on, they include maven profiles to run the Pipelines in most of the existing runners (not Gearpump yet but soon). You can take those profiles as an example. I just changed the README so you can see how to execute them with Maven.

https://github.com/jbonofre/beam-samples

Ismaël





On Tue, Jul 26, 2016 at 12:30 AM, amir bahmanyari <am...@yahoo.com> wrote:

Thanks Emanuele,Yes, I know these examples exist. I thought there has been one put together that addresses the Runner agnostic coding specifically without doing any extra work.A true "unified" example.Did you solve your State issue? I had the same questions sometime ago.For now, I use Redis to persist run-time state. Kinda poor man's way :-) works for now, but doesn't scale as I want it.Cheers

      From: Emanuele Cesena <em...@shopkick.com>
 To: user@beam.incubator.apache.org; amir bahmanyari <am...@yahoo.com> 
 Sent: Monday, July 25, 2016 3:18 PM
 Subject: Re: Example: pass Runner at command line
  
Hi Amir,

If you’re looking for a runner-independent example, you can find some in examples/.

If you’re looking for runner-independent code that works on Flink, I think you should still wait a few iterations.

Or, you can take code that refers to Flink, and remove the dependencies yourself.

For instance, if you start from this example:
https://github.com/ecesena/beam-starter/blob/master/src/main/java/com/dataradiant/beam/examples/WordCount.java

You can remove the setRunner by passing -Drunner=FlinkRunner in pom.xml
You can remove FlinkPipelineOptions but you have to implement yourself some of the getter/setter required by Flink, e.g. jobName:
https://github.com/apache/incubator-beam/blob/master/runners/flink/runner/src/main/java/org/apache/beam/runners/flink/FlinkPipelineOptions.java

Hope this helps,


> On Jul 25, 2016, at 2:59 PM, amir bahmanyari <am...@yahoo.com> wrote:
> 
> Hi Colleagues,
> Is there a simple genetic example where the Runner is passed at the command line, the Beam code sets it in the generic Beam Options.set Runner(), and Pipeline.create() is?
> No mention of  ANY specific Runner in the code like FlinkPipelineOptions .
> 
> Thanks.
> Amir-

-- 
Emanuele Cesena, Data Eng.
http://www.shopkick.com

Il corpo non ha ideali





   



  

Re: Example: pass Runner at command line

Posted by Ismaël Mejía <ie...@gmail.com>.
Hello,

You can look at the repository for GDELT based examples that JB and me are
working on, they include maven profiles to run the Pipelines in most of the
existing runners (not Gearpump yet but soon). You can take those profiles
as an example. I just changed the README so you can see how to execute them
with Maven.

https://github.com/jbonofre/beam-samples

Ismaël





On Tue, Jul 26, 2016 at 12:30 AM, amir bahmanyari <am...@yahoo.com>
wrote:

> Thanks Emanuele,
> Yes, I know these examples exist.
> I thought there has been one put together that addresses the Runner
> agnostic coding specifically without doing any extra work.
> A true "unified" example.
> Did you solve your State issue? I had the same questions sometime ago.
> For now, I use Redis to persist run-time state. Kinda poor man's way :-)
> works for now, but doesn't scale as I want it.
> Cheers
>
> ------------------------------
> *From:* Emanuele Cesena <em...@shopkick.com>
> *To:* user@beam.incubator.apache.org; amir bahmanyari <am...@yahoo.com>
>
> *Sent:* Monday, July 25, 2016 3:18 PM
> *Subject:* Re: Example: pass Runner at command line
>
> Hi Amir,
>
> If you’re looking for a runner-independent example, you can find some in
> examples/.
>
> If you’re looking for runner-independent code that works on Flink, I think
> you should still wait a few iterations.
>
> Or, you can take code that refers to Flink, and remove the dependencies
> yourself.
>
> For instance, if you start from this example:
>
> https://github.com/ecesena/beam-starter/blob/master/src/main/java/com/dataradiant/beam/examples/WordCount.java
>
> You can remove the setRunner by passing -Drunner=FlinkRunner in pom.xml
> You can remove FlinkPipelineOptions but you have to implement yourself
> some of the getter/setter required by Flink, e.g. jobName:
>
> https://github.com/apache/incubator-beam/blob/master/runners/flink/runner/src/main/java/org/apache/beam/runners/flink/FlinkPipelineOptions.java
>
> Hope this helps,
>
>
> > On Jul 25, 2016, at 2:59 PM, amir bahmanyari <am...@yahoo.com>
> wrote:
> >
> > Hi Colleagues,
> > Is there a simple genetic example where the Runner is passed at the
> command line, the Beam code sets it in the generic Beam Options.set
> Runner(), and Pipeline.create() is?
> > No mention of  ANY specific Runner in the code like FlinkPipelineOptions
> .
> >
> > Thanks.
> > Amir-
>
>
> --
> Emanuele Cesena, Data Eng.
> http://www.shopkick.com
>
> Il corpo non ha ideali
>
>
>
>
>
>
>

Re: Example: pass Runner at command line

Posted by amir bahmanyari <am...@yahoo.com>.
Thanks Emanuele,Yes, I know these examples exist. I thought there has been one put together that addresses the Runner agnostic coding specifically without doing any extra work.A true "unified" example.Did you solve your State issue? I had the same questions sometime ago.For now, I use Redis to persist run-time state. Kinda poor man's way :-) works for now, but doesn't scale as I want it.Cheers

      From: Emanuele Cesena <em...@shopkick.com>
 To: user@beam.incubator.apache.org; amir bahmanyari <am...@yahoo.com> 
 Sent: Monday, July 25, 2016 3:18 PM
 Subject: Re: Example: pass Runner at command line
   
Hi Amir,

If you’re looking for a runner-independent example, you can find some in examples/.

If you’re looking for runner-independent code that works on Flink, I think you should still wait a few iterations.

Or, you can take code that refers to Flink, and remove the dependencies yourself.

For instance, if you start from this example:
https://github.com/ecesena/beam-starter/blob/master/src/main/java/com/dataradiant/beam/examples/WordCount.java

You can remove the setRunner by passing -Drunner=FlinkRunner in pom.xml
You can remove FlinkPipelineOptions but you have to implement yourself some of the getter/setter required by Flink, e.g. jobName:
https://github.com/apache/incubator-beam/blob/master/runners/flink/runner/src/main/java/org/apache/beam/runners/flink/FlinkPipelineOptions.java

Hope this helps,


> On Jul 25, 2016, at 2:59 PM, amir bahmanyari <am...@yahoo.com> wrote:
> 
> Hi Colleagues,
> Is there a simple genetic example where the Runner is passed at the command line, the Beam code sets it in the generic Beam Options.set Runner(), and Pipeline.create() is?
> No mention of  ANY specific Runner in the code like FlinkPipelineOptions .
> 
> Thanks.
> Amir-

-- 
Emanuele Cesena, Data Eng.
http://www.shopkick.com

Il corpo non ha ideali





  

Re: Example: pass Runner at command line

Posted by Emanuele Cesena <em...@shopkick.com>.
Hi Amir,

If you’re looking for a runner-independent example, you can find some in examples/.

If you’re looking for runner-independent code that works on Flink, I think you should still wait a few iterations.

Or, you can take code that refers to Flink, and remove the dependencies yourself.

For instance, if you start from this example:
https://github.com/ecesena/beam-starter/blob/master/src/main/java/com/dataradiant/beam/examples/WordCount.java

You can remove the setRunner by passing -Drunner=FlinkRunner in pom.xml
You can remove FlinkPipelineOptions but you have to implement yourself some of the getter/setter required by Flink, e.g. jobName:
https://github.com/apache/incubator-beam/blob/master/runners/flink/runner/src/main/java/org/apache/beam/runners/flink/FlinkPipelineOptions.java

Hope this helps,


> On Jul 25, 2016, at 2:59 PM, amir bahmanyari <am...@yahoo.com> wrote:
> 
> Hi Colleagues,
> Is there a simple genetic example where the Runner is passed at the command line, the Beam code sets it in the generic Beam Options.set Runner(), and Pipeline.create() is?
> No mention of  ANY specific Runner in the code like FlinkPipelineOptions .
> 
> Thanks.
> Amir-

-- 
Emanuele Cesena, Data Eng.
http://www.shopkick.com

Il corpo non ha ideali