Posted to user@beam.apache.org by Dan <da...@dankeeley.co.uk> on 2019/01/30 09:00:38 UTC

Visual Beam - First demonstration - London

Hi, in just over a week you're all welcome to come and see the very first
public reveal of Kettle running on Beam! (Including Spark, Dataflow and
Flink support)

https://www.meetup.com/Pentaho-London-User-Group/events/256773962/

So this ingenious integration combines the power of visual development
with the platform-agnostic benefits of Beam - impressive stuff. No vendor
lock-in here!


See you there!
Dan

Re: Visual Beam - First demonstration - London

Posted by Ankur Goenka <go...@google.com>.
Thanks for sharing the video.


Re: Visual Beam - First demonstration - London

Posted by Dan <da...@dankeeley.co.uk>.
Here's the video. Enjoy!

https://skillsmatter.com/skillscasts/13405-how-to-run-kettle-on-apache-beam


Re: Visual Beam - First demonstration - London

Posted by Matt Casters <ma...@gmail.com>.
Thanks a lot for the tip and for looking into the code. A lot of cleanup
certainly needs to happen ;-)
The original tip for using @FinishBundle came from Maximilian and it does
indeed work like a charm.

For some reason I couldn't find the annotations for the DoFn main methods
but I really should have looked into DoFn.java a bit earlier. This whole
Kettle Beam thing started out as a bit of a side project. I didn't know it
was going to work so quickly.

But now I'm happy with how things are going. On the whole it's really awesome
to be able to drag and drop a Kettle transformation together, unit test it,
run it against the direct runner to see all is well and then run it against
bigger data sets in Dataflow or Spark. It really speeds up development in
the sense that it's quite easy to test a lot of different scenarios.

Anyway, this has indeed been a lot of fun, thanks for allowing it with the
Apache Beam project!

Cheers,
Matt


Re: Visual Beam - First demonstration - London

Posted by Kenneth Knowles <ke...@apache.org>.
I think my favorite line is "Haven’t had this much fun since I started
Kettle" :-)

I was browsing https://github.com/mattcasters/kettle-beam/ to see if I
could comment on how to take advantage of bundling (where to move expensive
logic to @FinishBundle - you will notice that Beam's IO connectors use this).

I noticed that the core DoFns are defined in org.kettle.beam.core which is
in the separate https://github.com/mattcasters/kettle-beam-core/
repository. JFYI for the sake of users / code lurkers.

The only place that looked like it did the sort of work where bundling
would matter is the StepTransform. There's already a separate @FinishBundle
there - does more of the logic need to be moved there?

Kenn
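
For readers unfamiliar with the bundling idea discussed above, here is a
minimal sketch of the buffer-and-flush pattern. It is plain Java and does
not depend on the Beam SDK: the three methods only mirror the roles that
Beam's @StartBundle, @ProcessElement and @FinishBundle DoFn annotations
mark. The class name, the batch size of 3 and the in-memory "sink" are
assumptions made for illustration; this is not the code in kettle-beam-core.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Plain-Java stand-in for a bundle-aware writer. Nothing here uses Beam;
 * the method names only echo the DoFn lifecycle for illustration.
 */
public class BundleBufferSketch {
    // Hypothetical batch size; a real step would make this configurable.
    private static final int MAX_BATCH_SIZE = 3;

    private final List<String> buffer = new ArrayList<>();
    // Records every batch "written" so the behaviour can be inspected.
    private final List<List<String>> writtenBatches = new ArrayList<>();

    /** Called once before the first element of a bundle. */
    public void startBundle() {
        buffer.clear();
    }

    /** Called per element: cheap buffering, flush only when the batch is full. */
    public void processElement(String row) {
        buffer.add(row);
        if (buffer.size() >= MAX_BATCH_SIZE) {
            flush();
        }
    }

    /** Called once after the last element of a bundle: flush the remainder. */
    public void finishBundle() {
        flush();
    }

    // The expensive operation (for example a bulk insert) would go here.
    private void flush() {
        if (!buffer.isEmpty()) {
            writtenBatches.add(new ArrayList<>(buffer));
            buffer.clear();
        }
    }

    public List<List<String>> getWrittenBatches() {
        return writtenBatches;
    }

    public static void main(String[] args) {
        BundleBufferSketch writer = new BundleBufferSketch();
        writer.startBundle();
        for (String row : new String[] {"a", "b", "c", "d", "e"}) {
            writer.processElement(row);
        }
        writer.finishBundle();
        System.out.println(writer.getWrittenBatches()); // prints [[a, b, c], [d, e]]
    }
}
```

The point of the pattern is that the per-element method stays cheap, while
the costly write happens once per full batch and once more at the end of
the bundle, so no buffered rows are left behind.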


Re: Visual Beam - First demonstration - London

Posted by Maximilian Michels <mx...@apache.org>.
Yes, you can use Flink's local execution mode, which is the default if 
you don't provide any settings. A cluster should not be necessary to 
complete the integration. Ideally, it should work out of the box :)

> However, I'm first trying to solve the complicated issue of grouping records together in Beam in a safe way so that they can be batched up

I'm not sure what your use case is but Beam does batching by default. 
The batches are called bundles. The Flink Runner supports setting the 
bundle size.

Cheers,
Max


Re: Visual Beam - First demonstration - London

Posted by Matt Casters <ma...@gmail.com>.
Yes, Flink is obviously the next target.  I'm not expecting too many issues
there beyond getting a cluster set up to test on.  I read you can run the
Flink Runner locally so that will help a lot in testing.

However, I'm first trying to solve the complicated issue of grouping
records together in Beam in a safe way so that they can be batched up.
Batching up is really important for fast loading into a lot of output
targets. I'll probably use some group by behind the scenes or something
like that; I need to think about it.
Having the ability to re-use the existing Kettle steps without having to
write new code is really key.

Once that is done (in a few weeks) I'll give Flink a shot.

Cheers,

Matt
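
To make the batching requirement above concrete, here is a small sketch
that splits a list of records into fixed-size batches, which is the shape
of input a bulk loader typically wants. It is plain Java with no Beam
dependency; the class name, method name and batch size are hypothetical.
Beam itself ships a GroupIntoBatches transform for keyed input, but this
sketch does not use it and makes no claim about how Kettle Beam solved the
problem.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class BatchPartitionSketch {
    /** Splits rows into consecutive batches of at most batchSize elements. */
    public static <T> List<List<T>> partition(List<T> rows, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < rows.size(); start += batchSize) {
            int end = Math.min(start + batchSize, rows.size());
            // Copy the slice so each batch is independent of the input list.
            batches.add(new ArrayList<>(rows.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Integer> rows = Arrays.asList(1, 2, 3, 4, 5, 6, 7);
        System.out.println(partition(rows, 3)); // prints [[1, 2, 3], [4, 5, 6], [7]]
    }
}
```

In a distributed runner the records are not available as one list, which
is why the grouping has to be done "in a safe way": each worker can only
batch what it sees, and the last, smaller batch must still be written.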


Re: Visual Beam - First demonstration - London

Posted by Maximilian Michels <mx...@apache.org>.
@Dan: Thanks for sharing the presentation. Kettle is a great way to make 
Beam more accessible.

@Matt: Thanks for the plug. It's good to hear you enjoyed it. I think 
the link to your slides got messed up: http://beam.kettle.be

Are you planning to add execution via the Flink Runner to Kettle? Saw in 
the presentation that you already support Direct, Spark, and Dataflow.


Re: Visual Beam - First demonstration - London

Posted by Matt Casters <ma...@gmail.com>.
By the way, Maximilian, I linked and plugged your wonderful FOSDEM
presentation in my slides http://beam kettle.be slide 19. If you mind, let
me know and I'll get it out of the slides. In any case, great content worth
promoting I thought.


Re: Visual Beam - First demonstration - London

Posted by Maximilian Michels <mx...@apache.org>.
Hi Dan,

Thanks for the info. Would be great to share a video of the presentation.

Cheers,
Max
