Posted to dev@beam.apache.org by Ahmet Altay <al...@google.com> on 2019/05/30 00:55:43 UTC

[DISCUSS] Cookbooks for users with knowledge in other frameworks

Hi all,

Inspired by the user asking about a Spark feature in Beam [1] in the
release thread, I searched the user@ list and noticed a few instances of
people asking questions like "I can do X in Spark, how can I do that in
Beam?" Would it make sense to add documentation that explains how certain
tasks can be accomplished in Beam, with side-by-side examples of doing
the same task in Beam, Spark, etc.? It could help with onboarding because
it will be easier for people to leverage their existing knowledge. It could
also help other frameworks, because it will serve as a Rosetta
stone with two translations.

Questions I have are:
- Would such a thing be helpful?
- Is it feasible? Would a few pages' worth of examples cover enough use
cases?

Thank you!
Ahmet

[1]
https://lists.apache.org/thread.html/b73a54aa1e6e9933628f177b04a8f907c26cac854745fa081c478eff@%3Cdev.beam.apache.org%3E

Re: [DISCUSS] Cookbooks for users with knowledge in other frameworks

Posted by Alexey Romanenko <ar...@gmail.com>.
+1, sounds good to me!

A while ago, I was also thinking along similar lines: some kind of FAQ/Cookbook would potentially be helpful for users, since from time to time I see similar questions on user@, Slack and SO.
Later, we could extend it and add examples/solutions for other topics as well.
So, I’m in =) and can help with Spark examples.



Re: [DISCUSS] Cookbooks for users with knowledge in other frameworks

Posted by Maximilian Michels <mx...@apache.org>.
Sounds like a good idea. I think the same can be done for Flink; Flink's and Spark's APIs are similar to a large degree.

Here is also a link to the transforms: https://ci.apache.org/projects/flink/flink-docs-release-1.8/dev/stream/operators/

-Max



Re: [DISCUSS] Cookbooks for users with knowledge in other frameworks

Posted by Ahmet Altay <al...@google.com>.
Thank you for the feedback so far. It seems like this will be generally
helpful :)

I guess the next step is to ask: would anyone be interested in working in
this area? We can potentially break this down into starter tasks.

On Sat, Jun 1, 2019 at 7:00 PM Ankur Goenka <go...@google.com> wrote:

> +1 for the proposal.
> The Capability Matrix
> <https://beam.apache.org/documentation/runners/capability-matrix/> can be
> a good place to showcase parity between different runners.
>

+1


> Do you think we should write two-way examples [Spark, Flink, ..]<=>Beam?
>

Both ways would be most useful, I believe.


>
>
>
> On Sat, Jun 1, 2019 at 4:31 PM Reza Rokni <re...@google.com> wrote:
>
>> For layer 1, what about working through this link as a starting point?
>> https://spark.apache.org/docs/latest/rdd-programming-guide.html#transformations
>>
>
+1



Re: [DISCUSS] Cookbooks for users with knowledge in other frameworks

Posted by Ankur Goenka <go...@google.com>.
+1 for the proposal.
The Capability Matrix
<https://beam.apache.org/documentation/runners/capability-matrix/> can be a
good place to showcase parity between different runners.
Do you think we should write two-way examples [Spark, Flink, ..]<=>Beam?




Re: [DISCUSS] Cookbooks for users with knowledge in other frameworks

Posted by Reza Rokni <re...@google.com>.
For layer 1, what about working through this link as a starting point?
https://spark.apache.org/docs/latest/rdd-programming-guide.html#transformations


-- 

This email may be confidential and privileged. If you received this
communication by mistake, please don't forward it to anyone else, please
erase all copies and attachments, and please let me know that it has gone
to the wrong person.

The above terms reflect a potential business arrangement, are provided
solely as a basis for further discussion, and are not intended to be and do
not constitute a legally binding obligation. No legally binding obligations
will be created, implied, or inferred until an agreement in final form is
executed in writing by all parties involved.

Re: [DISCUSS] Cookbooks for users with knowledge in other frameworks

Posted by Ahmet Altay <al...@google.com>.
Thank you Reza. That separation makes sense to me.

On Wed, May 29, 2019 at 6:26 PM Reza Rokni <re...@google.com> wrote:

> +1
>
> I think there will be at least two layers to this:
>
> Layer 1 - Using primitives: I do a join, GBK (GroupByKey), aggregation...
> in system X this way; what is the canonical equivalent in Beam?
> Layer 2 - Patterns: I read and join unbounded and bounded data in system X
> this way; what is the canonical equivalent in Beam?
>
> I suspect that, as a first pass, Layer 1 is reasonably well-bounded work.
> There would need to be agreement on the "canonical" version of how to do
> something in Beam, as this could be seen as opinionated: there are often a
> multitude of ways of doing X.
>

Once we identify a set of layer 1 items, we could crowdsource the
canonical implementations. I believe we can use our usual code review
process to settle on a version that is agreeable. (Examples have the same
issue: they are probably opinionated today, reflecting their authors, but
it works out.)



Re: [DISCUSS] Cookbooks for users with knowledge in other frameworks

Posted by Reza Rokni <re...@google.com>.
+1

I think there will be at least two layers to this:

Layer 1 - Using primitives: I do a join, GBK (GroupByKey), aggregation...
in system X this way; what is the canonical equivalent in Beam?
Layer 2 - Patterns: I read and join unbounded and bounded data in system X
this way; what is the canonical equivalent in Beam?

I suspect that, as a first pass, Layer 1 is reasonably well-bounded work.
There would need to be agreement on the "canonical" version of how to do
something in Beam, as this could be seen as opinionated: there are often a
multitude of ways of doing X.



