Posted to dev@beam.apache.org by Pablo Estrada <pa...@google.com> on 2019/11/08 03:52:16 UTC

[discuss] More dimensions for the Capability Matrix

Hi all,
I think this is a relatively common question:

- Can I do X with runner Y, and SDK Z?

The answers vary significantly between SDK and runner pairs. As a result,
the current Capability Matrix falls somewhat short when potential users,
solutions architects, etc. are trying to decide whether to adopt Beam, and
which runner and SDK to use.

I think we need to put some effort into building a capability matrix that
expresses this information, and into keeping it up to date.

I would like to discuss a few things:
- Does it make sense to do this?
- If it does, what's a good way of doing it? Should we expand the existing
Capability Matrix to support SDKs as well? Or should we have a new one?
- Any other thoughts you may have about the idea.

Best
-P.

Re: [discuss] More dimensions for the Capability Matrix

Posted by Robert Bradshaw <ro...@google.com>.
On Fri, Nov 8, 2019 at 9:46 AM Brian Hulette <bh...@google.com> wrote:
>
> > Does it make sense to do this?
> I think this makes a lot of sense. Plus it's a good opportunity to refresh the UX of [1].
>
> > what's a good way of doing it? Should we expand the existing Capability Matrix to support SDKs as well? Or should we have a new one?
> To me there are two aspects to this: how we model the data, and how we present the data.
>
> For modelling the data:
> Do we need to maintain the full 3-dimensional <feature - SDK - runner> matrix? That seems untenable to me. With portability, I think the runner and SDK matrix should be completely independent, so it should be safe to just maintain <feature - SDK>, and <feature - runner> matrices and model the 3-dimensional matrix as the cross-product of the two.
> Maybe we should have a new capability matrix just for portable runners so we can exploit this property?

Yes, being able to do that is the crux of the portability work. We may
have to consider, say, "Portable Spark" and "Non-Portable Spark" to be
two separate runners and have the caveat that some runners (namely the
non-portable ones) do not work with all SDKs.

Another thing I'd really, really like to see is these matrices being
automatically populated from ValidatesRunner test attributes. For example,
you could pick a runner, run the ValidatesRunner test suite, and see what
is fully, partially, or not at all supported. This is harder to do for
SDKs, but at least you could get some signal by looking for the
existence of (passing) tests.
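A minimal Python sketch of that idea, assuming each ValidatesRunner result has already been tagged with the feature it exercises. The `(feature, test_name, passed)` tuple format, the `Support` levels, and the feature and test names are hypothetical, not an existing Beam report format; the rule used (all tests pass means full support, none pass means no support, anything else is partial) is only one possible choice.

```python
from collections import defaultdict
from enum import IntEnum


class Support(IntEnum):
    """Hypothetical support levels, ordered from weakest to strongest."""
    NO = 0
    PARTIAL = 1
    FULL = 2


def summarize(results):
    """Collapse per-test outcomes into one support level per feature.

    `results` is an iterable of (feature, test_name, passed) tuples; this
    format is an assumption, not an existing Beam report format.
    """
    passed = defaultdict(int)
    total = defaultdict(int)
    for feature, _test_name, ok in results:
        total[feature] += 1
        passed[feature] += 1 if ok else 0

    summary = {}
    for feature, count in total.items():
        if passed[feature] == count:
            summary[feature] = Support.FULL
        elif passed[feature] == 0:
            summary[feature] = Support.NO
        else:
            summary[feature] = Support.PARTIAL
    return summary


if __name__ == "__main__":
    # Illustrative results for a single runner; all names are made up.
    results = [
        ("state", "test_value_state", True),
        ("state", "test_bag_state", True),
        ("timers", "test_event_time_timer", True),
        ("timers", "test_processing_time_timer", False),
        ("splittable_dofn", "test_sdf_basic", False),
    ]
    for feature, level in summarize(results).items():
        print(feature, level.name)
    # state FULL
    # timers PARTIAL
    # splittable_dofn NO
```

Features with no tests at all simply do not appear in the summary, so for SDKs the mere presence of a feature in the output is already some of the signal mentioned above.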

> For presenting the data:
> I think there would be value in just presenting <feature - runner> (basically what we have now in [1]), and also presenting <feature - SDK> separately. The <feature - SDK> display could serve as documentation too, with examples of how to do Y in each SDK.
> Maybe there would also be value in presenting <feature - SDK - runner> in some fancy UI so an architect can quickly answer "what can I do with SDK Z on Runner X", but I'm not sure what that would look like.

I think two tables are fine. Note that with cross-language transforms, the
restrictions of an SDK become less of an issue. One could imagine a UI
that would let you select one or more SDKs and runners and
automatically populate the matrix according to the intersection.
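A rough Python sketch of what such a UI could compute, under several assumptions: support is recorded in separate <feature - SDK> and <feature - runner> tables, a combination is only as well supported as its weaker side, and a non-portable runner (like the "Non-Portable Spark" case above) lists the SDKs it accepts. All names and support values below are hypothetical placeholders, not real Beam data.

```python
from enum import IntEnum


class Support(IntEnum):
    """Hypothetical support levels; min() picks the weaker one."""
    NO = 0
    PARTIAL = 1
    FULL = 2


# Placeholder data: feature -> {SDK or runner name: Support}.
SDK_MATRIX = {
    "state": {"sdk_a": Support.FULL, "sdk_b": Support.FULL},
    "timers": {"sdk_a": Support.FULL, "sdk_b": Support.PARTIAL},
}
RUNNER_MATRIX = {
    "state": {"portable_runner": Support.FULL, "legacy_runner": Support.FULL},
    "timers": {"portable_runner": Support.PARTIAL, "legacy_runner": Support.FULL},
}
# Non-portable runners accept only the SDKs listed here; runners that are
# not listed are treated as portable and accept any SDK.
NATIVE_SDKS = {"legacy_runner": {"sdk_a"}}


def populate(sdks, runners):
    """Return {feature: Support} over every selected SDK/runner pair.

    Each feature gets the weakest level across all selected pairs; a pair
    that cannot run together at all counts as Support.NO.
    """
    if not sdks or not runners:
        return {}
    table = {}
    for feature in SDK_MATRIX.keys() & RUNNER_MATRIX.keys():
        level = Support.FULL
        for sdk in sdks:
            for runner in runners:
                allowed = NATIVE_SDKS.get(runner)
                if allowed is not None and sdk not in allowed:
                    level = Support.NO
                else:
                    level = min(level,
                                SDK_MATRIX[feature].get(sdk, Support.NO),
                                RUNNER_MATRIX[feature].get(runner, Support.NO))
        table[feature] = level
    return table


# Sorted so the output order does not depend on set iteration order.
for feature, level in sorted(populate({"sdk_b"}, {"portable_runner"}).items()):
    print(feature, level.name)
# state FULL
# timers PARTIAL
```

Selecting more SDKs or runners can only lower a feature's level, which is the "intersection" behaviour described above. Treating a missing entry as NO rather than "unknown" is an arbitrary choice in this sketch.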

> [1] https://beam.apache.org/documentation/runners/capability-matrix/
>
> On Thu, Nov 7, 2019 at 10:09 PM Thomas Weise <th...@apache.org> wrote:
>>
>> FWIW there are currently at least 2 instances of capability matrix [1] [2].
>>
>> [1] has been in need of a refresh for a while.
>>
>> [2] is more useful but only covers portable runners and is hard to find.
>>
>> Thomas
>>
>> [1] https://beam.apache.org/documentation/runners/capability-matrix/
>> [2] https://s.apache.org/apache-beam-portability-support-table

Re: [discuss] More dimensions for the Capability Matrix

Posted by Valentyn Tymofieiev <va...@google.com>.
+1. I think we should also better reflect connector capabilities (or
include them in the features), to avoid surprises like [1].

[1] https://lists.apache.org/thread.html/9e9270bfb85058e24b762790e948d8bfc558f58ef1df9e14c4e4464c@%3Cuser.beam.apache.org%3E

On Fri, Nov 8, 2019 at 10:51 AM Kenneth Knowles <ke...@apache.org> wrote:

Re: [discuss] More dimensions for the Capability Matrix

Posted by Kenneth Knowles <ke...@apache.org>.
On Fri, Nov 8, 2019 at 9:46 AM Brian Hulette <bh...@google.com> wrote:

> > Does it make sense to do this?
> I think this makes a lot of sense. Plus it's a good opportunity to refresh
> the UX of [1].
>

+1 to total UX refresh. I will advertise
https://issues.apache.org/jira/browse/BEAM-2888 which has a lot of related
ideas.


> > what's a good way of doing it? Should we expand the existing Capability
> Matrix to support SDKs as well? Or should we have a new one?
> To me there are two aspects to this: how we model the data, and how we
> present the data.
>
> For modelling the data:
> Do we need to maintain the full 3-dimensional <feature - SDK - runner>
> matrix? That seems untenable to me. With portability, I think the runner
> and SDK matrix should be completely independent, so it should be safe to
> just maintain <feature - SDK>, and <feature - runner> matrices and model
> the 3-dimensional matrix as the cross-product of the two.
> Maybe we should have a new capability matrix just for portable runners so
> we can exploit this property?
>

Agree that we should not do the full product of <# runners> times <# SDKs>.
That's the whole point of portability.

Early in the project, we deliberately did not include SDKs in the
capability matrix for philosophical reasons:

 - a runner might support/not support features based on intrinsic
properties of the underlying engine, not just immaturity
 - an SDK can have no such intrinsic reason

In practice, though, the matrix is more about maturity than intrinsic
properties. And SDKs now include a significant runtime component, in which
new model features will always take time to implement. I think we should
embrace the matrix as primarily a measure of maturity and answer Pablo's
initial question, which is what is most useful for users.

Kenn


> For presenting the data:
> I think there would be value in just presenting <feature - runner>
> (basically what we have now in [1]), and also presenting <feature - SDK>
> separately. The <feature - SDK> display could serve as documentation too,
> with examples of how to do Y in each SDK.
> Maybe there would also be value in presenting <feature - SDK - runner> in
> some fancy UI so an architect can quickly answer "what can I do with SDK Z
> on Runner X", but I'm not sure what that would look like.
>
> [1] https://beam.apache.org/documentation/runners/capability-matrix/

Re: [discuss] More dimensions for the Capability Matrix

Posted by Brian Hulette <bh...@google.com>.
> Does it make sense to do this?
I think this makes a lot of sense. Plus it's a good opportunity to refresh
the UX of [1].

> what's a good way of doing it? Should we expand the existing Capability
Matrix to support SDKs as well? Or should we have a new one?
To me there are two aspects to this: how we model the data, and how we
present the data.

For modelling the data:
Do we need to maintain the full 3-dimensional <feature - SDK - runner>
matrix? That seems untenable to me. With portability, I think the runner
and SDK matrix should be completely independent, so it should be safe to
just maintain <feature - SDK>, and <feature - runner> matrices and model
the 3-dimensional matrix as the cross-product of the two.
Maybe we should have a new capability matrix just for portable runners so
we can exploit this property?
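To make the size argument concrete, a small Python sketch that stores only the two 2-D matrices and derives the 3-dimensional one on demand. It assumes full portability (every SDK runs on every runner) and that a combination is only as well supported as its weaker side; the support levels and all SDK, runner, and feature names are placeholders. With F features, S SDKs and R runners, this maintains F * (S + R) cells instead of F * S * R.

```python
from enum import IntEnum
from itertools import product


class Support(IntEnum):
    """Hypothetical support levels; min() picks the weaker one."""
    NO = 0
    PARTIAL = 1
    FULL = 2


def cross_product(sdk_matrix, runner_matrix):
    """Derive {(feature, sdk, runner): Support} from two 2-D matrices.

    Both inputs map feature -> {name: Support}. Every SDK is assumed to
    work with every runner (full portability).
    """
    derived = {}
    for feature in sdk_matrix.keys() & runner_matrix.keys():
        sdks = sdk_matrix[feature]
        runners = runner_matrix[feature]
        for sdk, runner in product(sdks, runners):
            derived[(feature, sdk, runner)] = min(sdks[sdk], runners[runner])
    return derived


# Placeholder data: 2 features, 3 SDKs, 4 runners.
sdk_matrix = {
    "state": {"sdk_a": Support.FULL, "sdk_b": Support.PARTIAL, "sdk_c": Support.NO},
    "timers": {"sdk_a": Support.FULL, "sdk_b": Support.FULL, "sdk_c": Support.PARTIAL},
}
runner_matrix = {
    "state": {f"runner_{i}": Support.FULL for i in range(4)},
    "timers": {f"runner_{i}": Support.PARTIAL for i in range(4)},
}

maintained = sum(len(v) for v in sdk_matrix.values()) + sum(
    len(v) for v in runner_matrix.values())
derived = cross_product(sdk_matrix, runner_matrix)
print(maintained, len(derived))  # 14 24
print(derived[("state", "sdk_b", "runner_0")].name)  # PARTIAL
```

The gap grows with every SDK or runner added. The caveat raised elsewhere in this thread about non-portable runners is where the pure cross-product stops holding, since those cells would need an extra compatibility check.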

For presenting the data:
I think there would be value in just presenting <feature - runner>
(basically what we have now in [1]), and also presenting <feature - SDK>
separately. The <feature - SDK> display could serve as documentation too,
with examples of how to do Y in each SDK.
Maybe there would also be value in presenting <feature - SDK - runner> in
some fancy UI so an architect can quickly answer "what can I do with SDK Z
on Runner X", but I'm not sure what that would look like.

[1] https://beam.apache.org/documentation/runners/capability-matrix/

Re: [discuss] More dimensions for the Capability Matrix

Posted by Thomas Weise <th...@apache.org>.
FWIW there are currently at least two instances of the capability matrix [1] [2].

[1] has been in need of a refresh for a while.

[2] is more useful but only covers portable runners and is hard to find.

Thomas

[1] https://beam.apache.org/documentation/runners/capability-matrix/
[2] https://s.apache.org/apache-beam-portability-support-table
