You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@beam.apache.org by "Joshua B. Harrison" <jo...@gmail.com> on 2020/03/30 19:00:44 UTC

Implementing type hints on multi-output PTransforms

Hey all,

I brought up an issue recently on the user forums noting issues around type
hints and multi-output PTransforms:
https://lists.apache.org/thread.html/r94bf2e43f09a290dbe87d5a8d7eedb34ea215e0bea861521cbdb0c1c%40%3Cuser.beam.apache.org%3E

As mentioned there, I think that a NotImplementedError should be raised
when attaching type hints to multi-output PTransforms while the correct
implementation is figured out. And that a 'correct' implementation would
look something like the Union typehints that are expected on multi-input
PTransforms.

I am happy to help out and wanted to get the discussion started around what
the community would like to see here. Thank you all for a great product.

Best,
Joshua

-- 
Joshua Harrison |  Software Engineer |  joshharrison@gmail.com
<jo...@google.com> |  404-433-0242

Re: Implementing type hints on multi-output PTransforms

Posted by Udi Meiri <eh...@google.com>.
On Tue, Mar 31, 2020 at 10:16 AM Joshua B. Harrison <jo...@gmail.com>
wrote:

> Ok - that makes sense. My specific workaround was to remove the
> with_output_types for now, so advising the user on this in the error
> message would be nice. I was just worried about silently passing.
>
> As for the formalization:
>
>    1. I am a little confused on how this is different than passing
>    multiple tagged inputs to a PTransform that does a CoGroupBy*. In this
>    case, with_input_types seems to expect a union of all the types for the
>    keyed values. Why would the same not work for output types?
>
> Since the outputs are distinct PCollections (as in the example
in BEAM-4132) they each have their own element type. If one of these
PCollections is then passed as an input to a transform, our type checking
is more precise if we use the type of just that pcoll instead of the union
of all pcoll types returned.

>
>    1. What is the process for proposing a formalized solution? Should I
>    start a document, or does one already exist? Or does this kind of thing get
>    tracked via Jira issues?
>
> In this case I think an email titled "[PROPSAL] ..." to this mailing list
describing what you want to change should be enough. A document could also
work; I'm not aware of one that touches on this.
Your PR would need to have an associated JIRA (feel free to take over any
of the ones I've mentioned).


> Best,
> Joshua
>
> On Tue, Mar 31, 2020 at 11:07 AM Udi Meiri <eh...@google.com> wrote:
>
>> Hi Joshua,
>> I've been working on type hints recently.
>> Your issue is similar to this:
>> https://issues.apache.org/jira/browse/BEAM-8782 (exactly the same if
>> tags are passed to with_outputs() in the example).
>> There's also this related bug about type inference:
>> https://issues.apache.org/jira/browse/BEAM-4132
>>
>> I agree with Luke that it would be helpful to point to a workaround in
>> the error message (such as removing with_output_types).
>>
>> From what I remember, we'll need to formalize how multi-output type hints
>> are provided to Beam.
>> For example, by passing keywords to with_output_types: main=type,
>> TAG=type, etc.
>>
>> On Tue, Mar 31, 2020 at 9:55 AM Luke Cwik <lc...@google.com> wrote:
>>
>>> I can see that argument but what does a user need to do in this case if
>>> we raise NotImplementedError? Would the need to disable type checking
>>> everywhere?
>>>
>>> Over the long term users will need to deal with improvements to type
>>> checking and will need to fix typing errors when they change Apache Beam
>>> versions.
>>>
>>>
>>> On Tue, Mar 31, 2020 at 9:34 AM Joshua B. Harrison <
>>> josh.harrison@gmail.com> wrote:
>>>
>>>> The current code errors out with a cryptic message around tag types in
>>>> the multi-output. Adding a NotImplementedError was just an attempt to make
>>>> the failure reason more clear.
>>>>
>>>> I would be worried about trivially passing because then the user might
>>>> think they have type checking safety when they don't, which could cause
>>>> failures at later stages and might be hard to debug. Do you agree?
>>>>
>>>> Best,
>>>> Joshua
>>>>
>>>> On Tue, Mar 31, 2020 at 10:16 AM Luke Cwik <lc...@google.com> wrote:
>>>>
>>>>> Would the NotImplementedError cause users pipeline errors or is that a
>>>>> signal to the type checking mechanism to ignore it?
>>>>> If this would cause failures I would rather make the unsupported case
>>>>> return something that would be trivially true.
>>>>>
>>>>> On Mon, Mar 30, 2020 at 12:01 PM Joshua B. Harrison <
>>>>> josh.harrison@gmail.com> wrote:
>>>>>
>>>>>> Hey all,
>>>>>>
>>>>>> I brought up an issue recently on the user forums noting issues
>>>>>> around type hints and multi-output PTransforms:
>>>>>> https://lists.apache.org/thread.html/r94bf2e43f09a290dbe87d5a8d7eedb34ea215e0bea861521cbdb0c1c%40%3Cuser.beam.apache.org%3E
>>>>>>
>>>>>> As mentioned there, I think that a NotImplementedError should be
>>>>>> raised when attaching type hints to multi-output PTransforms while the
>>>>>> correct implementation is figured out. And that a 'correct' implementation
>>>>>> would look something like the Union typehints that are expected on
>>>>>> multi-input PTransforms.
>>>>>>
>>>>>> I am happy to help out and wanted to get the discussion started
>>>>>> around what the community would like to see here. Thank you all for a great
>>>>>> product.
>>>>>>
>>>>>> Best,
>>>>>> Joshua
>>>>>>
>>>>>> --
>>>>>> Joshua Harrison |  Software Engineer |  joshharrison@gmail.com
>>>>>> <jo...@google.com> |  404-433-0242 <(404)%20433-0242>
>>>>>>
>>>>>
>>>>
>>>> --
>>>> Joshua Harrison |  Software Engineer |  joshharrison@gmail.com
>>>> <jo...@google.com> |  404-433-0242 <(404)%20433-0242>
>>>>
>>>
>
> --
> Joshua Harrison |  Software Engineer |  joshharrison@gmail.com
> <jo...@google.com> |  404-433-0242 <(404)%20433-0242>
>

Re: Implementing type hints on multi-output PTransforms

Posted by "Joshua B. Harrison" <jo...@gmail.com>.
Ok - that makes sense. My specific workaround was to remove the
with_output_types for now, so advising the user on this in the error
message would be nice. I was just worried about silently passing.

As for the formalization:

   1. I am a little confused on how this is different than passing multiple
   tagged inputs to a PTransform that does a CoGroupBy*. In this case,
   with_input_types seems to expect a union of all the types for the keyed
   values. Why would the same not work for output types?
   2. What is the process for proposing a formalized solution? Should I
   start a document, or does one already exist? Or does this kind of thing get
   tracked via Jira issues?

Best,
Joshua

On Tue, Mar 31, 2020 at 11:07 AM Udi Meiri <eh...@google.com> wrote:

> Hi Joshua,
> I've been working on type hints recently.
> Your issue is similar to this:
> https://issues.apache.org/jira/browse/BEAM-8782 (exactly the same if tags
> are passed to with_outputs() in the example).
> There's also this related bug about type inference:
> https://issues.apache.org/jira/browse/BEAM-4132
>
> I agree with Luke that it would be helpful to point to a workaround in the
> error message (such as removing with_output_types).
>
> From what I remember, we'll need to formalize how multi-output type hints
> are provided to Beam.
> For example, by passing keywords to with_output_types: main=type,
> TAG=type, etc.
>
> On Tue, Mar 31, 2020 at 9:55 AM Luke Cwik <lc...@google.com> wrote:
>
>> I can see that argument but what does a user need to do in this case if
>> we raise NotImplementedError? Would the need to disable type checking
>> everywhere?
>>
>> Over the long term users will need to deal with improvements to type
>> checking and will need to fix typing errors when they change Apache Beam
>> versions.
>>
>>
>> On Tue, Mar 31, 2020 at 9:34 AM Joshua B. Harrison <
>> josh.harrison@gmail.com> wrote:
>>
>>> The current code errors out with a cryptic message around tag types in
>>> the multi-output. Adding a NotImplementedError was just an attempt to make
>>> the failure reason more clear.
>>>
>>> I would be worried about trivially passing because then the user might
>>> think they have type checking safety when they don't, which could cause
>>> failures at later stages and might be hard to debug. Do you agree?
>>>
>>> Best,
>>> Joshua
>>>
>>> On Tue, Mar 31, 2020 at 10:16 AM Luke Cwik <lc...@google.com> wrote:
>>>
>>>> Would the NotImplementedError cause users pipeline errors or is that a
>>>> signal to the type checking mechanism to ignore it?
>>>> If this would cause failures I would rather make the unsupported case
>>>> return something that would be trivially true.
>>>>
>>>> On Mon, Mar 30, 2020 at 12:01 PM Joshua B. Harrison <
>>>> josh.harrison@gmail.com> wrote:
>>>>
>>>>> Hey all,
>>>>>
>>>>> I brought up an issue recently on the user forums noting issues around
>>>>> type hints and multi-output PTransforms:
>>>>> https://lists.apache.org/thread.html/r94bf2e43f09a290dbe87d5a8d7eedb34ea215e0bea861521cbdb0c1c%40%3Cuser.beam.apache.org%3E
>>>>>
>>>>> As mentioned there, I think that a NotImplementedError should be
>>>>> raised when attaching type hints to multi-output PTransforms while the
>>>>> correct implementation is figured out. And that a 'correct' implementation
>>>>> would look something like the Union typehints that are expected on
>>>>> multi-input PTransforms.
>>>>>
>>>>> I am happy to help out and wanted to get the discussion started around
>>>>> what the community would like to see here. Thank you all for a great
>>>>> product.
>>>>>
>>>>> Best,
>>>>> Joshua
>>>>>
>>>>> --
>>>>> Joshua Harrison |  Software Engineer |  joshharrison@gmail.com
>>>>> <jo...@google.com> |  404-433-0242 <(404)%20433-0242>
>>>>>
>>>>
>>>
>>> --
>>> Joshua Harrison |  Software Engineer |  joshharrison@gmail.com
>>> <jo...@google.com> |  404-433-0242 <(404)%20433-0242>
>>>
>>

-- 
Joshua Harrison |  Software Engineer |  joshharrison@gmail.com
<jo...@google.com> |  404-433-0242

Re: Implementing type hints on multi-output PTransforms

Posted by Udi Meiri <eh...@google.com>.
Hi Joshua,
I've been working on type hints recently.
Your issue is similar to this:
https://issues.apache.org/jira/browse/BEAM-8782 (exactly the same if tags
are passed to with_outputs() in the example).
There's also this related bug about type inference:
https://issues.apache.org/jira/browse/BEAM-4132

I agree with Luke that it would be helpful to point to a workaround in the
error message (such as removing with_output_types).

From what I remember, we'll need to formalize how multi-output type hints
are provided to Beam.
For example, by passing keywords to with_output_types: main=type, TAG=type,
etc.

On Tue, Mar 31, 2020 at 9:55 AM Luke Cwik <lc...@google.com> wrote:

> I can see that argument but what does a user need to do in this case if we
> raise NotImplementedError? Would the need to disable type checking
> everywhere?
>
> Over the long term users will need to deal with improvements to type
> checking and will need to fix typing errors when they change Apache Beam
> versions.
>
>
> On Tue, Mar 31, 2020 at 9:34 AM Joshua B. Harrison <
> josh.harrison@gmail.com> wrote:
>
>> The current code errors out with a cryptic message around tag types in
>> the multi-output. Adding a NotImplementedError was just an attempt to make
>> the failure reason more clear.
>>
>> I would be worried about trivially passing because then the user might
>> think they have type checking safety when they don't, which could cause
>> failures at later stages and might be hard to debug. Do you agree?
>>
>> Best,
>> Joshua
>>
>> On Tue, Mar 31, 2020 at 10:16 AM Luke Cwik <lc...@google.com> wrote:
>>
>>> Would the NotImplementedError cause users pipeline errors or is that a
>>> signal to the type checking mechanism to ignore it?
>>> If this would cause failures I would rather make the unsupported case
>>> return something that would be trivially true.
>>>
>>> On Mon, Mar 30, 2020 at 12:01 PM Joshua B. Harrison <
>>> josh.harrison@gmail.com> wrote:
>>>
>>>> Hey all,
>>>>
>>>> I brought up an issue recently on the user forums noting issues around
>>>> type hints and multi-output PTransforms:
>>>> https://lists.apache.org/thread.html/r94bf2e43f09a290dbe87d5a8d7eedb34ea215e0bea861521cbdb0c1c%40%3Cuser.beam.apache.org%3E
>>>>
>>>> As mentioned there, I think that a NotImplementedError should be raised
>>>> when attaching type hints to multi-output PTransforms while the correct
>>>> implementation is figured out. And that a 'correct' implementation would
>>>> look something like the Union typehints that are expected on multi-input
>>>> PTransforms.
>>>>
>>>> I am happy to help out and wanted to get the discussion started around
>>>> what the community would like to see here. Thank you all for a great
>>>> product.
>>>>
>>>> Best,
>>>> Joshua
>>>>
>>>> --
>>>> Joshua Harrison |  Software Engineer |  joshharrison@gmail.com
>>>> <jo...@google.com> |  404-433-0242 <(404)%20433-0242>
>>>>
>>>
>>
>> --
>> Joshua Harrison |  Software Engineer |  joshharrison@gmail.com
>> <jo...@google.com> |  404-433-0242 <(404)%20433-0242>
>>
>

Re: Implementing type hints on multi-output PTransforms

Posted by Luke Cwik <lc...@google.com>.
I can see that argument but what does a user need to do in this case if we
raise NotImplementedError? Would the need to disable type checking
everywhere?

Over the long term users will need to deal with improvements to type
checking and will need to fix typing errors when they change Apache Beam
versions.


On Tue, Mar 31, 2020 at 9:34 AM Joshua B. Harrison <jo...@gmail.com>
wrote:

> The current code errors out with a cryptic message around tag types in the
> multi-output. Adding a NotImplementedError was just an attempt to make the
> failure reason more clear.
>
> I would be worried about trivially passing because then the user might
> think they have type checking safety when they don't, which could cause
> failures at later stages and might be hard to debug. Do you agree?
>
> Best,
> Joshua
>
> On Tue, Mar 31, 2020 at 10:16 AM Luke Cwik <lc...@google.com> wrote:
>
>> Would the NotImplementedError cause users pipeline errors or is that a
>> signal to the type checking mechanism to ignore it?
>> If this would cause failures I would rather make the unsupported case
>> return something that would be trivially true.
>>
>> On Mon, Mar 30, 2020 at 12:01 PM Joshua B. Harrison <
>> josh.harrison@gmail.com> wrote:
>>
>>> Hey all,
>>>
>>> I brought up an issue recently on the user forums noting issues around
>>> type hints and multi-output PTransforms:
>>> https://lists.apache.org/thread.html/r94bf2e43f09a290dbe87d5a8d7eedb34ea215e0bea861521cbdb0c1c%40%3Cuser.beam.apache.org%3E
>>>
>>> As mentioned there, I think that a NotImplementedError should be raised
>>> when attaching type hints to multi-output PTransforms while the correct
>>> implementation is figured out. And that a 'correct' implementation would
>>> look something like the Union typehints that are expected on multi-input
>>> PTransforms.
>>>
>>> I am happy to help out and wanted to get the discussion started around
>>> what the community would like to see here. Thank you all for a great
>>> product.
>>>
>>> Best,
>>> Joshua
>>>
>>> --
>>> Joshua Harrison |  Software Engineer |  joshharrison@gmail.com
>>> <jo...@google.com> |  404-433-0242 <(404)%20433-0242>
>>>
>>
>
> --
> Joshua Harrison |  Software Engineer |  joshharrison@gmail.com
> <jo...@google.com> |  404-433-0242 <(404)%20433-0242>
>

Re: Implementing type hints on multi-output PTransforms

Posted by "Joshua B. Harrison" <jo...@gmail.com>.
The current code errors out with a cryptic message around tag types in the
multi-output. Adding a NotImplementedError was just an attempt to make the
failure reason more clear.

I would be worried about trivially passing because then the user might
think they have type checking safety when they don't, which could cause
failures at later stages and might be hard to debug. Do you agree?

Best,
Joshua

On Tue, Mar 31, 2020 at 10:16 AM Luke Cwik <lc...@google.com> wrote:

> Would the NotImplementedError cause users pipeline errors or is that a
> signal to the type checking mechanism to ignore it?
> If this would cause failures I would rather make the unsupported case
> return something that would be trivially true.
>
> On Mon, Mar 30, 2020 at 12:01 PM Joshua B. Harrison <
> josh.harrison@gmail.com> wrote:
>
>> Hey all,
>>
>> I brought up an issue recently on the user forums noting issues around
>> type hints and multi-output PTransforms:
>> https://lists.apache.org/thread.html/r94bf2e43f09a290dbe87d5a8d7eedb34ea215e0bea861521cbdb0c1c%40%3Cuser.beam.apache.org%3E
>>
>> As mentioned there, I think that a NotImplementedError should be raised
>> when attaching type hints to multi-output PTransforms while the correct
>> implementation is figured out. And that a 'correct' implementation would
>> look something like the Union typehints that are expected on multi-input
>> PTransforms.
>>
>> I am happy to help out and wanted to get the discussion started around
>> what the community would like to see here. Thank you all for a great
>> product.
>>
>> Best,
>> Joshua
>>
>> --
>> Joshua Harrison |  Software Engineer |  joshharrison@gmail.com
>> <jo...@google.com> |  404-433-0242 <(404)%20433-0242>
>>
>

-- 
Joshua Harrison |  Software Engineer |  joshharrison@gmail.com
<jo...@google.com> |  404-433-0242

Re: Implementing type hints on multi-output PTransforms

Posted by Luke Cwik <lc...@google.com>.
Would the NotImplementedError cause users pipeline errors or is that a
signal to the type checking mechanism to ignore it?
If this would cause failures I would rather make the unsupported case
return something that would be trivially true.

On Mon, Mar 30, 2020 at 12:01 PM Joshua B. Harrison <jo...@gmail.com>
wrote:

> Hey all,
>
> I brought up an issue recently on the user forums noting issues around
> type hints and multi-output PTransforms:
> https://lists.apache.org/thread.html/r94bf2e43f09a290dbe87d5a8d7eedb34ea215e0bea861521cbdb0c1c%40%3Cuser.beam.apache.org%3E
>
> As mentioned there, I think that a NotImplementedError should be raised
> when attaching type hints to multi-output PTransforms while the correct
> implementation is figured out. And that a 'correct' implementation would
> look something like the Union typehints that are expected on multi-input
> PTransforms.
>
> I am happy to help out and wanted to get the discussion started around
> what the community would like to see here. Thank you all for a great
> product.
>
> Best,
> Joshua
>
> --
> Joshua Harrison |  Software Engineer |  joshharrison@gmail.com
> <jo...@google.com> |  404-433-0242 <(404)%20433-0242>
>