You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@beam.apache.org by Udi Meiri <eh...@google.com> on 2020/04/13 22:38:17 UTC

Re: Implementing type hints on multi-output PTransforms

On Tue, Mar 31, 2020 at 10:16 AM Joshua B. Harrison <jo...@gmail.com>
wrote:

> Ok - that makes sense. My specific workaround was to remove the
> with_output_types for now, so advising the user on this in the error
> message would be nice. I was just worried about silently passing.
>
> As for the formalization:
>
>    1. I am a little confused on how this is different than passing
>    multiple tagged inputs to a PTransform that does a CoGroupBy*. In this
>    case, with_input_types seems to expect a union of all the types for the
>    keyed values. Why would the same not work for output types?
>
> Since the outputs are distinct PCollections (as in the example
in BEAM-4132) they each have their own element type. If one of these
PCollections is then passed as an input to a transform, our type checking
is more precise if we use the type of just that pcoll instead of the union
of all pcoll types returned.

>
>    1. What is the process for proposing a formalized solution? Should I
>    start a document, or does one already exist? Or does this kind of thing get
>    tracked via Jira issues?
>
> In this case I think an email titled "[PROPSAL] ..." to this mailing list
describing what you want to change should be enough. A document could also
work; I'm not aware of one that touches on this.
Your PR would need to have an associated JIRA (feel free to take over any
of the ones I've mentioned).


> Best,
> Joshua
>
> On Tue, Mar 31, 2020 at 11:07 AM Udi Meiri <eh...@google.com> wrote:
>
>> Hi Joshua,
>> I've been working on type hints recently.
>> Your issue is similar to this:
>> https://issues.apache.org/jira/browse/BEAM-8782 (exactly the same if
>> tags are passed to with_outputs() in the example).
>> There's also this related bug about type inference:
>> https://issues.apache.org/jira/browse/BEAM-4132
>>
>> I agree with Luke that it would be helpful to point to a workaround in
>> the error message (such as removing with_output_types).
>>
>> From what I remember, we'll need to formalize how multi-output type hints
>> are provided to Beam.
>> For example, by passing keywords to with_output_types: main=type,
>> TAG=type, etc.
>>
>> On Tue, Mar 31, 2020 at 9:55 AM Luke Cwik <lc...@google.com> wrote:
>>
>>> I can see that argument but what does a user need to do in this case if
>>> we raise NotImplementedError? Would the need to disable type checking
>>> everywhere?
>>>
>>> Over the long term users will need to deal with improvements to type
>>> checking and will need to fix typing errors when they change Apache Beam
>>> versions.
>>>
>>>
>>> On Tue, Mar 31, 2020 at 9:34 AM Joshua B. Harrison <
>>> josh.harrison@gmail.com> wrote:
>>>
>>>> The current code errors out with a cryptic message around tag types in
>>>> the multi-output. Adding a NotImplementedError was just an attempt to make
>>>> the failure reason more clear.
>>>>
>>>> I would be worried about trivially passing because then the user might
>>>> think they have type checking safety when they don't, which could cause
>>>> failures at later stages and might be hard to debug. Do you agree?
>>>>
>>>> Best,
>>>> Joshua
>>>>
>>>> On Tue, Mar 31, 2020 at 10:16 AM Luke Cwik <lc...@google.com> wrote:
>>>>
>>>>> Would the NotImplementedError cause users pipeline errors or is that a
>>>>> signal to the type checking mechanism to ignore it?
>>>>> If this would cause failures I would rather make the unsupported case
>>>>> return something that would be trivially true.
>>>>>
>>>>> On Mon, Mar 30, 2020 at 12:01 PM Joshua B. Harrison <
>>>>> josh.harrison@gmail.com> wrote:
>>>>>
>>>>>> Hey all,
>>>>>>
>>>>>> I brought up an issue recently on the user forums noting issues
>>>>>> around type hints and multi-output PTransforms:
>>>>>> https://lists.apache.org/thread.html/r94bf2e43f09a290dbe87d5a8d7eedb34ea215e0bea861521cbdb0c1c%40%3Cuser.beam.apache.org%3E
>>>>>>
>>>>>> As mentioned there, I think that a NotImplementedError should be
>>>>>> raised when attaching type hints to multi-output PTransforms while the
>>>>>> correct implementation is figured out. And that a 'correct' implementation
>>>>>> would look something like the Union typehints that are expected on
>>>>>> multi-input PTransforms.
>>>>>>
>>>>>> I am happy to help out and wanted to get the discussion started
>>>>>> around what the community would like to see here. Thank you all for a great
>>>>>> product.
>>>>>>
>>>>>> Best,
>>>>>> Joshua
>>>>>>
>>>>>> --
>>>>>> Joshua Harrison |  Software Engineer |  joshharrison@gmail.com
>>>>>> <jo...@google.com> |  404-433-0242 <(404)%20433-0242>
>>>>>>
>>>>>
>>>>
>>>> --
>>>> Joshua Harrison |  Software Engineer |  joshharrison@gmail.com
>>>> <jo...@google.com> |  404-433-0242 <(404)%20433-0242>
>>>>
>>>
>
> --
> Joshua Harrison |  Software Engineer |  joshharrison@gmail.com
> <jo...@google.com> |  404-433-0242 <(404)%20433-0242>
>