Posted to user@beam.apache.org by Valentyn Tymofieiev <va...@google.com> on 2022/03/22 00:39:33 UTC

Re: [Python] Heterogeneous TaggedOutput Type Hints

I came across this thread and wasn't able to reproduce the `expecting a KV
coder, but had Strings` error, so hopefully that's fixed now. I had to
modify the repro to add .with_outputs() to line 49 of
https://gist.github.com/egalpin/2d6ad2210cf9f66108ff48a9c7566ebc
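
For anyone landing on this thread later, the change described above amounts
to making the ParDo a multi-output transform. A minimal sketch (the DoFn,
tag names, and element types are illustrative, not taken from the gist):

    import apache_beam as beam


    class SplitFn(beam.DoFn):
        # Emit (str, int) pairs to the 'kvs' tag and everything else to the
        # main output.
        def process(self, element):
            if isinstance(element, int):
                yield beam.pvalue.TaggedOutput('kvs', (str(element), element))
            else:
                yield element


    with beam.Pipeline() as p:
        results = (
            p
            | beam.Create([1, 'a', 2, 'b'])
            # .with_outputs() makes this a multi-output ParDo and returns a
            # DoOutputsTuple instead of a single PCollection.
            | beam.ParDo(SplitFn()).with_outputs('kvs', main='strings'))
        grouped = results.kvs | beam.GroupByKey()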

On Mon, Sep 27, 2021 at 5:58 PM Robert Bradshaw <ro...@google.com> wrote:

> As a workaround, can you try passing the use_portable_job_submission
> experiment?
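>
> For reference, a sketch of passing that experiment from Python (the
> experiment name comes from the suggestion above; the surrounding options
> are illustrative):
>
>     from apache_beam.options.pipeline_options import PipelineOptions
>
>     # Add the experiment alongside the usual Dataflow options.
>     options = PipelineOptions([
>         '--runner=DataflowRunner',
>         '--experiments=use_portable_job_submission',
>         # ... plus --project, --region, --temp_location, etc.
>     ])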
>
> On Mon, Sep 27, 2021 at 2:19 PM Luke Cwik <lc...@google.com> wrote:
> >
> > Sorry, I forgot that you had a minimal repro for this issue; I attached
> details to the internal bug.
> >
> > On Mon, Sep 27, 2021 at 2:18 PM Luke Cwik <lc...@google.com> wrote:
> >>
> >> There is an internal bug 195053987 that matches what you're describing,
> but we were unable to get a minimal repro for it. If you have a minimal
> repro for the issue, it would be useful so that I could update the internal
> bug with details; you could also reach out to GCP support with job IDs
> and/or minimal repros to get support as well.
> >>
> >> On Wed, Sep 22, 2021 at 6:57 AM Evan Galpin <ev...@gmail.com>
> wrote:
> >>>
> >>> Thanks for the response Luke :-)
> >>>
> >>> I did try setting <pcoll>.element_type for each resulting PCollection
> using "apache_beam.typehints.typehints.KV" to describe the elements, which
> passed type checking.  I also ran the full dataset (batch job) without the
> GBK in question, instead using a dummy DoFn in its place (sketched below)
> which asserted that every element headed into the GBK was a 2-tuple, along
> with --runtime_type_check; all of that ran successfully without the GBK
> after the TaggedOutput DoFn.
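>
> Such a stand-in might look roughly like this (names are illustrative):
>
>     import apache_beam as beam
>
>     class AssertIsKV(beam.DoFn):
>         # Placed where the GBK would go: check each element is a 2-tuple.
>         def process(self, element):
>             assert isinstance(element, tuple) and len(element) == 2, element
>             yield element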
> >>>
> >>> Adding back the GBK also runs end-to-end successfully on the
> DirectRunner using the identical dataset.  But when I add the GBK and use
> the DataflowRunner (v2), I get errors as soon as the optimized step
> involving the GBK enters the "running" status:
> >>>
> >>> - "Could not start worker docker container"
> >>> - "Error syncing pod"
> >>> - "Check failed: pair_coder Strings" or "Check failed: kv_coder :
> expecting a KV coder, but had Strings"
> >>>
> >>> Anything further to try? I can also provide Job IDs from Dataflow if
> helpful (and safe to share).
> >>>
> >>> Thanks,
> >>> Evan
> >>>
> >>> On Wed, Sep 22, 2021 at 1:09 AM Luke Cwik <lc...@google.com> wrote:
> >>>>
> >>>> Have you tried setting the element_type[1] explicitly on each output
> PCollection that is returned after applying the multi-output ParDo?
> >>>> I believe you'll get a DoOutputsTuple[2] returned after applying the
> multi-output ParDo, which allows access to the underlying PCollection objects.
> >>>>
> >>>> 1:
> https://github.com/apache/beam/blob/ebf2aacf37b97fc85b167271f184f61f5b06ddc3/sdks/python/apache_beam/pvalue.py#L99
> >>>> 2:
> https://github.com/apache/beam/blob/ebf2aacf37b97fc85b167271f184f61f5b06ddc3/sdks/python/apache_beam/pvalue.py#L234
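>
> A sketch of what that could look like (the routing function, tags, and
> element types are illustrative):
>
>     import apache_beam as beam
>     from apache_beam import typehints
>
>     def route(element):
>         if isinstance(element, int):
>             yield beam.pvalue.TaggedOutput('kvs', (str(element), element))
>         else:
>             yield element
>
>     with beam.Pipeline() as p:
>         outputs = (
>             p
>             | beam.Create([1, 'a', 2])
>             | beam.FlatMap(route).with_outputs('kvs', main='rest'))
>         # outputs is a DoOutputsTuple; each tag is its own PCollection,
>         # whose element_type can be set explicitly.
>         outputs.kvs.element_type = typehints.KV[str, int]
>         outputs.rest.element_type = str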
> >>>>
> >>>> On Tue, Sep 21, 2021 at 10:29 AM Evan Galpin <ev...@gmail.com>
> wrote:
> >>>>>
> >>>>> This is badly plaguing a pipeline I'm currently developing, where
> the exact same data set and code runs end-to-end on DirectRunner, but fails
> on DataflowRunner with either "Check failed: kv_coder : expecting a KV
> coder, but had Strings" or "Check failed: pair_coder Strings" hidden in the
> harness logs. It seems to be consistently reproducible with any TaggedOutput
> followed by a GBK.
> >>>>>
> >>>>> Any advice on how to proceed?
> >>>>>
> >>>>> Thanks,
> >>>>> Evan
> >>>>>
> >>>>> On Fri, Sep 17, 2021 at 11:20 AM Evan Galpin <ev...@gmail.com>
> wrote:
> >>>>>>
> >>>>>> The Dataflow error logs only showed one error:  "The job
> failed because a work item has failed 4 times. Look in previous log entries
> for the cause of each one of the 4 failures. For more information, see
> https://cloud.google.com/dataflow/docs/guides/common-errors. The work
> item was attempted on these workers: beamapp-XXXX-XXXXX-kt85-harness-8k2c
> Root cause: The worker lost contact with the service."  In "Diagnostics"
> there were errors stating "Error syncing pod: Could not start worker docker
> container".  The harness logs i.e. "projects/my-project/logs/
> dataflow.googleapis.com%2Fharness" finally contained an error that looked
> suspect, which was "Check failed: kv_coder : expecting a KV coder, but had
> Strings", below[1] is a link to possibly a stacktrace or extra detail, but
> is internal to google so I don't have access.
> >>>>>>
> >>>>>> [1]
> https://symbolize.corp.google.com/r/?trace=55a197abcf56,55a197abbe33,55a197abb97e,55a197abd708,55a196d4e22f,55a196d4d8d3,55a196d4da35,55a1967ec247,55a196f62b26,55a1968969b3,55a196886613,55a19696b0e6,55a196969815,55a1969693eb,55a19696916e,55a1969653bc,55a196b0150a,55a196b04e11,55a1979fc8df,7fe7736674e7,7fe7734dc22c&map=13ddc0ac8b57640c29c5016eb26ef88e:55a1956e7000-55a197bd5010,f1c96c67b57b74a4d7050f34aca016eef674f765:7fe773660000-7fe773676dac,76b955c7af655a4c1e53b8d4aaa0255f3721f95f:7fe7734a5000-7fe7736464c4
> >>>>>>
> >>>>>> On Thu, Sep 9, 2021 at 6:46 PM Robert Bradshaw <ro...@google.com>
> wrote:
> >>>>>>>
> >>>>>>> Huh, that's strange. Yes, the exact error on the service would be
> helpful.
> >>>>>>>
> >>>>>>> On Wed, Sep 8, 2021 at 10:12 AM Evan Galpin <ev...@gmail.com>
> wrote:
> >>>>>>> >
> >>>>>>> > Thanks for the response. I've created a gist here to demonstrate
> a minimal repro:
> https://gist.github.com/egalpin/2d6ad2210cf9f66108ff48a9c7566ebc
> >>>>>>> >
> >>>>>>> > It seemed to run fine both on DirectRunner and PortableRunner
> (embedded mode), but the Dataflow v2 runner raised an error at runtime,
> seemingly associated with the Shuffle service?  I have job IDs and trace links if
> those are helpful as well.
> >>>>>>> >
> >>>>>>> > Thanks,
> >>>>>>> > Evan
> >>>>>>> >
> >>>>>>> > On Tue, Sep 7, 2021 at 4:35 PM Robert Bradshaw <
> robertwb@google.com> wrote:
> >>>>>>> >>
> >>>>>>> >> This is not yet supported. Using a union for now is the way to
> go. (If
> >>>>>>> >> only the last value of the union was used, that sounds like a
> bug. Do
> >>>>>>> >> you have a minimal repro?)
> >>>>>>> >>
> >>>>>>> >> On Tue, Sep 7, 2021 at 1:23 PM Evan Galpin <
> evan.galpin@gmail.com> wrote:
> >>>>>>> >> >
> >>>>>>> >> > Hi all,
> >>>>>>> >> >
> >>>>>>> >> > What is the recommended way to write type hints for a tagged
> output DoFn where the outputs to different tags have different types?
> >>>>>>> >> >
> >>>>>>> >> > I tried using a Union to describe each of the possible output
> types, but that resulted in mismatched coder errors where only the last
> entry in the Union was used as the assumed type.  Is there a way to
> associate a type hint with a tag or something like that?
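>
> For reference, a minimal sketch of the Union approach described above
> (types and tag names are illustrative):
>
>     from typing import Tuple, Union
>
>     import apache_beam as beam
>
>     @beam.typehints.with_output_types(Union[Tuple[str, int], str])
>     class FanOutFn(beam.DoFn):
>         def process(self, element):
>             if isinstance(element, int):
>                 yield beam.pvalue.TaggedOutput('kvs', (str(element), element))
>             else:
>                 yield element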
> >>>>>>> >> >
> >>>>>>> >> > Thanks,
> >>>>>>> >> > Evan
>

Re: [Python] Heterogeneous TaggedOutput Type Hints

Posted by Evan Galpin <ev...@gmail.com>.
Thanks for the update!  I also was not able to repro, so presumably
something is fixed? :-)

Thanks,
Evan

On Mon, Mar 21, 2022 at 8:40 PM Valentyn Tymofieiev <va...@google.com>
wrote:

> I came across this thread and wasn't able to reproduce the `expecting a KV
> coder, but had Strings` error, so hopefully that's fixed now. I had to
> modify the repro to add .with_outputs() to the line 49 in
> https://gist.github.com/egalpin/2d6ad2210cf9f66108ff48a9c7566ebc