Posted to user@beam.apache.org by Evan Galpin <eg...@apache.org> on 2023/12/12 23:24:05 UTC

[Dataflow][Java][2.52.0] Upgrading to 2.52.0 Surfaces Pubsub Coder Error

Hi all,

When attempting to upgrade a running Dataflow pipeline from SDK 2.51.0 to
2.52.0, an incompatibility warning is surfaced that prevents pipeline
upgrade:


> The Coder or type for step .../PubsubUnboundedSource has changed


Was there an intentional coder change introduced for PubsubMessage in
2.52.0?  I didn't note anything in the release notes
https://beam.apache.org/blog/beam-2.52.0/ nor recent changes in
PubsubMessageWithAttributesCoder[1].  Specifically the step uses
`PubsubMessageWithAttributesCoder` via
`PubsubIO.readMessagesWithAttributes()`

Thanks!


[1]
https://github.com/apache/beam/blob/90e79ae373ab38cf4e48e9854c28aaffb0938458/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubMessageWithAttributesCoder.java#L36

Re: [Dataflow][Java][2.52.0] Upgrading to 2.52.0 Surfaces Pubsub Coder Error

Posted by Wiśniowski Piotr <co...@gmail.com>.
Hi Evan,

I hit a similar problem a few months ago and finally managed to solve
it. My situation is a bit different, as I am using the Python SDK and
Dataflow Runner v2 (with Streaming Engine and Prime) and quite a lot of
stateful processing. In my case the update failed with a very similar
message, but it referred to one of the stateful steps.

What I discovered is that updating the pipeline in place can fail even
when submitting exactly the same code. The problem seemed to be that the
pipeline graph must be parsed in the same order as the original graph.
In my case I kept the steps in an unordered set before adding them to
the pipeline. The resulting pipeline graph was the same, but the order
of parsing does seem to matter, and updating the running job fails if
the order is different.

In my case I simply sorted the steps by name before adding them to the
pipeline, and updating the job on the fly started working. So it seems
that, since some recent version, pipeline state on Dataflow depends
somehow on the order in which steps are added to the pipeline (as far as
I recall, this was working correctly around 2.50). Does anyone know if
this is intended? If so, I would like to know the explanation.
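
For anyone hitting the same thing, the workaround can be sketched roughly
as below. This is a minimal illustration in plain Python, not code from
the actual pipeline: the step names and the `stable_step_order` helper are
hypothetical, and the Beam apply call is only indicated in a comment. It
assumes the steps were held in a `set` of strings, whose iteration order
can differ from one Python process to the next because string hashing is
randomized by default.

```python
# Hypothetical step names; the real pipeline's steps are not shown in this thread.
STEP_NAMES = {"enrich", "dedupe", "aggregate"}


def stable_step_order(step_names):
    """Return the step names sorted, so every submission uses the same order."""
    return sorted(step_names)


applied = []
for name in stable_step_order(STEP_NAMES):
    # In a real Beam pipeline, this is where the transform for `name` would
    # be applied to the pipeline (assumption; omitted to keep this runnable).
    applied.append(name)

print(applied)
```

Iterating over `STEP_NAMES` directly would visit the same steps, but
possibly in a different order on each submission. Sorting by name removes
that source of variation.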

Best regards

Wiśniowski Piotr

On 15.12.2023 00:14, Evan Galpin wrote:
> The pipeline in question is using Dataflow v1 Runner (Runner v2: 
> Disabled) in case that's an important detail.
>
> On Tue, Dec 12, 2023 at 4:22 PM Evan Galpin <eg...@apache.org> wrote:
>
>     I've checked the source code and deployment command for cases of
>     setting experiments. I don't see "enable_custom_pubsub_source"
>     being used at all, no.  I also confirmed that it is not active on
>     the existing/running job.
>
>     On Tue, Dec 12, 2023 at 4:11 PM Reuven Lax via user
>     <us...@beam.apache.org> wrote:
>
>         Are you setting the enable_custom_pubsub_source experiment by
>         any chance?
>
>         On Tue, Dec 12, 2023 at 3:24 PM Evan Galpin
>         <eg...@apache.org> wrote:
>
>             Hi all,
>
>             When attempting to upgrade a running Dataflow pipeline
>             from SDK 2.51.0 to 2.52.0, an incompatibility warning is
>             surfaced that prevents pipeline upgrade:
>
>                 The Coder or type for step .../PubsubUnboundedSource
>                 has changed
>
>
>             Was there an intentional coder change introduced for
>             PubsubMessage in 2.52.0?  I didn't note anything in the
>             release notes https://beam.apache.org/blog/beam-2.52.0/
>             nor recent changes in
>             PubsubMessageWithAttributesCoder[1].  Specifically the
>             step uses `PubsubMessageWithAttributesCoder` via
>             `PubsubIO.readMessagesWithAttributes()`
>
>             Thanks!
>
>
>             [1]
>             https://github.com/apache/beam/blob/90e79ae373ab38cf4e48e9854c28aaffb0938458/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubMessageWithAttributesCoder.java#L36
>

Re: [Dataflow][Java][2.52.0] Upgrading to 2.52.0 Surfaces Pubsub Coder Error

Posted by Evan Galpin <eg...@apache.org>.
The pipeline in question is using Dataflow v1 Runner (Runner v2: Disabled)
in case that's an important detail.

On Tue, Dec 12, 2023 at 4:22 PM Evan Galpin <eg...@apache.org> wrote:

> I've checked the source code and deployment command for cases of setting
> experiments. I don't see "enable_custom_pubsub_source" being used at all,
> no.  I also confirmed that it is not active on the existing/running job.
>
> On Tue, Dec 12, 2023 at 4:11 PM Reuven Lax via user <us...@beam.apache.org>
> wrote:
>
>> Are you setting the enable_custom_pubsub_source experiment by any chance?
>>
>> On Tue, Dec 12, 2023 at 3:24 PM Evan Galpin <eg...@apache.org> wrote:
>>
>>> Hi all,
>>>
>>> When attempting to upgrade a running Dataflow pipeline from SDK 2.51.0
>>> to 2.52.0, an incompatibility warning is surfaced that prevents pipeline
>>> upgrade:
>>>
>>>
>>>> The Coder or type for step .../PubsubUnboundedSource has changed
>>>
>>>
>>> Was there an intentional coder change introduced for PubsubMessage in
>>> 2.52.0?  I didn't note anything in the release notes
>>> https://beam.apache.org/blog/beam-2.52.0/ nor recent changes in
>>> PubsubMessageWithAttributesCoder[1].  Specifically the step uses
>>> `PubsubMessageWithAttributesCoder` via
>>> `PubsubIO.readMessagesWithAttributes()`
>>>
>>> Thanks!
>>>
>>>
>>> [1]
>>> https://github.com/apache/beam/blob/90e79ae373ab38cf4e48e9854c28aaffb0938458/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubMessageWithAttributesCoder.java#L36
>>>
>>

Re: [Dataflow][Java][2.52.0] Upgrading to 2.52.0 Surfaces Pubsub Coder Error

Posted by Evan Galpin <eg...@apache.org>.
I've checked the source code and deployment command for cases of setting
experiments. I don't see "enable_custom_pubsub_source" being used at all,
no.  I also confirmed that it is not active on the existing/running job.

On Tue, Dec 12, 2023 at 4:11 PM Reuven Lax via user <us...@beam.apache.org>
wrote:

> Are you setting the enable_custom_pubsub_source experiment by any chance?
>
> On Tue, Dec 12, 2023 at 3:24 PM Evan Galpin <eg...@apache.org> wrote:
>
>> Hi all,
>>
>> When attempting to upgrade a running Dataflow pipeline from SDK 2.51.0 to
>> 2.52.0, an incompatibility warning is surfaced that prevents pipeline
>> upgrade:
>>
>>
>>> The Coder or type for step .../PubsubUnboundedSource has changed
>>
>>
>> Was there an intentional coder change introduced for PubsubMessage in
>> 2.52.0?  I didn't note anything in the release notes
>> https://beam.apache.org/blog/beam-2.52.0/ nor recent changes in
>> PubsubMessageWithAttributesCoder[1].  Specifically the step uses
>> `PubsubMessageWithAttributesCoder` via
>> `PubsubIO.readMessagesWithAttributes()`
>>
>> Thanks!
>>
>>
>> [1]
>> https://github.com/apache/beam/blob/90e79ae373ab38cf4e48e9854c28aaffb0938458/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubMessageWithAttributesCoder.java#L36
>>
>

Re: [Dataflow][Java][2.52.0] Upgrading to 2.52.0 Surfaces Pubsub Coder Error

Posted by Reuven Lax via user <us...@beam.apache.org>.
Are you setting the enable_custom_pubsub_source experiment by any chance?

On Tue, Dec 12, 2023 at 3:24 PM Evan Galpin <eg...@apache.org> wrote:

> Hi all,
>
> When attempting to upgrade a running Dataflow pipeline from SDK 2.51.0 to
> 2.52.0, an incompatibility warning is surfaced that prevents pipeline
> upgrade:
>
>
>> The Coder or type for step .../PubsubUnboundedSource has changed
>
>
> Was there an intentional coder change introduced for PubsubMessage in
> 2.52.0?  I didn't note anything in the release notes
> https://beam.apache.org/blog/beam-2.52.0/ nor recent changes in
> PubsubMessageWithAttributesCoder[1].  Specifically the step uses
> `PubsubMessageWithAttributesCoder` via
> `PubsubIO.readMessagesWithAttributes()`
>
> Thanks!
>
>
> [1]
> https://github.com/apache/beam/blob/90e79ae373ab38cf4e48e9854c28aaffb0938458/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubMessageWithAttributesCoder.java#L36
>