Posted to issues@beam.apache.org by "Daniel Oliveira (Jira)" <ji...@apache.org> on 2021/04/01 00:32:00 UTC

[jira] [Commented] (BEAM-11574) Enable x-lang on Dataflow side for integration tests

    [ https://issues.apache.org/jira/browse/BEAM-11574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17312793#comment-17312793 ] 

Daniel Oliveira commented on BEAM-11574:
----------------------------------------

I'll use this Jira to summarize the steps involved in fixing this.

1. The original bug documented up top is fixed by enabling portable pipeline submission in Dataflow. This is a requirement for running cross-language pipelines; the bug occurred because a cross-language pipeline proto was being submitted but executed without cross-language support. Enabling this took some trial and error, and ultimately required figuring out that portable pipelines need an additional field in job submission (SdkHarnessContainerImages) that takes a list of environments, which meant adjusting the surrounding code to support multiple environments. It also required adding a new flag that can provide container image overrides for multiple images, so that cross-language environments can also have custom images defined (necessary for testing at head).

2. After this I got errors where the job failed to start up properly and hung indefinitely. The logs were hard to interpret, but with some help from a domain expert we noticed that the error was reported in a Warning log, which described being unable to read an empty environment: one of the environments in the list had an empty value. Using a debugger, I tracked this down to the environment from the original expansion, which is a stub with no value assigned, still being preserved and merged in at the end when expanding all cross-language transforms.

3. After working around the issue above (by changing the stub environment into a full environment identical to Dataflow's), I ran into another issue: an unidentified impulse transform was causing the pipeline to fail. It turns out the step we do to add fake impulses was causing an error in Dataflow because the impulse wasn't getting properly removed. After stepping through the expansion service functionality in a debugger, I found that the fake impulses weren't even necessary anymore. After removing them, the pipelines started working.

4. I went back and did a proper fix for the stub environment issue, which was to stop namespacing environments before expansion. The later code that merges expanded transforms into the pipeline assumes that the default environment is not namespaced, and skips it because it's already present in the proto. The bug was happening because this assumption wasn't true, so I made it true by not namespacing the default Go environment. Now the stub version of the default environment doesn't get merged into the final pipeline; it just gets skipped because a default environment with the same name is already present.
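
To illustrate steps 2 and 4, here is a minimal, self-contained sketch of the merge behavior described above. It is not the actual Beam Go SDK code: the Environment type, the mergeEnvironments and emptyEnvironments helpers, and the environment IDs ("go", "ns1_go", "java") are all hypothetical stand-ins, and the real pipeline proto is modeled as a plain map. It only shows why a namespaced stub slips into the final pipeline while an un-namespaced one is skipped.

```go
package main

import (
	"fmt"
	"sort"
)

// Environment is a hypothetical stand-in for the environment message in the
// pipeline proto. A zero value models the stub sent to the expansion service.
type Environment struct {
	URN     string
	Payload string
}

// mergeEnvironments copies environments returned by expansion into the
// pipeline, skipping any ID the pipeline already defines. This mirrors the
// assumption in step 4: the default environment keeps its original name, so
// the stub copy is skipped instead of being merged.
func mergeEnvironments(pipeline, expanded map[string]Environment) {
	for id, env := range expanded {
		if _, ok := pipeline[id]; ok {
			continue // already present in the pipeline, keep the full version
		}
		pipeline[id] = env
	}
}

// emptyEnvironments returns the sorted IDs of environments with no URN,
// which is the condition that made the job hang in step 2.
func emptyEnvironments(envs map[string]Environment) []string {
	var ids []string
	for id, env := range envs {
		if env.URN == "" {
			ids = append(ids, id)
		}
	}
	sort.Strings(ids)
	return ids
}

func main() {
	java := Environment{URN: "beam:env:docker:v1", Payload: "java-image"}
	goEnv := Environment{URN: "beam:env:docker:v1", Payload: "go-image"}

	// Before the fix: the stub was namespaced, so it did not collide with
	// the real default environment and was merged as an empty entry.
	broken := map[string]Environment{"go": goEnv}
	mergeEnvironments(broken, map[string]Environment{"ns1_go": {}, "java": java})
	fmt.Println("empty environments before fix:", emptyEnvironments(broken))

	// After the fix: the stub keeps the default name and is skipped.
	fixed := map[string]Environment{"go": goEnv}
	mergeEnvironments(fixed, map[string]Environment{"go": {}, "java": java})
	fmt.Println("empty environments after fix:", emptyEnvironments(fixed))
}
```

In this sketch the skip-if-present check is the whole fix: it only works when the stub and the real default environment share the same ID, which is why namespacing the default environment before expansion broke it.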

> Enable x-lang on Dataflow side for integration tests
> ----------------------------------------------------
>
>                 Key: BEAM-11574
>                 URL: https://issues.apache.org/jira/browse/BEAM-11574
>             Project: Beam
>          Issue Type: Bug
>          Components: cross-language, sdk-go
>            Reporter: Daniel Oliveira
>            Assignee: Daniel Oliveira
>            Priority: P2
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> Dataflow x-lang ValidatesRunner tests are failing with the following error:
> {noformat}
> panic:  unmarshalling coder UIbLsVLhrXKvCoder
>         unmarshalling coder UIbLsVLhrXVoidCoder
> could not unmarshal coder from spec:{urn:"beam:coders:javasdk:0.1" payload:"\x82SNAPPY\x00\x00\x00\x00\x01\x00\x00\x00\x01\x00\x00\x00\x88\xd6\x01\xe8\xac\xed\x00\x05sr\x00$org.apache.beam.sdk.coders.VoidCoder\xb9\xbfU\x9b\xe8\r\xafU\x02\x00\x00xr\x00&j3\x00\x14Atomic\x055 \xc7\xec\xb5̅tPF\x02\x055\x00*j5\x00$Structured\x059\x1cs\xbf\x12\x0e\xd5\xd46\x11\t9\x00 j9\x00\x05/0C\xddՉ\xae\xbc~\xf8\x02\x00\x00xp"}, unknown URN beam:coders:javasdk:0.1 [recovered]
>         panic:  unmarshalling coder UIbLsVLhrXKvCoder
>         unmarshalling coder UIbLsVLhrXVoidCoder
> could not unmarshal coder from spec:{urn:"beam:coders:javasdk:0.1" payload:"\x82SNAPPY\x00\x00\x00\x00\x01\x00\x00\x00\x01\x00\x00\x00\x88\xd6\x01\xe8\xac\xed\x00\x05sr\x00$org.apache.beam.sdk.coders.VoidCoder\xb9\xbfU\x9b\xe8\r\xafU\x02\x00\x00xr\x00&j3\x00\x14Atomic\x055 \xc7\xec\xb5̅tPF\x02\x055\x00*j5\x00$Structured\x059\x1cs\xbf\x12\x0e\xd5\xd46\x11\t9\x00 j9\x00\x05/0C\xddՉ\xae\xbc~\xf8\x02\x00\x00xp"}, unknown URN beam:coders:javasdk:0.1
> goroutine 130 [running]:
> testing.tRunner.func1.1(0xe2f1a0, 0xc00059ea80)
>         /usr/lib/google-golang/src/testing/testing.go:1072 +0x30d
> testing.tRunner.func1(0xc000290f00)
>         /usr/lib/google-golang/src/testing/testing.go:1075 +0x41a
> panic(0xe2f1a0, 0xc00059ea80)
>         /usr/lib/google-golang/src/runtime/panic.go:969 +0x1b9
> github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/runners/dataflow/dataflowlib.(*translator).translateCoder(0xc0002da260, 0xc0005b5b00, 0xc0003fdee0, 0x11, 0xc000379ac8)
>         /usr/local/google/home/danoliveira/go/src/github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/runners/dataflow/dataflowlib/translate.go:337 +0xa5
> github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/runners/dataflow/dataflowlib.(*translator).translateOutputs(0xc0002da260, 0xc00079b1d0, 0xc0003fd600, 0x11, 0xc000376180)
>         /usr/local/google/home/danoliveira/go/src/github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/runners/dataflow/dataflowlib/translate.go:317 +0x190
> github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/runners/dataflow/dataflowlib.(*translator).translateTransform(0xc0002da260, 0xc00019e330, 0x2d, 0xc00078a660, 0x1b, 0xf15a80, 0xc000096000, 0xc0004fb898, 0x6475eb, 0xc000096000)
>         /usr/local/google/home/danoliveira/go/src/github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/runners/dataflow/dataflowlib/translate.go:111 +0x111
> github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/runners/dataflow/dataflowlib.(*translator).translateTransforms(0xc0002da260, 0xc00019e330, 0x2d, 0xc000265580, 0x7, 0x8, 0x2d, 0x6, 0x0, 0xc00003f41a, ...)
>         /usr/local/google/home/danoliveira/go/src/github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/runners/dataflow/dataflowlib/translate.go:97 +0xda
> github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/runners/dataflow/dataflowlib.(*translator).translateTransform(0xc0002da260, 0xc000280600, 0x1a, 0xc0001266d8, 0x2, 0xf15a80, 0xc000096000, 0xc0004fc710, 0x6475eb, 0xc000096000)
>         /usr/local/google/home/danoliveira/go/src/github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/runners/dataflow/dataflowlib/translate.go:285 +0x36d
> github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/runners/dataflow/dataflowlib.(*translator).translateTransforms(0xc0002da260, 0xc000280600, 0x1a, 0xc0002be740, 0x1, 0x1, 0x1a, 0x1, 0x0, 0x0, ...)
>         /usr/local/google/home/danoliveira/go/src/github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/runners/dataflow/dataflowlib/translate.go:97 +0xda
> github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/runners/dataflow/dataflowlib.(*translator).translateTransform(0xc0002da260, 0x0, 0x0, 0xc0001266d4, 0x2, 0x1, 0x1, 0x2, 0x0, 0x0)
>         /usr/local/google/home/danoliveira/go/src/github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/runners/dataflow/dataflowlib/translate.go:285 +0x36d
> github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/runners/dataflow/dataflowlib.(*translator).translateTransforms(0xc0002da260, 0x0, 0x0, 0xc000180140, 0x5, 0x5, 0xc0000a15f8, 0x5b96a5, 0xc000102600, 0x200000003, ...)
>         /usr/local/google/home/danoliveira/go/src/github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/runners/dataflow/dataflowlib/translate.go:97 +0xda
> github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/runners/dataflow/dataflowlib.translate(0xc000313800, 0x5d4fda, 0x1, 0x2, 0xc000778000, 0xa2)
>         /usr/local/google/home/danoliveira/go/src/github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/runners/dataflow/dataflowlib/translate.go:73 +0x77
> github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/runners/dataflow/dataflowlib.Translate(0x103ff00, 0xc00019c628, 0xc000313800, 0xc0002bbcb0, 0xc00019b900, 0x77, 0xc00016e2d0, 0x84, 0xc00019b880, 0x76, ...)
>         /usr/local/google/home/danoliveira/go/src/github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/runners/dataflow/dataflowlib/job.go:77 +0x45
> github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/runners/dataflow/dataflowlib.Execute(0x103ff00, 0xc00019c628, 0xc000313800, 0xc0002bbcb0, 0xc00019b900, 0x77, 0xc00016e2d0, 0x84, 0xc00019b880, 0x76, ...)
>         /usr/local/google/home/danoliveira/go/src/github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/runners/dataflow/dataflowlib/execute.go:91 +0x699
> github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/runners/dataflow.Execute(0x103ff00, 0xc00019c628, 0xc000124888, 0x8, 0xc000142158, 0xf6ae01, 0xc00032ba40)
>         /usr/local/google/home/danoliveira/go/src/github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/runners/dataflow/dataflow.go:207 +0xe7d
> github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam.Run(0x103ff00, 0xc00019c628, 0x7ffec804dad3, 0x8, 0xc000124888, 0x102d860, 0xc00032b980, 0xc0002b7ee8, 0xba6e45)
>         /usr/local/google/home/danoliveira/go/src/github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/runner.go:50 +0x87
> github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/testing/ptest.Run(0xc000124888, 0xc00032a660, 0xc00079be00)
>         /usr/local/google/home/danoliveira/go/src/github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/testing/ptest/ptest.go:89 +0x8b
> github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/testing/ptest.RunAndValidate(0xc000290f00, 0xc000124888)
>         /usr/local/google/home/danoliveira/go/src/github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/testing/ptest/ptest.go:96 +0x2f
> github.com/apache/beam/sdks/go/test/integration/xlang.TestXLang_CombineGlobally(0xc000290f00)
>         /usr/local/google/home/danoliveira/go/src/github.com/apache/beam/sdks/go/test/integration/xlang/xlang_test.go:165 +0x249
> testing.tRunner(0xc000290f00, 0xf6a908)
>         /usr/lib/google-golang/src/testing/testing.go:1123 +0xef
> created by testing.(*T).Run
>         /usr/lib/google-golang/src/testing/testing.go:1168 +0x2b3
> {noformat}
> This seems to imply that the bundles intended to be sent to the Java SDK are still being sent to the Go SDK (these are Go SDK errors), so it seems that cross-language functionality still needs to be enabled for the integration tests.
> Edit:
> Looking closer, the stacktrace actually suggests that the problem is that Go's Dataflow translation doesn't properly support cross-language transforms. The solution would be to adjust dataflowlib/translate.go to support cross-language.


