Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2022/06/09 23:06:24 UTC

[GitHub] [beam] youngoli opened a new issue, #21784: [Bug]: BigQuery cross-language ReadFromQuery is outputting Beam rows with a different UUID than expected output.

youngoli opened a new issue, #21784:
URL: https://github.com/apache/beam/issues/21784

   ### What happened?
   
   This was found in the Go SDK, but the root cause appears to be in the expansion service. When performing a cross-language (xlang) BigQueryIO read from a query, the Beam rows that are output carry a schema that is structurally identical to the registered output type in Go but has a different UUID. The two are therefore not treated as equivalent, and the rows cannot be converted to the named struct output type.
   
   ### Workaround in Go SDK
   
   To work around this issue in the short term, change the named struct type used as the output into a type alias of the unnamed struct type. This is done by inserting an `=` sign into the type declaration.
   
   Before: `type OutputRow struct {...}`
   After: `type OutputRow = struct {...}`
   
   Note that this doesn't play well with Beam Go's type registration; you'll need to avoid registering the type alias.
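
   As a minimal sketch of why the alias helps, the standalone program below uses plain Go with no Beam imports. `NamedRow`, `AliasRow`, `sampleRow`, `asNamed`, and `asAlias` are hypothetical names made up for this illustration. It assumes the value arriving from the expansion service has the unnamed struct type, as the panic message in the log snippet suggests.

   ```go
   package main

   import "fmt"

   // NamedRow is a defined (named) type. Go treats it as distinct from the
   // unnamed struct type with the same fields and tags.
   type NamedRow struct {
   	Counter *int64 `beam:"counter"`
   }

   // AliasRow is only another name for the unnamed struct type itself.
   type AliasRow = struct {
   	Counter *int64 `beam:"counter"`
   }

   // sampleRow returns a value of the unnamed struct type, standing in for a
   // decoded row (an assumption made for this illustration).
   func sampleRow() interface{} {
   	n := int64(7)
   	return struct {
   		Counter *int64 `beam:"counter"`
   	}{Counter: &n}
   }

   // asNamed reports whether v can be type-asserted to NamedRow.
   func asNamed(v interface{}) bool { _, ok := v.(NamedRow); return ok }

   // asAlias reports whether v can be type-asserted to AliasRow.
   func asAlias(v interface{}) bool { _, ok := v.(AliasRow); return ok }

   func main() {
   	v := sampleRow()
   	fmt.Println("assert to named type:", asNamed(v)) // false
   	fmt.Println("assert to alias type:", asAlias(v)) // true
   }
   ```

   The assertion to the named type fails for the same reason as the `interface conversion` panic in the log snippet, while the alias is identical to the unnamed struct type and so the assertion succeeds.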
   
   ### Log Snippets
   
   Here's a snippet of the error on the Go side, showing how it manifests:
   ```
   panic: interface conversion: interface {} is struct { Counter *int64 "beam:\"counter\""; Rand_data *struct { Flip *bool "beam:\"flip\""; Num *int64 "beam:\"num\""; Word *string "beam:\"word\"" } "beam:\"rand_data\"" }, not bigquery.TestRowPtrs
   Full error:
   while executing Process for Plan[s02-67]:
   2: DataSink[S[ptransform-65@localhost:12371]] Coder:W;coder-80<LP;coder-81<R[bigquery.TestRow]>>!GWC
   3: PCollection[pcollection-72] Out:[2]
   4: ParDo[bigquery.castFn] Out:[2]
   1: DataSource[S[ptransform-64@localhost:12371], 0] Coder:W;coder-76<LP;coder-77<R[struct { Counter *int64 "beam:\"counter\""; Rand_data *struct { Flip *bool "beam:\"flip\""; Num *int64 "beam:\"num\""; Word *string "beam:\"word\"" } "beam:\"rand_data\"" }]>>!GWC Out:4
   	caused by:
   panic: interface conversion: interface {} is struct { Counter *int64 "beam:\"counter\""; Rand_data *struct { Flip *bool "beam:\"flip\""; Num *int64 "beam:\"num\""; Word *string "beam:\"word\"" } "beam:\"rand_data\"" }, not bigquery.TestRowPtrs goroutine 58 [running]:
   runtime/debug.Stack()
   	/usr/lib/google-golang/src/runtime/debug/stack.go:24 +0x65
   github.com/apache/beam/sdks/v2/go/pkg/beam/core/runtime/exec.callNoPanic.func1()
   	{...}/repos/beam/sdks/go/pkg/beam/core/runtime/exec/util.go:58 +0xa5
   panic({0xe0d380, 0xc0003a04e0})
   	/usr/lib/google-golang/src/runtime/panic.go:866 +0x212
   github.com/apache/beam/sdks/v2/go/pkg/beam/register.(*caller1x1[...]).Call1x1(...)
   	{...}/repos/beam/sdks/go/pkg/beam/register/register.go:3205
   ```
   And here is a snippet of the raw schema protos received by the Go SDK, compared with the schema it expects.
   
   Received schema:
   ```
   Schema proto: fields: {
     name: "counter"
     type: {
       nullable: true
       atomic_type: INT64
     }
   }
   fields: {
     name: "rand_data"
     type: {
       nullable: true
       row_type: {
         schema: {
           fields: {
             name: "flip"
             type: {
               nullable: true
               atomic_type: BOOLEAN
             }
           }
           fields: {
             name: "num"
             type: {
               nullable: true
               atomic_type: INT64
             }
             id: 1
             encoding_position: 1
           }
           fields: {
             name: "word"
             type: {
               nullable: true
               atomic_type: STRING
             }
             id: 2
             encoding_position: 2
           }
           id: "141b0073-d725-456c-bcdc-46c9c84e7a6d"
         }
       }
     }
     id: 1
     encoding_position: 1
   }
   id: "d520c5bd-86f8-4a7b-8cbd-af6816f09f61"
   ```
   Expected schema:
   ```
   Schema proto: fields: {
     name: "counter"
     type: {
       nullable: true
       atomic_type: INT64
     }
   }
   fields: {
     name: "rand_data"
     type: {
       nullable: true
       row_type: {
         schema: {
           fields: {
             name: "flip"
             type: {
               nullable: true
               atomic_type: BOOLEAN
             }
           }
           fields: {
             name: "num"
             type: {
               nullable: true
               atomic_type: INT64
             }
           }
           fields: {
             name: "word"
             type: {
               nullable: true
               atomic_type: STRING
             }
           }
           id: "c39b4c69-1e23-4267-9fb2-776e1a61a34f"
         }
       }
     }
   }
   id: "952f2fc2-afb0-4646-aaec-88b9a0f307be"
   ```
   To reproduce the schema output above, add the following lines after [graphx/coder.go:371](https://github.com/apache/beam/blob/v2.39.0/sdks/go/pkg/beam/core/runtime/graphx/coder.go#L371):
   ```
   sp := prototext.Format(&s)
   log.Warnf(context.Background(), "Schema proto: %v", sp)
   log.Warnf(context.Background(), "Schema type: %v", t)
   return coder.NewR(typex.New(t)), nil
   ```
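
   To make the difference between the two dumps concrete, here is a rough sketch that compares two schemas while ignoring their IDs. `Field`, `Schema`, `sameStructure`, and `buildSchema` are hypothetical, simplified stand-ins written for this illustration; they are not Beam's generated proto types, and the field-level `id` and `encoding_position` values are left out.

   ```go
   package main

   import "fmt"

   // Field and Schema are hypothetical, simplified stand-ins for the schema
   // proto messages shown above; they are not Beam's generated types.
   type Field struct {
   	Name     string
   	Nullable bool
   	Atomic   string  // e.g. "INT64"; empty when Row is set
   	Row      *Schema // nested row type, or nil
   }

   type Schema struct {
   	ID     string // UUID carried by the schema
   	Fields []Field
   }

   // sameStructure compares field names, nullability, and types, ignoring IDs.
   func sameStructure(a, b *Schema) bool {
   	if a == nil || b == nil {
   		return a == b
   	}
   	if len(a.Fields) != len(b.Fields) {
   		return false
   	}
   	for i := range a.Fields {
   		fa, fb := a.Fields[i], b.Fields[i]
   		if fa.Name != fb.Name || fa.Nullable != fb.Nullable || fa.Atomic != fb.Atomic {
   			return false
   		}
   		if !sameStructure(fa.Row, fb.Row) {
   			return false
   		}
   	}
   	return true
   }

   // buildSchema returns the schema shape from the dumps with the given IDs.
   func buildSchema(outerID, innerID string) *Schema {
   	inner := &Schema{ID: innerID, Fields: []Field{
   		{Name: "flip", Nullable: true, Atomic: "BOOLEAN"},
   		{Name: "num", Nullable: true, Atomic: "INT64"},
   		{Name: "word", Nullable: true, Atomic: "STRING"},
   	}}
   	return &Schema{ID: outerID, Fields: []Field{
   		{Name: "counter", Nullable: true, Atomic: "INT64"},
   		{Name: "rand_data", Nullable: true, Row: inner},
   	}}
   }

   func main() {
   	received := buildSchema("d520c5bd-86f8-4a7b-8cbd-af6816f09f61", "141b0073-d725-456c-bcdc-46c9c84e7a6d")
   	expected := buildSchema("952f2fc2-afb0-4646-aaec-88b9a0f307be", "c39b4c69-1e23-4267-9fb2-776e1a61a34f")
   	fmt.Println("same structure:", sameStructure(received, expected)) // true
   	fmt.Println("same ID:", received.ID == expected.ID)              // false
   }
   ```

   This only illustrates what "structurally identical but with a different UUID" means for the two dumps; it makes no claim about how the Go SDK or the expansion service actually resolves schemas.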
   
   ### Issue Priority
   
   Priority: 1
   
   ### Issue Component
   
   Component: cross-language


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] capthiron commented on issue #21784: [Bug]: BigQuery cross-language ReadFromQuery is outputting Beam rows with a different UUID than expected output.

Posted by GitBox <gi...@apache.org>.
capthiron commented on issue #21784:
URL: https://github.com/apache/beam/issues/21784#issuecomment-1332159669

   Hey @youngoli :) 
   
   Do you happen to have a working example with the workaround?
   I am not quite sure how to make it work with the aliasing. 
   
   Best regards!




[GitHub] [beam] lostluck commented on issue #21784: [Bug]: BigQuery cross-language ReadFromQuery is outputting Beam rows with a different UUID than expected output.

Posted by GitBox <gi...@apache.org>.
lostluck commented on issue #21784:
URL: https://github.com/apache/beam/issues/21784#issuecomment-1332536204

   Well, an [example exists here](https://github.com/apache/beam/blob/master/sdks/go/examples/xlang/bigquery/wordcount.go), but it doesn't use aliasing.
   
   The aliasing workaround is demonstrated here, in the integration tests:
   
   https://github.com/apache/beam/blob/ffa46b330fb62d49baa3c7afc9c9cd89384ad774/sdks/go/test/integration/io/xlang/bigquery/bigquery_test.go#L153

