Posted to user@beam.apache.org by ed...@gmail.com on 2018/06/22 00:40:04 UTC

Go SDK: Bigquery and nullable field types.

I am using the bigqueryio transform with the following struct to collect a data row:

type Record struct {
  source_service  bigquery.NullString
  // ... etc ...
}

This works fine with the direct runner, but when I try it with the Dataflow runner I get the following exception:
 
java.util.concurrent.ExecutionException: java.lang.RuntimeException: Error received from SDK harness for instruction -41: execute failed: bigquery: schema field source_service of type STRING is not assignable to struct field source_service of type struct { StringVal string; Valid bool }
	at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
	at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
	at org.apache.beam.sdk.util.MoreFutures.get(MoreFutures.java:55)
	at com.google.cloud.dataflow.worker.fn.control.RegisterAndProcessBundleOperation.finish(RegisterAndProcessBundleOperation.java:274)
	at com.google.cloud.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:83)
	at com.google.cloud.dataflow.worker.fn.control.BeamFnMapTaskExecutor.execute(BeamFnMapTaskExecutor.java:101)
	at com.google.cloud.dataflow.worker.BatchDataflowWorker.executeWork(BatchDataflowWorker.java:391)
	at com.google.cloud.dataflow.worker.BatchDataflowWorker.doWork(BatchDataflowWorker.java:360)
	at com.google.cloud.dataflow.worker.BatchDataflowWorker.getAndPerformWork(BatchDataflowWorker.java:288)
	at com.google.cloud.dataflow.worker.DataflowRunnerHarness.start(DataflowRunnerHarness.java:179)
	at com.google.cloud.dataflow.worker.DataflowRunnerHarness.main(DataflowRunnerHarness.java:107)
	Suppressed: java.lang.IllegalStateException: Already closed.
		at org.apache.beam.sdk.fn.data.BeamFnDataBufferingOutboundObserver.close(BeamFnDataBufferingOutboundObserver.java:97)
		at com.google.cloud.dataflow.worker.fn.data.RemoteGrpcPortWriteOperation.abort(RemoteGrpcPortWriteOperation.java:93)
		at com.google.cloud.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:89)
		... 6 more

It looks like the bigquery API fails to detect the nullable type NullString and instead attempts a plain assignment. Could some aspect of the type information have been lost, preventing the bigquery API from identifying and handling NullString properly?


Re: Go SDK: Bigquery and nullable field types.

Posted by ed...@gmail.com.

On 2018/06/22 01:20:19, Henning Rohde <he...@google.com> wrote: 
> The Go SDK can't actually serialize named types -- we serialize the
> structural information and recreate assignment-compatible isomorphic
> unnamed types at runtime for convenience. This usually works fine, but
> perhaps not if inspected reflectively. Have you tried to Register the
> Record (or bigquery.NullString) type? That bypasses the serialization.

Type registration makes it work. It also takes care of my Legacy SQL problem, since I can now handle timestamps through time.Time.
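For reference, this is roughly what the registration looks like. The import paths reflect the 2018 SDK layout, and the exported field with a bigquery tag is my sketch rather than my exact code, so adjust to your SDK version:

```go
// Registering the concrete row type with the Beam Go SDK (typically in
// an init function of the pipeline binary) preserves the named type
// across serialization, so the bigquery client can still recognize
// bigquery.NullString reflectively on the worker.
package main

import (
	"reflect"

	"cloud.google.com/go/bigquery"
	"github.com/apache/beam/sdks/go/pkg/beam"
)

type Record struct {
	SourceService bigquery.NullString `bigquery:"source_service"`
	// ... etc ...
}

func init() {
	beam.RegisterType(reflect.TypeOf((*Record)(nil)).Elem())
}
```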

Thanks.



Re: Go SDK: Bigquery and nullable field types.

Posted by Henning Rohde <he...@google.com>.
The Go SDK can't actually serialize named types -- we serialize the
structural information and recreate assignment-compatible isomorphic
unnamed types at runtime for convenience. This usually works fine, but
perhaps not if inspected reflectively. Have you tried to Register the
Record (or bigquery.NullString) type? That bypasses the serialization.

Thanks,
 Henning
