Posted to user@beam.apache.org by Ben San Nicolas via user <us...@beam.apache.org> on 2023/02/23 22:00:03 UTC

Unable to read bigquery table with repeated int64 field

The following line in BigQueryUtils throws an exception:
https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryUtils.java#L700

I have a repeated INT64 field in BigQuery. When BigQueryIO reads it, it
assumes each value is a map and tries to extract the "v" key from it. But
the value is not a map, it's just an int64 (per the stack trace below, it
arrives as a String). I'm not sure why a map is expected, but it looks
intentional: https://github.com/apache/beam/pull/8814

Here is a stacktrace:

Caused by: java.lang.IllegalArgumentException: Unable to encode
element 'GenericData{classInfo=[f], {ds=2023-02-19,
enterprise_id=941601, window=2023-02-20 00:00:00 UTC,
UnManagedUsers_users_sum=1, UnManagedUsers_pastMonth_sum=1,
UnManagedUsers_ids_set=[2031397633]}}' with coder 'SchemaCoder<Schema:
Fields:
Field{name=ds, description=, type=LOGICAL_TYPE, options={{}}}
Field{name=enterprise_id, description=, type=INT64, options={{}}}
Field{name=window, description=, type=DATETIME, options={{}}}
Field{name=UnManagedUsers_users_sum, description=, type=INT64, options={{}}}
Field{name=UnManagedUsers_pastMonth_sum, description=, type=INT64, options={{}}}
Field{name=UnManagedUsers_ids_set, description=, type=ARRAY<INT64 NOT
NULL> NOT NULL, options={{}}}
Encoding positions:
{UnManagedUsers_ids_set=5, UnManagedUsers_users_sum=3, window=2,
enterprise_id=1, UnManagedUsers_pastMonth_sum=4, ds=0}
Options:{{}}UUID: 57f12468-88cf-42e9-97b6-7476c2fcac4c  UUID:
57f12468-88cf-42e9-97b6-7476c2fcac4c delegateCoder:
org.apache.beam.sdk.coders.Coder$ByteBuddy$gNX79fNU@24b36691'.
	org.apache.beam.sdk.coders.Coder.getEncodedElementByteSize(Coder.java:300)
	org.apache.beam.sdk.coders.Coder.registerByteSizeObserver(Coder.java:291)
	org.apache.beam.fn.harness.data.PCollectionConsumerRegistry$SampleByteSizeDistribution.tryUpdate(PCollectionConsumerRegistry.java:461)
	org.apache.beam.fn.harness.data.PCollectionConsumerRegistry$MultiplexingMetricTrackingFnDataReceiver.accept(PCollectionConsumerRegistry.java:385)
	org.apache.beam.fn.harness.data.PCollectionConsumerRegistry$MultiplexingMetricTrackingFnDataReceiver.accept(PCollectionConsumerRegistry.java:329)
Caused by: java.lang.ClassCastException: class java.lang.String cannot
be cast to class java.util.Map (java.lang.String and java.util.Map are
in module java.base of loader 'bootstrap')
	org.apache.beam.sdk.io.gcp.bigquery.BigQueryUtils.lambda$toBeamValue$12(BigQueryUtils.java:704)
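
To make the failure concrete, here is a minimal stand-alone sketch of what I believe is happening. It does not use Beam at all, and it is not the actual BigQueryUtils code: the class name RepeatedFieldSketch and the methods unwrapStrict and unwrapLenient are hypothetical names I made up for illustration. The strict version assumes every element of a repeated field is a map wrapping the value under "v", as described above. The lenient version is only one possible way such a conversion could tolerate both shapes, not a proposed patch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-alone sketch; none of these names come from Beam.
class RepeatedFieldSketch {

  // Mimics the assumption described above: every element of a repeated
  // field is a map that wraps the actual value under the key "v".
  static List<Object> unwrapStrict(List<Object> repeated) {
    List<Object> out = new ArrayList<>();
    for (Object element : repeated) {
      // Throws ClassCastException if the element is not a Map.
      out.add(((Map<?, ?>) element).get("v"));
    }
    return out;
  }

  // Tolerant variant: unwrap {"v": x} when present, otherwise keep the
  // element unchanged.
  static List<Object> unwrapLenient(List<Object> repeated) {
    List<Object> out = new ArrayList<>();
    for (Object element : repeated) {
      if (element instanceof Map) {
        out.add(((Map<?, ?>) element).get("v"));
      } else {
        out.add(element);
      }
    }
    return out;
  }

  public static void main(String[] args) {
    // Shape the strict code expects: each element wrapped as {"v": value}.
    List<Object> wrapped = List.of(Map.of("v", "2031397633"));
    // Shape I actually seem to get: the bare value, as a String.
    List<Object> bare = List.of("2031397633");

    System.out.println(unwrapStrict(wrapped));
    System.out.println(unwrapLenient(bare));
    try {
      unwrapStrict(bare);
    } catch (ClassCastException e) {
      System.out.println("strict unwrap failed: " + e.getMessage());
    }
  }
}
```

With the bare String element, the strict version fails with the same kind of ClassCastException (String cannot be cast to Map) as in the trace above, while the lenient version returns the value for both input shapes.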


Is this a bug?

Thanks