Posted to commits@beam.apache.org by "Leonardo Alves Miguel (JIRA)" <ji...@apache.org> on 2017/10/31 17:44:00 UTC

[jira] [Comment Edited] (BEAM-2767) BigQueryIO result different for REPEATED field between DirectRunner and DataflowRunner

    [ https://issues.apache.org/jira/browse/BEAM-2767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16227183#comment-16227183 ] 

Leonardo Alves Miguel edited comment on BEAM-2767 at 10/31/17 5:43 PM:
-----------------------------------------------------------------------

I had the same issue reported by [~jroxtheworld].

I'm using nested TableRows, doing something like:

TableRow a = new TableRow();
TableRow b = new TableRow();
a.set("fieldName", b);

Then, after applying some other PTransforms that do not change this field, I tried doing something like:

TableRow c = (TableRow) a.get("fieldName");

This works fine on DataflowRunner, but when I run it on DirectRunner, I get the following exception:

*"java.lang.ClassCastException: java.base/java.util.LinkedHashMap cannot be cast to com.google.api.services.bigquery.model.TableRow"*

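It looks like the runtime type of a nested TableRow differs between the two runners: on DirectRunner the nested value comes back as a LinkedHashMap, while on DataflowRunner it comes back as a TableRow.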
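One possible workaround, sketched here rather than tested against Beam itself: TableRow is itself a Map<String, Object>, so reading the nested value through the Map interface should avoid the cast that fails on DirectRunner. The class and method names below (NestedRecordAccess, asRecord) are hypothetical, and plain LinkedHashMaps stand in for TableRows so the example runs without the BigQuery client library.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Beam or the BigQuery client library.
final class NestedRecordAccess {

    // Returns the nested record as a Map, whatever concrete Map type the runner produced.
    // Assumption: every runner represents a nested record as some java.util.Map.
    @SuppressWarnings("unchecked")
    static Map<String, Object> asRecord(Object value) {
        if (value instanceof Map) {
            return (Map<String, Object>) value;
        }
        String actual = (value == null) ? "null" : value.getClass().getName();
        throw new IllegalArgumentException("Expected a nested record but got " + actual);
    }

    public static void main(String[] args) {
        // LinkedHashMap stands in for TableRow here; both implement Map<String, Object>.
        Map<String, Object> b = new LinkedHashMap<>();
        b.put("inner", 42);
        Map<String, Object> a = new LinkedHashMap<>();
        a.put("fieldName", b);

        // No cast to a concrete class, so the runner's choice of Map type does not matter.
        Map<String, Object> c = asRecord(a.get("fieldName"));
        System.out.println(c.get("inner"));
    }
}
```

In a pipeline, the call would be asRecord(a.get("fieldName")) in place of the (TableRow) cast. The trade-off is that TableRow-specific methods are not available on the result, only the Map ones.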


> BigQueryIO result different for REPEATED field between DirectRunner and DataflowRunner
> --------------------------------------------------------------------------------------
>
>                 Key: BEAM-2767
>                 URL: https://issues.apache.org/jira/browse/BEAM-2767
>             Project: Beam
>          Issue Type: Bug
>          Components: runner-dataflow, runner-direct, sdk-java-gcp
>    Affects Versions: 2.0.0
>            Reporter: Andre
>            Assignee: Thomas Groh
>            Priority: Minor
>
> When running a query against BigQueryIO with a REPEATED RECORD field, the behavior differs between DirectRunner and DataflowRunner. The field containing the repeated records has to be cast before the records can be accessed, and each runner needs a different cast. The implementations below work on their respective runners, but I would expect them to be the same; as it stands, my pipeline only runs on one runner.
> DirectRunner:
> {code:java}
> ArrayList<LinkedHashMap> orderLines = (ArrayList<LinkedHashMap>) c.element().get("RepeatedField");
> {code}
> DataflowRunner:
> {code:java}
> ImmutableList<TableRow> orderLines = (ImmutableList<TableRow>) c.element().get("RepeatedField");
> {code}
>
> For example, when the ImmutableList implementation is used on DirectRunner, the following exception is thrown:
> {code:java}
> java.lang.ClassCastException: java.util.ArrayList cannot be cast to com.google.common.collect.ImmutableList
> {code}
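A runner-agnostic read of the repeated field might look like the sketch below. It relies only on what the two snippets above have in common: ArrayList and ImmutableList both implement java.util.List, and LinkedHashMap and TableRow both implement Map<String, Object>. The names RepeatedFieldAccess and asRecordList are hypothetical, treating a missing field as an empty list is an assumption, and plain Java collections stand in for the BigQuery types so the example runs on its own.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Beam or the BigQuery client library.
final class RepeatedFieldAccess {

    // Reads a REPEATED RECORD value through the List and Map interfaces only,
    // so it does not depend on which concrete classes a runner hands back.
    @SuppressWarnings("unchecked")
    static List<Map<String, Object>> asRecordList(Object value) {
        if (value == null) {
            // Assumption: a missing repeated field is treated as empty.
            return Collections.emptyList();
        }
        if (!(value instanceof List)) {
            throw new IllegalArgumentException(
                "Expected a repeated field but got " + value.getClass().getName());
        }
        List<Map<String, Object>> records = new ArrayList<>();
        for (Object element : (List<?>) value) {
            if (!(element instanceof Map)) {
                throw new IllegalArgumentException("Repeated field element is not a record");
            }
            records.add((Map<String, Object>) element);
        }
        return records;
    }

    public static void main(String[] args) {
        // Stand-in for c.element(): a row with two order lines in a repeated field.
        Map<String, Object> line1 = new LinkedHashMap<>();
        line1.put("sku", "A-1");
        Map<String, Object> line2 = new LinkedHashMap<>();
        line2.put("sku", "B-2");
        List<Object> repeated = new ArrayList<>();
        repeated.add(line1);
        repeated.add(line2);
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("RepeatedField", repeated);

        for (Map<String, Object> orderLine : asRecordList(row.get("RepeatedField"))) {
            System.out.println(orderLine.get("sku"));
        }
    }
}
```

In a DoFn this would replace both casts with asRecordList(c.element().get("RepeatedField")). It copies the element references into a new list, which is cheap but does mean the result is not the same list object the runner provided.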
