Posted to issues@beam.apache.org by "Beam JIRA Bot (Jira)" <ji...@apache.org> on 2020/09/19 17:08:02 UTC
[jira] [Commented] (BEAM-10544) Select Types not equal with nested schema
[ https://issues.apache.org/jira/browse/BEAM-10544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17198790#comment-17198790 ]
Beam JIRA Bot commented on BEAM-10544:
--------------------------------------
This issue is P2 but has been unassigned without any comment for 60 days so it has been labeled "stale-P2". If this issue is still affecting you, we care! Please comment and remove the label. Otherwise, in 14 days the issue will be moved to P3.
Please see https://beam.apache.org/contribute/jira-priorities/ for a detailed explanation of what these priorities mean.
> Select Types not equal with nested schema
> -----------------------------------------
>
> Key: BEAM-10544
> URL: https://issues.apache.org/jira/browse/BEAM-10544
> Project: Beam
> Issue Type: Bug
> Components: dsl-sql, sdk-java-core
> Reporter: Jacob Ferriero
> Priority: P2
> Labels: stale-P2
>
> When using SqlTransform to join a large nested schema to a flat table, we get a "Types not equal" error from Select [1].
> We are not able to get the test of our use of SqlTransform to pass with the direct runner. All code is checked into CSR [2].
> Things of note:
> Calcite Query Planner
> Query (the real business logic was much more complex, but this is sufficient to reproduce the issue in our test):
> ```sql
> SELECT
>   t1.DeviceName AS DeviceName,
>   t1.LinkName AS LinkName,
>   t1.HostName AS HostName,
>   t1.MeasuredAt AS MeasuredAt,
>   t2.b_dBm AS b_dBm
> FROM
>   RealtimeRows AS t1
> INNER JOIN
>   -- BigQuery Dimension Side Input
>   TxPowerSideInput AS t2
> ON
>   t1.DeviceName = t2.DeviceName
> ```
> Tables are created like so (though in the real use case the realtime rows arrive as Pub/Sub messages):
> ```java
> // This table has the same schema as the real incoming Pub/Sub messages
> // in the real-world use case.
> PCollection<Row> realtimeTestData = pipeline
>     .apply("Read 1Hz staging",
>         BigQueryIO
>             .readTableRowsWithSchema()
>             .fromQuery(
>                 "SELECT * FROM `taara-db.jake_views.staging_sample_float`")
>             .usingStandardSql())
>     .apply(Convert.toRows());
> PCollection<Row> txPowerCalcRows = pipeline
>     .apply("Read Tx Power Calc Side Input",
>         BigQueryIO
>             .readTableRowsWithSchema()
>             .fromQuery(
>                 "SELECT * FROM `taara-db`.MANUFACTURING.tx_power_timeinvariant_calculations")
>             .usingStandardSql())
>     .apply(Convert.toRows());
> ```
> Relevant Java snippet:
> ```java
> PCollection<Row> out = tables
>     .apply(
>         "Join to dimension Data",
>         SqlTransform
>             .query(sql)
>             .registerUdf("POW", Pow.class)
>             .registerUdf("SQRT", Sqrt.class)
>             .registerUdf("LOG10", Log10.class)
>             .registerUdf("GREATEST", Greatest.class)
>             .registerUdf("EXTRACT_OFFSET", ExtractArrayOffset.class)
>             .registerUdf("PARSE_TIMESTAMP", ParseTimestamp.class)
>             .registerUdf("UNIX_SECONDS", UnixSeconds.class)
>     );
> ```
> [1] https://github.com/apache/beam/blob/b564239081e9351c56fb0e7d263495b95dd3f8f3/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/transforms/Select.java#L203
> [2] https://source.cloud.google.com/taara-db/pso-taara-realtime-margin/+/master:streaming-join/streaming-join/src/test/java/com/google/x/taara/dataflow/transforms/RxTxPowersCorrFERCombinedSqlTransformIT.java
--
This message was sent by Atlassian Jira
(v8.3.4#803005)