Posted to issues@beam.apache.org by "Tomo Suzuki (Jira)" <ji...@apache.org> on 2019/12/26 19:07:00 UTC
[jira] [Closed] (BEAM-9010) BigQuery TableRow's size is toString().length() ?
[ https://issues.apache.org/jira/browse/BEAM-9010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tomo Suzuki closed BEAM-9010.
-----------------------------
Fix Version/s: Not applicable
Resolution: Fixed
Fixed by GitHub Pull Request #10444
> BigQuery TableRow's size is toString().length() ?
> -------------------------------------------------
>
> Key: BEAM-9010
> URL: https://issues.apache.org/jira/browse/BEAM-9010
> Project: Beam
> Issue Type: Improvement
> Components: runner-dataflow
> Reporter: Tomo Suzuki
> Assignee: Tomo Suzuki
> Priority: Minor
> Fix For: Not applicable
>
> Attachments: TableRowJsonCoder_behavior_remains_same.png
>
> Time Spent: 2h 50m
> Remaining Estimate: 0h
>
> The following tests failed when I tried to upgrade google-http-client from 1.28.0 to 1.34.0:
> {noformat}
> org.apache.beam.sdk.io.gcp.bigquery.BigQueryIOReadTest.testEstimatedSizeWithoutStreamingBuffer
> org.apache.beam.sdk.io.gcp.bigquery.BigQueryIOReadTest.testEstimatedSizeWithStreamingBuffer
> org.apache.beam.sdk.io.gcp.bigquery.BigQueryUtilTest.testInsertAll
> {noformat}
> [https://builds.apache.org/job/beam_PreCommit_Java_Commit/9288/#showFailuresLink]
> h3. Reason for the test failures
> [org.apache.beam.sdk.io.gcp.testing.TableContainer|https://github.com/apache/beam/blob/6fa94c9/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/testing/TableContainer.java#L43] and [org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl|https://github.com/apache/beam/blob/c2f0d28/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryServicesImpl.java#L758] rely on {{TableRow.toString().length()}} to calculate the size. Example:
> {code:java}
> dataSize += row.toString().length();
> if (dataSize >= maxRowBatchSize
> || rows.size() >= maxRowsPerBatch
> || i == rowsToPublish.size() - 1) {
> {code}
> However, with [google-http-client's PR#589|https://github.com/googleapis/google-http-java-client/pull/589/files#diff-914cd7ff18143b3d2398149e1cfb4f45R218], the output of GenericData.toString changed as of v1.29.0.
> With the old google-http-client 1.28.0, an example row's toString returned:
> {noformat}
> {f=[{v=foo}, {v=1234}]}
> {noformat}
> With google-http-client 1.29.0 and higher, the same row's toString returns:
> {noformat}
> GenericData{classInfo=[f], {f=[GenericData{classInfo=[v], {v=foo}}, GenericData{classInfo=[v], {v=1234}}]}}
> {noformat}
> h1. Question:
> Is it right to rely on {{toString().length()}} in the BigQuery classes?
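To make the question concrete, here is a minimal sketch of a size estimate that does not depend on toString. It uses only the JDK: the row is modelled as plain maps and lists, and the class and method names (RowSizeSketch, toJson, encodedSize, exampleRow) are hypothetical and not part of Beam or google-http-client. It is not the change made in the linked pull request. In Beam itself, a coder's encoded byte size might play the same role, but that is an assumption rather than something this message states.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: size a row by its serialized bytes, not by toString(). */
public class RowSizeSketch {

  // The two toString() outputs quoted in the issue for the same example row.
  public static final String OLD_TO_STRING = "{f=[{v=foo}, {v=1234}]}";
  public static final String NEW_TO_STRING =
      "GenericData{classInfo=[f], {f=[GenericData{classInfo=[v], {v=foo}}, "
          + "GenericData{classInfo=[v], {v=1234}}]}}";

  /** Builds the example row as plain maps and lists (a stand-in for TableRow). */
  public static Map<String, Object> exampleRow() {
    Map<String, Object> row = new LinkedHashMap<>();
    row.put("f", Arrays.asList(
        Collections.singletonMap("v", "foo"),
        Collections.singletonMap("v", "1234")));
    return row;
  }

  /** Serializes maps, lists, and scalars into a compact JSON-like string. */
  public static String toJson(Object value) {
    if (value instanceof Map) {
      StringBuilder sb = new StringBuilder("{");
      boolean first = true;
      for (Map.Entry<?, ?> e : ((Map<?, ?>) value).entrySet()) {
        if (!first) {
          sb.append(',');
        }
        first = false;
        sb.append('"').append(e.getKey()).append("\":").append(toJson(e.getValue()));
      }
      return sb.append('}').toString();
    }
    if (value instanceof List) {
      StringBuilder sb = new StringBuilder("[");
      boolean first = true;
      for (Object item : (List<?>) value) {
        if (!first) {
          sb.append(',');
        }
        first = false;
        sb.append(toJson(item));
      }
      return sb.append(']').toString();
    }
    // Scalars are quoted without escaping; good enough for a sketch only.
    return "\"" + value + "\"";
  }

  /** Size estimate based on the UTF-8 bytes of the serialized form. */
  public static long encodedSize(Object row) {
    return toJson(row).getBytes(StandardCharsets.UTF_8).length;
  }

  public static void main(String[] args) {
    // A toString()-based estimate moves with the library's string format.
    System.out.println("old toString length: " + OLD_TO_STRING.length());
    System.out.println("new toString length: " + NEW_TO_STRING.length());
    // A serialization-based estimate depends only on the row's contents.
    System.out.println("encoded size: " + encodedSize(exampleRow()));
  }
}
```

The two quoted toString outputs describe the same row but differ widely in length, so any batching threshold compared against that length shifts when the library is upgraded. An estimate derived from a serialized form only changes when the row's data changes.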