Posted to commits@beam.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2018/05/17 01:13:00 UTC

[jira] [Work logged] (BEAM-4263) BigQuery connector reads the table size value from a deprecated field

     [ https://issues.apache.org/jira/browse/BEAM-4263?focusedWorklogId=102744&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-102744 ]

ASF GitHub Bot logged work on BEAM-4263:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 17/May/18 01:12
            Start Date: 17/May/18 01:12
    Worklog Time Spent: 10m 
      Work Description: chamikaramj closed pull request #5306: [BEAM-4263] Bugfix: Read BQ bytes processed from correct field.
URL: https://github.com/apache/beam/pull/5306

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryQuerySource.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryQuerySource.java
index f380b7d391b..979f8b9d1b3 100644
--- a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryQuerySource.java
+++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryQuerySource.java
@@ -102,7 +102,7 @@ private BigQueryQuerySource(
   @Override
   public long getEstimatedSizeBytes(PipelineOptions options) throws Exception {
     BigQueryOptions bqOptions = options.as(BigQueryOptions.class);
-    return dryRunQueryIfNeeded(bqOptions).getTotalBytesProcessed();
+    return dryRunQueryIfNeeded(bqOptions).getQuery().getTotalBytesProcessed();
   }
 
   @Override
diff --git a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIOReadTest.java b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIOReadTest.java
index b6fbe4905b4..213ee3d45f0 100644
--- a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIOReadTest.java
+++ b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIOReadTest.java
@@ -560,6 +560,40 @@ public void testEstimatedSizeWithStreamingBuffer() throws Exception {
     assertEquals(118, bqSource.getEstimatedSizeBytes(options));
   }
 
+  @Test
+  public void testBigQueryQuerySourceEstimatedSize() throws Exception {
+
+    List<TableRow> data = ImmutableList.of(
+        new TableRow().set("name", "A").set("number", 10L),
+        new TableRow().set("name", "B").set("number", 11L),
+        new TableRow().set("name", "C").set("number", 12L));
+
+    PipelineOptions options = PipelineOptionsFactory.create();
+    BigQueryOptions bqOptions = options.as(BigQueryOptions.class);
+    bqOptions.setProject("project");
+    String stepUuid = "testStepUuid";
+
+    String query = FakeBigQueryServices.encodeQuery(data);
+    BigQueryQuerySource<TableRow> bqSource = BigQueryQuerySource.create(
+        stepUuid,
+        ValueProvider.StaticValueProvider.of(query),
+        true /* flattenResults */,
+        true /* useLegacySql */,
+        fakeBqServices,
+        TableRowJsonCoder.of(),
+        BigQueryIO.TableRowParser.INSTANCE,
+        QueryPriority.BATCH,
+        null);
+
+    fakeJobService.expectDryRunQuery(
+        bqOptions.getProject(),
+        query,
+        new JobStatistics().setQuery(
+            new JobStatistics2().setTotalBytesProcessed(100L)));
+
+    assertEquals(100, bqSource.getEstimatedSizeBytes(bqOptions));
+  }
+
   @Test
   public void testBigQueryQuerySourceInitSplit() throws Exception {
     TableReference dryRunTable = new TableReference();


 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Issue Time Tracking
-------------------

            Worklog Id:     (was: 102744)
            Time Spent: 10m
    Remaining Estimate: 0h

> BigQuery connector reads the table size value from a deprecated field
> ---------------------------------------------------------------------
>
>                 Key: BEAM-4263
>                 URL: https://issues.apache.org/jira/browse/BEAM-4263
>             Project: Beam
>          Issue Type: Bug
>          Components: io-java-gcp
>    Affects Versions: 3.0.0, 2.5.0
>            Reporter: Kenneth Jung
>            Assignee: Kenneth Jung
>            Priority: Minor
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> The BigQuery connector in the GCP IO module reads the totalBytesProcessed value from a deprecated field in the job statistics:
> [https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs]
> The non-deprecated replacement is the totalBytesProcessed field in the query statistics.
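
For illustration only, here is a minimal, self-contained sketch of the lookup change described above. JobStats and QueryStats are hypothetical stand-ins for the client library's job-level and query-level statistics objects, not the real classes, and the explicit null check is an assumption of this sketch rather than something the merged fix does.

```java
/**
 * Sketch of reading the byte estimate from the nested query statistics
 * instead of the deprecated job-level field. All types here are
 * hypothetical stand-ins, not the BigQuery client library's classes.
 */
public class EstimatedSizeSketch {

  /** Stand-in for the query-level statistics object. */
  static final class QueryStats {
    final Long totalBytesProcessed;

    QueryStats(Long totalBytesProcessed) {
      this.totalBytesProcessed = totalBytesProcessed;
    }
  }

  /** Stand-in for the job-level statistics object. */
  static final class JobStats {
    /** Deprecated job-level value; may be left unset by the service. */
    final Long deprecatedTotalBytesProcessed;
    /** Nested query statistics; holds the non-deprecated value. */
    final QueryStats query;

    JobStats(Long deprecatedTotalBytesProcessed, QueryStats query) {
      this.deprecatedTotalBytesProcessed = deprecatedTotalBytesProcessed;
      this.query = query;
    }
  }

  /** Returns the estimate from the nested query statistics only. */
  static long estimatedSizeBytes(JobStats stats) {
    if (stats == null || stats.query == null
        || stats.query.totalBytesProcessed == null) {
      // Assumption of this sketch: fail loudly rather than fall back
      // to the deprecated job-level field.
      throw new IllegalStateException("query statistics are missing");
    }
    return stats.query.totalBytesProcessed;
  }

  public static void main(String[] args) {
    // Mirrors the shape of the test in the diff: only the nested
    // query-level value is populated.
    JobStats stats = new JobStats(null, new QueryStats(100L));
    System.out.println(estimatedSizeBytes(stats));
  }
}
```

The sketch deliberately ignores the job-level value, so a response that populates only the query statistics (as the fake job service in the new test does) still yields an estimate.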



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)