Posted to commits@beam.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2017/10/22 02:31:00 UTC

[jira] [Commented] (BEAM-3088) BigQuery source should consider streaming buffer when determining estimated sizes of tables

    [ https://issues.apache.org/jira/browse/BEAM-3088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16214157#comment-16214157 ] 

ASF GitHub Bot commented on BEAM-3088:
--------------------------------------

GitHub user chamikaramj opened a pull request:

    https://github.com/apache/beam/pull/4025

    [BEAM-3088] Improves size estimation of BigQueryTableSource.

    Updates BigQueryTableSource to consider data in streaming buffer when determining estimated size.
    
    Follow this checklist to help us incorporate your contribution quickly and easily:
    
     - [ ] Make sure there is a [JIRA issue](https://issues.apache.org/jira/projects/BEAM/issues/) filed for the change (usually before you start working on it).  Trivial changes like typos do not require a JIRA issue.  Your pull request should address just this issue, without pulling in other changes.
     - [ ] Each commit in the pull request should have a meaningful subject line and body.
     - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue.
     - [ ] Write a pull request description that is detailed enough to understand what the pull request does, how, and why.
     - [ ] Run `mvn clean verify` to make sure basic checks pass. A more thorough check will be performed on your pull request automatically.
     - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
    
    ---


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/chamikaramj/beam bq_size_estimation

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/beam/pull/4025.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #4025
    
----
commit 501b43800e95a8722315c43c7379725407d04f7c
Author: chamikara@google.com <ch...@google.com>
Date:   2017-10-22T02:20:07Z

    Updates BigQueryTableSource to consider data in streaming buffer when determining estimated size.

----


> BigQuery source should consider streaming buffer when determining estimated sizes of tables
> -------------------------------------------------------------------------------------------
>
>                 Key: BEAM-3088
>                 URL: https://issues.apache.org/jira/browse/BEAM-3088
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-java-gcp
>            Reporter: Chamikara Jayalath
>            Assignee: Chamikara Jayalath
>
> Currently, the BigQuery table source determines a table's estimated size from the table.numBytes property:
> https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryTableSource.java#L100
> If a BigQuery table has data in its streaming buffer, the size of that data is not reflected in table.numBytes. To estimate the table's size more accurately, data in the streaming buffer must be counted as well. The size of the buffered data can be obtained from the streamingBuffer.estimatedBytes property, as documented here:
> https://cloud.google.com/bigquery/docs/reference/rest/v2/tables
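The estimate described in the issue amounts to adding the two fields together. The following is a minimal, self-contained Java sketch of that idea, not the code from PR #4025: `TableInfo`, `StreamingBufferInfo`, and `estimatedSizeBytes` are hypothetical stand-ins for the `numBytes` and `streamingBuffer.estimatedBytes` fields of the BigQuery Tables REST resource. Modelling `estimatedBytes` as a `BigInteger` and treating a missing streaming buffer as zero extra bytes are assumptions of this sketch.

```java
import java.math.BigInteger;

/**
 * Sketch: estimate a table's size as numBytes plus the streaming buffer's
 * estimatedBytes. The nested classes are hypothetical stand-ins for the
 * REST `tables` resource, not Beam or BigQuery client-library classes.
 */
final class TableSizeEstimate {

  /** Hypothetical stand-in for a table's `streamingBuffer` field. */
  static final class StreamingBufferInfo {
    // Modelled as BigInteger on the assumption that it is an unsigned 64-bit value.
    final BigInteger estimatedBytes;

    StreamingBufferInfo(BigInteger estimatedBytes) {
      this.estimatedBytes = estimatedBytes;
    }
  }

  /** Hypothetical stand-in for the table fields relevant to size estimation. */
  static final class TableInfo {
    final long numBytes;                       // does not include streaming-buffer data
    final StreamingBufferInfo streamingBuffer; // null when nothing is buffered

    TableInfo(long numBytes, StreamingBufferInfo streamingBuffer) {
      this.numBytes = numBytes;
      this.streamingBuffer = streamingBuffer;
    }
  }

  /** Returns numBytes plus the streaming buffer estimate, when one is present. */
  static long estimatedSizeBytes(TableInfo table) {
    long size = table.numBytes;
    if (table.streamingBuffer != null && table.streamingBuffer.estimatedBytes != null) {
      // longValueExact and addExact throw ArithmeticException instead of silently overflowing.
      size = Math.addExact(size, table.streamingBuffer.estimatedBytes.longValueExact());
    }
    return size;
  }

  public static void main(String[] args) {
    TableInfo noBuffer = new TableInfo(1_000L, null);
    TableInfo withBuffer =
        new TableInfo(1_000L, new StreamingBufferInfo(BigInteger.valueOf(250L)));
    System.out.println(estimatedSizeBytes(noBuffer));   // 1000
    System.out.println(estimatedSizeBytes(withBuffer)); // 1250
  }
}
```

Failing loudly on overflow is a choice made for this sketch; an implementation could instead saturate at `Long.MAX_VALUE`, since the value is only an estimate.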


