Posted to commits@beam.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2017/07/30 18:49:00 UTC

[jira] [Commented] (BEAM-2700) BigQueryIO should support using file load jobs when using unbounded collections

    [ https://issues.apache.org/jira/browse/BEAM-2700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16106604#comment-16106604 ] 

ASF GitHub Bot commented on BEAM-2700:
--------------------------------------

GitHub user reuvenlax opened a pull request:

    https://github.com/apache/beam/pull/3662

    [BEAM-2700] Support load jobs in streaming

    Allow BigQuery load jobs to be selected by the user even when using unbounded PCollections. If using unbounded PCollections, the user must specify a frequency indicating how often these load jobs will be generated.
    
    Note: while there are some similarities between the BigQuery transform and what is done in FileBasedSink, there are enough differences that it does not appear easy or advisable to attempt to reuse the code.
    
    Note: a design choice is to allow the user to specify only a triggering frequency, not arbitrary windows. The reason is that this triggering frequency is merely a tuning parameter controlling the BigQuery load jobs, and it is usually set to keep the number of BQ load jobs under quota (ideally it wouldn't be needed at all, but I don't know how to make this automatic while respecting user quotas). There is no need for semantic windowing to control how often these writes happen.
    
    R:@jkff
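
As a rough sketch of the rule described in this pull request, the hypothetical Java helper below (`LoadJobFrequency` is an illustrative name and is not part of Beam) rejects file load jobs on an unbounded input when no triggering frequency is given. It also estimates how many load jobs a given frequency produces per day. It assumes one load job per destination table per trigger firing. The daily quota is passed in as a parameter. The value 1,000 used in `main` is an assumption for illustration, not a documented BigQuery limit. For comparison, released versions of the Beam Java SDK expose the user-facing options as `BigQueryIO.Write.withMethod(BigQueryIO.Write.Method.FILE_LOADS)` and `withTriggeringFrequency(...)`. The sketch itself does not depend on Beam.

```java
import java.time.Duration;

// Hypothetical helper, not part of Beam: models the rule that file load jobs on an
// unbounded input need a triggering frequency, and estimates how many load jobs a
// given frequency produces per destination table per day.
final class LoadJobFrequency {
    private static final long MILLIS_PER_DAY = Duration.ofDays(1).toMillis();

    // Rejects file loads on an unbounded input without a frequency, and any non-positive frequency.
    static void validate(boolean useFileLoads, boolean unboundedInput, Duration triggeringFrequency) {
        if (useFileLoads && unboundedInput && triggeringFrequency == null) {
            throw new IllegalArgumentException(
                "A triggering frequency is required for file load jobs on an unbounded PCollection");
        }
        if (triggeringFrequency != null
                && (triggeringFrequency.isZero() || triggeringFrequency.isNegative())) {
            throw new IllegalArgumentException("Triggering frequency must be positive");
        }
    }

    // Upper bound on load jobs per day, assuming one job per destination table per firing.
    static long loadJobsPerDay(Duration triggeringFrequency) {
        long millis = triggeringFrequency.toMillis();
        if (millis <= 0) {
            throw new IllegalArgumentException("Triggering frequency must be at least 1 ms");
        }
        return (MILLIS_PER_DAY + millis - 1) / millis; // ceiling division
    }

    // Smallest whole-second frequency whose daily job count stays within dailyQuota.
    static Duration minFrequencyForQuota(long dailyQuota) {
        if (dailyQuota <= 0) {
            throw new IllegalArgumentException("Quota must be positive");
        }
        long secondsPerDay = MILLIS_PER_DAY / 1000;
        return Duration.ofSeconds((secondsPerDay + dailyQuota - 1) / dailyQuota);
    }

    public static void main(String[] args) {
        Duration fiveMinutes = Duration.ofMinutes(5);
        validate(true, true, fiveMinutes);
        System.out.println(loadJobsPerDay(fiveMinutes));             // 288
        // 1,000 jobs per day is an assumed quota used only for illustration.
        System.out.println(minFrequencyForQuota(1000).getSeconds()); // 87
    }
}
```

Running `main` prints 288 for a five-minute frequency. It then prints 87, which is the smallest whole-second frequency that stays within the assumed quota of 1,000 jobs per day. Choosing the value this way treats the frequency purely as a quota-tuning knob rather than as a semantic window, in line with the design note above.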

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/reuvenlax/incubator-beam bq_load_jobs_in_streaming

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/beam/pull/3662.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #3662
    
----
commit 83fccf0cecb2b5eff1d4b814597c85256f2773f0
Author: Reuven Lax <re...@relax-macbookpro2.roam.corp.google.com>
Date:   2017-07-30T18:17:39Z

    Allow users to choose the BigQuery insertion method. If choosing file load jobs on an unbounded PCollection, a triggering frequency must be specified to control how often load jobs are generated.

commit 128984b00bb42782767ee34c74f3c6b234b83d93
Author: Reuven Lax <re...@relax-macbookpro2.roam.corp.google.com>
Date:   2017-07-30T18:36:12Z

    Cleanup

----


> BigQueryIO should support using file load jobs when using unbounded collections
> -------------------------------------------------------------------------------
>
>                 Key: BEAM-2700
>                 URL: https://issues.apache.org/jira/browse/BEAM-2700
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-java-gcp
>    Affects Versions: 2.2.0
>            Reporter: Reuven Lax
>            Assignee: Reuven Lax
>
> Currently the method used for inserting into BigQuery is determined by the input PCollection: bounded input uses file load jobs, and unbounded input uses streaming inserts. However, while streaming inserts have far lower latency, they cost quite a bit more and provide weaker consistency guarantees. Users should be able to choose which method to use, irrespective of the input PCollection.
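
The selection rule described in this issue can be modeled in a few lines. What follows is a hypothetical, self-contained Java sketch, not Beam's implementation. The `InsertMethodResolver` class and its nested `InsertMethod` enum are illustrative names. `DEFAULT` reproduces the existing behaviour, where bounded input uses file load jobs and unbounded input uses streaming inserts. An explicit choice is honoured irrespective of the input.

```java
// Hypothetical model, not Beam's implementation: class and enum names are illustrative.
final class InsertMethodResolver {
    enum InsertMethod { DEFAULT, FILE_LOADS, STREAMING_INSERTS }

    // An explicit choice wins; DEFAULT (or no choice) falls back to the input's boundedness.
    static InsertMethod resolve(InsertMethod requested, boolean boundedInput) {
        if (requested == null || requested == InsertMethod.DEFAULT) {
            return boundedInput ? InsertMethod.FILE_LOADS : InsertMethod.STREAMING_INSERTS;
        }
        return requested;
    }

    public static void main(String[] args) {
        System.out.println(resolve(InsertMethod.DEFAULT, true));     // FILE_LOADS
        System.out.println(resolve(InsertMethod.DEFAULT, false));    // STREAMING_INSERTS
        System.out.println(resolve(InsertMethod.FILE_LOADS, false)); // FILE_LOADS
    }
}
```

Under this model, resolving to `FILE_LOADS` for an unbounded input is exactly the case where the pull request requires a triggering frequency.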



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)