Posted to dev@beam.apache.org by Lukasz Cwik <lc...@google.com> on 2018/08/29 16:32:07 UTC

Re: BigqueryIO field clustering

+dev@beam.apache.org

Wout, I assigned this task to you since it seems like you're interested in
contributing.
The Apache Beam contribution guide[1] is a good place to start for
answering questions on how to contribute.

If you need help getting things reviewed or have questions, feel free
to reach out on dev@beam.apache.org or on Slack.

1: https://beam.apache.org/contribute/


On Wed, Aug 29, 2018 at 1:28 AM Wout Scheepers <
Wout.Scheepers@vente-exclusive.com> wrote:

> Hey all,
>
> I’m trying to use the field clustering beta feature in BigQuery [1].
> However, the current Beam/Dataflow worker BigQuery API service dependency
> is ‘google-api-services-bigquery: com.google.apis: v2-rev374-1.23.0’, which
> does not include the clustering option in the TimePartitioning class.
> As a result, I can’t specify the clustering field when loading/streaming
> into BigQuery. See [2] for the BigQuery API error details.
>
> Does anyone know a workaround for this?
>
> I guess that in the worst case I’ll have to wait until Beam supports a
> newer version of the BigQuery API service.
> After checking the Beam Jira I’ve found BEAM-5191
> <https://jira.apache.org/jira/browse/BEAM-5191>. Is there any way I can
> help to push this forward and make this feature possible in the near future?
>
> Thanks in advance,
> Wout
>
> [1] https://cloud.google.com/bigquery/docs/clustered-tables
>
> [2] "errorResult" : {
>       "message" : "Incompatible table partitioning specification. Expects
> partitioning specification interval(type:day,field:publish_time)
> clustering(clustering_id), but input partitioning specification is
> interval(type:day,field:publish_time)",
>       "reason" : "invalid"
>     }
>
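
For readers following the error in [2]: the destination table is partitioned
by day on publish_time and clustered on clustering_id, while the incoming
load request only describes the time partitioning. The snippet below is a
rough sketch, not Beam code. It builds both request shapes as plain Python
dicts, assuming the BigQuery REST load-job configuration layout in which
"clustering" is a field next to "timePartitioning". The helper name
load_config is made up for illustration and nothing here calls BigQuery.

```python
import json


def load_config(partition_field, cluster_fields=None):
    """Build a load-job configuration fragment as a plain dict.

    Hypothetical helper for illustration only; it does not contact BigQuery.
    """
    config = {
        "timePartitioning": {"type": "DAY", "field": partition_field},
    }
    if cluster_fields:
        # Assumption: clustering is a sibling of timePartitioning,
        # carrying an ordered list of field names.
        config["clustering"] = {"fields": list(cluster_fields)}
    return config


# Shape of the request described in the thread: partitioning only.
without_clustering = load_config("publish_time")

# Shape the clustered table expects: same partitioning plus clustering.
with_clustering = load_config("publish_time", ["clustering_id"])

print(json.dumps(with_clustering, indent=2))
```

The two dicts differ only in the "clustering" entry, which matches the
mismatch reported in the error message.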

Re: BigqueryIO field clustering

Posted by Chamikara Jayalath <ch...@google.com>.
Sent some comments. Thanks.

On Fri, Dec 21, 2018 at 2:15 AM Maximilian Michels <mx...@apache.org> wrote:

> Any updates on this? The pull request has already been open for a month.
>
> I think we should at least provide some basic feedback, e.g. whether we
> intend to merge the PR, or whether there are any problems with the code
> or tests.
>
> I'd like to help review it, but I feel like someone familiar with
> BigQuery should have a look first.
>
> Thanks,
> Max
>
> PS: https://github.com/apache/beam/pull/7061
>

Re: BigqueryIO field clustering

Posted by Maximilian Michels <mx...@apache.org>.
Any updates on this? The pull request has already been open for a month.

I think we should at least provide some basic feedback, e.g. whether we intend
to merge the PR, or whether there are any problems with the code or tests.

I'd like to help review it, but I feel like someone familiar with BigQuery
should have a look first.

Thanks,
Max

PS: https://github.com/apache/beam/pull/7061

On 28.11.18 19:27, Chamikara Jayalath wrote:
> Thanks for the contribution. I can take a look later this week.

Re: BigqueryIO field clustering

Posted by Chamikara Jayalath <ch...@google.com>.
Thanks for the contribution. I can take a look later this week.

On Wed, Nov 28, 2018 at 12:29 AM Wout Scheepers <
Wout.Scheepers@vente-exclusive.com> wrote:

> Hey all,
>
> Almost two weeks ago, I created a PR to support BigQuery clustering [1].
> Can someone please have a look?
>
> Thanks,
> Wout
>
> 1: https://github.com/apache/beam/pull/7061

Re: BigqueryIO field clustering

Posted by Wout Scheepers <Wo...@vente-exclusive.com>.
Hey all,

Almost two weeks ago, I created a PR to support BigQuery clustering [1].
Can someone please have a look?

Thanks,
Wout

1: https://github.com/apache/beam/pull/7061


From: Lukasz Cwik <lc...@google.com>
Reply-To: "user@beam.apache.org" <us...@beam.apache.org>
Date: Wednesday, 29 August 2018 at 18:32
To: dev <de...@beam.apache.org>, "user@beam.apache.org" <us...@beam.apache.org>
Cc: Bob De Schutter <Bo...@vente-exclusive.com>
Subject: Re: BigqueryIO field clustering

+dev@beam.apache.org

Wout, I assigned this task to you since it seems like you're interested in contributing.
The Apache Beam contribution guide[1] is a good place to start for answering questions on how to contribute.

If you need help getting things reviewed or have questions, feel free to reach out on dev@beam.apache.org or on Slack.

1: https://beam.apache.org/contribute/

