Posted to user@beam.apache.org by Josh <jo...@gmail.com> on 2017/04/07 17:03:24 UTC

BigQueryIO - Why is CREATE_NEVER not supported when using a tablespec?

Hi all,

I have a use case where I want to stream into BigQuery, using a tablespec
but with CreateDisposition.CREATE_NEVER. I want to partition/shard my data
by date, and use BigQuery's date partitioning feature within a single table
(rather than creating a new BigQuery table for every day). In this case
writes would be made to a partition in a single table, e.g.
`my-project:dataset.my_table$20170407`, and in my tablespec I would just be
choosing the partition decorator using the window.
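
To make this concrete, here is a minimal sketch of how the partition
decorator could be derived from a window's start timestamp. It uses only
java.time and leaves the Beam API out entirely. The class and method names
(`PartitionSpec`, `partitionedTableSpec`) are hypothetical, and it assumes
partitions are keyed by UTC date.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, not part of BigQueryIO: builds a tablespec that
// targets a single date partition of an existing table.
final class PartitionSpec {
    // Partition decorators use the form yyyyMMdd; UTC is an assumption here.
    private static final DateTimeFormatter DAY =
        DateTimeFormatter.ofPattern("yyyyMMdd").withZone(ZoneOffset.UTC);

    private PartitionSpec() {}

    // Appends "$yyyyMMdd" for the day containing windowStart to the base spec.
    static String partitionedTableSpec(String baseTableSpec, Instant windowStart) {
        return baseTableSpec + "$" + DAY.format(windowStart);
    }
}
```

With a window starting at 2017-04-07T17:03:24Z and a base tablespec of
`my-project:dataset.my_table`, this yields the partition spec shown above.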

Unfortunately this doesn't seem possible with BigQueryIO at the moment,
because it requires me to use CreateDisposition.CREATE_IF_NEEDED. I can't
use CreateDisposition.CREATE_IF_NEEDED because it requires me to provide a
table schema and my BigQuery schema isn't available at compile time.

Is there any good reason why CREATE_NEVER is not allowed when using a
tablespec?

Thanks,
Josh

Re: BigQueryIO - Why is CREATE_NEVER not supported when using a tablespec?

Posted by Josh <jo...@gmail.com>.
Hi Dan,

OK, great, thanks for confirming. I will create a JIRA issue and submit a PR
to remove this check then.

Thanks,
Josh

On Fri, Apr 7, 2017 at 6:09 PM, Dan Halperin <dh...@apache.org> wrote:

> Hi Josh,
> You raise a good point. I think we had put this check in (long before
> partitioned tables existed) because we need a schema to create a table and
> we assumed the number of tables would be unbounded. But the check is now
> outdated and overly conservative, and it should probably be removed.
>
> Would you like to send a PR to fix this?
>
> Thanks,
> Dan
>
>

Re: BigQueryIO - Why is CREATE_NEVER not supported when using a tablespec?

Posted by Dan Halperin <dh...@apache.org>.
Hi Josh,
You raise a good point. I think we had put this check in (long before
partitioned tables existed) because we need a schema to create a table and
we assumed the number of tables would be unbounded. But the check is now
outdated and overly conservative, and it should probably be removed.

Would you like to send a PR to fix this?

Thanks,
Dan

