Posted to user@beam.apache.org by Himanshu Hazari via user <us...@beam.apache.org> on 2023/04/24 04:46:32 UTC

JDBC to BigQuery table create

I am new to Apache Beam, so please forgive the basic question: is it possible to fetch data over JDBC and create a BigQuery table from it?
I can see a template that writes data to an existing BigQuery table.
However, I want to create the table based on the query I run. No table exists in BigQuery beforehand, so the new table should have the same column names as the query result, and it needs to be created dynamically each time I run the workflow.



---
This e-mail may contain confidential and/or privileged information. If you are not the intended recipient (or have received this e-mail in error) please notify the sender immediately and destroy this e-mail. Any unauthorized copying, disclosure or distribution of the material in this e-mail is strictly forbidden.

Privacy of communications
In order to monitor compliance with legal and regulatory obligations and our policies, procedures and compliance programs, we may review emails and instant messages passing through our IT systems (including any personal data and customer information they contain), and record telephone calls routed via our telephone systems. We will only do so in accordance with local laws and regulations. In some countries please refer to your local DB website for a copy of our Privacy Policy.

Please refer to https://db.com/disclosures for additional EU corporate and regulatory disclosures.

Re: JDBC to BigQuery table create

Posted by Bruno Volpato via user <us...@beam.apache.org>.
Thanks for tagging me, Ahmed!

That's correct, that template isn't prepared to create tables.
Creating tables is a bit complicated because you need to be able to infer a
schema from the data being read, which may not be easy to generalize for
all cases.

The template accepts an arbitrary SQL query, and it may not be easy to go
from the given ResultSet to an accurate BigQuery schema.
If you are confident that you can provide the schema yourself, the steps for
customizing and publishing the template are posted here:
<https://github.com/GoogleCloudPlatform/DataflowTemplates/blob/main/v2/googlecloud-to-googlecloud/README_Jdbc_to_BigQuery_Flex.md>
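
To illustrate why going from a ResultSet to a BigQuery schema is a judgment
call, here is a minimal sketch that maps JDBC column types to BigQuery type
names using only `java.sql`. The class name, method names, and the mapping
itself are assumptions made for illustration; they are not taken from the
template, and several choices (BIT, TIMESTAMP, NUMERIC precision) would need
to be revisited per source database.

```java
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.sql.Types;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Beam or the Dataflow templates.
class JdbcTypeMappingSketch {

    /** Maps a java.sql.Types constant to a BigQuery column type name. */
    static String toBigQueryType(int jdbcType) {
        switch (jdbcType) {
            case Types.BIT: // assumption: single-bit flag; some databases allow BIT(n)
            case Types.BOOLEAN:
                return "BOOLEAN";
            case Types.TINYINT:
            case Types.SMALLINT:
            case Types.INTEGER:
            case Types.BIGINT:
                return "INTEGER";
            case Types.REAL:
            case Types.FLOAT:
            case Types.DOUBLE:
                return "FLOAT";
            case Types.NUMERIC:
            case Types.DECIMAL:
                // Source precision/scale may not fit BigQuery NUMERIC; not checked here.
                return "NUMERIC";
            case Types.DATE:
                return "DATE";
            case Types.TIME:
                return "TIME";
            case Types.TIMESTAMP:
                // Judgment call: treated as a civil time without a zone.
                return "DATETIME";
            case Types.TIMESTAMP_WITH_TIMEZONE:
                return "TIMESTAMP";
            case Types.BINARY:
            case Types.VARBINARY:
            case Types.LONGVARBINARY:
            case Types.BLOB:
                return "BYTES";
            default:
                // Fallback for character types and anything not handled above.
                return "STRING";
        }
    }

    /** Returns column label -> BigQuery type name, in query column order. */
    static Map<String, String> inferColumns(ResultSetMetaData meta) throws SQLException {
        Map<String, String> columns = new LinkedHashMap<>();
        for (int i = 1; i <= meta.getColumnCount(); i++) { // JDBC columns are 1-based
            columns.put(meta.getColumnLabel(i), toBigQueryType(meta.getColumnType(i)));
        }
        return columns;
    }
}
```

`inferColumns` would be called with `resultSet.getMetaData()` before the
pipeline is built, and the resulting map could then be turned into whatever
schema object the write transform expects.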

BigQueryIO allows you to route elements to specific destinations and to
provide a schema for each destination at runtime.
Take a look at this JavaDoc
<https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/gcp/bigquery/DynamicDestinations.html>
for an example.

Best,
Bruno

On Mon, Apr 24, 2023 at 10:24 AM Ahmed Abualsaud <ah...@google.com>
wrote:

> Are you using the JdbcToBigQuery
> <https://github.com/GoogleCloudPlatform/DataflowTemplates/blob/main/v2/googlecloud-to-googlecloud/src/main/java/com/google/cloud/teleport/v2/templates/JdbcToBigQuery.java> template?
> That template uses `CREATE_NEVER` so it does not create a BigQuery table.
>
> My guess is it was designed this way because creating the table requires
> that you know the data schema before the pipeline runs, whereas the current
> implementation allows you to just funnel the data to BigQuery. If you know
> what your data schema is, you can fork the template and add it to the
> BigQuery write transform here
> <https://github.com/GoogleCloudPlatform/DataflowTemplates/blob/77113e91695e620a2e1378690ea53afe6d884670/v2/googlecloud-to-googlecloud/src/main/java/com/google/cloud/teleport/v2/templates/JdbcToBigQuery.java#L138> with
> `withBeamSchema(schema)`. You would also need to set
> `CreateDisposition.CREATE_IF_NEEDED`.
>
> +Bruno Volpato <bv...@google.com>

Re: JDBC to BigQuery table create

Posted by Ahmed Abualsaud via user <us...@beam.apache.org>.
Are you using the JdbcToBigQuery
<https://github.com/GoogleCloudPlatform/DataflowTemplates/blob/main/v2/googlecloud-to-googlecloud/src/main/java/com/google/cloud/teleport/v2/templates/JdbcToBigQuery.java>
template?
That template uses `CREATE_NEVER` so it does not create a BigQuery table.

My guess is that it was designed this way because creating the table requires
that you know the data schema before the pipeline runs, whereas the current
implementation allows you to just funnel the data to BigQuery. If you know
what your data schema is, you can fork the template and add the schema to the
BigQuery write transform here
<https://github.com/GoogleCloudPlatform/DataflowTemplates/blob/77113e91695e620a2e1378690ea53afe6d884670/v2/googlecloud-to-googlecloud/src/main/java/com/google/cloud/teleport/v2/templates/JdbcToBigQuery.java#L138>
with `withBeamSchema(schema)`. You would also need to set
`CreateDisposition.CREATE_IF_NEEDED`.
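
As a rough sketch of what such a change to a forked template might look like,
the snippet below builds a BigQuery write transform that carries an explicit
schema and is allowed to create the table. It assumes the Beam Java SDK with
the Google Cloud IO module on the classpath. The class name, the `id` and
`name` columns, and the table spec are hypothetical, and the schema is attached
with `withSchema(TableSchema)`, which may differ from the exact method intended
above.

```java
import com.google.api.services.bigquery.model.TableFieldSchema;
import com.google.api.services.bigquery.model.TableRow;
import com.google.api.services.bigquery.model.TableSchema;
import java.util.Arrays;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.Write.CreateDisposition;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.Write.WriteDisposition;

// Hypothetical helper for illustration; not code from the JdbcToBigQuery template.
class CreateIfNeededWriteSketch {

    /** Builds a write transform that creates the target table if it is missing. */
    static BigQueryIO.Write<TableRow> buildWrite(String tableSpec) {
        // Assumed columns; in practice these must match the SQL query's result.
        TableSchema schema = new TableSchema().setFields(Arrays.asList(
            new TableFieldSchema().setName("id").setType("INTEGER").setMode("REQUIRED"),
            new TableFieldSchema().setName("name").setType("STRING").setMode("NULLABLE")));

        return BigQueryIO.writeTableRows()
            .to(tableSpec) // e.g. "my-project:my_dataset.my_table"
            .withSchema(schema) // a schema is required for CREATE_IF_NEEDED
            .withCreateDisposition(CreateDisposition.CREATE_IF_NEEDED)
            .withWriteDisposition(WriteDisposition.WRITE_APPEND);
    }
}
```

The returned transform would be applied to the `PCollection<TableRow>` that the
template produces from the JDBC read, in place of the existing write step.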

+Bruno Volpato <bv...@google.com>

On Mon, Apr 24, 2023 at 9:52 AM XQ Hu via user <us...@beam.apache.org> wrote:

> Do you mean creating the BigQuery table when it does not exist? If so, you
> can check
> https://beam.apache.org/documentation/io/built-in/google-bigquery/#create-disposition
> .

Re: JDBC to BigQuery table create

Posted by XQ Hu via user <us...@beam.apache.org>.
Do you mean creating the BigQuery table when it does not exist? If so, you
can check
https://beam.apache.org/documentation/io/built-in/google-bigquery/#create-disposition
.

On Mon, Apr 24, 2023 at 12:47 AM Himanshu Hazari via user <
user@beam.apache.org> wrote:

> I am new to Apache beam so please forgive me for the silly question but Is
> it possible to fetch the data from JDBC and create it in BigQuery.
>
> I can see template where data is written to *existing*  BigQuery table.
>
> But I want to create table based on the query which I run, no existing
> table in Bq,  so it should have same column names and need to create
> bigquery table dynamically each time I run the workflow.