Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/01/10 16:34:29 UTC

[GitHub] [airflow] smowden opened a new pull request #13598: add examples for BigQuery load and extract jobs

smowden opened a new pull request #13598:
URL: https://github.com/apache/airflow/pull/13598


   The old operators are deprecated, and it is now recommended to submit load and extract jobs this way. However, there are no examples. Hopefully this saves people some time.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #13598: add examples for BigQuery load and extract jobs

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #13598:
URL: https://github.com/apache/airflow/pull/13598#issuecomment-797147935


   This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions.





[GitHub] [airflow] github-actions[bot] commented on pull request #13598: add examples for BigQuery load and extract jobs

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #13598:
URL: https://github.com/apache/airflow/pull/13598#issuecomment-767024044


   [The Workflow run](https://github.com/apache/airflow/actions/runs/510059546) is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks,^Build docs$,^Spell check docs$,^Backport packages$,^Provider packages,^Checks: Helm tests$,^Test OpenAPI*.





[GitHub] [airflow] github-actions[bot] commented on pull request #13598: add examples for BigQuery load and extract jobs

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #13598:
URL: https://github.com/apache/airflow/pull/13598#issuecomment-758173430


   [The Workflow run](https://github.com/apache/airflow/actions/runs/478010255) is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks,^Build docs$,^Spell check docs$,^Backport packages$,^Provider packages,^Checks: Helm tests$,^Test OpenAPI*.





[GitHub] [airflow] turbaszek commented on a change in pull request #13598: add examples for BigQuery load and extract jobs

Posted by GitBox <gi...@apache.org>.
turbaszek commented on a change in pull request #13598:
URL: https://github.com/apache/airflow/pull/13598#discussion_r554967821



##########
File path: airflow/providers/google/cloud/example_dags/example_bigquery_queries.py
##########
@@ -199,3 +233,4 @@
         execute_insert_query >> get_data >> get_data_result >> delete_dataset
         execute_insert_query >> execute_query_save >> bigquery_execute_multi_query >> delete_dataset
         execute_insert_query >> [check_count, check_value, check_interval] >> delete_dataset
+        insert_query_job >> extract_job >> load_job

Review comment:
       ```suggestion
           insert_query_job >> extract_job >> load_job >> delete_dataset
   ```
   So we avoid deleting dataset before 







[GitHub] [airflow] mik-laj commented on a change in pull request #13598: add examples for BigQuery load and extract jobs

Posted by GitBox <gi...@apache.org>.
mik-laj commented on a change in pull request #13598:
URL: https://github.com/apache/airflow/pull/13598#discussion_r555638888



##########
File path: airflow/providers/google/cloud/example_dags/example_bigquery_queries.py
##########
@@ -125,6 +126,39 @@
         )
         # [END howto_operator_bigquery_select_job]
 
+        extract_job = BigQueryInsertJobOperator(
+            task_id="extract_to_gcs_job",
+            configuration={
+                "extract": {
+                    "sourceTable": {
+                        "projectId": PROJECT_ID,
+                        "datasetId": DATASET_NAME,
+                        "tableId": TABLE_1,
+                    },
+                    "destinationUris": ["gs://example_bucket/dump/table.*.csv.gz"],

Review comment:
       System tests for BigQuery: https://github.com/apache/airflow/blob/master/tests/providers/google/cloud/operators/test_bigquery_system.py
   Docs for system tests:
   https://github.com/apache/airflow/blob/master/TESTING.rst#airflow-system-tests







[GitHub] [airflow] turbaszek commented on a change in pull request #13598: add examples for BigQuery load and extract jobs

Posted by GitBox <gi...@apache.org>.
turbaszek commented on a change in pull request #13598:
URL: https://github.com/apache/airflow/pull/13598#discussion_r555751716



##########
File path: airflow/providers/google/cloud/example_dags/example_bigquery_queries.py
##########
@@ -125,6 +126,39 @@
         )
         # [END howto_operator_bigquery_select_job]
 
+        extract_job = BigQueryInsertJobOperator(
+            task_id="extract_to_gcs_job",
+            configuration={
+                "extract": {
+                    "sourceTable": {
+                        "projectId": PROJECT_ID,
+                        "datasetId": DATASET_NAME,
+                        "tableId": TABLE_1,
+                    },
+                    "destinationUris": ["gs://example_bucket/dump/table.*.csv.gz"],

Review comment:
       No, because that would require the GCS test to run before the BigQuery test. Creating a bucket is as simple as this:
   https://github.com/apache/airflow/blob/7c5cdcf30b2967c3e9f3652e58021a8553be1029/tests/providers/google/cloud/operators/test_bigquery_system.py#L38
   
   Also, if you need a file to be uploaded to this bucket, you can do:
   https://github.com/apache/airflow/blob/7c5cdcf30b2967c3e9f3652e58021a8553be1029/tests/providers/google/cloud/operators/test_dataproc_system.py#L56







[GitHub] [airflow] smowden commented on a change in pull request #13598: add examples for BigQuery load and extract jobs

Posted by GitBox <gi...@apache.org>.
smowden commented on a change in pull request #13598:
URL: https://github.com/apache/airflow/pull/13598#discussion_r555746578



##########
File path: airflow/providers/google/cloud/example_dags/example_bigquery_queries.py
##########
@@ -125,6 +126,39 @@
         )
         # [END howto_operator_bigquery_select_job]
 
+        extract_job = BigQueryInsertJobOperator(
+            task_id="extract_to_gcs_job",
+            configuration={
+                "extract": {
+                    "sourceTable": {
+                        "projectId": PROJECT_ID,
+                        "datasetId": DATASET_NAME,
+                        "tableId": TABLE_1,
+                    },
+                    "destinationUris": ["gs://example_bucket/dump/table.*.csv.gz"],

Review comment:
       I found one for the GCS example DAG; can I just use that one?







[GitHub] [airflow] smowden commented on a change in pull request #13598: add examples for BigQuery load and extract jobs

Posted by GitBox <gi...@apache.org>.
smowden commented on a change in pull request #13598:
URL: https://github.com/apache/airflow/pull/13598#discussion_r555633576



##########
File path: airflow/providers/google/cloud/example_dags/example_bigquery_queries.py
##########
@@ -125,6 +126,39 @@
         )
         # [END howto_operator_bigquery_select_job]
 
+        extract_job = BigQueryInsertJobOperator(
+            task_id="extract_to_gcs_job",
+            configuration={
+                "extract": {
+                    "sourceTable": {
+                        "projectId": PROJECT_ID,
+                        "datasetId": DATASET_NAME,
+                        "tableId": TABLE_1,
+                    },
+                    "destinationUris": ["gs://example_bucket/dump/table.*.csv.gz"],

Review comment:
       @turbaszek where can I find the code for that? Is it outside of this repo?







[GitHub] [airflow] github-actions[bot] commented on pull request #13598: add examples for BigQuery load and extract jobs

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #13598:
URL: https://github.com/apache/airflow/pull/13598#issuecomment-758172557


   [The Workflow run](https://github.com/apache/airflow/actions/runs/477968434) is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks,^Build docs$,^Spell check docs$,^Backport packages$,^Provider packages,^Checks: Helm tests$,^Test OpenAPI*.





[GitHub] [airflow] TobKed commented on a change in pull request #13598: add examples for BigQuery load and extract jobs

Posted by GitBox <gi...@apache.org>.
TobKed commented on a change in pull request #13598:
URL: https://github.com/apache/airflow/pull/13598#discussion_r561746380



##########
File path: airflow/providers/google/cloud/example_dags/example_bigquery_queries.py
##########
@@ -40,9 +40,12 @@
 PROJECT_ID = os.environ.get("GCP_PROJECT_ID", "example-project")
 DATASET_NAME = os.environ.get("GCP_BIGQUERY_DATASET_NAME", "test_dataset")
 LOCATION = "southamerica-east1"
+BUCKET_1 = os.environ.get("GCP_GCS_BUCKET_1", "test-gcs-example-bucket")

Review comment:
       ```suggestion
   BUCKET_1 = os.environ.get("GCP_BIGQUERY_QUERIES_BUCKET", "test-bigquery-queries-bucket")
   ```
   I found that the `GCP_GCS_BUCKET_1` env variable is already used in `airflow/providers/google/cloud/example_dags/example_gcs.py`.
   I looked at the other system tests, and their env var names look like `GCP_DATAPROC_BUCKET` or `GCP_TEXT_TO_SPEECH_BUCKET`.
   







[GitHub] [airflow] turbaszek commented on a change in pull request #13598: add examples for BigQuery load and extract jobs

Posted by GitBox <gi...@apache.org>.
turbaszek commented on a change in pull request #13598:
URL: https://github.com/apache/airflow/pull/13598#discussion_r554956795



##########
File path: airflow/providers/google/cloud/example_dags/example_bigquery_queries.py
##########
@@ -125,6 +126,39 @@
         )
         # [END howto_operator_bigquery_select_job]
 
+        extract_job = BigQueryInsertJobOperator(
+            task_id="extract_to_gcs_job",
+            configuration={
+                "extract": {
+                    "sourceTable": {
+                        "projectId": PROJECT_ID,
+                        "datasetId": DATASET_NAME,
+                        "tableId": TABLE_1,
+                    },
+                    "destinationUris": ["gs://example_bucket/dump/table.*.csv.gz"],

Review comment:
       This bucket should also be created in the system test.







[GitHub] [airflow] github-actions[bot] commented on pull request #13598: add examples for BigQuery load and extract jobs

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #13598:
URL: https://github.com/apache/airflow/pull/13598#issuecomment-758713349


   [The Workflow run](https://github.com/apache/airflow/actions/runs/480230529) is cancelling this PR. Building images for the PR has failed. Follow the workflow link to check the reason.





[GitHub] [airflow] TobKed commented on a change in pull request #13598: add examples for BigQuery load and extract jobs

Posted by GitBox <gi...@apache.org>.
TobKed commented on a change in pull request #13598:
URL: https://github.com/apache/airflow/pull/13598#discussion_r554936086



##########
File path: airflow/providers/google/cloud/example_dags/example_bigquery_queries.py
##########
@@ -125,6 +126,39 @@
         )
         # [END howto_operator_bigquery_select_job]
 
+        extract_job = BigQueryInsertJobOperator(
+            task_id="extract_to_gcs_job",
+            configuration={
+                "extract": {
+                    "sourceTable": {
+                        "projectId": PROJECT_ID,
+                        "datasetId": DATASET_NAME,
+                        "tableId": TABLE_1,
+                    },
+                    "destinationUris": ["gs://example_bucket/dump/table.*.csv.gz"],

Review comment:
       Could you put the bucket name in an environment variable? Avoiding hard-coding will allow the system tests to run (they execute the example DAG).
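
A minimal sketch of what this request could look like, assuming the bucket name is read from an environment variable and interpolated into the destination URI. The helper `build_extract_configuration` is hypothetical and not part of the PR; the variable name `GCP_BIGQUERY_QUERIES_BUCKET` and its default are taken from a later suggestion in this thread and are assumptions here. Only the fields visible in the quoted diff are included, since the rest of the configuration is not shown.

```python
import os

# Assumption: variable name and default follow a later review suggestion;
# the PR itself may have settled on something else.
BUCKET_1 = os.environ.get("GCP_BIGQUERY_QUERIES_BUCKET", "test-bigquery-queries-bucket")


def build_extract_configuration(project_id, dataset_id, table_id, bucket):
    """Hypothetical helper: build the "extract" job configuration dict.

    Only the keys that appear in the quoted diff are set; any further
    options the PR used are not reproduced here.
    """
    return {
        "extract": {
            "sourceTable": {
                "projectId": project_id,
                "datasetId": dataset_id,
                "tableId": table_id,
            },
            # The bucket is a parameter rather than a hard-coded literal.
            "destinationUris": [f"gs://{bucket}/dump/table.*.csv.gz"],
        }
    }


if __name__ == "__main__":
    config = build_extract_configuration("example-project", "test_dataset", "table1", BUCKET_1)
    print(config["extract"]["destinationUris"])
```

The resulting dict would then be passed as `configuration=` to `BigQueryInsertJobOperator`, as in the diff above, so a system test can point the example DAG at its own bucket by setting the environment variable.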







[GitHub] [airflow] github-actions[bot] commented on pull request #13598: add examples for BigQuery load and extract jobs

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #13598:
URL: https://github.com/apache/airflow/pull/13598#issuecomment-758170813


   [The Workflow run](https://github.com/apache/airflow/actions/runs/477948632) is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks,^Build docs$,^Spell check docs$,^Backport packages$,^Provider packages,^Checks: Helm tests$,^Test OpenAPI*.





[GitHub] [airflow] github-actions[bot] commented on pull request #13598: add examples for BigQuery load and extract jobs

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #13598:
URL: https://github.com/apache/airflow/pull/13598#issuecomment-767023352


   [The Workflow run](https://github.com/apache/airflow/actions/runs/510051434) is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks,^Build docs$,^Spell check docs$,^Backport packages$,^Provider packages,^Checks: Helm tests$,^Test OpenAPI*.





[GitHub] [airflow] github-actions[bot] closed pull request #13598: add examples for BigQuery load and extract jobs

Posted by GitBox <gi...@apache.org>.
github-actions[bot] closed pull request #13598:
URL: https://github.com/apache/airflow/pull/13598


   

