Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2020/04/23 10:23:56 UTC

[GitHub] [airflow] turbaszek opened a new pull request #8529: Split and improve BigQuery example DAG

turbaszek opened a new pull request #8529:
URL: https://github.com/apache/airflow/pull/8529


   ---
   Make sure to mark the boxes below before creating PR: [x]
   
   - [x] Description above provides context of the change
   - [x] Unit tests coverage for changes (not needed for documentation changes)
   - [x] Commits follow "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)"
   - [x] Relevant documentation is updated including usage instructions.
   - [x] I will engage committers as explained in [Contribution Workflow Example](https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#contribution-workflow-example).
   
   ---
   In case of fundamental code change, Airflow Improvement Proposal ([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals)) is needed.
   In case of a new dependency, check compliance with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x).
   In case of backwards incompatible changes please leave a note in [UPDATING.md](https://github.com/apache/airflow/blob/master/UPDATING.md).
   Read the [Pull Request Guidelines](https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#pull-request-guidelines) for more information.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] turbaszek commented on a change in pull request #8529: Split and improve BigQuery example DAG

Posted by GitBox <gi...@apache.org>.
turbaszek commented on a change in pull request #8529:
URL: https://github.com/apache/airflow/pull/8529#discussion_r415848496



##########
File path: airflow/providers/google/cloud/example_dags/example_bigquery_transfer.py
##########
@@ -0,0 +1,82 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+"""
+Example Airflow DAG for Google BigQuery service.
+"""
+import os
+
+from airflow import models
+from airflow.providers.google.cloud.operators.bigquery import (
+    BigQueryCreateEmptyDatasetOperator, BigQueryCreateEmptyTableOperator, BigQueryDeleteDatasetOperator,
+)
+from airflow.providers.google.cloud.operators.bigquery_to_bigquery import BigQueryToBigQueryOperator
+from airflow.providers.google.cloud.operators.bigquery_to_gcs import BigQueryToGCSOperator
+from airflow.utils.dates import days_ago
+
+PROJECT_ID = os.environ.get("GCP_PROJECT_ID", "example-project")
+DATASET_NAME = os.environ.get("GCP_BIGQUERY_DATASET_NAME", "test_dataset_transfer")
+DATA_EXPORT_BUCKET_NAME = os.environ.get(
+    "GCP_BIGQUERY_EXPORT_BUCKET_NAME", "test-bigquery-sample-data"
+)
+ORIGIN = "origin"
+TARGET = "target"
+
+default_args = {"start_date": days_ago(1)}
+
+with models.DAG(
+    "example_bigquery_transfer",
+    default_args=default_args,
+    schedule_interval=None,  # Override to match your needs
+    tags=["example"],
+) as dag:
+    copy_selected_data = BigQueryToBigQueryOperator(

Review comment:
       I don't know. I would prefer to keep it here and have one DAG = one test. Splitting this into two DAGs will result in code duplication. It's not a big deal, but I still prefer less code and fewer files.







[GitHub] [airflow] turbaszek commented on a change in pull request #8529: Split and improve BigQuery example DAG

Posted by GitBox <gi...@apache.org>.
turbaszek commented on a change in pull request #8529:
URL: https://github.com/apache/airflow/pull/8529#discussion_r415978910



##########
File path: airflow/providers/google/cloud/example_dags/example_bigquery_transfer.py
##########
@@ -0,0 +1,82 @@
+    copy_selected_data = BigQueryToBigQueryOperator(

Review comment:
       Done







[GitHub] [airflow] mik-laj commented on a change in pull request #8529: Split and improve BigQuery example DAG

Posted by GitBox <gi...@apache.org>.
mik-laj commented on a change in pull request #8529:
URL: https://github.com/apache/airflow/pull/8529#discussion_r415907894



##########
File path: airflow/providers/google/cloud/example_dags/example_bigquery_transfer.py
##########
@@ -0,0 +1,82 @@
+    copy_selected_data = BigQueryToBigQueryOperator(

Review comment:
       I would prefer that each module have separate unit tests, a separate guide, and separate system tests. Otherwise it will be difficult for us to determine coverage. A few days ago, someone found a test that was in the wrong file:
   https://github.com/apache/airflow/pull/8556/files
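
   The coverage concern above can be illustrated with a small check that reports operator modules lacking a matching example DAG file. This is only a sketch, not Airflow's own tooling: the `example_<module>.py` naming convention is an assumption inferred from the file names mentioned in this thread, the helper names `expected_example_name` and `modules_without_example` are hypothetical, and the file lists are illustrative.

   ```python
   from pathlib import PurePosixPath
   from typing import Iterable, List


   def expected_example_name(operator_module: str) -> str:
       """Return the example DAG file name assumed to belong to an operator module."""
       # Assumed convention: operators/foo.py -> example_dags/example_foo.py
       return "example_" + PurePosixPath(operator_module).name


   def modules_without_example(
       operator_modules: Iterable[str], example_files: Iterable[str]
   ) -> List[str]:
       """List operator modules that have no example DAG file under the assumed convention."""
       present = {PurePosixPath(path).name for path in example_files}
       return sorted(
           module
           for module in operator_modules
           if expected_example_name(module) not in present
       )


   # Illustrative inputs: one combined transfer example covering two modules.
   operators = ["bigquery.py", "bigquery_to_bigquery.py", "bigquery_to_gcs.py"]
   examples = ["example_bigquery.py", "example_bigquery_transfer.py"]

   missing = modules_without_example(operators, examples)
   print(missing)
   ```

   With a single combined `example_bigquery_transfer.py`, both transfer modules show up as uncovered under this convention; with one example file per module, the list would be empty.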







[GitHub] [airflow] mik-laj commented on a change in pull request #8529: Split and improve BigQuery example DAG

Posted by GitBox <gi...@apache.org>.
mik-laj commented on a change in pull request #8529:
URL: https://github.com/apache/airflow/pull/8529#discussion_r415813402



##########
File path: airflow/providers/google/cloud/example_dags/example_bigquery_transfer.py
##########
@@ -0,0 +1,82 @@
+    copy_selected_data = BigQueryToBigQueryOperator(

Review comment:
       Should we not move this to example_bigquery_to_bigquery.py/example_bigquery_to_gcs? It would be nice if each module had a separate test. This will make it easier to manage.






