Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/01/25 06:14:40 UTC

[GitHub] [airflow] rsg17 opened a new pull request #21084: [wip] Upload data from GCS to Presto

rsg17 opened a new pull request #21084:
URL: https://github.com/apache/airflow/pull/21084


   related: #12246
   
   Currently, this draft PR contains a base operator that loads a CSV file from GCS into Presto. It does not include any tests yet.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] eladkal commented on a change in pull request #21084: Add GCSToPrestoOperator

Posted by GitBox <gi...@apache.org>.
eladkal commented on a change in pull request #21084:
URL: https://github.com/apache/airflow/pull/21084#discussion_r807996216



##########
File path: airflow/providers/presto/provider.yaml
##########
@@ -42,6 +42,12 @@ hooks:
     python-modules:
       - airflow.providers.presto.hooks.presto
 
+transfers:
+  - source-integration-name: Google Cloud Storage (GCS)
+    target-integration-name: Presto
+    how-to-guide: /docs/apache-airflow-providers-presto/transfers/gcs_to_presto.rst

Review comment:
       I think it might be related to `/transfers/`.
   All other operators have `/transfer/` in their `how-to-guide` path.
   
   Try changing it to
   ```suggestion
       how-to-guide: /docs/apache-airflow-providers-presto/transfer/gcs_to_presto.rst
   ```
   
   Also change the path of the file itself.







[GitHub] [airflow] rsg17 commented on pull request #21084: Add GCSToPrestoOperator

Posted by GitBox <gi...@apache.org>.
rsg17 commented on pull request #21084:
URL: https://github.com/apache/airflow/pull/21084#issuecomment-1041193297


   > static checks are still failing
   
   Yes - I was actually going to ask about the `provider.yaml` check after looking at it once more tonight. In this context, what do 'left' and 'right' mean?
   `Checking doc files
    -- Checking document urls: expected(left), current(right)
       -- Items in the right set but not the left:
          '/docs/apache-airflow-providers-presto/transfers/gcs_to_presto.rst'`
          
    
    
    I need to check the other test failure too.
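
One way to read the check output quoted above: "left" is the set of document URLs the check expects and "right" is the set currently declared, so an item that appears only in the right set is declared but was not expected. The sketch below illustrates that comparison with plain Python sets. It is not the actual Airflow check script; the function name `diff_doc_urls` and the sample expected path are assumptions made for illustration.

```python
def diff_doc_urls(expected, current):
    """Compare expected (left) and current (right) doc URLs.

    Returns two sorted lists: items only in the left set and items only
    in the right set.
    """
    left, right = set(expected), set(current)
    return sorted(left - right), sorted(right - left)


# Hypothetical expected location, assumed for illustration only.
expected = ["/docs/apache-airflow-providers-presto/operators/transfer/gcs_to_presto.rst"]
# Path reported by the failing check in the log above.
current = ["/docs/apache-airflow-providers-presto/transfers/gcs_to_presto.rst"]

only_left, only_right = diff_doc_urls(expected, current)
print("Items in the left set but not the right:", only_left)
print("Items in the right set but not the left:", only_right)
```

Under this reading, the failure goes away once the declared path matches a location the check expects, which means either moving the file or updating the declared path.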





[GitHub] [airflow] eladkal commented on a change in pull request #21084: [wip] Upload data from GCS to Presto

Posted by GitBox <gi...@apache.org>.
eladkal commented on a change in pull request #21084:
URL: https://github.com/apache/airflow/pull/21084#discussion_r806541720



##########
File path: docs/apache-airflow-providers-presto/transfers/gcs_to_presto.rst
##########
@@ -0,0 +1,51 @@
+ .. Licensed to the Apache Software Foundation (ASF) under one
+    or more contributor license agreements.  See the NOTICE file
+    distributed with this work for additional information
+    regarding copyright ownership.  The ASF licenses this file
+    to you under the Apache License, Version 2.0 (the
+    "License"); you may not use this file except in compliance
+    with the License.  You may obtain a copy of the License at
+
+ ..   http://www.apache.org/licenses/LICENSE-2.0
+
+ .. Unless required by applicable law or agreed to in writing,
+    software distributed under the License is distributed on an
+    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+    KIND, either express or implied.  See the License for the
+    specific language governing permissions and limitations
+    under the License.
+
+Google Cloud Storage to Presto Transfer Operator
+================================================
+
+Google has a service `Google Cloud Storage <https://cloud.google.com/storage/>`__. This service is
+used to store large data from various applications.
+
+`Presto <https://prestodb.io/>`__ is an open source distributed SQL query engine for running interactive
+analytic queries against data sources of all sizes ranging from gigabytes to petabytes. Presto allows
+querying data where it lives, including Hive, Cassandra, relational databases or even proprietary data stores.
+A single Presto query can combine data from multiple sources, allowing for analytics across your entire
+organization.
+
+
+Prerequisite Tasks
+^^^^^^^^^^^^^^^^^^
+
+.. include::/operators/_partials/prerequisite_tasks.rst
+
+.. _howto/operator:GCSToPresto:
+
+Upload CSV from GCS to Presto Table

Review comment:
       ```suggestion
   Load CSV from GCS to Presto Table
   ```

##########
File path: docs/apache-airflow-providers-presto/transfers/gcs_to_presto.rst
##########
@@ -0,0 +1,51 @@
+ .. Licensed to the Apache Software Foundation (ASF) under one
+    or more contributor license agreements.  See the NOTICE file
+    distributed with this work for additional information
+    regarding copyright ownership.  The ASF licenses this file
+    to you under the Apache License, Version 2.0 (the
+    "License"); you may not use this file except in compliance
+    with the License.  You may obtain a copy of the License at
+
+ ..   http://www.apache.org/licenses/LICENSE-2.0
+
+ .. Unless required by applicable law or agreed to in writing,
+    software distributed under the License is distributed on an
+    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+    KIND, either express or implied.  See the License for the
+    specific language governing permissions and limitations
+    under the License.
+
+Google Cloud Storage to Presto Transfer Operator
+================================================
+
+Google has a service `Google Cloud Storage <https://cloud.google.com/storage/>`__. This service is
+used to store large data from various applications.
+
+`Presto <https://prestodb.io/>`__ is an open source distributed SQL query engine for running interactive
+analytic queries against data sources of all sizes ranging from gigabytes to petabytes. Presto allows
+querying data where it lives, including Hive, Cassandra, relational databases or even proprietary data stores.
+A single Presto query can combine data from multiple sources, allowing for analytics across your entire
+organization.
+
+
+Prerequisite Tasks
+^^^^^^^^^^^^^^^^^^
+
+.. include::/operators/_partials/prerequisite_tasks.rst
+
+.. _howto/operator:GCSToPresto:
+
+Upload CSV from GCS to Presto Table
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+To upload a csv from Google Cloud Storage to a Presto table you can use the

Review comment:
       ```suggestion
   To load a CSV file from Google Cloud Storage to a Presto table you can use the
   ```







[GitHub] [airflow] github-actions[bot] commented on pull request #21084: Add GCSToPrestoOperator

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #21084:
URL: https://github.com/apache/airflow/pull/21084#issuecomment-1039963395


   The PR is likely OK to be merged with just subset of tests for default Python and Database versions without running the full matrix of tests, because it does not modify the core of Airflow. If the committers decide that the full tests matrix is needed, they will add the label 'full tests needed'. Then you should rebase to the latest main or amend the last commit of the PR, and push it with --force-with-lease.





[GitHub] [airflow] rsg17 commented on pull request #21084: [wip] Upload data from GCS to Presto

Posted by GitBox <gi...@apache.org>.
rsg17 commented on pull request #21084:
URL: https://github.com/apache/airflow/pull/21084#issuecomment-1038487841


   > @eladkal : yes! Thank you for checking. Will do it this weekend.
   
   @eladkal: I have added tests and docs. Let me know your feedback once you get a chance to review.





[GitHub] [airflow] eladkal commented on pull request #21084: Add GCSToPrestoOperator

Posted by GitBox <gi...@apache.org>.
eladkal commented on pull request #21084:
URL: https://github.com/apache/airflow/pull/21084#issuecomment-1053236110


   @rsg17 is there a need to update the code of GcsToPresto after review of https://github.com/apache/airflow/pull/21704 ?





[GitHub] [airflow] rsg17 commented on pull request #21084: Add GCSToPrestoOperator

Posted by GitBox <gi...@apache.org>.
rsg17 commented on pull request #21084:
URL: https://github.com/apache/airflow/pull/21084#issuecomment-1043505141


   > Thanks @rsg17! If you'd like, a great additional contribution would be to add `GCSToTrinoOperator`; it should be very similar to what you did here.
   
   Thank you. Will work on that.





[GitHub] [airflow] eladkal commented on pull request #21084: [wip] Upload data from GCS to Presto

Posted by GitBox <gi...@apache.org>.
eladkal commented on pull request #21084:
URL: https://github.com/apache/airflow/pull/21084#issuecomment-1023069770


   @rsg17 can you add tests and docs?





[GitHub] [airflow] rsg17 edited a comment on pull request #21084: Add GCSToPrestoOperator

Posted by GitBox <gi...@apache.org>.
rsg17 edited a comment on pull request #21084:
URL: https://github.com/apache/airflow/pull/21084#issuecomment-1041193297


   > static checks are still failing
   
   Yes - I was actually going to ask about the `provider.yaml` check after looking at it once more tonight. In this context, what do 'left' and 'right' mean here: https://github.com/apache/airflow/runs/5209535221?check_suite_focus=true#step:11:257?
   `Checking doc files
    -- Checking document urls: expected(left), current(right)
       -- Items in the right set but not the left:
          '/docs/apache-airflow-providers-presto/transfers/gcs_to_presto.rst'`
          
    
    
    I need to check the other test failure too.





[GitHub] [airflow] rsg17 edited a comment on pull request #21084: Add GCSToPrestoOperator

Posted by GitBox <gi...@apache.org>.
rsg17 edited a comment on pull request #21084:
URL: https://github.com/apache/airflow/pull/21084#issuecomment-1041193297


   > static checks are still failing
   
   Yes - I was actually going to ask about the `provider.yaml` check after looking at it once more tonight. In this context, what do 'left' and 'right' mean here: https://github.com/apache/airflow/runs/5209535221?check_suite_focus=true#step:11:257?
   `Checking doc files
    -- Checking document urls: expected(left), current(right)
       -- Items in the right set but not the left:
          '/docs/apache-airflow-providers-presto/transfers/gcs_to_presto.rst'`
          
    
    
    I think the latest push should resolve the other test failure.





[GitHub] [airflow] eladkal merged pull request #21084: Add GCSToPrestoOperator

Posted by GitBox <gi...@apache.org>.
eladkal merged pull request #21084:
URL: https://github.com/apache/airflow/pull/21084


   





[GitHub] [airflow] rsg17 commented on a change in pull request #21084: Add GCSToPrestoOperator

Posted by GitBox <gi...@apache.org>.
rsg17 commented on a change in pull request #21084:
URL: https://github.com/apache/airflow/pull/21084#discussion_r808585351



##########
File path: airflow/providers/presto/provider.yaml
##########
@@ -42,6 +42,12 @@ hooks:
     python-modules:
       - airflow.providers.presto.hooks.presto
 
+transfers:
+  - source-integration-name: Google Cloud Storage (GCS)
+    target-integration-name: Presto
+    how-to-guide: /docs/apache-airflow-providers-presto/transfers/gcs_to_presto.rst

Review comment:
       Thank you! I think it needs to be under `operators/transfer`. I have updated the path, and pre-commit no longer shows the error.
   
   







[GitHub] [airflow] rsg17 edited a comment on pull request #21084: Add GCSToPrestoOperator

Posted by GitBox <gi...@apache.org>.
rsg17 edited a comment on pull request #21084:
URL: https://github.com/apache/airflow/pull/21084#issuecomment-1041193297


   > static checks are still failing
   
   Yes - I was actually going to ask about the `provider.yaml` check after looking at it once more tonight. In this context, what do 'left' and 'right' mean here: https://github.com/apache/airflow/runs/5209535221?check_suite_focus=true#step:11:257?
   `Checking doc files
    -- Checking document urls: expected(left), current(right)
       -- Items in the right set but not the left:
          '/docs/apache-airflow-providers-presto/transfers/gcs_to_presto.rst'`
          
    
    
    I think the latest push should resolve the other test failure too.





[GitHub] [airflow] rsg17 commented on pull request #21084: [wip] Upload data from GCS to Presto

Posted by GitBox <gi...@apache.org>.
rsg17 commented on pull request #21084:
URL: https://github.com/apache/airflow/pull/21084#issuecomment-1023353126


   @eladkal: yes! Thank you for checking. Will do it this weekend.





[GitHub] [airflow] eladkal commented on pull request #21084: Add GCSToPrestoOperator

Posted by GitBox <gi...@apache.org>.
eladkal commented on pull request #21084:
URL: https://github.com/apache/airflow/pull/21084#issuecomment-1041145071


   static checks are still failing





[GitHub] [airflow] eladkal commented on pull request #21084: Add GCSToPrestoOperator

Posted by GitBox <gi...@apache.org>.
eladkal commented on pull request #21084:
URL: https://github.com/apache/airflow/pull/21084#issuecomment-1042612095


   Thanks @rsg17!
   If you'd like, a great additional contribution would be to add `GCSToTrinoOperator`; it should be very similar to what you did here.




