Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2020/11/03 13:51:36 UTC

[GitHub] [airflow] JavierLopezT opened a new pull request #12033: AllowDiskUse parameter and docs in MongotoS3Operator

JavierLopezT opened a new pull request #12033:
URL: https://github.com/apache/airflow/pull/12033


   Some things have been improved to match the new standards of Airflow:
   - Default conns
   - Docstrings
   - template_fields as a tuple
   
   Also, the allow_disk_use parameter, which must be used in some cases, has been added.
   
   I think this does not require tests, does it?
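   
   For context, a minimal usage sketch of the operator with the new parameter (the task id, connection ids, database, query, bucket, and key below are illustrative placeholders, not taken from the PR):
   
   ```python
   from airflow.providers.amazon.aws.transfers.mongo_to_s3 import MongoToS3Operator
   
   # Hypothetical task; all ids and the query are made-up placeholders.
   export_users = MongoToS3Operator(
       task_id='export_users',
       mongo_conn_id='mongo_default',
       aws_conn_id='aws_default',
       mongo_db='my_db',
       mongo_collection='users',
       mongo_query={'active': True},
       s3_bucket='my-bucket',
       s3_key='exports/users.json',
       replace=True,
       allow_disk_use=True,  # lets MongoDB spill large result sets to disk
   )
   ```
   
   Under the hood, allow_disk_use corresponds to MongoDB's allowDiskUse option, which lets query and aggregation stages write to temporary files on disk instead of being limited to 100 MB of RAM per stage.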
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] feluelle commented on a change in pull request #12033: AllowDiskUse parameter and docs in MongotoS3Operator

Posted by GitBox <gi...@apache.org>.
feluelle commented on a change in pull request #12033:
URL: https://github.com/apache/airflow/pull/12033#discussion_r553259880



##########
File path: airflow/providers/amazon/aws/transfers/mongo_to_s3.py
##########
@@ -25,77 +26,99 @@
 from airflow.providers.mongo.hooks.mongo import MongoHook
 from airflow.utils.decorators import apply_defaults
 
+_DEPRECATION_MSG = (
+    "The s3_conn_id parameter has been deprecated. You should pass instead the aws_conn_id parameter."
+)
+
 
 class MongoToS3Operator(BaseOperator):
-    """
-    Mongo -> S3
-        A more specific baseOperator meant to move data
-        from mongo via pymongo to s3 via boto
-
-        things to note
-                .execute() is written to depend on .transform()
-                .transform() is meant to be extended by child classes
-                to perform transformations unique to those operators needs
+    """Operator meant to move data from mongo via pymongo to s3 via boto.
+
+    :param mongo_conn_id: reference to a specific mongo connection
+    :type mongo_conn_id: str
+    :param aws_conn_id: reference to a specific S3 connection
+    :type aws_conn_id: str
+    :param mongo_collection: reference to a specific collection in your mongo db
+    :type mongo_collection: str
+    :param mongo_query: query to execute. A list including a dict of the query
+    :type mongo_query: list
+    :param s3_bucket: reference to a specific S3 bucket to store the data
+    :type s3_bucket: str
+    :param s3_key: in which S3 key the file will be stored
+    :type s3_key: str
+    :param mongo_db: reference to a specific mongo database
+    :type mongo_db: str
+    :param replace: whether or not to replace the file in S3 if it previously existed
+    :param replace: bool
+    :param allow_disk_use: in the case you are retrieving a lot of data, you may have
+        to use the disk to save it instead of saving all in the RAM
+    :param allow_disk_use: bool
     """
 
-    template_fields = ['s3_key', 'mongo_query', 'mongo_collection']
+    template_fields = ('s3_bucket', 's3_key', 'mongo_query', 'mongo_collection')
     # pylint: disable=too-many-instance-attributes
 
     @apply_defaults
     def __init__(
         self,
         *,
-        mongo_conn_id: str,
-        s3_conn_id: str,
+        s3_conn_id: Optional[str] = None,
+        mongo_conn_id: str = 'mongo_default',
+        aws_conn_id: str = 'aws_default',
         mongo_collection: str,
         mongo_query: Union[list, dict],
         s3_bucket: str,
         s3_key: str,
         mongo_db: Optional[str] = None,
         replace: bool = False,
+        allow_disk_use: bool = False,
         compression: Optional[str] = None,

Review comment:
       Sorry @JavierLopezT, I forgot to mention it in the other PR #13187, but can you also add documentation around compression? Maybe @DreamPearl can help you with that? :)
   
   That would be great. Thanks.
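   
   A minimal sketch of what the requested docstring entry could look like (the wording matches a review suggestion made later in this thread; treat it as illustrative):
   
   ```
   :param compression: type of compression to use for output file in S3. Currently only gzip is supported.
   :type compression: str
   ```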




----------------------------------------------------------------



[GitHub] [airflow] feluelle commented on pull request #12033: AllowDiskUse parameter and docs in MongotoS3Operator

Posted by GitBox <gi...@apache.org>.
feluelle commented on pull request #12033:
URL: https://github.com/apache/airflow/pull/12033#issuecomment-762808347


   @JavierLopezT this is fine now. The skipped checks are unrelated to your changes; that's why they were skipped. (Skipped doesn't imply this in general, but in this case only unrelated checks were skipped.)


----------------------------------------------------------------



[GitHub] [airflow] mik-laj commented on pull request #12033: AllowDiskUse parameter and docs in MongotoS3Operator

Posted by GitBox <gi...@apache.org>.
mik-laj commented on pull request #12033:
URL: https://github.com/apache/airflow/pull/12033#issuecomment-732200035


   @HaloKo4 This will be included in [backport providers](http://apache-airflow-docs.s3-website.eu-central-1.amazonaws.com/docs/apache-airflow/latest/backport-providers.html)


----------------------------------------------------------------



[GitHub] [airflow] JavierLopezT commented on a change in pull request #12033: AllowDiskUse parameter and docs in MongotoS3Operator

Posted by GitBox <gi...@apache.org>.
JavierLopezT commented on a change in pull request #12033:
URL: https://github.com/apache/airflow/pull/12033#discussion_r553872191



##########
File path: airflow/providers/amazon/aws/transfers/mongo_to_s3.py
##########
@@ -25,77 +26,99 @@
 from airflow.providers.mongo.hooks.mongo import MongoHook
 from airflow.utils.decorators import apply_defaults
 
+_DEPRECATION_MSG = (
+    "The s3_conn_id parameter has been deprecated. You should pass instead the aws_conn_id parameter."
+)
+
 
 class MongoToS3Operator(BaseOperator):
-    """
-    Mongo -> S3
-        A more specific baseOperator meant to move data
-        from mongo via pymongo to s3 via boto
-
-        things to note
-                .execute() is written to depend on .transform()
-                .transform() is meant to be extended by child classes
-                to perform transformations unique to those operators needs
+    """Operator meant to move data from mongo via pymongo to s3 via boto.
+
+    :param mongo_conn_id: reference to a specific mongo connection
+    :type mongo_conn_id: str
+    :param aws_conn_id: reference to a specific S3 connection
+    :type aws_conn_id: str
+    :param mongo_collection: reference to a specific collection in your mongo db
+    :type mongo_collection: str
+    :param mongo_query: query to execute. A list including a dict of the query
+    :type mongo_query: list
+    :param s3_bucket: reference to a specific S3 bucket to store the data
+    :type s3_bucket: str
+    :param s3_key: in which S3 key the file will be stored
+    :type s3_key: str
+    :param mongo_db: reference to a specific mongo database
+    :type mongo_db: str
+    :param replace: whether or not to replace the file in S3 if it previously existed
+    :param replace: bool
+    :param allow_disk_use: in the case you are retrieving a lot of data, you may have
+        to use the disk to save it instead of saving all in the RAM
+    :param allow_disk_use: bool
     """
 
-    template_fields = ['s3_key', 'mongo_query', 'mongo_collection']
+    template_fields = ('s3_bucket', 's3_key', 'mongo_query', 'mongo_collection')
     # pylint: disable=too-many-instance-attributes
 
     @apply_defaults
     def __init__(
         self,
         *,
-        mongo_conn_id: str,
-        s3_conn_id: str,
+        s3_conn_id: Optional[str] = None,
+        mongo_conn_id: str = 'mongo_default',
+        aws_conn_id: str = 'aws_default',
         mongo_collection: str,
         mongo_query: Union[list, dict],
         s3_bucket: str,
         s3_key: str,
         mongo_db: Optional[str] = None,
         replace: bool = False,
+        allow_disk_use: bool = False,
         compression: Optional[str] = None,

Review comment:
       No problem. Just added




----------------------------------------------------------------



[GitHub] [airflow] RosterIn commented on a change in pull request #12033: AllowDiskUse parameter and docs in MongotoS3Operator

Posted by GitBox <gi...@apache.org>.
RosterIn commented on a change in pull request #12033:
URL: https://github.com/apache/airflow/pull/12033#discussion_r559786725



##########
File path: tests/providers/amazon/aws/transfers/test_mongo_to_s3.py
##########
@@ -60,15 +60,20 @@ def setUp(self):
     def test_init(self):
         assert self.mock_operator.task_id == TASK_ID
         assert self.mock_operator.mongo_conn_id == MONGO_CONN_ID
-        assert self.mock_operator.s3_conn_id == S3_CONN_ID
+        assert self.mock_operator.aws_conn_id == AWS_CONN_ID
         assert self.mock_operator.mongo_collection == MONGO_COLLECTION
         assert self.mock_operator.mongo_query == MONGO_QUERY
         assert self.mock_operator.s3_bucket == S3_BUCKET
         assert self.mock_operator.s3_key == S3_KEY
         assert self.mock_operator.compression == COMPRESSION
 
     def test_template_field_overrides(self):
-        assert self.mock_operator.template_fields == ['s3_key', 'mongo_query', 'mongo_collection']
+        assert self.mock_operator.template_fields == (
+            's3_bucket',
+            's3_key',
+            'mongo_query',
+            'mongo_collection'

Review comment:
       ```suggestion
               'mongo_collection',
   ```
    This will make Black happy:
   https://test-black.readthedocs.io/en/style-guide/style_guide/trailing_commas.html
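   
   For reference, with the magic trailing comma Black keeps the tuple exploded one element per line; this is the formatting the suggestion above produces:
   
   ```python
   template_fields = (
       's3_bucket',
       's3_key',
       'mongo_query',
       'mongo_collection',  # the trailing comma tells Black to keep this layout
   )
   ```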




----------------------------------------------------------------



[GitHub] [airflow] feluelle commented on pull request #12033: AllowDiskUse parameter and docs in MongotoS3Operator

Posted by GitBox <gi...@apache.org>.
feluelle commented on pull request #12033:
URL: https://github.com/apache/airflow/pull/12033#issuecomment-755447042


   I found a different error in the failing check related to https://github.com/apache/airflow/commit/003584bbf1d66a3545ad6e6fcdceb0410fc83696 which is now fixed. Please rebase :)


----------------------------------------------------------------



[GitHub] [airflow] github-actions[bot] commented on pull request #12033: AllowDiskUse parameter and docs in MongotoS3Operator

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #12033:
URL: https://github.com/apache/airflow/pull/12033#issuecomment-721653981


   [The Workflow run](https://github.com/apache/airflow/actions/runs/345353094) is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks,^Build docs$,^Spell check docs$,^Backport packages$,^Provider packages,^Checks: Helm tests$,^Test OpenAPI*.


----------------------------------------------------------------



[GitHub] [airflow] github-actions[bot] commented on pull request #12033: AllowDiskUse parameter and docs in MongotoS3Operator

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #12033:
URL: https://github.com/apache/airflow/pull/12033#issuecomment-761898293


   [The Workflow run](https://github.com/apache/airflow/actions/runs/492340576) is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks,^Build docs$,^Spell check docs$,^Backport packages$,^Provider packages,^Checks: Helm tests$,^Test OpenAPI*.


----------------------------------------------------------------



[GitHub] [airflow] JavierLopezT commented on pull request #12033: AllowDiskUse parameter and docs in MongotoS3Operator

Posted by GitBox <gi...@apache.org>.
JavierLopezT commented on pull request #12033:
URL: https://github.com/apache/airflow/pull/12033#issuecomment-754152253


   Thanks @feluelle 
   
   I think the failing tests are not my fault. For instance, the pylint failure is about datacatalog in the Google providers. Otherwise, I don't see where my error is. Could you help me, please? Thanks


----------------------------------------------------------------




[GitHub] [airflow] feluelle commented on pull request #12033: AllowDiskUse parameter and docs in MongotoS3Operator

Posted by GitBox <gi...@apache.org>.
feluelle commented on pull request #12033:
URL: https://github.com/apache/airflow/pull/12033#issuecomment-753837495


   @JavierLopezT can you rebase and fix conflicts, please? If it keeps being green I am happy to merge it.


----------------------------------------------------------------



[GitHub] [airflow] github-actions[bot] commented on pull request #12033: AllowDiskUse parameter and docs in MongotoS3Operator

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #12033:
URL: https://github.com/apache/airflow/pull/12033#issuecomment-756081851


   The PR is likely OK to be merged with just subset of tests for default Python and Database versions without running the full matrix of tests, because it does not modify the core of Airflow. If the committers decide that the full tests matrix is needed, they will add the label 'full tests needed'. Then you should rebase to the latest master or amend the last commit of the PR, and push it with --force-with-lease.


----------------------------------------------------------------



[GitHub] [airflow] HaloKo4 commented on pull request #12033: AllowDiskUse parameter and docs in MongotoS3Operator

Posted by GitBox <gi...@apache.org>.
HaloKo4 commented on pull request #12033:
URL: https://github.com/apache/airflow/pull/12033#issuecomment-732103130


   Thanks for this! Can it be expected in the next release?


----------------------------------------------------------------



[GitHub] [airflow] github-actions[bot] commented on pull request #12033: AllowDiskUse parameter and docs in MongotoS3Operator

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #12033:
URL: https://github.com/apache/airflow/pull/12033#issuecomment-754004356


   [The Workflow run](https://github.com/apache/airflow/actions/runs/461191789) is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks,^Build docs$,^Spell check docs$,^Backport packages$,^Provider packages,^Checks: Helm tests$,^Test OpenAPI*.


----------------------------------------------------------------



[GitHub] [airflow] github-actions[bot] commented on pull request #12033: AllowDiskUse parameter and docs in MongotoS3Operator

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #12033:
URL: https://github.com/apache/airflow/pull/12033#issuecomment-721653770


   [The Workflow run](https://github.com/apache/airflow/actions/runs/345352884) is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks,^Build docs$,^Spell check docs$,^Backport packages$,^Provider packages,^Checks: Helm tests$,^Test OpenAPI*.


----------------------------------------------------------------




[GitHub] [airflow] github-actions[bot] commented on pull request #12033: AllowDiskUse parameter and docs in MongotoS3Operator

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #12033:
URL: https://github.com/apache/airflow/pull/12033#issuecomment-762324895


   [The Workflow run](https://github.com/apache/airflow/actions/runs/493871097) is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks,^Build docs$,^Spell check docs$,^Backport packages$,^Provider packages,^Checks: Helm tests$,^Test OpenAPI*.


----------------------------------------------------------------



[GitHub] [airflow] kaxil commented on pull request #12033: AllowDiskUse parameter and docs in MongotoS3Operator

Posted by GitBox <gi...@apache.org>.
kaxil commented on pull request #12033:
URL: https://github.com/apache/airflow/pull/12033#issuecomment-721440084


   Can you please rebase your PR on the latest Master, since we have applied [Black](https://github.com/apache/airflow/commit/4e8f9cc8d02b29c325b8a5a76b4837671bdf5f68) and [PyUpgrade](https://github.com/apache/airflow/commit/8c42cf1b00c90f0d7f11b8a3a455381de8e003c5) on Master
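   
   A sketch of the usual rebase flow, assuming an 'upstream' remote pointing at apache/airflow:
   
   ```
   git fetch upstream
   git rebase upstream/master
   git push --force-with-lease
   ```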
   
   


----------------------------------------------------------------



[GitHub] [airflow] JavierLopezT commented on pull request #12033: AllowDiskUse parameter and docs in MongotoS3Operator

Posted by GitBox <gi...@apache.org>.
JavierLopezT commented on pull request #12033:
URL: https://github.com/apache/airflow/pull/12033#issuecomment-762512616


   > @JavierLopezT the checks still did not run. :(
   
   Again, 10 tests have been skipped. It was all green a couple of months ago, and I think I have made all the necessary modifications. Couldn't it be merged anyway?


----------------------------------------------------------------



[GitHub] [airflow] mik-laj commented on a change in pull request #12033: AllowDiskUse parameter and docs in MongotoS3Operator

Posted by GitBox <gi...@apache.org>.
mik-laj commented on a change in pull request #12033:
URL: https://github.com/apache/airflow/pull/12033#discussion_r516348513



##########
File path: airflow/providers/amazon/aws/transfers/mongo_to_s3.py
##########
@@ -27,79 +27,89 @@
 
 
 class MongoToS3Operator(BaseOperator):
-    """
-    Mongo -> S3
-        A more specific baseOperator meant to move data
-        from mongo via pymongo to s3 via boto
-
-        things to note
-                .execute() is written to depend on .transform()
-                .transform() is meant to be extended by child classes
-                to perform transformations unique to those operators needs
+    """Operator meant to move data from mongo via pymongo to s3 via boto.
+
+    :param mongo_conn_id: reference to a specific mongo connection
+    :type mongo_conn_id: str
+    :param aws_conn_id: reference to a specific S3 connection
+    :type aws_conn_id: str
+    :param mongo_collection: reference to a specific collection in your mongo db
+    :type mongo_collection: str
+    :param mongo_query: query to execute. A list including a dict of the query
+    :type mongo_query: list
+    :param s3_bucket: reference to a specific S3 bucket to store the data
+    :type s3_bucket: str
+    :param s3_key: in which S3 key the file will be stored
+    :type s3_key: str
+    :param mongo_db: reference to a specific mongo database
+    :type mongo_db: str
+    :param replace: whether or not to replace the file in S3 if it previously existed
+    :param replace: bool
+    :param allow_disk_use: in the case you are retrieving a lot of data, you may have
+        to use the disk to save it instead of saving all in the RAM
+    :param allow_disk_use: bool
     """
 
-    template_fields = ['s3_key', 'mongo_query']
-    # pylint: disable=too-many-instance-attributes
+    template_fields = ('s3_bucket', 's3_key', 'mongo_query')
 
     @apply_defaults
     def __init__(
         self,
         *,
-        mongo_conn_id: str,
-        s3_conn_id: str,
+        mongo_conn_id: str = 'mongo_default',
+        aws_conn_id: str = 'aws_default',

Review comment:
       Can you add backward compatibility? Example: https://github.com/apache/airflow/blob/e324b37a67e32c368df50604a00160d7766b5c33/airflow/providers/google/cloud/operators/bigquery.py#L167-L169
   Thanks!
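   
   A minimal sketch of that backward-compatibility pattern, modeled on the linked BigQuery example (the helper function is hypothetical; in the PR the logic would live inside MongoToS3Operator.__init__, and the warning text matches the _DEPRECATION_MSG later added to this PR):
   
   ```python
   import warnings
   from typing import Optional
   
   _DEPRECATION_MSG = (
       "The s3_conn_id parameter has been deprecated. You should pass instead the aws_conn_id parameter."
   )
   
   def resolve_aws_conn_id(aws_conn_id: str = 'aws_default', s3_conn_id: Optional[str] = None) -> str:
       """Prefer the new parameter; warn, but keep working, if the deprecated one is passed."""
       if s3_conn_id:
           warnings.warn(_DEPRECATION_MSG, DeprecationWarning, stacklevel=3)
           return s3_conn_id
       return aws_conn_id
   ```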




----------------------------------------------------------------



[GitHub] [airflow] feluelle commented on pull request #12033: AllowDiskUse parameter and docs in MongotoS3Operator

Posted by GitBox <gi...@apache.org>.
feluelle commented on pull request #12033:
URL: https://github.com/apache/airflow/pull/12033#issuecomment-754169170


   You are right @JavierLopezT. I will take a look.


----------------------------------------------------------------



[GitHub] [airflow] github-actions[bot] commented on pull request #12033: AllowDiskUse parameter and docs in MongotoS3Operator

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #12033:
URL: https://github.com/apache/airflow/pull/12033#issuecomment-754138948


   [The Workflow run](https://github.com/apache/airflow/actions/runs/461673985) is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks,^Build docs$,^Spell check docs$,^Backport packages$,^Provider packages,^Checks: Helm tests$,^Test OpenAPI*.


----------------------------------------------------------------



[GitHub] [airflow] feluelle merged pull request #12033: AllowDiskUse parameter and docs in MongotoS3Operator

Posted by GitBox <gi...@apache.org>.
feluelle merged pull request #12033:
URL: https://github.com/apache/airflow/pull/12033


   


----------------------------------------------------------------



[GitHub] [airflow] JavierLopezT commented on pull request #12033: AllowDiskUse parameter and docs in MongotoS3Operator

Posted by GitBox <gi...@apache.org>.
JavierLopezT commented on pull request #12033:
URL: https://github.com/apache/airflow/pull/12033#issuecomment-754788553


   > Hi @JavierLopezT, I think it is "fixed" now (see [c7d75ad](https://github.com/apache/airflow/commit/c7d75ad887cd12d5603563c5fa873c0e2f8975aa)).
   
   Thanks. Now I think the error is:
   ```
   /usr/local/lib/python3.6/site-packages/snowflake/connector/options.py:78: UserWarning: You have an incompatible version of 'pyarrow' installed (1.0.1), please install a version that adheres to: 'pyarrow<0.18.0,>=0.17.0; extra == "pandas"'
       warn_incompatible_dep('pyarrow', _installed_pyarrow_version.version, _expected_pyarrow_version)
   ```
   
   Is this something within the scope of this PR? If so, how could I solve it? Thank you very much
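   
   For anyone hitting the same warning: the version constraint quoted in the message can be satisfied by pinning pyarrow, e.g.:
   
   ```
   pip install 'pyarrow>=0.17.0,<0.18.0'
   ```
   
   Whether such a pin belongs in this PR or in the CI constraint files is a separate question.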
   


----------------------------------------------------------------



[GitHub] [airflow] github-actions[bot] commented on pull request #12033: AllowDiskUse parameter and docs in MongotoS3Operator

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #12033:
URL: https://github.com/apache/airflow/pull/12033#issuecomment-762172157


   [The Workflow run](https://github.com/apache/airflow/actions/runs/493229250) is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks,^Build docs$,^Spell check docs$,^Backport packages$,^Provider packages,^Checks: Helm tests$,^Test OpenAPI*.


----------------------------------------------------------------



[GitHub] [airflow] feluelle commented on pull request #12033: AllowDiskUse parameter and docs in MongotoS3Operator

Posted by GitBox <gi...@apache.org>.
feluelle commented on pull request #12033:
URL: https://github.com/apache/airflow/pull/12033#issuecomment-754519660


   Hi @JavierLopezT, I think it is "fixed" now (see c7d75ad887cd12d5603563c5fa873c0e2f8975aa).


----------------------------------------------------------------



[GitHub] [airflow] RosterIn commented on a change in pull request #12033: AllowDiskUse parameter and docs in MongotoS3Operator

Posted by GitBox <gi...@apache.org>.
RosterIn commented on a change in pull request #12033:
URL: https://github.com/apache/airflow/pull/12033#discussion_r559525360



##########
File path: tests/providers/amazon/aws/transfers/test_mongo_to_s3.py
##########
@@ -25,7 +25,7 @@
 
 TASK_ID = 'test_mongo_to_s3_operator'
 MONGO_CONN_ID = 'default_mongo'
-S3_CONN_ID = 'default_s3'

Review comment:
       You removed S3_CONN_ID, but you are still referencing it on line 63.

##########
File path: airflow/providers/amazon/aws/transfers/mongo_to_s3.py
##########
@@ -25,77 +26,101 @@
 from airflow.providers.mongo.hooks.mongo import MongoHook
 from airflow.utils.decorators import apply_defaults
 
+_DEPRECATION_MSG = (
+    "The s3_conn_id parameter has been deprecated. You should pass instead the aws_conn_id parameter."
+)
+
 
 class MongoToS3Operator(BaseOperator):
-    """
-    Mongo -> S3
-        A more specific baseOperator meant to move data
-        from mongo via pymongo to s3 via boto
-
-        things to note
-                .execute() is written to depend on .transform()
-                .transform() is meant to be extended by child classes
-                to perform transformations unique to those operators needs
+    """Operator meant to move data from mongo via pymongo to s3 via boto.
+
+    :param mongo_conn_id: reference to a specific mongo connection
+    :type mongo_conn_id: str
+    :param aws_conn_id: reference to a specific S3 connection
+    :type aws_conn_id: str
+    :param mongo_collection: reference to a specific collection in your mongo db
+    :type mongo_collection: str
+    :param mongo_query: query to execute. A list including a dict of the query
+    :type mongo_query: list
+    :param s3_bucket: reference to a specific S3 bucket to store the data
+    :type s3_bucket: str
+    :param s3_key: in which S3 key the file will be stored
+    :type s3_key: str
+    :param mongo_db: reference to a specific mongo database
+    :type mongo_db: str
+    :param replace: whether or not to replace the file in S3 if it previously existed
+    :param replace: bool
+    :param allow_disk_use: in the case you are retrieving a lot of data, you may have
+        to use the disk to save it instead of saving all in the RAM
+    :param allow_disk_use: bool
+    :param compression: Type of compression to use when saving the file in S3. Currently only gzip is supported.

Review comment:
       ```suggestion
       :param compression: type of compression to use for output file in S3. Currently only gzip is supported.
   ```
       You need the line to be less than 110 chars; yours has 112.
    Try this description with 107 chars, or just use two lines like you do on line 53 with the allow_disk_use param.

##########
File path: airflow/providers/amazon/aws/transfers/mongo_to_s3.py
##########
@@ -25,77 +26,101 @@
 from airflow.providers.mongo.hooks.mongo import MongoHook
 from airflow.utils.decorators import apply_defaults
 
+_DEPRECATION_MSG = (
+    "The s3_conn_id parameter has been deprecated. You should pass instead the aws_conn_id parameter."
+)
+
 
 class MongoToS3Operator(BaseOperator):
-    """
-    Mongo -> S3
-        A more specific baseOperator meant to move data
-        from mongo via pymongo to s3 via boto
-
-        things to note
-                .execute() is written to depend on .transform()
-                .transform() is meant to be extended by child classes
-                to perform transformations unique to those operators needs
+    """Operator meant to move data from mongo via pymongo to s3 via boto.
+
+    :param mongo_conn_id: reference to a specific mongo connection
+    :type mongo_conn_id: str
+    :param aws_conn_id: reference to a specific S3 connection
+    :type aws_conn_id: str
+    :param mongo_collection: reference to a specific collection in your mongo db
+    :type mongo_collection: str
+    :param mongo_query: query to execute. A list including a dict of the query
+    :type mongo_query: list
+    :param s3_bucket: reference to a specific S3 bucket to store the data
+    :type s3_bucket: str
+    :param s3_key: in which S3 key the file will be stored
+    :type s3_key: str
+    :param mongo_db: reference to a specific mongo database
+    :type mongo_db: str
+    :param replace: whether or not to replace the file in S3 if it previously existed
+    :param replace: bool
+    :param allow_disk_use: in the case you are retrieving a lot of data, you may have
+        to use the disk to save it instead of saving all in the RAM
+    :param allow_disk_use: bool

Review comment:
       ```suggestion
       :type allow_disk_use: bool
   ```

##########
File path: airflow/providers/amazon/aws/transfers/mongo_to_s3.py
##########
@@ -25,77 +26,101 @@
 from airflow.providers.mongo.hooks.mongo import MongoHook
 from airflow.utils.decorators import apply_defaults
 
+_DEPRECATION_MSG = (
+    "The s3_conn_id parameter has been deprecated. You should pass instead the aws_conn_id parameter."
+)
+
 
 class MongoToS3Operator(BaseOperator):
-    """
-    Mongo -> S3
-        A more specific baseOperator meant to move data
-        from mongo via pymongo to s3 via boto
-
-        things to note
-                .execute() is written to depend on .transform()
-                .transform() is meant to be extended by child classes
-                to perform transformations unique to those operators needs
+    """Operator meant to move data from mongo via pymongo to s3 via boto.
+
+    :param mongo_conn_id: reference to a specific mongo connection
+    :type mongo_conn_id: str
+    :param aws_conn_id: reference to a specific S3 connection
+    :type aws_conn_id: str
+    :param mongo_collection: reference to a specific collection in your mongo db
+    :type mongo_collection: str
+    :param mongo_query: query to execute. A list including a dict of the query
+    :type mongo_query: list
+    :param s3_bucket: reference to a specific S3 bucket to store the data
+    :type s3_bucket: str
+    :param s3_key: in which S3 key the file will be stored
+    :type s3_key: str
+    :param mongo_db: reference to a specific mongo database
+    :type mongo_db: str
+    :param replace: whether or not to replace the file in S3 if it previously existed
+    :param replace: bool

Review comment:
       ```suggestion
       :type replace: bool
   ```




----------------------------------------------------------------



[GitHub] [airflow] JavierLopezT commented on pull request #12033: AllowDiskUse parameter and docs in MongotoS3Operator

Posted by GitBox <gi...@apache.org>.
JavierLopezT commented on pull request #12033:
URL: https://github.com/apache/airflow/pull/12033#issuecomment-748062819


   > @HaloKo4 This will be included in [backport providers](http://apache-airflow-docs.s3-website.eu-central-1.amazonaws.com/docs/apache-airflow/latest/backport-providers.html)
   
   Any idea when it is going to be merged? Thanks


----------------------------------------------------------------




[GitHub] [airflow] feluelle commented on pull request #12033: AllowDiskUse parameter and docs in MongotoS3Operator

Posted by GitBox <gi...@apache.org>.
feluelle commented on pull request #12033:
URL: https://github.com/apache/airflow/pull/12033#issuecomment-761114296


   @JavierLopezT the checks still did not run. :(


----------------------------------------------------------------