You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2020/06/14 14:24:29 UTC

[GitHub] [airflow] ivan-afonichkin opened a new pull request #9290: Detect automatically the lack of reference to the guide in the operator descriptions

ivan-afonichkin opened a new pull request #9290:
URL: https://github.com/apache/airflow/pull/9290


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] mik-laj commented on pull request #9290: Detect automatically the lack of reference to the guide in the operator descriptions

Posted by GitBox <gi...@apache.org>.
mik-laj commented on pull request #9290:
URL: https://github.com/apache/airflow/pull/9290#issuecomment-645500870


   @ivan-afonichkin Ohh.  CI is still sad.  Can you fix it?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] mik-laj commented on a change in pull request #9290: Detect automatically the lack of reference to the guide in the operator descriptions

Posted by GitBox <gi...@apache.org>.
mik-laj commented on a change in pull request #9290:
URL: https://github.com/apache/airflow/pull/9290#discussion_r440875731



##########
File path: airflow/providers/amazon/aws/operators/ecs.py
##########
@@ -101,6 +101,10 @@ class ECSOperator(BaseOperator):  # pylint: disable=too-many-instance-attributes
         Only required if you want logs to be shown in the Airflow UI after your job has
         finished.
     :type awslogs_stream_prefix: str
+
+    .. seealso::

Review comment:
       It would be fantastic if the parameter description was at the end of the class description. The link to the guide should be above the description of the parameters. Look how it is done in airflow.providers.google




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] mik-laj commented on a change in pull request #9290: Detect automatically the lack of reference to the guide in the operator descriptions

Posted by GitBox <gi...@apache.org>.
mik-laj commented on a change in pull request #9290:
URL: https://github.com/apache/airflow/pull/9290#discussion_r440877632



##########
File path: airflow/providers/amazon/aws/operators/ecs.py
##########
@@ -101,6 +101,10 @@ class ECSOperator(BaseOperator):  # pylint: disable=too-many-instance-attributes
         Only required if you want logs to be shown in the Airflow UI after your job has
         finished.
     :type awslogs_stream_prefix: str
+
+    .. seealso::

Review comment:
       Here is example: https://airflow.readthedocs.io/en/latest/_api/airflow/providers/google/cloud/operators/cloud_storage_transfer_service/index.html#airflow.providers.google.cloud.operators.cloud_storage_transfer_service.CloudDataTransferServiceCreateJobOperator
   When this is under the parameters description, it will not fulfill its role, because it will be hardly visible. This will not promote guides.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] mik-laj commented on pull request #9290: Detect automatically the lack of reference to the guide in the operator descriptions

Posted by GitBox <gi...@apache.org>.
mik-laj commented on pull request #9290:
URL: https://github.com/apache/airflow/pull/9290#issuecomment-645646207


   @ivan-afonichkin It looks like one of the changes that I merged had a defect. I'm already looking at it.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ivan-afonichkin commented on pull request #9290: Detect automatically the lack of reference to the guide in the operator descriptions

Posted by GitBox <gi...@apache.org>.
ivan-afonichkin commented on pull request #9290:
URL: https://github.com/apache/airflow/pull/9290#issuecomment-645849474


   Hey @mik-laj it seems all checks have passed, thanks for your change! :)
   Can we merge this PR?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ivan-transferwise commented on pull request #9290: Detect automatically the lack of reference to the guide in the operator descriptions

Posted by GitBox <gi...@apache.org>.
ivan-transferwise commented on pull request #9290:
URL: https://github.com/apache/airflow/pull/9290#issuecomment-644025172


   Hello @ashb, I believe there is misunderstanding on your side. What this PR does exactly is IF there is a separate guide, THEN require link from operator's docstring. IF there is no separate guide, THEN no link needed. Does it somehow clarify the situation? 
   
   You are absolutely right though that separate guides are not needed all the time, but when they exist, it's better to promote them by putting link in the docstring.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] mik-laj commented on a change in pull request #9290: Detect automatically the lack of reference to the guide in the operator descriptions

Posted by GitBox <gi...@apache.org>.
mik-laj commented on a change in pull request #9290:
URL: https://github.com/apache/airflow/pull/9290#discussion_r440910982



##########
File path: airflow/operators/google_api_to_s3_transfer.py
##########
@@ -35,7 +35,12 @@ class GoogleApiToS3Transfer(GoogleApiToS3TransferOperator):
     """
     This class is deprecated.
     Please use:
-    `airflow.providers.amazon.aws.operators.google_api_to_s3_transfer.GoogleApiToS3TransferOperator`."""
+    `airflow.providers.amazon.aws.operators.google_api_to_s3_transfer.GoogleApiToS3TransferOperator`.
+
+    .. seealso::

Review comment:
       It's a deprecated class. We will transfer many classes to the airflow.providers package. We only have references here to maintain backward compatibility.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] mik-laj commented on a change in pull request #9290: Detect automatically the lack of reference to the guide in the operator descriptions

Posted by GitBox <gi...@apache.org>.
mik-laj commented on a change in pull request #9290:
URL: https://github.com/apache/airflow/pull/9290#discussion_r440181201



##########
File path: docs/build
##########
@@ -127,6 +128,71 @@ def display_errors_summary() -> None:
     print("=" * 50)
 
 
+def find_existing_guide_operator_names():
+    operator_names = set()
+
+    paths = glob("howto/operator/**/*.rst", recursive=True)
+    for path in paths:
+        with open(path) as f:
+            operator_names |= set(re.findall(".. _howto/operator:(.+?):", f.read()))
+
+    return operator_names
+
+
+def extract_ast_class_def_by_name(ast_tree, class_name):
+    class ClassVisitor(ast.NodeVisitor):
+        def __init__(self):
+            self.found_class_node = None
+
+        def visit_ClassDef(self, node):
+            if node.name == class_name:
+                self.found_class_node = node
+
+    visitor = ClassVisitor()
+    visitor.visit(ast_tree)
+
+    return visitor.found_class_node
+
+
+def check_guide_links_in_operator_descriptions():
+    def generate_build_error(path, line_no, operator_name):
+        return DocBuildError(
+                    file_path=path,
+                    line_no=line_no,
+                    message=(
+                        f"Link to the guide is missing in operator's description: {operator_name}.\n"
+                        f"Please add link to the guide to the description in the following form:\n"
+                        f".. seealso::\n"
+                        f"For more information on how to use this operator, take a look at the guide:\n"
+                        f":ref:`howto/operator:{operator_name}`\n"

Review comment:
       ```suggestion
                           f"\n"
                           f".. seealso::\n"
                           f"    For more information on how to use this operator, take a look at the guide:\n"
                           f"    :ref:`howto/operator:{operator_name}`\n"
   ```
   This will make it easier to copy a snippet.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] mik-laj commented on pull request #9290: Detect automatically the lack of reference to the guide in the operator descriptions

Posted by GitBox <gi...@apache.org>.
mik-laj commented on pull request #9290:
URL: https://github.com/apache/airflow/pull/9290#issuecomment-645673472


   @ivan-afonichkin  Fix has been merged to master. Can you do a rebase?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] mik-laj commented on a change in pull request #9290: Detect automatically the lack of reference to the guide in the operator descriptions

Posted by GitBox <gi...@apache.org>.
mik-laj commented on a change in pull request #9290:
URL: https://github.com/apache/airflow/pull/9290#discussion_r440874272



##########
File path: airflow/operators/google_api_to_s3_transfer.py
##########
@@ -35,7 +35,12 @@ class GoogleApiToS3Transfer(GoogleApiToS3TransferOperator):
     """
     This class is deprecated.
     Please use:
-    `airflow.providers.amazon.aws.operators.google_api_to_s3_transfer.GoogleApiToS3TransferOperator`."""
+    `airflow.providers.amazon.aws.operators.google_api_to_s3_transfer.GoogleApiToS3TransferOperator`.
+
+    .. seealso::

Review comment:
       This should not be added to this class. You can skip all files from the airflow.contrib directory.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] mik-laj commented on pull request #9290: Detect automatically the lack of reference to the guide in the operator descriptions

Posted by GitBox <gi...@apache.org>.
mik-laj commented on pull request #9290:
URL: https://github.com/apache/airflow/pull/9290#issuecomment-644975847


   Could you wait with your change until the other change is merged? Otherwise, we will have a lot of conflicts, which can be a bit problematic.  This change is needed to release backport packages, so many users hope that we will finish it ASAP.
   https://github.com/apache/airflow/pull/9320


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #9290: Detect automatically the lack of reference to the guide in the operator descriptions

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #9290:
URL: https://github.com/apache/airflow/pull/9290#issuecomment-645235588


   The transfer package change is merged now - so please rebase. those "to" operators moved to "transfers" package. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] mik-laj merged pull request #9290: Detect automatically the lack of reference to the guide in the operator descriptions

Posted by GitBox <gi...@apache.org>.
mik-laj merged pull request #9290:
URL: https://github.com/apache/airflow/pull/9290


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ivan-afonichkin commented on pull request #9290: Detect automatically the lack of reference to the guide in the operator descriptions

Posted by GitBox <gi...@apache.org>.
ivan-afonichkin commented on pull request #9290:
URL: https://github.com/apache/airflow/pull/9290#issuecomment-645518946


   @mik-laj Yes, I will do it, just didn't have time to fix everything. I will be back home in 3-4h and fix it :)


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] mik-laj commented on pull request #9290: Detect automatically the lack of reference to the guide in the operator descriptions

Posted by GitBox <gi...@apache.org>.
mik-laj commented on pull request #9290:
URL: https://github.com/apache/airflow/pull/9290#issuecomment-645656570


   @kaxil you self-requested a review 3 days ago. Would you like to add something here?
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] mik-laj commented on a change in pull request #9290: Detect automatically the lack of reference to the guide in the operator descriptions

Posted by GitBox <gi...@apache.org>.
mik-laj commented on a change in pull request #9290:
URL: https://github.com/apache/airflow/pull/9290#discussion_r441097579



##########
File path: docs/build
##########
@@ -127,6 +129,76 @@ def display_errors_summary() -> None:
     print("=" * 50)
 
 
+def find_existing_guide_operator_names():
+    operator_names = set()
+
+    paths = glob("howto/operator/**/*.rst", recursive=True)
+    for path in paths:
+        with open(path) as f:
+            operator_names |= set(re.findall(".. _howto/operator:(.+?):", f.read()))
+
+    return operator_names
+
+
+def extract_ast_class_def_by_name(ast_tree, class_name):
+    class ClassVisitor(ast.NodeVisitor):
+        def __init__(self):
+            self.found_class_node = None
+
+        def visit_ClassDef(self, node):
+            if node.name == class_name:
+                self.found_class_node = node
+
+    visitor = ClassVisitor()
+    visitor.visit(ast_tree)
+
+    return visitor.found_class_node
+
+
+def check_guide_links_in_operator_descriptions():
+    def generate_build_error(path, line_no, operator_name):
+        return DocBuildError(
+                    file_path=path,
+                    line_no=line_no,
+                    message=(
+                        f"Link to the guide is missing in operator's description: {operator_name}.\n"
+                        f"Please add link to the guide to the description in the following form:\n"
+                        f"\n"
+                        f".. seealso::\n"
+                        f"    For more information on how to use this operator, take a look at the guide:\n"
+                        f"    :ref:`howto/operator:{operator_name}`\n"
+                    )
+        )
+
+    # Extract operators for which there are existing .rst guides
+    operator_names = find_existing_guide_operator_names()
+
+    # Extract all potential python modules that can contain operators
+    python_module_paths = chain(
+        glob(f"{ROOT_PACKAGE_DIR}/operators/*.py"),
+        glob(f"{ROOT_PACKAGE_DIR}/sensors/*.py"),
+        glob(f"{ROOT_PACKAGE_DIR}/providers/**/operators/*.py", recursive=True),
+        glob(f"{ROOT_PACKAGE_DIR}/providers/**/sensors/*.py", recursive=True),
+    )
+
+    for py_module_path in python_module_paths:
+        with open(py_module_path) as f:
+            py_content = f.read()
+        for existing_operator in operator_names:
+            if f"class {existing_operator}" not in py_content:
+                continue
+            # This is a potential file with necessary class definition.
+            # To make sure it's a real Python class definition, we build AST tree
+            ast_tree = ast.parse(py_content)
+            class_def = extract_ast_class_def_by_name(ast_tree, existing_operator)
+
+            # Real class definition is found and docstring does not contain reference to the existing guide
+            if class_def is not None and f":ref:`howto/operator:{existing_operator}`" not in ast.get_docstring(class_def):

Review comment:
       ```suggestion
               # Real class definition is not found
               if class_def is None:
                   continue
               doc = ast.get_docstring(class_def)
               if "This class is deprecated." in docstring:
                   continue
               if f":ref:`howto/operator:{existing_operator}`" in doc: 
                   continue
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] mik-laj commented on pull request #9290: Detect automatically the lack of reference to the guide in the operator descriptions

Posted by GitBox <gi...@apache.org>.
mik-laj commented on pull request #9290:
URL: https://github.com/apache/airflow/pull/9290#issuecomment-645928949


   @mik-laj Your experience with AST would probably be helpful with this ticket, but we still need to do a few things before we can implement it.
   https://github.com/apache/airflow/issues/8765
   If we can set the details and unblock the work, I will call you  If you or your company uses Airflow in production, then maybe you would like to share your opinion in this thread
   https://github.com/apache/airflow/pull/9276
   After that, we don't have an idea for a ticket that might be of interest to you. However, I create all new tickets all the time, so you will definitely find something for you.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ivan-afonichkin commented on pull request #9290: Detect automatically the lack of reference to the guide in the operator descriptions

Posted by GitBox <gi...@apache.org>.
ivan-afonichkin commented on pull request #9290:
URL: https://github.com/apache/airflow/pull/9290#issuecomment-645286791


   Thank you everyone for the input, especially @mik-laj! πŸ‘ 
   
   Will do all these fixes later today.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ivan-afonichkin commented on pull request #9290: Detect automatically the lack of reference to the guide in the operator descriptions

Posted by GitBox <gi...@apache.org>.
ivan-afonichkin commented on pull request #9290:
URL: https://github.com/apache/airflow/pull/9290#issuecomment-645641041


   Hey @mik-laj, `CI Build / Build docs` has finally passed, but some databases checks failed with no apparent reason to me, any idea what could be wrong there?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ivan-afonichkin commented on pull request #9290: Detect automatically the lack of reference to the guide in the operator descriptions

Posted by GitBox <gi...@apache.org>.
ivan-afonichkin commented on pull request #9290:
URL: https://github.com/apache/airflow/pull/9290#issuecomment-645917765


   @mik-laj Thanks a lot, very glad it was merged, and thanks a lot for all the help and support! 
   Regarding the next change, I didn't choose an issue yet, but maybe you have something on your mind that will be very beneficial and needs to be done? If not, I will just go through the backlog :) 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] mik-laj commented on a change in pull request #9290: Detect automatically the lack of reference to the guide in the operator descriptions

Posted by GitBox <gi...@apache.org>.
mik-laj commented on a change in pull request #9290:
URL: https://github.com/apache/airflow/pull/9290#discussion_r441098885



##########
File path: docs/build
##########
@@ -127,6 +129,76 @@ def display_errors_summary() -> None:
     print("=" * 50)
 
 
+def find_existing_guide_operator_names():
+    operator_names = set()
+
+    paths = glob("howto/operator/**/*.rst", recursive=True)
+    for path in paths:
+        with open(path) as f:
+            operator_names |= set(re.findall(".. _howto/operator:(.+?):", f.read()))
+
+    return operator_names
+
+
+def extract_ast_class_def_by_name(ast_tree, class_name):
+    class ClassVisitor(ast.NodeVisitor):
+        def __init__(self):
+            self.found_class_node = None
+
+        def visit_ClassDef(self, node):
+            if node.name == class_name:
+                self.found_class_node = node
+
+    visitor = ClassVisitor()
+    visitor.visit(ast_tree)
+
+    return visitor.found_class_node
+
+
+def check_guide_links_in_operator_descriptions():
+    def generate_build_error(path, line_no, operator_name):
+        return DocBuildError(
+                    file_path=path,
+                    line_no=line_no,
+                    message=(
+                        f"Link to the guide is missing in operator's description: {operator_name}.\n"
+                        f"Please add link to the guide to the description in the following form:\n"
+                        f"\n"
+                        f".. seealso::\n"
+                        f"    For more information on how to use this operator, take a look at the guide:\n"
+                        f"    :ref:`howto/operator:{operator_name}`\n"
+                    )
+        )
+
+    # Extract operators for which there are existing .rst guides
+    operator_names = find_existing_guide_operator_names()
+
+    # Extract all potential python modules that can contain operators
+    python_module_paths = chain(
+        glob(f"{ROOT_PACKAGE_DIR}/operators/*.py"),
+        glob(f"{ROOT_PACKAGE_DIR}/sensors/*.py"),
+        glob(f"{ROOT_PACKAGE_DIR}/providers/**/operators/*.py", recursive=True),
+        glob(f"{ROOT_PACKAGE_DIR}/providers/**/sensors/*.py", recursive=True),
+    )
+
+    for py_module_path in python_module_paths:
+        with open(py_module_path) as f:
+            py_content = f.read()
+        for existing_operator in operator_names:
+            if f"class {existing_operator}" not in py_content:
+                continue
+            # This is a potential file with necessary class definition.
+            # To make sure it's a real Python class definition, we build AST tree
+            ast_tree = ast.parse(py_content)
+            class_def = extract_ast_class_def_by_name(ast_tree, existing_operator)
+
+            # Real class definition is found and docstring does not contain reference to the existing guide
+            if class_def is not None and f":ref:`howto/operator:{existing_operator}`" not in ast.get_docstring(class_def):

Review comment:
       The documentation must build properly for this change to be accepted. In some places we don't want links, so we have to handle it in the code.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] mik-laj commented on a change in pull request #9290: Detect automatically the lack of reference to the guide in the operator descriptions

Posted by GitBox <gi...@apache.org>.
mik-laj commented on a change in pull request #9290:
URL: https://github.com/apache/airflow/pull/9290#discussion_r441810628



##########
File path: docs/build
##########
@@ -127,6 +129,88 @@ def display_errors_summary() -> None:
     print("=" * 50)
 
 
+def find_existing_guide_operator_names():
+    operator_names = set()
+
+    paths = glob("howto/operator/**/*.rst", recursive=True)
+    for path in paths:
+        with open(path) as f:
+            operator_names |= set(re.findall(".. _howto/operator:(.+?):", f.read()))
+
+    return operator_names
+
+
+def extract_ast_class_def_by_name(ast_tree, class_name):
+    class ClassVisitor(ast.NodeVisitor):
+        def __init__(self):
+            self.found_class_node = None
+
+        def visit_ClassDef(self, node):
+            if node.name == class_name:
+                self.found_class_node = node
+
+    visitor = ClassVisitor()
+    visitor.visit(ast_tree)
+
+    return visitor.found_class_node
+
+
+def check_guide_links_in_operator_descriptions():
+    def generate_build_error(path, line_no, operator_name):
+        return DocBuildError(
+                    file_path=path,
+                    line_no=line_no,
+                    message=(
+                        f"Link to the guide is missing in operator's description: {operator_name}.\n"
+                        f"Please add link to the guide to the description in the following form:\n"
+                        f"\n"
+                        f".. seealso::\n"
+                        f"    For more information on how to use this operator, take a look at the guide:\n"
+                        f"    :ref:`howto/operator:{operator_name}`\n"
+                    )
+        )
+
+    # Extract operators for which there are existing .rst guides
+    operator_names = find_existing_guide_operator_names()
+
+    # Extract all potential python modules that can contain operators
+    python_module_paths = chain(
+        glob(f"{ROOT_PACKAGE_DIR}/operators/*.py"),
+        glob(f"{ROOT_PACKAGE_DIR}/sensors/*.py"),
+        glob(f"{ROOT_PACKAGE_DIR}/providers/**/operators/*.py", recursive=True),
+        glob(f"{ROOT_PACKAGE_DIR}/providers/**/sensors/*.py", recursive=True),

Review comment:
       Can you add `transfers` package here also?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ivan-afonichkin commented on a change in pull request #9290: Detect automatically the lack of reference to the guide in the operator descriptions

Posted by GitBox <gi...@apache.org>.
ivan-afonichkin commented on a change in pull request #9290:
URL: https://github.com/apache/airflow/pull/9290#discussion_r440899254



##########
File path: airflow/operators/google_api_to_s3_transfer.py
##########
@@ -35,7 +35,12 @@ class GoogleApiToS3Transfer(GoogleApiToS3TransferOperator):
     """
     This class is deprecated.
     Please use:
-    `airflow.providers.amazon.aws.operators.google_api_to_s3_transfer.GoogleApiToS3TransferOperator`."""
+    `airflow.providers.amazon.aws.operators.google_api_to_s3_transfer.GoogleApiToS3TransferOperator`.
+
+    .. seealso::

Review comment:
       But this is not under `airflow.contrib`. So why should it be skipped in this case?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] mik-laj commented on a change in pull request #9290: Detect automatically the lack of reference to the guide in the operator descriptions

Posted by GitBox <gi...@apache.org>.
mik-laj commented on a change in pull request #9290:
URL: https://github.com/apache/airflow/pull/9290#discussion_r441095280



##########
File path: docs/build
##########
@@ -127,6 +129,76 @@ def display_errors_summary() -> None:
     print("=" * 50)
 
 
+def find_existing_guide_operator_names():
+    operator_names = set()
+
+    paths = glob("howto/operator/**/*.rst", recursive=True)
+    for path in paths:
+        with open(path) as f:
+            operator_names |= set(re.findall(".. _howto/operator:(.+?):", f.read()))
+
+    return operator_names
+
+
+def extract_ast_class_def_by_name(ast_tree, class_name):
+    class ClassVisitor(ast.NodeVisitor):
+        def __init__(self):
+            self.found_class_node = None
+
+        def visit_ClassDef(self, node):
+            if node.name == class_name:
+                self.found_class_node = node
+
+    visitor = ClassVisitor()
+    visitor.visit(ast_tree)
+
+    return visitor.found_class_node
+
+
+def check_guide_links_in_operator_descriptions():
+    def generate_build_error(path, line_no, operator_name):
+        return DocBuildError(
+                    file_path=path,
+                    line_no=line_no,
+                    message=(
+                        f"Link to the guide is missing in operator's description: {operator_name}.\n"
+                        f"Please add link to the guide to the description in the following form:\n"
+                        f"\n"
+                        f".. seealso::\n"
+                        f"    For more information on how to use this operator, take a look at the guide:\n"
+                        f"    :ref:`howto/operator:{operator_name}`\n"
+                    )
+        )
+
+    # Extract operators for which there are existing .rst guides
+    operator_names = find_existing_guide_operator_names()
+
+    # Extract all potential python modules that can contain operators
+    python_module_paths = chain(
+        glob(f"{ROOT_PACKAGE_DIR}/operators/*.py"),
+        glob(f"{ROOT_PACKAGE_DIR}/sensors/*.py"),
+        glob(f"{ROOT_PACKAGE_DIR}/providers/**/operators/*.py", recursive=True),
+        glob(f"{ROOT_PACKAGE_DIR}/providers/**/sensors/*.py", recursive=True),
+    )
+
+    for py_module_path in python_module_paths:
+        with open(py_module_path) as f:
+            py_content = f.read()
+        for existing_operator in operator_names:

Review comment:
       ```suggestion
           if "This module is deprecated" in py_content:
               continue
           for existing_operator in operator_names:
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] mik-laj commented on a change in pull request #9290: Detect automatically the lack of reference to the guide in the operator descriptions

Posted by GitBox <gi...@apache.org>.
mik-laj commented on a change in pull request #9290:
URL: https://github.com/apache/airflow/pull/9290#discussion_r440913097



##########
File path: airflow/operators/google_api_to_s3_transfer.py
##########
@@ -35,7 +35,12 @@ class GoogleApiToS3Transfer(GoogleApiToS3TransferOperator):
     """
     This class is deprecated.
     Please use:
-    `airflow.providers.amazon.aws.operators.google_api_to_s3_transfer.GoogleApiToS3TransferOperator`."""
+    `airflow.providers.amazon.aws.operators.google_api_to_s3_transfer.GoogleApiToS3TransferOperator`.
+
+    .. seealso::

Review comment:
       Formerly the core code was maintained by the original creators - Airbnb. The code that was in the contrib package was supported by the community. The project was passed to the Apache community and currently the entire code is maintained by the community, so now the division has no justification, and it is only due to historical reasons.
   https://airflow.readthedocs.io/en/stable/_api/index.html
   We wanted to fix this, so we created a new airflow.providers package to organize these operators.
   In the airflow.{operators,sensors} package, we only have a small set of core classes.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] mik-laj commented on pull request #9290: Detect automatically the lack of reference to the guide in the operator descriptions

Posted by GitBox <gi...@apache.org>.
mik-laj commented on pull request #9290:
URL: https://github.com/apache/airflow/pull/9290#issuecomment-645913488


   @ivan-afonichkin Thank you for a great contribution. This will make our work much easier, because documentation reviews will be easier.  What are your plans for the next change?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] mik-laj commented on a change in pull request #9290: Detect automatically the lack of reference to the guide in the operator descriptions

Posted by GitBox <gi...@apache.org>.
mik-laj commented on a change in pull request #9290:
URL: https://github.com/apache/airflow/pull/9290#discussion_r440185083



##########
File path: docs/build
##########
@@ -127,6 +128,71 @@ def display_errors_summary() -> None:
     print("=" * 50)
 
 
+def find_existing_guide_operator_names():
+    operator_names = set()
+
+    paths = glob("howto/operator/**/*.rst", recursive=True)
+    for path in paths:
+        with open(path) as f:
+            operator_names |= set(re.findall(".. _howto/operator:(.+?):", f.read()))
+
+    return operator_names
+
+
+def extract_ast_class_def_by_name(ast_tree, class_name):
+    class ClassVisitor(ast.NodeVisitor):
+        def __init__(self):
+            self.found_class_node = None
+
+        def visit_ClassDef(self, node):
+            if node.name == class_name:
+                self.found_class_node = node
+
+    visitor = ClassVisitor()
+    visitor.visit(ast_tree)
+
+    return visitor.found_class_node
+
+
+def check_guide_links_in_operator_descriptions():
+    def generate_build_error(path, line_no, operator_name):
+        return DocBuildError(
+                    file_path=path,
+                    line_no=line_no,
+                    message=(
+                        f"Link to the guide is missing in operator's description: {operator_name}.\n"
+                        f"Please add link to the guide to the description in the following form:\n"
+                        f".. seealso::\n"
+                        f"For more information on how to use this operator, take a look at the guide:\n"
+                        f":ref:`howto/operator:{operator_name}`\n"
+                    )
+        )
+
+    # Extract operators for which there are existing .rst guides
+    operator_names = find_existing_guide_operator_names()
+
+    # Extract all potential python modules that can contain operators
+    python_module_paths = glob(f"{ROOT_PACKAGE_DIR}/**/*.py", recursive=True)

Review comment:
       ```suggestion
       python_module_paths = itertools(
           glob(f"{ROOT_PACKAGE_DIR}/operators/*.py"),
           glob(f"{ROOT_PACKAGE_DIR}/sensors/*.py"),
           glob(f"{ROOT_PACKAGE_DIR}/providers/**/operators/*.py", recursive=True),
           glob(f"{ROOT_PACKAGE_DIR}/providers/**/sensors/*.py", recursive=True),
       )
   ```
   I am mainly worried about checking files that are in www/node_modules.  Some engineers are creative and publish Python code in Javascript packages. However, it is not guaranteed that this file will be correctly handled by the 
   current interpreter or whether it will be a valid Python file at all.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] mik-laj commented on a change in pull request #9290: Detect automatically the lack of reference to the guide in the operator descriptions

Posted by GitBox <gi...@apache.org>.
mik-laj commented on a change in pull request #9290:
URL: https://github.com/apache/airflow/pull/9290#discussion_r440189048



##########
File path: docs/build
##########
@@ -127,6 +128,71 @@ def display_errors_summary() -> None:
     print("=" * 50)
 
 
+def find_existing_guide_operator_names():
+    operator_names = set()
+
+    paths = glob("howto/operator/**/*.rst", recursive=True)
+    for path in paths:
+        with open(path) as f:
+            operator_names |= set(re.findall(".. _howto/operator:(.+?):", f.read()))
+
+    return operator_names
+
+
+def extract_ast_class_def_by_name(ast_tree, class_name):
+    class ClassVisitor(ast.NodeVisitor):
+        def __init__(self):
+            self.found_class_node = None
+
+        def visit_ClassDef(self, node):
+            if node.name == class_name:
+                self.found_class_node = node
+
+    visitor = ClassVisitor()
+    visitor.visit(ast_tree)
+
+    return visitor.found_class_node
+
+
+def check_guide_links_in_operator_descriptions():
+    def generate_build_error(path, line_no, operator_name):
+        return DocBuildError(
+                    file_path=path,
+                    line_no=line_no,
+                    message=(
+                        f"Link to the guide is missing in operator's description: {operator_name}.\n"
+                        f"Please add link to the guide to the description in the following form:\n"
+                        f".. seealso::\n"
+                        f"For more information on how to use this operator, take a look at the guide:\n"
+                        f":ref:`howto/operator:{operator_name}`\n"
+                    )
+        )
+
+    # Extract operators for which there are existing .rst guides
+    operator_names = find_existing_guide_operator_names()
+
+    # Extract all potential python modules that can contain operators
+    python_module_paths = glob(f"{ROOT_PACKAGE_DIR}/**/*.py", recursive=True)
+
+    for py_module_path in python_module_paths:
+        with open(py_module_path) as f:
+            py_content = f.read()
+            for existing_operator in operator_names:
+                if f"class {existing_operator}" in py_content:
+                    # This is a potential file with necessary class definition.
+                    # To make sure it's a real Python class definition, we build AST tree
+                    ast_tree = ast.parse(py_content)
+                    class_def = extract_ast_class_def_by_name(ast_tree, existing_operator)
+
+                    if class_def is not None:
+                        # Real class definition is found
+                        if f":ref:`howto/operator:{existing_operator}`" not in ast.get_docstring(class_def):
+                            # Docstring does not contain reference to the existing guide
+                            build_errors.append(
+                                generate_build_error(py_module_path, class_def.lineno, existing_operator)
+                            )

Review comment:
       ```suggestion
       for py_module_path in python_module_paths:
           with open(py_module_path) as f:
               py_content = f.read()
           for existing_operator in operator_names:
               if f"class {existing_operator}" not in py_content:
                   continue
               # This is a potential file with necessary class definition.
               # To make sure it's a real Python class definition, we build AST tree
               ast_tree = ast.parse(py_content)
               class_def = extract_ast_class_def_by_name(ast_tree, existing_operator)
   
               # Real class definition is found and docstring does not contain reference to the existing guide
               if class_def is not None and f":ref:`howto/operator:{existing_operator}`" not in ast.get_docstring(class_def):
                   build_errors.append(
                       generate_build_error(py_module_path, class_def.lineno, existing_operator)
                   )
   ```
   There are a little too many indentations in the code, which makes it difficult to understand.  I really like using AST.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] boring-cyborg[bot] commented on pull request #9290: Detect automatically the lack of reference to the guide in the operator descriptions

Posted by GitBox <gi...@apache.org>.
boring-cyborg[bot] commented on pull request #9290:
URL: https://github.com/apache/airflow/pull/9290#issuecomment-645902207


   Awesome work, congrats on your first merged pull request!
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] boring-cyborg[bot] commented on pull request #9290: Detect automatically the lack of reference to the guide in the operator descriptions

Posted by GitBox <gi...@apache.org>.
boring-cyborg[bot] commented on pull request #9290:
URL: https://github.com/apache/airflow/pull/9290#issuecomment-643773538


   Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about any anything please check our Contribution Guide (https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst)
   Here are some useful points:
   - Pay attention to the quality of your code (flake8, pylint and type annotations). Our [pre-commits]( https://github.com/apache/airflow/blob/master/STATIC_CODE_CHECKS.rst#prerequisites-for-pre-commit-hooks) will help you with that.
   - In case of a new feature add useful documentation (in docstrings or in `docs/` directory). Adding a new operator? Check this short [guide](https://github.com/apache/airflow/blob/master/docs/howto/custom-operator.rst) Consider adding an example DAG that shows how users should use it.
   - Consider using [Breeze environment](https://github.com/apache/airflow/blob/master/BREEZE.rst) for testing locally, it’s a heavy docker but it ships with a working Airflow and a lot of integrations.
   - Be patient and persistent. It might take some time to get a review or get the final approval from Committers.
   - Be sure to read the [Airflow Coding style]( https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#coding-style-and-best-practices).
   Apache Airflow is a community-driven project and together we are making it better πŸš€.
   In case of doubts contact the developers at:
   Mailing List: dev@airflow.apache.org
   Slack: https://apache-airflow-slack.herokuapp.com/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] mik-laj commented on a change in pull request #9290: Detect automatically the lack of reference to the guide in the operator descriptions

Posted by GitBox <gi...@apache.org>.
mik-laj commented on a change in pull request #9290:
URL: https://github.com/apache/airflow/pull/9290#discussion_r440189048



##########
File path: docs/build
##########
@@ -127,6 +128,71 @@ def display_errors_summary() -> None:
     print("=" * 50)
 
 
+def find_existing_guide_operator_names():
+    operator_names = set()
+
+    paths = glob("howto/operator/**/*.rst", recursive=True)
+    for path in paths:
+        with open(path) as f:
+            operator_names |= set(re.findall(".. _howto/operator:(.+?):", f.read()))
+
+    return operator_names
+
+
+def extract_ast_class_def_by_name(ast_tree, class_name):
+    class ClassVisitor(ast.NodeVisitor):
+        def __init__(self):
+            self.found_class_node = None
+
+        def visit_ClassDef(self, node):
+            if node.name == class_name:
+                self.found_class_node = node
+
+    visitor = ClassVisitor()
+    visitor.visit(ast_tree)
+
+    return visitor.found_class_node
+
+
+def check_guide_links_in_operator_descriptions():
+    def generate_build_error(path, line_no, operator_name):
+        return DocBuildError(
+                    file_path=path,
+                    line_no=line_no,
+                    message=(
+                        f"Link to the guide is missing in operator's description: {operator_name}.\n"
+                        f"Please add link to the guide to the description in the following form:\n"
+                        f".. seealso::\n"
+                        f"For more information on how to use this operator, take a look at the guide:\n"
+                        f":ref:`howto/operator:{operator_name}`\n"
+                    )
+        )
+
+    # Extract operators for which there are existing .rst guides
+    operator_names = find_existing_guide_operator_names()
+
+    # Extract all potential python modules that can contain operators
+    python_module_paths = glob(f"{ROOT_PACKAGE_DIR}/**/*.py", recursive=True)
+
+    for py_module_path in python_module_paths:
+        with open(py_module_path) as f:
+            py_content = f.read()
+            for existing_operator in operator_names:
+                if f"class {existing_operator}" in py_content:
+                    # This is a potential file with necessary class definition.
+                    # To make sure it's a real Python class definition, we build AST tree
+                    ast_tree = ast.parse(py_content)
+                    class_def = extract_ast_class_def_by_name(ast_tree, existing_operator)
+
+                    if class_def is not None:
+                        # Real class definition is found
+                        if f":ref:`howto/operator:{existing_operator}`" not in ast.get_docstring(class_def):
+                            # Docstring does not contain reference to the existing guide
+                            build_errors.append(
+                                generate_build_error(py_module_path, class_def.lineno, existing_operator)
+                            )

Review comment:
       ```suggestion
       for py_module_path in python_module_paths:
           with open(py_module_path) as f:
               py_content = f.read()
           for existing_operator in operator_names:
               if f"class {existing_operator}" not in py_content:
                   continue
               # This is a potential file with necessary class definition.
               # To make sure it's a real Python class definition, we build AST tree
               ast_tree = ast.parse(py_content)
               class_def = extract_ast_class_def_by_name(ast_tree, existing_operator)
   
               # Real class definition is found
               if class_def is not None and f":ref:`howto/operator:{existing_operator}`" not in ast.get_docstring(class_def):
                   # Docstring does not contain reference to the existing guide
                   build_errors.append(
                       generate_build_error(py_module_path, class_def.lineno, existing_operator)
                   )
   ```
   There are a little too many indentations in the code, which makes it difficult to understand.  I really like using AST.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on pull request #9290: Detect automatically the lack of reference to the guide in the operator descriptions

Posted by GitBox <gi...@apache.org>.
ashb commented on pull request #9290:
URL: https://github.com/apache/airflow/pull/9290#issuecomment-644030569


   Gotcha, yes totally misread what it was doing. Sorry!
   
   (That also explains why there were only 31 errors!)


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org