You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/07/07 08:53:54 UTC

[GitHub] [airflow] alexkruc opened a new pull request, #24895: Adding ElasticserachPythonHook - ES Hook With The Python Client

alexkruc opened a new pull request, #24895:
URL: https://github.com/apache/airflow/pull/24895

   Adding a new hook for work with Elasticsearch, using the official Elasticsearch Python client.
   Currently I've implemented only one function (search), but if any more will be needed, we can add even more.
   
   closes: #21929
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] alexkruc commented on a diff in pull request #24895: Adding ElasticserachPythonHook - ES Hook With The Python Client

Posted by GitBox <gi...@apache.org>.
alexkruc commented on code in PR #24895:
URL: https://github.com/apache/airflow/pull/24895#discussion_r915691575


##########
airflow/providers/elasticsearch/hooks/elasticsearch_python.py:
##########
@@ -0,0 +1,56 @@
+#

Review Comment:
   Done :) 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] eladkal commented on a diff in pull request #24895: Adding ElasticserachPythonHook - ES Hook With The Python Client

Posted by GitBox <gi...@apache.org>.
eladkal commented on code in PR #24895:
URL: https://github.com/apache/airflow/pull/24895#discussion_r915637328


##########
airflow/providers/elasticsearch/hooks/elasticsearch_python.py:
##########
@@ -0,0 +1,56 @@
+#

Review Comment:
   So we have two ways to interact with elastic one is Python based with the SDK and one is SQL based by using sql query syntax.
   
   i think we should rename this file as `python.py` or something similar?
   and the current `ElasticsearchHook` in hooks/elasticsearch.py  
   https://github.com/apache/airflow/blob/7a9c03caaf5b954057f7a566b7406982e67f9749/airflow/providers/elasticsearch/hooks/elasticsearch.py#L27 should be hooks/sql.py with `ElasticsearchSqlHook`
   
   
   Alternatively, we can have both classes `ElasticsearchSqlHook` + `ElasticsearchPythonHook` in the same `hooks/elasticsearch.py` ?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] eladkal commented on a diff in pull request #24895: Adding ElasticserachPythonHook - ES Hook With The Python Client

Posted by GitBox <gi...@apache.org>.
eladkal commented on code in PR #24895:
URL: https://github.com/apache/airflow/pull/24895#discussion_r915693615


##########
tests/providers/elasticsearch/hooks/test_elasticsearch.py:
##########
@@ -20,8 +20,10 @@
 import unittest
 from unittest import mock
 
+from elasticsearch import Elasticsearch
+
 from airflow.models import Connection
-from airflow.providers.elasticsearch.hooks.elasticsearch import ElasticsearchHook
+from airflow.providers.elasticsearch.hooks.elasticsearch import ElasticsearchHook, ElasticsearchPythonHook

Review Comment:
   Please use the new class name on the import



##########
tests/system/providers/elasticsearch/example_elasticsearch_python_hook.py:
##########
@@ -0,0 +1,58 @@
+# Licensed to the Apache Software Foundation (ASF) under one

Review Comment:
   Rename the file example_elasticsearch_hook.py ?



##########
airflow/providers/elasticsearch/hooks/elasticsearch.py:
##########
@@ -93,3 +95,53 @@ def get_uri(self) -> str:
                 uri += '&'
 
         return uri
+
+
+class ElasticsearchHook(ElasticsearchSQLHook):
+    """
+    This class is deprecated and was renamed to ElasticsearchSQLHook.
+    Please use `airflow.providers.elasticsearch.hooks.elasticsearch.ElasticsearchSQLHook`.
+    """
+
+    def __init__(self, *args, **kwargs):
+        warnings.warn(
+            """This class is deprecated.
+            Please use `airflow.providers.elasticsearch.hooks.elasticsearch.ElasticsearchSQLHook`.""",
+            DeprecationWarning,
+            stacklevel=3,
+        )
+        super().__init__(*args, **kwargs)
+
+
+class ElasticsearchPythonHook(BaseHook):
+    """
+    Interacts with Elasticsearch. This hook uses the official Elasticsearch Python Client.
+
+    :param hosts: list: A list of a single or many Elasticsearch instances. Example: ["http://localhost:9200"]
+    :param es_conn_args: dict: Additional arguments you might need to enter to connect to Elasticsearch.
+                                Example: {"ca_cert":"/path/to/http_ca.crt", "basic_auth": "(user, pass)"}

Review Comment:
   It looks like we need also a connection.rst to explain this additional settings for the connection



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] alexkruc commented on pull request #24895: Adding ElasticserachPythonHook - ES Hook With The Python Client

Posted by GitBox <gi...@apache.org>.
alexkruc commented on PR #24895:
URL: https://github.com/apache/airflow/pull/24895#issuecomment-1179370846

   Finally, all the tests are passing! :D 
   To recap, the Sphinx message are ambiguous and doesn't point. to the actual issues..
   The issue with the `indent` was real, but it referred to the indent of the example code. As I'm passing the Python block without any indents, I had to just remove the `indent` clause from the rst file.
   
   Later, I had issues with a "typo" - but Sphinx just set it as a generic typo, and did not always point to the correct file with the error..
   
   In any case, at least now I know how to handle these issues - the more I know, the more I can help other people in the community :) 
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on pull request #24895: Adding ElasticserachPythonHook - ES Hook With The Python Client

Posted by GitBox <gi...@apache.org>.
potiuk commented on PR #24895:
URL: https://github.com/apache/airflow/pull/24895#issuecomment-1179384873

   > To recap, the Sphinx error messages are ambiguous and don't point to the actual issues..
   
   Yeah. quite often the message are rather unhelpful, I must agree.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] alexkruc commented on pull request #24895: Adding ElasticserachPythonHook - ES Hook With The Python Client

Posted by GitBox <gi...@apache.org>.
alexkruc commented on PR #24895:
URL: https://github.com/apache/airflow/pull/24895#issuecomment-1178961679

   @potiuk Sorry to trouble you again.. I followed your tips, and now the integration tests are passing! But the build docs are still failing on the same thing - `/hooks/elasticsearch.rst.rst:37:non-whitespace stripped by dedent`.
   Do you have a tip regarding what's wrong there? Breeze static-checks and pre-commit are passing, and the `dedent` clause is the same as in other files (at least from what my IDE sees)..
   
   Really appreciate your help 🙏 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] alexkruc commented on pull request #24895: Adding ElasticserachPythonHook - ES Hook With The Python Client

Posted by GitBox <gi...@apache.org>.
alexkruc commented on PR #24895:
URL: https://github.com/apache/airflow/pull/24895#issuecomment-1178570419

   @potiuk Is it possible to get your assistance on the failing tests?
   I have 2 failing tests:
   
   - The test [TestElasticsearchProviderProjectStructure](https://github.com/apache/airflow/blob/main/tests/always/test_project_structure.py#L473) is failing. From what I understand, it looks for all the available hooks, and looks that they have  a matching example in the examples folder [actual test](https://github.com/apache/airflow/blob/main/tests/always/test_project_structure.py#L182). 
   The thing is, that it's failing on the `ElasticsearchHook` which I deprecated (because I renamed it) in this PR. I don't want to add a test for a deprecated hook, because it's kinda beets the purpose.. Do you have any tips?
   
   - The `build docs` test is failing, on the new documentation that I've added - `hooks/elasticsearch.rst.rst:37:non-whitespace stripped by dedent`.  I compared the `dedent` clause with several other rst files, and it looked fine to me.. Also, the pre-commit hooks are passing on this file without any changes/failures..  Is there any possibility you can help me with this? 
   
   Sorry for the trouble :( I'm just kinda stuck and need an external opinion 🙏 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on pull request #24895: Adding ElasticserachPythonHook - ES Hook With The Python Client

Posted by GitBox <gi...@apache.org>.
potiuk commented on PR #24895:
URL: https://github.com/apache/airflow/pull/24895#issuecomment-1178739874

   > The test [TestElasticsearchProviderProjectStructure](https://github.com/apache/airflow/blob/main/tests/always/test_project_structure.py#L473) is failing. From what I understand, it looks for all the available hooks, and looks that they have a matching example in the examples folder [actual test](https://github.com/apache/airflow/blob/main/tests/always/test_project_structure.py#L182).
   The thing is, that it's failing on the ElasticsearchHook which I deprecated (because I renamed it) in this PR. I don't want to add an example for a deprecated hook, because it kinda beets the purpose.. Do you have any tips?
   
   Look at the tests - there some other deprecated classes there mentioned in the exclusion list.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] alexkruc commented on a diff in pull request #24895: Adding ElasticserachPythonHook - ES Hook With The Python Client

Posted by GitBox <gi...@apache.org>.
alexkruc commented on code in PR #24895:
URL: https://github.com/apache/airflow/pull/24895#discussion_r915790308


##########
tests/system/providers/elasticsearch/example_elasticsearch_python_hook.py:
##########
@@ -0,0 +1,58 @@
+# Licensed to the Apache Software Foundation (ASF) under one

Review Comment:
   I've merged the new example into the old examples file :) 
   Now it should give the example to both hooks



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on pull request #24895: Adding ElasticserachPythonHook - ES Hook With The Python Client

Posted by GitBox <gi...@apache.org>.
potiuk commented on PR #24895:
URL: https://github.com/apache/airflow/pull/24895#issuecomment-1178373316

   Docs now :( 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] alexkruc commented on pull request #24895: Adding ElasticserachPythonHook - ES Hook With The Python Client

Posted by GitBox <gi...@apache.org>.
alexkruc commented on PR #24895:
URL: https://github.com/apache/airflow/pull/24895#issuecomment-1178751094

   > > The test [TestElasticsearchProviderProjectStructure](https://github.com/apache/airflow/blob/main/tests/always/test_project_structure.py#L473) is failing. From what I understand, it looks for all the available hooks, and looks that they have a matching example in the examples folder [actual test](https://github.com/apache/airflow/blob/main/tests/always/test_project_structure.py#L182).
   > > The thing is, that it's failing on the ElasticsearchHook which I deprecated (because I renamed it) in this PR. I don't want to add an example for a deprecated hook, because it kinda beets the purpose.. Do you have any tips?
   > 
   > Look at the tests - there some other deprecated classes there mentioned in the exclusion list.
   
   Wow I completely missed the `DEPRECATED_CLASSES` somehow. Thanks for the tip! :)  I'll try that out after the build (that will probably fail, but I want to see if the build docs part succeeds after the rebase) :) 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on pull request #24895: Adding ElasticserachPythonHook - ES Hook With The Python Client

Posted by GitBox <gi...@apache.org>.
potiuk commented on PR #24895:
URL: https://github.com/apache/airflow/pull/24895#issuecomment-1179041364

   Well. Not sure - seems like really "cryptic" error from Sphinx (which is known from rather cryptic messages). Look at the actual output of the sphix build (Above the summary) and you will find the warning there.
   
   Also there is apparently a bug (@mik-laj?) that in this case our error parsing added second .rst to display the line of .rst file that was actually causing it.
   
   However I made an intelligent guess and looked closely. From what I see, I think the lines below are all indented ONE SPACE MORE than all the others. so maybe fixing that will help.
   
   ```
       Connection types <connections/elasticsearch>
       Logging for Tasks <logging/index>
       Elasticsearch hook with native Python client <hooks/elasticsearch>
   ```
   
   Also few other tips - and ways I deal with similar cases (been there, done that many times). You can easily iterate locally with it (and this is what I do when I have similar issues with docs):
   
   ```
   breeze build-docs --package-filter apache-airlfow-provider-elasticsearch --docs-only
   ```
   
   This will run much faster than CI circle and allows you to also take a look at sphinx-generated intermediate artifacts which usually helps with debugging it. 
   
   If all else fails. I usually fall-back to just looking at teh source code of Sphins and finding what the error message actually means (i.e. when it will be produced). In this case: `WARNING: non-whitespace stripped by dedent` - my intelligent guess is that Sphinx took the first indentation depth and stripped the same identation from the next block (which lead to stripping some non-white space character) - that's why the guess that one-space-more of indentation in the first block might be the reason. But I have not looked it up in the sources yet.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] alexkruc commented on a diff in pull request #24895: Adding ElasticserachPythonHook - ES Hook With The Python Client

Posted by GitBox <gi...@apache.org>.
alexkruc commented on code in PR #24895:
URL: https://github.com/apache/airflow/pull/24895#discussion_r915789926


##########
airflow/providers/elasticsearch/hooks/elasticsearch.py:
##########
@@ -93,3 +95,53 @@ def get_uri(self) -> str:
                 uri += '&'
 
         return uri
+
+
+class ElasticsearchHook(ElasticsearchSQLHook):
+    """
+    This class is deprecated and was renamed to ElasticsearchSQLHook.
+    Please use `airflow.providers.elasticsearch.hooks.elasticsearch.ElasticsearchSQLHook`.
+    """
+
+    def __init__(self, *args, **kwargs):
+        warnings.warn(
+            """This class is deprecated.
+            Please use `airflow.providers.elasticsearch.hooks.elasticsearch.ElasticsearchSQLHook`.""",
+            DeprecationWarning,
+            stacklevel=3,
+        )
+        super().__init__(*args, **kwargs)
+
+
+class ElasticsearchPythonHook(BaseHook):
+    """
+    Interacts with Elasticsearch. This hook uses the official Elasticsearch Python Client.
+
+    :param hosts: list: A list of a single or many Elasticsearch instances. Example: ["http://localhost:9200"]
+    :param es_conn_args: dict: Additional arguments you might need to enter to connect to Elasticsearch.
+                                Example: {"ca_cert":"/path/to/http_ca.crt", "basic_auth": "(user, pass)"}

Review Comment:
   I've added a new document for both hooks under hooks/elasticsearch.rst



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk merged pull request #24895: Adding ElasticserachPythonHook - ES Hook With The Python Client

Posted by GitBox <gi...@apache.org>.
potiuk merged PR #24895:
URL: https://github.com/apache/airflow/pull/24895


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] alexkruc commented on a diff in pull request #24895: Adding ElasticserachPythonHook - ES Hook With The Python Client

Posted by GitBox <gi...@apache.org>.
alexkruc commented on code in PR #24895:
URL: https://github.com/apache/airflow/pull/24895#discussion_r915643212


##########
airflow/providers/elasticsearch/hooks/elasticsearch_python.py:
##########
@@ -0,0 +1,56 @@
+#

Review Comment:
   I will add them to the same files, as I think having a file named `python.py` is a bit confusing..
   I don't know how it will play with the documentation files, but I'll try to make it work :) 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] alexkruc commented on a diff in pull request #24895: Adding ElasticserachPythonHook - ES Hook With The Python Client

Posted by GitBox <gi...@apache.org>.
alexkruc commented on code in PR #24895:
URL: https://github.com/apache/airflow/pull/24895#discussion_r915790942


##########
tests/providers/elasticsearch/hooks/test_elasticsearch.py:
##########
@@ -20,8 +20,10 @@
 import unittest
 from unittest import mock
 
+from elasticsearch import Elasticsearch
+
 from airflow.models import Connection
-from airflow.providers.elasticsearch.hooks.elasticsearch import ElasticsearchHook
+from airflow.providers.elasticsearch.hooks.elasticsearch import ElasticsearchHook, ElasticsearchPythonHook

Review Comment:
   Done- the old hook tests are now using the new name.
   I've also added a test that checks the generation of a DepracationWarning for `ElasticsearchHook` :) 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on pull request #24895: Adding ElasticserachPythonHook - ES Hook With The Python Client

Posted by GitBox <gi...@apache.org>.
potiuk commented on PR #24895:
URL: https://github.com/apache/airflow/pull/24895#issuecomment-1178738179

   1. Rebase (I just did).
   2. Let's see. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org