Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/04/20 22:35:13 UTC

[GitHub] [airflow] NiklasBeierl opened a new pull request, #23136: Add special exception for "host field is not hashable"

NiklasBeierl opened a new pull request, #23136:
URL: https://github.com/apache/airflow/pull/23136

   I am adding a special exception to the es_task_handler for cases in which the `host` field in an Elasticsearch record is not hashable (e.g. a dict). It seems to be a common problem, but it's not really a problem with Airflow itself; rather, it comes from "incompatible defaults" of Airflow and Filebeat. See my comment here: https://github.com/apache/airflow/issues/15613#issuecomment-1104487752
   
   I considered making `_group_logs_by_host` "smart" and letting it handle "common" `host` objects (like the ones that `add_host_metadata` or Filebeat itself produce).
   But that struck me as risky in case someone does something else with the `host` field. An assumption like taking `host.hostname` when it is available might have unintended side effects if a worker node changes its hostname, IP, or MAC address. Turning the entire `host` object into a string might also cause issues if it happens to contain volatile values like "uptime" or "load".
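
To make the failure mode concrete, here is a minimal sketch of what the description above refers to. The record shapes (`flat_host`, `nested_host`) are hypothetical stand-ins, not actual Filebeat output, and plain strings stand in for log records.

```python
from collections import defaultdict

# Hypothetical host values: a plain hostname versus a nested host object.
flat_host = "worker-1"
nested_host = {"name": "worker-1", "uptime": 1234}

grouped_logs = defaultdict(list)
grouped_logs[flat_host].append("line from a flat host field")  # works

try:
    # A dict cannot be hashed, so it cannot be used as a dictionary key.
    grouped_logs[nested_host].append("line from a nested host field")
except TypeError as err:
    failure = str(err)

# Stringifying the whole object avoids the TypeError, but the same host ends
# up under a different key as soon as a volatile value such as uptime changes.
key_then = str({"name": "worker-1", "uptime": 1234})
key_now = str({"name": "worker-1", "uptime": 1300})
same_group = key_then == key_now
```

The last three lines illustrate the concern about volatile values: both keys describe the same worker, yet they would be grouped separately.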
   
   closes: #15613
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] uranusjr commented on a diff in pull request #23136: Add special exception for "host field is not hashable"

Posted by GitBox <gi...@apache.org>.
uranusjr commented on code in PR #23136:
URL: https://github.com/apache/airflow/pull/23136#discussion_r855052079


##########
airflow/providers/elasticsearch/log/es_task_handler.py:
##########
@@ -22,7 +22,7 @@
 from datetime import datetime
 from operator import attrgetter
 from time import time
-from typing import List, Optional, Tuple, Union
+from typing import Hashable, List, Optional, Tuple, Union

Review Comment:
   Use `collections.abc.Hashable` instead (I believe that’s the canonical import)
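
As a small illustration of the suggested import, the sketch below uses `collections.abc.Hashable` in an `isinstance` check. The helper `is_really_hashable` is a hypothetical name introduced here, not part of the PR. One caveat worth keeping in mind: the ABC only checks that the type defines `__hash__`, so a tuple containing a list passes the check even though hashing it fails.

```python
from collections.abc import Hashable


def is_really_hashable(value):
    """Return True only if hash(value) actually succeeds."""
    try:
        hash(value)
    except TypeError:
        return False
    return True


checks = {
    "str": isinstance("worker-1", Hashable),
    "dict": isinstance({"name": "worker-1"}, Hashable),
    # tuple defines __hash__, so the ABC check passes even though
    # hashing this particular tuple raises TypeError.
    "tuple_with_list": isinstance(("worker-1", []), Hashable),
}
```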





[GitHub] [airflow] potiuk commented on a diff in pull request #23136: Add special exception for "host field is not hashable"

Posted by GitBox <gi...@apache.org>.
potiuk commented on code in PR #23136:
URL: https://github.com/apache/airflow/pull/23136#discussion_r857991677


##########
airflow/providers/elasticsearch/log/es_task_handler.py:
##########
@@ -153,7 +153,17 @@ def _group_logs_by_host(self, logs):
         grouped_logs = defaultdict(list)
         for log in logs:
             key = getattr(log, self.host_field, 'default_host')
-            grouped_logs[key].append(log)
+            
+            try:
+                grouped_logs[key].append(log) 
+            except TypeError as e:
+                if not isinstance(key, Hashable): 
+                    raise ValueError("The host field in all log records needs to be hashable. "
+                    "If you are using filebeat, read here: "
+                    "https://github.com/apache/airflow/issues/15613#issuecomment-1104487752") from e

Review Comment:
   Actually, a better solution would be to copy the explanation into our Elasticsearch documentation (at airflow.apache.org) and link to it from the error message. The error message should explain the reason and link to the detailed explanation of why, but linking to an issue is fine only in a source comment, not in a user-facing message. There we should only link to documentation we control.
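
One possible shape for this, as a rough sketch rather than the PR's final code: the standalone function `group_logs_by_host` and the `DOCS_HINT` text are hypothetical, and `SimpleNamespace` objects stand in for Elasticsearch hits. The sketch always converts the `TypeError` into a `ValueError`, so the error is never silently swallowed, and the message points at project documentation instead of a GitHub issue.

```python
from collections import defaultdict
from types import SimpleNamespace

# Placeholder wording only; no documentation URL is assumed here.
DOCS_HINT = "see the Elasticsearch logging section of the Airflow documentation"


def group_logs_by_host(logs, host_field="host"):
    """Group log records by their host field, failing loudly on unhashable hosts."""
    grouped = defaultdict(list)
    for log in logs:
        key = getattr(log, host_field, "default_host")
        try:
            grouped[key].append(log)
        except TypeError as e:
            # Explain the cause and chain the original error for debugging.
            raise ValueError(
                f"The {host_field!r} field of every log record must be hashable, "
                f"got {type(key).__name__}; {DOCS_HINT}."
            ) from e
    return grouped


# Usage with stand-in records: two share a host, one has no host attribute.
records = [SimpleNamespace(host="a"), SimpleNamespace(host="a"), SimpleNamespace()]
grouped = group_logs_by_host(records)
```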





[GitHub] [airflow] potiuk commented on a diff in pull request #23136: Add special exception for "host field is not hashable"

Posted by GitBox <gi...@apache.org>.
potiuk commented on code in PR #23136:
URL: https://github.com/apache/airflow/pull/23136#discussion_r875019456


##########
airflow/providers/elasticsearch/log/es_task_handler.py:
##########
@@ -153,7 +153,17 @@ def _group_logs_by_host(self, logs):
         grouped_logs = defaultdict(list)
         for log in logs:
             key = getattr(log, self.host_field, 'default_host')
-            grouped_logs[key].append(log)
+            
+            try:
+                grouped_logs[key].append(log) 
+            except TypeError as e:
+                if not isinstance(key, Hashable): 
+                    raise ValueError("The host field in all log records needs to be hashable. "
+                    "If you are using filebeat, read here: "
+                    "https://github.com/apache/airflow/issues/15613#issuecomment-1104487752") from e

Review Comment:
   I have no idea about the details of it, to be honest. I would have to dive deep into the code to understand it, just as you did.





[GitHub] [airflow] uranusjr commented on a diff in pull request #23136: Add special exception for "host field is not hashable"

Posted by GitBox <gi...@apache.org>.
uranusjr commented on code in PR #23136:
URL: https://github.com/apache/airflow/pull/23136#discussion_r855052449


##########
airflow/providers/elasticsearch/log/es_task_handler.py:
##########
@@ -153,7 +153,17 @@ def _group_logs_by_host(self, logs):
         grouped_logs = defaultdict(list)
         for log in logs:
             key = getattr(log, self.host_field, 'default_host')
-            grouped_logs[key].append(log)
+            
+            try:
+                grouped_logs[key].append(log) 
+            except TypeError as e:
+                if not isinstance(key, Hashable): 
+                    raise ValueError("The host field in all log records needs to be hashable. "
+                    "If you are using filebeat, read here: "
+                    "https://github.com/apache/airflow/issues/15613#issuecomment-1104487752") from e

Review Comment:
   Instead of linking to GitHub, we should have documentation for this.





[GitHub] [airflow] eladkal commented on pull request #23136: Add special exception for "host field is not hashable"

Posted by GitBox <gi...@apache.org>.
eladkal commented on PR #23136:
URL: https://github.com/apache/airflow/pull/23136#issuecomment-1140395822

   @NiklasBeierl can you fix the static checks?




[GitHub] [airflow] potiuk commented on a diff in pull request #23136: Add special exception for "host field is not hashable"

Posted by GitBox <gi...@apache.org>.
potiuk commented on code in PR #23136:
URL: https://github.com/apache/airflow/pull/23136#discussion_r875020454


##########
airflow/providers/elasticsearch/log/es_task_handler.py:
##########
@@ -153,7 +153,17 @@ def _group_logs_by_host(self, logs):
         grouped_logs = defaultdict(list)
         for log in logs:
             key = getattr(log, self.host_field, 'default_host')
-            grouped_logs[key].append(log)
+            
+            try:
+                grouped_logs[key].append(log) 
+            except TypeError as e:
+                if not isinstance(key, Hashable): 
+                    raise ValueError("The host field in all log records needs to be hashable. "
+                    "If you are using filebeat, read here: "
+                    "https://github.com/apache/airflow/issues/15613#issuecomment-1104487752") from e

Review Comment:
   Actually, I do know that (I re-read this): I believe each logging handler can be configured with parameters. You can read about it under the "logging" configuration in our docs.
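
For readers unfamiliar with the general mechanism, the sketch below shows how the standard library's `logging.config.dictConfig` passes extra keys of a handler entry to the handler's constructor. `HostAwareHandler`, the logger name, and the field values are hypothetical, and this does not show how Airflow itself wires `host_field` and `offset_field` into `ElasticsearchTaskHandler`.

```python
import logging
import logging.config


class HostAwareHandler(logging.Handler):
    """Hypothetical handler; only the constructor parameters matter here."""

    def __init__(self, host_field="host", offset_field="offset"):
        super().__init__()
        self.host_field = host_field
        self.offset_field = offset_field

    def emit(self, record):
        pass  # A real handler would ship or store the record here.


LOGGING_CONFIG = {
    "version": 1,
    "disable_existing_loggers": False,
    "handlers": {
        "task": {
            # "()" names the factory; the remaining keys become keyword arguments.
            "()": HostAwareHandler,
            "host_field": "host.name",
            "offset_field": "log.offset",
        },
    },
    "loggers": {"demo.task": {"handlers": ["task"], "level": "INFO"}},
}

logging.config.dictConfig(LOGGING_CONFIG)
handler = logging.getLogger("demo.task").handlers[0]
```

After `dictConfig` runs, `handler.host_field` holds the value from the configuration dictionary rather than the constructor default.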





[GitHub] [airflow] github-actions[bot] commented on pull request #23136: Add special exception for "host field is not hashable"

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #23136:
URL: https://github.com/apache/airflow/pull/23136#issuecomment-1196117809

   This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions.




[GitHub] [airflow] NiklasBeierl commented on a diff in pull request #23136: Add special exception for "host field is not hashable"

Posted by GitBox <gi...@apache.org>.
NiklasBeierl commented on code in PR #23136:
URL: https://github.com/apache/airflow/pull/23136#discussion_r872293252


##########
airflow/providers/elasticsearch/log/es_task_handler.py:
##########
@@ -22,7 +22,7 @@
 from datetime import datetime
 from operator import attrgetter
 from time import time
-from typing import List, Optional, Tuple, Union
+from typing import Hashable, List, Optional, Tuple, Union

Review Comment:
   Done: 27aa8e3c8c1ddd5a6b22d562d1ae1006c51b1ded





[GitHub] [airflow] boring-cyborg[bot] commented on pull request #23136: Add special exception for "host field is not hashable"

Posted by GitBox <gi...@apache.org>.
boring-cyborg[bot] commented on PR #23136:
URL: https://github.com/apache/airflow/pull/23136#issuecomment-1104522731

   Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about anything, please check our Contribution Guide (https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst).
   Here are some useful points:
   - Pay attention to the quality of your code (flake8, mypy and type annotations). Our [pre-commits]( https://github.com/apache/airflow/blob/main/STATIC_CODE_CHECKS.rst#prerequisites-for-pre-commit-hooks) will help you with that.
   - In case of a new feature, add useful documentation (in docstrings or in the `docs/` directory). Adding a new operator? Check this short [guide](https://github.com/apache/airflow/blob/main/docs/apache-airflow/howto/custom-operator.rst). Consider adding an example DAG that shows how users should use it.
   - Consider using [Breeze environment](https://github.com/apache/airflow/blob/main/BREEZE.rst) for testing locally, it’s a heavy docker but it ships with a working Airflow and a lot of integrations.
   - Be patient and persistent. It might take some time to get a review or get the final approval from Committers.
   - Please follow [ASF Code of Conduct](https://www.apache.org/foundation/policies/conduct) for all communication including (but not limited to) comments on Pull Requests, Mailing list and Slack.
   - Be sure to read the [Airflow Coding style]( https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst#coding-style-and-best-practices).
   Apache Airflow is a community-driven project and together we are making it better 🚀.
   In case of doubts contact the developers at:
   Mailing List: dev@airflow.apache.org
   Slack: https://s.apache.org/airflow-slack
   




[GitHub] [airflow] NiklasBeierl commented on a diff in pull request #23136: Add special exception for "host field is not hashable"

Posted by GitBox <gi...@apache.org>.
NiklasBeierl commented on code in PR #23136:
URL: https://github.com/apache/airflow/pull/23136#discussion_r872299155


##########
airflow/providers/elasticsearch/log/es_task_handler.py:
##########
@@ -153,7 +153,17 @@ def _group_logs_by_host(self, logs):
         grouped_logs = defaultdict(list)
         for log in logs:
             key = getattr(log, self.host_field, 'default_host')
-            grouped_logs[key].append(log)
+            
+            try:
+                grouped_logs[key].append(log) 
+            except TypeError as e:
+                if not isinstance(key, Hashable): 
+                    raise ValueError("The host field in all log records needs to be hashable. "
+                    "If you are using filebeat, read here: "
+                    "https://github.com/apache/airflow/issues/15613#issuecomment-1104487752") from e

Review Comment:
   Got your point; I just got set up with Breeze to write some proper documentation.

   I have a question: `airflow.providers.elasticsearch.log.es_task_handler.ElasticsearchTaskHandler` has `offset_field` and `host_field` parameters in its constructor. I have a hard time figuring out where these are set or where they come from. Are they configurable?





[GitHub] [airflow] github-actions[bot] closed pull request #23136: Add special exception for "host field is not hashable"

Posted by GitBox <gi...@apache.org>.
github-actions[bot] closed pull request #23136: Add special exception for "host field is not hashable"
URL: https://github.com/apache/airflow/pull/23136

