You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/07/17 00:23:33 UTC

[GitHub] [airflow] flolas opened a new pull request #17057: Strip non-ascii characters in pod name on k8s executor

flolas opened a new pull request #17057:
URL: https://github.com/apache/airflow/pull/17057


   Strip non-ascii characters in pod name on k8s executor.
   (Maybe we can use package unidecode to translate for example á -> a or ñ -> n, but I think that's not necessary, because this is only for the pod name, ignoring the characters is simpler)
   
   closes: #16992 
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] uranusjr commented on a change in pull request #17057: Strip non-ascii characters in pod name on k8s executor

Posted by GitBox <gi...@apache.org>.
uranusjr commented on a change in pull request #17057:
URL: https://github.com/apache/airflow/pull/17057#discussion_r671856154



##########
File path: airflow/kubernetes/kubernetes_helper_functions.py
##########
@@ -37,7 +37,7 @@ def _strip_unsafe_kubernetes_special_chars(string: str) -> str:
     :param string: The requested Pod name
     :return: Pod name stripped of any unsafe characters
     """
-    return ''.join(ch.lower() for ch in list(string) if ch.isalnum())
+    return ''.join(ch.lower() for ch in list(string) if ch.isalnum()).encode('ascii', 'ignore').decode()

Review comment:
       FWIW if we're going for a complex solution, Mozilla's [unicode-slugify](https://github.com/mozilla/unicode-slugify) normalises even more things, including characters from Chinese (my native language) which consists entirely non-ASCII characters 😁




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk removed a comment on pull request #17057: Strip non-ascii characters in pod name on k8s executor

Posted by GitBox <gi...@apache.org>.
potiuk removed a comment on pull request #17057:
URL: https://github.com/apache/airflow/pull/17057#issuecomment-882109416


   ![image](https://user-images.githubusercontent.com/595491/126080580-347eef62-45b4-4f07-8855-845f9d84468f.png)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #17057: Strip non-ascii characters in pod name on k8s executor

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #17057:
URL: https://github.com/apache/airflow/pull/17057#issuecomment-882113614


   some "real" tests failing in "upgrade" case - do not worry about MSSQl


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on a change in pull request #17057: Strip non-ascii characters in pod name on k8s executor

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #17057:
URL: https://github.com/apache/airflow/pull/17057#discussion_r671876426



##########
File path: airflow/kubernetes/kubernetes_helper_functions.py
##########
@@ -37,7 +37,7 @@ def _strip_unsafe_kubernetes_special_chars(string: str) -> str:
     :param string: The requested Pod name
     :return: Pod name stripped of any unsafe characters
     """
-    return ''.join(ch.lower() for ch in list(string) if ch.isalnum())
+    return ''.join(ch.lower() for ch in list(string) if ch.isalnum()).encode('ascii', 'ignore').decode()

Review comment:
       Ah yeah. VERY GOOD CALL @uranusjr . I did not notice it's GPL. In this case we should definitely NOT bring it.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on a change in pull request #17057: Strip non-ascii characters in pod name on k8s executor

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #17057:
URL: https://github.com/apache/airflow/pull/17057#discussion_r671848040



##########
File path: airflow/kubernetes/kubernetes_helper_functions.py
##########
@@ -37,7 +37,7 @@ def _strip_unsafe_kubernetes_special_chars(string: str) -> str:
     :param string: The requested Pod name
     :return: Pod name stripped of any unsafe characters
     """
-    return ''.join(ch.lower() for ch in list(string) if ch.isalnum())
+    return ''.join(ch.lower() for ch in list(string) if ch.isalnum()).encode('ascii', 'ignore').decode()

Review comment:
       BTW. In case you had any doubts, Polish is one of the hardest languages in the world.
   
   Here you can see about 1/3 of all the possible 'variants' of words that you can make from "play" = "gra". 
   
   ![Screenshot 2021-07-18 16 03 14](https://user-images.githubusercontent.com/595491/126070241-1f27188f-9eb5-4854-87f4-316d5c6e73d0.png)
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on a change in pull request #17057: Strip non-ascii characters in pod name on k8s executor

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #17057:
URL: https://github.com/apache/airflow/pull/17057#discussion_r671871903



##########
File path: airflow/kubernetes/kubernetes_helper_functions.py
##########
@@ -37,7 +37,7 @@ def _strip_unsafe_kubernetes_special_chars(string: str) -> str:
     :param string: The requested Pod name
     :return: Pod name stripped of any unsafe characters
     """
-    return ''.join(ch.lower() for ch in list(string) if ch.isalnum())
+    return ''.join(ch.lower() for ch in list(string) if ch.isalnum()).encode('ascii', 'ignore').decode()

Review comment:
       I see no problem whatsoever with adding a proven dependency that does a good job and has rather straightforward and expected dependencies - especially if it can save not only us but also our current and future users from even seeing similar errors. 
   
   From the `product` point of view, if we can get such proven solution that saves us headeaches in the future is the right way. The case where POD has Chinese-only characters mentioned by @uranusjr WILL eventually happen, and when it will, we will have to do it anyway, so why not doing it now :).
   
   ```
   unicode-slugify==0.1.3
     - six [required: Any, installed: 1.16.0]
     - unidecode [required: Any, installed: 1.2.0]
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] flolas commented on a change in pull request #17057: Strip non-ascii characters in pod name on k8s executor

Posted by GitBox <gi...@apache.org>.
flolas commented on a change in pull request #17057:
URL: https://github.com/apache/airflow/pull/17057#discussion_r671864779



##########
File path: airflow/kubernetes/kubernetes_helper_functions.py
##########
@@ -37,7 +37,7 @@ def _strip_unsafe_kubernetes_special_chars(string: str) -> str:
     :param string: The requested Pod name
     :return: Pod name stripped of any unsafe characters
     """
-    return ''.join(ch.lower() for ch in list(string) if ch.isalnum())
+    return ''.join(ch.lower() for ch in list(string) if ch.isalnum()).encode('ascii', 'ignore').decode()

Review comment:
       @potiuk The issue its only for the pod name, do we really need to translate with a more complex solution? If so, i can make the changes.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk merged pull request #17057: Strip non-ascii characters in pod name on k8s executor

Posted by GitBox <gi...@apache.org>.
potiuk merged pull request #17057:
URL: https://github.com/apache/airflow/pull/17057


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] flolas commented on a change in pull request #17057: Strip non-ascii characters in pod name on k8s executor

Posted by GitBox <gi...@apache.org>.
flolas commented on a change in pull request #17057:
URL: https://github.com/apache/airflow/pull/17057#discussion_r671869441



##########
File path: airflow/kubernetes/kubernetes_helper_functions.py
##########
@@ -37,7 +37,7 @@ def _strip_unsafe_kubernetes_special_chars(string: str) -> str:
     :param string: The requested Pod name
     :return: Pod name stripped of any unsafe characters
     """
-    return ''.join(ch.lower() for ch in list(string) if ch.isalnum())
+    return ''.join(ch.lower() for ch in list(string) if ch.isalnum()).encode('ascii', 'ignore').decode()

Review comment:
       @potiuk Yeah, but we are adding a package dependency to airflow only for handling the complexity of translating chars in pod name which we already strip characters incompatible ascii chars like - or _.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] uranusjr commented on a change in pull request #17057: Strip non-ascii characters in pod name on k8s executor

Posted by GitBox <gi...@apache.org>.
uranusjr commented on a change in pull request #17057:
URL: https://github.com/apache/airflow/pull/17057#discussion_r671872559



##########
File path: airflow/kubernetes/kubernetes_helper_functions.py
##########
@@ -37,7 +37,7 @@ def _strip_unsafe_kubernetes_special_chars(string: str) -> str:
     :param string: The requested Pod name
     :return: Pod name stripped of any unsafe characters
     """
-    return ''.join(ch.lower() for ch in list(string) if ch.isalnum())
+    return ''.join(ch.lower() for ch in list(string) if ch.isalnum()).encode('ascii', 'ignore').decode()

Review comment:
       I just realised unicode-slugify has a hard dependency on unidecode, which brings back the GPL issue. Since unicode-slugify is a pretty thin layer over unidecode anyway (plus unicodedata, which is built-in), it’s probably to hand-roll an implementation in Airflow instead. I’ll take some time to look into this tomorrow.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk merged pull request #17057: Strip non-ascii characters in pod name on k8s executor

Posted by GitBox <gi...@apache.org>.
potiuk merged pull request #17057:
URL: https://github.com/apache/airflow/pull/17057


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on a change in pull request #17057: Strip non-ascii characters in pod name on k8s executor

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #17057:
URL: https://github.com/apache/airflow/pull/17057#discussion_r671866001



##########
File path: airflow/kubernetes/kubernetes_helper_functions.py
##########
@@ -37,7 +37,7 @@ def _strip_unsafe_kubernetes_special_chars(string: str) -> str:
     :param string: The requested Pod name
     :return: Pod name stripped of any unsafe characters
     """
-    return ''.join(ch.lower() for ch in list(string) if ch.isalnum())
+    return ''.join(ch.lower() for ch in list(string) if ch.isalnum()).encode('ascii', 'ignore').decode()

Review comment:
       Well, I'd argue that it can't get less complex than calling a "slugify" method - I think trying to fix it manually when you have good and proven library to do it (stamped by mozilla) and even (as @uranusjr pointed out) handles Chinese characters is in many ways simpler solution. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] uranusjr commented on a change in pull request #17057: Strip non-ascii characters in pod name on k8s executor

Posted by GitBox <gi...@apache.org>.
uranusjr commented on a change in pull request #17057:
URL: https://github.com/apache/airflow/pull/17057#discussion_r671872559



##########
File path: airflow/kubernetes/kubernetes_helper_functions.py
##########
@@ -37,7 +37,7 @@ def _strip_unsafe_kubernetes_special_chars(string: str) -> str:
     :param string: The requested Pod name
     :return: Pod name stripped of any unsafe characters
     """
-    return ''.join(ch.lower() for ch in list(string) if ch.isalnum())
+    return ''.join(ch.lower() for ch in list(string) if ch.isalnum()).encode('ascii', 'ignore').decode()

Review comment:
       I just realised unicode-slugify has a hard dependency on unidecode, which brings back the GPL issue. Since unicode-slugify is a pretty thin layer over unidecode anyway (plus unicodedata, which is built-in), it’s probably to hand-roll an implementation based on text-unidecode in Airflow instead. I’ll take some time to look into this tomorrow.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] flolas edited a comment on pull request #17057: Strip non-ascii characters in pod name on k8s executor

Posted by GitBox <gi...@apache.org>.
flolas edited a comment on pull request #17057:
URL: https://github.com/apache/airflow/pull/17057#issuecomment-882107672


   @potiuk @uranusjr Could you take a look to the new commit? Uses the python-slugify package which doesnt have GPL issues. Feel free to change the current PR or create another PR for resolving the issue.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on a change in pull request #17057: Strip non-ascii characters in pod name on k8s executor

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #17057:
URL: https://github.com/apache/airflow/pull/17057#discussion_r671857206



##########
File path: airflow/kubernetes/kubernetes_helper_functions.py
##########
@@ -37,7 +37,7 @@ def _strip_unsafe_kubernetes_special_chars(string: str) -> str:
     :param string: The requested Pod name
     :return: Pod name stripped of any unsafe characters
     """
-    return ''.join(ch.lower() for ch in list(string) if ch.isalnum())
+    return ''.join(ch.lower() for ch in list(string) if ch.isalnum()).encode('ascii', 'ignore').decode()

Review comment:
       That would be even better! BTW. I once investigated the problem (turned out to be MySQL encoding problem) where someone used 🦅 in the dag_id 😱 . So called "Eagle problem".  I wonder how unicode-slugify would deal with that one :)




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on a change in pull request #17057: Strip non-ascii characters in pod name on k8s executor

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #17057:
URL: https://github.com/apache/airflow/pull/17057#discussion_r671857436



##########
File path: airflow/kubernetes/kubernetes_helper_functions.py
##########
@@ -37,7 +37,7 @@ def _strip_unsafe_kubernetes_special_chars(string: str) -> str:
     :param string: The requested Pod name
     :return: Pod name stripped of any unsafe characters
     """
-    return ''.join(ch.lower() for ch in list(string) if ch.isalnum())
+    return ''.join(ch.lower() for ch in list(string) if ch.isalnum()).encode('ascii', 'ignore').decode()

Review comment:
       Good enough:
   
   ![Screenshot 2021-07-18 17 12 13](https://user-images.githubusercontent.com/595491/126072419-1da3c9b4-1f34-437d-ad44-f0288ec60c63.png)
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] flolas commented on a change in pull request #17057: Strip non-ascii characters in pod name on k8s executor

Posted by GitBox <gi...@apache.org>.
flolas commented on a change in pull request #17057:
URL: https://github.com/apache/airflow/pull/17057#discussion_r671870221



##########
File path: airflow/kubernetes/kubernetes_helper_functions.py
##########
@@ -37,7 +37,7 @@ def _strip_unsafe_kubernetes_special_chars(string: str) -> str:
     :param string: The requested Pod name
     :return: Pod name stripped of any unsafe characters
     """
-    return ''.join(ch.lower() for ch in list(string) if ch.isalnum())
+    return ''.join(ch.lower() for ch in list(string) if ch.isalnum()).encode('ascii', 'ignore').decode()

Review comment:
       @potiuk anyways, its my opinion, if everybody agree adding the dependency unicode-uglify to airflow, i can make the changes in the PR...




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on a change in pull request #17057: Strip non-ascii characters in pod name on k8s executor

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #17057:
URL: https://github.com/apache/airflow/pull/17057#discussion_r671871903



##########
File path: airflow/kubernetes/kubernetes_helper_functions.py
##########
@@ -37,7 +37,7 @@ def _strip_unsafe_kubernetes_special_chars(string: str) -> str:
     :param string: The requested Pod name
     :return: Pod name stripped of any unsafe characters
     """
-    return ''.join(ch.lower() for ch in list(string) if ch.isalnum())
+    return ''.join(ch.lower() for ch in list(string) if ch.isalnum()).encode('ascii', 'ignore').decode()

Review comment:
       I see no problem whatsoever with adding a proven dependency that does a good job and has rather straightforward and expected dependencies - especially if it can save not only us (but also our current and future users) from even seeing similar errors. 
   
   From the `product` point of view, if we can get such proven solution that saves us headeaches in the future is the right way. The case where POD has Chinese-only characters mentioned by @uranusjr WILL eventually happen, and when it will, we will have to do it anyway, so why not doing it now :).
   
   ```
   unicode-slugify==0.1.3
     - six [required: Any, installed: 1.16.0]
     - unidecode [required: Any, installed: 1.2.0]
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on a change in pull request #17057: Strip non-ascii characters in pod name on k8s executor

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #17057:
URL: https://github.com/apache/airflow/pull/17057#discussion_r671847272



##########
File path: airflow/kubernetes/kubernetes_helper_functions.py
##########
@@ -37,7 +37,7 @@ def _strip_unsafe_kubernetes_special_chars(string: str) -> str:
     :param string: The requested Pod name
     :return: Pod name stripped of any unsafe characters
     """
-    return ''.join(ch.lower() for ch in list(string) if ch.isalnum())
+    return ''.join(ch.lower() for ch in list(string) if ch.isalnum()).encode('ascii', 'ignore').decode()

Review comment:
       Ah I see the comment, above. But I think it would be better to use unidecode. Maybe a little extreme but there are a few words in Polish for example, that contain only, or mostly accented characters. Not that they are often used, but this might lead to quite some ambiguity. And it might be worse in other languages.
   
   Example words in Polish with only accented characters:
   
   żółć, łóż, łżą, żąć, żął
   
   Few short ones:
   
   łódź, łażącą, łożącą, łóżmyż, łóżże, łżącą, żąłeś, żęłaś, żółcą, żółcę
   
   Few longer words with mostly 
   
   niedołężność, współdźwięcznością, żółtoróżowością, pięćdziesięciopięcioipółlatkąście
   
   :D
   
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on a change in pull request #17057: Strip non-ascii characters in pod name on k8s executor

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #17057:
URL: https://github.com/apache/airflow/pull/17057#discussion_r671847272



##########
File path: airflow/kubernetes/kubernetes_helper_functions.py
##########
@@ -37,7 +37,7 @@ def _strip_unsafe_kubernetes_special_chars(string: str) -> str:
     :param string: The requested Pod name
     :return: Pod name stripped of any unsafe characters
     """
-    return ''.join(ch.lower() for ch in list(string) if ch.isalnum())
+    return ''.join(ch.lower() for ch in list(string) if ch.isalnum()).encode('ascii', 'ignore').decode()

Review comment:
       Ah I see the comment, above. But I think it would be better to use unidecode. Maybe a little extreme but there are a few words in Polish for example, that contain only, or mostly accented characters. Not that they are often used, but this might lead to quite some ambiguity. And it might be worse in other languages.
   
   Example words in Polish with only accented characters:
   
   żółć, łóż, łżą, żąć, żął
   
   Few short ones:
   
   łódź, łażącą, łożącą, łóżmyż, łóżże, łżącą, żąłeś, żęłaś, żółcą, żółcę
   
   Few longer words with mostly accented ones:
   
   niedołężność, współdźwięcznością, żółtoróżowością, pięćdziesięciopięcioipółlatkąście
   
   :D
   
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] flolas commented on pull request #17057: Strip non-ascii characters in pod name on k8s executor

Posted by GitBox <gi...@apache.org>.
flolas commented on pull request #17057:
URL: https://github.com/apache/airflow/pull/17057#issuecomment-882107672


   @potiuk @uranusjr Can you take a look to the new commit? Uses the python-slugify package which doesnt have GPL issues. Feel free to change the current PR or create another PR for resolving the issue.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on a change in pull request #17057: Strip non-ascii characters in pod name on k8s executor

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #17057:
URL: https://github.com/apache/airflow/pull/17057#discussion_r671845987



##########
File path: airflow/kubernetes/kubernetes_helper_functions.py
##########
@@ -37,7 +37,7 @@ def _strip_unsafe_kubernetes_special_chars(string: str) -> str:
     :param string: The requested Pod name
     :return: Pod name stripped of any unsafe characters
     """
-    return ''.join(ch.lower() for ch in list(string) if ch.isalnum())
+    return ''.join(ch.lower() for ch in list(string) if ch.isalnum()).encode('ascii', 'ignore').decode()

Review comment:
       How about using unidecode insteead of stripping? https://stackoverflow.com/questions/517923/what-is-the-best-way-to-remove-accents-normalize-in-a-python-unicode-string




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk merged pull request #17057: Strip non-ascii characters in pod name on k8s executor

Posted by GitBox <gi...@apache.org>.
potiuk merged pull request #17057:
URL: https://github.com/apache/airflow/pull/17057


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] uranusjr commented on a change in pull request #17057: Strip non-ascii characters in pod name on k8s executor

Posted by GitBox <gi...@apache.org>.
uranusjr commented on a change in pull request #17057:
URL: https://github.com/apache/airflow/pull/17057#discussion_r671872559



##########
File path: airflow/kubernetes/kubernetes_helper_functions.py
##########
@@ -37,7 +37,7 @@ def _strip_unsafe_kubernetes_special_chars(string: str) -> str:
     :param string: The requested Pod name
     :return: Pod name stripped of any unsafe characters
     """
-    return ''.join(ch.lower() for ch in list(string) if ch.isalnum())
+    return ''.join(ch.lower() for ch in list(string) if ch.isalnum()).encode('ascii', 'ignore').decode()

Review comment:
       I just realised unicode-slugify has a hard dependency on unidecode, which brings back the GPL issue. Since unicode-slugify is a pretty thin layer over unidecode anyway, it’s probably to hand-roll an implementation in Airflow instead. I’ll take some time to look into this tomorrow.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] uranusjr commented on a change in pull request #17057: Strip non-ascii characters in pod name on k8s executor

Posted by GitBox <gi...@apache.org>.
uranusjr commented on a change in pull request #17057:
URL: https://github.com/apache/airflow/pull/17057#discussion_r671586155



##########
File path: airflow/kubernetes/kubernetes_helper_functions.py
##########
@@ -37,7 +37,7 @@ def _strip_unsafe_kubernetes_special_chars(string: str) -> str:
     :param string: The requested Pod name
     :return: Pod name stripped of any unsafe characters
     """
-    return ''.join(ch.lower() for ch in list(string) if ch.isalnum())
+    return ''.join(ch.lower() for ch in list(string) if ch.isalnum()).encode('ascii', 'ignore').decode()

Review comment:
       ```suggestion
       return string.encode('ascii', 'ignore').decode().lower()
   ```
   
   I think this will be a bit more performant (less split and join happening)




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on a change in pull request #17057: Strip non-ascii characters in pod name on k8s executor

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #17057:
URL: https://github.com/apache/airflow/pull/17057#discussion_r671866001



##########
File path: airflow/kubernetes/kubernetes_helper_functions.py
##########
@@ -37,7 +37,7 @@ def _strip_unsafe_kubernetes_special_chars(string: str) -> str:
     :param string: The requested Pod name
     :return: Pod name stripped of any unsafe characters
     """
-    return ''.join(ch.lower() for ch in list(string) if ch.isalnum())
+    return ''.join(ch.lower() for ch in list(string) if ch.isalnum()).encode('ascii', 'ignore').decode()

Review comment:
       Well, I'd argue that it can't get less complex than calling a "slugify" method - I think trying to fix it manually when you have good and proven library to do it (stamped by mozilla) and even (as @uranusjr pointed out) handles Chinese characters is in many ways more complex solution. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on a change in pull request #17057: Strip non-ascii characters in pod name on k8s executor

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #17057:
URL: https://github.com/apache/airflow/pull/17057#discussion_r671866001



##########
File path: airflow/kubernetes/kubernetes_helper_functions.py
##########
@@ -37,7 +37,7 @@ def _strip_unsafe_kubernetes_special_chars(string: str) -> str:
     :param string: The requested Pod name
     :return: Pod name stripped of any unsafe characters
     """
-    return ''.join(ch.lower() for ch in list(string) if ch.isalnum())
+    return ''.join(ch.lower() for ch in list(string) if ch.isalnum()).encode('ascii', 'ignore').decode()

Review comment:
       Well, I'd argue that it can't get less complex than calling a "slugify" method - I think trying to fix it manually when you have good and proven library to do it (stamped by mozilla) that even (as @uranusjr pointed out) handles Chinese characters is in many ways more complex solution. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #17057: Strip non-ascii characters in pod name on k8s executor

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #17057:
URL: https://github.com/apache/airflow/pull/17057#issuecomment-882109416


   ![image](https://user-images.githubusercontent.com/595491/126080580-347eef62-45b4-4f07-8855-845f9d84468f.png)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] uranusjr commented on a change in pull request #17057: Strip non-ascii characters in pod name on k8s executor

Posted by GitBox <gi...@apache.org>.
uranusjr commented on a change in pull request #17057:
URL: https://github.com/apache/airflow/pull/17057#discussion_r671856154



##########
File path: airflow/kubernetes/kubernetes_helper_functions.py
##########
@@ -37,7 +37,7 @@ def _strip_unsafe_kubernetes_special_chars(string: str) -> str:
     :param string: The requested Pod name
     :return: Pod name stripped of any unsafe characters
     """
-    return ''.join(ch.lower() for ch in list(string) if ch.isalnum())
+    return ''.join(ch.lower() for ch in list(string) if ch.isalnum()).encode('ascii', 'ignore').decode()

Review comment:
       FWIW if we're going for a complex solution, Mozilla's [unicode-slugify](https://github.com/mozilla/unicode-slugify) normalises even more things, including characters from Chinese (my native language) which consists entirely of non-ASCII characters 😁




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #17057: Strip non-ascii characters in pod name on k8s executor

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #17057:
URL: https://github.com/apache/airflow/pull/17057#issuecomment-882109094






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on a change in pull request #17057: Strip non-ascii characters in pod name on k8s executor

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #17057:
URL: https://github.com/apache/airflow/pull/17057#discussion_r671848040



##########
File path: airflow/kubernetes/kubernetes_helper_functions.py
##########
@@ -37,7 +37,7 @@ def _strip_unsafe_kubernetes_special_chars(string: str) -> str:
     :param string: The requested Pod name
     :return: Pod name stripped of any unsafe characters
     """
-    return ''.join(ch.lower() for ch in list(string) if ch.isalnum())
+    return ''.join(ch.lower() for ch in list(string) if ch.isalnum()).encode('ascii', 'ignore').decode()

Review comment:
       BTW. In case you had any doubts, Polish is one of the hardest languages in the world.
   
   Here you can see about 1/3 of all the possible 'variants' of words that you can make from "plays" = "gra". 
   
   ![Screenshot 2021-07-18 16 03 14](https://user-images.githubusercontent.com/595491/126070241-1f27188f-9eb5-4854-87f4-316d5c6e73d0.png)
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] uranusjr commented on a change in pull request #17057: Strip non-ascii characters in pod name on k8s executor

Posted by GitBox <gi...@apache.org>.
uranusjr commented on a change in pull request #17057:
URL: https://github.com/apache/airflow/pull/17057#discussion_r671856154



##########
File path: airflow/kubernetes/kubernetes_helper_functions.py
##########
@@ -37,7 +37,7 @@ def _strip_unsafe_kubernetes_special_chars(string: str) -> str:
     :param string: The requested Pod name
     :return: Pod name stripped of any unsafe characters
     """
-    return ''.join(ch.lower() for ch in list(string) if ch.isalnum())
+    return ''.join(ch.lower() for ch in list(string) if ch.isalnum()).encode('ascii', 'ignore').decode()

Review comment:
       FWIW if we're going for a complex solution, Mozilla's [unicode-slugify](https://github.com/mozilla/unicode-slugify) normalises even more things, including characters from Chinese (my native language) 😁




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on a change in pull request #17057: Strip non-ascii characters in pod name on k8s executor

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #17057:
URL: https://github.com/apache/airflow/pull/17057#discussion_r671889253



##########
File path: tests/executors/test_kubernetes_executor.py
##########
@@ -64,6 +64,9 @@ def _cases(self):
             ("MYDAGID", "MYTASKID"),
             ("my_dag_id", "my_task_id"),
             ("mydagid" * 200, "my_task_id" * 200),
+            ("my_dág_id", "my_tásk_id"),
+            ("Компьютер", "niedołężność"),

Review comment:
       :D




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org