Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/09/02 15:45:11 UTC

[GitHub] [airflow] Narendra-Neerukonda opened a new issue #17993: Allow the default encoding to be set in airflow config to read/write task logs

Narendra-Neerukonda opened a new issue #17993:
URL: https://github.com/apache/airflow/issues/17993


   ### Description
   
   Airflow uses the system default encoding when reading task logs. On some systems this causes a problem: reading a log file that contains characters outside the range of the system default encoding (e.g. outside the ASCII range) raises an error like "codec can't decode byte **** in position ****".
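
The failure described above can be reproduced outside Airflow. This is a minimal sketch, assuming a log file written as UTF-8 and a reader that uses ASCII (standing in for a restrictive system default); the log text and temporary file are made up for illustration.

```python
import os
import tempfile

# Write a log line containing a non-ASCII character as UTF-8 bytes.
with tempfile.NamedTemporaryFile(
    "w", encoding="utf-8", suffix=".log", delete=False
) as f:
    f.write("task finished: caf\u00e9\n")
    log_path = f.name

# Reading it back with a narrower codec (ASCII here, standing in for a
# restrictive system default) raises UnicodeDecodeError.
try:
    with open(log_path, encoding="ascii") as f:
        f.read()
    decode_failed = False
    error_message = ""
except UnicodeDecodeError as exc:
    decode_failed = True
    error_message = str(exc)

# Reading with the encoding that was used for writing succeeds.
with open(log_path, encoding="utf-8") as f:
    content = f.read()

os.remove(log_path)
```

The point of the sketch is only that the error depends on the codec chosen by the reader, not on the file itself.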
   
   ### Use case/motivation
   
   We should allow users to specify which default encoding they would like to use for reading and writing logs. If this option is not specified, the default behavior should be to read/write in utf-8.
   
   ### Related issues
   
   #16834 
   
   ### Are you willing to submit a PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk edited a comment on issue #17993: Allow the default encoding to be set in airflow config to read/write task logs

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on issue #17993:
URL: https://github.com/apache/airflow/issues/17993#issuecomment-911978583


   I never said anything about reading the default encoding from `airflow.cfg` and setting the default encoding:
   
   > It would be great to standardise encoding to "utf-8" (default) when writing and reading logs. Seems like this should be a good first issue for someone. @internetcoffeephone - maybe you would like to contribute a fix for that?
   
   In that comment I was thinking more about making sure it is ALWAYS "utf-8" (i.e. EXPLICITLY find all places and set "utf-8" there). But I think setting the default encoding in Python code based on `airflow.cfg` is not possible. This is not a good way to achieve that IMHO (because of the reasons I explained above).





[GitHub] [airflow] potiuk commented on issue #17993: Allow the default encoding to be set in airflow config to read/write task logs

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #17993:
URL: https://github.com/apache/airflow/issues/17993#issuecomment-911960059


   @Narendra-Neerukonda My point is that I believe you "CAN'T" reliably set the encoding in Python code in all scenarios. Once the Python interpreter starts, you are basically done (this is inherent, because different Python libraries might be initialized in a different sequence, and the sequence might differ depending on how you enter the interpreter). This at least has been my experience so far.
   
   The only way I have been able to reliably override the default encoding in complex Python code using multiprocessing was to set the right LANG variables in the OS.
   
   And we parse the .cfg file in Python after it starts. By the time we parse the file, we have already initialized the parts related to encoding and used the encoding at least once (to parse the .cfg file). The default encoding might then be cached by various libraries that might or might not have been pre-loaded.
   
   IMHO the ONLY way to make it happen is, at most, to check in Airflow whether the encoding is as expected and, if it is not, fail while directing the user on how to set the LANG variables properly.
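
The "check and fail" idea above could look roughly like the following. This is only a sketch: the function names `is_utf8` and `check_default_encoding` are hypothetical, the locale values in the error message are examples that may not exist on every system, and nothing here is taken from Airflow's code.

```python
import codecs
import locale


def is_utf8(encoding_name: str) -> bool:
    """Return True if the given codec name is an alias of UTF-8."""
    try:
        return codecs.lookup(encoding_name).name == "utf-8"
    except LookupError:
        # Unknown codec names are treated as "not UTF-8".
        return False


def check_default_encoding() -> None:
    """Fail fast, with a hint, if the process default is not UTF-8."""
    current = locale.getpreferredencoding(False)
    if not is_utf8(current):
        raise RuntimeError(
            f"Default encoding is {current!r}, expected UTF-8. "
            "Set the locale before starting the process, "
            "for example LANG=C.UTF-8 or LC_ALL=en_US.UTF-8."
        )
```

Calling `check_default_encoding()` at process start would raise on a system whose locale reports, for instance, `ANSI_X3.4-1968` (an alias of ASCII), and return silently where the default is already UTF-8.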





[GitHub] [airflow] potiuk edited a comment on issue #17993: Allow the default encoding to be set in airflow config to read/write task logs

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on issue #17993:
URL: https://github.com/apache/airflow/issues/17993#issuecomment-911978583


   I never said anything about reading the default encoding from `airflow.cfg` and setting the default encoding based on that:
   
   > It would be great to standardise encoding to "utf-8" (default) when writing and reading logs. Seems like this should be a good first issue for someone. @internetcoffeephone - maybe you would like to contribute a fix for that?
   
   In that comment I was thinking more about making sure it is ALWAYS "utf-8" (i.e. EXPLICITLY find all places and set "utf-8" there). But I think setting the default encoding in Python code based on `airflow.cfg` is not possible. This is not a good way to achieve that IMHO (because of the reasons I explained above).





[GitHub] [airflow] potiuk edited a comment on issue #17993: Allow the default encoding to be set in airflow config to read/write task logs

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on issue #17993:
URL: https://github.com/apache/airflow/issues/17993#issuecomment-911869434


   Question - how do you want to do it?
   
   From what I know, the best and recommended way to set the encoding for Python is to configure the right language settings in the environment (basically LANG settings). Any attempt to do it in Python code is a futile effort and is a band-aid rather than a solution (especially when you use multiprocessing, forking and multiple entry points, which cause the Python interpreter to be started in a new process, sometimes in unexpected ways, initialising and importing libraries in a different sequence in different processes).
   
   Last time I checked, the only recommended way to set the encoding for Python was to properly set the LANG variables in the underlying operating system.
   
   Has anything changed since? What is your idea for approaching it, and how does it play with all the various multi-process pieces we have (specifically: Celery, Kubernetes, Webserver, Scheduler and LocalExecutor, each of which uses a different way of running Python processes / multi-processing/billiard)?
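
To illustrate why the environment is the reliable lever here: a child interpreter picks up its default encoding from the environment it is started with, before any application code runs. The sketch below is an assumption-laden illustration, not something proposed in the thread. It uses the `PYTHONUTF8=1` variable (Python's UTF-8 mode, available from Python 3.7) instead of `LANG`, because the set of installed locale names differs between systems; the helper name `child_preferred_encoding` is hypothetical.

```python
import codecs
import os
import subprocess
import sys

# Code run by the child interpreter: report its preferred text encoding.
CHILD_CODE = "import locale; print(locale.getpreferredencoding(False))"


def child_preferred_encoding(extra_env: dict) -> str:
    """Start a fresh interpreter with extra environment variables set."""
    env = {**os.environ, **extra_env}
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


# PYTHONUTF8=1 enables UTF-8 mode in the child; it stands in for LANG
# here because available locale names are system-dependent.
reported = child_preferred_encoding({"PYTHONUTF8": "1"})
is_utf8 = codecs.lookup(reported).name == "utf-8"
```

Every process started with that environment, however it is launched, sees the same default, which is the property that in-process configuration cannot guarantee.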





[GitHub] [airflow] Narendra-Neerukonda commented on issue #17993: Allow the default encoding to be set in airflow config to read/write task logs

Posted by GitBox <gi...@apache.org>.
Narendra-Neerukonda commented on issue #17993:
URL: https://github.com/apache/airflow/issues/17993#issuecomment-911975711


   Hi @potiuk 
   This feature request is in reference to your comment: https://github.com/apache/airflow/issues/16834#issuecomment-877607117
   
   Can you please confirm that we are on the same page? This feature request is just to make the utf-8 encoding part of your comment configurable from airflow.cfg.





[GitHub] [airflow] potiuk edited a comment on issue #17993: Allow the default encoding to be set in airflow config to read/write task logs

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on issue #17993:
URL: https://github.com/apache/airflow/issues/17993#issuecomment-911978583


   I never said anything about reading the default encoding from `airflow.cfg` and setting the default encoding based on that:
   
   > It would be great to standardise encoding to "utf-8" (default) when writing and reading logs. Seems like this should be a good first issue for someone. @internetcoffeephone - maybe you would like to contribute a fix for that?
   
   In that comment I was thinking more about making sure it is ALWAYS "utf-8" (i.e. EXPLICITLY find all places and set "utf-8" there). But I think setting the default encoding in Python code based on `airflow.cfg` is not possible. This is not a good way to achieve that IMHO (because of the reasons I explained above).





[GitHub] [airflow] potiuk commented on issue #17993: Allow the default encoding to be set in airflow config to read/write task logs

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #17993:
URL: https://github.com/apache/airflow/issues/17993#issuecomment-911869434


   Question - how do you want to do it?
   
   From what I know, the best and recommended way to set the encoding for Python is to configure the right language settings in the environment (basically LANG settings). Any attempt to do it in Python code is a futile effort and is a band-aid rather than a solution (especially when you use multiprocessing, forking and multiple entry points, which cause the Python interpreter to be started in a new process, sometimes in unexpected ways, initialising and importing libraries in a different sequence in different processes).
   
   Last time I checked, the only recommended way to set the encoding for Python was to properly set the LANG variables in the underlying operating system.
   
   Has anything changed since? What is your idea for approaching it, and how does it play with all the various multi-process pieces we have (specifically: Celery, Kubernetes, Webserver, Scheduler and LocalExecutor, each of which uses a different way of running the underlying Python processes)?





[GitHub] [airflow] potiuk edited a comment on issue #17993: Allow the default encoding to be set in airflow config to read/write task logs

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on issue #17993:
URL: https://github.com/apache/airflow/issues/17993#issuecomment-911978583


   I never said anything about reading the default encoding from `airflow.cfg` and setting the default encoding based on that:
   
   > It would be great to standardise encoding to "utf-8" (default) when writing and reading logs. Seems like this should be a good first issue for someone. @internetcoffeephone - maybe you would like to contribute a fix for that?
   
   In that comment I was thinking more about making sure it is ALWAYS "utf-8" (i.e. EXPLICITLY find all places and set "utf-8" there). But I think setting the default encoding in Python code based on `airflow.cfg` is not possible. This is not a good way to achieve that IMHO (because of the reasons I explained above).





[GitHub] [airflow] Narendra-Neerukonda commented on issue #17993: Allow the default encoding to be set in airflow config to read/write task logs

Posted by GitBox <gi...@apache.org>.
Narendra-Neerukonda commented on issue #17993:
URL: https://github.com/apache/airflow/issues/17993#issuecomment-911826531


   Extension to #17965 
   Related to #16834 





[GitHub] [airflow] potiuk commented on issue #17993: Allow the default encoding to be set in airflow config to read/write task logs

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #17993:
URL: https://github.com/apache/airflow/issues/17993#issuecomment-911978583


   I never said anything about reading the default encoding from `airflow.cfg` and setting the default encoding:
   
   > It would be great to standardise encoding to "utf-8" (default) when writing and reading logs. Seems like this should be a good first issue for someone. @internetcoffeephone - maybe you would like to contribute a fix for that?
   
   In that comment I was thinking more about making sure it is ALWAYS "utf-8". But I think setting the default encoding in Python code based on `airflow.cfg` is not possible. This is not a good way to achieve that IMHO (because of the reasons I explained above).





[GitHub] [airflow] Narendra-Neerukonda edited a comment on issue #17993: Allow the default encoding to be set in airflow config to read/write task logs

Posted by GitBox <gi...@apache.org>.
Narendra-Neerukonda edited a comment on issue #17993:
URL: https://github.com/apache/airflow/issues/17993#issuecomment-911916814


   @potiuk,
   As of now I was thinking of allowing users to specify the desired encoding for task logs in airflow.cfg, using utf-8 as the default if users don't specify one (only for reading/writing task logs). As far as I know, most systems work well with utf-8. In the issue associated with this feature, the problem I noticed was that the system default was set to ANSI-1968 (in my case, if I recollect the name properly). I feel not all users might be aware of this setting (including me until a few days ago), and setting utf-8 as the default shouldn't create a problem, as it covers all characters used in day-to-day life. I feel that if this part is not standardized, it might continue to create problems like #16834 with various encodings.
   
   If this approach is not satisfactory, can you please point me in the right direction :-)





[GitHub] [airflow] Narendra-Neerukonda edited a comment on issue #17993: Allow the default encoding to be set in airflow config to read/write task logs

Posted by GitBox <gi...@apache.org>.
Narendra-Neerukonda edited a comment on issue #17993:
URL: https://github.com/apache/airflow/issues/17993#issuecomment-911916814


   @potiuk,
   As of now I was thinking of allowing users to specify the desired encoding for task logs in airflow.cfg, using utf-8 as the default if users don't specify one (only for reading/writing task logs). As far as I know, most systems work well with utf-8. In the issue associated with this feature, the problem I noticed was that the system default was set to ANSI-1968 (in my case, if I recollect the name properly). I feel not all users might be aware of this setting (including me until a few days ago), and setting utf-8 as the default shouldn't create a problem, as it covers all characters used in day-to-day life. I feel that if this part is not standardized, it might continue to create problems like #16834 with various encodings.
   
   If this approach is not satisfactory, can you please guide me in the right direction.





[GitHub] [airflow] potiuk edited a comment on issue #17993: Allow the default encoding to be set in airflow config to read/write task logs

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on issue #17993:
URL: https://github.com/apache/airflow/issues/17993#issuecomment-911978583


   I never said anything about reading the default encoding from `airflow.cfg` and setting the default encoding:
   
   > It would be great to standardise encoding to "utf-8" (default) when writing and reading logs. Seems like this should be a good first issue for someone. @internetcoffeephone - maybe you would like to contribute a fix for that?
   
   In that comment I was thinking more about making sure it is ALWAYS "utf-8" (i.e. EXPLICITLY find all places and set "utf-8" there). But I think setting the default encoding in Python code based on `airflow.cfg` is not possible. This is not a good way to achieve that IMHO (because of the reasons I explained above).





[GitHub] [airflow] potiuk edited a comment on issue #17993: Allow the default encoding to be set in airflow config to read/write task logs

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on issue #17993:
URL: https://github.com/apache/airflow/issues/17993#issuecomment-911869434


   Question - how do you want to do it?
   
   From what I know, the best and recommended way to set the encoding for Python is to configure the right language settings in the environment (basically LANG settings). Any attempt to do it in Python code is a futile effort and is a band-aid rather than a solution. This is especially true when you use multiprocessing, forking and multiple entry points, which cause the Python interpreter to be started in a new process, sometimes in unexpected ways, initialising and importing libraries in a different sequence in different processes.
   
   Last time I checked, the only recommended way to set the encoding for Python was to properly set the LANG variables in the underlying operating system.
   
   Has anything changed since? What is your idea for approaching it, and how does it play with all the various multi-process pieces we have (specifically: Celery, Kubernetes, Webserver, Scheduler and LocalExecutor, each of which uses a different way of running Python processes / multi-processing/billiard)?





[GitHub] [airflow] potiuk commented on issue #17993: Allow the default encoding to be set in airflow config to read/write task logs

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #17993:
URL: https://github.com/apache/airflow/issues/17993#issuecomment-911985020


   It's nothing personal @Narendra-Neerukonda - it's just a comment that I think what you proposed (reading from config and making sure that it is used for all reads/writes by setting the default encoding) is not feasible. Prove me wrong. I have lost many hours trying to do similar things in the past and failed miserably, but maybe I am simply wrong.





[GitHub] [airflow] potiuk edited a comment on issue #17993: Allow the default encoding to be set in airflow config to read/write task logs

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on issue #17993:
URL: https://github.com/apache/airflow/issues/17993#issuecomment-911978583


   I never said anything about reading the default encoding from `airflow.cfg` and setting the default encoding:
   
   > It would be great to standardise encoding to "utf-8" (default) when writing and reading logs. Seems like this should be a good first issue for someone. @internetcoffeephone - maybe you would like to contribute a fix for that?
   
   In that comment I was thinking more about making sure it is ALWAYS "utf-8" (i.e. EXPLICITLY find all places and set "utf-8" there). But I think setting the default encoding in Python code based on `airflow.cfg` is not possible. This is not a good way to achieve that IMHO (because of the reasons I explained above).





[GitHub] [airflow] potiuk commented on issue #17993: Allow the default encoding to be set in airflow config to read/write task logs

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #17993:
URL: https://github.com/apache/airflow/issues/17993#issuecomment-912107128


   Ah. I see. Apologies - this time it was also my extrapolation of what "default encoding" means :D. If we do not try to set a "global default encoding" but limit it to log writes/reads and try to make sure we cover all cases, this is all good.
   
   Glad that we sorted it out :) and thanks in return for the patience on your side. This is the bad side of written communication: assumptions can take over.
   
   Yeah. The way you propose is good - we should probably try to find some simple way of passing the value without too much overhead. Maybe one part of it would be to add it to LoggingMixin for the "write part" somehow (or find a way to inject it into the logger hierarchy)? It's used pretty universally (it's part of the BaseOperator). And for reading it should be easy, because there are just a few places where logs are read.
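
For the "write part", the standard library's `logging.FileHandler` already accepts an explicit `encoding` argument, so one possible shape is to pass the configured value when the handler is created. The sketch below is a standalone illustration under that assumption; the constant `TASK_LOG_ENCODING`, the logger name and the log text are hypothetical, and it does not show how Airflow's LoggingMixin or its handlers are actually wired.

```python
import logging
import os
import tempfile

# Hypothetical setting; in the proposal this value would come from airflow.cfg.
TASK_LOG_ENCODING = "utf-8"

log_dir = tempfile.mkdtemp()
log_path = os.path.join(log_dir, "task.log")

# FileHandler takes an explicit encoding, so writes do not depend on the locale.
handler = logging.FileHandler(log_path, encoding=TASK_LOG_ENCODING)
handler.setFormatter(logging.Formatter("%(levelname)s - %(message)s"))

logger = logging.getLogger("example.task")
logger.setLevel(logging.INFO)
logger.propagate = False  # keep the example output out of the root logger
logger.addHandler(handler)

logger.info("processed file r\u00e9sum\u00e9.csv")

# Flush and detach the handler before reading the file back.
handler.close()
logger.removeHandler(handler)

# Read the log back with the same explicit encoding.
with open(log_path, encoding=TASK_LOG_ENCODING) as f:
    written = f.read()
```

Using the same value on both the write and the read side is what keeps the two in agreement regardless of the system locale.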





[GitHub] [airflow] Narendra-Neerukonda commented on issue #17993: Allow the default encoding to be set in airflow config to read/write task logs

Posted by GitBox <gi...@apache.org>.
Narendra-Neerukonda commented on issue #17993:
URL: https://github.com/apache/airflow/issues/17993#issuecomment-912012496


   Ahh... All well from my end too @potiuk
   Very thankful for your patience in explaining the details above.  
   
   I was just trying to add a config option, say "task_log_encoding", in airflow.cfg, in reference to a comment: https://github.com/apache/airflow/pull/17965#discussion_r700831719 which is from the PR raised to close #16834
   
   This config would only be used to read/write the task logs to prevent any task log rendering issues in the UI.
   Ex: with open(task_log_filename, 'r', encoding=airflow_config.get("task_log_encoding"), errors="replace") as f:
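
A slightly fuller sketch of the same idea, using only the standard library so it can run on its own. The `[logging]` section, the helper names `get_task_log_encoding` and `read_task_log`, and the sample log bytes are assumptions made for illustration; "task_log_encoding" is the option name suggested in the comment, not a confirmed Airflow setting.

```python
import configparser
import os
import tempfile

# Hypothetical config content; the option is left unset on purpose so the
# utf-8 fallback is exercised.
config = configparser.ConfigParser()
config.read_string("[logging]\n")


def get_task_log_encoding(cfg: configparser.ConfigParser) -> str:
    """Return the configured task log encoding, defaulting to utf-8."""
    return cfg.get("logging", "task_log_encoding", fallback="utf-8")


def read_task_log(path: str, cfg: configparser.ConfigParser) -> str:
    """Read a task log; undecodable bytes become U+FFFD instead of raising."""
    encoding = get_task_log_encoding(cfg)
    with open(path, "r", encoding=encoding, errors="replace") as f:
        return f.read()


# A log file containing one byte (0xff) that is not valid UTF-8.
fd, log_path = tempfile.mkstemp(suffix=".log")
with os.fdopen(fd, "wb") as f:
    f.write(b"line ok\nbad byte: \xff\n")

text = read_task_log(log_path, config)
os.remove(log_path)
```

With `errors="replace"` a damaged or mis-encoded log still renders, with the bad byte shown as a replacement character, rather than failing the whole read.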
   
   Please let me know if I can proceed with this.





[GitHub] [airflow] Narendra-Neerukonda edited a comment on issue #17993: Allow the default encoding to be set in airflow config to read/write task logs

Posted by GitBox <gi...@apache.org>.
Narendra-Neerukonda edited a comment on issue #17993:
URL: https://github.com/apache/airflow/issues/17993#issuecomment-911916814


   @potiuk, I'll need some time to research this to find the best solution.
   As of now I was thinking of allowing users to specify the desired encoding for task logs in airflow.cfg, using utf-8 as the default if users don't specify one (only for reading/writing task logs). As far as I know, most systems work well with utf-8. In the issue associated with this feature, the problem I noticed was that the system default was set to ANSI-1968 (in my case, if I recollect the name properly). I feel not all users might be aware of this (including me until a few days ago), and setting utf-8 as the default shouldn't create a problem, as it covers all characters used in day-to-day life. I feel that if this part is not standardized, it might continue to create problems like #16834 with various encodings.





[GitHub] [airflow] Narendra-Neerukonda edited a comment on issue #17993: Allow the default encoding to be set in airflow config to read/write task logs

Posted by GitBox <gi...@apache.org>.
Narendra-Neerukonda edited a comment on issue #17993:
URL: https://github.com/apache/airflow/issues/17993#issuecomment-911916814


   @potiuk, I'll need some time to research this to find the best solution.
   As of now I was thinking of allowing users to specify the desired encoding for task logs in airflow.cfg, using utf-8 as the default if users don't specify one (only for reading/writing task logs). As far as I know, most systems work well with utf-8. In the issue associated with this feature, the problem I noticed was that the system default was set to ANSI-1968 (in my case, if I recollect the name properly). I feel not all users might be aware of this setting (including me until a few days ago), and setting utf-8 as the default shouldn't create a problem, as it covers all characters used in day-to-day life. I feel that if this part is not standardized, it might continue to create problems like #16834 with various encodings.





[GitHub] [airflow] potiuk edited a comment on issue #17993: Allow the default encoding to be set in airflow config to read/write task logs

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on issue #17993:
URL: https://github.com/apache/airflow/issues/17993#issuecomment-911869434


   Question - how do you want to do it?
   
   From what I know, the best and recommended way to set the encoding for Python is to configure the right language settings in the environment (basically LANG settings). Any attempt to do it in Python code is a futile effort and is a band-aid rather than a solution. This is especially true when you use multiprocessing, forking and multiple entry points, which cause the Python interpreter to be started in a new process, sometimes in unexpected ways, initialising and importing libraries in a different sequence in different processes.
   
   Last time I checked, the only recommended way to set the encoding for Python was to properly set the LANG variables in the underlying operating system.
   
   Has anything changed since? What is your idea for approaching it, and how does it play with all the various multi-process pieces we have (specifically: Celery, Kubernetes, Webserver, Scheduler and LocalExecutor, each of which uses a different way of running Python processes / multi-processing/billiard)?





[GitHub] [airflow] Narendra-Neerukonda removed a comment on issue #17993: Allow the default encoding to be set in airflow config to read/write task logs

Posted by GitBox <gi...@apache.org>.
Narendra-Neerukonda removed a comment on issue #17993:
URL: https://github.com/apache/airflow/issues/17993#issuecomment-911826531


   Extension to #17965 
   Related to #16834 





[GitHub] [airflow] Narendra-Neerukonda commented on issue #17993: Allow the default encoding to be set in airflow config to read/write task logs

Posted by GitBox <gi...@apache.org>.
Narendra-Neerukonda commented on issue #17993:
URL: https://github.com/apache/airflow/issues/17993#issuecomment-912659161


   I'll be back with a PR for implementation review in a few days (with all the above suggestions).
   Thank you.





[GitHub] [airflow] Narendra-Neerukonda removed a comment on issue #17993: Allow the default encoding to be set in airflow config to read/write task logs

Posted by GitBox <gi...@apache.org>.
Narendra-Neerukonda removed a comment on issue #17993:
URL: https://github.com/apache/airflow/issues/17993#issuecomment-911823175


   I would like to work on this.





[GitHub] [airflow] Narendra-Neerukonda commented on issue #17993: Allow the default encoding to be set in airflow config to read/write task logs

Posted by GitBox <gi...@apache.org>.
Narendra-Neerukonda commented on issue #17993:
URL: https://github.com/apache/airflow/issues/17993#issuecomment-911916814


   @potiuk, I'll need some time to research this to find the best solution.
   As of now I was thinking of allowing users to specify the desired encoding for task logs in airflow.cfg, and using utf-8 as the default in case users don't specify one (only for reading/writing task logs). As far as I know, most systems work well with utf-8. In the issue associated with this feature, the problem I noticed was that the system default was set to ANSI-1968 (in my case, if I recollect the name properly). I feel not all users might be aware of this (including me until a few days ago), and setting utf-8 as the default shouldn't create a problem, as all characters in day-to-day use are covered by it. I feel that if this part is not standardized by Airflow, it might continue to create problems like #16834 with various encodings.





[GitHub] [airflow] Narendra-Neerukonda edited a comment on issue #17993: Allow the default encoding to be set in airflow config to read/write task logs

Posted by GitBox <gi...@apache.org>.
Narendra-Neerukonda edited a comment on issue #17993:
URL: https://github.com/apache/airflow/issues/17993#issuecomment-912012496


   Ahh... All is well from my end too, @potiuk.
   Very thankful for your patience in explaining the details above. I understand what you are trying to convey.
   
   I was just trying to add a config option such as "task_log_encoding" in airflow.cfg, in reference to a comment from the PR raised to close #16834: https://github.com/apache/airflow/pull/17965#discussion_r700831719
   
   This config would only be used to read/write the task logs, to prevent any task log rendering issues in the UI.
   Ex: with open(task_log_filename, 'r', encoding=airflow_config.get("task_log_encoding"), errors="replace"):
   
   Please let me know if I can proceed on this.
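   
   A minimal sketch of the idea above, using only the standard library. The `[logging]` section, the `task_log_encoding` option name, and the helper functions are assumptions made for illustration; they are not existing Airflow settings or APIs. The option falls back to utf-8 when it is not set, and undecodable bytes are replaced rather than raising.

```python
import configparser
import os
import tempfile

# Hypothetical section and option name; not an existing Airflow setting.
SAMPLE_CFG = """
[logging]
task_log_encoding = utf-8
"""


def get_task_log_encoding(cfg_text: str) -> str:
    """Return the configured encoding, falling back to utf-8 when unset."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    return parser.get("logging", "task_log_encoding", fallback="utf-8")


def read_task_log(path: str, encoding: str) -> str:
    """Read a log file; undecodable bytes become U+FFFD instead of raising."""
    with open(path, "r", encoding=encoding, errors="replace") as handle:
        return handle.read()


def demo() -> str:
    """Write a non-ASCII log line, then read it back with the same encoding."""
    encoding = get_task_log_encoding(SAMPLE_CFG)
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, "task.log")
        with open(path, "w", encoding=encoding) as handle:
            handle.write("task finished: café\n")
        return read_task_log(path, encoding)
```

   Passing the encoding to each `open()` call only affects that file handle; it does not try to change the interpreter-wide default.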





[GitHub] [airflow] Narendra-Neerukonda commented on issue #17993: Allow the default encoding to be set in airflow config to read/write task logs

Posted by GitBox <gi...@apache.org>.
Narendra-Neerukonda commented on issue #17993:
URL: https://github.com/apache/airflow/issues/17993#issuecomment-911823175


   I would like to work on this.





[GitHub] [airflow] potiuk edited a comment on issue #17993: Allow the default encoding to be set in airflow config to read/write task logs

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on issue #17993:
URL: https://github.com/apache/airflow/issues/17993#issuecomment-911978583


   I never said anything about reading the default encoding from `airflow.cfg` and setting the default encoding based on that: 
   
   > It would be great to standardise encoding to "utf-8" (default) when writing and reading logs. Seems like this should be a good first issue for someone. @internetcoffeephone - maybe you would like to contribute a fix for that?
   
   I was thinking more about making sure it is ALWAYS "utf-8" (i.e. EXPLICITLY find all places and set "utf-8" there). But I think setting the default encoding in Python code based on `airflow.cfg` is not possible, so it is not a good way to achieve what I commented about, IMHO (because of the reasons I explained above).
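   
   As an illustration of "explicitly set utf-8 everywhere", here is a small sketch in plain Python. The helper names and the `LOG_ENCODING` constant are hypothetical and not taken from the Airflow code base; the last function reproduces the kind of "codec can't decode byte" failure that occurs when UTF-8 bytes are decoded with an ASCII default.

```python
import os
import tempfile

LOG_ENCODING = "utf-8"  # fixed explicitly, never taken from the locale


def write_log(path: str, text: str) -> None:
    """Write log text with an explicit encoding."""
    with open(path, "w", encoding=LOG_ENCODING) as handle:
        handle.write(text)


def read_log(path: str) -> str:
    """Read log text with the same explicit encoding."""
    with open(path, "r", encoding=LOG_ENCODING) as handle:
        return handle.read()


def round_trip(text: str) -> str:
    """Write text to a temporary log file and read it back."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, "task.log")
        write_log(path, text)
        return read_log(path)


def ascii_decode_fails(text: str) -> bool:
    """Show the failure mode: UTF-8 bytes decoded with an ASCII codec."""
    try:
        text.encode("utf-8").decode("ascii")
    except UnicodeDecodeError:
        return True
    return False
```

   Because both the writer and the reader name the encoding, the result does not depend on the LANG settings of the machine running the code.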





[GitHub] [airflow] potiuk edited a comment on issue #17993: Allow the default encoding to be set in airflow config to read/write task logs

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on issue #17993:
URL: https://github.com/apache/airflow/issues/17993#issuecomment-911960059


   @Narendra-Neerukonda My point is that I believe you "CAN'T" reliably set the encoding in Python code in all scenarios. Once the Python interpreter starts, you are basically done (this is inherent, because different Python libraries might be initialized in a different sequence depending on how you enter the interpreter). This at least has been my experience so far.
   
   The only way I could reliably override the default encoding in the past, in complex Python code using multiprocessing, was to set the right LANG variables in the OS.
   
   And remember that we parse the .cfg file in Python after it starts, and by the time we parse the file, we have already initialized the parts related to encoding and used the encoding at least once (to parse the .cfg file). The default encoding might then be cached by various libraries that might or might not have been pre-loaded.
   
   IMHO the ONLY way to make it happen is, at most, to check in Airflow whether the encoding is as expected and, if it is not, fail while directing the user on how to set the LANG variables properly.
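   
   A rough sketch of such a startup check, assuming it would run once early in the process. The function and exception names are hypothetical, and the locale value in the message is only an example; `locale.getpreferredencoding(False)` reports the encoding Python uses by default for text files without modifying the locale.

```python
import codecs
import locale
from typing import Optional


class UnexpectedEncodingError(RuntimeError):
    """Raised when the process default encoding is not the expected one."""


def normalize(name: str) -> str:
    """Map aliases such as 'UTF8' or 'utf_8' to the canonical codec name."""
    return codecs.lookup(name).name


def check_default_encoding(expected: str = "utf-8",
                           actual: Optional[str] = None) -> str:
    """Fail with guidance if the default text encoding is not `expected`.

    `actual` can be passed in for testing; by default it is read from the
    locale settings the interpreter was started with.
    """
    if actual is None:
        actual = locale.getpreferredencoding(False)
    if normalize(actual) != normalize(expected):
        raise UnexpectedEncodingError(
            f"Default encoding is {actual!r}, expected {expected!r}. "
            "Set LANG or LC_ALL to a UTF-8 locale (for example 'C.UTF-8', "
            "where available) before starting the process."
        )
    return normalize(actual)
```

   The check only reports the mismatch and leaves the fix to the environment, which matches the point above that the encoding cannot be changed reliably after the interpreter has started.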

