Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/09/15 09:39:20 UTC

[GitHub] [airflow] KennethanCeyer opened a new issue #18264: An idea to add disable the option of XCOM log for the returned value

KennethanCeyer opened a new issue #18264:
URL: https://github.com/apache/airflow/issues/18264


   ### Description
   
   ## Purpose
   
   Since Airflow 2.0, the `XComArg` design and the `@task` decorator have made `PythonOperator` easy to use, and it is now used frequently.
   `XComArg` in Airflow 2.0 supports passing a `pd.DataFrame`, and large data can also be passed through backends such as `S3`.
   
   The problem is that when a `PythonOperator` passes large data through XCom, each task writes a log entry when it finishes. That entry contains the XCom data, so when the data is large the log is large too, and the browser cannot display it (the browser crashes because it runs out of memory).
   
   So I think logging of XCom data should be an option the user can turn on or off.
   
   ## Scenario
   
   Suppose we send a `pd.DataFrame` of 1 million rows between tasks via Airflow. If those 1 million rows take about 2 GB, the task that returns this data to XCom leaves a 2 GB log.
   
   ```python
   # an example, pseudo-code (process_web_data and do_something are not shown)
   from multiprocessing.pool import ThreadPool
   from typing import List
   
   import pandas as pd
   from airflow import DAG
   from airflow.decorators import task
   
   @task
   def get_original_data_from_web(urls: List[str]) -> pd.DataFrame:
       with ThreadPool() as pool:
           data = pool.map(process_web_data, urls)  # 1 million rows; assume this is List[dict]
       return pd.DataFrame(data)
   
   URLS = [....]
   with DAG(...) as dag:
       original_data_df = get_original_data_from_web(URLS)  # This task will produce a huge log
       do_something(original_data_df)
   ```
   
   And when the user tries to view the log in the Airflow web UI...
   
   ![image](https://user-images.githubusercontent.com/7090315/133407925-82e15e12-3a73-48e2-ad42-ab441d2b0315.png)
   
   ## Why?
   
   An option to control the XCom log is needed for the following reasons.
   
   - XCom logs are currently emitted at the `INFO` level of the global logging setup. Many users run their loggers at `INFO`, so raising the logging level just to hide the XCom logs is not practical.
   - Airflow logging information is stored in DB. Therefore, without a separate log cleanup configuration, the DB may unexpectedly hit its capacity limit and cause failures.
   
   ## Code
   
   ## Discussion
   
   **Design**
   
   I think there are two ways to control XCom logging.
   
   1. Controlled via `airflow.cfg` settings (global)
   2. Passed as a parameter to the constructor of an `XXXOperator` class (e.g. `PythonOperator`) (local)
   
   I'm leaning toward the second way, since it gives users more control for their own purposes, but this touches Airflow's design, so I'd love to hear from you folks.
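
As a rough illustration of the second (local) option, here is a minimal sketch using only the standard library. `SketchOperator` and its `log_return_value` parameter are hypothetical names made up for this example; they are not Airflow classes or arguments, and the message wording is only an assumption.

```python
import logging
from typing import Any, Callable


class SketchOperator:
    """Hypothetical stand-in for an operator; not an Airflow class."""

    def __init__(self, python_callable: Callable[[], Any], log_return_value: bool = True):
        self.python_callable = python_callable
        # Hypothetical per-operator switch: True keeps today's behaviour.
        self.log_return_value = log_return_value
        self.log = logging.getLogger(f"{__name__}.{type(self).__name__}")

    def execute(self) -> Any:
        result = self.python_callable()
        if self.log_return_value:
            # The whole payload ends up in the task log.
            self.log.info("Done. Returned value was: %s", result)
        else:
            # Only a short marker is logged; the payload is still returned.
            self.log.info("Done. Returned value not shown")
        return result
```

With a switch like this the default stays backward compatible, and only tasks known to return large payloads need to opt out.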
   
   ### Use case/motivation
   
   You can check the scenario in the **Description** section.
   
   ### Related issues
   
   _No response_
   
   ### Are you willing to submit a PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] boring-cyborg[bot] commented on issue #18264: An idea to add disable the option of XCOM log for the returned value

Posted by GitBox <gi...@apache.org>.
boring-cyborg[bot] commented on issue #18264:
URL: https://github.com/apache/airflow/issues/18264#issuecomment-919863088


   Thanks for opening your first issue here! Be sure to follow the issue template!
   





[GitHub] [airflow] uranusjr commented on issue #18264: An idea to add disable the option of XCOM log for the returned value

Posted by GitBox <gi...@apache.org>.
uranusjr commented on issue #18264:
URL: https://github.com/apache/airflow/issues/18264#issuecomment-919881539


   Every `self.log` is named after the class’s full name, so you can disable the logger with [custom logging](https://airflow.apache.org/docs/apache-airflow/stable/logging-monitoring/logging-tasks.html#advanced-configuration) via [standard logging configuration](https://docs.python.org/3/library/logging.config.html). I believe the logger used in PythonOperator would be named something like `airflow.operators.python.PythonOperator`.
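
A minimal sketch of that idea with the standard library's `logging.config.dictConfig`. The logger name is the one guessed in the comment above and should be checked against the Airflow version in use; `LOGGING_OVERRIDES` and `XCOM_NOISY_LOGGER` are names made up for this example.

```python
import logging
import logging.config

# Assumed logger name, taken from the comment above; verify it for your Airflow version.
XCOM_NOISY_LOGGER = "airflow.operators.python.PythonOperator"

# Hypothetical override: raise only this one logger to WARNING.
LOGGING_OVERRIDES = {
    "version": 1,
    "disable_existing_loggers": False,
    "loggers": {
        XCOM_NOISY_LOGGER: {"level": "WARNING"},
    },
}

logging.config.dictConfig(LOGGING_OVERRIDES)

operator_log = logging.getLogger(XCOM_NOISY_LOGGER)
operator_log.info("Done. Returned value was: %s", "<huge payload>")  # dropped: below WARNING
operator_log.warning("warnings from this logger are still emitted")
```

In a real deployment an entry like this would presumably be merged into the logging configuration that Airflow loads (see the advanced configuration link) rather than applied on its own. Note that it silences every `INFO` message from that logger, not only the returned-value line.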





[GitHub] [airflow] potiuk commented on issue #18264: An idea to add disable the option of XCOM log for the returned value

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #18264:
URL: https://github.com/apache/airflow/issues/18264#issuecomment-920328670


   I saw a similar problem in the past and I think it's worth handling (precisely because of the custom backends). Previously such logs could work fine because XCom size was limited, but this might not hold now that custom backends are allowed.
   
   So I think this is a valid comment; however, it should be handled per operator.
   
   There is simply no way to disable all XCom-related logs "globally", because other operators, as in the case of the Python operator, might just log the XCom value on their own, and we simply do not know which logs are XCom-related.
   
   @KennethanCeyer - maybe you could make a PR to change the log level in the Python operator to DEBUG, and maybe you could even do some smart search to see whether other operators write similar logs, and change those too?
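
A minimal sketch of what moving the value to DEBUG could look like, using only the standard library. `run_callable` and the logger name are hypothetical; this is not the actual `PythonOperator` code.

```python
import logging
from typing import Any, Callable

log = logging.getLogger("sketch.python_operator")  # hypothetical logger name


def run_callable(python_callable: Callable[[], Any]) -> Any:
    """Run the callable; keep the completion line at INFO and the payload at DEBUG."""
    return_value = python_callable()
    log.info("Done.")                                  # short line, always cheap
    log.debug("Returned value was: %s", return_value)  # payload only when DEBUG is enabled
    return return_value
```

With the usual `INFO` level the payload never reaches the log, and anyone who needs it can still enable `DEBUG` for that one logger.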





[GitHub] [airflow] KennethanCeyer commented on issue #18264: An idea to add disable the option of XCOM log for the returned value

Posted by GitBox <gi...@apache.org>.
KennethanCeyer commented on issue #18264:
URL: https://github.com/apache/airflow/issues/18264#issuecomment-972549530


   Closing this issue because #19378 has been merged.





[GitHub] [airflow] KennethanCeyer closed issue #18264: An idea to add disable the option of XCOM log for the returned value

Posted by GitBox <gi...@apache.org>.
KennethanCeyer closed issue #18264:
URL: https://github.com/apache/airflow/issues/18264


   





[GitHub] [airflow] KennethanCeyer commented on issue #18264: An idea to add disable the option of XCOM log for the returned value

Posted by GitBox <gi...@apache.org>.
KennethanCeyer commented on issue #18264:
URL: https://github.com/apache/airflow/issues/18264#issuecomment-921681357


   @potiuk 
   Thank you for advising!
   
   First, I will change the log level of the XCom-related logging in `PythonOperator` from `info` to `debug`.
   Then I will try to find out whether similar logs exist in other operators as well,
   and if they do, I'll modify them too.
   
   Please let me know if there is a better way 👍 





[GitHub] [airflow] KennethanCeyer commented on issue #18264: An idea to add disable the option of XCOM log for the returned value

Posted by GitBox <gi...@apache.org>.
KennethanCeyer commented on issue #18264:
URL: https://github.com/apache/airflow/issues/18264#issuecomment-919886413


   ```python
   import logging
   
   # Raise the level of the PythonOperator logger so INFO lines are dropped
   python_logger = logging.getLogger("airflow.operators.python.PythonOperator")
   python_logger.setLevel(logging.ERROR)
   ```
   
   Oh well, we might be able to solve it to some extent with the above method!
   Thanks for your comments! 👍 
   
   However, if `PythonOperator` is not the only operator that writes XCom-related logs,
   it would still be good to add an option to disable XCom logs.

