Posted to dev@dolphinscheduler.apache.org by Rick Cheng <ri...@gmail.com> on 2023/01/18 07:11:31 UTC

Re: [DISCUSS][Feature][Remote Logging] Add support for writing task logs to remote storage

Hi, Community

The following is the latest progress on this feature:

[Feature-13331](https://github.com/apache/dolphinscheduler/pull/13332) adds
support for writing task logs to OSS. Any comments or suggestions are
welcome.

Here is a brief summary of the changes:

### Task log writing

* master / worker will send the task log to the remote storage
asynchronously after the task finishes, if `remote.logging.enable=true`
(false by default); see the sketch below
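
A minimal sketch of the asynchronous write path, assuming hypothetical class
and method names (the actual implementation lives in the PR above):

```java
import java.nio.file.Path;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// hypothetical handler: hands the finished task's local log file to a
// background thread so the upload never blocks task execution
public class AsyncRemoteLogSender {

    private final ExecutorService uploader = Executors.newSingleThreadExecutor();
    private final boolean remoteLoggingEnabled; // remote.logging.enable

    public AsyncRemoteLogSender(boolean remoteLoggingEnabled) {
        this.remoteLoggingEnabled = remoteLoggingEnabled;
    }

    // called by master / worker once the task reaches a final state
    public void onTaskFinished(Path localLogPath) {
        if (!remoteLoggingEnabled) {
            return;
        }
        uploader.submit(() -> upload(localLogPath));
    }

    private void upload(Path localLogPath) {
        // push the file to the configured remote storage (OSS, S3, ...)
    }
}
```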

### Task log reading

* master / worker will read the task log from the local file system if the
log file exists there
* if the task log file does not exist locally, master / worker will download
it from the remote storage to the local file system and then read it, as
sketched below
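
And a minimal sketch of the read path under the same hypothetical names:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// hypothetical reader: the local copy wins; the remote copy is fetched
// only when the local file is missing
public class TaskLogReader {

    public String read(Path localLogPath) throws IOException {
        if (!Files.exists(localLogPath)) {
            downloadFromRemote(localLogPath);
        }
        return Files.readString(localLogPath);
    }

    private void downloadFromRemote(Path localLogPath) {
        // fetch the log object from the remote storage into localLogPath
    }
}
```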

### Log retention

* Log retention can be configured directly through the retention policy
provided by the remote storage
* E.g., [log retention on OSS](https://help.aliyun.com/document_detail/326319.html), as sketched below
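
For OSS specifically, old log objects can be expired automatically with a
bucket lifecycle rule. A minimal sketch using the Aliyun OSS Java SDK, where
the endpoint, credentials, bucket name, log prefix, and 30-day retention are
illustrative assumptions:

```java
import com.aliyun.oss.OSS;
import com.aliyun.oss.OSSClientBuilder;
import com.aliyun.oss.model.LifecycleRule;
import com.aliyun.oss.model.SetBucketLifecycleRequest;

public class OssLogRetention {
    public static void main(String[] args) {
        // placeholders: fill in the real endpoint and credentials
        OSS client = new OSSClientBuilder().build(
                "https://oss-cn-hangzhou.aliyuncs.com",
                "<accessKeyId>", "<accessKeySecret>");
        try {
            SetBucketLifecycleRequest request =
                    new SetBucketLifecycleRequest("<bucketName>");
            // expire objects under the assumed task-log prefix after 30 days
            request.AddLifecycleRule(new LifecycleRule(
                    "expire-task-logs", "logs/",
                    LifecycleRule.RuleStatus.Enabled, 30));
            client.setBucketLifecycle(request);
        } finally {
            client.shutdown();
        }
    }
}
```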

Best Regards,
Rick Cheng


Rick Cheng <ri...@gmail.com> wrote on Thu, Dec 8, 2022 at 10:31:

> Hi community,
>
> Here are some discussions on the weekly meeting about this feature:
>
> **Q1: In the k8s environment, users can choose to mount persistent volumes
> (e.g., [OSS](https://help.aliyun.com/document_detail/130911.html)) to
> synchronize task logs to remote storage.**
> R1: This is indeed a way to synchronize logs to remote storage, and it only
> requires mounting persistent volumes (PVs). However, it still has some
> shortcomings:
> * **Efficiency**: Because the PV is backed by the remote storage, log
> writes become slower, which in turn slows down task execution on the
> worker. In contrast, uploading the task log to the remote storage
> asynchronously through the remote logging mechanism does not affect task
> execution.
> * **Generality**: PVs are not suitable for some remote storage targets,
> such as Elasticsearch, and they do not apply to DS deployed in non-k8s
> environments.
>
> **Q2: Users can configure whether to use remote storage for task logs**
> R2: Yes, users can decide through configuration whether to enable remote
> log storage, and specify the corresponding remote logging settings.
>
> **Q3: The master-server also has task logs, which need to be uploaded to
> remote storage in a unified manner.**
> R3: Yes, users can set the remote storage configuration for the master's
> task logs in the master's configuration.
>
> **Q4: Is it possible to set the task log retention policy through the
> configuration supported by the remote storage itself?**
> R4: This is a good idea, and it can simplify the design of remote logging;
> I'll look into it.
>
> Related issue: https://github.com/apache/dolphinscheduler/issues/13017
>
> Thanks again for all the suggestions at the weekly meeting; please correct
> me if I'm wrong.
>
> Best Regards,
> Rick Cheng
>
>
Rick Cheng <ri...@gmail.com> wrote on Mon, Nov 28, 2022 at 13:24:
>
>> Hi community,
>>
>> Related issue: https://github.com/apache/dolphinscheduler/issues/13017
>>
>> Currently, DS only supports writing task logs to the worker's local file
>> system, so this issue discusses the design of a remote logging feature.
>>
>> # Why remote logging?
>> * Avoid losing task logs after a worker is torn down
>> * Make it easier to obtain logs and troubleshoot once they are aggregated
>> in remote storage
>> * Enhance cloud-native support for DS
>>
>> # Feature Design
>>
>> ## Connect to different remote targets
>> DS should support a variety of common remote storage targets and be
>> easily extensible to other types (see the sketch after this list):
>> * S3
>> * OSS
>> * ElasticSearch
>> * Azure Blob Storage
>> * Google Cloud Storage
>> * ...
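>>
>> A hypothetical extension point (not the actual DS interface), sketching
>> how one implementation per storage target could keep new backends cheap
>> to add:
>>
>> ```java
>> import java.io.IOException;
>> import java.nio.file.Path;
>>
>> // one implementation per target: OSS, S3, Elasticsearch, ...
>> public interface RemoteLogStorage {
>>
>>     // upload a finished task's local log file under the given remote key
>>     void upload(Path localLogPath, String remoteKey) throws IOException;
>>
>>     // fetch a task log from the remote target into the local path
>>     void download(String remoteKey, Path localLogPath) throws IOException;
>> }
>> ```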
>>
>> ## When to write logs to remote storage
>> Like Airflow, DS writes the task log to remote storage after the task
>> completes (success or failure).
>>
>> ## How to read logs
>> Since the task log is stored both on the worker's local file system and
>> in remote storage, the `api-server` needs a reading strategy when it
>> fetches the log of a task instance.
>>
>> Airflow first tries to read the remotely stored logs and falls back to
>> the local logs if that fails. I prefer the opposite: read the local log
>> first, and fall back to the remote log only if the local file does not
>> exist.
>>
>> We could discuss this further.
>>
>> ## Log retention strategy
>>
>> For example, a maximum capacity can be set for the remote storage, and
>> old logs can be deleted on a rolling basis (a sketch follows).
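>>
>> A minimal sketch of such rolling deletion, using hypothetical types; in
>> practice the remote storage's own retention policy may make this
>> unnecessary:
>>
>> ```java
>> import java.util.ArrayList;
>> import java.util.Comparator;
>> import java.util.List;
>> import java.util.function.Consumer;
>>
>> public class RollingLogRetention {
>>
>>     // hypothetical listing entry for one remote log object
>>     public record LogObject(String key, long sizeBytes, long lastModifiedMillis) {}
>>
>>     // delete the oldest objects until the total size fits under maxBytes
>>     public static void enforce(List<LogObject> listing, long maxBytes,
>>                                Consumer<String> deleteByKey) {
>>         List<LogObject> objects = new ArrayList<>(listing);
>>         objects.sort(Comparator.comparingLong(LogObject::lastModifiedMillis));
>>         long total = objects.stream().mapToLong(LogObject::sizeBytes).sum();
>>         for (LogObject o : objects) {
>>             if (total <= maxBytes) {
>>                 break;
>>             }
>>             deleteByKey.accept(o.key());
>>             total -= o.sizeBytes();
>>         }
>>     }
>> }
>> ```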
>>
>> # Sub-tasks
>> WIP
>>
>> Any comments or suggestions are welcome.
>>
>> Best Regards,
>> Rick Cheng
>>
>