Posted to commits@dolphinscheduler.apache.org by GitBox <gi...@apache.org> on 2020/10/20 07:58:51 UTC

[GitHub] [incubator-dolphinscheduler] jepsonzhang opened a new issue #3618: [Bug][worker] Too many open files

jepsonzhang opened a new issue #3618:
URL: https://github.com/apache/incubator-dolphinscheduler/issues/3618


   *For better global communication, please give priority to using English descriptions, thx!*
   
   *Please review https://dolphinscheduler.apache.org/en-us/docs/development/issue.html when describing an issue.*
   
   **Describe the bug**
   
   **To Reproduce**
   
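   Version: [1.3.1-release]
   Steps to reproduce the behavior, for example:
   1. Deployed DS on 3 physical machines.
   2. Created 10,000 workflows in one project, each with a single shell task whose content is a simple `echo "xxx"`, and scheduled every workflow to run hourly.
   3. After the system had been running for a while, a large number of tasks failed with the following error: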
   4. See error
   ![image](https://user-images.githubusercontent.com/18161585/91426098-a0dc1780-e88e-11ea-8ac1-a6e943b4400d.png)
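   I checked with `lsof` and found that DS was holding several hundred thousand file handles, on every node. I had already raised the limit for the corresponding system user in /etc/security/limits.conf to 655350, but it was still exhausted.
   After investigating, I found that the cause is DS's log-viewing mechanism. To make each task's log easy to view, DS writes the log of every task run to baseDir/${process_define_id}/${process_instance_id}/${task_id}.log under the log directory. Every task run therefore creates a new log file; as the system keeps running, the number of log files keeps growing, so does the number of occupied file handles, and eventually the exception above is triggered. I think this is a problem with the logging mechanism itself. In my view, there are two options:
   Either drop the logger and write the output directly with plain file I/O. This is a smaller change, but it still leaves far too many small files on the server.
   Or write the log content to a database or another storage engine that is easy to search, instead of relying on local files.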
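   The first option can be sketched as follows. This is a minimal sketch, not DolphinScheduler code: `TaskLogFiles` and its methods are hypothetical, and the path layout simply follows the one described above. Each call opens the task's log file, appends, and closes it, so no descriptor is held between writes; it still leaves one small file per task run on disk.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

// Hypothetical helper, not DolphinScheduler code.
final class TaskLogFiles {
    // Builds baseDir/<processDefineId>/<processInstanceId>/<taskId>.log,
    // following the layout described in the issue.
    static Path taskLogPath(Path baseDir, int processDefineId, int processInstanceId, int taskId) {
        return baseDir.resolve(Integer.toString(processDefineId))
                .resolve(Integer.toString(processInstanceId))
                .resolve(taskId + ".log");
    }

    // Appends lines and closes the file before returning, so no descriptor
    // stays open after the call, however many task logs exist.
    static void appendLines(Path logFile, List<String> lines) {
        try {
            Files.createDirectories(logFile.getParent());
            Files.write(logFile, lines, StandardCharsets.UTF_8,
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path baseDir = Files.createTempDirectory("ds-logs");
        Path log = taskLogPath(baseDir, 1, 42, 7);
        appendLines(log, List.of("xxx"));
        appendLines(log, List.of("done"));
        System.out.println(baseDir.relativize(log) + " -> " + Files.readAllLines(log));
    }
}
```

   The trade-off is an extra open/close per write in exchange for a descriptor count that no longer grows with the number of task runs.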
   
   
   
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-dolphinscheduler] yangyichao-mango commented on issue #3618: [Bug][worker] Too many open files

Posted by GitBox <gi...@apache.org>.
yangyichao-mango commented on issue #3618:
URL: https://github.com/apache/incubator-dolphinscheduler/issues/3618#issuecomment-685290337


   I think we could discuss adding a feature for writing logs to a file system.





[GitHub] [incubator-dolphinscheduler] geosmart edited a comment on issue #3618: [Bug][worker] Too many open files

Posted by GitBox <gi...@apache.org>.
geosmart edited a comment on issue #3618:
URL: https://github.com/apache/incubator-dolphinscheduler/issues/3618#issuecomment-681893722


   Just like in Airflow, writing task run logs to disk produces many small files,
   but Airflow provides plugins to read logs from remote storage (like ES),
   so you can use Logstash to collect task logs and clean up the logs after they are collected.
   
   ---
   
   DS could add a PR to read logs from a remote log server.
   
   These are the Airflow "Writing Logs" options we can reference:
   * Writing Logs Locally
   * Writing Logs to Amazon S3
   * Writing Logs to Azure Blob Storage
   * Writing Logs to Google Cloud Storage
   * Writing Logs to Elasticsearch
   * Writing Logs to Elasticsearch over TLS
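   The "collect, then clean up" step from this comment could be sketched as a periodic job on each worker that deletes task logs older than a retention window. This is an illustration under assumptions, not DolphinScheduler or Airflow code: the class name, the 7-day retention, and the premise that a shipper such as Logstash has already collected the files are all hypothetical. Empty per-instance directories are left in place.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.FileTime;
import java.time.Duration;
import java.time.Instant;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: removes task logs older than a retention window.
final class TaskLogCleaner {
    // Deletes regular *.log files under baseDir whose last-modified time is
    // before (now - retention). Returns the number of files deleted.
    static int deleteOlderThan(Path baseDir, Duration retention, Instant now) {
        FileTime cutoff = FileTime.from(now.minus(retention));
        try {
            List<Path> logs;
            // Collect candidates first so the tree is not modified mid-walk.
            try (Stream<Path> walk = Files.walk(baseDir)) {
                logs = walk.filter(Files::isRegularFile)
                        .filter(p -> p.getFileName().toString().endsWith(".log"))
                        .collect(Collectors.toList());
            }
            int deleted = 0;
            for (Path log : logs) {
                if (Files.getLastModifiedTime(log).compareTo(cutoff) < 0) {
                    Files.delete(log);
                    deleted++;
                }
            }
            return deleted;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        if (args.length != 1) {
            System.err.println("usage: java TaskLogCleaner.java <task-log-dir>");
            return;
        }
        int deleted = deleteOlderThan(Paths.get(args[0]), Duration.ofDays(7), Instant.now());
        System.out.println("deleted " + deleted + " task log files");
    }
}
```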





[GitHub] [incubator-dolphinscheduler] felix-thinkingdata commented on issue #3618: [Bug][worker] Too many open files

Posted by GitBox <gi...@apache.org>.
felix-thinkingdata commented on issue #3618:
URL: https://github.com/apache/incubator-dolphinscheduler/issues/3618#issuecomment-681890102


   I'll try to reproduce it





[GitHub] [incubator-dolphinscheduler] geosmart commented on issue #3618: [Bug][worker] Too many open files

Posted by GitBox <gi...@apache.org>.
geosmart commented on issue #3618:
URL: https://github.com/apache/incubator-dolphinscheduler/issues/3618#issuecomment-684902809


   > How can I quickly configure 10,000 workflows?
   
   Use the API with a token to import 10,000 identical DAGs?





[GitHub] [incubator-dolphinscheduler] dailidong commented on issue #3618: [Bug][worker] Too many open files

Posted by GitBox <gi...@apache.org>.
dailidong commented on issue #3618:
URL: https://github.com/apache/incubator-dolphinscheduler/issues/3618#issuecomment-683711807


   Could you share the detailed output of `lsof` and `ulimit -a`? I want to locate the problem.
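   Besides `lsof` and `ulimit -a`, the worker JVM can report its own descriptor usage, which could be logged periodically to see how close a node is to its limit before tasks start failing. A minimal sketch, assuming a HotSpot/OpenJDK JVM on Linux or another Unix-like OS, where `com.sun.management.UnixOperatingSystemMXBean` is available; the `FdUsage` class is hypothetical. It only sees descriptors held by the JVM process itself, not by the shell processes it launches.

```java
import com.sun.management.UnixOperatingSystemMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

// Hypothetical helper: reports this JVM's file descriptor usage from inside the process.
final class FdUsage {
    // Returns {open, max}, or null when the JVM does not expose Unix descriptor
    // counts (for example on Windows).
    static long[] fileDescriptorUsage() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof UnixOperatingSystemMXBean) {
            UnixOperatingSystemMXBean unix = (UnixOperatingSystemMXBean) os;
            return new long[] {unix.getOpenFileDescriptorCount(), unix.getMaxFileDescriptorCount()};
        }
        return null;
    }

    public static void main(String[] args) {
        long[] usage = fileDescriptorUsage();
        if (usage == null) {
            System.out.println("descriptor counts not available on this JVM/OS");
        } else {
            System.out.printf("open fds: %d / max: %d (%.1f%%)%n",
                    usage[0], usage[1], 100.0 * usage[0] / usage[1]);
        }
    }
}
```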





[GitHub] [incubator-dolphinscheduler] geosmart commented on issue #3618: [Bug][worker] Too many open files

Posted by GitBox <gi...@apache.org>.
geosmart commented on issue #3618:
URL: https://github.com/apache/incubator-dolphinscheduler/issues/3618#issuecomment-681893722


   Just like in Airflow, writing task run logs to disk produces many small files,
   but Airflow provides plugins to read logs from remote storage (like ES),
   so you can use Logstash to collect task logs and clean up the logs after they are collected.
   
   ---
   
   DS could add a PR to read logs from a remote log server.





[GitHub] [incubator-dolphinscheduler] xingchun-chen closed issue #3618: [Bug][worker] Too many open files

Posted by GitBox <gi...@apache.org>.
xingchun-chen closed issue #3618:
URL: https://github.com/apache/incubator-dolphinscheduler/issues/3618


   





[GitHub] [incubator-dolphinscheduler] dailidong commented on issue #3618: [Bug][worker] Too many open files

Posted by GitBox <gi...@apache.org>.
dailidong commented on issue #3618:
URL: https://github.com/apache/incubator-dolphinscheduler/issues/3618#issuecomment-692612767


   > > Could you share the detailed output of `lsof` and `ulimit -a`? I want to locate the problem.
   > 
   > [lsof_result.zip](https://github.com/apache/incubator-dolphinscheduler/files/5167299/lsof_result.zip)
   
   Please provide the output of `lsof -p PID` for the worker server; this zip contains too many repeated messages.
   
   By the way, please send me a message on WeChat (510570367); I can't find your WeChat, sorry.
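   One way to shrink `lsof -p PID` output into something reviewable is to count open files per directory: if per-task log directories dominate, that supports the log-file explanation in the original report. A minimal sketch, assuming lsof's field output mode (`lsof -p PID -Fn`, where each file name is printed on its own line prefixed with `n`); the `LsofSummary` class is hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper: groups the files in `lsof -p PID -Fn` output by parent directory.
final class LsofSummary {
    // With -F, each line starts with a one-letter field tag; 'n' lines hold names.
    // Only absolute paths are counted, so sockets and pipes are skipped.
    static Map<String, Integer> countByParentDir(String lsofFieldOutput) {
        Map<String, Integer> counts = new HashMap<>();
        try (BufferedReader reader = new BufferedReader(new StringReader(lsofFieldOutput))) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (!line.startsWith("n/")) {
                    continue;
                }
                Path parent = Paths.get(line.substring(1)).getParent();
                counts.merge(parent == null ? "/" : parent.toString(), 1, Integer::sum);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return counts;
    }

    // Directories holding the most open files, largest count first.
    static List<Map.Entry<String, Integer>> top(Map<String, Integer> counts, int limit) {
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Integer>comparingByValue(Comparator.reverseOrder()))
                .limit(limit)
                .collect(Collectors.toList());
    }

    // Usage: lsof -p <worker PID> -Fn | java LsofSummary.java
    public static void main(String[] args) throws IOException {
        String input = new String(System.in.readAllBytes(), StandardCharsets.UTF_8);
        for (Map.Entry<String, Integer> e : top(countByParentDir(input), 10)) {
            System.out.println(e.getValue() + "\t" + e.getKey());
        }
    }
}
```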





[GitHub] [incubator-dolphinscheduler] felix-thinkingdata commented on issue #3618: [Bug][worker] Too many open files

Posted by GitBox <gi...@apache.org>.
felix-thinkingdata commented on issue #3618:
URL: https://github.com/apache/incubator-dolphinscheduler/issues/3618#issuecomment-684856363


   How can I quickly configure 10,000 workflows?





[GitHub] [incubator-dolphinscheduler] jepsonzhang commented on issue #3618: [Bug][worker] Too many open files

Posted by GitBox <gi...@apache.org>.
jepsonzhang commented on issue #3618:
URL: https://github.com/apache/incubator-dolphinscheduler/issues/3618#issuecomment-686347632


   > This issue seems to be related to the settings of Logback's SiftingAppender. Please check http://logback.qos.ch/manual/appenders.html#SiftingAppender
   > 
   > There are 2 settings. One is:
   > 
   > * `timeout`: the default value is 30 minutes. You can decrease this timeout if necessary.
   > 
   > The other is:
   > 
   > `maxAppenderCount`, which is unlimited by default. I think you may want to set it to a fixed number during the stress test, like 10000?
   > 
   > @jepsonzhang can you try these 2 settings in your stress test?
   
   Sorry for the late reply, I've been busy with work. The parameters still puzzle me, though: my test workflows run once an hour, so if `timeout` is 30 minutes, it shouldn't cause this problem. As for `maxAppenderCount`, I will try limiting both parameters.
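   The doubt about `timeout` can be checked with a back-of-the-envelope estimate. Assume (these are assumptions, not measurements) that the 10,000 hourly workflows start evenly over the hour, that each task run opens one log file through its own nested appender, and that an appender is closed once it has been idle for `timeout` = 30 minutes. The number of task-log appenders open at any moment is then roughly the arrival rate times the timeout:

```latex
N_{\text{open}} \approx \frac{10\,000\ \text{runs}}{60\ \text{min}} \times 30\ \text{min} = 5\,000
```

   Spread over 3 workers, that is about 1,700 per node, far below the several hundred thousand handles reported, so under these assumptions the default `timeout` alone would not explain the exhaustion.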





[GitHub] [incubator-dolphinscheduler] jepsonzhang commented on issue #3618: [Bug][worker] Too many open files

Posted by GitBox <gi...@apache.org>.
jepsonzhang commented on issue #3618:
URL: https://github.com/apache/incubator-dolphinscheduler/issues/3618#issuecomment-681855984


   ![xxxx](https://user-images.githubusercontent.com/18161585/91427398-6c695b00-e890-11ea-8c4f-7194e91c4123.png)
   





[GitHub] [incubator-dolphinscheduler] jepsonzhang commented on issue #3618: [Bug][worker] Too many open files

Posted by GitBox <gi...@apache.org>.
jepsonzhang commented on issue #3618:
URL: https://github.com/apache/incubator-dolphinscheduler/issues/3618#issuecomment-686953273


   > This issue seems to be related to the settings of Logback's SiftingAppender. Please check http://logback.qos.ch/manual/appenders.html#SiftingAppender
   > 
   > There are 2 settings. One is:
   > 
   > * `timeout`: the default value is 30 minutes. You can decrease this timeout if necessary.
   > 
   > The other is:
   > 
   > `maxAppenderCount`, which is unlimited by default. I think you may want to set it to a fixed number during the stress test, like 10000?
   > 
   > @jepsonzhang can you try these 2 settings in your stress test?
   
   ![image](https://user-images.githubusercontent.com/18161585/92208733-6857c180-eebe-11ea-9668-412e95deb60e.png)
   The bug still occurs.
   





[GitHub] [incubator-dolphinscheduler] dailidong closed issue #3618: [Bug][worker] Too many open files

Posted by GitBox <gi...@apache.org>.
dailidong closed issue #3618:
URL: https://github.com/apache/incubator-dolphinscheduler/issues/3618


   





[GitHub] [incubator-dolphinscheduler] dailidong edited a comment on issue #3618: [Bug][worker] Too many open files

Posted by GitBox <gi...@apache.org>.
dailidong edited a comment on issue #3618:
URL: https://github.com/apache/incubator-dolphinscheduler/issues/3618#issuecomment-692612767


   > > Could you share the detailed output of `lsof` and `ulimit -a`? I want to locate the problem.
   > 
   > [lsof_result.zip](https://github.com/apache/incubator-dolphinscheduler/files/5167299/lsof_result.zip)
   
   Please provide the output of `lsof -p PID` for the worker server; this zip contains too many repeated messages.
   
   By the way, please send a message to my WeChat (510570367); I can't find your WeChat, sorry.





[GitHub] [incubator-dolphinscheduler] Baoqi edited a comment on issue #3618: [Bug][worker] Too many open files

Posted by GitBox <gi...@apache.org>.
Baoqi edited a comment on issue #3618:
URL: https://github.com/apache/incubator-dolphinscheduler/issues/3618#issuecomment-686325576


   This issue seems to be related to the settings of Logback's SiftingAppender. Please check http://logback.qos.ch/manual/appenders.html#SiftingAppender
   
   There are 2 settings. One is:
   
   - `timeout`: the default value is 30 minutes. You can decrease this timeout if necessary.
   
   The other is:
   
   `maxAppenderCount`, which is unlimited by default. I think you may want to set it to a fixed number during the stress test, like 10000?
   
   @jepsonzhang can you try these 2 settings in your stress test?
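   The idea behind these two settings can be illustrated with a small cache of open resources keyed by task, bounded by a maximum count (the role of `maxAppenderCount`) and an idle timeout (the role of `timeout`). This is a hypothetical sketch of the concept only, not Logback's implementation; `BoundedResourceCache` and its method names are made up for illustration, and the current time is passed in explicitly so the behaviour is easy to test.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch, not Logback code: keeps at most maxCount resources open
// (cf. maxAppenderCount) and closes any resource idle longer than timeoutMillis (cf. timeout).
final class BoundedResourceCache<R extends Closeable> {
    private static final class Slot<T> {
        final T resource;
        long lastAccessMillis;

        Slot(T resource, long lastAccessMillis) {
            this.resource = resource;
            this.lastAccessMillis = lastAccessMillis;
        }
    }

    private final int maxCount;
    private final long timeoutMillis;
    private final Function<String, R> opener;
    // accessOrder = true: iteration runs from least to most recently used key.
    private final LinkedHashMap<String, Slot<R>> slots = new LinkedHashMap<>(16, 0.75f, true);

    BoundedResourceCache(int maxCount, long timeoutMillis, Function<String, R> opener) {
        if (maxCount < 1 || timeoutMillis < 0) throw new IllegalArgumentException();
        this.maxCount = maxCount;
        this.timeoutMillis = timeoutMillis;
        this.opener = opener;
    }

    // Returns the resource for key, opening it if needed. nowMillis must not decrease.
    synchronized R get(String key, long nowMillis) {
        Slot<R> slot = slots.get(key);
        if (slot == null) {
            slot = new Slot<>(opener.apply(key), nowMillis);
            slots.put(key, slot);
        }
        slot.lastAccessMillis = nowMillis;
        evict(nowMillis);
        return slot.resource;
    }

    synchronized int size() {
        return slots.size();
    }

    // Walks from the least recently used entry, closing entries that are over the
    // cap or stale; stops at the first entry that is neither.
    private void evict(long nowMillis) {
        Iterator<Map.Entry<String, Slot<R>>> it = slots.entrySet().iterator();
        while (it.hasNext()) {
            Slot<R> slot = it.next().getValue();
            boolean overCap = slots.size() > maxCount;
            boolean stale = nowMillis - slot.lastAccessMillis > timeoutMillis;
            if (!overCap && !stale) break;
            it.remove();
            try {
                slot.resource.close();
            } catch (IOException ignored) {
                // A real implementation would report this.
            }
        }
    }

    public static void main(String[] args) {
        BoundedResourceCache<Closeable> cache = new BoundedResourceCache<>(2, 1_000,
                key -> () -> System.out.println("closed " + key));
        cache.get("task-1", 0);
        cache.get("task-2", 10);
        cache.get("task-3", 20);    // over the cap of 2: closes task-1
        cache.get("task-4", 5_000); // closes task-2 (over the cap) and task-3 (idle past the timeout)
    }
}
```

   With an unbounded count and a long timeout, every distinct task key keeps its resource open until it has been idle for the whole timeout, which is how per-task appenders can pile up under heavy load.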
   





[GitHub] [incubator-dolphinscheduler] jepsonzhang commented on issue #3618: [Bug][worker] Too many open files

Posted by GitBox <gi...@apache.org>.
jepsonzhang commented on issue #3618:
URL: https://github.com/apache/incubator-dolphinscheduler/issues/3618#issuecomment-686342794


   > Could you share the detailed output of `lsof` and `ulimit -a`? I want to locate the problem.
   
   ![image](https://user-images.githubusercontent.com/18161585/92090721-587aa780-ee02-11ea-9a14-66a5bd3f9608.png)
   
   The `lsof` output file is too big; I'll try to upload it.





[GitHub] [incubator-dolphinscheduler] dailidong commented on issue #3618: [Bug][worker] Too many open files

Posted by GitBox <gi...@apache.org>.
dailidong commented on issue #3618:
URL: https://github.com/apache/incubator-dolphinscheduler/issues/3618#issuecomment-707177699


   Temporarily closing this issue. If other users also run into this problem, please reopen it.





[GitHub] [incubator-dolphinscheduler] geosmart edited a comment on issue #3618: [Bug][worker] Too many open files

Posted by GitBox <gi...@apache.org>.
geosmart edited a comment on issue #3618:
URL: https://github.com/apache/incubator-dolphinscheduler/issues/3618#issuecomment-684902809


   > How can I quickly configure 10,000 workflows?
   
   How about using the REST API with a token to import 10,000 identical DAGs?





[GitHub] [incubator-dolphinscheduler] jepsonzhang commented on issue #3618: [Bug][worker] Too many open files

Posted by GitBox <gi...@apache.org>.
jepsonzhang commented on issue #3618:
URL: https://github.com/apache/incubator-dolphinscheduler/issues/3618#issuecomment-686345295


   > Could you share the detailed output of `lsof` and `ulimit -a`? I want to locate the problem.
   
   
   [lsof_result.zip](https://github.com/apache/incubator-dolphinscheduler/files/5167299/lsof_result.zip)
   





[GitHub] [incubator-dolphinscheduler] zhuangchong commented on issue #3618: [Bug][worker] Too many open files

Posted by GitBox <gi...@apache.org>.
zhuangchong commented on issue #3618:
URL: https://github.com/apache/incubator-dolphinscheduler/issues/3618#issuecomment-681887376


   +1





[GitHub] [incubator-dolphinscheduler] jepsonzhang commented on issue #3618: [Bug][worker] Too many open files

Posted by GitBox <gi...@apache.org>.
jepsonzhang commented on issue #3618:
URL: https://github.com/apache/incubator-dolphinscheduler/issues/3618#issuecomment-686941016


   > This issue seems to be related to the settings of Logback's SiftingAppender. Please check http://logback.qos.ch/manual/appenders.html#SiftingAppender
   > 
   > There are 2 settings. One is:
   > 
   > * `timeout`: the default value is 30 minutes. You can decrease this timeout if necessary.
   > 
   > The other is:
   > 
   > `maxAppenderCount`, which is unlimited by default. I think you may want to set it to a fixed number during the stress test, like 10000?
   > 
   > @jepsonzhang can you try these 2 settings in your stress test?
   
   ![image](https://user-images.githubusercontent.com/18161585/92206609-5f64f100-eeba-11ea-98e6-1f042e54db9b.png)
   I changed the worker's conf file and then restarted the DolphinScheduler cluster. I'm waiting for the test results and will report back later.





[GitHub] [incubator-dolphinscheduler] Baoqi commented on issue #3618: [Bug][worker] Too many open files

Posted by GitBox <gi...@apache.org>.
Baoqi commented on issue #3618:
URL: https://github.com/apache/incubator-dolphinscheduler/issues/3618#issuecomment-686325576


   This issue seems to be related to the settings of Logback's SiftingAppender. Please check http://logback.qos.ch/manual/appenders.html#SiftingAppender

