Posted to user@flink.apache.org by Robert Metzger <rm...@apache.org> on 2020/07/03 11:43:32 UTC

Re: datadog failed to send report

Hi,
Could this be another symptom of this issue:
https://issues.apache.org/jira/browse/FLINK-16611?

I guess you'll have to ask Datadog to check on their end. Maybe you are
running into a rate limit there?
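A rough estimate of the request volume can help judge whether a rate limit is plausible. This sketch rests on an unverified assumption: every JobManager and TaskManager process runs its own reporter instance and sends one HTTP request per reporting interval (`metrics.reporter.<name>.interval`). The cluster size and interval in `main` are hypothetical. If a single report is split into several requests, the real number is higher.

```java
/**
 * Rough estimate of HTTP requests per minute sent to Datadog by a Flink cluster.
 * Assumption (not verified against the reporter code): one reporter instance per
 * JobManager/TaskManager process and one request per reporting interval.
 */
final class DatadogRequestEstimate {

    static double requestsPerMinute(int reportingProcesses, double intervalSeconds) {
        if (reportingProcesses < 0 || intervalSeconds <= 0) {
            throw new IllegalArgumentException("processes must be >= 0 and interval > 0");
        }
        // Each process sends 60 / interval reports per minute.
        return reportingProcesses * (60.0 / intervalSeconds);
    }

    public static void main(String[] args) {
        // Hypothetical cluster: 1 JobManager + 20 TaskManagers, 10-second interval.
        System.out.println(requestsPerMinute(21, 10.0) + " requests/minute"); // 126.0 requests/minute
    }
}
```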

On Fri, Jun 26, 2020 at 5:42 PM seeksst <se...@163.com> wrote:

>
>
>  Original Message
> *From:* seeksst<se...@163.com>
> *To:* Fanbin Bu<fa...@coinbase.com>
> *Sent:* Friday, June 26, 2020 23:36
> *Subject:* Re: datadog failed to send report
>
> Hi, I’m sorry for not explaining it clearly and for misreading the exception.
> The warning is logged by DatadogHttpClient, so set that logger's level:
>
> log4j.logger.org.apache.flink.metrics.datadog.DatadogHttpClient=ERROR
>
> log4j.logger.org.apache.flink.runtime.metrics does not cover
> org.apache.flink.metrics; it only affects loggers under
> org.apache.flink.runtime.metrics.
>
>
> If it still doesn't work, note that there are several logging configuration
> files in the conf/ folder.
>
> Editing the configuration is how you control the log output. If the change
> has no effect, log4j.properties may not be the file that is actually in use.
>
> You can read this article for answers [1]. If you’re still not sure, you
> can change all of them, although a more granular configuration is recommended.
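The first setting had no effect because log4j categories are hierarchical. A logger takes the level of its nearest configured ancestor, and its ancestors are the dot-separated prefixes of its name. The sketch below models only that lookup rule in plain Java. It is a simplification for illustration, not log4j's own code, `effectiveLevel` is a hypothetical helper, and it assumes an INFO root level. It shows that `org.apache.flink.runtime.metrics` is not an ancestor of `org.apache.flink.metrics.datadog.DatadogHttpClient`.

```java
import java.util.Map;

/**
 * Simplified model of log4j 1.x level inheritance: a logger uses the level of
 * its nearest configured ancestor, where ancestors are the dot-separated
 * prefixes of its name; the root level applies if none is configured.
 */
final class LoggerHierarchy {

    static String effectiveLevel(String loggerName, Map<String, String> configured, String rootLevel) {
        String name = loggerName;
        while (true) {
            String level = configured.get(name);
            if (level != null) {
                return level;               // nearest configured category wins
            }
            int dot = name.lastIndexOf('.');
            if (dot < 0) {
                return rootLevel;           // no configured ancestor at all
            }
            name = name.substring(0, dot);  // move one level up the hierarchy
        }
    }

    public static void main(String[] args) {
        String client = "org.apache.flink.metrics.datadog.DatadogHttpClient";

        // First attempt: not an ancestor of the client, so the root level still applies.
        System.out.println(effectiveLevel(client,
                Map.of("org.apache.flink.runtime.metrics", "ERROR"), "INFO")); // INFO

        // Suggested setting: names the client's own category.
        System.out.println(effectiveLevel(client, Map.of(client, "ERROR"), "INFO")); // ERROR
    }
}
```

Configuring a shared ancestor such as `org.apache.flink.metrics` would also silence the client, but it would affect every other logger under that prefix as well. That is why the more granular setting is preferable.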
>
>
>
> I’m not familiar with Datadog (I use InfluxDB to collect metrics), but I
> think that if it can collect metrics and the network is not a problem, the
> bottleneck may be in processing the requests. I’m not sure, though. A
> SocketTimeoutException can occur in several situations:
>
> 1. The network is down.
>
> You think the network is OK.
>
> 2. Server processing is slow.
>
> Datadog may be handling many requests and cannot answer quickly.
>
> You can check the CPU usage of the Datadog machine. Sometimes it depends on
> the program, for example whether it uses a single thread to handle all
> requests (I don’t know how Datadog does this). If CPU usage is high, this
> may be the reason; if not, you need to find out more about Datadog.
>
> 3. Network transmission is slow.
>
> Check the network: whether the traffic is saturating the link, or whether
> the machines are physically far apart.
>
> You can also look for ways to adjust the timeout.
>
> 4. Your job frequently triggers full GC.
>
> You can check the GC log. Enabling it requires editing flink-conf.yaml
> with something like:
>
>        env.java.opts.taskmanager: -Xloggc:<LOG_DIR>/taskmanager-gc.log
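A long stop-the-world GC pause freezes every thread in the TaskManager, including the one waiting for Datadog's response, so a pause can push a request past its timeout. Besides the GC log, the JVM exposes cumulative per-collector counts and times through the standard `java.lang.management` API. The minimal sketch below reads them for its own JVM. Inspecting a running TaskManager this way would require JMX access. Flink also publishes comparable figures as its `Status.JVM.GarbageCollector.*` metrics.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

/** Prints cumulative GC counts and times for the JVM this code runs in. */
final class GcStats {

    /** Sum of collection times over all collectors in ms; undefined values (-1) are skipped. */
    static long totalCollectionTimeMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime();
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Names depend on the collector in use, e.g. "PS MarkSweep" for the
            // old generation of the JDK 8 Parallel collector.
            System.out.printf("%s: count=%d, time=%d ms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
        System.out.println("total GC time: " + totalCollectionTimeMillis() + " ms");
    }
}
```

If you sample these numbers periodically and the old-generation collector's count and time jump at the same moments the Datadog warnings appear, that would support the full-GC explanation.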
>
> Best wishes.
>
>
> [1]
> https://ci.apache.org/projects/flink/flink-docs-release-1.10/monitoring/logging.html
>
>  Original Message
> *From:* Fanbin Bu<fa...@coinbase.com>
> *To:* seeksst<se...@163.com>
> *Sent:* Friday, June 26, 2020 05:38
> *Subject:* Re: datadog failed to send report
>
> This does not help:
>
> log4j.logger.org.apache.flink.runtime.metrics=ERROR
>
>
> I believe all machines can reach the Datadog port via telnet, since other metrics are reported correctly.
>
> How do I check the request capacity?
>
>
> On Tue, Jun 23, 2020 at 11:32 PM seeksst <se...@163.com> wrote:
>
>> Hi,
>>
>>
>> If you don’t care about losing some metrics, you can edit
>> log4j.properties to suppress the warning:
>>
>> log4j.logger.org.apache.flink.runtime.metrics=ERROR
>>
>> BTW, can all machines reach the Datadog port via telnet?
>>
>> Does the number of requests exceed Datadog's processing capacity?
>>
>>
>>  Original Message
>> *From:* Fanbin Bu<fa...@coinbase.com>
>> *To:* user<us...@flink.apache.org>
>> *Sent:* Wednesday, June 24, 2020 12:05
>> *Subject:* datadog failed to send report
>>
>> Hi,
>>
>> Does anyone have any idea about the following error message? It flooded my
>> TaskManager log.
>> I do have Datadog metrics present, so this probably only happens for
>> some metrics.
>>
>> 2020-06-24 03:27:15,362 WARN  org.apache.flink.metrics.datadog.DatadogHttpClient            - Failed sending request to Datadog
>> java.net.SocketTimeoutException: timeout
>> 	at org.apache.flink.shaded.okio.Okio$4.newTimeoutException(Okio.java:227)
>> 	at org.apache.flink.shaded.okio.AsyncTimeout.exit(AsyncTimeout.java:284)
>> 	at org.apache.flink.shaded.okio.AsyncTimeout$2.read(AsyncTimeout.java:240)
>> 	at org.apache.flink.shaded.okio.RealBufferedSource.indexOf(RealBufferedSource.java:344)
>> 	at org.apache.flink.shaded.okio.RealBufferedSource.readUtf8LineStrict(RealBufferedSource.java:216)
>> 	at org.apache.flink.shaded.okio.RealBufferedSource.readUtf8LineStrict(RealBufferedSource.java:210)
>> 	at org.apache.flink.shaded.okhttp3.internal.http1.Http1Codec.readResponseHeaders(Http1Codec.java:189)
>> 	at org.apache.flink.shaded.okhttp3.internal.http.CallServerInterceptor.intercept(CallServerInterceptor.java:75)
>> 	at org.apache.flink.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
>> 	at org.apache.flink.shaded.okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:45)
>> 	at org.apache.flink.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
>> 	at org.apache.flink.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
>> 	at org.apache.flink.shaded.okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:93)
>> 	at org.apache.flink.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
>> 	at org.apache.flink.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
>> 	at org.apache.flink.shaded.okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.java:93)
>> 	at org.apache.flink.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
>> 	at org.apache.flink.shaded.okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.java:120)
>> 	at org.apache.flink.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
>> 	at org.apache.flink.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
>> 	at org.apache.flink.shaded.okhttp3.RealCall.getResponseWithInterceptorChain(RealCall.java:185)
>> 	at org.apache.flink.shaded.okhttp3.RealCall$AsyncCall.execute(RealCall.java:135)
>> 	at org.apache.flink.shaded.okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
>> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>> 	at java.lang.Thread.run(Thread.java:748)
>> Caused by: java.net.SocketException: Socket closed
>> 	at java.net.SocketInputStream.read(SocketInputStream.java:204)
>> 	at java.net.SocketInputStream.read(SocketInputStream.java:141)
>> 	at sun.security.ssl.InputRecord.readFully(InputRecord.java:465)
>> 	at sun.security.ssl.InputRecord.read(InputRecord.java:503)
>> 	at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:975)
>> 	at sun.security.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:933)
>> 	at sun.security.ssl.AppInputStream.read(AppInputStream.java:105)
>> 	at org.apache.flink.shaded.okio.Okio$2.read(Okio.java:138)
>> 	at org.apache.flink.shaded.okio.AsyncTimeout$2.read(AsyncTimeout.java:236)
>> 	... 23 more
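The trace shows the timeout firing in `Http1Codec.readResponseHeaders`, which suggests the request was sent but no response header arrived in time. OkHttp enforces that read timeout through an Okio watchdog that closes the socket, so the cause appears as `SocketException: Socket closed`. The sketch below reproduces the same exception class with plain `java.net` sockets instead of OkHttp. A local server accepts the connection but never answers, and the client's read gives up after the configured timeout. `readTimesOut` and the 200 ms value are illustrative choices, not Flink or Datadog settings.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

/** Reproduces a read timeout against a server that accepts but never replies. */
final class ReadTimeoutDemo {

    /** Returns true if the client's read timed out, false if data or EOF arrived. */
    static boolean readTimesOut(int readTimeoutMillis) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback);
             Socket client = new Socket(loopback, server.getLocalPort());
             Socket silentPeer = server.accept()) { // accepted, but nothing is ever written
            client.setSoTimeout(readTimeoutMillis); // client-side read timeout
            client.getInputStream().read();         // blocks waiting for a "response"
            return false;
        } catch (SocketTimeoutException e) {
            return true;                            // same exception class as in the log
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("read timed out: " + readTimesOut(200));
    }
}
```

From the client side, a silent peer like this looks the same whether the server is overloaded, the network is slow, or the response is delayed for another reason. The exception alone does not tell you which cause applies.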
>>
>>