
Posted to user@flink.apache.org by Claude M <cl...@gmail.com> on 2021/02/02 18:31:37 UTC

Flink Datadog Timeout

Hello,

I have a Flink jobmanager and taskmanagers deployed in a Kubernetes
cluster. I integrated it with Datadog by specifying the following in
flink-conf.yaml:

metrics.reporter.dghttp.class: org.apache.flink.metrics.datadog.DatadogHttpReporter
metrics.reporter.dghttp.apikey: <DD_API_KEY>

However, I'm seeing random timeouts in the log and don't know why they
occur or how to solve the issue. Please see the attached file showing
the error.


Thanks

Re: Flink Datadog Timeout

Posted by Chesnay Schepler <ch...@apache.org>.

The reported exception looks quite similar to the one in this thread 
<http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Datadog-reporter-timeout-amp-OOM-issue-tt40997.html#a41010>, 
which was supposedly caused by Datadog rate limits, but I don't think 
this was thoroughly investigated.
(Bear in mind that each container has its own reporter; with the default 
reporting interval of 10 seconds you quickly reach fairly high 
reports/second rates.)
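
As a rough illustration of that rate, assuming each report is sent as a
single HTTP request: a cluster with 1 jobmanager and 9 taskmanagers at the
default interval sends 10 reports every 10 seconds, i.e. about 1 request
per second. A minimal flink-conf.yaml sketch that lengthens the reporting
interval of the dghttp reporter, assuming the generic
metrics.reporter.<name>.interval option of your Flink version applies to
the Datadog reporter (the 60-second value is only an example):

```yaml
metrics.reporter.dghttp.class: org.apache.flink.metrics.datadog.DatadogHttpReporter
metrics.reporter.dghttp.apikey: <DD_API_KEY>
# Push metrics every 60 s instead of the default 10 s (example value; tune as needed)
metrics.reporter.dghttp.interval: 60 SECONDS
```

With the same 10 containers this works out to roughly 10 requests per
minute, about one every 6 seconds.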

Alternatively it could just be plain connectivity issues.

If the issues do not persist for a long time, however, then no metrics 
/should/ be lost, so you may be able to ignore them.
