Posted to user@kylin.apache.org by Andras Nagy <an...@gmail.com> on 2020/07/21 12:42:53 UTC

Connection leak when using S3

Dear All,

We have run into an issue where, after an extended uptime, both the Kylin
query server and the jobs running on EMR stop working. The root cause on
both sides is this exception:

Caused by: java.io.IOException: com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.SdkClientException: Unable to execute HTTP request: Timeout waiting for connection from pool
        at com.amazon.ws.emr.hadoop.fs.s3n2.S3NativeFileSystem2.getFileStatus(S3NativeFileSystem2.java:257) ~[emrfs-hadoop-assembly-2.37.0.jar:?]

In our setup, S3 is used both for intermediate data storage and for
persistence under HBase.

Following
https://aws.amazon.com/premiumsupport/knowledge-center/emr-timeout-connection-wait/
we increased the connection pool size (the fs.s3.maxConnections property) to
10,000, but this only delays the issue, so the underlying cause is likely a
connection leak. The fact that restarting the Kylin service resolves the
problem also points to a leak.
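
The reasoning above (a larger pool only postpones the failure if connections are never returned) can be illustrated with a small simulation. This is a toy sketch, not EMRFS code: BoundedPool, PoolTimeout and requests_until_exhausted are hypothetical names, and the "leak" is simply a request that never releases its connection.

```python
import queue


class PoolTimeout(Exception):
    """Raised when no connection becomes free within the timeout."""


class BoundedPool:
    """Toy fixed-size connection pool (hypothetical, for illustration only)."""

    def __init__(self, size):
        self._free = queue.Queue()
        for conn_id in range(size):
            self._free.put(conn_id)

    def acquire(self, timeout=0.05):
        # Block until a connection is free or the timeout expires.
        try:
            return self._free.get(timeout=timeout)
        except queue.Empty:
            raise PoolTimeout("Timeout waiting for connection from pool") from None

    def release(self, conn_id):
        self._free.put(conn_id)


def requests_until_exhausted(pool_size, leak_every, limit=100_000):
    """Count the successful requests before the pool runs dry.

    Every `leak_every`-th request "forgets" to release its connection,
    standing in for a stream that is opened but never closed.
    """
    pool = BoundedPool(pool_size)
    for n in range(1, limit + 1):
        try:
            conn = pool.acquire()
        except PoolTimeout:
            return n - 1
        if n % leak_every != 0:
            pool.release(conn)  # well-behaved request
        # else: leaked, the connection is never returned to the pool
    return limit


if __name__ == "__main__":
    print(requests_until_exhausted(10, 5))   # 50
    print(requests_until_exhausted(100, 5))  # 500
```

In this model a pool ten times larger survives exactly ten times as many requests and then fails with the same timeout, which matches the observation that raising the limit delays the problem without removing it.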

We opened a ticket for this issue:
https://issues.apache.org/jira/browse/KYLIN-4500
A full stack trace from the QueryService is attached to the ticket.

Since this is seriously affecting our production service, any hint would be
much appreciated. Is there any chance someone could look into this?

Many thanks,
Andras

Re: Connection leak when using S3

Posted by Andras Nagy <an...@gmail.com>.
Hi Xiaoxiang,

Thank you, this indeed seems to be related to what we have.
> In a cloud environment, an FD leak turns into a connection leak, am I
right?
Yes, that sounds plausible. We will check with netstat.
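
A minimal sketch of how the file descriptor side could be checked alongside netstat, assuming a Linux host where /proc is available. fd_summary is a hypothetical helper, not part of Kylin or EMR; for a process other than the current one, pass its PID and run as the same user or as root.

```python
import os


def fd_summary(pid="self"):
    """Return (total_fds, socket_fds) for a process, using Linux /proc.

    Each entry in /proc/<pid>/fd is a symlink; sockets resolve to a
    target of the form "socket:[inode]".
    """
    fd_dir = f"/proc/{pid}/fd"
    total = 0
    sockets = 0
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            continue  # descriptor was closed between listdir and readlink
        total += 1
        if target.startswith("socket:"):
            sockets += 1
    return total, sockets


if __name__ == "__main__":
    total, sockets = fd_summary()
    print(f"open fds: {total}, of which sockets: {sockets}")
```

Sampling these two numbers periodically for the Kylin process would show whether the socket count keeps growing over time, which is what a leak would look like.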

Thanks again, best regards,
Andras


On Tue, Jul 21, 2020 at 3:55 PM Xiaoxiang Yu <xx...@apache.org> wrote:

> Dear sir,
>   If you are using Real-time OLAP, you may want to check this issue:
> https://issues.apache.org/jira/browse/KYLIN-4396 (patch:
> https://github.com/apache/kylin/pull/1134). It is an FD (file descriptor)
> leak that I found early this year. In a cloud environment, an FD leak turns
> into a connection leak, am I right?
>   If you think it is a connection leak caused by something else, please
> share your network statistics, for example the output of
> "netstat -anp".
>   Good luck to you!

Re: Connection leak when using S3

Posted by Xiaoxiang Yu <xx...@apache.org>.
Dear sir,
  If you are using Real-time OLAP, you may want to check this issue: https://issues.apache.org/jira/browse/KYLIN-4396 (patch: https://github.com/apache/kylin/pull/1134). It is an FD (file descriptor) leak that I found early this year. In a cloud environment, an FD leak turns into a connection leak, am I right?
  If you think it is a connection leak caused by something else, please share your network statistics, for example the output of "netstat -anp".
  Good luck to you!
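
As a rough illustration of what to look for in the "netstat -anp" output, the sketch below counts TCP connection states for one process and one remote port. It assumes the usual Linux net-tools column layout; count_tcp_states is a hypothetical helper, and SAMPLE is made-up data, not output from the affected cluster.

```python
from collections import Counter


def count_tcp_states(netstat_text, pid=None, remote_port=None):
    """Count TCP connection states in `netstat -anp` output.

    Optionally keep only rows owned by `pid` and/or whose foreign
    address ends in `remote_port`.
    """
    counts = Counter()
    for line in netstat_text.splitlines():
        fields = line.split(None, 6)
        if len(fields) < 7 or not fields[0].startswith("tcp"):
            continue  # header, UDP, UNIX sockets, or malformed row
        _proto, _recvq, _sendq, _local, foreign, state, owner = fields
        if pid is not None and owner.split("/", 1)[0] != str(pid):
            continue
        if remote_port is not None and foreign.rsplit(":", 1)[-1] != str(remote_port):
            continue
        counts[state] += 1
    return counts


# Made-up sample in the net-tools layout (PID 4242 stands in for Kylin).
SAMPLE = """\
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 0.0.0.0:7070            0.0.0.0:*               LISTEN      4242/java
tcp        0      0 10.0.0.5:41000          52.216.0.10:443         ESTABLISHED 4242/java
tcp        1      0 10.0.0.5:41002          52.216.0.10:443         CLOSE_WAIT  4242/java
tcp        1      0 10.0.0.5:41004          52.216.0.11:443         CLOSE_WAIT  4242/java
tcp        0      0 10.0.0.5:22             10.0.0.9:51234          ESTABLISHED 900/sshd
"""

if __name__ == "__main__":
    print(count_tcp_states(SAMPLE, pid=4242, remote_port=443))
```

A steadily growing number of CLOSE_WAIT entries towards port 443 would mean the remote side has closed the connection while the local process has not, which is a common sign of sockets that are never closed.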

--
Best wishes to you!
From: Xiaoxiang Yu
