Posted to dev@predictionio.apache.org by "Mars Hall (JIRA)" <ji...@apache.org> on 2017/07/18 19:35:00 UTC

[jira] [Updated] (PIO-106) Elasticsearch 5.x StorageClient should reuse RestClient

     [ https://issues.apache.org/jira/browse/PIO-106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mars Hall updated PIO-106:
--------------------------
    Description: 
When using the proposed [PIO-105 Batch Predictions|https://issues.apache.org/jira/browse/PIO-105] feature with an engine that queries Elasticsearch in {{Algorithm#predict}}, Elasticsearch's REST interface appears to become overloaded, and the Spark job is eventually killed with errors like:

{noformat}
[ERROR] [ESChannels] Failed to access to /pio_meta/channels/_search
[ERROR] [Utils] Aborting task
[ERROR] [ESApps] Failed to access to /pio_meta/apps/_search
[ERROR] [Executor] Exception in task 747.0 in stage 1.0 (TID 749)
[ERROR] [Executor] Exception in task 735.0 in stage 1.0 (TID 737)
[ERROR] [Common$] Invalid app name ur
[ERROR] [Utils] Aborting task
[ERROR] [URAlgorithm] Error when read recent events: java.lang.IllegalArgumentException: Invalid app name ur
[ERROR] [Executor] Exception in task 749.0 in stage 1.0 (TID 751)
[ERROR] [Utils] Aborting task
[ERROR] [Executor] Exception in task 748.0 in stage 1.0 (TID 750)
[WARN] [TaskSetManager] Lost task 749.0 in stage 1.0 (TID 751, localhost, executor driver): java.net.BindException: Can't assign requested address
  at sun.nio.ch.Net.connect0(Native Method)
  at sun.nio.ch.Net.connect(Net.java:454)
  at sun.nio.ch.Net.connect(Net.java:446)
  at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:648)
  at org.apache.http.impl.nio.reactor.DefaultConnectingIOReactor.processSessionRequests(DefaultConnectingIOReactor.java:273)
  at org.apache.http.impl.nio.reactor.DefaultConnectingIOReactor.processEvents(DefaultConnectingIOReactor.java:139)
  at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor.execute(AbstractMultiworkerIOReactor.java:348)
  at org.apache.http.impl.nio.conn.PoolingNHttpClientConnectionManager.execute(PoolingNHttpClientConnectionManager.java:192)
  at org.apache.http.impl.nio.client.CloseableHttpAsyncClientBase$1.run(CloseableHttpAsyncClientBase.java:64)
  at java.lang.Thread.run(Thread.java:745)
{noformat}

After these errors happen & the job is killed, Elasticsearch immediately recovers and responds to queries normally, which points to a client-side problem. The {{java.net.BindException: Can't assign requested address}} in the stack trace typically indicates exhaustion of ephemeral ports from opening a new TCP connection for every request. Researching what could cause this, I found an [old issue in the main Elasticsearch repo|https://github.com/elastic/elasticsearch/issues/3647]. Following the hints given therein about *using keep-alive in the ES client* to avoid these performance issues, I investigated how PredictionIO's [Elasticsearch StorageClient|https://github.com/apache/incubator-predictionio/tree/develop/storage/elasticsearch/src/main/scala/org/apache/predictionio/data/storage/elasticsearch] manages its connections.

I found that, unlike the other StorageClients (Elasticsearch1, HBase, JDBC), the Elasticsearch 5.x implementation creates a new underlying connection, an Elasticsearch RestClient, for [every|https://github.com/apache/incubator-predictionio/blob/develop/storage/elasticsearch/src/main/scala/org/apache/predictionio/data/storage/elasticsearch/ESApps.scala#L80] [single|https://github.com/apache/incubator-predictionio/blob/develop/storage/elasticsearch/src/main/scala/org/apache/predictionio/data/storage/elasticsearch/ESApps.scala#L157] [query|https://github.com/apache/incubator-predictionio/blob/develop/storage/elasticsearch/src/main/scala/org/apache/predictionio/data/storage/elasticsearch/ESChannels.scala#L78] & [interaction|https://github.com/apache/incubator-predictionio/blob/develop/storage/elasticsearch/src/main/scala/org/apache/predictionio/data/storage/elasticsearch/ESEngineInstances.scala#L205] with its API. As a result, *Elasticsearch TCP connections can never be reused via HTTP keep-alive*.

High-performance workloads with Elasticsearch 5.x will suffer from these issues unless we refactor the Elasticsearch StorageClient to share the underlying RestClient instead of [building a new one every time the client is used|https://github.com/apache/incubator-predictionio/blob/develop/storage/elasticsearch/src/main/scala/org/apache/predictionio/data/storage/elasticsearch/StorageClient.scala#L31].
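
For context, the problem boils down to a connect-use-close cycle around every request. Below is a simplified sketch of that pattern (the method and index names are illustrative, not the exact PIO source) showing why no connection ever lives long enough for keep-alive to help:

{code:scala}
import org.apache.http.HttpHost
import org.elasticsearch.client.RestClient

// Simplified sketch of the current per-call pattern: each storage call
// builds its own RestClient, performs one request, then closes it.
// Every call opens (and tears down) a fresh TCP connection, so HTTP
// keep-alive never gets a chance to reuse anything.
def lookupApps(): Unit = {
  val restClient = RestClient.builder(new HttpHost("localhost", 9200, "http")).build()
  try {
    restClient.performRequest("GET", "/pio_meta/apps/_search")
  } finally {
    restClient.close() // connection discarded immediately after a single request
  }
}
{code}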

There are several approaches we could take to sharing a RestClient so that its keep-alive behavior works as designed:

* maintain a singleton RestClient that is reused throughout the ES storage classes (a rough sketch of this option follows the list)
* create a RestClient on-demand and pass it as an argument to ES storage methods
* other ideas?
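
To make the singleton option concrete, here is a minimal sketch of what a shared holder could look like. This is only one possible shape for the refactor; {{SharedESRestClient}} and its methods are hypothetical names, not existing PredictionIO code:

{code:scala}
import org.apache.http.HttpHost
import org.elasticsearch.client.RestClient

// Hypothetical shared holder: one RestClient per JVM, lazily created and
// reused by all ES storage classes so TCP connections stay alive between
// requests (the low-level RestClient is thread-safe and pools connections).
object SharedESRestClient {
  @volatile private var client: Option[RestClient] = None

  def get(hosts: Seq[HttpHost]): RestClient = synchronized {
    client.getOrElse {
      val created = RestClient.builder(hosts: _*).build()
      client = Some(created)
      created
    }
  }

  // Close once on shutdown instead of after every request.
  def shutdown(): Unit = synchronized {
    client.foreach(_.close())
    client = None
  }
}
{code}

Callers would then fetch the shared instance, e.g. {{SharedESRestClient.get(Seq(new HttpHost("localhost", 9200, "http")))}}, and issue requests without closing the client after each one, leaving lifecycle management to a single shutdown hook. The on-demand option from the second bullet would look similar, except the caller constructs the RestClient and passes it into the ES storage methods, making ownership explicit.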


> Elasticsearch 5.x StorageClient should reuse RestClient
> -------------------------------------------------------
>
>                 Key: PIO-106
>                 URL: https://issues.apache.org/jira/browse/PIO-106
>             Project: PredictionIO
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.11.0-incubating
>            Reporter: Mars Hall
>



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)