Posted to solr-user@lucene.apache.org by Samuel Garcia Martinez <sa...@inditex.com> on 2020/04/13 20:08:36 UTC

SolrJ connection leak with SolrCloud and Jetty Gzip compression enabled

Hi!

Today I saw a weird issue in production workloads after gzip compression was enabled. After a few minutes, the client app ran out of connections and stopped responding.

The cluster setup is pretty simple:
Solr version: 7.7.2
Solr cloud enabled
Cluster topology: 6 nodes, 1 single collection, 10 shards and 3 replicas. 1 HTTP LB using Round Robin over all nodes
All cluster nodes have gzip enabled for all paths, all HTTP verbs and all MIME types.
Solr client: HttpSolrClient targeting the HTTP LB

Problem description: when the Solr node that receives the request has to forward it to a Solr node that can actually perform the query, the response headers are copied incorrectly into the client response, causing the SolrJ client to fail and never release the connection back to the pool.

To simplify the case, let's start from the following repro scenario:

  *   Start one node in cloud mode on port 8983
  *   Create a single collection (1 shard, 1 replica)
  *   Start another node on port 8984, pointing at the previously started ZooKeeper (-z localhost:9983)
  *   Start a Java application and query the cluster through the node on port 8984 (the one that doesn't host the collection)

The sequence of events is:

  *   The application queries node:8984 with compression enabled ("Accept-Encoding: gzip") and wt=javabin
  *   Node:8984 can't perform the query itself and makes an HTTP request behind the scenes to node:8983
  *   Node:8983 returns a gzipped response with "Content-Encoding: gzip" and "Content-Type: application/octet-stream"
  *   Node:8984 adds the "Content-Encoding: gzip" value to the response as its character encoding (it should be forwarded as a "Content-Encoding" header, not as the character encoding)
  *   HttpSolrClient receives "Content-Type: application/octet-stream;charset=gzip", which causes an exception.
  *   HttpSolrClient tries to quietly close the connection, but since the stream is broken, Utils.consumeFully fails to actually consume the entity (it throws another exception in GzipDecompressingEntity#getContent() with "not in GZIP format")
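
To make the header mix-up in the steps above concrete, here is a minimal sketch in plain Java. FakeResponse is a hypothetical stand-in for a servlet response (it is not HttpServletResponse and not Solr code); it only assumes the usual container behaviour that a character encoding set on the response ends up as the ";charset=" parameter of Content-Type.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class HeaderForwardingSketch {
    // Hypothetical stand-in for a servlet response; models only what the sketch needs.
    static final class FakeResponse {
        private final String mimeType;
        private String characterEncoding; // rendered as the ";charset=" parameter
        private final Map<String, String> headers = new LinkedHashMap<>();

        FakeResponse(String mimeType) { this.mimeType = mimeType; }

        void setCharacterEncoding(String encoding) { this.characterEncoding = encoding; }
        void setHeader(String name, String value) { headers.put(name, value); }
        String getHeader(String name) { return headers.get(name); }

        // The Content-Type value the client would see.
        String contentTypeHeader() {
            return characterEncoding == null
                    ? mimeType
                    : mimeType + ";charset=" + characterEncoding;
        }
    }

    // Buggy forwarding: the upstream Content-Encoding value is stored as the charset.
    static FakeResponse forwardBuggy(String upstreamContentEncoding) {
        FakeResponse resp = new FakeResponse("application/octet-stream");
        resp.setCharacterEncoding(upstreamContentEncoding);
        return resp;
    }

    // Intended forwarding: the value is passed on as a Content-Encoding header.
    static FakeResponse forwardFixed(String upstreamContentEncoding) {
        FakeResponse resp = new FakeResponse("application/octet-stream");
        resp.setHeader("Content-Encoding", upstreamContentEncoding);
        return resp;
    }

    public static void main(String[] args) {
        System.out.println("buggy Content-Type: " + forwardBuggy("gzip").contentTypeHeader());
        System.out.println("fixed Content-Type: " + forwardFixed("gzip").contentTypeHeader());
        System.out.println("fixed Content-Encoding: " + forwardFixed("gzip").getHeader("Content-Encoding"));
    }
}
```

In the buggy variant the client sees a charset of "gzip" and no Content-Encoding header at all, which matches the symptoms described in the last two steps.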

The exception thrown by HttpSolrClient is:
java.nio.charset.UnsupportedCharsetException: gzip
               at java.nio.charset.Charset.forName(Charset.java:531)
               at org.apache.http.entity.ContentType.create(ContentType.java:271)
               at org.apache.http.entity.ContentType.create(ContentType.java:261)
               at org.apache.http.entity.ContentType.parse(ContentType.java:319)
               at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:591)
               at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:255)
               at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:244)
               at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:194)
               at org.apache.solr.client.solrj.SolrClient.query(SolrClient.java:1015)
               at org.apache.solr.client.solrj.SolrClient.query(SolrClient.java:1031)
               at org.apache.solr.client.solrj.SolrClient$$FastClassBySpringCGLIB$$7fcf36a0.invoke(<generated>)
               at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:218)
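
The top frame of that trace can be reproduced with the JDK alone, without Solr or HttpClient on the classpath. This is only a sketch of the root cause: the class and method names are made up for illustration, and it assumes the charset parameter is eventually resolved through Charset.forName, as the stack trace suggests.

```java
import java.nio.charset.Charset;
import java.nio.charset.UnsupportedCharsetException;

class CharsetRepro {
    // Describes what happens when a Content-Type charset parameter is resolved.
    static String resolve(String charsetName) {
        try {
            return "ok: " + Charset.forName(charsetName).name();
        } catch (UnsupportedCharsetException e) {
            // "gzip" is a syntactically legal charset name, but no such charset exists.
            return "unsupported: " + e.getCharsetName();
        }
    }

    public static void main(String[] args) {
        System.out.println(resolve("UTF-8"));
        System.out.println(resolve("gzip"));
    }
}
```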

I can see three different problems here:

  *   HttpSolrCall should not use HttpServletResponse#setCharacterEncoding to set the Content-Encoding header. This is obviously a mistake.
  *   HttpSolrClient, specifically HttpClientUtil, should be modified so that when the Content-Encoding header lies about the actual content, an exception is still thrown but the connection is not leaked forever.
  *   HttpSolrClient should allow clients to customize HttpClient's connectionRequestTimeout, so that the application is not prevented from responding to other incoming requests because every request that uses Solr is blocked forever waiting for a connection that will never be freed.

I think the first two points are bugs and the third one is a feature improvement. Unless I missed something, I'll file the two bugs and provide a patch for them. The same goes for the feature improvement.
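
The second point can be illustrated with a small model. This is a sketch, not SolrJ code: Pool is a hypothetical connection pool modelled as a Semaphore, and consumeBrokenEntity stands in for draining a response whose body is not actually gzip. The only real behaviour relied on is that the JDK's GZIPInputStream rejects non-gzip input with an IOException.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.Semaphore;
import java.util.zip.GZIPInputStream;

class LeakSketch {
    // Hypothetical pool: each permit represents one pooled connection.
    static final class Pool {
        private final Semaphore permits;
        Pool(int size) { permits = new Semaphore(size); }
        void lease() { permits.acquireUninterruptibly(); }
        void release() { permits.release(); }
        int available() { return permits.availablePermits(); }
    }

    // Stands in for consuming an entity that claims to be gzip but is not.
    static void consumeBrokenEntity() throws IOException {
        byte[] notGzip = "plain bytes".getBytes(StandardCharsets.UTF_8);
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(notGzip))) {
            while (in.read() != -1) { /* drain */ }
        }
    }

    // Leaky: the connection is only returned when consuming succeeds.
    static void requestLeaky(Pool pool) {
        pool.lease();
        try {
            consumeBrokenEntity();
            pool.release();
        } catch (IOException ignored) {
            // "quiet" close swallows the failure and the permit is never returned
        }
    }

    // Safe: the connection is returned whether or not consuming succeeds.
    static void requestSafe(Pool pool) {
        pool.lease();
        try {
            consumeBrokenEntity();
        } catch (IOException ignored) {
            // the failure can still be reported to the caller in real code
        } finally {
            pool.release();
        }
    }

    public static void main(String[] args) {
        Pool leaky = new Pool(2);
        requestLeaky(leaky);
        System.out.println("leaky pool, connections available: " + leaky.available());

        Pool safe = new Pool(2);
        requestSafe(safe);
        System.out.println("safe pool, connections available: " + safe.available());
    }
}
```

With the leaky variant every failed request permanently removes one connection from the pool, so a pool of any size is eventually exhausted.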






Re: SolrJ connection leak with SolrCloud and Jetty Gzip compression enabled

Posted by Samuel Garcia Martinez <sa...@inditex.com>.
I created two different issues: one for the Content-Type issue in the server and another one for the reliability issue in the SolrClient for unexpected/malformed responses.

ContentType issue: https://issues.apache.org/jira/browse/SOLR-14456
Client issue: https://issues.apache.org/jira/browse/SOLR-14457
________________________________
From: Jason Gerlowski <ge...@gmail.com>
Sent: Wednesday, April 22, 2020 4:43 PM
To: solr-user@lucene.apache.org <so...@lucene.apache.org>
Subject: Re: SolrJ connection leak with SolrCloud and Jetty Gzip compression enabled

Hi Samuel,

Thanks for the very detailed description of the problem here.  Very
thorough!  I don't think you're missing anything obvious, please file the
jira tickets if you haven't already.

Best,

Jason


Re: SolrJ connection leak with SolrCloud and Jetty Gzip compression enabled

Posted by Jason Gerlowski <ge...@gmail.com>.
Hi Samuel,

Thanks for the very detailed description of the problem here.  Very
thorough!  I don't think you're missing anything obvious, please file the
jira tickets if you haven't already.

Best,

Jason


Re: SolrJ connection leak with SolrCloud and Jetty Gzip compression enabled

Posted by Samuel Garcia Martinez <sa...@inditex.com>.
Reading the last two paragraphs again, I realized that those two in particular are very poorly worded (grammar 😓). I have tried to rephrase them and correct some of the errors below.

Here I can see three different problems:

* HttpSolrCall should not use HttpServletResponse#setCharacterEncoding to set the Content-Encoding header. This is obviously a mistake.
* HttpSolrClient, specifically HttpClientUtil, should be modified so that the connection is not leaked forever when the Content-Encoding header lies about the actual content. It should still throw the exception, though.
* HttpSolrClient should allow clients to customize HttpClient's connectionRequestTimeout, so that the application is not blocked forever waiting for a connection to become available. This way, the application could still respond to requests that don't use Solr, instead of rejecting every incoming request because all threads are blocked forever waiting for a connection that will never become available.

I think the first two points are bugs that should be fixed. The third one is a feature improvement to me.

Unless I missed something, I'll file the two bugs and provide a patch for them. The same goes for the feature improvement.
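
For the third point, the effect of a connection request timeout can be sketched with a Semaphore standing in for the pool. The names below are hypothetical and this is not how HttpSolrClient is implemented; in Apache HttpClient 4.x the corresponding setting is RequestConfig.Builder#setConnectionRequestTimeout, which is the value the post asks to make configurable.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

class LeaseTimeoutSketch {
    // Tries to lease a connection (a permit) and gives up after timeoutMillis.
    // Returns true if a connection was leased, false if the wait timed out.
    static boolean leaseWithTimeout(Semaphore pool, long timeoutMillis) {
        try {
            return pool.tryAcquire(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return false;
        }
    }

    public static void main(String[] args) {
        // A pool with no permits models the state after every connection has leaked.
        Semaphore exhausted = new Semaphore(0);
        boolean leased = leaseWithTimeout(exhausted, 50);
        System.out.println(leased ? "leased a connection" : "timed out, the request can fail fast");
    }
}
```

An unbounded wait on the same exhausted pool would never return, which is the situation where all application threads end up blocked.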


