Posted to solr-user@lucene.apache.org by Luthien Dulk <ma...@europeana.eu> on 2019/02/21 11:03:37 UTC

Trying to enable HTTP gzip compression

hi all,

I was wondering if anyone could point me in the right direction. 

I am looking into whether enabling Gzip HTTP compression for our Solr clusters (all running Solr 6.6.5) would help performance; my problem is that I can’t figure out how to do that.

Our infrastructure setup is like this: our applications run on a Cloud Foundry PaaS environment, but our Solr clusters run elsewhere.
Communication between the applications and the Solr clusters is secured by firewalls on every Solr machine (we do have a SOCKS proxy set up in the CF environment, but unfortunately we can't use that for Solr because of an incompatibility between ZooKeeper and Java NIO - much to the chagrin of our sysadmin).

We think that HTTP compression might be very interesting for us because of the high volume of traffic between two separate environments.

Here’s what I found out so far: 

(re. config changes in Solr’s embedded Jetty)
- I’m aware that this is mostly a matter of configuring Jetty;
- it seems that this should preferably be set in the solr-jetty-context.xml file;
- this seems to relate to enabling Jetty's “GzipHandler”

(re. gzip ‘module’ activation ..?)
- it puzzles me that https://aroratimus.blogspot.com/2017/08/jettyserver-9.html mentions that Jetty’s GzipHandler should be enabled using two files not found in Solr's embedded Jetty: server/etc/jetty-gzip.xml and server/modules/gzip.mod (they are available when installing Jetty separately though);
- apparently, Jetty's Gzip module should be activated by adding --add-to-start=gzip to the server startup command. For the embedded Jetty in Solr, it seems that this would require changing the Solr startup script

(re. changes in Solr client)
- the calling application should send the HTTP header Accept-Encoding: gzip, deflate (according to https://menelic.com/2015/12/04/deploying-solrcloud-across-multiple-data-centers-dc-performance/)
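
As an illustration of the client-side change, here is a minimal sketch using only the Python standard library. The function names and the URL in the closing comment are made up for this example, and it advertises only gzip (not deflate) to keep the decoding unambiguous. It is not how SolrJ or any particular Solr client does it.

```python
import gzip
import urllib.request


def build_request(url):
    """Build a GET request that tells the server we accept gzip responses."""
    return urllib.request.Request(url, headers={"Accept-Encoding": "gzip"})


def decode_body(raw, content_encoding):
    """Undo gzip if the server applied it; otherwise return the bytes as-is."""
    if (content_encoding or "").strip().lower() == "gzip":
        return gzip.decompress(raw)
    return raw


def fetch(url, timeout=10):
    """Send the request and return the decoded response body as bytes."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
        return decode_body(resp.read(), resp.headers.get("Content-Encoding"))

# Hypothetical usage (host and collection name are placeholders):
# body = fetch("http://localhost:8983/solr/mycollection/select?q=*:*")
```

Note that urllib does not decompress responses on its own, which is why the sketch checks the Content-Encoding header explicitly.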


I wonder, has anyone ever got this working? In particular:

- is that gzip ‘module’ activation necessary? That would seem a bit far-fetched, because it involves files not found in the Solr installation and possibly hacking the Solr startup script;
- what did you add to solr-jetty-context.xml in order to enable gzip compression?


I suppose that situations with high volumes of external network traffic between Solr and Client must be quite rare. Otherwise I’d think that a feature that potentially offers such obvious benefits (one of the pages above mentions a drop of 75% of network traffic and a 60% faster response time) would have been turned into an “enable http compression yes/no” setting by now :)

Anyhow, we’re stuck with it … I hope I can get it working.


Thanks in advance for any advice!

Luthien
API developer
Europeana.eu






-- 
Disclaimer: This email and any files transmitted with it are confidential 
and intended solely for the use of the individual or entity to whom they 
are
addressed. If you have received this email in error please notify the 
system manager. If you are not the named addressee you should not 
disseminate,
distribute or copy this email. Please notify the sender 
immediately by email if you have received this email by mistake and delete 
this email from your
system.

(erratum) Trying to enable HTTP gzip compression

Posted by Luthien Dulk <ma...@europeana.eu>.
Hi, there was an error in my description of how to enable HTTP compression: the “--add-to-start=gzip” command should NOT be executed from the Solr root directory, but from the server/ (Jetty) directory. Otherwise the start.ini file that the command produces ends up in the wrong place.
I guess this also settles whether the setting persists or not :)

Corrected description below.


(using Solr 6.6.5)

1) add a gzip configuration file called jetty-gzip.xml in server/etc/
The default values provided in the standalone Jetty installation are OK, but make sure to include only properties listed under “gzip configuration” for the appropriate Jetty version (e.g.: https://www.eclipse.org/jetty/documentation/9.3.25.v20180904/gzip-filter.html).
The jetty-gzip.xml taken from my Jetty v.9.4.6 installation contained some fields that prevented Solr’s embedded Jetty v.9.3.14 from starting.

2) add a file called gzip.mod to server/modules/
Our sysadmin provided me with this one; the only lines in it that are not commented out are:

- - - - - - -

[depend]
server

[xml]
etc/jetty-gzip.xml

- - - - - - -

3) from the server/ directory:
> java -jar start.jar --list-modules
- shows the installed modules, and if they are enabled or not

> java -jar start.jar --add-to-start=gzip
writes a start.ini file that activates the gzip module on startup

Then start Solr in the usual way (from the Solr root):
> bin/solr start
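
Before restarting, it can help to confirm that the files from the steps above are where Jetty expects them. The following is a small sketch, not part of Solr or Jetty; the helper name is invented, and the location of start.ini (directly under server/) is an assumption based on running the command from that directory.

```python
from pathlib import Path

# Paths relative to the Solr root, taken from the steps above. That start.ini
# lands in server/ is an assumption based on running the command from there.
EXPECTED_FILES = (
    "server/etc/jetty-gzip.xml",
    "server/modules/gzip.mod",
    "server/start.ini",
)


def missing_gzip_files(solr_root):
    """Return the expected files that do not exist under solr_root."""
    root = Path(solr_root)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]
```

Calling missing_gzip_files with the Solr installation directory returns an empty list once all three files are in place.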


Thanks,
Lúthien

Re: Trying to enable HTTP gzip compression

Posted by Luthien Dulk <ma...@europeana.eu>.
Hi Walter,

You’re right, this is going nowhere. 
We thought that the bottleneck might be the HTTP connection between the API running on Cloud Foundry and the Solr cluster on an external host.
Potentially saving bandwidth there seemed (at first glance) too good an option not to look into.

I just would have liked to see that confirmed in a performance test, but after discussing it again with our sysadmin I don’t want to waste any more of his or my time trying to make that work. 
Ah well, at least I learned a few things about Solr :)

Thanks for your comments. 

Lúthien



> On 27 Feb 2019, at 17:08, Walter Underwood <wu...@wunderwood.org> wrote:
> 
> I really do not expect it to make anything faster. I think you are wasting your time. Compression also adds some latency because the compression happens before data is sent out. 
> 
> If your CPUs are idle, that is a red flag for performance. In every one of our clusters, CPU is the limiting factor in both latency and throughput. Our largest production cluster is 32 nodes, each with 36 CPUs.
> 
> Where is the bottleneck? Are the processes waiting on disk? If they are, you need more RAM. Do you have magnetic disks? Get SSDs.
> 
> You should have enough RAM to hold the index in memory, after allowing for the Solr JVM, kernel, and other processes.



Re: Trying to enable HTTP gzip compression

Posted by Walter Underwood <wu...@wunderwood.org>.
I really do not expect it to make anything faster. I think you are wasting your time. Compression also adds some latency because the compression happens before data is sent out. 

If your CPUs are idle, that is a red flag for performance. In every one of our clusters, CPU is the limiting factor in both latency and throughput. Our largest production cluster is 32 nodes, each with 36 CPUs.

Where is the bottleneck? Are the processes waiting on disk? If they are, you need more RAM. Do you have magnetic disks? Get SSDs.

You should have enough RAM to hold the index in memory, after allowing for the Solr JVM, kernel, and other processes.
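
The latency argument above can be put into rough numbers. This is a back-of-the-envelope sketch only: the function name and the formula are a simplification introduced here (it ignores TCP behaviour, streaming and queueing), and every figure fed into it would have to come from your own measurements.

```python
def net_latency_change_ms(raw_bytes, compressed_bytes,
                          compress_ms, decompress_ms, bandwidth_mbit_s):
    """Estimate the change in response time from enabling compression.

    A negative result means compression makes the round trip faster.
    Simplified model: transfer time is just size divided by bandwidth.
    """
    saved_bytes = raw_bytes - compressed_bytes
    # bits saved / (bits per second) gives seconds; convert to milliseconds
    transfer_saved_ms = saved_bytes * 8 / (bandwidth_mbit_s * 1_000_000) * 1000
    return (compress_ms + decompress_ms) - transfer_saved_ms
```

With made-up inputs (a 1,000,000-byte response shrinking to 250,000 bytes, 10 ms to compress, 2 ms to decompress), the estimate is about -48 ms on a 100 Mbit/s link but about +11.4 ms on a 10 Gbit/s link, so the outcome depends heavily on how constrained the network actually is.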

wunder
Walter Underwood
wunder@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Feb 27, 2019, at 2:44 AM, Luthien Dulk <ma...@europeana.eu> wrote:
> 
> Hi Walter and Jörn,
> 
> thanks for your suggestions! I will keep them in mind.
> 
> According to our sysadmin, the CPUs on the Solr nodes are “doing basically nothing”, so that’s a plentiful resource in our case. We’re most interested in reducing the response time of the whole chain, which (for search API requests) involves a round trip to the Solr cluster hosted at another location.
> 
> I don’t expect all that much from HTTP compression either, but I am nonetheless interested to see what happens in a performance test on one node with and without gzip enabled, using a copy of the full dataset.
> I’ll share the results of that. 


Re: Trying to enable HTTP gzip compression

Posted by Luthien Dulk <ma...@europeana.eu>.
Hi Walter and Jörn,

thanks for your suggestions! I will keep them in mind.

According to our sysadmin, the CPUs on the Solr nodes are “doing basically nothing”, so that’s a plentiful resource in our case. We’re most interested in reducing the response time of the whole chain, which (for search API requests) involves a round trip to the Solr cluster hosted at another location.

I don’t expect all that much from HTTP compression either, but I am nonetheless interested to see what happens in a performance test on one node with and without gzip enabled, using a copy of the full dataset.
I’ll share the results of that. 

We did manage to figure out how to enable the compression, by piecing together bits found here and there, with a fair bit of trial, teeth-gnashing and error.
Here’s how to do it; maybe it will save someone else some time:

(using Solr 6.6.5)

1) add a gzip configuration file called jetty-gzip.xml in server/etc/
The default values provided in the standalone Jetty installation are OK, but make sure to include only properties listed under “gzip configuration” for the appropriate Jetty version (e.g.: https://www.eclipse.org/jetty/documentation/9.3.25.v20180904/gzip-filter.html).
The jetty-gzip.xml taken from my Jetty v.9.4.6 installation contained some fields that prevented Solr’s embedded Jetty v.9.3.14 from starting.

2) add a file called gzip.mod to server/modules/
Our sysadmin provided me with this one; the only lines in it that are not commented out are:

- - - - - - -

[depend]
server

[xml]
etc/jetty-gzip.xml

- - - - - - -

3) from the Solr root:
> java -jar server/start.jar --list-modules
- shows the installed modules, and if they are enabled or not

> java -jar server/start.jar --add-to-start=gzip
will activate the gzip module

Start Solr in the usual way 
> bin/solr start

and the response then contains the Content-Encoding: gzip header.
I don’t know yet whether that compression setting persists across Solr restarts; it doesn’t feel very solid. But for this test it'll do.
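
To check the header and the size reduction for a given response, something along these lines could be used. It is a sketch with an invented helper name; it expects the response headers as a plain mapping and the body bytes exactly as received, from a request that was sent with Accept-Encoding: gzip.

```python
import gzip


def describe_response(headers, raw_body):
    """Summarise whether a response was gzip-compressed and by how much.

    headers: mapping of response header names to values
    raw_body: the bytes exactly as received, before any decoding
    """
    # Header names are case-insensitive, so normalise before the lookup.
    lowered = {k.lower(): v for k, v in headers.items()}
    is_gzip = lowered.get("content-encoding", "").strip().lower() == "gzip"
    decoded = gzip.decompress(raw_body) if is_gzip else raw_body
    return {
        "gzip": is_gzip,
        "bytes_on_wire": len(raw_body),
        "bytes_decoded": len(decoded),
    }
```

Comparing bytes_on_wire with bytes_decoded for a few typical queries gives a first impression of how much traffic compression would actually save.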

Thanks,
Lúthien


> On 21 Feb 2019, at 15:38, Walter Underwood <wu...@wunderwood.org> wrote:
> 
> Years ago we did some testing with HTTP compression for search results with the Ultraseek search engine. It wasn’t faster. It was sometimes slower.
> 
> Once you have enough RAM, search is a CPU-limited problem. HTTP compression uses more CPU to save network bandwidth. But search isn’t limited by network bandwidth, so this uses more of the bottleneck resource (CPU) to reduce usage of a plentiful resource (network bandwidth).
> 
> Look at the amount of data going in and out of your nodes. I bet it is far below the maximum.


Re: Trying to enable HTTP gzip compression

Posted by Walter Underwood <wu...@wunderwood.org>.
Years ago we did some testing with HTTP compression for search results with the Ultraseek search engine. It wasn’t faster. It was sometimes slower.

Once you have enough RAM, search is a CPU-limited problem. HTTP compression uses more CPU to save network bandwidth. But search isn’t limited by network bandwidth, so this uses more of the bottleneck resource (CPU) to reduce usage of a plentiful resource (network bandwidth).

Look at the amount of data going in and out of your nodes. I bet it is far below the maximum.
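
One way to put numbers on the CPU-versus-bandwidth trade-off described above is to time gzip on a representative payload. The sketch below uses a synthetic JSON document as a stand-in for a Solr response (an assumption; real responses will compress differently) and Python's gzip module rather than Jetty's GzipHandler, so it indicates the order of magnitude only.

```python
import gzip
import json
import time


def measure_gzip(payload, level=6):
    """Return (compressed_size / original_size, seconds spent compressing)."""
    start = time.perf_counter()
    compressed = gzip.compress(payload, compresslevel=level)
    elapsed = time.perf_counter() - start
    return len(compressed) / len(payload), elapsed


# Synthetic stand-in for a Solr JSON response; real documents will differ.
docs = [{"id": str(i), "title": "example title %d" % i} for i in range(1000)]
sample = json.dumps(
    {"response": {"numFound": len(docs), "docs": docs}}
).encode("utf-8")

ratio, seconds = measure_gzip(sample)
print("compressed to %.0f%% of original in %.2f ms" % (ratio * 100, seconds * 1000))
```

Running the same measurement on a captured real response, and multiplying the time by the request rate, shows how much CPU the compression would take away from search.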

wunder
Walter Underwood
wunder@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Feb 21, 2019, at 6:07 AM, Jörn Franke <jo...@gmail.com> wrote:
> 
> You could also change the response writer from json to javabin to improve performance.
> Or increase the network bandwidth. Also, people often fetch more from Solr than they need; there is huge saving potential there. Increasing the number of cores available for HTTPS encryption can sometimes help.
> 
> Compression also leads to other issues (performance, but potentially also security-wise).


Re: Trying to enable HTTP gzip compression

Posted by Jörn Franke <jo...@gmail.com>.
You could also change the response writer from json to javabin to improve performance.
Or increase the network bandwidth. Also, people often fetch more from Solr than they need; there is huge saving potential there. Increasing the number of cores available for HTTPS encryption can sometimes help.

Compression also leads to other issues (performance, but potentially also security-wise).
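
To illustrate the "fetch less" and response-writer suggestions, here is a sketch that builds a Solr select URL with the standard q, fl, rows and wt parameters. The helper name, host and collection are placeholders. Note that javabin is a binary format meant for clients that can parse it (such as SolrJ), so the default here stays at json.

```python
from urllib.parse import urlencode


def build_select_url(base_url, query, fields, rows, writer="json"):
    """Build a Solr /select URL that asks only for what the caller needs."""
    params = {
        "q": query,
        "fl": ",".join(fields),  # return only these fields
        "rows": rows,            # cap the number of documents returned
        "wt": writer,            # response writer, e.g. "json" or "javabin"
    }
    return base_url.rstrip("/") + "/select?" + urlencode(params)
```

For example, build_select_url("http://localhost:8983/solr/mycollection", "title:test", ["id", "title"], 10) requests ten documents with two fields each instead of full documents.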

> On 21 Feb 2019, at 12:03, Luthien Dulk <ma...@europeana.eu> wrote:
> 
> hi all,
> 
> I was wondering if anyone could point me in the right direction. 
> 
> I am looking into whether enabling Gzip HTTP compression for our Solr clusters (all running Solr 6.6.5) would help performance; my problem is that I can’t figure out how to do that.