Posted to user@couchdb.apache.org by Torstein Krause Johansen <to...@gmail.com> on 2011/05/25 09:01:07 UTC

Does CouchDB support gzip encoding?

Hi all,

first of all, thanks for a great product. I've been enjoying my first
weeks with CouchDB, its simplicity and RESTfulness as well as couchapp
allowing me to spread out the map, reduce & lists into separate JS
files. Wonderful. Now, after searching high and low on the web,
I'm still left with one big question:

<short>
Does Couch support gzip encoding? If yes, how can I enable it?
</short>

<longer>
My current problem is read performance: the returned data from my view
is 14M and takes 6-8 seconds to download because of its sheer size.
I've tried adding an Accept-Encoding: gzip,deflate header to my HTTP
client's requests, but to no avail; I still get plain text back. Is
there a way I can get gzipped content back from Couch?

A bit of background for these sizes: Each row looks like this:

{"id":"30b095cb2ec80ee789374be98736e2bc","key":[3,7766,"2011-05-24"],"value":434}

I first do a GET on this view to get the entries matching the first
and second integers plus the date (and the corresponding end date). My
second GET to Couch is then done for each of the values found in
"value", on which I do a _count/reduce to get the number of
occurrences of that number. The total time for an accumulated overview
of all values with counts (within the three constraints you can see in
the key) is 7-8 seconds.
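In curl terms, the two steps look roughly like this (I've used
placeholder database, design-doc and view names here):

```shell
# Step 1: fetch the rows whose composite key falls in the wanted range.
# Database, design-doc and view names below are placeholders.
base='http://localhost:5984/memento/_design/app/_view/by_key'
startkey='[3,7766,"2011-05-24"]'
endkey='[3,7766,"2011-05-31"]'
url="${base}?startkey=${startkey}&endkey=${endkey}"
echo "$url"
# curl -g "$url"
# Step 2, once per value found: count its occurrences via the reduce view.
# curl 'http://localhost:5984/memento/_design/app/_view/counts?key=434'
```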

Most of this time is spent getting the initial result and it's mostly
down to the time it takes to download the 14M of data, ~165,000 rows.
If these were delivered with Content-Encoding: gzip, the payload would
be around 1.1M (which is what running gzip on the downloaded content
suggests).
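A stand-alone way to reproduce that kind of estimate, using generated
stand-in rows rather than the real download:

```shell
# Stand-in for the real 14M download: many near-identical view rows.
for i in $(seq 1 1000); do
  echo '{"id":"30b095cb2ec80ee789374be98736e2bc","key":[3,7766,"2011-05-24"],"value":434}'
done > view.json
# Compare the raw size against the gzipped size; JSON this repetitive
# compresses extremely well.
raw=$(wc -c < view.json)
gz=$(gzip -c view.json | wc -c)
echo "raw: ${raw} bytes, gzipped: ${gz} bytes"
```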
</longer>

Another related question: is there a way to do bulk reads, just as we
can do bulk updates (which are great, by the way)? As others have
written before, it's the HTTP negotiation that is the real hurdle for
Couch performance (along with view checkpoint updates, of course).
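If I'm reading the docs right, POSTing a {"keys": [...]} body to
_all_docs (with include_docs=true) may be exactly the read analogue of
_bulk_docs I'm after; a sketch with placeholder document ids:

```shell
# A read analogue of _bulk_docs: POST a list of keys to _all_docs.
# The document ids below are placeholders.
body='{"keys":["30b095cb2ec80ee789374be98736e2bc","second-doc-id"]}'
echo "$body"
# One request then fetches all of the documents:
# curl -X POST -H 'Content-Type: application/json' -d "$body" \
#      'http://localhost:5984/memento/_all_docs?include_docs=true'
```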


Best regards,

-Torstein

couchdb:  0.11.0-2.3
OS: Debian wheezy/sid

Re: Does CouchDB support gzip encoding?

Posted by Paul Hirst <pa...@sophos.com>.
On Wed, 2011-05-25 at 09:53 +0100, Torstein Krause Johansen wrote:

> The GET still took ~5.5 seconds, though, even with 1.1M of data
> instead of 14M.
>

I query some pretty large views from our Couch server, up to ~50M.
These queries go over the local network and are quite slow, so it's
clearly Couch that is the bottleneck, not the network. I notice that
when querying a view twice in a row, the second run is a lot faster,
presumably because the required data is in the disk cache.

So I'm not surprised by your result that gzipping doesn't seem to help.



Re: Does CouchDB support gzip encoding?

Posted by Torstein Krause Johansen <to...@gmail.com>.
On 25/05/11 20:25, Andrew Stuart (SuperCoders) wrote:
> Are you running your curl test from local or remote machine?
>
> Is nginx on the same machine or local?

Yes, I'm running nginx, couchdb and curl from the same host, so there's 
no DNS lookup or network latency in the mix.

Cheers,

-Torstein

Re: Does CouchDB support gzip encoding?

Posted by "Andrew Stuart (SuperCoders)" <an...@supercoders.com.au>.
Are you running your curl test from local or remote machine?

Is nginx on the same machine or local?

as




Re: Does CouchDB support gzip encoding?

Posted by Torstein Krause Johansen <to...@gmail.com>.
Hi again,

On 25 May 2011 16:10, Torstein Krause Johansen
<to...@gmail.com> wrote:
> On 25 May 2011 15:05, Andrew Stuart (SuperCoders)
> <an...@supercoders.com.au> wrote:
>> I put nginx in front of the server as a reverse proxy and configure it to do
>> the GZIP compression.

> But after restarting nginx, it seems like I'm still getting text/plain:
>
>  wget  --header="Accept-Encoding: gzip" -S -O /dev/null
> "http://localhost:80/memento/ ...
> [..]
>  HTTP/1.1 200 OK
>  Server: nginx/1.0.1
>  Date: Wed, 25 May 2011 08:06:34 GMT
>  Content-Type: text/plain;charset=utf-8
>  Connection: close
>  Vary: Accept-Encoding
>  Etag: "8YHQMMO1RSGW6DKQHQDBM8X6L"
>  Cache-Control: must-revalidate
>
> What am I missing here?

Mhmm, running the same test with curl instead of wget gave me the
desired results. Odd.
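One theory: my wget may be speaking HTTP/1.0 (versions before 1.13
do), and the gzip_http_version 1.1 setting in the nginx config makes
nginx skip gzip for HTTP/1.0 clients, while curl uses HTTP/1.1 by
default. If that's it, this tweak should cover wget as well:

```nginx
# in nginx.conf: also gzip responses to HTTP/1.0 clients (e.g. older wget)
gzip_http_version 1.0;
```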

The GET still took ~5.5 seconds, though, even with 1.1M of data
instead of 14M.

Since the _id is half of the data and I don't really need it: is there
any way (outside of a list function, which takes too long for this
many rows) to omit the _id (and key) in the view result? (I need the
key when doing the view "search", so I have to emit it in my map.js.)
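The best I can come up with myself is a _list function along these
lines (design-doc, list and view names made up), though as said it's
slow at this scale:

```shell
# A _list function that streams only the row values as a JSON array.
# It would live in a design doc; the names app, values_only and by_key
# are placeholders.
cat > values_only.js <<'EOF'
function(head, req) {
  start({headers: {"Content-Type": "application/json"}});
  var row, first = true;
  send("[");
  while ((row = getRow())) {
    send((first ? "" : ",") + toJSON(row.value));
    first = false;
  }
  send("]");
}
EOF
# Queried as:
# curl 'http://localhost:5984/memento/_design/app/_list/values_only/by_key'
```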

Cheers,

-Torstein

Re: Does CouchDB support gzip encoding?

Posted by Torstein Krause Johansen <to...@gmail.com>.
On 25 May 2011 15:05, Andrew Stuart (SuperCoders)
<an...@supercoders.com.au> wrote:
> I put nginx in front of the server as a reverse proxy and configure it to do
> the GZIP compression.

Cheers for that.

I've got this in /etc/nginx/sites-enabled/default:

  location /memento {
      proxy_pass http://localhost:5984/memento;
      proxy_set_header Host $host;
  }

And gzip turned on in /etc/nginx/nginx.conf:
        gzip on;
        gzip_disable "msie6";

        gzip_vary on;
        gzip_proxied any;
        gzip_comp_level 6;
        gzip_buffers 16 8k;
        gzip_http_version 1.1;
        gzip_types text/plain text/css application/json
application/x-javascript text/xml application/xml application/xml+rss
text/javascript;

But after restarting nginx, it seems like I'm still getting text/plain:

 wget  --header="Accept-Encoding: gzip" -S -O /dev/null
"http://localhost:80/memento/ ...
[..]
  HTTP/1.1 200 OK
  Server: nginx/1.0.1
  Date: Wed, 25 May 2011 08:06:34 GMT
  Content-Type: text/plain;charset=utf-8
  Connection: close
  Vary: Accept-Encoding
  Etag: "8YHQMMO1RSGW6DKQHQDBM8X6L"
  Cache-Control: must-revalidate

What am I missing here?

Cheers,

-Torstein

Re: Does CouchDB support gzip encoding?

Posted by "Andrew Stuart (SuperCoders)" <an...@supercoders.com.au>.
I put nginx in front of the server as a reverse proxy and configure it  
to do the GZIP compression.

as


