Posted to dev@couchdb.apache.org by "Filipe Manana (JIRA)" <ji...@apache.org> on 2009/11/29 16:43:20 UTC

[jira] Created: (COUCHDB-583) adding ?compression=(gzip|deflate) optional parameter to the attachment download API

adding ?compression=(gzip|deflate) optional parameter to the attachment download API
------------------------------------------------------------------------------------

                 Key: COUCHDB-583
                 URL: https://issues.apache.org/jira/browse/COUCHDB-583
             Project: CouchDB
          Issue Type: New Feature
          Components: HTTP Interface
         Environment: CouchDB trunk revision 885240
            Reporter: Filipe Manana


The patch posted right after this ticket was created adds the following new feature.

A new optional HTTP query parameter, "compression", is added to the attachments API.
It accepts one of two values: "gzip" or "deflate".

When a client requests an attachment (HTTP GET) and the "compression" query parameter is present, CouchDB sends the attachment compressed and sets the Content-Encoding header to gzip or deflate accordingly.
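A minimal sketch of the two codings, in Python with the standard zlib module rather than the patch's Erlang code. In HTTP, the "gzip" content coding is the gzip file format and "deflate" is a zlib-wrapped deflate stream; zlib's wbits argument picks between the two. The helper names encode_attachment and decode_attachment are hypothetical, and this sketch simply rejects any other value (per the examples in this ticket, the patch answers such values with a 500 error).

```python
import zlib

# zlib's wbits selects the container format:
#   16 + MAX_WBITS -> gzip header and trailer (HTTP "gzip")
#   MAX_WBITS      -> zlib header and trailer (HTTP "deflate")
_WBITS = {"gzip": 16 + zlib.MAX_WBITS, "deflate": zlib.MAX_WBITS}


def encode_attachment(body: bytes, compression: str, level: int = 6) -> tuple[bytes, dict]:
    """Hypothetical helper: compress a whole attachment and build its headers."""
    if compression not in _WBITS:
        # Unsupported values (e.g. "rar") are rejected.
        raise ValueError(f"unsupported compression: {compression!r}")
    comp = zlib.compressobj(level, zlib.DEFLATED, _WBITS[compression])
    data = comp.compress(body) + comp.flush()
    # The whole body was compressed up front, so Content-Length is known here.
    headers = {"Content-Encoding": compression, "Content-Length": str(len(data))}
    return data, headers


def decode_attachment(data: bytes, content_encoding: str) -> bytes:
    """Client side: undo the coding named in the Content-Encoding header."""
    return zlib.decompress(data, _WBITS[content_encoding])
```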

Further, the patch adds a new config option, "treshold_for_chunking_comp_responses" (httpd section), that specifies an attachment length threshold. If an attachment's length is greater than or equal to this threshold, the HTTP response is chunked (in addition to being compressed).

Note that sending compressed responses without chunking requires keeping all the compressed blocks in memory and only then sending them to the client. This is a necessary "evil": a non-chunked response needs a "Content-Length" header, and the length of the compressed body is only known once the whole body has been compressed. With chunked responses, each compressed block can be sent immediately, without accumulating all of them in memory.
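The trade-off can be sketched in Python as follows (an illustration under assumptions, not the patch's code): one gzip compressor consumes the attachment block by block; in chunked mode each compressed piece can be written out as soon as zlib returns it, while in non-chunked mode every piece has to be kept in memory until the total length is known for Content-Length. The function names and the plain threshold argument (standing in for treshold_for_chunking_comp_responses) are hypothetical, and the actual chunk framing is left to the HTTP server.

```python
import zlib
from typing import Iterable, Iterator


def gzip_pieces(blocks: Iterable[bytes]) -> Iterator[bytes]:
    """Feed attachment blocks through one gzip compressor, yielding output as it appears."""
    comp = zlib.compressobj(6, zlib.DEFLATED, 16 + zlib.MAX_WBITS)
    for block in blocks:
        piece = comp.compress(block)
        if piece:  # zlib may buffer internally and return nothing yet
            yield piece
    tail = comp.flush()
    if tail:
        yield tail


def build_response(blocks: Iterable[bytes], att_length: int, threshold: int):
    """Hypothetical: pick chunked or buffered output based on the attachment length."""
    if att_length >= threshold:
        # Chunked: no Content-Length; each piece can be sent as soon as it is produced.
        headers = {"Content-Encoding": "gzip", "Transfer-Encoding": "chunked"}
        return headers, gzip_pieces(blocks)
    # Non-chunked: all pieces are held in memory until the final length is known.
    pieces = list(gzip_pieces(blocks))
    headers = {"Content-Encoding": "gzip", "Content-Length": str(sum(map(len, pieces)))}
    return headers, iter(pieces)
```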

Examples:

$ curl 'http://localhost:5984/testdb/testdoc1/readme.txt?compression=gzip'
$ curl 'http://localhost:5984/testdb/testdoc1/readme.txt?compression=deflate'
$ curl http://localhost:5984/testdb/testdoc1/readme.txt   # attachment will not be compressed
$ curl 'http://localhost:5984/testdb/testdoc1/readme.txt?compression=rar'   # responds with a 500 error code

Etap test case included.

Feedback would be very welcome.

cheers

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Re: [jira] Commented: (COUCHDB-583) storing attachments in compressed form and serving them in compressed form if accepted by the client

Posted by Paul Davis <pa...@gmail.com>.
s/vbuilds/vpath builds/

I obviously need another cup...

On Wed, Jan 27, 2010 at 10:34 AM, Paul Joseph Davis (JIRA)
<ji...@apache.org> wrote:
>
>    [ https://issues.apache.org/jira/browse/COUCHDB-583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12805539#action_12805539 ]
>
> Paul Joseph Davis commented on COUCHDB-583:
> -------------------------------------------
>
> @Filipe,
>
> Actually you want to use test_util:src_file(Path).
>
> The way to double check that you have everything down pat is to run:
>
>     $ make distcheck
>
> And that'll go through the full set of checks to see if your code is distribution ready.
>
> For a brief explanation, autotools has a feature called vbuilds that allows people to expand the source on a read-only mount, and then build to a writable location. So you have two directories (srcdir and builddir) and srcdir must be treated as read-only. This means that if you want to touch a file that was part of the release tarball (not all files in SVN are part of this, touching files during a build that aren't part of a release also causes errors) you use srcdir. If the file of interest is a file that's being written to, or was generated as part of a make rule, then it's in builddir.
>
> The relevant autotools docs are:
>
> http://www.gnu.org/software/hello/manual/automake/VPATH-Builds.html
> http://www.gnu.org/software/make/manual/html_node/General-Search.html
> http://www.gnu.org/software/make/manual/html_node/Commands_002fSearch.html#Commands_002fSearch
>
> Those all go over the weirdness to some extent. I'm remembering another helpful page vaguely but can't figure out what it was.
>
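The srcdir/builddir rule described above can be sketched as a small Python helper. This is an assumption-laden illustration, not CouchDB's test_util API: BuildPaths, src_file, build_file and ensure_writable are hypothetical names. Shipped files are resolved against srcdir and only read; anything a test or make rule writes is resolved against builddir, and a write path that would land inside srcdir is refused.

```python
import os


class BuildPaths:
    """Hypothetical helper for the srcdir/builddir split of a VPATH build."""

    def __init__(self, srcdir: str, builddir: str):
        self.srcdir = srcdir      # may be a read-only mount (unpacked release tarball)
        self.builddir = builddir  # writable; generated and temporary files go here

    def src_file(self, rel: str) -> str:
        """Path of a file shipped in the release tarball (read it, never write it)."""
        return os.path.join(self.srcdir, rel)

    def build_file(self, rel: str) -> str:
        """Path of a file generated by a make rule or written during a test."""
        return os.path.join(self.builddir, rel)

    def ensure_writable(self, path: str) -> str:
        """Refuse to hand out a path for writing if it lies inside srcdir."""
        src = os.path.abspath(self.srcdir)
        if os.path.commonpath([src, os.path.abspath(path)]) == src:
            raise PermissionError(f"{path} is inside srcdir, which must stay read-only")
        return path
```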
>> storing attachments in compressed form and serving them in compressed form if accepted by the client
>> ----------------------------------------------------------------------------------------------------
>>
>>                 Key: COUCHDB-583
>>                 URL: https://issues.apache.org/jira/browse/COUCHDB-583
>>             Project: CouchDB
>>          Issue Type: New Feature
>>          Components: Database Core, HTTP Interface
>>         Environment: CouchDB trunk
>>            Reporter: Filipe Manana
>>         Attachments: couchdb-583-trunk-10th-try.patch, couchdb-583-trunk-11th-try.patch, couchdb-583-trunk-12th-try.patch, couchdb-583-trunk-13th-try.patch, couchdb-583-trunk-14th-try-git.patch, couchdb-583-trunk-15th-try-git.patch, couchdb-583-trunk-3rd-try.patch, couchdb-583-trunk-4th-try-trunk.patch, couchdb-583-trunk-5th-try.patch, couchdb-583-trunk-6th-try.patch, couchdb-583-trunk-7th-try.patch, couchdb-583-trunk-8th-try.patch, couchdb-583-trunk-9th-try.patch, jira-couchdb-583-1st-try-trunk.patch, jira-couchdb-583-2nd-try-trunk.patch
>>
>>
>> This feature allows Couch to gzip compress attachments as they are being received and store them in compressed form.
>> When a client asks for downloading an attachment (e.g. GET somedb/somedoc/attachment.txt), the attachment is sent in compressed form if the client's http request has gzip specified as a valid transfer encoding for the response (using the http header "Accept-Encoding"). Otherwise couch decompresses the attachment before sending it back to the client.
>> Attachments are compressed only if their MIME type matches one of those listed in a separate config file. Compression level is also configurable in the default.ini file.
>> This follows Damien's suggestion from 30 November:
>> "Perhaps we need a separate user editable ini file to specify compressable or non-compressable files (would probably be too big for the regular ini file). What do other web servers do?
>> Also, a potential optimization is to compress the file while writing to disk, and serve the compressed bytes directly to clients that can handle it, and decompressed for those that can't. For compressable types, it's a win for both disk IO for reads and writes, and CPU on read."
>> Patch attached.
>
>
>
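The download-side behaviour described in this issue can be sketched in Python, under stated assumptions: the MIME pattern list stands in for the separate config file (its contents here are made up), the Accept-Encoding parsing only handles q-values, and should_compress, accepts_gzip and serve are hypothetical names rather than functions from the patch. Stored gzip bytes go out untouched when the client accepts gzip; otherwise they are decompressed first.

```python
import fnmatch
import zlib

# Hypothetical stand-in for the separate, user-editable list of compressible types.
COMPRESSIBLE_TYPES = ["text/*", "application/javascript", "application/json"]


def should_compress(mime_type: str, patterns=COMPRESSIBLE_TYPES) -> bool:
    """Store compressed only if the MIME type (parameters ignored) matches a pattern."""
    base = mime_type.split(";", 1)[0].strip().lower()
    return any(fnmatch.fnmatchcase(base, p) for p in patterns)


def accepts_gzip(accept_encoding: str) -> bool:
    """Simplified Accept-Encoding check: an explicit gzip q-value wins over '*'."""
    qvalues = {}
    for item in accept_encoding.split(","):
        parts = [p.strip() for p in item.split(";")]
        coding = parts[0].lower()
        if not coding:
            continue
        q = 1.0
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        qvalues[coding] = q
    if "gzip" in qvalues:
        return qvalues["gzip"] > 0
    return qvalues.get("*", 0.0) > 0


def serve(stored: bytes, stored_gzipped: bool, accept_encoding: str):
    """Return (headers, body): send gzip bytes as-is when allowed, else gunzip them."""
    if stored_gzipped and accepts_gzip(accept_encoding):
        return {"Content-Encoding": "gzip"}, stored
    if stored_gzipped:
        return {}, zlib.decompress(stored, 16 + zlib.MAX_WBITS)
    return {}, stored
```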

Re: [jira] Commented: (COUCHDB-583) storing attachments in compressed form and serving them in compressed form if accepted by the client

Posted by Robert Newson <ro...@gmail.com>.
ah, no, I meant the strictly append-only header stuff. Back when there
was a discussion on how to be sure a discovered header was a real
header and not a carefully crafted attachment.

B.

On Tue, Jan 26, 2010 at 5:13 PM, Robert Newson <ro...@gmail.com> wrote:
> I vaguely recall this came out when Damien added deterministic revisions?

Re: [jira] Commented: (COUCHDB-583) storing attachments in compressed form and serving them in compressed form if accepted by the client

Posted by Robert Newson <ro...@gmail.com>.
I vaguely recall this came out when Damien added deterministic revisions?

On Tue, Jan 26, 2010 at 3:28 PM, Filipe David Manana <fd...@gmail.com> wrote:
> Yep, it makes sense.
>
> For text files, with a level 8 compression, I've been seeing attachments
> being reduced to 30-40% of their original size, which is fairly positive :)

Re: [jira] Commented: (COUCHDB-583) storing attachments in compressed form and serving them in compressed form if accepted by the client

Posted by Filipe David Manana <fd...@gmail.com>.
Yep, it makes sense.

For text files, at compression level 8, I've been seeing attachments
reduced to 30-40% of their original size, which is fairly positive :)

On Tue, Jan 26, 2010 at 4:25 PM, Paul Davis <pa...@gmail.com>wrote:

> > How did you do the tests? Some tool to measure the speed?
> > It would be interesting to do the same for the attachments.
> > Personally I think that for text mime types, it's generally worth doing
> the
> > compression.
>
> They weren't very scientific. Generate a view with and without and
> measure the time and file size for each.
>
> Not at all saying we shouldn't be using it for the attachments. I was
> thinking that now that we are doing attachment encoding proper that it
> would've been better to turn of the attempt to compress each
> compressed chunk because that's just wasted effort. For attachments it
> makes much more sense to do it like your patch because of the gzip
> dictionaries will run over the stream and not just each chunk written
> to disk. If that makes sense?
>
> Paul
>



-- 
Filipe David Manana,
fdmanana@gmail.com
PGP key - http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xC569452B

"Reasonable men adapt themselves to the world.
Unreasonable men adapt the world to themselves.
That's why all progress depends on unreasonable men."

Re: [jira] Commented: (COUCHDB-583) storing attachments in compressed form and serving them in compressed form if accepted by the client

Posted by Paul Davis <pa...@gmail.com>.
> How did you do the tests? Some tool to measure the speed?
> It would be interesting to do the same for the attachments.
> Personally I think that for text mime types, it's generally worth doing the
> compression.

They weren't very scientific: we generated a view with and without
compression and measured the time and file size for each.

I'm not at all saying we shouldn't be using it for the attachments. I was
thinking that, now that we are doing attachment encoding properly, it
would've been better to turn off the attempt to compress each
already-compressed chunk, because that's just wasted effort. For
attachments it makes much more sense to do it like your patch does,
because the gzip dictionary runs over the whole stream and not just over
each chunk written to disk. If that makes sense?

Paul
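Paul's point about the dictionary can be checked with a small Python experiment (a sketch; the 4096-byte chunk size is an arbitrary choice, not CouchDB's actual write size). Compressing every chunk on its own restarts DEFLATE's history window (up to 32 KB) at each chunk, while one compressor fed the whole stream can reference text from earlier chunks.

```python
import zlib


def compress_per_chunk(data: bytes, chunk_size: int = 4096) -> list[bytes]:
    """Each chunk gets its own compressor, so matches cannot cross chunk boundaries."""
    return [zlib.compress(data[i:i + chunk_size]) for i in range(0, len(data), chunk_size)]


def compress_stream(data: bytes, chunk_size: int = 4096) -> list[bytes]:
    """One compressor sees every chunk, so later chunks can reference earlier ones
    (within DEFLATE's 32 KB window)."""
    comp = zlib.compressobj()
    out = [comp.compress(data[i:i + chunk_size]) for i in range(0, len(data), chunk_size)]
    out.append(comp.flush())
    return out
```

For data with long-range repetition, the streamed total is expected to be noticeably smaller, and every per-chunk result also pays its own zlib header and checksum; exact numbers depend on the input.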

Re: [jira] Commented: (COUCHDB-583) storing attachments in compressed form and serving them in compressed form if accepted by the client

Posted by Filipe David Manana <fd...@gmail.com>.
Hmm, interesting.

How did you do the tests? Did you use some tool to measure the speed?
It would be interesting to do the same for the attachments.
Personally, I think that for text MIME types it's generally worth doing the
compression.
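One way to run such measurements, sketched in Python with zlib (not the tooling used in this thread, which the replies describe as informal): compress the same sample at several levels and record the size and wall-clock time of each. measure_levels and ratio are hypothetical helper names; timings from a single run are noisy, so treat them as rough indications.

```python
import time
import zlib


def measure_levels(data: bytes, levels=(0, 1, 6, 8, 9)) -> dict[int, tuple[int, float]]:
    """Return {level: (compressed_size_in_bytes, seconds)} for each zlib level."""
    results = {}
    for level in levels:
        start = time.perf_counter()
        compressed = zlib.compress(data, level)
        elapsed = time.perf_counter() - start
        if zlib.decompress(compressed) != data:  # sanity check on the round trip
            raise RuntimeError(f"round trip failed at level {level}")
        results[level] = (len(compressed), elapsed)
    return results


def ratio(data: bytes, level: int = 8) -> float:
    """Compressed size as a fraction of the original size (0.35 means 35%)."""
    return len(zlib.compress(data, level)) / len(data)


if __name__ == "__main__":
    sample = b"".join(b"<p>Paragraph %d of some HTML text.</p>\n" % i for i in range(5000))
    for level, (size, secs) in measure_levels(sample).items():
        print(f"level {level}: {size} bytes, {secs * 1000:.2f} ms")
```

Calling ratio(data, 8) on real text attachments gives a number directly comparable to the 30-40% Filipe reports for level 8; other data may behave quite differently.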



On Tue, Jan 26, 2010 at 4:14 PM, Paul Davis <pa...@gmail.com>wrote:

> Huh. Maybe we turned that off? We did sit down and do some tests for
> the speed/size tradeoffs of using compression.
>
> The call was just adding the compressed option to
> erlang:term_to_binary/2 before writing that to disk. I'm not seeing it
> either though.
>
> I reckon we just decided to go with no compression for speed.
>
> Paul Davis




Re: [jira] Commented: (COUCHDB-583) storing attachments in compressed form and serving them in compressed form if accepted by the client

Posted by Paul Davis <pa...@gmail.com>.
Huh. Maybe we turned that off? We did sit down and do some tests for
the speed/size tradeoffs of using compression.

The call was just adding the compressed option to
erlang:term_to_binary/2 before writing that to disk. I'm not seeing it
either though.

I reckon we just decided to go with no compression for speed.

Paul Davis

On Tue, Jan 26, 2010 at 9:55 AM, Filipe David Manana <fd...@gmail.com> wrote:
> Paul,
>
> I don't see anything on couch_file that does compression (looking at trunk).
> I grepped all the couch .erl files for "gzip" and "zlib", and the only ones
> who are using compression are from the replication feature.
> $ egrep -nr 'gzip|zlib' *.erl
> couch_rep_att.erl:66: if ContentEncoding =:= "gzip" ->
> couch_rep_att.erl:67: zlib:gunzip(Data);
> couch_rep_changes_feed.erl:64: headers = Source#http_db.headers -- [{"Accept-Encoding", "gzip"}]
> couch_rep_httpc.erl:202: "gzip" ->
> couch_rep_httpc.erl:203: zlib:gunzip(Body);
>
> Doing a quick eye scan on couch_file, I don't find anything also.
> Am I missing something?

Re: [jira] Commented: (COUCHDB-583) storing attachments in compressed form and serving them in compressed form if accepted by the client

Posted by Filipe David Manana <fd...@gmail.com>.
Paul,

I don't see anything in couch_file that does compression (looking at trunk).
I grepped all the couch .erl files for "gzip" and "zlib", and the only ones
that use compression belong to the replication feature.

$ egrep -nr 'gzip|zlib' *.erl
couch_rep_att.erl:66: if ContentEncoding =:= "gzip" ->
couch_rep_att.erl:67: zlib:gunzip(Data);
couch_rep_changes_feed.erl:64: headers = Source#http_db.headers -- [{"Accept-Encoding", "gzip"}]
couch_rep_httpc.erl:202: "gzip" ->
couch_rep_httpc.erl:203: zlib:gunzip(Body);

A quick eye scan of couch_file doesn't turn up anything either.

Am I missing something?

On Tue, Jan 26, 2010 at 3:24 PM, Paul Davis <pa...@gmail.com>wrote:

> > I do have the same opinion as you, the code would affect many of the
> parts
> > regarding the compression (specially couch_stream). For doc compression,
> I
> > imagine it would touch more places, and also present some difficulties to
> > assure compatibility with the previous DB file formats.
>
> Just a note that docs are already stored compressed on disk.
> Everything written through couch_file.erl gets a gzip compression
> level 6 applied if memory serves. Which suddenly makes me wonder if an
> optimization for attachment compression might benefit from turning
> that off during stream writes since the stream is already compressed.
>
> Just a couple random thoughts.
>
> Paul
>




Re: [jira] Commented: (COUCHDB-583) storing attachments in compressed form and serving them in compressed form if accepted by the client

Posted by Paul Davis <pa...@gmail.com>.
> I do have the same opinion as you, the code would affect many of the parts
> regarding the compression (specially couch_stream). For doc compression, I
> imagine it would touch more places, and also present some difficulties to
> assure compatibility with the previous DB file formats.

Just a note that docs are already stored compressed on disk.
Everything written through couch_file.erl gets a gzip compression
level 6 applied if memory serves. Which suddenly makes me wonder if an
optimization for attachment compression might benefit from turning
that off during stream writes since the stream is already compressed.

Just a couple random thoughts.

Paul
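Skipping the second compression pass could look roughly like this in Python (a hypothetical write path, not couch_file's API): the caller flags streams that are already compressed, and, as an extra assumption beyond what is proposed here, the raw bytes are also kept whenever compression would not make them smaller. The boolean returned alongside the payload is what a reader would need in order to decode it.

```python
import zlib


def encode_for_disk(data: bytes, already_compressed: bool, level: int = 6) -> tuple[bool, bytes]:
    """Hypothetical write path: return (is_compressed, payload)."""
    if already_compressed:
        # The attachment stream is compressed already; a second pass is wasted effort.
        return False, data
    packed = zlib.compress(data, level)
    if len(packed) >= len(data):
        # Assumption: fall back to raw bytes when compression does not help.
        return False, data
    return True, packed


def decode_from_disk(is_compressed: bool, payload: bytes) -> bytes:
    """Inverse of encode_for_disk."""
    return zlib.decompress(payload) if is_compressed else payload
```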

Re: [jira] Commented: (COUCHDB-583) storing attachments in compressed form and serving them in compressed form if accepted by the client

Posted by Robert Newson <ro...@gmail.com>.
Yes, that's one approach. However, I also need to do something to make
old data unrecoverable, so I was thinking of changing the keys at
compaction time, rendering the old data unreadable. My particular use
case is niche, I admit.

On Tue, Jan 26, 2010 at 2:28 PM, Paul Davis <pa...@gmail.com> wrote:
> Robert,
>
> Is that not something you could do with an encrypted filesystem? I'm
> not too familiar with such things so I'm not certain if that carries
> its own drawbacks or what not.
>
> Paul
>
> On Tue, Jan 26, 2010 at 9:26 AM, Robert Newson <ro...@gmail.com> wrote:
>> The basic intention of my patch would be to ensure that unencrypted
>> documents and attachments are not on disk. You're right that there are
>> many other questions to answer for a general encryption feature.
>>
>> For my exact case, which might always remain a local patch, I wouldn't
>> mind the data passing unencrypted to and from the boxes, but should be
>> encrypted while "at rest" on disk.
>>
>> B.
>>
>> On Tue, Jan 26, 2010 at 2:02 PM, Filipe David Manana <fd...@gmail.com> wrote:
>>> Robert,
>>>
>>> I think your plans are very interesting, as they present not only
>>> interesting challenges but the feature itself I find useful also.
>>> Questions such as:
>>> - what kind of encryption? (symmetric, asymmetric, or both)
>>> - where are the keys stored? Or, for the symmetric case, would we use a
>>> Diffie-Hellman protocol, for example?
>>> - is the objective to have privacy at the DB storage level or also at the
>>> network level? (and force the decryption on the client side only)
>>> there are many more details of course.
>>> I do have the same opinion as you, the code would affect many of the parts
>>> regarding the compression (specially couch_stream). For doc compression, I
>>> imagine it would touch more places, and also present some difficulties to
>>> assure compatibility with the previous DB file formats.
>>> Let me know if somehow I can help you.
>>> cheers
>>>
>>> On Tue, Jan 26, 2010 at 2:51 PM, Robert Newson <ro...@gmail.com>
>>> wrote:
>>>>
>>>> that was my intention, but the option to send the encrypted bytes (for
>>>> decryption at the client end) is intriguing and also echoes the choice
>>>> to send compressed vs uncompressed responses.
>>>>
>>>> I don't mean to hold up this work and I doubt I'll have a patch any
>>>> time soon, it just seems that these two features have significant
>>>> overlap (you can send data in with a transformation applied or not,
>>>> and request it with or without that transformation).
>>>>
>>>> My brief look at the related code led me to believe that adding
>>>> encryption support would touch several places, and I would think that
>>>> most, perhaps all, of those places would also be touched by
>>>> compression support.
>>>>
>>>> Sorry to be vague, I only intended to add another perspective to the
>>>> discussion.
>>>>
>>>> On Tue, Jan 26, 2010 at 11:01 AM, Filipe David Manana
>>>> <fd...@gmail.com> wrote:
>>>> > Hi Robert,
>>>> >
>>>> > That's interesting.
>>>> > I think that abstraction is doable, but maybe not trivial.
>>>> >
>>>> > In your idea, you plan to always decrypt the docs/attachments before
>>>> > sending
>>>> > them to the client?
>>>> >
>>>> >
>>>> > On Tue, Jan 26, 2010 at 11:34 AM, Robert Newson
>>>> > <ro...@gmail.com>wrote:
>>>> >
>>>> >> fyi: I have a (much delayed) plan to work up an encryption patch for
>>>> >> documents and attachments. Since encryption and compression both apply
>>>> >> at the same level (and the order of the two is important), I wonder if
>>>> >> that would change the approach taken here? That is, would an
>>>> >> abstraction that allowed a chain of transformations when storing (and
>>>> >> the reverse chain when retrieving) be in order?
>>>> >>
>>>> >> B.
>>>> >>
>>>> >> On Tue, Jan 26, 2010 at 8:02 AM, Filipe Manana (JIRA) <ji...@apache.org>
>>>> >> wrote:
>>>> >> >
>>>> >> >    [ https://issues.apache.org/jira/browse/COUCHDB-583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12804946#action_12804946 ]
>>>> >> >
>>>> >> > Filipe Manana commented on COUCHDB-583:
>>>> >> > ---------------------------------------
>>>> >> >
>>>> >> > @Chris
>>>> >> >
>>>> >> > Good point, I totally agree.
>>>> >> > It would be interesting to test with real couchapps and real data to
>>>> >> > see how worthwhile it really is.
>>>> >> >
>>>> >> > In one of my tests, for instance, a 10 MB text file was compressed to
>>>> >> > about 100 KB.
>>>> >> >
>>>> >> > Minified JavaScript files are still worth compressing, too. For
>>>> >> > example, the minified Ext JS library file (http://www.extjs.com,
>>>> >> > ext-all.js) is about 630 KB; gzipped, it shrinks to about 170 KB, a
>>>> >> > reasonably good size reduction.
>>>> >> >
>>>> >> > As Damien said in a previous comment, this not only saves disk space
>>>> >> > but also reduces disk IO (attachment download requests, compaction).
>>>> >> >
>>>> >> > I also look forward to seeing the impact on real, production-level
>>>> >> > applications.
>>
>
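Robert's chain-of-transformations idea can be sketched in Python: each transformation is an (encode, decode) pair, storing applies the chain in order, and retrieving applies the decoders in reverse. The cipher here is a toy (XOR with a SHA-256 counter keystream) used only so the example runs with the standard library; it is NOT real encryption and must not be used as such. All names are hypothetical. The ordering matters because ciphertext looks random and does not compress, so compression has to come before encryption.

```python
import hashlib
import zlib
from typing import Callable, List, Tuple

Transform = Tuple[Callable[[bytes], bytes], Callable[[bytes], bytes]]

ZLIB: Transform = (zlib.compress, zlib.decompress)


def toy_cipher(key: bytes) -> Transform:
    """NOT real encryption: XOR with a SHA-256 counter keystream, for illustration only."""
    def keystream(n: int) -> bytes:
        out = bytearray()
        counter = 0
        while len(out) < n:
            out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
            counter += 1
        return bytes(out[:n])

    def xor(data: bytes) -> bytes:
        return bytes(a ^ b for a, b in zip(data, keystream(len(data))))

    return xor, xor  # XOR with the same keystream is its own inverse


def store(data: bytes, chain: List[Transform]) -> bytes:
    """Apply each encoder in order (e.g. compress, then encrypt)."""
    for encode, _ in chain:
        data = encode(data)
    return data


def retrieve(data: bytes, chain: List[Transform]) -> bytes:
    """Apply the decoders in reverse order."""
    for _, decode in reversed(chain):
        data = decode(data)
    return data
```

With [ZLIB, cipher] the stored bytes stay small; with [cipher, ZLIB] the compressor only ever sees pseudo-random input and gains nothing.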

Re: [jira] Commented: (COUCHDB-583) storing attachments in compressed form and serving them in compressed form if accepted by the client

Posted by Paul Davis <pa...@gmail.com>.
Robert,

Is that not something you could do with an encrypted filesystem? I'm
not too familiar with such things so I'm not certain if that carries
its own drawbacks or what not.

Paul

On Tue, Jan 26, 2010 at 9:26 AM, Robert Newson <ro...@gmail.com> wrote:
> The basic intention of my patch would be to ensure that unencrypted
> documents and attachments are not on disk. You're right that there are
> many other questions to answer for a general encryption feature.
>
> For my exact case, which might always remain a local patch, I wouldn't
> mind the data passing unencrypted to and from the boxes, but should be
> encrypted while "at rest" on disk.
>
> B.
>
> On Tue, Jan 26, 2010 at 2:02 PM, Filipe David Manana <fd...@gmail.com> wrote:
>> Robert,
>>
>> I think your plans are very interesting, as they present not only
>> interesting challenges but the feature itself I find useful also.
>> Questions such as:
>> - what kind of encryption? (symmetric, asymmetric, or both)
>> - where are the keys stored? Or, for the symmetric case, would we use a
>> Diffie-Hellman protocol, for example?
>> - is the objective to have privacy at the DB storage level or also at the
>> network level? (and force the decryption on the client side only)
>> there are many more details of course.
>> I do have the same opinion as you, the code would affect many of the parts
>> regarding the compression (specially couch_stream). For doc compression, I
>> imagine it would touch more places, and also present some difficulties to
>> assure compatibility with the previous DB file formats.
>> Let me know if somehow I can help you.
>> cheers
>>
>> On Tue, Jan 26, 2010 at 2:51 PM, Robert Newson <ro...@gmail.com>
>> wrote:
>>>
>>> that was my intention, but the option to send the encrypted bytes (for
>>> decryption at the client end) is intriguing and also echoes the choice
>>> to send compressed vs uncompressed responses.
>>>
>>> I don't mean to hold up this work and I doubt I'll have a patch any
>>> time soon, it just seems that these two features have significant
>>> overlap (you can send data in with a transformation applied or not,
>>> and request it with or without that transformation).
>>>
>>> My brief look at the related code led me to believe that adding
>>> encryption support would touch several places, and I would think that
>>> most, perhaps all, of those places would also be touched by
>>> compression support.
>>>
>>> Sorry to be vague, I only intended to add another perspective to the
>>> discussion.
>>>
>>> On Tue, Jan 26, 2010 at 11:01 AM, Filipe David Manana
>>> <fd...@gmail.com> wrote:
>>> > Hi Robert,
>>> >
>>> > That's interesting.
>>> > I think that abstraction is doable, but maybe not trivial.
>>> >
>>> > In your idea, you plan to always decrypt the docs/attachments before
>>> > sending
>>> > them to the client?
>>> >
>>> >
>>> > On Tue, Jan 26, 2010 at 11:34 AM, Robert Newson
>>> > <ro...@gmail.com>wrote:
>>> >
>>> >> fyi: I have a (much delayed) plan to work up an encryption patch for
>>> >> documents and attachments. Since encryption and compression both apply
>>> >> at the same level (and the order of the two is important), I wonder if
>>> >> that would change the approach taken here? That is, would an
>>> >> abstraction that allowed a chain of transformations when storing (and
>>> >> the reverse chain when retrieving) be in order?
>>> >>
>>> >> B.
>>> >>
>>> >> On Tue, Jan 26, 2010 at 8:02 AM, Filipe Manana (JIRA) <ji...@apache.org>
>>> >> wrote:
>>> >> >
>>> >> >    [
>>> >>
>>> >> https://issues.apache.org/jira/browse/COUCHDB-583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12804946#action_12804946]
>>> >> >
>>> >> > Filipe Manana commented on COUCHDB-583:
>>> >> > ---------------------------------------
>>> >> >
>>> >> > @Chris
>>> >> >
>>> >> > Good point, I totally agree.
>>> >> > It would be interesting to test with real couchapps, real data and
>>> >> > see
>>> >> how worth it really is.
>>> >> >
>>> >> > A 10Mb text file, for instance, was compressed to about 100Kb in one
>>> >> > of
>>> >> my tests.
>>> >> >
>>> >> > Also, as for the minified JavaScript files for example, it's still
>>> >> > worth
>>> >> compressing them. For example, the minified Ext JS lib file (
>>> >> http://www.extjs.com,  ext-all.js) is about 630Kb big. Compressed with
>>> >> gzip stays at about 170Kb, therefore a reasonably good size reduction.
>>> >> >
>>> >> > As Damien said in a previous comment, not only saves disk space but
>>> >> > also
>>> >> reduces disk IO (attachment download requests, compaction).
>>> >> >
>>> >> > I also look forward to see the impact on real, production level,
>>> >> applications.
>>> >> >
>>> >> >> storing attachments in compressed form and serving them in
>>> >> >> compressed
>>> >> form if accepted by the client
>>> >> >>
>>> >>
>>> >> ----------------------------------------------------------------------------------------------------
>>> >> >>
>>> >> >>                 Key: COUCHDB-583
>>> >> >>                 URL:
>>> >> >> https://issues.apache.org/jira/browse/COUCHDB-583
>>> >> >>             Project: CouchDB
>>> >> >>          Issue Type: New Feature
>>> >> >>          Components: Database Core, HTTP Interface
>>> >> >>         Environment: CouchDB trunk
>>> >> >>            Reporter: Filipe Manana
>>> >> >>         Attachments: couchdb-583-trunk-10th-try.patch,
>>> >> couchdb-583-trunk-11th-try.patch, couchdb-583-trunk-12th-try.patch,
>>> >> couchdb-583-trunk-13th-try.patch, couchdb-583-trunk-14th-try-git.patch,
>>> >> couchdb-583-trunk-15th-try-git.patch, couchdb-583-trunk-3rd-try.patch,
>>> >> couchdb-583-trunk-4th-try-trunk.patch, couchdb-583-trunk-5th-try.patch,
>>> >> couchdb-583-trunk-6th-try.patch, couchdb-583-trunk-7th-try.patch,
>>> >> couchdb-583-trunk-8th-try.patch, couchdb-583-trunk-9th-try.patch,
>>> >> jira-couchdb-583-1st-try-trunk.patch,
>>> >> jira-couchdb-583-2nd-try-trunk.patch
>>> >> >>
>>> >> >>
>>> >> >> This feature allows Couch to gzip compress attachments as they are
>>> >> >> being
>>> >> received and store them in compressed form.
>>> >> >> When a client asks for downloading an attachment (e.g. GET
>>> >> somedb/somedoc/attachment.txt), the attachment is sent in compressed
>>> >> form if
>>> >> the client's http request has gzip specified as a valid transfer
>>> >> encoding
>>> >> for the response (using the http header "Accept-Encoding"). Otherwise
>>> >> couch
>>> >> decompresses the attachment before sending it back to the client.
>>> >> >> Attachments are compressed only if their MIME type matches one of
>>> >> >> those
>>> >> listed in a separate config file. Compression level is also
>>> >> configurable in
>>> >> the default.ini file.
>>> >> >> This follows Damien's suggestion from 30 November:
>>> >> >> "Perhaps we need a separate user editable ini file to specify
>>> >> compressable or non-compressable files (would probably be too big for
>>> >> the
>>> >> regular ini file). What do other web servers do?
>>> >> >> Also, a potential optimization is to compress the file while writing
>>> >> >> to
>>> >> disk, and serve the compressed bytes directly to clients that can
>>> >> handle it,
>>> >> and decompressed for those that can't. For compressable types, it's a
>>> >> win
>>> >> for both disk IO for reads and writes, and CPU on read."
>>> >> >> Patch attached.
>>> >> >
>>> >> > --
>>> >> > This message is automatically generated by JIRA.
>>> >> > -
>>> >> > You can reply to this email to add a comment to the issue online.
>>> >> >
>>> >> >
>>> >>
>>> >
>>> >
>>> >
>>> > --
>>> > Filipe David Manana,
>>> > fdmanana@gmail.com
>>> > PGP key - http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xC569452B
>>> >
>>> > "Reasonable men adapt themselves to the world.
>>> > Unreasonable men adapt the world to themselves.
>>> > That's why all progress depends on unreasonable men."
>>> >
>>
>>
>>
>> --
>> Filipe David Manana,
>> fdmanana@gmail.com
>> PGP key - http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xC569452B
>>
>> "Reasonable men adapt themselves to the world.
>> Unreasonable men adapt the world to themselves.
>> That's why all progress depends on unreasonable men."
>>
>>
>

Re: [jira] Commented: (COUCHDB-583) storing attachments in compressed form and serving them in compressed form if accepted by the client

Posted by Robert Newson <ro...@gmail.com>.
The basic intention of my patch would be to ensure that unencrypted
documents and attachments are not on disk. You're right that there are
many other questions to answer for a general encryption feature.

For my exact case, which might always remain a local patch, I wouldn't
mind the data passing unencrypted to and from the boxes, but it should be
encrypted while "at rest" on disk.

B.

On Tue, Jan 26, 2010 at 2:02 PM, Filipe David Manana <fd...@gmail.com> wrote:
> Robert,
>
> I think your plans are very interesting: they present interesting
> challenges, and I also find the feature itself useful.
> There are questions such as:
> - what kind of encryption? (symmetric, asymmetric, or both)
> - where are the keys stored? Or, for the symmetric case, would we use a
> Diffie-Hellman key exchange, for example?
> - is the objective to have privacy at the DB storage level or also at the
> network level? (and force the decryption on the client side only)
> There are many more details, of course.
> I share your opinion: the code would affect many of the same parts as
> compression (especially couch_stream). For doc compression, I imagine it
> would touch more places, and it would also make it harder to ensure
> compatibility with previous DB file formats.
> Let me know if I can help you somehow.
> cheers

Re: [jira] Commented: (COUCHDB-583) storing attachments in compressed form and serving them in compressed form if accepted by the client

Posted by Filipe David Manana <fd...@gmail.com>.
Robert,

I think your plans are very interesting: they present interesting
challenges, and I also find the feature itself useful.

There are questions such as:
- what kind of encryption? (symmetric, asymmetric, or both)
- where are the keys stored? Or, for the symmetric case, would we use a
Diffie-Hellman key exchange, for example?
- is the objective to have privacy at the DB storage level or also at the
network level? (and force the decryption on the client side only)

There are many more details, of course.
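The Diffie-Hellman question can be made concrete with a toy sketch of the exchange in Python. The group parameters (the Mersenne prime 2**127 - 1 and generator 3) are illustrative assumptions and are NOT secure; a real design would use a standardized group (for example from RFC 3526 or RFC 7919) and authenticate the exchange. The helper names are hypothetical.

```python
import hashlib
import secrets

# Toy parameters for illustration only: NOT a vetted group, NOT secure.
P = 2**127 - 1  # a Mersenne prime
G = 3


def make_keypair() -> tuple[int, int]:
    """Return (private, public) values for one party."""
    private = secrets.randbelow(P - 3) + 2  # uniform in [2, P - 2]
    public = pow(G, private, P)
    return private, public


def shared_key(own_private: int, peer_public: int) -> bytes:
    """Derive a 32-byte symmetric key from the Diffie-Hellman shared secret."""
    secret = pow(peer_public, own_private, P)
    return hashlib.sha256(secret.to_bytes(16, "big")).digest()


# Each side publishes only its public value; private values never leave.
a_priv, a_pub = make_keypair()
b_priv, b_pub = make_keypair()
```

Both parties derive the same key without ever transmitting it, but the other question in the list remains open: the key still has to be stored somewhere the server can reach whenever it decrypts.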

I share your opinion: the code would affect many of the same parts as
compression (especially couch_stream). For doc compression, I imagine it
would touch more places, and it would also make it harder to ensure
compatibility with previous DB file formats.
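On the file-format concern, one common way to keep old databases readable is to record which transformation was applied in per-attachment metadata and to treat a missing marker as "stored as-is". The Python sketch below assumes that design; StoredAttachment, its encoding field, and read_plain are hypothetical names, not CouchDB's actual on-disk records.

```python
import gzip
from dataclasses import dataclass
from typing import Optional


@dataclass
class StoredAttachment:
    """Hypothetical stored record; files written before the feature never set `encoding`."""
    data: bytes
    encoding: Optional[str] = None  # None means "stored untransformed" (legacy format)


def read_plain(att: StoredAttachment) -> bytes:
    """Return the original bytes regardless of how the attachment was stored."""
    if att.encoding is None or att.encoding == "identity":
        return att.data
    if att.encoding == "gzip":
        return gzip.decompress(att.data)
    raise ValueError(f"unknown encoding: {att.encoding!r}")


legacy = StoredAttachment(b"hello")  # written by an older version
newer = StoredAttachment(gzip.compress(b"hello"), encoding="gzip")
```

Because the absence of a marker means "no transformation", old records need no migration; only newly written attachments carry the extra field.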

Let me know if I can help you somehow.

cheers


On Tue, Jan 26, 2010 at 2:51 PM, Robert Newson <ro...@gmail.com> wrote:

> that was my intention, but the option to send the encrypted bytes (for
> decryption at the client end) is intriguing and also echoes the choice
> to send compressed vs uncompressed responses.
>
> I don't mean to hold up this work and I doubt I'll have a patch any
> time soon, it just seems that these two features have significant
> overlap (you can send data in with a transformation applied or not,
> and request it with or without that transformation).
>
> My brief look at the related code led me to believe that adding
> encryption support would touch several places, and I would think that
> most, perhaps all, of those places would also be touched by
> compression support.
>
> Sorry to be vague, I only intended to add another perspective to the
> discussion.



-- 
Filipe David Manana,
fdmanana@gmail.com
PGP key - http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xC569452B

"Reasonable men adapt themselves to the world.
Unreasonable men adapt the world to themselves.
That's why all progress depends on unreasonable men."

Re: [jira] Commented: (COUCHDB-583) storing attachments in compressed form and serving them in compressed form if accepted by the client

Posted by Robert Newson <ro...@gmail.com>.
that was my intention, but the option to send the encrypted bytes (for
decryption at the client end) is intriguing and also echoes the choice
to send compressed vs uncompressed responses.

I don't mean to hold up this work and I doubt I'll have a patch any
time soon, it just seems that these two features have significant
overlap (you can send data in with a transformation applied or not,
and request it with or without that transformation).
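The point that data can be requested with or without a transformation is exactly the negotiation in the issue title ("serving them in compressed form if accepted by the client"). Below is a minimal Python sketch, assuming attachments are stored gzip-compressed; gzip_q and serve are hypothetical helpers (not CouchDB's Erlang code), and the Accept-Encoding parsing is simplified relative to the HTTP specification.

```python
import gzip


def gzip_q(accept_encoding: str) -> float:
    """Return the q-value the client assigns to gzip (0.0 means "not acceptable").

    Simplified parsing: an explicit "gzip" entry takes precedence over "*".
    """
    explicit = None
    wildcard = None
    for item in accept_encoding.split(","):
        parts = [p.strip() for p in item.split(";")]
        coding = parts[0].lower()
        q = 1.0
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        if coding == "gzip":
            explicit = q
        elif coding == "*":
            wildcard = q
    if explicit is not None:
        return explicit
    return wildcard if wildcard is not None else 0.0


def serve(stored_gzip: bytes, accept_encoding: str) -> tuple[dict, bytes]:
    """Send the stored gzip bytes as-is if accepted, otherwise decompress first."""
    headers = {"Vary": "Accept-Encoding"}
    if gzip_q(accept_encoding) > 0:
        headers["Content-Encoding"] = "gzip"
        return headers, stored_gzip
    return headers, gzip.decompress(stored_gzip)
```

With this precedence rule, "gzip;q=0, *" yields the decompressed body. Sending Vary: Accept-Encoding in both branches tells caches that the body depends on that request header.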

My brief look at the related code led me to believe that adding
encryption support would touch several places, and I would think that
most, perhaps all, of those places would also be touched by
compression support.

Sorry to be vague, I only intended to add another perspective to the
discussion.

On Tue, Jan 26, 2010 at 11:01 AM, Filipe David Manana
<fd...@gmail.com> wrote:
> Hi Robert,
>
> That's interesting.
> I think that abstraction is doable, but maybe not trivial.
>
> In your design, do you plan to always decrypt the docs/attachments before sending
> them to the client?

Re: [jira] Commented: (COUCHDB-583) storing attachments in compressed form and serving them in compressed form if accepted by the client

Posted by Filipe David Manana <fd...@gmail.com>.
Hi Robert,

That's interesting.
I think that abstraction is doable, but maybe not trivial.

In your design, do you plan to always decrypt the docs/attachments before
sending them to the client?


On Tue, Jan 26, 2010 at 11:34 AM, Robert Newson <ro...@gmail.com> wrote:

> fyi: I have a (much delayed) plan to work up an encryption patch for
> documents and attachments. Since encryption and compression both apply
> at the same level (and the order of the two is important), I wonder if
> that would change the approach taken here? That is, would an
> abstraction that allowed a chain of transformations when storing (and
> the reverse chain when retrieving) be in order?
>
> B.




Re: [jira] Commented: (COUCHDB-583) storing attachments in compressed form and serving them in compressed form if accepted by the client

Posted by Robert Newson <ro...@gmail.com>.
FYI: I have a (much delayed) plan to work up an encryption patch for
documents and attachments. Since encryption and compression both apply
at the same level (and the order of the two matters), I wonder whether
that would change the approach taken here. That is, would an
abstraction that allows a chain of transformations when storing (and
the reverse chain when retrieving) be in order?
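A minimal Python sketch of such a chain, assuming every stage is a reversible bytes-to-bytes function. The names store, retrieve and xor_placeholder are hypothetical, not CouchDB APIs, and the XOR stage is only a placeholder marking where a real cipher would sit:

```python
import zlib
from typing import Callable, List, Tuple

# A stage is an (encode, decode) pair of bytes -> bytes functions.
Stage = Tuple[Callable[[bytes], bytes], Callable[[bytes], bytes]]

COMPRESS: Stage = (zlib.compress, zlib.decompress)


def xor_placeholder(key: bytes) -> Stage:
    """Hypothetical stand-in for a cipher. XOR with a short key is NOT
    encryption; it only marks where an encryption stage would go."""
    def apply(data: bytes) -> bytes:
        return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))
    return apply, apply  # XOR is its own inverse


def store(data: bytes, chain: List[Stage]) -> bytes:
    # Apply the encoders in order, e.g. compress first, then encrypt.
    for encode, _ in chain:
        data = encode(data)
    return data


def retrieve(data: bytes, chain: List[Stage]) -> bytes:
    # Undo the stages in reverse order: decrypt first, then decompress.
    for _, decode in reversed(chain):
        data = decode(data)
    return data


chain = [COMPRESS, xor_placeholder(b"secret")]
body = b"hello couchdb " * 100
stored = store(body, chain)
assert retrieve(stored, chain) == body
```

Compressing before encrypting is the usual order: good ciphertext looks random and barely compresses, so the reverse order would give up most of the savings.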

B.

On Tue, Jan 26, 2010 at 8:02 AM, Filipe Manana (JIRA) <ji...@apache.org> wrote:
>
>    [ https://issues.apache.org/jira/browse/COUCHDB-583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12804946#action_12804946 ]
>
> Filipe Manana commented on COUCHDB-583:
> ---------------------------------------
>
> @Chris
>
> Good point, I totally agree.
> It would be interesting to test with real CouchApps and real data to see how worthwhile it really is.
>
> In one of my tests, for instance, a 10 MB text file was compressed to about 100 KB.
>
> Minified JavaScript files are also still worth compressing. For example, the minified Ext JS library file (http://www.extjs.com, ext-all.js) is about 630 KB; gzipped, it shrinks to about 170 KB, a reasonably good size reduction.
>
> As Damien said in a previous comment, compression not only saves disk space but also reduces disk I/O (attachment download requests, compaction).
>
> I also look forward to seeing the impact on real, production-level applications.
>
>> storing attachments in compressed form and serving them in compressed form if accepted by the client
>> ----------------------------------------------------------------------------------------------------
>>
>>                 Key: COUCHDB-583
>>                 URL: https://issues.apache.org/jira/browse/COUCHDB-583
>>             Project: CouchDB
>>          Issue Type: New Feature
>>          Components: Database Core, HTTP Interface
>>         Environment: CouchDB trunk
>>            Reporter: Filipe Manana
>>         Attachments: couchdb-583-trunk-10th-try.patch, couchdb-583-trunk-11th-try.patch, couchdb-583-trunk-12th-try.patch, couchdb-583-trunk-13th-try.patch, couchdb-583-trunk-14th-try-git.patch, couchdb-583-trunk-15th-try-git.patch, couchdb-583-trunk-3rd-try.patch, couchdb-583-trunk-4th-try-trunk.patch, couchdb-583-trunk-5th-try.patch, couchdb-583-trunk-6th-try.patch, couchdb-583-trunk-7th-try.patch, couchdb-583-trunk-8th-try.patch, couchdb-583-trunk-9th-try.patch, jira-couchdb-583-1st-try-trunk.patch, jira-couchdb-583-2nd-try-trunk.patch
>>
>>
>> This feature allows Couch to gzip-compress attachments as they are received and to store them in compressed form.
>> When a client requests an attachment (e.g. GET somedb/somedoc/attachment.txt), the attachment is sent in compressed form if the client's HTTP request lists gzip as an acceptable content encoding for the response (via the "Accept-Encoding" header). Otherwise Couch decompresses the attachment before sending it back to the client.
>> Attachments are compressed only if their MIME type matches one of those listed in a separate config file. The compression level is also configurable in the default.ini file.
>> This follows Damien's suggestion from 30 November:
>> "Perhaps we need a separate user editable ini file to specify compressable or non-compressable files (would probably be too big for the regular ini file). What do other web servers do?
>> Also, a potential optimization is to compress the file while writing to disk, and serve the compressed bytes directly to clients that can handle it, and decompressed for those that can't. For compressable types, it's a win for both disk IO for reads and writes, and CPU on read."
>> Patch attached.

[jira] Updated: (COUCHDB-583) adding ?compression=(gzip|deflate) optional parameter to the attachment download API

Posted by "Filipe Manana (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/COUCHDB-583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Filipe Manana updated COUCHDB-583:
----------------------------------

    Attachment: couchdb-583-trunk-6th-try.patch

Added an httpd misc handler for the URI _compressible_type/ and corrected the misspelled word "compressable" to "compressible".

> adding ?compression=(gzip|deflate) optional parameter to the attachment download API
> ------------------------------------------------------------------------------------
>
>                 Key: COUCHDB-583
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-583
>             Project: CouchDB
>          Issue Type: New Feature
>          Components: HTTP Interface
>         Environment: CouchDB trunk revision 885240
>            Reporter: Filipe Manana
>         Attachments: couchdb-583-trunk-3rd-try.patch, couchdb-583-trunk-4th-try-trunk.patch, couchdb-583-trunk-5th-try.patch, couchdb-583-trunk-6th-try.patch, jira-couchdb-583-1st-try-trunk.patch, jira-couchdb-583-2nd-try-trunk.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> The patch that follows this ticket's creation adds the following new feature.
> A new optional HTTP query parameter, "compression", is added to the attachments API.
> This parameter can take one of two values: "gzip" or "deflate".
> When an attachment is requested (HTTP GET) and the "compression" query parameter is present, CouchDB sends the attachment to the client compressed (and sets the Content-Encoding header to gzip or deflate).
> Further, it adds a new config option "treshold_for_chunking_comp_responses" (httpd section) that specifies an attachment length threshold. If an attachment's length is greater than or equal to this threshold, the HTTP response is chunked (in addition to being compressed).
> Note that non-chunked compressed responses require holding all the compressed blocks in memory before sending them to the client. This is a necessary "evil", as we only know the length of the compressed body after compressing the whole body, and we need to set the "Content-Length" header for non-chunked responses. With chunked responses, we can send each compressed block immediately, without accumulating all of them in memory.
> Examples:
> $ curl http://localhost:5984/testdb/testdoc1/readme.txt?compression=gzip
> $ curl http://localhost:5984/testdb/testdoc1/readme.txt?compression=deflate
> $ curl http://localhost:5984/testdb/testdoc1/readme.txt   # attachment will not be compressed
> $ curl http://localhost:5984/testdb/testdoc1/readme.txt?compression=rar   # will give a 500 error code
> Etap test case included.
> Feedback would be very welcome.
> cheers
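A rough Python sketch of the behaviour this description proposes. The threshold value and the function names plan_response and compress_blocks are hypothetical; only the gzip/deflate choice and the chunking rule come from the description. In zlib, a wbits value of 16 + MAX_WBITS produces a gzip wrapper, while MAX_WBITS alone produces the zlib format that HTTP's "deflate" coding refers to:

```python
import zlib

# zlib wbits: 16 + MAX_WBITS writes a gzip wrapper, MAX_WBITS alone writes
# the zlib wrapper that HTTP's "deflate" content coding refers to.
WBITS = {"gzip": 16 + zlib.MAX_WBITS, "deflate": zlib.MAX_WBITS}

# Hypothetical value for treshold_for_chunking_comp_responses, in bytes.
CHUNK_THRESHOLD = 64 * 1024


def plan_response(att_len, compression=None):
    """Return (headers, chunked) for GET /db/doc/att[?compression=...]."""
    if compression is None:
        return {"Content-Length": str(att_len)}, False
    if compression not in WBITS:
        # The patch answers e.g. ?compression=rar with a 500 error.
        raise ValueError("unsupported compression: %r" % (compression,))
    headers = {"Content-Encoding": compression}
    chunked = att_len >= CHUNK_THRESHOLD
    if chunked:
        headers["Transfer-Encoding"] = "chunked"
    # Otherwise Content-Length is only known once the whole body has been
    # compressed, which is why the non-chunked path buffers every block.
    return headers, chunked


def compress_blocks(blocks, compression):
    """Compress attachment blocks one at a time, yielding output as it appears."""
    z = zlib.compressobj(6, zlib.DEFLATED, WBITS[compression])
    for block in blocks:
        out = z.compress(block)
        if out:
            yield out
    yield z.flush()
```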



[jira] Commented: (COUCHDB-583) storing attachments in compressed form and serving them in compressed form if accepted by the client

Posted by "Paul Joseph Davis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/COUCHDB-583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12804917#action_12804917 ] 

Paul Joseph Davis commented on COUCHDB-583:
-------------------------------------------

Filipe,

Sorry, I got distracted by a weekend project. I'll try to do a thorough review tomorrow, before the big news day on Wednesday.

> storing attachments in compressed form and serving them in compressed form if accepted by the client
> ----------------------------------------------------------------------------------------------------
>
>                 Key: COUCHDB-583
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-583
>             Project: CouchDB
>          Issue Type: New Feature
>          Components: Database Core, HTTP Interface
>         Environment: CouchDB trunk
>            Reporter: Filipe Manana
>         Attachments: couchdb-583-trunk-10th-try.patch, couchdb-583-trunk-11th-try.patch, couchdb-583-trunk-12th-try.patch, couchdb-583-trunk-13th-try.patch, couchdb-583-trunk-14th-try-git.patch, couchdb-583-trunk-15th-try-git.patch, couchdb-583-trunk-3rd-try.patch, couchdb-583-trunk-4th-try-trunk.patch, couchdb-583-trunk-5th-try.patch, couchdb-583-trunk-6th-try.patch, couchdb-583-trunk-7th-try.patch, couchdb-583-trunk-8th-try.patch, couchdb-583-trunk-9th-try.patch, jira-couchdb-583-1st-try-trunk.patch, jira-couchdb-583-2nd-try-trunk.patch
>
>
> This feature allows Couch to gzip-compress attachments as they are received and to store them in compressed form.
> When a client requests an attachment (e.g. GET somedb/somedoc/attachment.txt), the attachment is sent in compressed form if the client's HTTP request lists gzip as an acceptable content encoding for the response (via the "Accept-Encoding" header). Otherwise Couch decompresses the attachment before sending it back to the client.
> Attachments are compressed only if their MIME type matches one of those listed in a separate config file. The compression level is also configurable in the default.ini file.
> This follows Damien's suggestion from 30 November:
> "Perhaps we need a separate user editable ini file to specify compressable or non-compressable files (would probably be too big for the regular ini file). What do other web servers do?
> Also, a potential optimization is to compress the file while writing to disk, and serve the compressed bytes directly to clients that can handle it, and decompressed for those that can't. For compressable types, it's a win for both disk IO for reads and writes, and CPU on read."
> Patch attached.



[jira] Updated: (COUCHDB-583) storing attachments in compressed form and serving them in compressed form if accepted by the client

Posted by "Filipe Manana (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/COUCHDB-583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Filipe Manana updated COUCHDB-583:
----------------------------------

    Attachment: couchdb-583-trunk-17th-try-git.patch

Following the suggestions made in this thread and on the IRC channel, I renamed:

#att.len   to   #att.att_len

and

#att.identity_len   to   #att.disk_len

Hopefully the meaning is now clearer to everybody :)

PS: apply the patch with git apply --binary
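To make the two lengths concrete, here is an illustrative Python mirror of the renamed fields. It assumes, going by the old name identity_len, that disk_len is the length of the attachment in identity (uncompressed) form and att_len the length of the bytes actually stored; the Att class and the make_att helper are hypothetical:

```python
import zlib
from dataclasses import dataclass


@dataclass
class Att:
    # Loose, illustrative mirror of the #att fields discussed above; the
    # real Erlang record has more fields than this.
    name: str
    type: str
    att_len: int   # length of the bytes as stored (possibly gzip-encoded)
    disk_len: int  # length of the attachment in identity (uncompressed) form
    encoding: str  # "identity" or "gzip"
    data: bytes


def make_att(name, mime, body, compress):
    if compress:
        # 16 + MAX_WBITS: produce a gzip-wrapped stream.
        z = zlib.compressobj(6, zlib.DEFLATED, 16 + zlib.MAX_WBITS)
        stored = z.compress(body) + z.flush()
        return Att(name, mime, len(stored), len(body), "gzip", stored)
    return Att(name, mime, len(body), len(body), "identity", body)


att = make_att("readme.txt", "text/plain", b"CouchDB " * 500, compress=True)
print(att.att_len, att.disk_len)  # att_len is far smaller than disk_len here
```

For an uncompressed attachment the two lengths are equal; they only differ when an encoding other than identity has been applied.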

> storing attachments in compressed form and serving them in compressed form if accepted by the client
> ----------------------------------------------------------------------------------------------------
>
>                 Key: COUCHDB-583
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-583
>             Project: CouchDB
>          Issue Type: New Feature
>          Components: Database Core, HTTP Interface
>         Environment: CouchDB trunk
>            Reporter: Filipe Manana
>         Attachments: couchdb-583-trunk-10th-try.patch, couchdb-583-trunk-11th-try.patch, couchdb-583-trunk-12th-try.patch, couchdb-583-trunk-13th-try.patch, couchdb-583-trunk-14th-try-git.patch, couchdb-583-trunk-15th-try-git.patch, couchdb-583-trunk-16th-try-git.patch, couchdb-583-trunk-17th-try-git.patch, couchdb-583-trunk-3rd-try.patch, couchdb-583-trunk-4th-try-trunk.patch, couchdb-583-trunk-5th-try.patch, couchdb-583-trunk-6th-try.patch, couchdb-583-trunk-7th-try.patch, couchdb-583-trunk-8th-try.patch, couchdb-583-trunk-9th-try.patch, jira-couchdb-583-1st-try-trunk.patch, jira-couchdb-583-2nd-try-trunk.patch
>
>
> This feature allows Couch to gzip-compress attachments as they are received and to store them in compressed form.
> When a client requests an attachment (e.g. GET somedb/somedoc/attachment.txt), the attachment is sent in compressed form if the client's HTTP request lists gzip as an acceptable content encoding for the response (via the "Accept-Encoding" header). Otherwise Couch decompresses the attachment before sending it back to the client.
> Attachments are compressed only if their MIME type matches one of those listed in a separate config file. The compression level is also configurable in the default.ini file.
> This follows Damien's suggestion from 30 November:
> "Perhaps we need a separate user editable ini file to specify compressable or non-compressable files (would probably be too big for the regular ini file). What do other web servers do?
> Also, a potential optimization is to compress the file while writing to disk, and serve the compressed bytes directly to clients that can handle it, and decompressed for those that can't. For compressable types, it's a win for both disk IO for reads and writes, and CPU on read."
> Patch attached.



[jira] Updated: (COUCHDB-583) adding ?compression=(gzip|deflate) optional parameter to the attachment download API

Posted by "Filipe Manana (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/COUCHDB-583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Filipe Manana updated COUCHDB-583:
----------------------------------

    Attachment: couchdb-583-trunk-3rd-try.patch

Hello,

Here follows the patch implementing Damien's suggestion.

A new key under the httpd section is added to the config file. Its value is the name of a file that lists which MIME types are worth compressing (e.g. text-based types).

When an attachment is uploaded (using the standalone attachment API), if its type matches one of the MIME types listed in that file, the attachment is stored gzip-compressed.

When a client requests an attachment:

1) if the HTTP request has an Accept-Encoding header listing gzip as an acceptable response body encoding, the attachment is sent to the client unmodified (gzip-compressed, in a chunked HTTP response)

2) if the HTTP request doesn't include that header, or includes it but marks gzip as not acceptable, the attachment is decompressed on the fly and sent to the client (in a chunked HTTP response)
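The two download cases can be sketched in Python as follows. accepts_gzip and serve are hypothetical names, and the header check is deliberately minimal (it honours q-values and lets an explicit gzip entry override the * wildcard); it is not the mochiweb parser mentioned later in the thread:

```python
import zlib


def accepts_gzip(accept_encoding):
    """True if the Accept-Encoding value allows gzip with q > 0."""
    if not accept_encoding:
        return False
    qvalues = {}
    for item in accept_encoding.split(","):
        parts = [p.strip() for p in item.split(";")]
        coding = parts[0].lower()
        if not coding:
            continue
        q = 1.0
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        qvalues[coding] = q
    # An explicit gzip entry wins over the * wildcard.
    return qvalues.get("gzip", qvalues.get("*", 0.0)) > 0


def _gunzip(chunks):
    # Decompress on the fly, one stored chunk at a time.
    d = zlib.decompressobj(16 + zlib.MAX_WBITS)
    for chunk in chunks:
        out = d.decompress(chunk)
        if out:
            yield out
    tail = d.flush()
    if tail:
        yield tail


def serve(stored_gzip_chunks, accept_encoding):
    """Return (headers, body iterator) for an attachment stored gzip-compressed."""
    if accepts_gzip(accept_encoding):
        # Case 1: send the stored bytes unmodified.
        headers = {"Content-Encoding": "gzip", "Transfer-Encoding": "chunked"}
        return headers, iter(stored_gzip_chunks)
    # Case 2: decompress while streaming.
    return {"Transfer-Encoding": "chunked"}, _gunzip(stored_gzip_chunks)
```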

An Etap test covering 7 scenarios is included.
Note that the patch breaks some other tests (for example, tests that upload a text attachment and then download it to check its content). I'll fix those tests after your feedback. This is a preliminary version.

Feedback (especially from the committers) on the implementation is very welcome.

cheers,
Filipe Manana (fdmanana)

> adding ?compression=(gzip|deflate) optional parameter to the attachment download API
> ------------------------------------------------------------------------------------
>
>                 Key: COUCHDB-583
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-583
>             Project: CouchDB
>          Issue Type: New Feature
>          Components: HTTP Interface
>         Environment: CouchDB trunk revision 885240
>            Reporter: Filipe Manana
>         Attachments: couchdb-583-trunk-3rd-try.patch, jira-couchdb-583-1st-try-trunk.patch, jira-couchdb-583-2nd-try-trunk.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> The patch that follows this ticket's creation adds the following new feature.
> A new optional HTTP query parameter, "compression", is added to the attachments API.
> This parameter can take one of two values: "gzip" or "deflate".
> When an attachment is requested (HTTP GET) and the "compression" query parameter is present, CouchDB sends the attachment to the client compressed (and sets the Content-Encoding header to gzip or deflate).
> Further, it adds a new config option "treshold_for_chunking_comp_responses" (httpd section) that specifies an attachment length threshold. If an attachment's length is greater than or equal to this threshold, the HTTP response is chunked (in addition to being compressed).
> Note that non-chunked compressed responses require holding all the compressed blocks in memory before sending them to the client. This is a necessary "evil", as we only know the length of the compressed body after compressing the whole body, and we need to set the "Content-Length" header for non-chunked responses. With chunked responses, we can send each compressed block immediately, without accumulating all of them in memory.
> Examples:
> $ curl http://localhost:5984/testdb/testdoc1/readme.txt?compression=gzip
> $ curl http://localhost:5984/testdb/testdoc1/readme.txt?compression=deflate
> $ curl http://localhost:5984/testdb/testdoc1/readme.txt   # attachment will not be compressed
> $ curl http://localhost:5984/testdb/testdoc1/readme.txt?compression=rar   # will give a 500 error code
> Etap test case included.
> Feedback would be very welcome.
> cheers



[jira] Updated: (COUCHDB-583) storing attachments in compressed form and serving them in compressed form if accepted by the client

Posted by "Filipe Manana (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/COUCHDB-583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Filipe Manana updated COUCHDB-583:
----------------------------------

           Component/s: Database Core
           Description: 
This feature allows Couch to gzip-compress attachments as they are received and to store them in compressed form.

When a client requests an attachment (e.g. GET somedb/somedoc/attachment.txt), the attachment is sent in compressed form if the client's HTTP request lists gzip as an acceptable content encoding for the response (via the "Accept-Encoding" header). Otherwise Couch decompresses the attachment before sending it back to the client.

Attachments are compressed only if their MIME type matches one of those listed in a separate config file. The compression level is also configurable in the default.ini file.

This follows Damien's suggestion from 30 November:

"Perhaps we need a separate user editable ini file to specify compressable or non-compressable files (would probably be too big for the regular ini file). What do other web servers do?

Also, a potential optimization is to compress the file while writing to disk, and serve the compressed bytes directly to clients that can handle it, and decompressed for those that can't. For compressable types, it's a win for both disk IO for reads and writes, and CPU on read."

Patch attached.

  was:
The patch that follows this ticket's creation adds the following new feature.

A new optional HTTP query parameter, "compression", is added to the attachments API.
This parameter can take one of two values: "gzip" or "deflate".

When an attachment is requested (HTTP GET) and the "compression" query parameter is present, CouchDB sends the attachment to the client compressed (and sets the Content-Encoding header to gzip or deflate).

Further, it adds a new config option "treshold_for_chunking_comp_responses" (httpd section) that specifies an attachment length threshold. If an attachment's length is greater than or equal to this threshold, the HTTP response is chunked (in addition to being compressed).

Note that non-chunked compressed responses require holding all the compressed blocks in memory before sending them to the client. This is a necessary "evil", as we only know the length of the compressed body after compressing the whole body, and we need to set the "Content-Length" header for non-chunked responses. With chunked responses, we can send each compressed block immediately, without accumulating all of them in memory.

Examples:

$ curl http://localhost:5984/testdb/testdoc1/readme.txt?compression=gzip
$ curl http://localhost:5984/testdb/testdoc1/readme.txt?compression=deflate
$ curl http://localhost:5984/testdb/testdoc1/readme.txt   # attachment will not be compressed
$ curl http://localhost:5984/testdb/testdoc1/readme.txt?compression=rar   # will give a 500 error code

Etap test case included.

Feedback would be very welcome.

cheers

           Environment: CouchDB trunk  (was: CouchDB trunk revision 885240)
    Remaining Estimate:     (was: 24h)
     Original Estimate:     (was: 24h)
               Summary: storing attachments in compressed form and serving them in compressed form if accepted by the client  (was: adding ?compression=(gzip|deflate) optional parameter to the attachment download API)

> storing attachments in compressed form and serving them in compressed form if accepted by the client
> ----------------------------------------------------------------------------------------------------
>
>                 Key: COUCHDB-583
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-583
>             Project: CouchDB
>          Issue Type: New Feature
>          Components: Database Core, HTTP Interface
>         Environment: CouchDB trunk
>            Reporter: Filipe Manana
>         Attachments: couchdb-583-trunk-3rd-try.patch, couchdb-583-trunk-4th-try-trunk.patch, couchdb-583-trunk-5th-try.patch, couchdb-583-trunk-6th-try.patch, jira-couchdb-583-1st-try-trunk.patch, jira-couchdb-583-2nd-try-trunk.patch
>
>
> This feature allows Couch to gzip-compress attachments as they are received and to store them in compressed form.
> When a client requests an attachment (e.g. GET somedb/somedoc/attachment.txt), the attachment is sent in compressed form if the client's HTTP request lists gzip as an acceptable content encoding for the response (via the "Accept-Encoding" header). Otherwise Couch decompresses the attachment before sending it back to the client.
> Attachments are compressed only if their MIME type matches one of those listed in a separate config file. The compression level is also configurable in the default.ini file.
> This follows Damien's suggestion from 30 November:
> "Perhaps we need a separate user editable ini file to specify compressable or non-compressable files (would probably be too big for the regular ini file). What do other web servers do?
> Also, a potential optimization is to compress the file while writing to disk, and serve the compressed bytes directly to clients that can handle it, and decompressed for those that can't. For compressable types, it's a win for both disk IO for reads and writes, and CPU on read."
> Patch attached.



[jira] Commented: (COUCHDB-583) adding ?compression=(gzip|deflate) optional parameter to the attachment download API

Posted by "Damien Katz (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/COUCHDB-583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12783732#action_12783732 ] 

Damien Katz commented on COUCHDB-583:
-------------------------------------

One problem I think I see with the patch is that we are compressing regardless of MIME type. For already-compressed files (images, music and video), compression does nothing but add CPU overhead.

Perhaps we need a separate user editable ini file to specify compressable or non-compressable files (would probably be too big for the regular ini file). What do other web servers do?

Also, a potential optimization is to compress the file while writing to disk, and serve the compressed bytes directly to clients that can handle it, and decompressed for those that can't. For compressable types, it's a win for both disk IO for reads and writes, and CPU on read.
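Damien's second idea, compressing while writing, amounts to a streaming writer. The Python sketch below is hypothetical (write_compressed is not CouchDB's couch_stream), and random bytes stand in for an already-compressed file such as a JPEG, to show why compressing such types only costs CPU:

```python
import io
import os
import zlib


def write_compressed(chunks, out, level=6):
    """Gzip-compress chunks as they arrive and append the result to `out`
    (any binary file-like object). Returns (bytes_received, bytes_written)."""
    z = zlib.compressobj(level, zlib.DEFLATED, 16 + zlib.MAX_WBITS)
    received = written = 0
    for chunk in chunks:
        received += len(chunk)
        written += out.write(z.compress(chunk))
    written += out.write(z.flush())
    return received, written


# Text shrinks a lot; random bytes (standing in for JPEG/MP3 data) do not.
text_stats = write_compressed([b"some text line\n"] * 1000, io.BytesIO())
media_stats = write_compressed([os.urandom(4096) for _ in range(4)], io.BytesIO())
print(text_stats, media_stats)
```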

> adding ?compression=(gzip|deflate) optional parameter to the attachment download API
> ------------------------------------------------------------------------------------
>
>                 Key: COUCHDB-583
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-583
>             Project: CouchDB
>          Issue Type: New Feature
>          Components: HTTP Interface
>         Environment: CouchDB trunk revision 885240
>            Reporter: Filipe Manana
>         Attachments: jira-couchdb-583-1st-try-trunk.patch, jira-couchdb-583-2nd-try-trunk.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> The patch that follows this ticket's creation adds the following new feature.
> A new optional HTTP query parameter, "compression", is added to the attachments API.
> This parameter can take one of two values: "gzip" or "deflate".
> When an attachment is requested (HTTP GET) and the "compression" query parameter is present, CouchDB sends the attachment to the client compressed (and sets the Content-Encoding header to gzip or deflate).
> Further, it adds a new config option "treshold_for_chunking_comp_responses" (httpd section) that specifies an attachment length threshold. If an attachment's length is greater than or equal to this threshold, the HTTP response is chunked (in addition to being compressed).
> Note that non-chunked compressed responses require holding all the compressed blocks in memory before sending them to the client. This is a necessary "evil", as we only know the length of the compressed body after compressing the whole body, and we need to set the "Content-Length" header for non-chunked responses. With chunked responses, we can send each compressed block immediately, without accumulating all of them in memory.
> Examples:
> $ curl http://localhost:5984/testdb/testdoc1/readme.txt?compression=gzip
> $ curl http://localhost:5984/testdb/testdoc1/readme.txt?compression=deflate
> $ curl http://localhost:5984/testdb/testdoc1/readme.txt   # attachment will not be compressed
> $ curl http://localhost:5984/testdb/testdoc1/readme.txt?compression=rar   # will give a 500 error code
> Etap test case included.
> Feedback would be very welcome.
> cheers



[jira] Updated: (COUCHDB-583) storing attachments in compressed form and serving them in compressed form if accepted by the client

Posted by "Filipe Manana (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/COUCHDB-583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Filipe Manana updated COUCHDB-583:
----------------------------------

    Attachment: couchdb-583-trunk-10th-try.patch

@Paul

This patch uses a whitelist that allows * wildcards, so we can specify text/* as you suggested.

I also addressed your previous comments.

Currently the decision of whether or not to compress an attachment is made by couch_db:with_stream/3, which passes the compression level to couch_stream. I'm not sure where the best place to make that decision is. I also considered letting couch_stream decide, but that would make it depend on couch_config.
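One way to keep the stream free of any config dependency, sketched in Python: the caller looks up the whitelist and the level, and the stream only receives a number. The config keys, values and class names here are hypothetical; * is matched with fnmatch, so text/* covers text/plain:

```python
import zlib
from fnmatch import fnmatchcase

# Hypothetical config values; the option names in the actual patch may differ.
CONFIG = {
    "compressible_types": "text/*, application/javascript, application/json",
    "compression_level": "8",
}


def compression_level_for(mime_type, config=CONFIG):
    """Return the configured zlib level if mime_type is whitelisted, else 0.
    Parameters such as "; charset=utf-8" are ignored and case does not matter."""
    base = mime_type.split(";", 1)[0].strip().lower()
    patterns = [p.strip().lower() for p in config["compressible_types"].split(",")]
    if any(fnmatchcase(base, p) for p in patterns if p):
        return int(config["compression_level"])
    return 0  # 0 means: store the attachment as is


class Stream:
    """Stand-in for couch_stream: it is given a level, never the config."""

    def __init__(self, level):
        self._z = None
        if level > 0:
            self._z = zlib.compressobj(level, zlib.DEFLATED, 16 + zlib.MAX_WBITS)
        self._parts = []

    def write(self, data):
        self._parts.append(self._z.compress(data) if self._z else data)

    def close(self):
        if self._z:
            self._parts.append(self._z.flush())
        return b"".join(self._parts)


def open_stream(mime_type, config=CONFIG):
    # Analogous to with_stream/3 above: the caller decides, the stream obeys.
    return Stream(compression_level_for(mime_type, config))
```

Passing only the level keeps the stream testable in isolation, at the cost of every caller having to do the lookup.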

Later I'll submit a patch for mochiweb (for parsing the Accept-Encoding header).

Let me know your opinions.

cheers

> storing attachments in compressed form and serving them in compressed form if accepted by the client
> ----------------------------------------------------------------------------------------------------
>
>                 Key: COUCHDB-583
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-583
>             Project: CouchDB
>          Issue Type: New Feature
>          Components: Database Core, HTTP Interface
>         Environment: CouchDB trunk
>            Reporter: Filipe Manana
>         Attachments: couchdb-583-trunk-10th-try.patch, couchdb-583-trunk-3rd-try.patch, couchdb-583-trunk-4th-try-trunk.patch, couchdb-583-trunk-5th-try.patch, couchdb-583-trunk-6th-try.patch, couchdb-583-trunk-7th-try.patch, couchdb-583-trunk-8th-try.patch, couchdb-583-trunk-9th-try.patch, jira-couchdb-583-1st-try-trunk.patch, jira-couchdb-583-2nd-try-trunk.patch
>
>
> This feature allows Couch to gzip-compress attachments as they are received and to store them in compressed form.
> When a client requests an attachment (e.g. GET somedb/somedoc/attachment.txt), the attachment is sent in compressed form if the client's HTTP request lists gzip as an acceptable content encoding for the response (via the "Accept-Encoding" header). Otherwise Couch decompresses the attachment before sending it back to the client.
> Attachments are compressed only if their MIME type matches one of those listed in a separate config file. The compression level is also configurable in the default.ini file.
> This follows Damien's suggestion from 30 November:
> "Perhaps we need a separate user editable ini file to specify compressable or non-compressable files (would probably be too big for the regular ini file). What do other web servers do?
> Also, a potential optimization is to compress the file while writing to disk, and serve the compressed bytes directly to clients that can handle it, and decompressed for those that can't. For compressable types, it's a win for both disk IO for reads and writes, and CPU on read."
> Patch attached.



[jira] Commented: (COUCHDB-583) adding ?compression=(gzip|deflate) optional parameter to the attachment download API

Posted by "Paul Joseph Davis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/COUCHDB-583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12783453#action_12783453 ] 

Paul Joseph Davis commented on COUCHDB-583:
-------------------------------------------

Filipe,

Good work once again; this is another vote for Accept-Encoding. Also, your comment on the necessary evil of buffering responses so that a Content-Length can be sent after compression conflicts with the HTTP spec:

http://www.w3.org/Protocols/rfc2616/rfc2616-sec4.html#sec4.4

2. If a Transfer-Encoding header field (section 14.41) is present and has any value other than "identity", then the transfer-length is defined by use of the "chunked" transfer-coding (section 3.6), unless the message is terminated by closing the connection.
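In practice this means a compressed response sent chunked needs no Content-Length at all: each chunk carries its own size in hex, and a zero-size chunk ends the body. A small Python sketch of that framing (function names are hypothetical; chunk extensions and trailers are not supported):

```python
import zlib


def frame_chunk(data):
    """One chunk: size in hex, CRLF, the data, CRLF (RFC 2616, section 3.6.1)."""
    return b"%X\r\n" % len(data) + data + b"\r\n"


def chunked_gzip_body(blocks):
    """Yield a chunked, gzip-encoded body, one chunk per compressed output."""
    z = zlib.compressobj(6, zlib.DEFLATED, 16 + zlib.MAX_WBITS)
    for block in blocks:
        out = z.compress(block)
        if out:  # an empty chunk would be read as the end of the body
            yield frame_chunk(out)
    tail = z.flush()
    if tail:
        yield frame_chunk(tail)
    yield b"0\r\n\r\n"  # last-chunk plus an empty trailer


def dechunk(raw):
    """Inverse of the framing above, for checking (no chunk extensions)."""
    body, i = b"", 0
    while True:
        eol = raw.index(b"\r\n", i)
        size = int(raw[i:eol], 16)
        if size == 0:
            return body
        body += raw[eol + 2:eol + 2 + size]
        i = eol + 2 + size + 2
```

Each framed chunk can be written to the socket as soon as it is produced, so nothing has to be buffered to compute a length up front.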



> adding ?compression=(gzip|deflate) optional parameter to the attachment download API
> ------------------------------------------------------------------------------------
>
>                 Key: COUCHDB-583
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-583
>             Project: CouchDB
>          Issue Type: New Feature
>          Components: HTTP Interface
>         Environment: CouchDB trunk revision 885240
>            Reporter: Filipe Manana
>         Attachments: jira-couchdb-583-1st-try-trunk.patch, jira-couchdb-583-2nd-try-trunk.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> The patch that follows this ticket's creation adds the following new feature.
> A new optional HTTP query parameter, "compression", is added to the attachments API.
> This parameter can take one of two values: "gzip" or "deflate".
> When an attachment is requested (HTTP GET) and the "compression" query parameter is present, CouchDB sends the attachment to the client compressed (and sets the Content-Encoding header to gzip or deflate).
> Further, it adds a new config option "treshold_for_chunking_comp_responses" (httpd section) that specifies an attachment length threshold. If an attachment's length is greater than or equal to this threshold, the HTTP response is chunked (in addition to being compressed).
> Note that non-chunked compressed responses require holding all the compressed blocks in memory before sending them to the client. This is a necessary "evil", as we only know the length of the compressed body after compressing the whole body, and we need to set the "Content-Length" header for non-chunked responses. With chunked responses, we can send each compressed block immediately, without accumulating all of them in memory.
> Examples:
> $ curl http://localhost:5984/testdb/testdoc1/readme.txt?compression=gzip
> $ curl http://localhost:5984/testdb/testdoc1/readme.txt?compression=deflate
> $ curl http://localhost:5984/testdb/testdoc1/readme.txt   # attachment will not be compressed
> $ curl http://localhost:5984/testdb/testdoc1/readme.txt?compression=rar   # will give a 500 error code
> Etap test case included.
> Feedback would be very welcome.
> cheers
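The chunked-versus-buffered trade-off in the description can be sketched as follows. This is not the patch's Erlang code; `compress_blocks`, `build_response` and the `WBITS` table are hypothetical, and `threshold` stands in for the "treshold_for_chunking_comp_responses" setting. Note that in HTTP, "deflate" names the zlib format (RFC 1950), not raw deflate, and Python's `zlib` produces either wrapper depending on `wbits`:

```python
import zlib

# For HTTP, "deflate" denotes the zlib format (RFC 1950); wbits=MAX_WBITS
# produces it. 16 + MAX_WBITS makes zlib emit a gzip wrapper instead.
WBITS = {"gzip": 16 + zlib.MAX_WBITS, "deflate": zlib.MAX_WBITS}


def compress_blocks(blocks, coding, level=6):
    """Yield compressed output incrementally as input blocks are consumed."""
    comp = zlib.compressobj(level, zlib.DEFLATED, WBITS[coding])
    for block in blocks:
        out = comp.compress(block)
        if out:  # zlib may buffer input internally and return nothing yet
            yield out
    yield comp.flush()


def build_response(blocks, att_len, coding, threshold):
    """Return (headers, body_iter) for a compressed attachment response.

    Attachments whose length is >= threshold are sent chunked; smaller ones
    are fully compressed in memory first so Content-Length can be set.
    """
    if coding not in WBITS:
        # Validate eagerly: compress_blocks is a generator and would only
        # fail once iteration starts.
        raise ValueError("unsupported compression: %r" % (coding,))
    headers = {"Content-Encoding": coding}
    compressed = compress_blocks(blocks, coding)
    if att_len >= threshold:
        # Chunked: each compressed block can go out as soon as it exists.
        headers["Transfer-Encoding"] = "chunked"
        return headers, compressed
    # Non-chunked: buffer everything to learn the compressed length.
    body = b"".join(compressed)
    headers["Content-Length"] = str(len(body))
    return headers, iter([body])
```

The headers are a plain dict here for illustration; a real server would hand them to its HTTP layer, which also performs the chunk framing.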

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (COUCHDB-583) adding ?compression=(gzip|deflate) optional parameter to the attachment download API

Posted by "Filipe Manana (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/COUCHDB-583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Filipe Manana updated COUCHDB-583:
----------------------------------

    Attachment: couchdb-583-trunk-5th-try.patch

Updated the Etap test suite to include 8 more tests that cover:

+ attachment compression when using the non-standalone attachment upload API
+ attachment decompression when getting an attachment by getting its doc with "?attachments=true"
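The second test case matters because "?attachments=true" inlines attachments as base64 in the document JSON, so bytes stored compressed on disk have to be decompressed before encoding. A minimal sketch, assuming the on-disk encoding is recorded alongside each attachment (`inline_attachment` is a hypothetical helper; "content_type" and "data" are the usual inline attachment fields):

```python
import base64
import gzip


def inline_attachment(stored, encoding, content_type):
    """Build an inline attachment entry for a ?attachments=true response.

    Compressed-on-disk bytes are decompressed first, so the base64 "data"
    field always carries the original attachment bytes.
    """
    raw = gzip.decompress(stored) if encoding == "gzip" else stored
    return {
        "content_type": content_type,
        "data": base64.b64encode(raw).decode("ascii"),
    }
```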

cheers

> adding ?compression=(gzip|deflate) optional parameter to the attachment download API
> ------------------------------------------------------------------------------------
>
>                 Key: COUCHDB-583
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-583
>             Project: CouchDB
>          Issue Type: New Feature
>          Components: HTTP Interface
>         Environment: CouchDB trunk revision 885240
>            Reporter: Filipe Manana
>         Attachments: couchdb-583-trunk-3rd-try.patch, couchdb-583-trunk-4th-try-trunk.patch, couchdb-583-trunk-5th-try.patch, jira-couchdb-583-1st-try-trunk.patch, jira-couchdb-583-2nd-try-trunk.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h


[jira] Updated: (COUCHDB-583) adding ?compression=(gzip|deflate) optional parameter to the attachment download API

Posted by "Adam Kocoloski (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/COUCHDB-583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adam Kocoloski updated COUCHDB-583:
-----------------------------------


Hi Filipe, COUCHDB-437 is more about compressing the documents on disk than the attachments. To compress Erlang terms on disk, one only needs to replace

term_to_binary(Term)

with

term_to_binary(Term, [compressed])

The two issues are definitely linked, though.
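Erlang's `term_to_binary(Term, [compressed])` zlib-compresses the external term format, and `binary_to_term/1` recognizes compressed terms automatically, so the read path needs no change. The following is only a loose Python analogy of that idea, not CouchDB code: `pickle` stands in for the external term format, and the one-byte `RAW`/`ZLIB` tags are a hypothetical device that mimics the auto-detection:

```python
import pickle
import zlib

RAW, ZLIB = b"\x00", b"\x01"  # hypothetical one-byte format tags


def encode_doc(doc, compressed=False):
    """Serialize a document, optionally zlib-compressing the serialized bytes."""
    data = pickle.dumps(doc)
    return ZLIB + zlib.compress(data) if compressed else RAW + data


def decode_doc(blob):
    """Decode either form; the tag tells us whether to decompress first."""
    tag, payload = blob[:1], blob[1:]
    if tag == ZLIB:
        payload = zlib.decompress(payload)
    return pickle.loads(payload)
```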

> adding ?compression=(gzip|deflate) optional parameter to the attachment download API
> ------------------------------------------------------------------------------------
>
>                 Key: COUCHDB-583
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-583
>             Project: CouchDB
>          Issue Type: New Feature
>          Components: HTTP Interface
>         Environment: CouchDB trunk revision 885240
>            Reporter: Filipe Manana
>         Attachments: couchdb-583-trunk-3rd-try.patch, couchdb-583-trunk-4th-try-trunk.patch, couchdb-583-trunk-5th-try.patch, couchdb-583-trunk-6th-try.patch, jira-couchdb-583-1st-try-trunk.patch, jira-couchdb-583-2nd-try-trunk.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h


[jira] Commented: (COUCHDB-583) storing attachments in compressed form and serving them in compressed form if accepted by the client

Posted by "Chris Anderson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/COUCHDB-583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12805379#action_12805379 ] 

Chris Anderson commented on COUCHDB-583:
----------------------------------------

I ran the etap test (don't forget to make it executable)

And got this output:

$ ./test/etap/run ./test/etap/140-attachment-comp.t 
# Current time local 2010-01-26 21:57:12
# Using etap version "0.3.4"
1..57
Apache CouchDB 0.0.0 (LogLevel=info) is starting.
Apache CouchDB has started. Time to relax.
[info] [<0.2.0>] Apache CouchDB has started on http://127.0.0.1:5984/
# Test died abnormally: {'EXIT',{{badmatch,{error,enoent}},[{erl_eval,expr,3}]}}
[error] [<0.31.0>] {error_report,<0.23.0>,
 {<0.31.0>,crash_report,
  [[{initial_call,{etap,start_etap_server,[]}},
    {pid,<0.31.0>},
    {registered_name,etap_server},
    {error_info,
     {exit,
      {badarg,
       [{io,format,
         [<0.23.0>,"~s~n",
          [[66,97,105,108,32,111,117,116,33,32|
            {'EXIT',{{badmatch,{error,enoent}},[{erl_eval,expr,3}]}}]]]},
        {etap,test_server,1},
        {proc_lib,init_p_do_apply,3}]},
      [{io,o_request,3},{etap,test_server,1},{proc_lib,init_p_do_apply,3}]}},
    {ancestors,[<0.2.0>]},
    {messages,[done]},
    {links,[]},
    {dictionary,[]},
    {trap_exit,false},
    {status,running},
    {heap_size,377},
    {stack_size,24},
    {reductions,451}],
   []]}}


On the other hand, the Futon tests pass, and I browsed to an old database file and was able to load attachments from it so that's some good news.



> storing attachments in compressed form and serving them in compressed form if accepted by the client
> ----------------------------------------------------------------------------------------------------
>
>                 Key: COUCHDB-583
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-583
>             Project: CouchDB
>          Issue Type: New Feature
>          Components: Database Core, HTTP Interface
>         Environment: CouchDB trunk
>            Reporter: Filipe Manana
>         Attachments: couchdb-583-trunk-10th-try.patch, couchdb-583-trunk-11th-try.patch, couchdb-583-trunk-12th-try.patch, couchdb-583-trunk-13th-try.patch, couchdb-583-trunk-14th-try-git.patch, couchdb-583-trunk-15th-try-git.patch, couchdb-583-trunk-3rd-try.patch, couchdb-583-trunk-4th-try-trunk.patch, couchdb-583-trunk-5th-try.patch, couchdb-583-trunk-6th-try.patch, couchdb-583-trunk-7th-try.patch, couchdb-583-trunk-8th-try.patch, couchdb-583-trunk-9th-try.patch, jira-couchdb-583-1st-try-trunk.patch, jira-couchdb-583-2nd-try-trunk.patch
>
>
> This feature allows Couch to gzip-compress attachments as they are received and store them in compressed form.
> When a client requests an attachment (e.g. GET somedb/somedoc/attachment.txt), the attachment is sent in compressed form if the client's HTTP request lists gzip as an acceptable content coding for the response (via the "Accept-Encoding" header). Otherwise, Couch decompresses the attachment before sending it back to the client.
> Attachments are compressed only if their MIME type matches one of those listed in a separate config file. The compression level is also configurable in the default.ini file.
> This follows Damien's suggestion from 30 November:
> "Perhaps we need a separate user editable ini file to specify compressable or non-compressable files (would probably be too big for the regular ini file). What do other web servers do?
> Also, a potential optimization is to compress the file while writing to disk, and serve the compressed bytes directly to clients that can handle it, and decompressed for those that can't. For compressable types, it's a win for both disk IO for reads and writes, and CPU on read."
> Patch attached.
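The store-compressed, serve-negotiated scheme described above can be sketched as below. Everything here is hypothetical: `COMPRESSIBLE_TYPES` stands in for the separate MIME-type config file (its entries are guesses), `level` stands in for the default.ini compression level, and the client's gzip acceptance is passed in as a boolean:

```python
import fnmatch
import gzip

# Hypothetical stand-in for the separate MIME-type config file.
COMPRESSIBLE_TYPES = ["text/*", "application/javascript", "application/json"]


def store_attachment(data, mime_type, level=8):
    """Return (stored_bytes, encoding) as they would be written to disk."""
    base_type = mime_type.split(";")[0].strip().lower()
    if any(fnmatch.fnmatchcase(base_type, pat) for pat in COMPRESSIBLE_TYPES):
        return gzip.compress(data, compresslevel=level), "gzip"
    return data, "identity"


def serve_attachment(stored, encoding, client_accepts_gzip):
    """Return (headers, body): pass stored gzip bytes through, or decompress."""
    if encoding == "gzip" and client_accepts_gzip:
        # Stored bytes go out as-is: no CPU spent compressing on read.
        return {"Content-Encoding": "gzip"}, stored
    if encoding == "gzip":
        return {}, gzip.decompress(stored)
    return {}, stored
```

Compressing once at write time is what gives the win Damien describes: smaller disk reads and writes, and no per-request compression for clients that accept gzip.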


[jira] Updated: (COUCHDB-583) storing attachments in compressed form and serving them in compressed form if accepted by the client

Posted by "Filipe Manana (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/COUCHDB-583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Filipe Manana updated COUCHDB-583:
----------------------------------

    Attachment: couchdb-583-trunk-9th-try.patch

9th patch

+ simplified code in couch_httpd_db.erl (nested case expressions), thanks davidc_ for pointing out the mess
+ gracefully merges with trunk svn rev 898552

> storing attachments in compressed form and serving them in compressed form if accepted by the client
> ----------------------------------------------------------------------------------------------------
>
>                 Key: COUCHDB-583
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-583
>             Project: CouchDB
>          Issue Type: New Feature
>          Components: Database Core, HTTP Interface
>         Environment: CouchDB trunk
>            Reporter: Filipe Manana
>         Attachments: couchdb-583-trunk-3rd-try.patch, couchdb-583-trunk-4th-try-trunk.patch, couchdb-583-trunk-5th-try.patch, couchdb-583-trunk-6th-try.patch, couchdb-583-trunk-7th-try.patch, couchdb-583-trunk-8th-try.patch, couchdb-583-trunk-9th-try.patch, jira-couchdb-583-1st-try-trunk.patch, jira-couchdb-583-2nd-try-trunk.patch


[jira] Updated: (COUCHDB-583) storing attachments in compressed form and serving them in compressed form if accepted by the client

Posted by "Filipe Manana (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/COUCHDB-583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Filipe Manana updated COUCHDB-583:
----------------------------------

    Attachment: couchdb-583-trunk-7th-try.patch

7th patch. Does some code cleanup and avoids exporting 2 unnecessary functions.

> storing attachments in compressed form and serving them in compressed form if accepted by the client
> ----------------------------------------------------------------------------------------------------
>
>                 Key: COUCHDB-583
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-583
>             Project: CouchDB
>          Issue Type: New Feature
>          Components: Database Core, HTTP Interface
>         Environment: CouchDB trunk
>            Reporter: Filipe Manana
>         Attachments: couchdb-583-trunk-3rd-try.patch, couchdb-583-trunk-4th-try-trunk.patch, couchdb-583-trunk-5th-try.patch, couchdb-583-trunk-6th-try.patch, couchdb-583-trunk-7th-try.patch, jira-couchdb-583-1st-try-trunk.patch, jira-couchdb-583-2nd-try-trunk.patch


[jira] Commented: (COUCHDB-583) adding ?compression=(gzip|deflate) optional parameter to the attachment download API

Posted by "Adam Kocoloski (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/COUCHDB-583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12783423#action_12783423 ] 

Adam Kocoloski commented on COUCHDB-583:
----------------------------------------

I second Sebastian on this -- Accept-Encoding is probably the right way for the client to indicate a preference that the response be compressed/gzipped.
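Using Accept-Encoding means honouring its q-values (RFC 2616 section 14.3): a coding with q=0 is refused, "*" covers unlisted codings, and "x-gzip" is treated as equivalent to "gzip". A minimal sketch with hypothetical helper names; when the header is absent it conservatively falls back to identity:

```python
def parse_accept_encoding(header):
    """Map each listed content coding to its q-value (default 1.0)."""
    prefs = {}
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        coding = parts[0].lower()
        if not coding:
            continue
        q = 1.0
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        prefs[coding] = q
    return prefs


def accepts_gzip(header):
    """True if the client's Accept-Encoding header allows a gzip response."""
    if header is None:
        return False  # no header: send identity to be safe
    prefs = parse_accept_encoding(header)
    for name in ("gzip", "x-gzip"):
        if name in prefs:
            return prefs[name] > 0
    return prefs.get("*", 0) > 0
```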

> adding ?compression=(gzip|deflate) optional parameter to the attachment download API
> ------------------------------------------------------------------------------------
>
>                 Key: COUCHDB-583
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-583
>             Project: CouchDB
>          Issue Type: New Feature
>          Components: HTTP Interface
>         Environment: CouchDB trunk revision 885240
>            Reporter: Filipe Manana
>         Attachments: jira-couchdb-583-1st-try-trunk.patch, jira-couchdb-583-2nd-try-trunk.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h


[jira] Updated: (COUCHDB-583) storing attachments in compressed form and serving them in compressed form if accepted by the client

Posted by "Filipe Manana (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/COUCHDB-583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Filipe Manana updated COUCHDB-583:
----------------------------------

    Attachment: couchdb-583-trunk-13th-try.patch

Just removed a useless function in couch_config that I added previously.

> storing attachments in compressed form and serving them in compressed form if accepted by the client
> ----------------------------------------------------------------------------------------------------
>
>                 Key: COUCHDB-583
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-583
>             Project: CouchDB
>          Issue Type: New Feature
>          Components: Database Core, HTTP Interface
>         Environment: CouchDB trunk
>            Reporter: Filipe Manana
>         Attachments: couchdb-583-trunk-10th-try.patch, couchdb-583-trunk-11th-try.patch, couchdb-583-trunk-12th-try.patch, couchdb-583-trunk-13th-try.patch, couchdb-583-trunk-3rd-try.patch, couchdb-583-trunk-4th-try-trunk.patch, couchdb-583-trunk-5th-try.patch, couchdb-583-trunk-6th-try.patch, couchdb-583-trunk-7th-try.patch, couchdb-583-trunk-8th-try.patch, couchdb-583-trunk-9th-try.patch, jira-couchdb-583-1st-try-trunk.patch, jira-couchdb-583-2nd-try-trunk.patch


[jira] Commented: (COUCHDB-583) adding ?compression=(gzip|deflate) optional parameter to the attachment download API

Posted by "Paul Joseph Davis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/COUCHDB-583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12783498#action_12783498 ] 

Paul Joseph Davis commented on COUCHDB-583:
-------------------------------------------

As Adam says, discussion is about as far as things have gone. It would indeed be a very good fit for CouchDB.

The last remaining issue, though, is how it streams entity bodies. Right now it expects CouchDB to provide a callback mechanism, which is a huge impedance mismatch with the control flow of iterating views.

I've been working on a Python port of webmachine for the last couple of days, and as part of that I've been trying to think how to patch webmachine to return a write-callable that could be passed to our streaming output. I haven't gone through the lower-level bits where it's handling streaming output, but at the moment it looks like a fairly big patch to the body-handling facilities in webmachine. If I get time I'll try to code a patch that doesn't affect the status quo, and then we can start to seriously think about tacking webmachine onto CouchDB.
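The mismatch described here can be illustrated in Python. This is not webmachine's or CouchDB's API; all names are hypothetical. A view fold pushes rows into a function, so a write-callable slots straight in, whereas a pull-style callback ("give me the next chunk") forces the fold to run elsewhere (a separate process in Erlang, a thread here) and hand rows over through a queue:

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the stream


def fold_view(rows, fun, acc):
    """Push-style iteration: the fold drives and calls fun once per row."""
    for row in rows:
        acc = fun(row, acc)
    return acc


def stream_with_write(rows, write):
    """Write-callable style: the fold pushes each row straight to write()."""
    def emit(row, count):
        write(row)
        return count + 1
    return fold_view(rows, emit, 0)


def pull_adapter(rows):
    """Callback (pull) style: the server asks for the next chunk, so the
    push-style fold must run in another thread and hand rows over."""
    q = queue.Queue(maxsize=16)  # bounded, so the producer can't run far ahead

    def producer():
        def emit(row, acc):
            q.put(row)
            return acc
        fold_view(rows, emit, None)
        q.put(_DONE)

    threading.Thread(target=producer, daemon=True).start()

    def next_chunk():
        item = q.get()
        return None if item is _DONE else item  # None signals end of body

    return next_chunk
```

The sketch ignores error propagation from the producer and assumes rows are never `None`; handling those is part of what makes the pull adapter the heavier option.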

> adding ?compression=(gzip|deflate) optional parameter to the attachment download API
> ------------------------------------------------------------------------------------
>
>                 Key: COUCHDB-583
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-583
>             Project: CouchDB
>          Issue Type: New Feature
>          Components: HTTP Interface
>         Environment: CouchDB trunk revision 885240
>            Reporter: Filipe Manana
>         Attachments: jira-couchdb-583-1st-try-trunk.patch, jira-couchdb-583-2nd-try-trunk.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h


[jira] Commented: (COUCHDB-583) adding ?compression=(gzip|deflate) optional parameter to the attachment download API

Posted by "Filipe Manana (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/COUCHDB-583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12783469#action_12783469 ] 

Filipe Manana commented on COUCHDB-583:
---------------------------------------

Alright,

I'll be working on it (Accept-Encoding header).

I must admit I was not familiar with those sections of the HTTP/1.1 RFC.

:)

thanks for your feedback.


> adding ?compression=(gzip|deflate) optional parameter to the attachment download API
> ------------------------------------------------------------------------------------
>
>                 Key: COUCHDB-583
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-583
>             Project: CouchDB
>          Issue Type: New Feature
>          Components: HTTP Interface
>         Environment: CouchDB trunk revision 885240
>            Reporter: Filipe Manana
>         Attachments: jira-couchdb-583-1st-try-trunk.patch, jira-couchdb-583-2nd-try-trunk.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h


[jira] Updated: (COUCHDB-583) storing attachments in compressed form and serving them in compressed form if accepted by the client

Posted by "Filipe Manana (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/COUCHDB-583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Filipe Manana updated COUCHDB-583:
----------------------------------

    Attachment: couchdb-583-trunk-18th-try-git.patch

18th

Removed the test_140_* files, since they are copies of README and share/www/images/logo.png.
The test case now uses those two files directly.

> storing attachments in compressed form and serving them in compressed form if accepted by the client
> ----------------------------------------------------------------------------------------------------
>
>                 Key: COUCHDB-583
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-583
>             Project: CouchDB
>          Issue Type: New Feature
>          Components: Database Core, HTTP Interface
>         Environment: CouchDB trunk
>            Reporter: Filipe Manana
>         Attachments: couchdb-583-trunk-10th-try.patch, couchdb-583-trunk-11th-try.patch, couchdb-583-trunk-12th-try.patch, couchdb-583-trunk-13th-try.patch, couchdb-583-trunk-14th-try-git.patch, couchdb-583-trunk-15th-try-git.patch, couchdb-583-trunk-16th-try-git.patch, couchdb-583-trunk-17th-try-git.patch, couchdb-583-trunk-18th-try-git.patch, couchdb-583-trunk-3rd-try.patch, couchdb-583-trunk-4th-try-trunk.patch, couchdb-583-trunk-5th-try.patch, couchdb-583-trunk-6th-try.patch, couchdb-583-trunk-7th-try.patch, couchdb-583-trunk-8th-try.patch, couchdb-583-trunk-9th-try.patch, jira-couchdb-583-1st-try-trunk.patch, jira-couchdb-583-2nd-try-trunk.patch


[jira] Commented: (COUCHDB-583) storing attachments in compressed form and serving them in compressed form if accepted by the client

Posted by "Paul Joseph Davis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/COUCHDB-583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12805526#action_12805526 ] 

Paul Joseph Davis commented on COUCHDB-583:
-------------------------------------------

s/directory permissions/file paths/

Haven't had my coffee yet.

> storing attachments in compressed form and serving them in compressed form if accepted by the client
> ----------------------------------------------------------------------------------------------------
>
>                 Key: COUCHDB-583
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-583
>             Project: CouchDB
>          Issue Type: New Feature
>          Components: Database Core, HTTP Interface
>         Environment: CouchDB trunk
>            Reporter: Filipe Manana
>         Attachments: couchdb-583-trunk-10th-try.patch, couchdb-583-trunk-11th-try.patch, couchdb-583-trunk-12th-try.patch, couchdb-583-trunk-13th-try.patch, couchdb-583-trunk-14th-try-git.patch, couchdb-583-trunk-15th-try-git.patch, couchdb-583-trunk-3rd-try.patch, couchdb-583-trunk-4th-try-trunk.patch, couchdb-583-trunk-5th-try.patch, couchdb-583-trunk-6th-try.patch, couchdb-583-trunk-7th-try.patch, couchdb-583-trunk-8th-try.patch, couchdb-583-trunk-9th-try.patch, jira-couchdb-583-1st-try-trunk.patch, jira-couchdb-583-2nd-try-trunk.patch


[jira] Commented: (COUCHDB-583) storing attachments in compressed form and serving them in compressed form if accepted by the client

Posted by "Filipe Manana (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/COUCHDB-583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12805468#action_12805468 ] 

Filipe Manana commented on COUCHDB-583:
---------------------------------------

@Chris

just a quick thought. You invoked the test like:

$ ./test/etap/run ./test/etap/140-attachment-comp.t

At the very beginning, the test makes this call:

{ok, Data} = file:read_file("test_140_file.txt"),

At that point the current working directory is not test/etap, but the .txt test file lives in test/etap/, so the relative path does not resolve... :)

If you cd into test/etap and do

$ ./140-attachment-comp.t

does it work?
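One way to make a test independent of the working directory is to resolve the fixture relative to the script's own location. Below is a minimal sketch with hypothetical names. In an escript, the first argument would typically come from escript:script_name(). The later patch in this thread takes a different route and uses test_util:source_file/1.

```erlang
-module(fixture_path_sketch).
-export([fixture_path/2]).

%% Resolve FileName relative to the directory that holds the test script,
%% so the result no longer depends on the current working directory.
fixture_path(ScriptName, FileName) ->
    ScriptDir = filename:dirname(filename:absname(ScriptName)),
    filename:join(ScriptDir, FileName).
```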



[jira] Updated: (COUCHDB-583) adding ?compression=(gzip|deflate) optional parameter to the attachment download API

Posted by "Filipe Manana (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/COUCHDB-583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Filipe Manana updated COUCHDB-583:
----------------------------------

    Attachment: jira-couchdb-583-1st-try-trunk.patch

This patch also fixes a potential issue in Mochiweb.

mochiweb_response:write_chunk/1 has a potential problem when the given chunk itself contains a chunk separator (CRLF). This patch splits the chunk into "subchunks" when necessary.

Feedback on this is welcome.

best regards,
Filipe Manana
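This is a reference for how chunk boundaries are delimited when reviewing the write_chunk change. In HTTP/1.1 chunked transfer coding (RFC 2616, section 3.6.1), each chunk is sent as its size in hexadecimal, then CRLF, then the data, then CRLF. A chunk of size zero terminates the body. Below is a minimal sketch of that framing. The module and function names are hypothetical, and this is not Mochiweb's implementation. It skips empty data, because framing it as-is would emit the terminating zero-size chunk too early.

```erlang
-module(chunk_framing_sketch).
-export([encode_chunk/1, last_chunk/0]).

%% Frame one chunk: hex size, CRLF, data, CRLF (returned as iodata).
%% Empty data is skipped: a zero-size chunk would end the body.
encode_chunk(<<>>) ->
    [];
encode_chunk(Data) when is_binary(Data) ->
    [integer_to_list(byte_size(Data), 16), "\r\n", Data, "\r\n"].

%% The terminating chunk, with no trailer headers.
last_chunk() ->
    <<"0\r\n\r\n">>.
```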

> adding ?compression=(gzip|deflate) optional parameter to the attachment download API
> ------------------------------------------------------------------------------------
>
>                 Key: COUCHDB-583
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-583
>             Project: CouchDB
>          Issue Type: New Feature
>          Components: HTTP Interface
>         Environment: CouchDB trunk revision 885240
>            Reporter: Filipe Manana
>         Attachments: jira-couchdb-583-1st-try-trunk.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> The following new feature is added in the patch following this ticket creation.
> A new optional http query parameter "compression" is added to the attachments API.
> This parameter can have one of the values:  "gzip" or "deflate".
> When an attachment is requested (GET HTTP request) and the query parameter "compression" is present, CouchDB sends the attachment to the client compressed (and sets the Content-Encoding header to gzip or deflate).
> Further, it adds a new config option "treshold_for_chunking_comp_responses" (httpd section) that specifies an attachment length threshold. If an attachment's length is greater than or equal to this threshold, the HTTP response will be chunked (in addition to being compressed).
> Note that using non-chunked compressed body responses requires storing all the compressed blocks in memory and only then sending each one to the client. This is a necessary "evil", as we only know the length of the compressed body after compressing the whole body, and we need to set the "Content-Length" header for non-chunked responses. By sending chunked responses, we can send each compressed block immediately, without accumulating all of them in memory.
> Examples:
> $ curl http://localhost:5984/testdb/testdoc1/readme.txt?compression=gzip
> $ curl http://localhost:5984/testdb/testdoc1/readme.txt?compression=deflate
> $ curl http://localhost:5984/testdb/testdoc1/readme.txt   # attachment will not be compressed
> $ curl http://localhost:5984/testdb/testdoc1/readme.txt?compression=rar   # will give a 500 error code
> Etap test case included.
> Feedback would be very welcome.
> cheers
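The following illustrates the trade-off described above. It is a minimal sketch using Erlang's standard zlib module. The blocks are compressed as a single gzip stream (windowBits 16 + 15 selects the gzip wrapper). Each compressed piece is handed to a callback as soon as zlib produces it, which is what a chunked response can do. The non-chunked variant must collect every piece before the Content-Length is known. The names gzip_stream_sketch, fold_gzip/3 and buffered_gzip/1 are hypothetical and are not taken from the patch.

```erlang
-module(gzip_stream_sketch).
-export([fold_gzip/3, buffered_gzip/1]).

%% Compress Blocks as one gzip stream. Fun(CompressedIoData, Acc) -> Acc1
%% is called for every non-empty piece zlib emits; in a chunked response
%% Fun would write the piece to the socket right away.
fold_gzip(Blocks, Fun, Acc0) ->
    Z = zlib:open(),
    try
        ok = zlib:deflateInit(Z, default, deflated, 16 + 15, 8, default),
        Acc1 = lists:foldl(fun(Block, Acc) -> emit(zlib:deflate(Z, Block), Fun, Acc) end,
                           Acc0, Blocks),
        %% Flush the remaining data and write the gzip trailer.
        Acc2 = emit(zlib:deflate(Z, [], finish), Fun, Acc1),
        ok = zlib:deflateEnd(Z),
        Acc2
    after
        zlib:close(Z)
    end.

%% zlib may return nothing for a small block; skip those pieces.
emit(IoData, Fun, Acc) ->
    case iolist_size(IoData) of
        0 -> Acc;
        _ -> Fun(IoData, Acc)
    end.

%% Non-chunked case: all compressed pieces stay in memory until the
%% total size (the Content-Length) is known. Returns {Length, Body}.
buffered_gzip(Blocks) ->
    Pieces = fold_gzip(Blocks, fun(Piece, Acc) -> [Piece | Acc] end, []),
    Body = iolist_to_binary(lists:reverse(Pieces)),
    {byte_size(Body), Body}.
```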



[jira] Commented: (COUCHDB-583) storing attachments in compressed form and serving them in compressed form if accepted by the client

Posted by "Filipe Manana (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/COUCHDB-583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12806283#action_12806283 ] 

Filipe Manana commented on COUCHDB-583:
---------------------------------------

Hi Chris,

For the Mochiweb stuff, I submitted a patch some time ago (with exactly the same code), which got accepted:

http://code.google.com/p/mochiweb/source/detail?r=133

I was not aware of that Mimeparse lib. Sure, I'll take a look at it and contribute :)

Thank you (and Paul) for reviewing the patch and helping me refine it.

cheers



[jira] Updated: (COUCHDB-583) storing attachments in compressed form and serving them in compressed form if accepted by the client

Posted by "Filipe Manana (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/COUCHDB-583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Filipe Manana updated COUCHDB-583:
----------------------------------

    Attachment: couchdb-583-trunk-8th-try.patch

+ Updated patch to gracefully merge with the latest trunk revision.

+ Fixed small regression where the value of the length field of an attachment stub was the length of the compressed form of the attachment

+ Simplified some of the attachment compression code



[jira] Commented: (COUCHDB-583) storing attachments in compressed form and serving them in compressed form if accepted by the client

Posted by "Chris Anderson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/COUCHDB-583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12806217#action_12806217 ] 

Chris Anderson commented on COUCHDB-583:
----------------------------------------

Looking at Mimeparse and your Mochiweb patches, it looks like you are covering parts of HTTP that Mimeparse doesn't handle yet. So disregard my last comment about using it instead. And I see that the accept stuff is in Mochi also. Rad.

So it looks like we can commit this. I'd sure appreciate anyone's help upgrading Mochi from upstream (properly).

Filipe, the Mimeparse lib might appreciate your contribution.



[jira] Commented: (COUCHDB-583) storing attachments in compressed form and serving them in compressed form if accepted by the client

Posted by "Filipe Manana (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/COUCHDB-583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12801337#action_12801337 ] 

Filipe Manana commented on COUCHDB-583:
---------------------------------------

Well, I went for the separate file for two reasons:

1) the list of MIME types might be big (probably no longer valid, as we can use the * wildcard)

2) if for some reason we also need to specify MIME type parameters, the ini file is problematic, since the semicolon character marks both the beginning of MIME parameters and the start of a comment in ini files

Well, I haven't figured out why we might need MIME parameters, so it's probably not a valid reason :)

Having a 0.9 db file would be nice. I am not 100% sure whether we need to provide more clauses of couch_doc:att_foldl_unzip to resemble those of couch_doc:att_foldl, for example :(

Thanks Paul
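With the * wildcard, a short pattern list can cover whole families of types. Below is a minimal sketch of that matching. It assumes each pattern is either an exact type/subtype or type/*. Parameters after the semicolon, which are the part that clashes with ini comments, are stripped from the Content-Type before comparison. The names are hypothetical and this is not the patch's code.

```erlang
-module(mime_match_sketch).
-export([compressible/2]).

%% True when ContentType matches one of Patterns ("type/subtype",
%% "type/*" or "*/*"). Parameters such as "; charset=utf-8" are dropped.
compressible(ContentType, Patterns) ->
    Base = string:lowercase(string:trim(hd(string:split(ContentType, ";")))),
    lists:any(fun(Pattern) -> matches(Base, string:lowercase(Pattern)) end, Patterns).

matches(Type, Pattern) ->
    case string:split(Pattern, "/") of
        ["*", "*"] -> true;
        %% "text/*": compare only the major type.
        [Major, "*"] -> hd(string:split(Type, "/")) =:= Major;
        _ -> Type =:= Pattern
    end.
```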



[jira] Updated: (COUCHDB-583) storing attachments in compressed form and serving them in compressed form if accepted by the client

Posted by "Filipe Manana (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/COUCHDB-583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Filipe Manana updated COUCHDB-583:
----------------------------------

    Attachment: couchdb-583-trunk-16th-try-git.patch

So, here is the correction.

Now using:

    {ok, Data} = file:read_file(
        test_util:source_file("test/etap/test_140_file.txt")
    )

@Chris, @Paul,
Now you should be able to run the etap test without cd-ing into test/etap/

And Paul, thanks for the explanation on autotools and the links



[jira] Commented: (COUCHDB-583) storing attachments in compressed form and serving them in compressed form if accepted by the client

Posted by "Filipe Manana (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/COUCHDB-583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12804915#action_12804915 ] 

Filipe Manana commented on COUCHDB-583:
---------------------------------------

@Paul

any news on this?



[jira] Commented: (COUCHDB-583) storing attachments in compressed form and serving them in compressed form if accepted by the client

Posted by "Chris Anderson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/COUCHDB-583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12805383#action_12805383 ] 

Chris Anderson commented on COUCHDB-583:
----------------------------------------

Reading the source code, this looks solid. The only thing I'm not in a position to understand just by reading and testing is how this new code will interact with old files.

I see references to backwards compatibility in the comments and some changes in the right parts of the code. I feel we are lacking a way to definitively test this.

On the whole, I think this is good code and worth further review. Hopefully we'll be able to get it polished and in soon.



[jira] Commented: (COUCHDB-583) storing attachments in compressed form and serving them in compressed form if accepted by the client

Posted by "Paul Joseph Davis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/COUCHDB-583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12793593#action_12793593 ] 

Paul Joseph Davis commented on COUCHDB-583:
-------------------------------------------

@Filipe,

> Note that using non chunked compressed body responses requires
> storing all the compressed blocks in memory and then sending each
> one to the client. This is a necessary "evil", as we only know the length
> of the compressed body after compressing all the body, and we need
> to set the "Content-Length" header for non chunked responses.

You can't buffer attachments in RAM. That's just a no-no.

Going back to the RFC, I don't really see a good place where it's spelled out what to do about Content-Encoding headers and their correspondence with Content-Length. I'm fairly certain you've got this backwards though. The Content-Length needs to be the number of bytes of gzip data used to represent the message body. Content-Length is used to delineate messages, so having them mismatched would be bad.

Though, this does mean that you'll need to store the number of bytes before and after compression when you stream the attachment to disk.

Also, the compression algorithm must be specified by the Accept-Encoding header, not a URL parameter.

The "treshold_for_chunking_comp_responses" is a bit of a weird threshold. You might want to change that to something like "min_compression_length" and just have it say "files shorter than this will not be compressed". Then leave the choice between chunked and non-chunked responses to the HTTP content negotiation algorithms.

I'll be out of town for the next couple weeks so forgive me if my responses are a tad slow.

> storing attachments in compressed form and serving them in compressed form if accepted by the client
> ----------------------------------------------------------------------------------------------------
>
>                 Key: COUCHDB-583
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-583
>             Project: CouchDB
>          Issue Type: New Feature
>          Components: Database Core, HTTP Interface
>         Environment: CouchDB trunk
>            Reporter: Filipe Manana
>         Attachments: couchdb-583-trunk-3rd-try.patch, couchdb-583-trunk-4th-try-trunk.patch, couchdb-583-trunk-5th-try.patch, couchdb-583-trunk-6th-try.patch, jira-couchdb-583-1st-try-trunk.patch, jira-couchdb-583-2nd-try-trunk.patch
>
>
> This feature allows Couch to gzip compress attachments as they are being received and store them in compressed form.
> When a client asks for downloading an attachment (e.g. GET somedb/somedoc/attachment.txt), the attachment is sent in compressed form if the client's http request has gzip specified as a valid transfer encoding for the response (using the http header "Accept-Encoding"). Otherwise couch decompresses the attachment before sending it back to the client.
> Attachments are compressed only if their MIME type matches one of those listed in a separate config file. Compression level is also configurable in the default.ini file.
> This follows Damien's suggestion from 30 November:
> "Perhaps we need a separate user editable ini file to specify compressable or non-compressable files (would probably be too big for the regular ini file). What do other web servers do?
> Also, a potential optimization is to compress the file while writing to disk, and serve the compressed bytes directly to clients that can handle it, and decompressed for those that can't. For compressable types, it's a win for both disk IO for reads and writes, and CPU on read."
> Patch attached.



[jira] Commented: (COUCHDB-583) storing attachments in compressed form and serving them in compressed form if accepted by the client

Posted by "Paul Joseph Davis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/COUCHDB-583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12799948#action_12799948 ] 

Paul Joseph Davis commented on COUCHDB-583:
-------------------------------------------

Hrm, 4KiB of headers even? That is a good point though. But I'd still be quite hesitant to make it a whitelist of content types to compress, unless maybe we allowed text/* or similar. Or perhaps it should be a blacklist that supports * matching?
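Here is a Python sketch of the blacklist-with-wildcards idea, combined with the "min_compression_length" threshold Paul suggested earlier in the thread. The default patterns and the 1024-byte threshold are illustrative assumptions, not CouchDB settings. Shell-style "*" matching comes from the standard fnmatch module.

```python
from fnmatch import fnmatchcase

# Hypothetical defaults: types that are usually already compressed.
NEVER_COMPRESS = ["image/*", "audio/*", "video/*", "application/zip", "application/x-gzip"]


def should_compress(content_type, length, min_compression_length=1024,
                    blacklist=NEVER_COMPRESS):
    """Compress unless the body is small or its MIME type is blacklisted."""
    if length < min_compression_length:
        return False
    # Drop media type parameters such as "; charset=utf-8" before matching.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return not any(fnmatchcase(media_type, pattern) for pattern in blacklist)
```

A blacklist fails open: an unknown type gets compressed. A whitelist fails closed and leaves unknown types uncompressed. That trade-off is what the comment above is weighing.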

> storing attachments in compressed form and serving them in compressed form if accepted by the client
> ----------------------------------------------------------------------------------------------------
>
>                 Key: COUCHDB-583
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-583
>             Project: CouchDB
>          Issue Type: New Feature
>          Components: Database Core, HTTP Interface
>         Environment: CouchDB trunk
>            Reporter: Filipe Manana
>         Attachments: couchdb-583-trunk-3rd-try.patch, couchdb-583-trunk-4th-try-trunk.patch, couchdb-583-trunk-5th-try.patch, couchdb-583-trunk-6th-try.patch, couchdb-583-trunk-7th-try.patch, couchdb-583-trunk-8th-try.patch, couchdb-583-trunk-9th-try.patch, jira-couchdb-583-1st-try-trunk.patch, jira-couchdb-583-2nd-try-trunk.patch
>
>
> This feature allows Couch to gzip compress attachments as they are being received and store them in compressed form.
> When a client asks for downloading an attachment (e.g. GET somedb/somedoc/attachment.txt), the attachment is sent in compressed form if the client's http request has gzip specified as a valid transfer encoding for the response (using the http header "Accept-Encoding"). Otherwise couch decompresses the attachment before sending it back to the client.
> Attachments are compressed only if their MIME type matches one of those listed in a separate config file. Compression level is also configurable in the default.ini file.
> This follows Damien's suggestion from 30 November:
> "Perhaps we need a separate user editable ini file to specify compressable or non-compressable files (would probably be too big for the regular ini file). What do other web servers do?
> Also, a potential optimization is to compress the file while writing to disk, and serve the compressed bytes directly to clients that can handle it, and decompressed for those that can't. For compressable types, it's a win for both disk IO for reads and writes, and CPU on read."
> Patch attached.



[jira] Updated: (COUCHDB-583) adding ?compression=(gzip|deflate) optional parameter to the attachment download API

Posted by "Filipe Manana (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/COUCHDB-583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Filipe Manana updated COUCHDB-583:
----------------------------------

    Attachment: jira-couchdb-583-2nd-try-trunk.patch

Just fixed some misspelled words and bad sentences in the comments :)

> adding ?compression=(gzip|deflate) optional parameter to the attachment download API
> ------------------------------------------------------------------------------------
>
>                 Key: COUCHDB-583
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-583
>             Project: CouchDB
>          Issue Type: New Feature
>          Components: HTTP Interface
>         Environment: CouchDB trunk revision 885240
>            Reporter: Filipe Manana
>         Attachments: jira-couchdb-583-1st-try-trunk.patch, jira-couchdb-583-2nd-try-trunk.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> The following new feature is added in the patch following this ticket creation.
> A new optional http query parameter "compression" is added to the attachments API.
> This parameter can have one of the values:  "gzip" or "deflate".
> When asking for an attachment (GET http request), if the query parameter "compression" is found, CouchDB will send the attachment compressed to the client (and sets the header Content-Encoding with gzip or deflate).
> Further, it adds a new config option "treshold_for_chunking_comp_responses" (httpd section) that specifies an attachment length threshold. If an attachment has a length >= than this threshold, the http response will be chunked (besides compressed).
> Note that using non chunked compressed  body responses requires storing all the compressed blocks in memory and then sending each one to the client. This is a necessary "evil", as we only know the length of the compressed body after compressing all the body, and we need to set the "Content-Length" header for non chunked responses. By sending chunked responses, we can send each compressed block immediately, without accumulating all of them in memory.
> Examples:
> $ curl http://localhost:5984/testdb/testdoc1/readme.txt?compression=gzip
> $ curl http://localhost:5984/testdb/testdoc1/readme.txt?compression=deflate
> $ curl http://localhost:5984/testdb/testdoc1/readme.txt   # attachment will not be compressed
> $ curl http://localhost:5984/testdb/testdoc1/readme.txt?compression=rar   # will give a 500 error code
> Etap test case included.
> Feedback would be very welcome.
> cheers



[jira] Commented: (COUCHDB-583) adding ?compression=(gzip|deflate) optional parameter to the attachment download API

Posted by "Filipe Manana (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/COUCHDB-583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12783744#action_12783744 ] 

Filipe Manana commented on COUCHDB-583:
---------------------------------------

Using a filter such as the MIME type to decide whether or not to store attachments in gzipped form seems like a good idea to me. mod_gzip has that kind of filter for deciding whether a response is gzipped or not. An example of mod_gzip config:

mod_gzip_item_include         mime       ^text/html$
mod_gzip_item_include         mime       ^text/plain$
mod_gzip_item_include         mime       ^httpd/unix-directory$
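The Python sketch below evaluates a mod_gzip-style include filter. Each pattern is treated as a regular expression matched against the attachment's MIME type, using the three patterns from the config example above. Stripping media type parameters before matching is my own assumption and is not taken from mod_gzip.

```python
import re

# Patterns copied from the mod_gzip example above.
INCLUDE_MIME = [r"^text/html$", r"^text/plain$", r"^httpd/unix-directory$"]


def mime_included(mime_type, patterns=INCLUDE_MIME):
    """Return True if the MIME type matches any include pattern."""
    # Assumption: ignore parameters such as "; charset=utf-8" before matching.
    media_type = mime_type.split(";", 1)[0].strip()
    return any(re.search(pattern, media_type) for pattern in patterns)
```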

I do like that approach, but since I am new to Couch's code and implementation, I don't know whether storing attachments directly in gzipped form will break anything else. couch_stream could compress attachment data as it receives it and write each compressed block to disk.

Then, when requesting an attachment download, the client would have to send the header "Accept-Encoding: gzip" to get the compressed data. If it doesn't send that header, we would send the attachment in uncompressed form.
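A Python sketch of this write-compressed, read-either-way flow follows. It uses the standard zlib module rather than couch_stream, and both function names are hypothetical. Incoming chunks are gzip-compressed incrementally, so each compressed block can be written to disk as soon as it is produced. On read, the stored blocks go out unchanged to a gzip-capable client; for any other client they are inflated block by block.

```python
import zlib


def compress_stream(chunks, level=6):
    """Gzip-compress incoming chunks incrementally, yielding compressed blocks."""
    comp = zlib.compressobj(level, zlib.DEFLATED, 31)  # wbits=31 -> gzip container
    for chunk in chunks:
        block = comp.compress(chunk)
        if block:  # zlib may buffer input and return nothing yet
            yield block
    yield comp.flush()  # remaining data plus the gzip trailer (CRC32, size)


def serve(stored_blocks, accepts_gzip):
    """Send stored bytes as-is to gzip clients, else inflate them on the fly."""
    if accepts_gzip:
        yield from stored_blocks
        return
    decomp = zlib.decompressobj(31)
    for block in stored_blocks:
        out = decomp.decompress(block)
        if out:
            yield out
    tail = decomp.flush()
    if tail:
        yield tail
```

Both directions stream, so neither path has to hold the whole attachment in memory.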

I volunteer to implement it if you agree.

cheers





[jira] Commented: (COUCHDB-583) adding ?compression=(gzip|deflate) optional parameter to the attachment download API

Posted by "Paul Joseph Davis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/COUCHDB-583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12783456#action_12783456 ] 

Paul Joseph Davis commented on COUCHDB-583:
-------------------------------------------

Oh, and reading the next bullet point makes things more explicit:

The Content-Length header field MUST NOT be sent if these two lengths are different (i.e., if a Transfer-Encoding header field is present). If a message is received with both a Transfer-Encoding header field and a Content-Length header field, the latter MUST be ignored.
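The rule quoted above can be sketched in Python like this. Content-Length is sent only when the final body length is known in advance. Otherwise the response uses chunked transfer-coding and omits Content-Length. The helper names are hypothetical. The framing follows HTTP/1.1 chunked encoding: a hex size line, the data, CRLF, and a terminating zero-size chunk.

```python
def length_headers(total_length=None):
    """Pick message-length headers; never send both (RFC 2616, section 4.4)."""
    if total_length is not None:
        return {"Content-Length": str(total_length)}
    return {"Transfer-Encoding": "chunked"}


def frame_chunked(blocks):
    """Encode blocks using HTTP/1.1 chunked transfer-coding."""
    out = bytearray()
    for block in blocks:
        if not block:
            continue  # a zero-size chunk would terminate the body early
        out += b"%X\r\n" % len(block) + block + b"\r\n"
    out += b"0\r\n\r\n"  # last-chunk followed by an empty trailer
    return bytes(out)
```

With chunking, each compressed block can be sent as soon as it is ready, which avoids the buffering that Content-Length would otherwise require.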




[jira] Commented: (COUCHDB-583) storing attachments in compressed form and serving them in compressed form if accepted by the client

Posted by "Filipe Manana (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/COUCHDB-583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12805410#action_12805410 ] 

Filipe Manana commented on COUCHDB-583:
---------------------------------------

@Chris,

thanks for looking at it :)

That's weird, on my machine I have no problems running the etap test.
I'll do a checkout of the latest revision (trunk) and re-run it.

Or maybe (unlikely) you didn't run "git apply" with the --binary option?

The stack trace seems to be related to a bad argument passed to io:format, which is strange. I just grepped the patch and there's no call to io:format in it. Did you apply it to a local branch where you have io:format calls for debugging, for instance? Or did you apply a patch other than couchdb-583-trunk-15th-try-git.patch?

I'll re-run it when I get home this evening and let you know.

@Paul, have you run the etap test and had the same problem as Chris?

cheers

> storing attachments in compressed form and serving them in compressed form if accepted by the client
> ----------------------------------------------------------------------------------------------------
>
>                 Key: COUCHDB-583
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-583
>             Project: CouchDB
>          Issue Type: New Feature
>          Components: Database Core, HTTP Interface
>         Environment: CouchDB trunk
>            Reporter: Filipe Manana
>         Attachments: couchdb-583-trunk-10th-try.patch, couchdb-583-trunk-11th-try.patch, couchdb-583-trunk-12th-try.patch, couchdb-583-trunk-13th-try.patch, couchdb-583-trunk-14th-try-git.patch, couchdb-583-trunk-15th-try-git.patch, couchdb-583-trunk-3rd-try.patch, couchdb-583-trunk-4th-try-trunk.patch, couchdb-583-trunk-5th-try.patch, couchdb-583-trunk-6th-try.patch, couchdb-583-trunk-7th-try.patch, couchdb-583-trunk-8th-try.patch, couchdb-583-trunk-9th-try.patch, jira-couchdb-583-1st-try-trunk.patch, jira-couchdb-583-2nd-try-trunk.patch
>
>
> This feature allows Couch to gzip compress attachments as they are being received and store them in compressed form.
> When a client asks for downloading an attachment (e.g. GET somedb/somedoc/attachment.txt), the attachment is sent in compressed form if the client's http request has gzip specified as a valid transfer encoding for the response (using the http header "Accept-Encoding"). Otherwise couch decompresses the attachment before sending it back to the client.
> Attachments are compressed only if their MIME type matches one of those listed in a separate config file. Compression level is also configurable in the default.ini file.
> This follows Damien's suggestion from 30 November:
> "Perhaps we need a separate user editable ini file to specify compressable or non-compressable files (would probably be too big for the regular ini file). What do other web servers do?
> Also, a potential optimization is to compress the file while writing to disk, and serve the compressed bytes directly to clients that can handle it, and decompressed for those that can't. For compressable types, it's a win for both disk IO for reads and writes, and CPU on read."
> Patch attached.



[jira] Commented: (COUCHDB-583) storing attachments in compressed form and serving them in compressed form if accepted by the client

Posted by "Paul Joseph Davis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/COUCHDB-583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12801436#action_12801436 ] 

Paul Joseph Davis commented on COUCHDB-583:
-------------------------------------------

For the long list, I think we should just patch the config parser to support the multi-line values that most INI-style configs have, which will alleviate the concerns about length.
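The multi-line convention looks like this in Python's standard configparser, where indented continuation lines are joined onto the previous value. CouchDB's own parser is Erlang, so this only illustrates the INI convention. The section and key names are hypothetical.

```python
import configparser

SAMPLE = """
[attachments]
compressible_types = text/*,
    application/javascript,
    application/json
"""


def compressible_types(ini_text):
    """Read a comma-separated list that may span several indented lines."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    raw = parser.get("attachments", "compressible_types")  # continuation lines joined by "\n"
    return [t.strip() for t in raw.split(",") if t.strip()]
```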

I've never seen the list of content-negotiation types use media type parameters so I don't think that's a problem for the INI either.

I don't think the 0.9 stuff is a blocker for this patch. I was just musing that tests for it would be nice, both for this patch and for the other upgrade code, because I don't think we explicitly test that stuff anywhere.

> storing attachments in compressed form and serving them in compressed form if accepted by the client
> ----------------------------------------------------------------------------------------------------
>
>                 Key: COUCHDB-583
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-583
>             Project: CouchDB
>          Issue Type: New Feature
>          Components: Database Core, HTTP Interface
>         Environment: CouchDB trunk
>            Reporter: Filipe Manana
>         Attachments: couchdb-583-trunk-10th-try.patch, couchdb-583-trunk-3rd-try.patch, couchdb-583-trunk-4th-try-trunk.patch, couchdb-583-trunk-5th-try.patch, couchdb-583-trunk-6th-try.patch, couchdb-583-trunk-7th-try.patch, couchdb-583-trunk-8th-try.patch, couchdb-583-trunk-9th-try.patch, jira-couchdb-583-1st-try-trunk.patch, jira-couchdb-583-2nd-try-trunk.patch
>
>
> This feature allows Couch to gzip compress attachments as they are being received and store them in compressed form.
> When a client asks for downloading an attachment (e.g. GET somedb/somedoc/attachment.txt), the attachment is sent in compressed form if the client's http request has gzip specified as a valid transfer encoding for the response (using the http header "Accept-Encoding"). Otherwise couch decompresses the attachment before sending it back to the client.
> Attachments are compressed only if their MIME type matches one of those listed in a separate config file. Compression level is also configurable in the default.ini file.
> This follows Damien's suggestion from 30 November:
> "Perhaps we need a separate user editable ini file to specify compressable or non-compressable files (would probably be too big for the regular ini file). What do other web servers do?
> Also, a potential optimization is to compress the file while writing to disk, and serve the compressed bytes directly to clients that can handle it, and decompressed for those that can't. For compressable types, it's a win for both disk IO for reads and writes, and CPU on read."
> Patch attached.



[jira] Commented: (COUCHDB-583) storing attachments in compressed form and serving them in compressed form if accepted by the client

Posted by "Paul Joseph Davis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/COUCHDB-583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12805518#action_12805518 ] 

Paul Joseph Davis commented on COUCHDB-583:
-------------------------------------------

It was probably the directory permissions.

All file access needs to go through the test_util.erl functions so that make distcheck works. It's a long and complicated story, but if you read from or write to the file system, you need to work out the paths from a couple of environment variables. The test_util functions should make that easy.





[jira] Updated: (COUCHDB-583) adding ?compression=(gzip|deflate) optional parameter to the attachment download API

Posted by "Filipe Manana (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/COUCHDB-583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Filipe Manana updated COUCHDB-583:
----------------------------------

    Attachment: couchdb-583-trunk-4th-try-trunk.patch

Here follows an updated patch.

Relative to the previous one, this patch:

+ works with ticket 558 (attachment upload md5 integrity check)
+ adds gzip compression level configuration option
+ no longer breaks 2 Etap tests and 2 JavaScript tests :)

All tests, except for the Etap ICU test (which is broken for some other reason), are working with this patch.
An Etap test for the new feature is included.

The feature in this patch seems to be the same one requested in ticket 437 => https://issues.apache.org/jira/browse/COUCHDB-437

Feedback please

cheers

> adding ?compression=(gzip|deflate) optional parameter to the attachment download API
> ------------------------------------------------------------------------------------
>
>                 Key: COUCHDB-583
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-583
>             Project: CouchDB
>          Issue Type: New Feature
>          Components: HTTP Interface
>         Environment: CouchDB trunk revision 885240
>            Reporter: Filipe Manana
>         Attachments: couchdb-583-trunk-3rd-try.patch, couchdb-583-trunk-4th-try-trunk.patch, jira-couchdb-583-1st-try-trunk.patch, jira-couchdb-583-2nd-try-trunk.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> The following new feature is added in the patch following this ticket creation.
> A new optional http query parameter "compression" is added to the attachments API.
> This parameter can have one of the values:  "gzip" or "deflate".
> When asking for an attachment (GET http request), if the query parameter "compression" is found, CouchDB will send the attachment compressed to the client (and sets the header Content-Encoding with gzip or deflate).
> Further, it adds a new config option "treshold_for_chunking_comp_responses" (httpd section) that specifies an attachment length threshold. If an attachment has a length >= than this threshold, the http response will be chunked (besides compressed).
> Note that using non chunked compressed  body responses requires storing all the compressed blocks in memory and then sending each one to the client. This is a necessary "evil", as we only know the length of the compressed body after compressing all the body, and we need to set the "Content-Length" header for non chunked responses. By sending chunked responses, we can send each compressed block immediately, without accumulating all of them in memory.
> Examples:
> $ curl http://localhost:5984/testdb/testdoc1/readme.txt?compression=gzip
> $ curl http://localhost:5984/testdb/testdoc1/readme.txt?compression=deflate
> $ curl http://localhost:5984/testdb/testdoc1/readme.txt   # attachment will not be compressed
> $ curl http://localhost:5984/testdb/testdoc1/readme.txt?compression=rar   # will give a 500 error code
> Etap test case included.
> Feedback would be very welcome.
> cheers



[jira] Commented: (COUCHDB-583) adding ?compression=(gzip|deflate) optional parameter to the attachment download API

Posted by "Filipe Manana (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/COUCHDB-583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12793569#action_12793569 ] 

Filipe Manana commented on COUCHDB-583:
---------------------------------------

Hi Adam,

I see. I am just thinking about how to do it without causing backward incompatibility.

Currently couch_file:append_term[_md5]/2 calls term_to_binary and prefixes the result with a 32-bit header and an optional md5 digest. The 32-bit header + optional term md5 + term_to_binary(Term) is then appended to the end of the DB file.

The high order bit of this header indicates whether an md5 hash follows the header (and preceding the serialized term).
Without looking deeply into the code, I think about adding an extra bit to the header which indicates if the term is compressed or not.

Of course this implies adding a new DB header version value, etc. It's not as straightforward as attachment compression.
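The Python sketch below shows the extra-header-bit idea. It assumes a big-endian 32-bit header in which the high-order bit flags a following md5 digest, a second bit flags a compressed term (the proposal), and the remaining 30 bits hold the term length. The byte order, the length width and all names here are assumptions, not couch_file's actual layout.

```python
import struct

MD5_FLAG = 0x80000000         # high-order bit: an md5 digest follows the header
COMPRESSED_FLAG = 0x40000000  # proposed (hypothetical) bit: term body is compressed
LENGTH_MASK = 0x3FFFFFFF      # remaining 30 bits: length of the serialized term


def encode_header(length, has_md5=False, compressed=False):
    if length > LENGTH_MASK:
        raise ValueError("term too large for a 30-bit length field")
    value = length
    if has_md5:
        value |= MD5_FLAG
    if compressed:
        value |= COMPRESSED_FLAG
    return struct.pack(">I", value)  # 4 bytes, big-endian


def decode_header(header):
    (value,) = struct.unpack(">I", header)
    return value & LENGTH_MASK, bool(value & MD5_FLAG), bool(value & COMPRESSED_FLAG)
```

Each reserved bit halves the largest encodable term, from 2^31 to 2^30 bytes in this layout. Under these assumptions, headers written before the change for terms under 1 GiB have the new bit clear, so they would decode as uncompressed.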

An (ugly) alternative I see is not adding a new header bit and when reading a serialized term from the DB file, always gunzip it and catch an exception:

3> catch(zlib:gunzip(<<"hello world">>)).
{'EXIT',{data_error,[{zlib,call,3},
                     {zlib,inflate,2},
                     {zlib,gunzip,1},
                     {erl_eval,do_apply,5},
                     {erl_eval,expr,5},
                     {shell,exprs,6},
                     {shell,eval_exprs,6},
                     {shell,eval_loop,3}]}}

The problem is that a data_error exception doesn't necessarily mean the data isn't gzip-compressed.
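The Python sketch below, using the standard zlib module, shows why the catch-the-exception approach is ambiguous. Plain data and a truncated gzip stream both fail to inflate, so a failure alone cannot separate "never compressed" from "compressed but damaged". The cheaper magic-byte check is only a heuristic. It relies on two facts: gzip streams start with 1F 8B, and Erlang's external term format (term_to_binary output) starts with byte 131. Whether that check is safe for couch_file is an assumption.

```python
import zlib

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream (RFC 1952)


def try_gunzip(data: bytes):
    """Catch-the-exception approach: None means 'did not inflate'."""
    try:
        return zlib.decompress(data, 31)  # wbits=31 selects the gzip wrapper
    except zlib.error:
        return None


def looks_gzipped(data: bytes) -> bool:
    """Heuristic sniff; term_to_binary output begins with 131, not 0x1f."""
    return data[:2] == GZIP_MAGIC


def gzip_bytes(data: bytes) -> bytes:
    comp = zlib.compressobj(6, zlib.DEFLATED, 31)
    return comp.compress(data) + comp.flush()


plain = b"hello world"
packed = gzip_bytes(b"x" * 50)

# Both calls fail to inflate: uncompressed input and a truncated gzip stream.
print(try_gunzip(plain), try_gunzip(packed[:-4]))
```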

If we use an extra header bit, then why not add a few more bits reserved for future features? A bit like most protocol RFCs do: they reserve a few bits in a header for future use :)

What's your opinion?

cheers



> adding ?compression=(gzip|deflate) optional parameter to the attachment download API
> ------------------------------------------------------------------------------------
>
>                 Key: COUCHDB-583
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-583
>             Project: CouchDB
>          Issue Type: New Feature
>          Components: HTTP Interface
>         Environment: CouchDB trunk revision 885240
>            Reporter: Filipe Manana
>         Attachments: couchdb-583-trunk-3rd-try.patch, couchdb-583-trunk-4th-try-trunk.patch, couchdb-583-trunk-5th-try.patch, couchdb-583-trunk-6th-try.patch, jira-couchdb-583-1st-try-trunk.patch, jira-couchdb-583-2nd-try-trunk.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> The following new feature is added in the patch following this ticket creation.
> A new optional http query parameter "compression" is added to the attachments API.
> This parameter can have one of the values:  "gzip" or "deflate".
> When asking for an attachment (GET http request), if the query parameter "compression" is found, CouchDB will send the attachment compressed to the client (and sets the header Content-Encoding with gzip or deflate).
> Further, it adds a new config option "treshold_for_chunking_comp_responses" (httpd section) that specifies an attachment length threshold. If an attachment has a length >= than this threshold, the http response will be chunked (besides compressed).
> Note that using non chunked compressed  body responses requires storing all the compressed blocks in memory and then sending each one to the client. This is a necessary "evil", as we only know the length of the compressed body after compressing all the body, and we need to set the "Content-Length" header for non chunked responses. By sending chunked responses, we can send each compressed block immediately, without accumulating all of them in memory.
> Examples:
> $ curl http://localhost:5984/testdb/testdoc1/readme.txt?compression=gzip
> $ curl http://localhost:5984/testdb/testdoc1/readme.txt?compression=deflate
> $ curl http://localhost:5984/testdb/testdoc1/readme.txt   # attachment will not be compressed
> $ curl http://localhost:5984/testdb/testdoc1/readme.txt?compression=rar   # will give a 500 error code
> Etap test case included.
> Feedback would be very welcome.
> cheers
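The two behaviours in the quoted ticket can be sketched in Python with the standard zlib module: choosing gzip or deflate from the ?compression query parameter, and chunking compressed responses above a length threshold. This is an illustration, not the patch's Erlang code. The function names and the threshold value are hypothetical. An unsupported value raises ValueError here, whereas the ticket says CouchDB answers with a 500. "deflate" is taken to mean the zlib format, as in RFC 2616.

```python
import zlib

# ?compression= value -> zlib wbits. 31 selects the gzip container; 15 selects
# the zlib container, which is what HTTP's "deflate" coding means (RFC 2616, 3.5).
WBITS = {"gzip": 31, "deflate": 15}

# Hypothetical value for the httpd option treshold_for_chunking_comp_responses.
CHUNKING_THRESHOLD = 1024 * 1024


def compressor_for(param):
    """Return a zlib compressor for the query value, or None if absent."""
    if param is None:
        return None  # no ?compression parameter: serve the attachment as-is
    if param not in WBITS:
        # The ticket says values such as "rar" produce a 500 error.
        raise ValueError(f"unsupported compression: {param!r}")
    return zlib.compressobj(wbits=WBITS[param])


def chunk_frame(data: bytes) -> bytes:
    """Frame bytes as one HTTP/1.1 chunk: hex size, CRLF, data, CRLF."""
    return b"%x\r\n%s\r\n" % (len(data), data)


def respond(blocks, param, total_len):
    """Return (headers, body_parts) for an attachment read as a list of blocks."""
    comp = compressor_for(param)
    if comp is None:
        return {"Content-Length": str(total_len)}, list(blocks)
    headers = {"Content-Encoding": param}
    if total_len >= CHUNKING_THRESHOLD:
        # Chunked: each compressed piece can be sent as soon as it is produced.
        headers["Transfer-Encoding"] = "chunked"
        parts = []
        for block in blocks:
            out = comp.compress(block)
            if out:  # never emit an empty chunk; "0" marks the end of the body
                parts.append(chunk_frame(out))
        tail = comp.flush()
        if tail:
            parts.append(chunk_frame(tail))
        parts.append(b"0\r\n\r\n")  # last-chunk, no trailers
        return headers, parts
    # Not chunked: Content-Length is only known once every block has been
    # compressed, so the whole compressed body is buffered in memory first.
    body = b"".join(comp.compress(block) for block in blocks) + comp.flush()
    headers["Content-Length"] = str(len(body))
    return headers, [body]
```

The non-chunked branch shows the "necessary evil" from the description: the whole compressed body is held in memory so that Content-Length can be set.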

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (COUCHDB-583) storing attachments in compressed form and serving them in compressed form if accepted by the client

Posted by "Chris Anderson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/COUCHDB-583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12806216#action_12806216 ] 

Chris Anderson commented on COUCHDB-583:
----------------------------------------

I think this patch is pretty much ready. In IRC I asked about renaming identity_len and len to raw_len and stored_len to make it more clear what they are referring to.
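A small Python sketch shows why the raw_len / stored_len naming helps. The length a response carries depends on whether the stored bytes are sent unchanged or decompressed on the fly. The record and function names are hypothetical and do not come from the patch. In practice, a server that decompresses on the fly may prefer a chunked response instead.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttachmentInfo:
    """Hypothetical attachment metadata using the names proposed on the ticket."""
    raw_len: int     # length of the attachment as uploaded (identity encoding)
    stored_len: int  # length of the bytes actually written to disk
    encoding: str    # "gzip" if stored compressed, otherwise "identity"

    def __post_init__(self):
        # An uncompressed attachment is stored byte for byte.
        if self.encoding == "identity" and self.raw_len != self.stored_len:
            raise ValueError("identity attachments must have raw_len == stored_len")


def response_length(att: AttachmentInfo, client_accepts_gzip: bool) -> int:
    """Length of the body actually sent, i.e. the Content-Length to advertise."""
    if att.encoding == "gzip" and client_accepts_gzip:
        return att.stored_len  # stored gzip bytes are sent unchanged
    return att.raw_len         # identity bytes, decompressed on the fly if needed
```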

I've looked through the disk and storage parts of the patch and they seem solid and ready to go. 

I've just noticed something else:

What about the changes in Mochiweb for accepted_encodings? Has that patch gone back to Mochiweb yet?

There is an Erlang Mimeparse library that should handle the q values etc., and it's heavily tested. Our JavaScript side uses a JS port of the original Mimeparse library. I think it'd be better to patch Mochiweb to use Mimeparse than to write a new parser that hasn't had the scrutiny the original Mimeparse has had.

http://code.google.com/p/mimeparse/

Is there a reason not to use Mimeparse here? Can I persuade you to make that change? Maybe there's a way we can use Mimeparse to avoid having to patch Mochiweb at all?
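To show what Accept-Encoding parsing involves, here is a minimal hand-rolled Python sketch that maps each content-coding to its q-value, with q defaulting to 1. It is neither Mimeparse's API nor the Mochiweb patch. Malformed q-values are handled by an arbitrary choice, which is exactly the kind of detail a heavily tested library settles for you.

```python
def parse_accept_encoding(header: str) -> dict:
    """Map each content-coding in an Accept-Encoding value to its q-value.

    A missing q parameter means q=1. Malformed q-values are treated as 0,
    a conservative choice made only for this sketch.
    """
    prefs = {}
    for item in header.split(","):
        item = item.strip()
        if not item:
            continue
        coding, *params = [p.strip() for p in item.split(";")]
        q = 1.0
        for param in params:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        # Codings are case-insensitive; clamp q into the legal [0, 1] range.
        prefs[coding.lower()] = max(0.0, min(q, 1.0))
    return prefs
```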

Thanks for working so hard at this. You've picked probably one of the more challenging features in all of Couch.

> storing attachments in compressed form and serving them in compressed form if accepted by the client
> ----------------------------------------------------------------------------------------------------
>
>                 Key: COUCHDB-583
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-583
>             Project: CouchDB
>          Issue Type: New Feature
>          Components: Database Core, HTTP Interface
>         Environment: CouchDB trunk
>            Reporter: Filipe Manana
>         Attachments: couchdb-583-trunk-10th-try.patch, couchdb-583-trunk-11th-try.patch, couchdb-583-trunk-12th-try.patch, couchdb-583-trunk-13th-try.patch, couchdb-583-trunk-14th-try-git.patch, couchdb-583-trunk-15th-try-git.patch, couchdb-583-trunk-16th-try-git.patch, couchdb-583-trunk-3rd-try.patch, couchdb-583-trunk-4th-try-trunk.patch, couchdb-583-trunk-5th-try.patch, couchdb-583-trunk-6th-try.patch, couchdb-583-trunk-7th-try.patch, couchdb-583-trunk-8th-try.patch, couchdb-583-trunk-9th-try.patch, jira-couchdb-583-1st-try-trunk.patch, jira-couchdb-583-2nd-try-trunk.patch
>
>
> This feature allows Couch to gzip compress attachments as they are being received and store them in compressed form.
> When a client asks for downloading an attachment (e.g. GET somedb/somedoc/attachment.txt), the attachment is sent in compressed form if the client's http request has gzip specified as a valid transfer encoding for the response (using the http header "Accept-Encoding"). Otherwise couch decompresses the attachment before sending it back to the client.
> Attachments are compressed only if their MIME type matches one of those listed in a separate config file. Compression level is also configurable in the default.ini file.
> This follows Damien's suggestion from 30 November:
> "Perhaps we need a separate user editable ini file to specify compressable or non-compressable files (would probably be too big for the regular ini file). What do other web servers do?
> Also, a potential optimization is to compress the file while writing to disk, and serve the compressed bytes directly to clients that can handle it, and decompressed for those that can't. For compressable types, it's a win for both disk IO for reads and writes, and CPU on read."
> Patch attached.



[jira] Commented: (COUCHDB-583) adding ?compression=(gzip|deflate) optional parameter to the attachment download API

Posted by "Adam Kocoloski (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/COUCHDB-583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12783497#action_12783497 ] 

Adam Kocoloski commented on COUCHDB-583:
----------------------------------------

Yes, it's been discussed, although I'm not aware of any efforts that have gone further than discussion.  I'd be keen to try it out someday, but it's a lower-priority issue for me.

> adding ?compression=(gzip|deflate) optional parameter to the attachment download API
> ------------------------------------------------------------------------------------
>
>                 Key: COUCHDB-583
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-583
>             Project: CouchDB
>          Issue Type: New Feature
>          Components: HTTP Interface
>         Environment: CouchDB trunk revision 885240
>            Reporter: Filipe Manana
>         Attachments: jira-couchdb-583-1st-try-trunk.patch, jira-couchdb-583-2nd-try-trunk.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> The following new feature is added in the patch following this ticket creation.
> A new optional http query parameter "compression" is added to the attachments API.
> This parameter can have one of the values:  "gzip" or "deflate".
> When asking for an attachment (GET http request), if the query parameter "compression" is found, CouchDB will send the attachment compressed to the client (and sets the header Content-Encoding with gzip or deflate).
> Further, it adds a new config option "treshold_for_chunking_comp_responses" (httpd section) that specifies an attachment length threshold. If an attachment has a length >= than this threshold, the http response will be chunked (besides compressed).
> Note that using non chunked compressed  body responses requires storing all the compressed blocks in memory and then sending each one to the client. This is a necessary "evil", as we only know the length of the compressed body after compressing all the body, and we need to set the "Content-Length" header for non chunked responses. By sending chunked responses, we can send each compressed block immediately, without accumulating all of them in memory.
> Examples:
> $ curl http://localhost:5984/testdb/testdoc1/readme.txt?compression=gzip
> $ curl http://localhost:5984/testdb/testdoc1/readme.txt?compression=deflate
> $ curl http://localhost:5984/testdb/testdoc1/readme.txt   # attachment will not be compressed
> $ curl http://localhost:5984/testdb/testdoc1/readme.txt?compression=rar   # will give a 500 error code
> Etap test case included.
> Feedback would be very welcome.
> cheers



[jira] Updated: (COUCHDB-583) storing attachments in compressed form and serving them in compressed form if accepted by the client

Posted by "Filipe Manana (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/COUCHDB-583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Filipe Manana updated COUCHDB-583:
----------------------------------

    Attachment: couchdb-583-trunk-14th-try-git.patch

Sorry for yet another patch upload, but I forgot to generate the previous one with the --binary option of "git diff". The new binary file test/etap/test_140_file.png is required by the test case.

cheers

> storing attachments in compressed form and serving them in compressed form if accepted by the client
> ----------------------------------------------------------------------------------------------------
>
>                 Key: COUCHDB-583
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-583
>             Project: CouchDB
>          Issue Type: New Feature
>          Components: Database Core, HTTP Interface
>         Environment: CouchDB trunk
>            Reporter: Filipe Manana
>         Attachments: couchdb-583-trunk-10th-try.patch, couchdb-583-trunk-11th-try.patch, couchdb-583-trunk-12th-try.patch, couchdb-583-trunk-13th-try.patch, couchdb-583-trunk-14th-try-git.patch, couchdb-583-trunk-3rd-try.patch, couchdb-583-trunk-4th-try-trunk.patch, couchdb-583-trunk-5th-try.patch, couchdb-583-trunk-6th-try.patch, couchdb-583-trunk-7th-try.patch, couchdb-583-trunk-8th-try.patch, couchdb-583-trunk-9th-try.patch, jira-couchdb-583-1st-try-trunk.patch, jira-couchdb-583-2nd-try-trunk.patch
>
>
> This feature allows Couch to gzip compress attachments as they are being received and store them in compressed form.
> When a client asks for downloading an attachment (e.g. GET somedb/somedoc/attachment.txt), the attachment is sent in compressed form if the client's http request has gzip specified as a valid transfer encoding for the response (using the http header "Accept-Encoding"). Otherwise couch decompresses the attachment before sending it back to the client.
> Attachments are compressed only if their MIME type matches one of those listed in a separate config file. Compression level is also configurable in the default.ini file.
> This follows Damien's suggestion from 30 November:
> "Perhaps we need a separate user editable ini file to specify compressable or non-compressable files (would probably be too big for the regular ini file). What do other web servers do?
> Also, a potential optimization is to compress the file while writing to disk, and serve the compressed bytes directly to clients that can handle it, and decompressed for those that can't. For compressable types, it's a win for both disk IO for reads and writes, and CPU on read."
> Patch attached.



[jira] Closed: (COUCHDB-583) storing attachments in compressed form and serving them in compressed form if accepted by the client

Posted by "Chris Anderson (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/COUCHDB-583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Anderson closed COUCHDB-583.
----------------------------------

    Resolution: Fixed

committed in r904650

Thanks for the hard work everyone!

> storing attachments in compressed form and serving them in compressed form if accepted by the client
> ----------------------------------------------------------------------------------------------------
>
>                 Key: COUCHDB-583
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-583
>             Project: CouchDB
>          Issue Type: New Feature
>          Components: Database Core, HTTP Interface
>         Environment: CouchDB trunk
>            Reporter: Filipe Manana
>         Attachments: couchdb-583-trunk-10th-try.patch, couchdb-583-trunk-11th-try.patch, couchdb-583-trunk-12th-try.patch, couchdb-583-trunk-13th-try.patch, couchdb-583-trunk-14th-try-git.patch, couchdb-583-trunk-15th-try-git.patch, couchdb-583-trunk-16th-try-git.patch, couchdb-583-trunk-17th-try-git.patch, couchdb-583-trunk-18th-try-git.patch, couchdb-583-trunk-3rd-try.patch, couchdb-583-trunk-4th-try-trunk.patch, couchdb-583-trunk-5th-try.patch, couchdb-583-trunk-6th-try.patch, couchdb-583-trunk-7th-try.patch, couchdb-583-trunk-8th-try.patch, couchdb-583-trunk-9th-try.patch, jira-couchdb-583-1st-try-trunk.patch, jira-couchdb-583-2nd-try-trunk.patch
>
>
> This feature allows Couch to gzip compress attachments as they are being received and store them in compressed form.
> When a client asks for downloading an attachment (e.g. GET somedb/somedoc/attachment.txt), the attachment is sent in compressed form if the client's http request has gzip specified as a valid transfer encoding for the response (using the http header "Accept-Encoding"). Otherwise couch decompresses the attachment before sending it back to the client.
> Attachments are compressed only if their MIME type matches one of those listed in a separate config file. Compression level is also configurable in the default.ini file.
> This follows Damien's suggestion from 30 November:
> "Perhaps we need a separate user editable ini file to specify compressable or non-compressable files (would probably be too big for the regular ini file). What do other web servers do?
> Also, a potential optimization is to compress the file while writing to disk, and serve the compressed bytes directly to clients that can handle it, and decompressed for those that can't. For compressable types, it's a win for both disk IO for reads and writes, and CPU on read."
> Patch attached.



[jira] Commented: (COUCHDB-583) storing attachments in compressed form and serving them in compressed form if accepted by the client

Posted by "Damien Katz (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/COUCHDB-583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12799937#action_12799937 ] 

Damien Katz commented on COUCHDB-583:
-------------------------------------

I haven't looked at the patch, but I agree with most of Paul's comments, except about how to decide when to compress files. Many compressed files have uncompressed headers, which could lead to unnecessary compression. MP3s with ID3v2 tags immediately come to mind.
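Damien's concern can be shown with a simple trial-compression heuristic. The sketch assumes the heuristic compresses a leading sample and checks the ratio; Paul's actual proposal is not quoted in this thread. The "file" is synthetic: a regular, zero-padded tag header followed by random bytes, which stand in for already-compressed MP3 audio.

```python
import os
import zlib


def ratio(data: bytes) -> float:
    """Compressed size divided by original size (lower = more compressible)."""
    return len(zlib.compress(data)) / len(data)


# Synthetic MP3-like file: a highly regular, uncompressed tag header (as with
# ID3v2 padding) followed by random bytes modelling compressed audio frames.
header = b"TAG:" + b"\x00" * 4092
audio = os.urandom(256 * 1024)
mp3_like = header + audio

print(f"first 4 KiB: {ratio(mp3_like[:4096]):.3f}")  # the header looks very compressible
print(f"whole file:  {ratio(mp3_like):.3f}")         # the file as a whole barely shrinks
```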

> storing attachments in compressed form and serving them in compressed form if accepted by the client
> ----------------------------------------------------------------------------------------------------
>
>                 Key: COUCHDB-583
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-583
>             Project: CouchDB
>          Issue Type: New Feature
>          Components: Database Core, HTTP Interface
>         Environment: CouchDB trunk
>            Reporter: Filipe Manana
>         Attachments: couchdb-583-trunk-3rd-try.patch, couchdb-583-trunk-4th-try-trunk.patch, couchdb-583-trunk-5th-try.patch, couchdb-583-trunk-6th-try.patch, couchdb-583-trunk-7th-try.patch, couchdb-583-trunk-8th-try.patch, couchdb-583-trunk-9th-try.patch, jira-couchdb-583-1st-try-trunk.patch, jira-couchdb-583-2nd-try-trunk.patch
>
>
> This feature allows Couch to gzip compress attachments as they are being received and store them in compressed form.
> When a client asks for downloading an attachment (e.g. GET somedb/somedoc/attachment.txt), the attachment is sent in compressed form if the client's http request has gzip specified as a valid transfer encoding for the response (using the http header "Accept-Encoding"). Otherwise couch decompresses the attachment before sending it back to the client.
> Attachments are compressed only if their MIME type matches one of those listed in a separate config file. Compression level is also configurable in the default.ini file.
> This follows Damien's suggestion from 30 November:
> "Perhaps we need a separate user editable ini file to specify compressable or non-compressable files (would probably be too big for the regular ini file). What do other web servers do?
> Also, a potential optimization is to compress the file while writing to disk, and serve the compressed bytes directly to clients that can handle it, and decompressed for those that can't. For compressable types, it's a win for both disk IO for reads and writes, and CPU on read."
> Patch attached.



[jira] Commented: (COUCHDB-583) adding ?compression=(gzip|deflate) optional parameter to the attachment download API

Posted by "Filipe Manana (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/COUCHDB-583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12788090#action_12788090 ] 

Filipe Manana commented on COUCHDB-583:
---------------------------------------

I forgot to mention that in case 1), CouchDB sends the response with the header "Content-Encoding: gzip".

For case 2), this header is absent if the request accepts identity encoding. That means the content encoding is identity and the attachment is sent uncompressed. The request accepts identity either explicitly or by not having "*; q=0" or "identity; q=0" in its Accept-Encoding header value. Otherwise an HTTP 406 error is sent, as RFC 2616 recommends, because none of the encoding methods requested by the client is supported by the server.
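The negotiation rules above can be sketched as a small Python function. It takes the Accept-Encoding value already parsed into a {coding: q} mapping, or None when the header is absent, and decides how to serve an attachment stored as gzip. The function name and return shape are hypothetical. Codings other than gzip and identity are ignored for simplicity.

```python
def choose_encoding(prefs):
    """Decide how to serve an attachment stored gzip-compressed.

    prefs maps content-codings to q-values parsed from Accept-Encoding,
    or is None when the request has no Accept-Encoding header.
    Returns (status, content_encoding).
    """
    if prefs is None:
        # RFC 2616 14.3: without the header, identity is the preferred choice.
        return 200, "identity"

    def q(coding):
        # An explicit entry wins; otherwise "*" covers unlisted codings.
        if coding in prefs:
            return prefs[coding]
        return prefs.get("*")

    gzip_q = q("gzip")
    if gzip_q is not None and gzip_q > 0:
        return 200, "gzip"  # send stored bytes with Content-Encoding: gzip
    identity_q = q("identity")
    # identity is acceptable unless refused via identity;q=0 or *;q=0.
    if identity_q is None or identity_q > 0:
        return 200, "identity"  # decompress on the fly, no Content-Encoding
    return 406, None  # nothing the client accepts can be produced
```

This sketch prefers gzip whenever it is acceptable at all, even if the client gives identity a higher q-value. That mirrors the description above but is a simplification.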

> adding ?compression=(gzip|deflate) optional parameter to the attachment download API
> ------------------------------------------------------------------------------------
>
>                 Key: COUCHDB-583
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-583
>             Project: CouchDB
>          Issue Type: New Feature
>          Components: HTTP Interface
>         Environment: CouchDB trunk revision 885240
>            Reporter: Filipe Manana
>         Attachments: couchdb-583-trunk-3rd-try.patch, jira-couchdb-583-1st-try-trunk.patch, jira-couchdb-583-2nd-try-trunk.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> The following new feature is added in the patch following this ticket creation.
> A new optional http query parameter "compression" is added to the attachments API.
> This parameter can have one of the values:  "gzip" or "deflate".
> When asking for an attachment (GET http request), if the query parameter "compression" is found, CouchDB will send the attachment compressed to the client (and sets the header Content-Encoding with gzip or deflate).
> Further, it adds a new config option "treshold_for_chunking_comp_responses" (httpd section) that specifies an attachment length threshold. If an attachment has a length >= than this threshold, the http response will be chunked (besides compressed).
> Note that using non chunked compressed  body responses requires storing all the compressed blocks in memory and then sending each one to the client. This is a necessary "evil", as we only know the length of the compressed body after compressing all the body, and we need to set the "Content-Length" header for non chunked responses. By sending chunked responses, we can send each compressed block immediately, without accumulating all of them in memory.
> Examples:
> $ curl http://localhost:5984/testdb/testdoc1/readme.txt?compression=gzip
> $ curl http://localhost:5984/testdb/testdoc1/readme.txt?compression=deflate
> $ curl http://localhost:5984/testdb/testdoc1/readme.txt   # attachment will not be compressed
> $ curl http://localhost:5984/testdb/testdoc1/readme.txt?compression=rar   # will give a 500 error code
> Etap test case included.
> Feedback would be very welcome.
> cheers



Re: [jira] Commented: (COUCHDB-583) storing attachments in compressed form and serving them in compressed form if accepted by the client

Posted by Filipe David Manana <fd...@gmail.com>.
:)

Be aware, it's a much bigger patch, and quite different from the one you
checked before :)

Honestly, I think storing compressed attachments is a plus. It saves disk
space and makes disk reads faster, which is especially noticeable for text
attachments. E.g. a 10MB text file is compressed into less than 100K: much
less disk space needed and faster reads.

For web browser based apps (in Firefox at least), the XMLHttpRequest sent by
the browser always has the Accept-Encoding header set to "gzip, deflate". It
doesn't allow the programmer to override the header's value; at least I
couldn't do it with Firefox 3.5. Therefore, couch doesn't need to decompress
the attachment while sending its chunks to the client. Decompression is done
by Firefox and is transparent to the JS code :) A really nice speedup for
couch.

cheers

On Tue, Dec 22, 2009 at 2:46 PM, Paul Joseph Davis <
paul.joseph.davis@gmail.com> wrote:

> Too lazy to log into jira on my phone.
>
> I just responded to the JIRA email I got this morning. I must've gotten
> confused.  I'll try and review your patch tonight or tomorrow and get back
> to you.
>
>
>
>
> On Dec 22, 2009, at 8:26 AM, "Filipe Manana (JIRA)" <ji...@apache.org>
> wrote:
>
>
>>   [
>> https://issues.apache.org/jira/browse/COUCHDB-583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12793615#action_12793615
>>  ]
>>
>> Filipe Manana commented on COUCHDB-583:
>> ---------------------------------------
>>
>> Paul,
>>
>> your comments refer to the implementation in the first 3 patches. The 4th
>> and later patches follow Damien's idea (comment from 30 November).
>>
>> Check the latest patch: couchdb-583-trunk-6th-try.patch
>>
>> The approach is completely different. The query parameter
>> ?compression=(gzip|deflate) is no longer used, and neither is the block
>> buffering for compression / decompression :) With the latest patches,
>> attachments are compressed and stored in compressed form (if their MIME
>> type matches one of those in the config file).
>>
>> As soon as a data chunk is received from the client, it is compressed with
>> a zlib stream and written to disk. Decompression follows the same idea: a
>> block is read from disk, decompressed, and sent to the client as a chunk.
>> No need to buffer anything. I figured out how to use zlib for incremental
>> gzip compression/decompression.
>>
>> The "treshold_for_chunking_comp_responses" option is also completely gone.
>> HTTP content-encoding is now negotiated.
>>
>> After analyzing the patch, let me know whether the implementation is OK and
>> how to simplify it further.
>>
>> cheers
>>
>>
>>> storing attachments in compressed form and serving them in compressed
>>> form if accepted by the client
>>>
>>> ----------------------------------------------------------------------------------------------------
>>>
>>>               Key: COUCHDB-583
>>>               URL: https://issues.apache.org/jira/browse/COUCHDB-583
>>>           Project: CouchDB
>>>        Issue Type: New Feature
>>>        Components: Database Core, HTTP Interface
>>>       Environment: CouchDB trunk
>>>          Reporter: Filipe Manana
>>>       Attachments: couchdb-583-trunk-3rd-try.patch,
>>> couchdb-583-trunk-4th-try-trunk.patch, couchdb-583-trunk-5th-try.patch,
>>> couchdb-583-trunk-6th-try.patch, jira-couchdb-583-1st-try-trunk.patch,
>>> jira-couchdb-583-2nd-try-trunk.patch
>>>
>>>
>>> This feature allows Couch to gzip compress attachments as they are being
>>> received and store them in compressed form.
>>> When a client asks for downloading an attachment (e.g. GET
>>> somedb/somedoc/attachment.txt), the attachment is sent in compressed form if
>>> the client's http request has gzip specified as a valid transfer encoding
>>> for the response (using the http header "Accept-Encoding"). Otherwise couch
>>> decompresses the attachment before sending it back to the client.
>>> Attachments are compressed only if their MIME type matches one of those
>>> listed in a separate config file. Compression level is also configurable in
>>> the default.ini file.
>>> This follows Damien's suggestion from 30 November:
>>> "Perhaps we need a separate user editable ini file to specify
>>> compressable or non-compressable files (would probably be too big for the
>>> regular ini file). What do other web servers do?
>>> Also, a potential optimization is to compress the file while writing to
>>> disk, and serve the compressed bytes directly to clients that can handle it,
>>> and decompressed for those that can't. For compressable types, it's a win
>>> for both disk IO for reads and writes, and CPU on read."
>>> Patch attached.
>>>
>>
>> --
>> This message is automatically generated by JIRA.
>> -
>> You can reply to this email to add a comment to the issue online.
>>
>>


-- 
Filipe David Manana,
fdmanana@gmail.com
PGP key - http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xC569452B

"Reasonable men adapt themselves to the world.
Unreasonable men adapt the world to themselves.
That's why all progress depends on unreasonable men."

Re: [jira] Commented: (COUCHDB-583) storing attachments in compressed form and serving them in compressed form if accepted by the client

Posted by Paul Joseph Davis <pa...@gmail.com>.
Too lazy to log into jira on my phone.

I just responded to the JIRA email I got this morning. I must've  
gotten confused.  I'll try and review your patch tonight or tomorrow  
and get back to you.



On Dec 22, 2009, at 8:26 AM, "Filipe Manana (JIRA)" <ji...@apache.org>  
wrote:

>
>    [ https://issues.apache.org/jira/browse/COUCHDB-583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12793615#action_12793615 
>  ]
>
> Filipe Manana commented on COUCHDB-583:
> ---------------------------------------
>
> Paul,
>
> your comments refer to the implementation in the first 3 patches.
> The 4th and later patches follow Damien's idea (comment from
> 30 November).
>
> Check the latest patch: couchdb-583-trunk-6th-try.patch
>
> The approach is completely different. The query parameter
> ?compression=(gzip|deflate) is no longer used, and neither is the
> block buffering for compression / decompression :) With the latest
> patches, attachments are compressed and stored in compressed form
> (if their MIME type matches one of those in the config file).
>
> As soon as a data chunk is received from the client, it is
> compressed with a zlib stream and written to disk. Decompression
> follows the same idea: a block is read from disk, decompressed,
> and sent to the client as a chunk. No need to buffer anything. I
> figured out how to use zlib for incremental gzip compression/decompression.
>
> The "treshold_for_chunking_comp_responses" option is also completely
> gone. HTTP content-encoding is now negotiated.
>
> After analyzing the patch, let me know whether the implementation is
> OK and how to simplify it further.
>
> cheers
>
>
>> storing attachments in compressed form and serving them in  
>> compressed form if accepted by the client
>> ----------------------------------------------------------------------------------------------------
>>
>>                Key: COUCHDB-583
>>                URL: https://issues.apache.org/jira/browse/COUCHDB-583
>>            Project: CouchDB
>>         Issue Type: New Feature
>>         Components: Database Core, HTTP Interface
>>        Environment: CouchDB trunk
>>           Reporter: Filipe Manana
>>        Attachments: couchdb-583-trunk-3rd-try.patch,
>> couchdb-583-trunk-4th-try-trunk.patch, couchdb-583-trunk-5th-try.patch,
>> couchdb-583-trunk-6th-try.patch, jira-couchdb-583-1st-try-trunk.patch,
>> jira-couchdb-583-2nd-try-trunk.patch
>>
>>
>> This feature allows Couch to gzip compress attachments as they are  
>> being received and store them in compressed form.
>> When a client asks for downloading an attachment (e.g. GET
>> somedb/somedoc/attachment.txt), the attachment is sent in compressed form
>> if the client's http request has gzip specified as a valid transfer
>> encoding for the response (using the http header "Accept-Encoding").
>> Otherwise couch decompresses the attachment before sending it back to
>> the client.
>> Attachments are compressed only if their MIME type matches one of  
>> those listed in a separate config file. Compression level is also  
>> configurable in the default.ini file.
>> This follows Damien's suggestion from 30 November:
>> "Perhaps we need a separate user editable ini file to specify  
>> compressable or non-compressable files (would probably be too big  
>> for the regular ini file). What do other web servers do?
>> Also, a potential optimization is to compress the file while  
>> writing to disk, and serve the compressed bytes directly to clients  
>> that can handle it, and decompressed for those that can't. For  
>> compressable types, it's a win for both disk IO for reads and  
>> writes, and CPU on read."
>> Patch attached.
>
> -- 
> This message is automatically generated by JIRA.
> -
> You can reply to this email to add a comment to the issue online.
>

[jira] Commented: (COUCHDB-583) storing attachments in compressed form and serving them in compressed form if accepted by the client

Posted by "Filipe Manana (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/COUCHDB-583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12793615#action_12793615 ] 

Filipe Manana commented on COUCHDB-583:
---------------------------------------

Paul,

your comments refer to the implementation in the first 3 patches. The 4th and later patches follow Damien's idea (comment from 30 November).

Check the latest patch: couchdb-583-trunk-6th-try.patch

The approach is completely different. The query parameter ?compression=(gzip|deflate) is no longer used, and neither is the block buffering for compression / decompression :) With the latest patches, attachments are compressed and stored in compressed form (if their MIME type matches one of those in the config file).

As soon as a data chunk is received from the client, it is compressed with a zlib stream and written to disk. Decompression follows the same idea: a block is read from disk, decompressed, and sent to the client as a chunk. No need to buffer anything. I figured out how to use zlib for incremental gzip compression/decompression.
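Incremental gzip compression and decompression of this kind can be sketched with Python's zlib module, where wbits=31 selects the gzip container. The patch itself works in Erlang, so this only illustrates the same block-by-block idea. The function names are made up, and the level parameter merely mirrors the configurable compression level mentioned in the ticket.

```python
import zlib


def gzip_stream(chunks, level=6):
    """Compress an iterable of byte chunks into a gzip stream, one chunk at a time."""
    comp = zlib.compressobj(level, zlib.DEFLATED, 31)  # wbits=31 -> gzip header/trailer
    for chunk in chunks:
        out = comp.compress(chunk)
        if out:  # zlib may buffer input and emit nothing yet
            yield out
    yield comp.flush()  # remaining compressed data plus the gzip trailer


def gunzip_stream(blocks):
    """Decompress gzip blocks as they are read, yielding plain data incrementally."""
    decomp = zlib.decompressobj(31)
    for block in blocks:
        out = decomp.decompress(block)
        if out:
            yield out
    tail = decomp.flush()
    if tail:
        yield tail
    if not decomp.eof:
        raise ValueError("truncated gzip stream")
```

Neither function holds more than one block plus zlib's internal window in memory, which is the point of streaming the data instead of buffering whole attachments.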

The "treshold_for_chunking_comp_responses" option is also completely gone. HTTP content-encoding is now negotiated.

After analyzing the patch, let me know whether the implementation is OK and how to simplify it further.

cheers


> storing attachments in compressed form and serving them in compressed form if accepted by the client
> ----------------------------------------------------------------------------------------------------
>
>                 Key: COUCHDB-583
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-583
>             Project: CouchDB
>          Issue Type: New Feature
>          Components: Database Core, HTTP Interface
>         Environment: CouchDB trunk
>            Reporter: Filipe Manana
>         Attachments: couchdb-583-trunk-3rd-try.patch, couchdb-583-trunk-4th-try-trunk.patch, couchdb-583-trunk-5th-try.patch, couchdb-583-trunk-6th-try.patch, jira-couchdb-583-1st-try-trunk.patch, jira-couchdb-583-2nd-try-trunk.patch
>
>
> This feature allows Couch to gzip compress attachments as they are being received and store them in compressed form.
> When a client asks for downloading an attachment (e.g. GET somedb/somedoc/attachment.txt), the attachment is sent in compressed form if the client's http request has gzip specified as a valid transfer encoding for the response (using the http header "Accept-Encoding"). Otherwise couch decompresses the attachment before sending it back to the client.
> Attachments are compressed only if their MIME type matches one of those listed in a separate config file. Compression level is also configurable in the default.ini file.
> This follows Damien's suggestion from 30 November:
> "Perhaps we need a separate user editable ini file to specify compressable or non-compressable files (would probably be too big for the regular ini file). What do other web servers do?
> Also, a potential optimization is to compress the file while writing to disk, and serve the compressed bytes directly to clients that can handle it, and decompressed for those that can't. For compressable types, it's a win for both disk IO for reads and writes, and CPU on read."
> Patch attached.
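The read path described in this summary can be sketched as follows. The function name is hypothetical, and the attachment is assumed to be stored as one gzip stream. A client that accepts gzip gets the stored bytes untouched. Any other client gets them decompressed first.

```python
import gzip


def attachment_response(stored_gzip, client_accepts_gzip):
    """Build (headers, body) for an attachment kept gzip-compressed on disk."""
    if client_accepts_gzip:
        # Send the stored bytes as-is: no compression work at read time, and the
        # compressed length is already known, so Content-Length needs no buffering.
        body = stored_gzip
        headers = {"Content-Encoding": "gzip"}
    else:
        # Client did not accept gzip. For brevity this decompresses in one go;
        # a streaming server would decompress block by block instead.
        body = gzip.decompress(stored_gzip)
        headers = {}
    headers["Content-Length"] = str(len(body))
    return headers, body
```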

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (COUCHDB-583) storing attachments in compressed form and serving them in compressed form if accepted by the client

Posted by "Paul Joseph Davis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/COUCHDB-583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12801309#action_12801309 ] 

Paul Joseph Davis commented on COUCHDB-583:
-------------------------------------------

Awesome that the patch was accepted. With the other one, that's probably enough reason to pull the latest mochiweb. I only glanced through the patch on my phone, but it looks much better. The config stuff is still too specific, though. Is there something that requires a separate file? I was thinking that a multiline config value patch might be beneficial for this. 

I'll take a closer look when I get home but minus the config stuff I think this is gonna be pretty close to good. 

Also, I'm a bit concerned about the 0.9 upgrade code in terms of testing. I wonder if we can find a 0.9 db file to use for testing these sorts of things. 

Good work once again Filipe!

> storing attachments in compressed form and serving them in compressed form if accepted by the client
> ----------------------------------------------------------------------------------------------------
>
>                 Key: COUCHDB-583
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-583
>             Project: CouchDB
>          Issue Type: New Feature
>          Components: Database Core, HTTP Interface
>         Environment: CouchDB trunk
>            Reporter: Filipe Manana
>         Attachments: couchdb-583-trunk-10th-try.patch, couchdb-583-trunk-3rd-try.patch, couchdb-583-trunk-4th-try-trunk.patch, couchdb-583-trunk-5th-try.patch, couchdb-583-trunk-6th-try.patch, couchdb-583-trunk-7th-try.patch, couchdb-583-trunk-8th-try.patch, couchdb-583-trunk-9th-try.patch, jira-couchdb-583-1st-try-trunk.patch, jira-couchdb-583-2nd-try-trunk.patch


[jira] Updated: (COUCHDB-583) storing attachments in compressed form and serving them in compressed form if accepted by the client

Posted by "Filipe Manana (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/COUCHDB-583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Filipe Manana updated COUCHDB-583:
----------------------------------

    Attachment: couchdb-583-trunk-12th-try.patch

@Paul

Forget the previous patch; it added unneeded complexity for parsing and handling multi-value (multi-line) ini file config parameters.

Let me know if everything seems ok.

cheers

> storing attachments in compressed form and serving them in compressed form if accepted by the client
> ----------------------------------------------------------------------------------------------------
>
>                 Key: COUCHDB-583
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-583
>             Project: CouchDB
>          Issue Type: New Feature
>          Components: Database Core, HTTP Interface
>         Environment: CouchDB trunk
>            Reporter: Filipe Manana
>         Attachments: couchdb-583-trunk-10th-try.patch, couchdb-583-trunk-11th-try.patch, couchdb-583-trunk-12th-try.patch, couchdb-583-trunk-3rd-try.patch, couchdb-583-trunk-4th-try-trunk.patch, couchdb-583-trunk-5th-try.patch, couchdb-583-trunk-6th-try.patch, couchdb-583-trunk-7th-try.patch, couchdb-583-trunk-8th-try.patch, couchdb-583-trunk-9th-try.patch, jira-couchdb-583-1st-try-trunk.patch, jira-couchdb-583-2nd-try-trunk.patch


[jira] Commented: (COUCHDB-583) storing attachments in compressed form and serving them in compressed form if accepted by the client

Posted by "Filipe Manana (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/COUCHDB-583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12801259#action_12801259 ] 

Filipe Manana commented on COUCHDB-583:
---------------------------------------

Just for the record, the patch submitted for mochiweb (Accept-Encoding header parsing) has just been accepted:

http://code.google.com/p/mochiweb/source/detail?r=133

> storing attachments in compressed form and serving them in compressed form if accepted by the client
> ----------------------------------------------------------------------------------------------------
>
>                 Key: COUCHDB-583
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-583
>             Project: CouchDB
>          Issue Type: New Feature
>          Components: Database Core, HTTP Interface
>         Environment: CouchDB trunk
>            Reporter: Filipe Manana
>         Attachments: couchdb-583-trunk-10th-try.patch, couchdb-583-trunk-3rd-try.patch, couchdb-583-trunk-4th-try-trunk.patch, couchdb-583-trunk-5th-try.patch, couchdb-583-trunk-6th-try.patch, couchdb-583-trunk-7th-try.patch, couchdb-583-trunk-8th-try.patch, couchdb-583-trunk-9th-try.patch, jira-couchdb-583-1st-try-trunk.patch, jira-couchdb-583-2nd-try-trunk.patch


[jira] Commented: (COUCHDB-583) adding ?compression=(gzip|deflate) optional parameter to the attachment download API

Posted by "Justin Sheehy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/COUCHDB-583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12783529#action_12783529 ] 

Justin Sheehy commented on COUCHDB-583:
---------------------------------------

If someone would like to point me at the relevant part of the CouchDB source that generates the view body (so that I can see the shape of the impedance mismatch) I'd be happy to see if I can help.  I am only minimally familiar with the CouchDB sources, but might be able to suggest an approach that would fit well once I see the last remaining issue from your point of view.

If you discuss it on couchdb-dev instead of here or the webmachine list, just CC me if you want me to notice as I don't always catch every thread there.



> adding ?compression=(gzip|deflate) optional parameter to the attachment download API
> ------------------------------------------------------------------------------------
>
>                 Key: COUCHDB-583
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-583
>             Project: CouchDB
>          Issue Type: New Feature
>          Components: HTTP Interface
>         Environment: CouchDB trunk revision 885240
>            Reporter: Filipe Manana
>         Attachments: jira-couchdb-583-1st-try-trunk.patch, jira-couchdb-583-2nd-try-trunk.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> The following new feature is added in the patch following this ticket creation.
> A new optional http query parameter "compression" is added to the attachments API.
> This parameter can have one of the values:  "gzip" or "deflate".
> When asking for an attachment (GET http request), if the query parameter "compression" is found, CouchDB will send the attachment compressed to the client (and sets the header Content-Encoding with gzip or deflate).
> Further, it adds a new config option "treshold_for_chunking_comp_responses" (httpd section) that specifies an attachment length threshold. If an attachment's length is greater than or equal to this threshold, the HTTP response will be chunked (in addition to being compressed).
> Note that non-chunked compressed responses require holding all the compressed blocks in memory before sending them to the client. This is a necessary "evil": the length of the compressed body is known only after the whole body has been compressed, and non-chunked responses need a "Content-Length" header. With chunked responses, each compressed block can be sent immediately, without accumulating all of them in memory.
> Examples:
> $ curl http://localhost:5984/testdb/testdoc1/readme.txt?compression=gzip
> $ curl http://localhost:5984/testdb/testdoc1/readme.txt?compression=deflate
> $ curl http://localhost:5984/testdb/testdoc1/readme.txt   # attachment will not be compressed
> $ curl http://localhost:5984/testdb/testdoc1/readme.txt?compression=rar   # will give a 500 error code
> Etap test case included.
> Feedback would be very welcome.
> cheers
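The trade-off described in this original proposal can be sketched in Python. All names are hypothetical; the patch is Erlang. The `compression` value selects the zlib framing: HTTP "deflate" denotes the zlib-wrapped format (RFC 2616 section 3.5), while "gzip" needs the gzip wrapper. Each compressed block is then written as an HTTP/1.1 chunk as soon as it exists, so no Content-Length is needed.

```python
import zlib

# Window bits per ?compression= value: gzip framing vs. zlib ("deflate") framing.
WBITS = {"gzip": 16 + zlib.MAX_WBITS, "deflate": zlib.MAX_WBITS}


def encoder_for(param):
    """Return a compressor for the query-parameter value, or raise ValueError."""
    try:
        wbits = WBITS[param]
    except KeyError:
        # The proposal answers unknown values (e.g. "rar") with an error status.
        raise ValueError(f"unsupported compression: {param!r}")
    return zlib.compressobj(6, zlib.DEFLATED, wbits)


def _chunk(data):
    # One chunk: size in hex, CRLF, the bytes themselves, CRLF.
    return b"%x\r\n%s\r\n" % (len(data), data)


def chunked_compressed_body(blocks, param):
    """Yield a chunked response body, compressing each block as it arrives."""
    comp = encoder_for(param)
    for block in blocks:
        data = comp.compress(block)
        if data:  # a zero-length chunk would terminate the body early
            yield _chunk(data)
    data = comp.flush()
    if data:
        yield _chunk(data)
    yield b"0\r\n\r\n"  # last-chunk marker ends the body
```

Only one compressed block is in memory at a time. That is the advantage over the non-chunked variant, which must finish compressing before it can send Content-Length.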



[jira] Commented: (COUCHDB-583) adding ?compression=(gzip|deflate) optional parameter to the attachment download API

Posted by "Glenn Rempe (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/COUCHDB-583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12783495#action_12783495 ] 

Glenn Rempe commented on COUCHDB-583:
-------------------------------------

As a side note (this should probably be a discussion of its own, though): has any thought been given to bolting WebMachine onto CouchDB as the primary HTTP interface? I'm not sure how difficult it would be to retrofit, or whether there are use cases that would preclude it, but doing so would seem to eliminate a lot of uncertainty (and possible errors) in making sure that the complexities of HTTP are properly handled.

I comment on it here since webmachine could in theory 'do the right thing' regarding handling content encoding, HTTP response codes, etc.

http://bitbucket.org/justin/webmachine/wiki/Home

Just a thought.  I'm curious if this has been discussed before.



[jira] Commented: (COUCHDB-583) storing attachments in compressed form and serving them in compressed form if accepted by the client

Posted by "Filipe Manana (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/COUCHDB-583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12799947#action_12799947 ] 

Filipe Manana commented on COUCHDB-583:
---------------------------------------

Hmm,

Let's open a vote :)

1) use a heuristic, as suggested by Paul

2) or a file listing the mime types worth compressing

3) some other alternative?
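For option 1, here is one possible heuristic, which is not necessarily the one Paul had in mind. The function name and thresholds are hypothetical. It compresses a leading sample of the attachment and stores the attachment compressed only if the sample shrinks by a meaningful fraction. Already-compressed formats such as PNG, JPEG or zip behave like random bytes and fail the check.

```python
import os
import zlib


def looks_compressible(data, sample_size=64 * 1024, min_saving=0.10):
    """Guess whether `data` is worth storing compressed.

    Only a leading sample is compressed, so the check stays cheap for large
    attachments. sample_size and min_saving are illustrative values.
    """
    sample = data[:sample_size]
    if not sample:
        return False
    saving = 1 - len(zlib.compress(sample, 6)) / len(sample)
    return saving >= min_saving


if __name__ == "__main__":
    print(looks_compressible(b"<p>hello</p>\n" * 5000))  # repetitive text: True
    print(looks_compressible(os.urandom(100_000)))         # random bytes: False
```

Compared with a MIME-type list (option 2), this needs no configuration but costs some CPU per upload. It can also be fooled by a file whose first bytes are unrepresentative of the rest.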

cheers

> storing attachments in compressed form and serving them in compressed form if accepted by the client
> ----------------------------------------------------------------------------------------------------
>
>                 Key: COUCHDB-583
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-583
>             Project: CouchDB
>          Issue Type: New Feature
>          Components: Database Core, HTTP Interface
>         Environment: CouchDB trunk
>            Reporter: Filipe Manana
>         Attachments: couchdb-583-trunk-3rd-try.patch, couchdb-583-trunk-4th-try-trunk.patch, couchdb-583-trunk-5th-try.patch, couchdb-583-trunk-6th-try.patch, couchdb-583-trunk-7th-try.patch, couchdb-583-trunk-8th-try.patch, couchdb-583-trunk-9th-try.patch, jira-couchdb-583-1st-try-trunk.patch, jira-couchdb-583-2nd-try-trunk.patch


[jira] Commented: (COUCHDB-583) storing attachments in compressed form and serving them in compressed form if accepted by the client

Posted by "Paul Joseph Davis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/COUCHDB-583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12805539#action_12805539 ] 

Paul Joseph Davis commented on COUCHDB-583:
-------------------------------------------

@Filipe,

Actually you want to use test_util:src_file(Path).

The way to double check that you have everything down pat is to run:

     $ make distcheck

And that'll go through the full set of checks to see if your code is distribution ready.

For a brief explanation: autotools supports VPATH builds, which let people unpack the source on a read-only mount and then build into a writable location. So you have two directories (srcdir and builddir), and srcdir must be treated as read-only. This means that if you want to access a file that was part of the release tarball, you use srcdir. (Not every file in SVN is part of the tarball, and touching files during a build that aren't part of a release also causes errors.) If the file of interest is one being written to, or was generated by a make rule, then it lives in builddir.

The relevant autotools docs are:

http://www.gnu.org/software/hello/manual/automake/VPATH-Builds.html
http://www.gnu.org/software/make/manual/html_node/General-Search.html
http://www.gnu.org/software/make/manual/html_node/Commands_002fSearch.html#Commands_002fSearch

Those all cover the weirdness to some extent. I vaguely remember another helpful page but can't recall which one it was.

> storing attachments in compressed form and serving them in compressed form if accepted by the client
> ----------------------------------------------------------------------------------------------------
>
>                 Key: COUCHDB-583
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-583
>             Project: CouchDB
>          Issue Type: New Feature
>          Components: Database Core, HTTP Interface
>         Environment: CouchDB trunk
>            Reporter: Filipe Manana
>         Attachments: couchdb-583-trunk-10th-try.patch, couchdb-583-trunk-11th-try.patch, couchdb-583-trunk-12th-try.patch, couchdb-583-trunk-13th-try.patch, couchdb-583-trunk-14th-try-git.patch, couchdb-583-trunk-15th-try-git.patch, couchdb-583-trunk-3rd-try.patch, couchdb-583-trunk-4th-try-trunk.patch, couchdb-583-trunk-5th-try.patch, couchdb-583-trunk-6th-try.patch, couchdb-583-trunk-7th-try.patch, couchdb-583-trunk-8th-try.patch, couchdb-583-trunk-9th-try.patch, jira-couchdb-583-1st-try-trunk.patch, jira-couchdb-583-2nd-try-trunk.patch


[jira] Commented: (COUCHDB-583) storing attachments in compressed form and serving them in compressed form if accepted by the client

Posted by "Filipe Manana (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/COUCHDB-583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12804946#action_12804946 ] 

Filipe Manana commented on COUCHDB-583:
---------------------------------------

@Chris

Good point, I totally agree.
It would be interesting to test with real couchapps and real data to see how worthwhile it really is.

A 10 MB text file, for instance, was compressed to about 100 KB in one of my tests.

Also, minified JavaScript files are still worth compressing. For example, the minified Ext JS library file (http://www.extjs.com, ext-all.js) is about 630 KB; gzip compresses it to about 170 KB, a reasonably good size reduction.

As Damien said in a previous comment, this not only saves disk space but also reduces disk I/O (attachment download requests, compaction).

I also look forward to seeing the impact on real, production-level applications.
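To gather numbers like the ones quoted above for other kinds of attachments (email messages, PNGs, minified JS), a small measurement helper could look like this. The function name is hypothetical. It reports plain gzip savings at an assumed level of 6, which only approximates what the patch would store.

```python
import gzip
from pathlib import Path


def gzip_savings(data, level=6):
    """Return (original_size, gzip_size, percent_saved) for a bytes payload."""
    compressed = gzip.compress(data, compresslevel=level)
    saved = 100.0 * (1 - len(compressed) / len(data)) if data else 0.0
    return len(data), len(compressed), saved


def report(paths):
    """Print one line per file, e.g. to compare text files with PNGs or JS."""
    for name in paths:
        orig, comp, saved = gzip_savings(Path(name).read_bytes())
        print(f"{name}: {orig} -> {comp} bytes ({saved:.1f}% saved)")
```

For example, `report(["ext-all.js", "logo.png"])` prints a comparison for whatever files are at hand. The file names are placeholders.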

> storing attachments in compressed form and serving them in compressed form if accepted by the client
> ----------------------------------------------------------------------------------------------------
>
>                 Key: COUCHDB-583
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-583
>             Project: CouchDB
>          Issue Type: New Feature
>          Components: Database Core, HTTP Interface
>         Environment: CouchDB trunk
>            Reporter: Filipe Manana
>         Attachments: couchdb-583-trunk-10th-try.patch, couchdb-583-trunk-11th-try.patch, couchdb-583-trunk-12th-try.patch, couchdb-583-trunk-13th-try.patch, couchdb-583-trunk-14th-try-git.patch, couchdb-583-trunk-15th-try-git.patch, couchdb-583-trunk-3rd-try.patch, couchdb-583-trunk-4th-try-trunk.patch, couchdb-583-trunk-5th-try.patch, couchdb-583-trunk-6th-try.patch, couchdb-583-trunk-7th-try.patch, couchdb-583-trunk-8th-try.patch, couchdb-583-trunk-9th-try.patch, jira-couchdb-583-1st-try-trunk.patch, jira-couchdb-583-2nd-try-trunk.patch


[jira] Commented: (COUCHDB-583) storing attachments in compressed form and serving them in compressed form if accepted by the client

Posted by "Chris Anderson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/COUCHDB-583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12804919#action_12804919 ] 

Chris Anderson commented on COUCHDB-583:
----------------------------------------

I haven't really been tuned in to the full discussion of this patch. I think the biggest questions for something that digs this deep into the file format are:

How does it impact stability? (It looks fine at my cursory glance, aside from cross-compatibility with older versions of the file format, which I'd have to look at more closely.)

What is the payoff? How much space does this save in practice (say, with email messages as attachments, vs. PNGs or minified JS)? I'm not asking you to do all that work; I just think real numbers are a selling point.

If it's a big payoff then this becomes a priority. We might also want to add options for compressing the views.




> storing attachments in compressed form and serving them in compressed form if accepted by the client
> ----------------------------------------------------------------------------------------------------
>
>                 Key: COUCHDB-583
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-583
>             Project: CouchDB
>          Issue Type: New Feature
>          Components: Database Core, HTTP Interface
>         Environment: CouchDB trunk
>            Reporter: Filipe Manana
>         Attachments: couchdb-583-trunk-10th-try.patch, couchdb-583-trunk-11th-try.patch, couchdb-583-trunk-12th-try.patch, couchdb-583-trunk-13th-try.patch, couchdb-583-trunk-14th-try-git.patch, couchdb-583-trunk-15th-try-git.patch, couchdb-583-trunk-3rd-try.patch, couchdb-583-trunk-4th-try-trunk.patch, couchdb-583-trunk-5th-try.patch, couchdb-583-trunk-6th-try.patch, couchdb-583-trunk-7th-try.patch, couchdb-583-trunk-8th-try.patch, couchdb-583-trunk-9th-try.patch, jira-couchdb-583-1st-try-trunk.patch, jira-couchdb-583-2nd-try-trunk.patch


[jira] Commented: (COUCHDB-583) adding ?compression=(gzip|deflate) optional parameter to the attachment download API

Posted by "Sebastian Cohnen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/COUCHDB-583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12783418#action_12783418 ] 

Sebastian Cohnen commented on COUCHDB-583:
------------------------------------------

One question:

Why not use the Content-Encoding headers (negotiated through the request's Accept-Encoding header)? Or is there a particular reason to expose this as a GET query parameter? See http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.3
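Under the header-based approach suggested here, a client asks for compression with Accept-Encoding instead of `?compression=gzip`. The sketch below uses Python's urllib with hypothetical helper names. Note that urllib does not undo a gzip Content-Encoding by itself, so the client has to decode the body. With curl, the `--compressed` option sends the header and decodes the response.

```python
import gzip
import urllib.request


def build_request(url):
    """GET request that advertises gzip support instead of using ?compression=."""
    return urllib.request.Request(url, headers={"Accept-Encoding": "gzip"})


def decode_body(headers, body):
    """Undo a gzip Content-Encoding; urllib leaves the body as sent."""
    if (headers.get("Content-Encoding") or "").lower() == "gzip":
        return gzip.decompress(body)
    return body


def fetch_attachment(url):
    """Download an attachment, decompressing it if the server sent it gzipped."""
    with urllib.request.urlopen(build_request(url)) as resp:
        return decode_body(resp.headers, resp.read())
```

For instance, `fetch_attachment("http://localhost:5984/testdb/testdoc1/readme.txt")` would fetch the readme attachment from the examples in the ticket, assuming that database and document exist locally.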

> adding ?compression=(gzip|deflate) optional parameter to the attachment download API
> ------------------------------------------------------------------------------------
>
>                 Key: COUCHDB-583
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-583
>             Project: CouchDB
>          Issue Type: New Feature
>          Components: HTTP Interface
>         Environment: CouchDB trunk revision 885240
>            Reporter: Filipe Manana
>         Attachments: jira-couchdb-583-1st-try-trunk.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> The following new feature is added in the patch following this ticket creation.
> A new optional http query parameter "compression" is added to the attachments API.
> This parameter can have one of the values:  "gzip" or "deflate".
> When asking for an attachment (GET http request), if the query parameter "compression" is found, CouchDB will send the attachment compressed to the client (and sets the header Content-Encoding with gzip or deflate).
> Further, it adds a new config option "treshold_for_chunking_comp_responses" (httpd section) that specifies an attachment length threshold. If an attachment has a length >= than this threshold, the http response will be chunked (besides compressed).
> Note that using non chunked compressed  body responses requires storing all the compressed blocks in memory and then sending each one to the client. This is a necessary "evil", as we only know the length of the compressed body after compressing all the body, and we need to set the "Content-Length" header for non chunked responses. By sending chunked responses, we can send each compressed block immediately, without accumulating all of them in memory.
> Examples:
> $ curl http://localhost:5984/testdb/testdoc1/readme.txt?compression=gzip
> $ curl http://localhost:5984/testdb/testdoc1/readme.txt?compression=deflate
> $ curl http://localhost:5984/testdb/testdoc1/readme.txt   # attachment will not be compressed
> $ curl http://localhost:5984/testdb/testdoc1/readme.txt?compression=rar   # will give a 500 error code
> Etap test case included.
> Feedback would be very welcome.
> cheers
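The description's reasoning about Content-Length versus chunked responses can be sketched in Python. This is an illustration under assumptions, not the patch's Erlang code: `make_compressor`, `buffered_response` and `chunked_response` are hypothetical names, compression level 6 is arbitrary, and an unsupported ?compression= value raises ValueError where the patch answers with a 500. HTTP's "deflate" coding is mapped to the zlib-wrapped format and "gzip" to the gzip container.

```python
import zlib

# "gzip" is the RFC 1952 container; HTTP "deflate" means the zlib (RFC 1950) wrapper.
WBITS = {"gzip": 16 + zlib.MAX_WBITS, "deflate": zlib.MAX_WBITS}


def make_compressor(coding):
    """Validate the ?compression= value eagerly and return a zlib compressor."""
    if coding not in WBITS:
        # The ticket's examples answer unsupported values such as "rar" with an error.
        raise ValueError("unsupported compression: %r" % (coding,))
    return zlib.compressobj(6, zlib.DEFLATED, WBITS[coding])


def compressed_pieces(blocks, comp):
    """Feed attachment blocks through the compressor, yielding output as it appears."""
    for block in blocks:
        out = comp.compress(block)
        if out:
            yield out
    tail = comp.flush()
    if tail:
        yield tail


def buffered_response(blocks, coding):
    """Non-chunked: every compressed piece is kept in memory to learn Content-Length."""
    comp = make_compressor(coding)
    body = b"".join(compressed_pieces(blocks, comp))
    return {"Content-Encoding": coding, "Content-Length": str(len(body))}, body


def chunked_response(blocks, coding):
    """Chunked: each compressed piece is framed and can be sent immediately."""
    comp = make_compressor(coding)
    headers = {"Content-Encoding": coding, "Transfer-Encoding": "chunked"}

    def frames():
        for piece in compressed_pieces(blocks, comp):
            yield b"%x\r\n%s\r\n" % (len(piece), piece)
        yield b"0\r\n\r\n"  # zero-length last chunk ends the body

    return headers, frames()
```

Only the buffered variant has to hold the whole compressed body at once; the chunked variant's peak memory is bounded by one input block plus the compressor's internal state.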

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (COUCHDB-583) storing attachments in compressed form and serving them in compressed form if accepted by the client

Posted by "Filipe Manana (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/COUCHDB-583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Filipe Manana updated COUCHDB-583:
----------------------------------

    Attachment: couchdb-583-trunk-15th-try-git.patch

Just added a missing caseless option to the regexp match done in couch_util:compressible_att_type/1
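The effect of a caseless match can be shown with a small Python sketch: MIME types are case-insensitive, so a pattern such as "text/*" should also match "Text/Plain". The wildcard syntax and the helper names are assumptions for illustration; this is not couch_util:compressible_att_type/1.

```python
import re


def compile_type_patterns(patterns):
    """Compile wildcard MIME patterns such as "text/*" into case-insensitive regexes."""
    compiled = []
    for pattern in patterns:
        # Escape everything, then turn the escaped "*" back into "match anything".
        regex = re.escape(pattern.strip()).replace(r"\*", ".*")
        compiled.append(re.compile(regex, re.IGNORECASE))
    return compiled


def is_compressible(content_type, compiled):
    """True if the attachment's MIME type (parameters stripped) matches a pattern."""
    mime = content_type.split(";", 1)[0].strip()
    return any(p.fullmatch(mime) for p in compiled)
```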

> storing attachments in compressed form and serving them in compressed form if accepted by the client
> ----------------------------------------------------------------------------------------------------
>
>                 Key: COUCHDB-583
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-583
>             Project: CouchDB
>          Issue Type: New Feature
>          Components: Database Core, HTTP Interface
>         Environment: CouchDB trunk
>            Reporter: Filipe Manana
>         Attachments: couchdb-583-trunk-10th-try.patch, couchdb-583-trunk-11th-try.patch, couchdb-583-trunk-12th-try.patch, couchdb-583-trunk-13th-try.patch, couchdb-583-trunk-14th-try-git.patch, couchdb-583-trunk-15th-try-git.patch, couchdb-583-trunk-3rd-try.patch, couchdb-583-trunk-4th-try-trunk.patch, couchdb-583-trunk-5th-try.patch, couchdb-583-trunk-6th-try.patch, couchdb-583-trunk-7th-try.patch, couchdb-583-trunk-8th-try.patch, couchdb-583-trunk-9th-try.patch, jira-couchdb-583-1st-try-trunk.patch, jira-couchdb-583-2nd-try-trunk.patch
>
>
> This feature allows Couch to gzip-compress attachments as they are received and store them in compressed form.
> When a client requests an attachment (e.g. GET somedb/somedoc/attachment.txt), the attachment is sent in compressed form if the client's HTTP request lists gzip as an acceptable content coding for the response (via the "Accept-Encoding" header). Otherwise Couch decompresses the attachment before sending it back to the client.
> Attachments are compressed only if their MIME type matches one of those listed in a separate config file. The compression level is also configurable in the default.ini file.
> This follows Damien's suggestion from 30 November:
> "Perhaps we need a separate user editable ini file to specify compressable or non-compressable files (would probably be too big for the regular ini file). What do other web servers do?
> Also, a potential optimization is to compress the file while writing to disk, and serve the compressed bytes directly to clients that can handle it, and decompressed for those that can't. For compressable types, it's a win for both disk IO for reads and writes, and CPU on read."
> Patch attached.
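A minimal Python sketch of the write and read paths described above. It assumes the caller already knows whether the type is compressible and whether the client accepts gzip; both are hypothetical inputs here, and the default level of 8 is only a placeholder. The `Vary: Accept-Encoding` header is included because the response body now depends on a request header, which matters to caches.

```python
import gzip


def store_attachment(data, compressible, level=8):
    """Compress on write (compressible types only); returns (bytes_on_disk, encoding)."""
    if compressible:
        return gzip.compress(data, compresslevel=level), "gzip"
    return data, "identity"


def serve_attachment(stored, encoding, client_accepts_gzip):
    """Build (headers, body) for a GET of a stored attachment."""
    if encoding != "gzip":
        return {"Content-Length": str(len(stored))}, stored
    headers = {"Vary": "Accept-Encoding"}  # the body depends on a request header
    if client_accepts_gzip:
        body = stored  # send the stored bytes as-is: no compression work on read
        headers["Content-Encoding"] = "gzip"
    else:
        body = gzip.decompress(stored)  # client can't handle gzip: decompress on read
    headers["Content-Length"] = str(len(body))
    return headers, body
```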



[jira] Commented: (COUCHDB-583) storing attachments in compressed form and serving them in compressed form if accepted by the client

Posted by "Paul Joseph Davis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/COUCHDB-583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12799829#action_12799829 ] 

Paul Joseph Davis commented on COUCHDB-583:
-------------------------------------------

Just some quick thoughts reading through the diff:

I'm not a fan of the file containing a list of compressible types. There are so many types that maintaining that configuration will be hard. Exposing an entirely new API endpoint to work with those types is also needlessly complex.

I'd prefer to see an automatic test trying to compress the first 4K or so of an attachment and use a heuristic to determine whether it compressed enough to justify compressing the entire attachment. If that's not doable, the compressible type system should be integrated into the current configuration mechanism.
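The sampling heuristic suggested here might look roughly like this in Python. The 4 KB sample size comes from the comment; the 10% minimum saving and the function name are hypothetical choices.

```python
import os
import zlib

SAMPLE_SIZE = 4096   # "the first 4K or so" of the attachment
MIN_SAVING = 0.10    # hypothetical threshold: require at least a 10% size reduction


def worth_compressing(data, sample_size=SAMPLE_SIZE, min_saving=MIN_SAVING):
    """Guess compressibility by compressing only a prefix of the attachment."""
    sample = data[:sample_size]
    if not sample:
        return False
    saving = 1 - len(zlib.compress(sample, 6)) / len(sample)
    return saving >= min_saving


if __name__ == "__main__":
    print(worth_compressing(b"some repetitive text " * 500))  # text-like data
    print(worth_compressing(os.urandom(10000)))  # stands in for already-compressed data
```

Random bytes are used as a stand-in for formats such as PNG or JPEG, which are already compressed and typically gain little or nothing from gzip.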

For testing from Firefox, it might be best to expose an "attachment is stored in compressed form" attribute in the _attachments member.

Passing around the <<"Y">> and <<"N">> binaries as a flag for an attachment being compressed is un-erlangy. true and false atoms would be better.

Test code does not belong in couch_httpd.erl.

Is there something I'm missing about why we need to leak the couch_util:gzip* functions into couch_httpd_db.erl instead of putting all of that logic into couch_stream.erl?

Is there nothing in mochiweb to handle accept-encoding parsing?

Instead of naming tests test1 through test17 with a comment above each one, just use descriptive test names. Grouping related tests would also make them easier to find.

Data in the etap tests shouldn't be stored inline when it's that big. Create data files and use the test helpers to reference the filenames and read them from disk.

> storing attachments in compressed form and serving them in compressed form if accepted by the client
> ----------------------------------------------------------------------------------------------------
>
>                 Key: COUCHDB-583
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-583
>             Project: CouchDB
>          Issue Type: New Feature
>          Components: Database Core, HTTP Interface
>         Environment: CouchDB trunk
>            Reporter: Filipe Manana
>         Attachments: couchdb-583-trunk-3rd-try.patch, couchdb-583-trunk-4th-try-trunk.patch, couchdb-583-trunk-5th-try.patch, couchdb-583-trunk-6th-try.patch, couchdb-583-trunk-7th-try.patch, couchdb-583-trunk-8th-try.patch, couchdb-583-trunk-9th-try.patch, jira-couchdb-583-1st-try-trunk.patch, jira-couchdb-583-2nd-try-trunk.patch


[jira] Commented: (COUCHDB-583) adding ?compression=(gzip|deflate) optional parameter to the attachment download API

Posted by "Paul Joseph Davis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/COUCHDB-583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12783531#action_12783531 ] 

Paul Joseph Davis commented on COUCHDB-583:
-------------------------------------------

Justin,

In a nutshell, the impedance mismatch is that webmachine wants to have control of the logic, i.e., webmachine wants a callable that it can call at its leisure. When CouchDB writes view output to a socket, it's basically executing a lists:foldl-style callback function. In webmachine terms, this would be akin to a streaming body where webmachine provides a fun that CouchDB could call to write to the client socket.

The relevant code is at [1]. It's not the most centralized, but that's where the fold function that produces the view output is defined.

I haven't quite pieced together how webmachine streams to the socket. I've read the response handling code a few times, and as near as I can tell it's not actually streaming the body over the socket, just consuming an iterator before the charset and encoding functions are applied.

Either way, the bottom line is that to fit neatly into CouchDB code, webmachine would handle a response like {writer, fun Write/1} where the argument to the Write fun was a callable that writes bytes to the socket. The charset and transfer encodings would then be required to be streamable writers. Without that behavior, the nearest thing I can think of would be to do a lot of message passing to invert flow control, which, while possible, seems ungood.

[1] http://github.com/apache/couchdb/blob/trunk/src/couchdb/couch_httpd_view.erl#L390
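The {writer, fun Write/1} shape described above can be illustrated in Python. This is not Erlang and not webmachine's actual API, and all names are hypothetical. A producer receives a write function and pushes bytes into it, fold-style. An encoding is itself a writer that wraps the write function, so compression stays streamable without inverting control flow.

```python
import zlib


def gzip_writer(inner_writer):
    """Wrap a writer so that everything it emits is gzip-encoded on the fly."""
    def writer(write):
        comp = zlib.compressobj(6, zlib.DEFLATED, 16 + zlib.MAX_WBITS)

        def compressed_write(data):
            out = comp.compress(data)
            if out:
                write(out)

        inner_writer(compressed_write)
        write(comp.flush())  # gzip trailer once the producer has finished
    return writer


def view_rows_writer(rows):
    """A fold-style producer: pushes each row into whatever write function it gets."""
    def writer(write):
        write(b'{"rows":[\n')
        for i, row in enumerate(rows):
            write((b",\n" if i else b"") + row)
        write(b"\n]}\n")
    return writer


def serve(writer):
    """Stand-in for the framework: passes a 'socket write' and collects what was sent."""
    sent = []
    writer(sent.append)
    return b"".join(sent)
```

Usage: `serve(gzip_writer(view_rows_writer(rows)))` streams the same rows gzip-encoded, and the producer never knows an encoding is in place.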

> adding ?compression=(gzip|deflate) optional parameter to the attachment download API
> ------------------------------------------------------------------------------------
>
>                 Key: COUCHDB-583
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-583
>             Project: CouchDB
>          Issue Type: New Feature
>          Components: HTTP Interface
>         Environment: CouchDB trunk revision 885240
>            Reporter: Filipe Manana
>         Attachments: jira-couchdb-583-1st-try-trunk.patch, jira-couchdb-583-2nd-try-trunk.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h


[jira] Commented: (COUCHDB-583) storing attachments in compressed form and serving them in compressed form if accepted by the client

Posted by "Filipe Manana (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/COUCHDB-583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12805533#action_12805533 ] 

Filipe Manana commented on COUCHDB-583:
---------------------------------------

Yep,

I should have used 

{ok, Data} = file:read_file(test_util:build_file("test/etap/test_140_file.txt")),

instead of

{ok, Data} = file:read_file("test_140_file.txt"),

A stupid mistake of mine: I always run the tests from within test/etap/.
I can submit yet another patch with that correction (the same goes for loading the test PNG file) when I get back home tonight.
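The same idea in Python: resolve fixture files relative to the test source rather than the current working directory. In the corrected Erlang line, test_util:build_file/1 serves a similar purpose. `fixture_path` and `read_fixture` are hypothetical helpers.

```python
import os


def fixture_path(name, test_file):
    """Locate a data file next to the test module instead of in the current directory."""
    return os.path.join(os.path.dirname(os.path.abspath(test_file)), name)


def read_fixture(name, test_file):
    """Read a fixture as bytes, wherever the test runner was started from."""
    with open(fixture_path(name, test_file), "rb") as f:
        return f.read()
```

Usage: `data = read_fixture("test_140_file.txt", __file__)` works whether the tests are launched from the source root or from the test directory.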

> storing attachments in compressed form and serving them in compressed form if accepted by the client
> ----------------------------------------------------------------------------------------------------
>
>                 Key: COUCHDB-583
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-583
>             Project: CouchDB
>          Issue Type: New Feature
>          Components: Database Core, HTTP Interface
>         Environment: CouchDB trunk
>            Reporter: Filipe Manana
>         Attachments: couchdb-583-trunk-10th-try.patch, couchdb-583-trunk-11th-try.patch, couchdb-583-trunk-12th-try.patch, couchdb-583-trunk-13th-try.patch, couchdb-583-trunk-14th-try-git.patch, couchdb-583-trunk-15th-try-git.patch, couchdb-583-trunk-3rd-try.patch, couchdb-583-trunk-4th-try-trunk.patch, couchdb-583-trunk-5th-try.patch, couchdb-583-trunk-6th-try.patch, couchdb-583-trunk-7th-try.patch, couchdb-583-trunk-8th-try.patch, couchdb-583-trunk-9th-try.patch, jira-couchdb-583-1st-try-trunk.patch, jira-couchdb-583-2nd-try-trunk.patch


[jira] Updated: (COUCHDB-583) storing attachments in compressed form and serving them in compressed form if accepted by the client

Posted by "Filipe Manana (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/COUCHDB-583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Filipe Manana updated COUCHDB-583:
----------------------------------

    Attachment: couchdb-583-trunk-11th-try.patch

@Paul

This time the mime type expressions are listed in the .ini config file :)
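Below is a Python sketch of reading a multi-value .ini entry by splitting the list on commas. The [attachments] section and the compression_level and compressible_types keys are hypothetical stand-ins, and the patch's actual names may differ.

```python
import configparser

# Hypothetical default.ini fragment: section and key names are illustrative only.
SAMPLE_INI = """
[attachments]
compression_level = 8
compressible_types = text/*, application/javascript, application/json
"""


def load_attachment_settings(text):
    """Return (compression_level, [mime patterns]) from an .ini string."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser["attachments"]
    level = section.getint("compression_level")
    raw = section.get("compressible_types", "")
    types = [t.strip() for t in raw.split(",") if t.strip()]
    return level, types
```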

couch_httpd_misc_handlers.erl needed some small adaptations, since the config.js test was failing after support for multi-value .ini parameters was added.

Let me know if this patch is fully ok.

cheers

> storing attachments in compressed form and serving them in compressed form if accepted by the client
> ----------------------------------------------------------------------------------------------------
>
>                 Key: COUCHDB-583
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-583
>             Project: CouchDB
>          Issue Type: New Feature
>          Components: Database Core, HTTP Interface
>         Environment: CouchDB trunk
>            Reporter: Filipe Manana
>         Attachments: couchdb-583-trunk-10th-try.patch, couchdb-583-trunk-11th-try.patch, couchdb-583-trunk-3rd-try.patch, couchdb-583-trunk-4th-try-trunk.patch, couchdb-583-trunk-5th-try.patch, couchdb-583-trunk-6th-try.patch, couchdb-583-trunk-7th-try.patch, couchdb-583-trunk-8th-try.patch, couchdb-583-trunk-9th-try.patch, jira-couchdb-583-1st-try-trunk.patch, jira-couchdb-583-2nd-try-trunk.patch


[jira] Commented: (COUCHDB-583) storing attachments in compressed form and serving them in compressed form if accepted by the client

Posted by "Filipe Manana (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/COUCHDB-583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12799922#action_12799922 ] 

Filipe Manana commented on COUCHDB-583:
---------------------------------------

Hi Paul,

Thanks for your feedback.

"Passing around the <<"Y">> and <<"N">> binaries as a flag for an attachment being compressed is un-erlangy. true and false atoms would be better. "

Well, this was mostly because I read somewhere in Armstrong's book that binaries are preferred (more efficient) for IO operations (network, disk storage). But I agree that true/false atoms are more readable.

"Is there nothing in mochiweb to handle accept-encoding parsing? "

I don't think so, at least not in the mochiweb bundled with Couch. It's probably better to move these Accept-Encoding parsing functions, and their test functions, into the mochiweb sources.

I'll get back to work and enhance the patch following your remarks.

cheers

> storing attachments in compressed form and serving them in compressed form if accepted by the client
> ----------------------------------------------------------------------------------------------------
>
>                 Key: COUCHDB-583
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-583
>             Project: CouchDB
>          Issue Type: New Feature
>          Components: Database Core, HTTP Interface
>         Environment: CouchDB trunk
>            Reporter: Filipe Manana
>         Attachments: couchdb-583-trunk-3rd-try.patch, couchdb-583-trunk-4th-try-trunk.patch, couchdb-583-trunk-5th-try.patch, couchdb-583-trunk-6th-try.patch, couchdb-583-trunk-7th-try.patch, couchdb-583-trunk-8th-try.patch, couchdb-583-trunk-9th-try.patch, jira-couchdb-583-1st-try-trunk.patch, jira-couchdb-583-2nd-try-trunk.patch