Posted to dev@couchdb.apache.org by Garren Smith <ga...@apache.org> on 2016/04/08 14:43:21 UTC

[Proposal] Change etag calculation

Hi All,

I would like to propose a change to how we generate the ETAG for a document
response.

While working on COUCHDB-2978[1] we started looking at how etags are
generated when a document is requested. Currently the etag for a document
is its _rev. This causes a problem for _local documents, as their revision
never changes, and for documents where the user chooses the _rev. Both
cases can cause caching issues.

I would like to propose that the etag is generated from the body of the
response (typically the document), _rev and attachments. So for normal
documents the etag is md5(_rev, body, attachment) and for _local documents
it would be md5(_rev, body). This would make it a lot more consistent and
the etag would change when a document has changed.
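
As a rough illustration of the idea (a sketch only, not the actual patch),
the calculation could look something like the Python below. The function
name compute_etag, the separator byte, the sorted-key JSON serialization and
the "0-1" revision are all assumptions made for the example; nothing here
reflects how CouchDB serializes documents internally.

```python
import hashlib
import json


def compute_etag(rev, body, attachment_digests=None):
    """Hypothetical helper: hash _rev, the body, and attachment digests."""
    md5 = hashlib.md5()
    md5.update(rev.encode("utf-8"))
    md5.update(b"\x00")  # separator so adjacent fields cannot run together
    # Sorted keys and fixed separators make equal bodies hash identically.
    encoded = json.dumps(body, sort_keys=True, separators=(",", ":"))
    md5.update(encoded.encode("utf-8"))
    for digest in sorted(attachment_digests or []):
        md5.update(b"\x00")
        md5.update(digest.encode("utf-8"))
    # HTTP entity tags are quoted strings.
    return '"' + md5.hexdigest() + '"'


# A _local document keeps the same _rev ("0-1" is illustrative) while its
# body changes, so only a body-aware etag can tell the two states apart.
before = compute_etag("0-1", {"source_last_seq": 10})
after = compute_etag("0-1", {"source_last_seq": 25})
```

For _local documents the attachment argument would simply be left out,
which matches the md5(_rev, body) form above.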

Cheers
Garren





[1] https://issues.apache.org/jira/browse/COUCHDB-2978

Re: [Proposal] Change etag calculation

Posted by Robert Newson <rn...@apache.org>.
The etag would be derived from the response body, not doc.body (as the patch shows, it was parts of doc.revs and doc.body). It must include _rev for cache correctness. It doesn't need _id, as caching is per resource, though it wouldn't be wrong to include it either.

Sending correct etags in all cases is not controversial. The question is whether it's worth the effort. Fixing etags for _local docs I consider essential now that user agents with caching are performing replication tasks / reading and writing checkpoints. 
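
To make the _local case concrete, here is a simplified sketch of a
conditional GET, assuming a caching user agent that sends If-None-Match.
The function conditional_get and the etag values are hypothetical and only
model the comparison, not a real HTTP stack.

```python
def conditional_get(current_etag, if_none_match=None):
    """Return (status, body_sent) for a simplified conditional GET."""
    if if_none_match is not None and if_none_match == current_etag:
        return 304, False  # validator matches: client reuses its cached copy
    return 200, True  # validator differs or is absent: send the full body


# The client cached a checkpoint when its etag was the _rev ("0-1" is
# an illustrative value).
cached_etag = '"0-1"'

# The checkpoint body was updated, but a rev-based etag has not moved,
# so the client is told its stale copy is still good.
stale = conditional_get(current_etag='"0-1"', if_none_match=cached_etag)

# With an etag derived from the response body (placeholder value), the
# validator differs and the client refetches.
fresh = conditional_get(current_etag='"9f2c"', if_none_match=cached_etag)
```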

The current etag scheme for regular docs is not broken except in the corner case of a user-supplied _rev. It's reasonable to fix that as long as we don't incur a huge cost, but I'm happy with the new status quo with Garren's patch.

> On 11 Apr 2016, at 17:11, Adam Kocoloski <ko...@apache.org> wrote:
> 
> Cool. I’m a little confused about the MD5 for regular docs discussion. What’s the driving force behind switching away from revisions as ETags. Is it
> 
> 1) Users can break this by setting their own revisions
> 2) Documents with identical bodies but different revisions should be cacheable
> 
> In case #1 a user operating at this level potentially breaks quite a bit more than the caching mechanism, doesn’t she? I’d need to think through the full ramifications but I’d love to see investment in the standardization of the revision generation algorithm as we discussed in a recent thread, and then maybe be a bit more strict around the revision IDs that we accept with new_edits=false.
> 
> Case #2 also feels broken to me, we probably can’t be returning documents from a cache with the wrong revision ID.
> 
> Sorry if I’m missing the crux of the argument here.
> 
> Adam
> 
>> On Apr 11, 2016, at 10:43 AM, Garren Smith <ga...@apache.org> wrote:
>> 
>> Thanks for the feedback. For now I will proceed with getting the _local
>> fixes then. Then we can look at a performant way of doing the md5 for
>> regular docs.
>> 
>> Cheers
>> Garren
>> 
>>> On Sat, Apr 9, 2016 at 10:18 AM, Robert Newson <rn...@apache.org> wrote:
>>> 
>>> The original patch from garren calculated the md5(body) at query time.
>>> This was fine for just local docs since fetching then is rare.
>>> 
>>> I'm +1 on the proposal and agree we need to precalculate the etag for
>>> regular docs.
>>> 
>>>>> On 8 Apr 2016, at 19:01, Alexander Shorin <kx...@gmail.com> wrote:
>>>>> 
>>>>> On Fri, Apr 8, 2016 at 8:55 PM, Mutton, James <jm...@akamai.com>
>>> wrote:
>>>>> Is the proposal to calculate and store the md5 as a meta field, or to
>>> calculate md5(_rev, body) at request time?  Doing this at request time
>>> would be very expensive for heavily loaded servers.
>>>> 
>>>> Good point and concern. This is not a new meta field, just a Etag
>>>> header value. And obliviously, there should be the way to not generate
>>>> Etag value if it eventually the same as the _rev field value (I think
>>>> it's good idea to let them share the same algo).  Technically, this
>>>> could be done by looking on what kind of edit happened: interactive or
>>>> not.
>>>> 
>>>> --
>>>> ,,,^..^,,,
> 


Re: [Proposal] Change etag calculation

Posted by Adam Kocoloski <ko...@apache.org>.
Cool. I’m a little confused about the MD5 for regular docs discussion. What’s the driving force behind switching away from revisions as ETags? Is it:

1) Users can break this by setting their own revisions
2) Documents with identical bodies but different revisions should be cacheable

In case #1 a user operating at this level potentially breaks quite a bit more than the caching mechanism, doesn’t she? I’d need to think through the full ramifications but I’d love to see investment in the standardization of the revision generation algorithm as we discussed in a recent thread, and then maybe be a bit more strict around the revision IDs that we accept with new_edits=false.

Case #2 also feels broken to me; we probably can’t be returning documents from a cache with the wrong revision ID.

Sorry if I’m missing the crux of the argument here.

Adam

> On Apr 11, 2016, at 10:43 AM, Garren Smith <ga...@apache.org> wrote:
> 
> Thanks for the feedback. For now I will proceed with getting the _local
> fixes then. Then we can look at a performant way of doing the md5 for
> regular docs.
> 
> Cheers
> Garren
> 
> On Sat, Apr 9, 2016 at 10:18 AM, Robert Newson <rn...@apache.org> wrote:
> 
>> The original patch from garren calculated the md5(body) at query time.
>> This was fine for just local docs since fetching then is rare.
>> 
>> I'm +1 on the proposal and agree we need to precalculate the etag for
>> regular docs.
>> 
>>> On 8 Apr 2016, at 19:01, Alexander Shorin <kx...@gmail.com> wrote:
>>> 
>>>> On Fri, Apr 8, 2016 at 8:55 PM, Mutton, James <jm...@akamai.com>
>> wrote:
>>>> Is the proposal to calculate and store the md5 as a meta field, or to
>> calculate md5(_rev, body) at request time?  Doing this at request time
>> would be very expensive for heavily loaded servers.
>>> 
>>> Good point and concern. This is not a new meta field, just a Etag
>>> header value. And obliviously, there should be the way to not generate
>>> Etag value if it eventually the same as the _rev field value (I think
>>> it's good idea to let them share the same algo).  Technically, this
>>> could be done by looking on what kind of edit happened: interactive or
>>> not.
>>> 
>>> --
>>> ,,,^..^,,,
>> 
>> 


Re: [Proposal] Change etag calculation

Posted by Garren Smith <ga...@apache.org>.
Thanks for the feedback. For now I will proceed with getting the _local
fixes in. Then we can look at a performant way of doing the md5 for
regular docs.

Cheers
Garren

On Sat, Apr 9, 2016 at 10:18 AM, Robert Newson <rn...@apache.org> wrote:

> The original patch from garren calculated the md5(body) at query time.
> This was fine for just local docs since fetching then is rare.
>
> I'm +1 on the proposal and agree we need to precalculate the etag for
> regular docs.
>
> > On 8 Apr 2016, at 19:01, Alexander Shorin <kx...@gmail.com> wrote:
> >
> >> On Fri, Apr 8, 2016 at 8:55 PM, Mutton, James <jm...@akamai.com>
> wrote:
> >> Is the proposal to calculate and store the md5 as a meta field, or to
> calculate md5(_rev, body) at request time?  Doing this at request time
> would be very expensive for heavily loaded servers.
> >
> > Good point and concern. This is not a new meta field, just a Etag
> > header value. And obliviously, there should be the way to not generate
> > Etag value if it eventually the same as the _rev field value (I think
> > it's good idea to let them share the same algo).  Technically, this
> > could be done by looking on what kind of edit happened: interactive or
> > not.
> >
> > --
> > ,,,^..^,,,
>
>

Re: [Proposal] Change etag calculation

Posted by Robert Newson <rn...@apache.org>.
The original patch from Garren calculated the md5(body) at query time. This was fine for just local docs since fetching them is rare.

I'm +1 on the proposal and agree we need to precalculate the etag for regular docs. 
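
A toy sketch of what precalculating could mean, assuming the digest is
computed once on write and kept next to the document. The DocStore class and
its hash_calls counter are invented for illustration; this is not CouchDB's
storage layer.

```python
import hashlib
import json


class DocStore:
    """Toy in-memory store used only to contrast write-time hashing."""

    def __init__(self):
        self._docs = {}
        self.hash_calls = 0  # counts how often a digest is computed

    def _digest(self, rev, body):
        self.hash_calls += 1
        payload = rev + "\x00" + json.dumps(body, sort_keys=True)
        return hashlib.md5(payload.encode("utf-8")).hexdigest()

    def put(self, doc_id, rev, body):
        # Pay the hashing cost once, when the document is written.
        self._docs[doc_id] = (rev, body, self._digest(rev, body))

    def get(self, doc_id):
        # Reads return the stored digest without rehashing the body.
        rev, body, etag = self._docs[doc_id]
        return body, etag


store = DocStore()
store.put("doc-a", "1-abc", {"x": 1})
for _ in range(3):
    body, etag = store.get("doc-a")
```

The trade-off is extra work and storage on every write in exchange for
cheap reads, which is the concern raised about heavily loaded servers.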

> On 8 Apr 2016, at 19:01, Alexander Shorin <kx...@gmail.com> wrote:
> 
>> On Fri, Apr 8, 2016 at 8:55 PM, Mutton, James <jm...@akamai.com> wrote:
>> Is the proposal to calculate and store the md5 as a meta field, or to calculate md5(_rev, body) at request time?  Doing this at request time would be very expensive for heavily loaded servers.
> 
> Good point and concern. This is not a new meta field, just a Etag
> header value. And obliviously, there should be the way to not generate
> Etag value if it eventually the same as the _rev field value (I think
> it's good idea to let them share the same algo).  Technically, this
> could be done by looking on what kind of edit happened: interactive or
> not.
> 
> --
> ,,,^..^,,,


Re: [Proposal] Change etag calculation

Posted by Robert Newson <rn...@apache.org>.

> On 8 Apr 2016, at 19:01, Alexander Shorin <kx...@gmail.com> wrote:
> 
>> On Fri, Apr 8, 2016 at 8:55 PM, Mutton, James <jm...@akamai.com> wrote:
>> Is the proposal to calculate and store the md5 as a meta field, or to calculate md5(_rev, body) at request time?  Doing this at request time would be very expensive for heavily loaded servers.
> 
> Good point and concern. This is not a new meta field, just a Etag
> header value. And obliviously, there should be the way to not generate
> Etag value if it eventually the same as the _rev field value (I think
> it's good idea to let them share the same algo).  Technically, this
> could be done by looking on what kind of edit happened: interactive or
> not.
> 
> --
> ,,,^..^,,,


Re: [Proposal] Change etag calculation

Posted by Alexander Shorin <kx...@gmail.com>.
On Fri, Apr 8, 2016 at 8:55 PM, Mutton, James <jm...@akamai.com> wrote:
> Is the proposal to calculate and store the md5 as a meta field, or to calculate md5(_rev, body) at request time?  Doing this at request time would be very expensive for heavily loaded servers.

Good point and concern. This is not a new meta field, just an Etag
header value. And obviously, there should be a way to not generate the
Etag value if it is eventually the same as the _rev field value (I think
it's a good idea to let them share the same algo).  Technically, this
could be done by looking at what kind of edit happened: interactive or
not.

--
,,,^..^,,,

Re: [Proposal] Change etag calculation

Posted by "Mutton, James" <jm...@akamai.com>.
Is the proposal to calculate and store the md5 as a meta field, or to calculate md5(_rev, body) at request time?  Doing this at request time would be very expensive for heavily loaded servers.

<\JamesM>

> On Apr 8, 2016, at 7:28 AM, Garren Smith <ga...@apache.org> wrote:
> 
> I think having something like X-Couch-Document-Rev is a good idea. We might
> have to warn in the docs that _local docs that header value will never
> change.
> 
>> On Fri, Apr 8, 2016 at 3:05 PM, Alexander Shorin <kx...@gmail.com> wrote:
>> 
>> +1 if we also add X-Couch-Document-Rev so something that reflects the
>> _rev value in HTTP headers.
>> 
>> Currently, while Etag matches the _rev value, it makes quite handy
>> feature for getting the document revisions without fetching whole the
>> body. To not break this feature, we need to introduce something in
>> place of old Etag behaviour to keep the balance.
>> --
>> ,,,^..^,,,
>> 
>> 
>>> On Fri, Apr 8, 2016 at 3:43 PM, Garren Smith <ga...@apache.org> wrote:
>>> Hi All,
>>> 
>>> I would like to propose a change to how we generate the ETAG for a
>> document
>>> response.
>>> 
>>> While working on COUCHDB-2978[1] we started looking at how etags were
>>> generated when a document is requested. Currently the etag for a document
>>> is the _rev. This causes a problem with _local documents as their
>> revision
>>> never changes also with documents where the user chooses the _rev. Both
>> of
>>> these can cause some caching issues.
>>> 
>>> I would like to propose that the etag is generated from the body of the
>>> response (typically the document), _rev and attachments. So for normal
>>> documents the etag is md5(_rev, body, attachment) and for _local
>> documents
>>> it would be md5(_rev, body). This would make it a lot more consistent and
>>> the etag would change when a document has changed.
>>> 
>>> Cheers
>>> Garren
>>> 
>>> 
>>> 
>>> 
>>> 
>>> [1] https://issues.apache.org/jira/browse/COUCHDB-2978
>> 

Re: [Proposal] Change etag calculation

Posted by Garren Smith <ga...@apache.org>.
I think having something like X-Couch-Document-Rev is a good idea. We might
have to warn in the docs that for _local docs that header value will never
change.

On Fri, Apr 8, 2016 at 3:05 PM, Alexander Shorin <kx...@gmail.com> wrote:

> +1 if we also add X-Couch-Document-Rev so something that reflects the
> _rev value in HTTP headers.
>
> Currently, while Etag matches the _rev value, it makes quite handy
> feature for getting the document revisions without fetching whole the
> body. To not break this feature, we need to introduce something in
> place of old Etag behaviour to keep the balance.
> --
> ,,,^..^,,,
>
>
> On Fri, Apr 8, 2016 at 3:43 PM, Garren Smith <ga...@apache.org> wrote:
> > Hi All,
> >
> > I would like to propose a change to how we generate the ETAG for a
> document
> > response.
> >
> > While working on COUCHDB-2978[1] we started looking at how etags were
> > generated when a document is requested. Currently the etag for a document
> > is the _rev. This causes a problem with _local documents as their
> revision
> > never changes also with documents where the user chooses the _rev. Both
> of
> > these can cause some caching issues.
> >
> > I would like to propose that the etag is generated from the body of the
> > response (typically the document), _rev and attachments. So for normal
> > documents the etag is md5(_rev, body, attachment) and for _local
> documents
> > it would be md5(_rev, body). This would make it a lot more consistent and
> > the etag would change when a document has changed.
> >
> > Cheers
> > Garren
> >
> >
> >
> >
> >
> > [1] https://issues.apache.org/jira/browse/COUCHDB-2978
>

Re: [Proposal] Change etag calculation

Posted by Alexander Shorin <kx...@gmail.com>.
+1 if we also add X-Couch-Document-Rev or something that reflects the
_rev value in HTTP headers.

Currently, since the Etag matches the _rev value, it makes for quite a
handy feature: getting the document revision without fetching the whole
body. To not break this feature, we need to introduce something in
place of the old Etag behaviour to keep the balance.
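
A small sketch of how the two headers could sit side by side, assuming the
header is called X-Couch-Document-Rev as suggested here. Both helper
functions and the sample values are hypothetical; this header is only a
proposal in this thread.

```python
def document_headers(rev, etag):
    """Hypothetical helper building headers for a document GET or HEAD."""
    return {
        "ETag": etag,  # cache validator, no longer assumed to equal _rev
        "X-Couch-Document-Rev": rev,  # header name proposed in this thread
    }


def rev_from_headers(headers):
    # A client could read the revision from a HEAD response, with no body.
    return headers.get("X-Couch-Document-Rev")


# Illustrative values only.
headers = document_headers("2-abc", '"5d41402abc4b2a76b9719d911017c592"')
rev = rev_from_headers(headers)
```
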
--
,,,^..^,,,


On Fri, Apr 8, 2016 at 3:43 PM, Garren Smith <ga...@apache.org> wrote:
> Hi All,
>
> I would like to propose a change to how we generate the ETAG for a document
> response.
>
> While working on COUCHDB-2978[1] we started looking at how etags were
> generated when a document is requested. Currently the etag for a document
> is the _rev. This causes a problem with _local documents as their revision
> never changes also with documents where the user chooses the _rev. Both of
> these can cause some caching issues.
>
> I would like to propose that the etag is generated from the body of the
> response (typically the document), _rev and attachments. So for normal
> documents the etag is md5(_rev, body, attachment) and for _local documents
> it would be md5(_rev, body). This would make it a lot more consistent and
> the etag would change when a document has changed.
>
> Cheers
> Garren
>
>
>
>
>
> [1] https://issues.apache.org/jira/browse/COUCHDB-2978