Posted to dev@couchdb.apache.org by Randall Leeds <ra...@gmail.com> on 2010/11/13 21:07:08 UTC

Re: Couch and Varnish

Moving over to dev@

On Sat, Nov 13, 2010 at 10:45, Robert Newson <ro...@gmail.com> wrote:
> "In any case, when we're in "every 1 request to cache means 1 request
> to database" situation, "caching" is truly pointless.
>
> Not true. Consider attachments or view query results: checking that
> the cached result is still fresh is faster than redoing the work (or
> copying the attachment again). It's only (almost) pointless when
> fetching documents themselves.
>
> What improvement could be made here? It seems wrong to return a cached
> copy of a document without checking that it is fresh, and my reading
> of RFC 2616 says we mustn't.
>
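> Concretely, the freshness check is just an HTTP conditional GET. A
> minimal sketch in Python (assumes the requests library and a local
> CouchDB at 127.0.0.1:5984 with a hypothetical database "db" and
> document "doc1"):
>
>     import requests
>
>     url = "http://127.0.0.1:5984/db/doc1"
>
>     # First fetch: full response. For documents the ETag is the quoted rev.
>     r1 = requests.get(url)
>     etag = r1.headers["ETag"]
>
>     # Revalidation: a conditional GET. If the doc is unchanged, Couch
>     # answers 304 Not Modified with an empty body -- far cheaper to ship
>     # than a large attachment or a recomputed view result.
>     r2 = requests.get(url, headers={"If-None-Match": etag})
>     assert r2.status_code == 304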

I think it shouldn't be too hard for us to check the ETag without
reading the document body. The ETag for documents is just the rev hash,
right? With the btree caching code Filipe committed, this could avoid
file operations entirely for hot documents.
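Roughly what I have in mind, sketched as Python pseudocode (every
helper name here is hypothetical; the real change would be Erlang, in
couch_httpd_db.erl or thereabouts):

    def respond_doc(req, db, docid):
        rev = current_rev(db, docid)      # hypothetical: rev lookup via the
                                          # btree only; the body is never read
        etag = '"%s"' % rev               # a doc's ETag is just its quoted rev
        if req.headers.get("If-None-Match") == etag:
            return 304, {"ETag": etag}, b""  # still fresh: no body work at all
        if req.method == "HEAD":
            return 200, {"ETag": etag}, b""  # headers only, body still unread
        return 200, {"ETag": etag}, read_doc(db, docid)  # hypothetical full read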

I opened a ticket at https://issues.apache.org/jira/browse/COUCHDB-941

It'll mostly be a matter of figuring out where we've broken things
once we stop setting method=GET on the #httpd record
(couch_httpd.erl). I suppose I should add some test cases that verify
HEAD requests are working and then play whack-a-mole.
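Each check would have roughly this shape (Python only to illustrate;
the real tests should be JS, and the database, document, and view names
here are hypothetical):

    import requests

    base = "http://127.0.0.1:5984"
    paths = ["/db/doc1",                  # a document
             "/db/_design/d/_view/v"]     # a view

    for path in paths:
        g = requests.get(base + path)
        h = requests.head(base + path)
        # RFC 2616 9.4: HEAD must mirror GET's status and headers
        # (including the ETag) while carrying no body at all.
        assert h.status_code == g.status_code
        assert h.headers.get("ETag") == g.headers.get("ETag")
        assert h.content == b""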

I'll read some code right now and see if there's anything deeper that
needs to change, but I suspect not.

Karel: if you have any desire to write some js tests that exercise
HEAD requests for all applicable API calls that'd help a lot. Just
attach them to that ticket.

-Randall


> Sent from my iPad
>
> On 13 Nov 2010, at 16:36, "Karel Minařík" <ka...@gmail.com> wrote:
>
>> Hi,
>>
>> I am ashamed to reply so late; sorry, I got lost in other stuff on Monday. I'll combine my replies:
>>
>> On Mon, Nov 8, 2010 at 08:17, Zachary Zolton <za...@gmail.com> wrote:
>>>>>>> Of course, you'd be stuck with manually tracking the types of URLs to
>>>>>>> be purged, so I haven't been too eager to try it out yet...
>>
>> Yes, that's precisely what I'd like to avoid. It's not _that_ hard, of course, and Couch provides an awesome entry point for invalidation in _changes or update_notifier, but still...
>>
>> On 9 Nov 2010, at 24:42, Robert Newson wrote:
>>> I think it's clear that caching via ETag for documents is close to
>>> pointless (the work to find the doc in the b+tree is over 90% of the
>>> work and has to be done for GET or HEAD).
>>
>> Yes. I wonder if there's any room for improvement on Couch's part. In any case, when we're in an "every request to the cache means a request to the database" situation, "caching" is truly pointless.
>>
>> On Mon, Nov 8, 2010 at 11:11 PM, Zachary Zolton <za...@gmail.com> wrote:
>>>> That makes sense: if every request to the caching proxy checks the
>>>> ETag against CouchDB via a HEAD request -- and CouchDB currently does
>>>> just as much work for a HEAD as it would for a GET -- you're not going
>>>> to see an improvement.
>>
>> Yes. But that's not the only scenario imaginable. I'd repeat what I wrote to the Varnish mailing list [http://lists.varnish-cache.org/pipermail/varnish-misc/2010-November/004993.html]:
>> 1. The cache can "accumulate" requests to a certain resource for a certain (configurable?) period of time (1 second, 1 minute, ...) and ask the backend less often -- accelerating throughput.
>> 2. The cache can return "possibly stale" content immediately and check with the backend afterwards (in the background, when the n-th next request comes, ...) -- accelerating response time.
>> It was my impression that at least the first option is doable with Varnish (by playing with the grace period), but I may be severely mistaken.
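>> For option 1, this is roughly what I meant by playing with the grace
>> period (Varnish 2.x VCL, untested, so a sketch only): the short TTL
>> caps backend traffic at about one request per second per URL, and
>> grace lets other clients be served the stale copy while one request
>> refreshes it:
>>
>>     sub vcl_recv {
>>         set req.grace = 30s;
>>     }
>>     sub vcl_fetch {
>>         set beresp.ttl   = 1s;    # ask CouchDB at most ~once per second
>>         set beresp.grace = 30s;   # meanwhile serve stale for up to 30s
>>     }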
>>
>> On Mon, Nov 8, 2010 at 5:04 PM, Randall Leeds <ra...@gmail.com> wrote:
>>>>> If you have a custom caching policy whereby
>>>>> the proxy will only check the ETag against the authority (Couch) once
>>>>> per (hour, day, whatever) then you'll get a speedup. But if your proxy
>>>>> performs a HEAD request for every incoming request you will not see
>>>>> much performance gain.
>>
>> P-r-e-c-i-s-e-ly. If we can tune Varnish or Squid not to be so "dumb" and to check with the backend according to some policy like this, we could use them for proper self-invalidating caching. (As opposed to TTL-based caching, which brings back the manual expiration issues discussed above.) Unfortunately, at least based on the answers I got, this just doesn't seem to be possible.
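>> In plain RFC 2616 terms, the policy from the quote above would look
>> like this (the rev value is made up). Note that Couch itself sends
>> "Cache-Control: must-revalidate" as far as I can tell, so the max-age
>> would have to be added at the proxy:
>>
>>     HTTP/1.1 200 OK
>>     ETag: "1-967a00dff5e02add41819138abb3284d"
>>     Cache-Control: max-age=3600
>>
>>     ... the proxy answers from cache for up to an hour, then ...
>>
>>     GET /db/doc1 HTTP/1.1
>>     If-None-Match: "1-967a00dff5e02add41819138abb3284d"
>>
>>     HTTP/1.1 304 Not Modified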
>>
>> On Mon, Nov 8, 2010 at 12:06, Randall Leeds <ra...@gmail.com> wrote:
>>>>>> It'd be nice if the "Couch is HTTP and can leverage existing caches and tools"
>>>>>> talking point truly included significant gains from etag caching.
>>
>> P-R-E-C-I-S-E-L-Y. This is, for me, the most important and most embarrassing issue in this discussion. The O'Reilly book has it all over the place: http://www.google.com/search?q=varnish+OR+squid+site:http://guide.couchdb.org. Whenever you tell someone who really knows HTTP caches "Dude, Couch is HTTP and can leverage existing caches and tools" you can and will be laughed at -- you can get away with mentioning expiration-based caching and "simple" invalidation via _changes and such, but... Still embarrassing.
>>
>> I'll try to do more research in this area when time permits. I don't believe there _isn't_ some arcane Varnish config option to squeeze out more performance, e.g. in the "highly concurrent requests" scenario.
>>
>> Thanks for all the replies!
>>
>> Karel
>>
>