Posted to user@couchdb.apache.org by Karel Minařík <ka...@karmi.cz> on 2010/11/07 11:28:49 UTC

Couch and Varnish

Hello,

I'd like to ask if anyone has some experience to share regarding
accelerating Couch with Varnish. I think lots of us are doing it, but
I can't find much info around.

Originally, I thought it would be possible to use ETags with some
proper Varnish configuration (e.g. "accumulate" concurrent requests
and pass only one to the backend, etc.), but that seems not to be
possible, since Varnish does not pass ETags to the backend
[http://lists.varnish-cache.org/pipermail/varnish-misc/2010-November/004997.html].

As I understand it now, the only way to cache Couch's responses would
be with time-based caching, either using the cached response until it
auto-expires or expiring it via PURGE commands.

Of course, it would be possible and technically trivial to send purge
requests via the _changes feed or via the "update_notification"
mechanism. As I see it, the tricky part would be knowing which objects
to purge based on individual document changes, because not only single
documents but also aggregated view results or fulltext queries would
get cached. Of course, "there are two hard things in computer
science ...".
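
For illustration, the Varnish side of such purging could look roughly
like this -- an untested sketch, assuming Varnish 2.x VCL syntax (the
ACL contents are made up):

    acl purge {
        "127.0.0.1";  # whatever watches _changes and sends PURGEs
    }

    sub vcl_recv {
        if (req.request == "PURGE") {
            if (!client.ip ~ purge) {
                error 405 "Not allowed.";
            }
            # fall through to the cache lookup; vcl_hit/vcl_miss decide
            lookup;
        }
    }

    sub vcl_hit {
        if (req.request == "PURGE") {
            set obj.ttl = 0s;  # expire the cached object immediately
            error 200 "Purged.";
        }
    }

    sub vcl_miss {
        if (req.request == "PURGE") {
            error 404 "Not in cache.";
        }
    }

The purging client would then simply send "PURGE /db/docid HTTP/1.1"
to Varnish for each affected URL, driven by _changes or
update_notification.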

Has anyone put any thoughts/work into this?

Thanks,

Karel


Re: Couch and Varnish

Posted by Randall Leeds <ra...@gmail.com>.
Moving over to dev@

On Sat, Nov 13, 2010 at 10:45, Robert Newson <ro...@gmail.com> wrote:
> "In any case, when we're in "every 1 request to cache means 1 request
> to database" situation, "caching" is truly pointless.
>
> Not true. Consider attachments or view query results: checking that
> the cached result is still fresh is faster than redoing the work (or
> copying the attachment again). It's only (almost) pointless when
> fetching documents themselves.
>
> What improvement could be made here? It seems wrong to return a cached
> copy of a document without checking that it is fresh, and my reading of
> RFC 2616 says we mustn't.
>

I think it shouldn't be too hard for us to check the ETag without
reading the document body. The ETag for documents is just the rev
hash, right? With the btree caching code Filipe committed, this would
stay entirely out of the way of file operations for hot documents.
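
In other words, a HEAD like the following should ideally be answered
from the id btree alone, without ever touching the doc body on disk
(illustrative rev value):

    HEAD /db/doc1 HTTP/1.1
    Host: localhost:5984

    HTTP/1.1 200 OK
    ETag: "1-917fa2381192822767f010b95b45325b"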

I opened a ticket at https://issues.apache.org/jira/browse/COUCHDB-941

It'll mostly be a matter of figuring out where we've broken things
once we stop setting method=GET on the #httpd record
(couch_httpd.erl). I suppose I should add some test cases that verify
HEAD requests are working and then play whack-a-mole.

I'll read some code right now and see if there's anything deeper that
needs to change, but I suspect not.

Karel: if you have any desire to write some js tests that exercise
HEAD requests for all applicable API calls that'd help a lot. Just
attach them to that ticket.
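
Something along these lines, as a rough sketch in the style of the
existing tests in share/www/script/test (the test name and doc
contents are made up):

    // sketch: HEAD should return the same ETag as GET, with no body
    couchTests.head_requests = function(debug) {
      var db = new CouchDB("test_suite_db", {"X-Couch-Full-Commit":"false"});
      db.deleteDb();
      db.createDb();
      if (debug) debugger;

      T(db.save({_id:"doc1", value:1}).ok);

      var get = CouchDB.request("GET", "/test_suite_db/doc1");
      var head = CouchDB.request("HEAD", "/test_suite_db/doc1");
      T(head.status == 200);
      T(head.getResponseHeader("ETag") == get.getResponseHeader("ETag"));
      T(head.responseText == "");
    };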

-Randall


> Sent from my iPad
>
> On 13 Nov 2010, at 16:36, "Karel Minařík" <ka...@gmail.com> wrote:
>
>> Hi,
>>
>> I am ashamed to reply so late, sorry, I got lost in other stuff on Monday. I'll combine my replies:
>>
>> On Mon, Nov 8, 2010 at 08:17, Zachary Zolton <za...@gmail.com> wrote:
>>>>>>> Of course, you'd be stuck with manually tracking the types of URLs to
>>>>>>> purge, so I haven't been too eager to try it out yet...
>>
>> Yes, that's precisely what I'd like to avoid. It's not _that_ hard of course, and Couch provides an awesome entry point for the invalidation in _changes or update_notifier, but still...
>>
>> On 9.Nov, 2010, at 24:42, Robert Newson wrote:
>>> I think it's clear that caching via ETag for documents is close to
>>> pointless (the work to find the doc in the b+tree is over 90% of the
>>> work and has to be done for GET or HEAD).
>>
>> Yes. I wonder if there's any room for improvement on Couch's part. In any case, when we're in an "every 1 request to cache means 1 request to database" situation, "caching" is truly pointless.
>>
>> On Mon, Nov 8, 2010 at 11:11 PM, Zachary Zolton <za...@gmail.com> wrote:
>>>> That makes sense: if every request to the caching proxy checks the
>>>> etag against CouchDB via a HEAD request—and CouchDB currently does
>>>> just as much work for a HEAD as it would for a GET—you're not going to
>>>> see an improvement.
>>
>> Yes. But that's not the only scenario imaginable. I'll repeat what I wrote to the Varnish mailing list [http://lists.varnish-cache.org/pipermail/varnish-misc/2010-November/004993.html]:
>> 1. The cache can "accumulate" requests to a certain resource for a certain (configurable?) period of time (1 second, 1 minute, ...) and ask the backend less often -- accelerating throughput.
>> 2. The cache can return "possibly stale" content immediately and check with the backend afterwards (in the background, when the n-th next request comes, ...) -- accelerating response time.
>> It was my impression that at least the first option is doable with Varnish (via some playing with the grace period), but I may be severely mistaken.
>>
>> On Mon, Nov 8, 2010 at 5:04 PM, Randall Leeds <ra...@gmail.com> wrote:
>>>>> If you have a custom caching policy whereby
>>>>> the proxy will only check the ETag against the authority (Couch) once
>>>>> per (hour, day, whatever) then you'll get a speedup. But if your proxy
>>>>> performs a HEAD request for every incoming request you will not see
>>>>> much performance gain.
>>
>> P-r-e-c-i-s-e-ly. If we can tune Varnish or Squid not to be so "dumb" and to check with the backend based on some configuration like this, we could use it for proper self-invalidating caching. (As opposed to TTL-based caching, which brings the manual expiration issues discussed above.) Unfortunately, at least based on the answers I got, this just does not seem to be possible.
>>
>> On Mon, Nov 8, 2010 at 12:06, Randall Leeds <ra...@gmail.com> wrote:
>>>>>> It'd be nice if the "Couch is HTTP and can leverage existing caches and tools"
>>>>>> talking point truly included significant gains from etag caching.
>>
>> P-R-E-C-I-S-E-L-Y. This is, for me, the most important and embarrassing issue in this discussion. The O'Reilly book has it all over the place: http://www.google.com/search?q=varnish+OR+squid+site:http://guide.couchdb.org. Whenever you tell someone who really knows HTTP caches "Dude, Couch is HTTP and can leverage existing caches and tools", you can and will be laughed at -- you can get away with mentioning expiration-based caching and "simple" invalidation via _changes and such, but... Still embarrassing.
>>
>> I'll try to do more research in this area when time permits. I can't believe there isn't some arcane Varnish config option to squeeze out some performance, e.g. in the "highly concurrent requests" scenario.
>>
>> Thanks for all the replies!
>>
>> Karel
>>
>

Re: Couch and Varnish

Posted by Robert Newson <ro...@gmail.com>.
"In any case, when we're in "every 1 request to cache means 1 request
to database" situation, "caching" is truly pointless.

Not true. Consider attachments or view query results: checking that
the cached result is still fresh is faster than redoing the work (or
copying the attachment again). It's only (almost) pointless when
fetching documents themselves.

What improvement could be made here? It seems wrong to return a cached
copy of a document without checking that it is fresh, and my reading of
RFC 2616 says we mustn't.

Sent from my iPad

On 13 Nov 2010, at 16:36, "Karel Minařík" <ka...@gmail.com> wrote:

> Hi,
>
> I am ashamed to reply so late, sorry, I got lost in other stuff on Monday. I'll combine my replies:
>
> On Mon, Nov 8, 2010 at 08:17, Zachary Zolton <za...@gmail.com> wrote:
>>>>>> Of course, you'd be stuck with manually tracking the types of URLs to
>>>>>> purge, so I haven't been too eager to try it out yet...
>
> Yes, that's precisely what I'd like to avoid. It's not _that_ hard of course, and Couch provides an awesome entry point for the invalidation in _changes or update_notifier, but still...
>
> On 9.Nov, 2010, at 24:42, Robert Newson wrote:
>> I think it's clear that caching via ETag for documents is close to
>> pointless (the work to find the doc in the b+tree is over 90% of the
>> work and has to be done for GET or HEAD).
>
> Yes. I wonder if there's any room for improvement on Couch's part. In any case, when we're in an "every 1 request to cache means 1 request to database" situation, "caching" is truly pointless.
>
> On Mon, Nov 8, 2010 at 11:11 PM, Zachary Zolton <za...@gmail.com> wrote:
>>> That makes sense: if every request to the caching proxy checks the
>>> etag against CouchDB via a HEAD request—and CouchDB currently does
>>> just as much work for a HEAD as it would for a GET—you're not going to
>>> see an improvement.
>
> Yes. But that's not the only scenario imaginable. I'll repeat what I wrote to the Varnish mailing list [http://lists.varnish-cache.org/pipermail/varnish-misc/2010-November/004993.html]:
> 1. The cache can "accumulate" requests to a certain resource for a certain (configurable?) period of time (1 second, 1 minute, ...) and ask the backend less often -- accelerating throughput.
> 2. The cache can return "possibly stale" content immediately and check with the backend afterwards (in the background, when the n-th next request comes, ...) -- accelerating response time.
> It was my impression that at least the first option is doable with Varnish (via some playing with the grace period), but I may be severely mistaken.
>
> On Mon, Nov 8, 2010 at 5:04 PM, Randall Leeds <ra...@gmail.com> wrote:
>>>> If you have a custom caching policy whereby
>>>> the proxy will only check the ETag against the authority (Couch) once
>>>> per (hour, day, whatever) then you'll get a speedup. But if your proxy
>>>> performs a HEAD request for every incoming request you will not see
>>>> much performance gain.
>
> P-r-e-c-i-s-e-ly. If we can tune Varnish or Squid not to be so "dumb" and to check with the backend based on some configuration like this, we could use it for proper self-invalidating caching. (As opposed to TTL-based caching, which brings the manual expiration issues discussed above.) Unfortunately, at least based on the answers I got, this just does not seem to be possible.
>
> On Mon, Nov 8, 2010 at 12:06, Randall Leeds <ra...@gmail.com> wrote:
>>>>> It'd be nice if the "Couch is HTTP and can leverage existing caches and tools"
>>>>> talking point truly included significant gains from etag caching.
>
> P-R-E-C-I-S-E-L-Y. This is, for me, the most important and embarrassing issue in this discussion. The O'Reilly book has it all over the place: http://www.google.com/search?q=varnish+OR+squid+site:http://guide.couchdb.org. Whenever you tell someone who really knows HTTP caches "Dude, Couch is HTTP and can leverage existing caches and tools", you can and will be laughed at -- you can get away with mentioning expiration-based caching and "simple" invalidation via _changes and such, but... Still embarrassing.
>
> I'll try to do more research in this area when time permits. I can't believe there isn't some arcane Varnish config option to squeeze out some performance, e.g. in the "highly concurrent requests" scenario.
>
> Thanks for all the replies!
>
> Karel
>

Re: Couch and Varnish

Posted by Karel Minařík <ka...@gmail.com>.
Hi,

I am ashamed to reply so late, sorry, I got lost in other stuff on  
Monday. I'll combine my replies:

On Mon, Nov 8, 2010 at 08:17, Zachary Zolton  
<za...@gmail.com> wrote:
>>>>> Of course, you'd be stuck with manually tracking the types of
>>>>> URLs to purge, so I haven't been too eager to try it out yet...

Yes, that's precisely what I'd like to avoid. It's not _that_ hard of
course, and Couch provides an awesome entry point for the invalidation
in _changes or update_notifier, but still...

On 9.Nov, 2010, at 24:42, Robert Newson wrote:
> I think it's clear that caching via ETag for documents is close to
> pointless (the work to find the doc in the b+tree is over 90% of the
> work and has to be done for GET or HEAD).

Yes. I wonder if there's any room for improvement on Couch's part. In
any case, when we're in an "every 1 request to cache means 1 request
to database" situation, "caching" is truly pointless.

On Mon, Nov 8, 2010 at 11:11 PM, Zachary Zolton <zachary.zolton@gmail.com> wrote:
>> That makes sense: if every request to the caching proxy checks the
>> etag against CouchDB via a HEAD request—and CouchDB currently does
>> just as much work for a HEAD as it would for a GET—you're not going  
>> to
>> see an improvement.

Yes. But that's not the only scenario imaginable. I'll repeat what I
wrote to the Varnish mailing list
[http://lists.varnish-cache.org/pipermail/varnish-misc/2010-November/004993.html]:
1. The cache can "accumulate" requests to a certain resource for a
certain (configurable?) period of time (1 second, 1 minute, ...) and
ask the backend less often -- accelerating throughput.
2. The cache can return "possibly stale" content immediately and check
with the backend afterwards (in the background, when the n-th next
request comes, ...) -- accelerating response time.
It was my impression that at least the first option is doable with
Varnish (via some playing with the grace period), but I may be
severely mistaken.
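
For the grace idea, something like this is what I have in mind --
untested, and assuming Varnish 2.1 VCL syntax (in 2.0 it would be
obj.grace in vcl_fetch instead of beresp.grace):

    sub vcl_recv {
        # allow serving objects up to 30s past their TTL
        set req.grace = 30s;
    }

    sub vcl_fetch {
        # keep expired objects around so they can be served in grace
        set beresp.grace = 30s;
    }

And, as far as I know, Varnish already collapses concurrent misses for
the same URL into a single backend request, which covers at least part
of option 1.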

On Mon, Nov 8, 2010 at 5:04 PM, Randall Leeds  
<ra...@gmail.com> wrote:
>>> If you have a custom caching policy whereby
>>> the proxy will only check the ETag against the authority (Couch)  
>>> once
>>> per (hour, day, whatever) then you'll get a speedup. But if your  
>>> proxy
>>> performs a HEAD request for every incoming request you will not see
>>> much performance gain.

P-r-e-c-i-s-e-ly. If we can tune Varnish or Squid not to be so "dumb"
and to check with the backend based on some configuration like this,
we could use it for proper self-invalidating caching. (As opposed to
TTL-based caching, which brings the manual expiration issues discussed
above.) Unfortunately, at least based on the answers I got, this just
does not seem to be possible.

On Mon, Nov 8, 2010 at 12:06, Randall Leeds <ra...@gmail.com> wrote:
>>>> It'd be nice if the "Couch is HTTP and can leverage existing  
>>>> caches and tools"
>>>> talking point truly included significant gains from etag caching.

P-R-E-C-I-S-E-L-Y. This is, for me, the most important and
embarrassing issue in this discussion. The O'Reilly book has it all
over the place:
http://www.google.com/search?q=varnish+OR+squid+site:http://guide.couchdb.org.
Whenever you tell someone who really knows HTTP caches "Dude, Couch is
HTTP and can leverage existing caches and tools", you can and will be
laughed at -- you can get away with mentioning expiration-based
caching and "simple" invalidation via _changes and such, but... Still
embarrassing.

I'll try to do more research in this area when time permits. I can't
believe there isn't some arcane Varnish config option to squeeze out
some performance, e.g. in the "highly concurrent requests" scenario.

Thanks for all the replies!

Karel


Re: Couch and Varnish

Posted by Robert Newson <ro...@gmail.com>.
I think it's clear that caching via ETag for documents is close to
pointless (the work to find the doc in the b+tree is over 90% of the
work and has to be done for GET or HEAD).

Where there should be a boost is caching of attachments, since couch
doesn't have to fetch one byte of the actual binary in the case of an
ETag match, and caching of view query results, as couch knows the view
on disk hasn't changed, and so can return a 304 rather than recompute
the result (which is where stale=ok becomes your friend).
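
E.g., a conditional GET (the ETag value is illustrative):

    GET /db/_design/app/_view/recent?stale=ok HTTP/1.1
    If-None-Match: "8IAXDCYAAPPRX7Z5N1PJ0BW4F"

    HTTP/1.1 304 Not Modified
    ETag: "8IAXDCYAAPPRX7Z5N1PJ0BW4F"

No view rows recomputed and no attachment bytes copied -- just the 304.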

B.

On Mon, Nov 8, 2010 at 11:11 PM, Zachary Zolton
<za...@gmail.com> wrote:
> That makes sense: if every request to the caching proxy checks the
> etag against CouchDB via a HEAD request—and CouchDB currently does
> just as much work for a HEAD as it would for a GET—you're not going to
> see an improvement.
>
> On Mon, Nov 8, 2010 at 5:04 PM, Randall Leeds <ra...@gmail.com> wrote:
>> I should be more clear. If you have a custom caching policy whereby
>> the proxy will only check the ETag against the authority (Couch) once
>> per (hour, day, whatever) then you'll get a speedup. But if your proxy
>> performs a HEAD request for every incoming request you will not see
>> much performance gain.
>>
>> On Mon, Nov 8, 2010 at 12:06, Randall Leeds <ra...@gmail.com> wrote:
>>> As I mentioned on another thread, etags only save you bandwidth, as
>>> right now Couch performs the GET request and then discards the body.
>>> I'll open a JIRA ticket for this if it's not there already. It'd be
>>> nice if the "Couch is HTTP and can leverage existing caches and tools"
>>> talking point truly included significant gains from etag caching.
>>>
>>> On Mon, Nov 8, 2010 at 08:17, Zachary Zolton <za...@gmail.com> wrote:
>>>> Drat! If only Varnish supported Etags...
>>>>
>>>> If you don't wanna use time-based expiry, you could probably craft a
>>>> custom-built solution where you watch the _changes feed and explicitly
>>>> purge URLs using a tool such as Thinner:
>>>>
>>>> http://propublica.github.com/thinner/
>>>>
>>>> Of course, you'd be stuck with manually tracking the types of URLs to
>>>> purge, so I haven't been too eager to try it out yet...
>>>>
>>>> —Zach
>>>>
>>>> On Sun, Nov 7, 2010 at 1:22 PM, Adam Kocoloski <ko...@apache.org> wrote:
>>>>> Hi Karel, the last time I looked into this I came to the same conclusions as you have here.  Regards,
>>>>>
>>>>> Adam
>>>>>
>>>>> On Nov 7, 2010, at 5:28 AM, Karel Minařík wrote:
>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> I'd like to ask if anyone has some experience to share regarding accelerating Couch with Varnish. I think lots of us are doing it, but I can't find much info around.
>>>>>>
>>>>>> Originally, I thought it would be possible to use ETags with some proper Varnish configuration (e.g. "accumulate" concurrent requests and pass only one to the backend, etc.), but that seems not to be possible, since Varnish does not pass ETags to the backend [http://lists.varnish-cache.org/pipermail/varnish-misc/2010-November/004997.html].
>>>>>>
>>>>>> As I understand it now, the only way to cache Couch's responses would be with time-based caching, either using the cached response until it auto-expires or expiring it via PURGE commands.
>>>>>>
>>>>>> Of course, it would be possible and technically trivial to send purge requests via the _changes feed or via the "update_notification" mechanism. As I see it, the tricky part would be knowing which objects to purge based on individual document changes, because not only single documents but also aggregated view results or fulltext queries would get cached. Of course, "there are two hard things in computer science ...".
>>>>>>
>>>>>> Has anyone put any thoughts/work into this?
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Karel
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Couch and Varnish

Posted by Zachary Zolton <za...@gmail.com>.
That makes sense: if every request to the caching proxy checks the
etag against CouchDB via a HEAD request—and CouchDB currently does
just as much work for a HEAD as it would for a GET—you're not going to
see an improvement.

On Mon, Nov 8, 2010 at 5:04 PM, Randall Leeds <ra...@gmail.com> wrote:
> I should be more clear. If you have a custom caching policy whereby
> the proxy will only check the ETag against the authority (Couch) once
> per (hour, day, whatever) then you'll get a speedup. But if your proxy
> performs a HEAD request for every incoming request you will not see
> much performance gain.
>
> On Mon, Nov 8, 2010 at 12:06, Randall Leeds <ra...@gmail.com> wrote:
>> As I mentioned on another thread, etags only save you bandwidth, as
>> right now Couch performs the GET request and then discards the body.
>> I'll open a JIRA ticket for this if it's not there already. It'd be
>> nice if the "Couch is HTTP and can leverage existing caches and tools"
>> talking point truly included significant gains from etag caching.
>>
>> On Mon, Nov 8, 2010 at 08:17, Zachary Zolton <za...@gmail.com> wrote:
>>> Drat! If only Varnish supported Etags...
>>>
>>> If you don't wanna use time-based expiry, you could probably craft a
>>> custom-built solution where you watch the _changes feed and explicitly
>>> purge URLs using a tool such as Thinner:
>>>
>>> http://propublica.github.com/thinner/
>>>
>>> Of course, you'd be stuck with manually tracking the types of URLs to
>>> purge, so I haven't been too eager to try it out yet...
>>>
>>> —Zach
>>>
>>> On Sun, Nov 7, 2010 at 1:22 PM, Adam Kocoloski <ko...@apache.org> wrote:
>>>> Hi Karel, the last time I looked into this I came to the same conclusions as you have here.  Regards,
>>>>
>>>> Adam
>>>>
>>>> On Nov 7, 2010, at 5:28 AM, Karel Minařík wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> I'd like to ask if anyone has some experience to share regarding accelerating Couch with Varnish. I think lots of us are doing it, but I can't find much info around.
>>>>>
>>>>> Originally, I thought it would be possible to use ETags with some proper Varnish configuration (e.g. "accumulate" concurrent requests and pass only one to the backend, etc.), but that seems not to be possible, since Varnish does not pass ETags to the backend [http://lists.varnish-cache.org/pipermail/varnish-misc/2010-November/004997.html].
>>>>>
>>>>> As I understand it now, the only way to cache Couch's responses would be with time-based caching, either using the cached response until it auto-expires or expiring it via PURGE commands.
>>>>>
>>>>> Of course, it would be possible and technically trivial to send purge requests via the _changes feed or via the "update_notification" mechanism. As I see it, the tricky part would be knowing which objects to purge based on individual document changes, because not only single documents but also aggregated view results or fulltext queries would get cached. Of course, "there are two hard things in computer science ...".
>>>>>
>>>>> Has anyone put any thoughts/work into this?
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Karel
>>>>>
>>>>
>>>>
>>>
>>
>

Re: Couch and Varnish

Posted by Randall Leeds <ra...@gmail.com>.
I should be more clear. If you have a custom caching policy whereby
the proxy will only check the ETag against the authority (Couch) once
per (hour, day, whatever) then you'll get a speedup. But if your proxy
performs a HEAD request for every incoming request you will not see
much performance gain.

On Mon, Nov 8, 2010 at 12:06, Randall Leeds <ra...@gmail.com> wrote:
> As I mentioned on another thread, etags only save you bandwidth, as
> right now Couch performs the GET request and then discards the body.
> I'll open a JIRA ticket for this if it's not there already. It'd be
> nice if the "Couch is HTTP and can leverage existing caches and tools"
> talking point truly included significant gains from etag caching.
>
> On Mon, Nov 8, 2010 at 08:17, Zachary Zolton <za...@gmail.com> wrote:
>> Drat! If only Varnish supported Etags...
>>
>> If you don't wanna use time-based expiry, you could probably craft a
>> custom-built solution where you watch the _changes feed and explicitly
>> purge URLs using a tool such as Thinner:
>>
>> http://propublica.github.com/thinner/
>>
>> Of course, you'd be stuck with manually tracking the types of URLs to
>> purge, so I haven't been too eager to try it out yet...
>>
>> —Zach
>>
>> On Sun, Nov 7, 2010 at 1:22 PM, Adam Kocoloski <ko...@apache.org> wrote:
>>> Hi Karel, the last time I looked into this I came to the same conclusions as you have here.  Regards,
>>>
>>> Adam
>>>
>>> On Nov 7, 2010, at 5:28 AM, Karel Minařík wrote:
>>>
>>>> Hello,
>>>>
>>>> I'd like to ask if anyone has some experience to share regarding accelerating Couch with Varnish. I think lots of us are doing it, but I can't find much info around.
>>>>
>>>> Originally, I thought it would be possible to use ETags with some proper Varnish configuration (e.g. "accumulate" concurrent requests and pass only one to the backend, etc.), but that seems not to be possible, since Varnish does not pass ETags to the backend [http://lists.varnish-cache.org/pipermail/varnish-misc/2010-November/004997.html].
>>>>
>>>> As I understand it now, the only way to cache Couch's responses would be with time-based caching, either using the cached response until it auto-expires or expiring it via PURGE commands.
>>>>
>>>> Of course, it would be possible and technically trivial to send purge requests via the _changes feed or via the "update_notification" mechanism. As I see it, the tricky part would be knowing which objects to purge based on individual document changes, because not only single documents but also aggregated view results or fulltext queries would get cached. Of course, "there are two hard things in computer science ...".
>>>>
>>>> Has anyone put any thoughts/work into this?
>>>>
>>>> Thanks,
>>>>
>>>> Karel
>>>>
>>>
>>>
>>
>

Re: Couch and Varnish

Posted by Randall Leeds <ra...@gmail.com>.
As I mentioned on another thread, etags only save you bandwidth, as
right now Couch performs the GET request and then discards the body.
I'll open a JIRA ticket for this if it's not there already. It'd be
nice if the "Couch is HTTP and can leverage existing caches and tools"
talking point truly included significant gains from etag caching.

On Mon, Nov 8, 2010 at 08:17, Zachary Zolton <za...@gmail.com> wrote:
> Drat! If only Varnish supported Etags...
>
> If you don't wanna use time-based expiry, you could probably craft a
> custom-built solution where you watch the _changes feed and explicitly
> purge URLs using a tool such as Thinner:
>
> http://propublica.github.com/thinner/
>
> Of course, you'd be stuck with manually tracking the types of URLs to
> purge, so I haven't been too eager to try it out yet...
>
> —Zach
>
> On Sun, Nov 7, 2010 at 1:22 PM, Adam Kocoloski <ko...@apache.org> wrote:
>> Hi Karel, the last time I looked into this I came to the same conclusions as you have here.  Regards,
>>
>> Adam
>>
>> On Nov 7, 2010, at 5:28 AM, Karel Minařík wrote:
>>
>>> Hello,
>>>
>>> I'd like to ask if anyone has some experience to share regarding accelerating Couch with Varnish. I think lots of us are doing it, but I can't find much info around.
>>>
>>> Originally, I thought it would be possible to use ETags with some proper Varnish configuration (e.g. "accumulate" concurrent requests and pass only one to the backend, etc.), but that seems not to be possible, since Varnish does not pass ETags to the backend [http://lists.varnish-cache.org/pipermail/varnish-misc/2010-November/004997.html].
>>>
>>> As I understand it now, the only way to cache Couch's responses would be with time-based caching, either using the cached response until it auto-expires or expiring it via PURGE commands.
>>>
>>> Of course, it would be possible and technically trivial to send purge requests via the _changes feed or via the "update_notification" mechanism. As I see it, the tricky part would be knowing which objects to purge based on individual document changes, because not only single documents but also aggregated view results or fulltext queries would get cached. Of course, "there are two hard things in computer science ...".
>>>
>>> Has anyone put any thoughts/work into this?
>>>
>>> Thanks,
>>>
>>> Karel
>>>
>>
>>
>

Re: Couch and Varnish

Posted by Zachary Zolton <za...@gmail.com>.
Drat! If only Varnish supported Etags...

If you don't wanna use time-based expiry, you could probably craft a
custom-built solution where you watch the _changes feed and explicitly
purge URLs using a tool such as Thinner:

http://propublica.github.com/thinner/

Of course, you'd be stuck with manually tracking the types of URLs to
purge, so I haven't been too eager to try it out yet...

—Zach

On Sun, Nov 7, 2010 at 1:22 PM, Adam Kocoloski <ko...@apache.org> wrote:
> Hi Karel, the last time I looked into this I came to the same conclusions as you have here.  Regards,
>
> Adam
>
> On Nov 7, 2010, at 5:28 AM, Karel Minařík wrote:
>
>> Hello,
>>
>> I'd like to ask if anyone has some experience to share regarding accelerating Couch with Varnish. I think lots of us are doing it, but I can't find much info around.
>>
>> Originally, I thought it would be possible to use ETags with some proper Varnish configuration (e.g. "accumulate" concurrent requests and pass only one to the backend, etc.), but that seems not to be possible, since Varnish does not pass ETags to the backend [http://lists.varnish-cache.org/pipermail/varnish-misc/2010-November/004997.html].
>>
>> As I understand it now, the only way to cache Couch's responses would be with time-based caching, either using the cached response until it auto-expires or expiring it via PURGE commands.
>>
>> Of course, it would be possible and technically trivial to send purge requests via the _changes feed or via the "update_notification" mechanism. As I see it, the tricky part would be knowing which objects to purge based on individual document changes, because not only single documents but also aggregated view results or fulltext queries would get cached. Of course, "there are two hard things in computer science ...".
>>
>> Has anyone put any thoughts/work into this?
>>
>> Thanks,
>>
>> Karel
>>
>
>

Re: Couch and Varnish

Posted by Adam Kocoloski <ko...@apache.org>.
Hi Karel, the last time I looked into this I came to the same conclusions as you have here.  Regards,

Adam

On Nov 7, 2010, at 5:28 AM, Karel Minařík wrote:

> Hello,
> 
> I'd like to ask if anyone has some experience to share regarding accelerating Couch with Varnish. I think lots of us are doing it, but I can't find much info around.
>
> Originally, I thought it would be possible to use ETags with some proper Varnish configuration (e.g. "accumulate" concurrent requests and pass only one to the backend, etc.), but that seems not to be possible, since Varnish does not pass ETags to the backend [http://lists.varnish-cache.org/pipermail/varnish-misc/2010-November/004997.html].
>
> As I understand it now, the only way to cache Couch's responses would be with time-based caching, either using the cached response until it auto-expires or expiring it via PURGE commands.
>
> Of course, it would be possible and technically trivial to send purge requests via the _changes feed or via the "update_notification" mechanism. As I see it, the tricky part would be knowing which objects to purge based on individual document changes, because not only single documents but also aggregated view results or fulltext queries would get cached. Of course, "there are two hard things in computer science ...".
> 
> Has anyone put any thoughts/work into this?
> 
> Thanks,
> 
> Karel
>