Posted to httpclient-users@hc.apache.org by Leandro Nunes <a-...@hotels.com> on 2017/05/16 11:42:51 UTC

HttpCacheStorage#getEntry called twice on cache hit

I just noticed that the caching implementation calls the getEntry method twice when the entry is cached. I did some debugging, and it looks like the first call is related to flushing invalid entries and the second to actually serving the request. My question is whether this is the expected behaviour or something that has been overlooked and can/should be addressed.

Thanks,
Leandro
---------------------------------------------------------------------
To unsubscribe, e-mail: httpclient-users-unsubscribe@hc.apache.org
For additional commands, e-mail: httpclient-users-help@hc.apache.org


Re: HttpCacheStorage#getEntry called twice on cache hit

Posted by Oleg Kalnichevski <ol...@apache.org>.
On Thu, 2017-05-18 at 15:32 +0000, Leandro Nunes wrote:
> Hi,
> 
> I created this (https://github.com/apache/httpcomponents-client/pull/77)
> pull request with a simple possible solution to this problem. It
> would be awesome if you could please take a look and validate whether
> the proposed fix is valid or not. I’m more than happy to change whatever
> you think is necessary and reapply the fix on other branches after
> that.
> 
> Thanks,
> Leandro
> 

Jon,

Do you think you could get around to taking a look at the pull request?
I'll do all the necessary leg work of merging PRs into the release
branches if you confirm the changes are valid.

Cheers

Oleg  




Re: HttpCacheStorage#getEntry called twice on cache hit

Posted by Leandro Nunes <a-...@hotels.com>.
Hi,

I created this (https://github.com/apache/httpcomponents-client/pull/77) pull request with a simple possible solution to this problem. It would be awesome if you could please take a look and validate whether the proposed fix is valid or not. I’m more than happy to change whatever you think is necessary and reapply the fix on other branches after that.

Thanks,
Leandro


Re: HttpCacheStorage#getEntry called twice on cache hit

Posted by Leandro Nunes <a-...@hotels.com>.
Thanks Oleg and Jon,

Oleg,
I didn’t see a problem with the implementation of the RFC. The result I get back is the one I was expecting (regarding whether the origin is actually called for the entity or the cached instance is returned without making a request to the origin). The problem has to do with calling the datastore twice (which can pose a problem, especially when using an off-heap datastore).
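
To make the cost concrete, here is a minimal sketch of one possible caller-side mitigation: a wrapper that remembers lookups for the duration of a single request, so a repeated getEntry for the same key does not reach the backend again. This is not necessarily what the pull request does. EntryStorage, SlowStorage and RequestScopedStorage are hypothetical names, and EntryStorage is a simplified stand-in for HttpCacheStorage rather than the real interface.

```java
import java.util.HashMap;
import java.util.Map;

public class RequestScopedLookupDemo {

    /** Hypothetical, simplified stand-in for HttpCacheStorage (not the real interface). */
    interface EntryStorage {
        String getEntry(String key);
        void putEntry(String key, String entry);
    }

    /** Pretends to be an expensive (e.g. off-heap) store and counts backend reads. */
    static final class SlowStorage implements EntryStorage {
        private final Map<String, String> entries = new HashMap<>();
        int backendReads;

        @Override public String getEntry(String key) {
            backendReads++;
            return entries.get(key);
        }

        @Override public void putEntry(String key, String entry) {
            entries.put(key, entry);
        }
    }

    /**
     * Remembers lookups for one request. Create one instance per request;
     * it is not thread-safe and must not be shared across requests.
     */
    static final class RequestScopedStorage implements EntryStorage {
        private final EntryStorage delegate;
        private final Map<String, String> seen = new HashMap<>();

        RequestScopedStorage(EntryStorage delegate) {
            this.delegate = delegate;
        }

        @Override public String getEntry(String key) {
            if (seen.containsKey(key)) {
                return seen.get(key); // answered without touching the backend
            }
            String entry = delegate.getEntry(key);
            seen.put(key, entry); // HashMap allows null values, so misses are remembered too
            return entry;
        }

        @Override public void putEntry(String key, String entry) {
            delegate.putEntry(key, entry);
            seen.put(key, entry); // keep the remembered value in sync with the write
        }
    }

    public static void main(String[] args) {
        SlowStorage backend = new SlowStorage();
        backend.putEntry("k", "cached-response");

        EntryStorage perRequest = new RequestScopedStorage(backend);
        perRequest.getEntry("k"); // reaches the backend
        perRequest.getEntry("k"); // served from the per-request map
        System.out.println("backend reads: " + backend.backendReads); // prints "backend reads: 1"
    }
}
```

The trade-off is that a second lookup within the same request will not observe a change made by another thread in between, which is why the scope is kept to a single request.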

Jon,
I created a gist where you can see the behaviour I’m talking about (https://gist.github.com/leandronunes85/c8f68ca556ac4121bdd956b2ae29c19f). In this example I’m using a Facebook static resource with "Cache-Control: public,max-age=31536000,immutable". In my exact use case I get back this header: "Cache-Control: max-age=600, public". In both cases the behaviour is the same. The output from running the application in the gist is:
op=getEntry, key='https://www.facebook.com:443/rsrc.php/v3/y4/r/gf6iHxsw8zm.png'
op=getEntry, key='https://www.facebook.com:443/rsrc.php/v3/y4/r/gf6iHxsw8zm.png'
op=getEntry, key='https://www.facebook.com:443/rsrc.php/v3/y4/r/gf6iHxsw8zm.png'
op=getEntry, key='https://www.facebook.com:443/rsrc.php/v3/y4/r/gf6iHxsw8zm.png'
op=putEntry, key='https://www.facebook.com:443/rsrc.php/v3/y4/r/gf6iHxsw8zm.png'
CACHE_MISS - HTTP/1.1 200 OK - Cache-Control: public,max-age=31536000,immutable
op=getEntry, key='https://www.facebook.com:443/rsrc.php/v3/y4/r/gf6iHxsw8zm.png'
op=getEntry, key='https://www.facebook.com:443/rsrc.php/v3/y4/r/gf6iHxsw8zm.png'
CACHE_HIT - HTTP/1.1 200 OK - Cache-Control: public,max-age=31536000,immutable
op=getEntry, key='https://www.facebook.com:443/rsrc.php/v3/y4/r/gf6iHxsw8zm.png'
op=getEntry, key='https://www.facebook.com:443/rsrc.php/v3/y4/r/gf6iHxsw8zm.png'
CACHE_HIT - HTTP/1.1 200 OK - Cache-Control: public,max-age=31536000,immutable

As you can see, the datastore is asked for the entity twice on a cache hit (and 4 times (!) on a cache miss).
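
A trace like the one above can be produced with a small logging decorator around the storage. The following is only a sketch under stated assumptions, not the code from the gist: EntryStorage, MapStorage and CountingStorage are hypothetical names, and EntryStorage is a simplified stand-in for HttpCacheStorage (in HttpClient 4.x the real interface also declares removeEntry and updateEntry, and its methods throw IOException).

```java
import java.util.HashMap;
import java.util.Map;

public class CountingStorageDemo {

    /** Hypothetical, simplified stand-in for HttpCacheStorage (not the real interface). */
    interface EntryStorage {
        String getEntry(String key);
        void putEntry(String key, String entry);
    }

    /** Plain in-memory backend used only for this illustration. */
    static final class MapStorage implements EntryStorage {
        private final Map<String, String> entries = new HashMap<>();

        @Override public String getEntry(String key) {
            return entries.get(key);
        }

        @Override public void putEntry(String key, String entry) {
            entries.put(key, entry);
        }
    }

    /** Decorator that logs and counts every call before delegating. */
    static final class CountingStorage implements EntryStorage {
        private final EntryStorage delegate;
        private int getCount;
        private int putCount;

        CountingStorage(EntryStorage delegate) {
            this.delegate = delegate;
        }

        @Override public String getEntry(String key) {
            getCount++;
            System.out.println("op=getEntry, key='" + key + "'");
            return delegate.getEntry(key);
        }

        @Override public void putEntry(String key, String entry) {
            putCount++;
            System.out.println("op=putEntry, key='" + key + "'");
            delegate.putEntry(key, entry);
        }

        int getCount() { return getCount; }
        int putCount() { return putCount; }
    }

    public static void main(String[] args) {
        CountingStorage storage = new CountingStorage(new MapStorage());
        String key = "https://example.org:443/resource.png"; // placeholder key

        storage.getEntry(key);                    // miss: nothing stored yet
        storage.putEntry(key, "cached-response"); // store the response
        storage.getEntry(key);                    // hit

        System.out.println("gets=" + storage.getCount() + ", puts=" + storage.putCount());
    }
}
```

With the real library, the same idea would be applied by implementing HttpCacheStorage and handing the instance to the caching client builder; counting the calls per request then shows directly how many storage lookups each hit or miss costs.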

Thanks for any help you can provide,
Leandro


Re: HttpCacheStorage#getEntry called twice on cache hit

Posted by Jon Moore <jo...@apache.org>.
Yes, unfortunately I haven't had time to work on the caching module for
several years, so I don't remember all the ins and outs of the
implementation. However, Leandro, perhaps you could post the headers for
the original request, the (cached) response, and then the subsequent
request that gets a cache hit? If so, I can possibly help explain the
behavior.

Jon


Re: HttpCacheStorage#getEntry called twice on cache hit

Posted by Oleg Kalnichevski <ol...@apache.org>.
On Tue, 2017-05-16 at 11:42 +0000, Leandro Nunes wrote:
> I just noticed the caching implementation is calling getEntry method
> twice when the entry is cached. I did some debugging and it looks
> like the first one is related to flushing invalid entries and the
> second call is related to actually serving the request. My question
> is whether this is the expected behaviour or is something that has
> been overlooked and can/should be addressed?
> 
> Thanks,
> Leandro

Leandro


The HttpClient cache module desperately needs some attention. The best
course of action would be to consult the RFC, fix the behavior if it is
wrong, and raise a PR on GitHub.

Oleg

