You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@trafficserver.apache.org by "Sudheer Vinukonda (JIRA)" <ji...@apache.org> on 2015/10/23 17:36:27 UTC

[jira] [Updated] (TS-3979) Remove allow empty doc cache setting.

     [ https://issues.apache.org/jira/browse/TS-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sudheer Vinukonda updated TS-3979:
----------------------------------
    Affects Version/s: 6.0.0
        Fix Version/s: 7.0.0

> Remove allow empty doc cache setting.
> -------------------------------------
>
>                 Key: TS-3979
>                 URL: https://issues.apache.org/jira/browse/TS-3979
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: Cache
>    Affects Versions: 6.0.0
>            Reporter: Sudheer Vinukonda
>             Fix For: 7.0.0
>
>
> Refer TS-3978
> Following a discussion on the IRC, opening this jira to deprecate the Allow empty doc cache setting. The consensus is to cache anything that's cacheable and valid (regardless, of if it's empty or non-empty).
> {code}
> Amaryllis
> 7:50
> sudheerv: do any other reverse proxies refuse to cache empty documents without content-length?  it's not something i've noticed before (in varnish, for example)
> Amaryllis
> 7:50
> the connection could still be broken at any point in the response and result in broken content being cached
> 7:51
> but as a compromise, what if there was another option to only cache empty responses for 3xx, which is probably 80% of use cases?
> 7:52
> (maybe, in addition, it should accept transfer-encoding: chunked, since that can indicate end-of-body)
> zwoop_
> 7:54
> Amaryllis  we should talk to amc (he knows everything, we don't need Google here) about this
> 7:54
> zwoop_ is now known as woop
> 7:54
> woop is now known as zwoop
> 7:54
> zwoop left the room (quit: Changing host).
> 7:54
> zwoop [~zwoop@apache/committer/zwoop] entered the room.
> 7:54
> mode (+o zwoop) by ChanServ
> zwoop
> 7:55
> Amaryllis amc Maybe we could have a plugin API, that basically forces to set f.allow_empty_doc ?
> amc
> 7:55
> I think you can change the CacheVC structure without problem.
> Amaryllis
> 7:55
> i think some sort of fix for this should be in core, at the moment TS won't cache any of our redirects
> amc
> 7:56
> There's not a global config to allow that?
> zwoop
> 7:56
> Amaryllis  does it send a Cache-Control header with the redirects ?
> Amaryllis
> 7:56
> amc, zwoop: the origin sends transfer-encoding: chunked instead of content-length.
> 7:56
> so even with cache-control it won't be cached, that's what this PR is to change
> 7:56
> TS-3978
> ASFBot
> 7:56
> TS-3978: Allow empty document caching to follow normal logic - https://issues.apache.org/jira/browse/TS-3978
> zwoop
> 7:56
> ah
> 7:57
> yeah, that's expected
> 7:57
> and is why we added that option to allow empty docs to be cached
> Amaryllis
> 7:57
> this is with that option enabled
> zwoop
> 7:57
> huh
> Amaryllis
> 7:57
> it still won't cache any document without content-length
> zwoop
> 7:57
> yeah, need CL: too
> 7:57
> that's a requirement
> Amaryllis
> 7:57
> so i'll update the PR to also recognise chunked responses as being cacheable even if empty
> zwoop
> 7:57
> otherwise, it can't distinguish the cases you were worried about (a broken connection) vs a truly empty doc
> Amaryllis
> 7:58
> zwoop: right, but the PR makes it optional, only if allow_empty_doc=2
> zwoop
> 7:58
> heh, your redirect is Chunked ??
> Amaryllis
> 7:58
> yes, an empty chunked response 
> zwoop
> 7:58
> is that even correct?
> Amaryllis
> 7:58
> it's perfectly valid, if somewhat unusual
> zwoop
> 7:58
> doesn't the chunking require the final "0" ?
> 7:58
> which means, the doc isn't empty
> Amaryllis
> 7:58
> i think ATS is looking at 'empty' after de-chucking, isn't it?
> 7:59
> because it's definitely not cacheing these responses
> zwoop
> 7:59
> not sure
> 7:59
> gancho_ left the room (quit: Ping timeout: 272 seconds).
> amc
> 7:59
> I think Amaryllis is correct.
> zwoop
> 7:59
> Amaryllis  but, I definitely remember that the CL: header was a requirement that can not be ignored (safely), so not sure I think having an allow_empty_doc=2 is ok
> 8:00
> unless allow_empty_doc=2 also means allow Chunked content to be empty
> amc
> 8:00
> I need to check to see what ATS actually caches - the chunked data or the payload.
> zwoop
> 8:00
> but, you have to have some indicator telling us that the doc really is empty before we can cache it
> 8:00
> amc is caches the unchunked data
> 8:00
> it dechunks it, and caches that
> 8:01
> is dechunk even a proper verb? 
> amc
> 8:01
> Ah, but it caches the encoding header.
> Amaryllis
> 8:01
> zwoop: https://dpaste.de/6kUT
> zwoop
> 8:01
> amc Hmmm, really ? That doesn't sound right
> amc
> 8:01
> "zwoop drank too much at the summit and dechunked himself in the hallway". Yep, a proper verb.
> Amaryllis
> 8:01
> that's the actual origin response causing problems for us
> zwoop
> 8:01
> amc I'd expect it to change it from chunked to a Content-Length:
> amc
> 8:02
> I was wondering about that.
> zwoop
> 8:02
> amc lets dechunk big time at the Bar camp
> amc
> 8:02
> What happens when the content is served? Is the encoding preserved and obeyed?
> zwoop
> 8:03
> for the first client (the cache miss) I think it seems the chunked response
> Amaryllis
> 8:03
> yes, TS returns a correct thunked response
> 8:03
> and never caches it, so it's always the same
> zwoop
> 8:03
> but subsequent requests (when served out of cache) should have a CL:
> amc
> 8:03
> I was thinking of what happens for chunked responses from origin that are non-zero lenght.
> zwoop
> 8:03
> Amaryllis  can you not convince your origin that they are crazy, and change it to not send a Chunked empty response, and instead send CL: 0 ?
> sudheerv
> 8:04
> catching up on the messages…but, my concern is the same as zwoop's
> Amaryllis
> 8:04
> zwoop: no, it never adds a CL: https://dpaste.de/Goex
> zwoop
> 8:04
> Amaryllis  because youa re not caching it
> Amaryllis
> 8:04
> yes, exactly, because it's not cacheable 
> zwoop
> 8:04
> Amaryllis  amc is asking the general question, to which my answer is
> Amaryllis
> 8:04
> ah
> zwoop
> 8:04
> amc yes, I'd expect it to send the chunked response to the first client, and subsequent responses (out of cache) with a CL: 0
> Amaryllis
> 8:04
> perhaps i can make uwsgi do some sort of output buffering and put a CL header in
> zwoop
> 8:05
> amc that's at least how it's worked for the last ~10 years, we better not ahve changed it
> Amaryllis
> 8:05
> zwoop: how can that ever happen?  a chunked empty response will never be cached
> zwoop
> 8:05
> not talking empty responses
> 8:05
> general case
> amc
> 8:05
> I have many opinions on this. I do think it should be cacheable as an empty doc if it's chunked with a zero length payload.
> Amaryllis
> 8:05
> okay, but it's not CL: 0 then
> zwoop
> 8:05
> your case is deranged
> 8:05
> Amaryllis  right, nothing to do with your case
> Amaryllis
> 8:05
> amc: shall i submit a separate PR to fix that?
> amc
> 8:06
> Yes.
> zwoop
> 8:06
> amc yeah, that seems reasonable. But, If it was me, I'd still beat the Origin people over the head, cause they are crazy
> sudheerv
> 8:06
> amc: yes, but, you have to ensure that you received the final chunk
> 8:06
> if it already is the case, by the time you are checking for allow empty doc caching, then that's fine
> amc
> 8:06
> Right. That is happening in the clips Amaryllis showed us.
> Amaryllis
> 8:06
> zwoop: i think it's because the origin sends the headers before it's generated the response body.  so it doesn't know it's going to be empty
> sudheerv
> 8:06
> but, i dont know if that's the case already
> zwoop
> 8:07
> Amaryllis  it would know it's a redirect no?
> 8:07
> Amaryllis and why would a redirect have a body, ever ?
> Amaryllis
> 8:07
> yes, but redirects can still have bodies
> sudheerv
> 8:07
> you may get empy responses without content-length when we support outbound spdy/h2 as well
> zwoop
> 8:07
> true
> Amaryllis
> 8:07
> most web servers include bodies in 3xx
> 8:07
> i did have the origin fix it in one specific case but we need something more general
> zwoop
> 8:07
> yeah, I forgot about that, you're right
> 8:07
> niq left the room (quit: Ping timeout: 250 seconds).
> amc
> 8:08
> If it's chunked and ATS recieves the final payload (0 length frame) it can reasonably presume it got valid (non-truncated) content.
> sudheerv
> 8:08
> Amaryllis: I think the patch should ensure it can cover non-broken empty body cases for chunked-encoding/spdy/h2
> Amaryllis
> 8:08
> sudheerv: does this even apply to spdy/h2?
> Amaryllis
> 8:09
> actually, there is no H2 to origins at all...
> sudheerv
> 8:09
> not right now - we don't support outbound spdy/h2 yet
> Amaryllis
> 8:09
> right
> sudheerv
> 8:09
> but, when we do, it will be a similar problem there
> Amaryllis
> 8:09
> yes
> sudheerv
> 8:09
> there's no chunked encoding or content-len with spdy/h2
> Amaryllis
> 8:09
> but i can't really test code that's yet to be written 
> sudheerv
> 8:09
> so, yet another case of empty body 
> 8:09
> no, my point is - the code we write now must not break when we have spdy/h2
> 8:10
> so, instead of "assuming" that the body is valid, it might make sense
> 8:10
> to somehow indicate that it is
> Amaryllis
> 8:10
> well, i think it will break anyway, in the sense that empty H2 responses won't be cached... since they have no CL?
> sudheerv
> 8:10
> well, but you are fixing that 
> Amaryllis
> 8:10
> but H2 doesn't do TE either, does it?  i thought it replaces that entire thing
> sudheerv
> 8:10
> if it's consistently broken (or not supported), that's alright
> 8:10
> yes, it doesn't
> 8:10
> that's my point
> Amaryllis
> 8:11
> all i'm adding (for now) is to make it accept CL _or_ TE
> sudheerv
> 8:11
> so, we need to have a generic way of notifying cache layer
> 8:11
> that the response is valid
> Amaryllis
> 8:11
> and test that empty TE is properly checked for the empty chunk
> amc
> 8:12
> Hmmm. Maybe a virtual on CacheVConnection "setAllowEmptyContent(bool)".
> sudheerv
> 8:12
> yeah, something like that
> amc
> 8:12
> Then an implementation in CacheVC.
> sudheerv
> 8:12
> amc: i also want to make this setting overridable
> amc
> 8:12
> Then the higher layer can say "I know this was a valid zero length content, cache monkey"
> sudheerv
> 8:12
> (but, that's for later)
> 8:12
> yes, that makes sense
> Amaryllis
> 8:13
> amc: CacheVC already has allow_empty_doc flag, i think that's enough
> sudheerv
> 8:13
> Amaryllis: yeah, it does - I poked around that a bit recently, didn't seem very straightfwd
> amc
> 8:13
> Just bang on that directly?
> sudheerv
> 8:13
> but, pls go ahead and do it if you find it easy 
> Amaryllis
> 8:13
> amc: that was the plan, yes 
> 8:14
> amc: https://github.com/apache/trafficserver/pull/310/files#diff-7fada9a23fa1d12e90ca6bec876ecf8fL477
> 8:14
> amc: ... that check just needs another check for chunked as well.
> sudheerv
> 8:15
> Amaryllis: if we did something like what amc suggested above
> amc
> 8:15
> No, I meant that it would be set from (for instance) the chunked decoder or just above that.
> sudheerv
> 8:15
> where the higher layer can notify cache that it can be cached
> 8:15
> then we don't have to check any headers
> 8:15
> that already seems hacky enough
> amc
> 8:15
> So there wouldn't be an addiitional value for the config, it would just work as epexted.
> 8:15
> expected.
> sudheerv
> 8:15
> yes
> jpeach
> 8:15
> there's already a function that checks whether the whole document was received
> amc
> 8:16
> Hmmm. Interesting.
> Amaryllis
> 8:16
> i think some things are getting mixed up here - i'm not going to add an additional config value
> 8:16
> that's what the rejected PR did
> jpeach
> 8:16
> it is used when figuring what to do when getting an EOF
> amc
> 8:16
> What I would want is a way for the chunk decoder or its parent logic to pass on to the CacheVC the validity of the document.
> 8:17
> Amaryllis-  I thought your PR added the value '2' to the config variable.
> jpeach
> 8:17
> afaict as long as you can know you have the entire doc, it could be cacheable
> sudheerv
> 8:17
> jpeach: that makes sense
> Amaryllis
> 8:17
> amc: yes, but people disliked that, so instead i suggested fixing it to check for chucked encoding
> 8:17
> amc: which doesn't need an additional config setting
> sudheerv
> 8:17
> but, i guess the complication is that, at what ppint to do you know that you received the whole doc
> amc
> 8:17
> OK, I must be reading the wrong PR.
> sudheerv
> 8:17
> is it before checking allow empty doc or after
> Amaryllis
> 8:18
> amc: there's only one PR, i didn't write the chunked version yet
> sudheerv
> 8:18
> if it's just a matter of changing the order to check for whole doc, without breaking anything , then that seems like a cleaner solution
> amc
> 8:18
> Ah, OK.
> 8:19
> Yes, I agree. I think we're all on the same page - check for getting the whole document and if it's zero length and the config value is 1, cache it.
> Amaryllis
> 8:19
> (we need something for 6.0 fairly quickly, so we'll be using this patch in production, but i'm happy to put more effort into fixing it properly)
> sudheerv
> 8:19
> +1
> amc
> 8:19
> The brokeness is that complete chunked stuff isn't being checked correctly.
> 8:19
> Amaryllis - coool.
> Amaryllis
> 8:19
> yes - that's the only bit that actually needs to be fixed, but it might also make sense to refactor how the check is done
> amc
> 8:20
> Allright.
> Amaryllis
> 8:20
> i'm not sure i know enough about the TS internals to actually implement the second bit though
> amc
> 8:20
> I have to go get my Macbook fixed - update took out my wireless.
> 8:21
> I'll see if I have some time this weekend to take a look.
> sudheerv
> 8:21
> Amaryllis: if it's okay with everyone else, adding the check for TE header just to fix the TE part alone for now is fine by me as well
> 8:21
> (we can open a separate jira to improve the code for future)
> 8:22
> (was just saying, that instead of adding case by case, it'd be nicer to have a cleaner solution - but, by no means, it's immediately required)
> Amaryllis
> 8:23
> sudheerv: sure
> jpeach
> 8:23
> found it ... HttpSM::is_http_server_eos_truncation
> 8:23
> dustywusty_ is now known as dustywusty
> sudheerv
> 8:23
> jpeach: that might be a bit late?
> 8:24
> it seems to be after tunnel completes
> jpeach
> 8:24
> the place it is called from might be too late, but that's the logic for knowing if we have the whole response
> sudheerv
> 8:24
> whereas, the tunnel is writing to UA and Cache in parallel
> zwoop
> 8:24
> sorry, had to reboot.
> sudheerv
> 8:24
> jpeach: right, but, the point is
> zwoop
> 8:24
> amc sudheerv  I'm thinking, maybe we should just deprecate this setting completely ? And always allow caching empty docs when we safely can ?
> sudheerv
> 8:24
> you may get an EOS after some body is recieved already
> 8:25
> zwoop: +1 to that
> zwoop
> 8:25
> it was added for compatibility reasons eons ago, but we changed the default to "1"  in 5.x I think
> sudheerv
> 8:25
> we just need to know when is it safe 
> jpeach
> 8:25
> sudheerv: I'm just saying that we have logic to know whether we have the whole response; not that the logic is already used in all the right places
> zwoop
> 8:25
> we can deprecate it now (marking it as "don't change / use this unless you really, really have to" and remove it for 7.0.0
> sudheerv
> 8:27
> +1
> 8:27
> reveller left the room.
> zwoop
> 8:28
> sudheerv  file a Jira on it maybe ? Or two. We should change the docs now (marking it deprecated) and a bug for 7.0.0 for removal
> sudheerv
> 8:29
> sure..
> 8:30{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)