Posted to dev@trafficserver.apache.org by 오재경 <ja...@nimbusnetworks.co.kr> on 2012/06/12 01:23:39 UTC

range request problem

Hi.

We have a serious issue. We have a customer site whose service is online
learning; it provides registered members with a player, and the player
sends range requests.

Now you can guess what problem I am confronting: we have to cache the
content, but because of the range requests Traffic Server can't cache the
whole object. It doesn't matter whether we set
background_fill_completed_threshold to 0.00 or not.

My questions are:
- is it possible to write a plugin that forces the whole content to be
cached (after the first range request, even if the player doesn't send
any further requests), as if it were doing a background fill?
- if not, what else can/should I do?

thank you and best regards.

Re: range request problem

Posted by "Alan M. Carroll" <am...@thought-mesh.net>.
Yes. A range request that cannot be satisfied from cache is just passed on to the origin server. This is because there is no good alternative: partial responses cannot be cached, and (as noted) always fetching the full source object is not a good thing.

This hasn't come up much before because people who were concerned about this issue had already written plugins for various reasons and adding a background fill was easy.

One issue with doing a background fill is that you would need to look inside the range response from the origin server to determine the size of the source object. Perhaps we could do that and use the background fill config values to decide whether to start a background fill. We would want a separate switch (separate values?) so this isn't triggered merely because normal background fills are enabled.

> Unfortunately, those fixes don't help here. The problem is documented in
> at least one bug, e.g.

> https://issues.apache.org/jira/browse/TS-683


> Another option that I've been pondering, which is more suitable for large
> contents I think, is to kick off background requests for the full objects. A
> plugin that does something like:


Re: range request problem

Posted by Leif Hedstrom <zw...@apache.org>.
On 6/12/12 5:43 PM, Nick Kew wrote:
> On Tue, 12 Jun 2012 16:51:42 -0600
>> Yeah, I totally agree. There are some potential alternatives here, such as
>> fixed-size chunking of content perhaps. It's still a difficult problem to
>> solve optimally for every type of request. Your suggested heuristic is
>> probably reasonable for many cases, but what if the client asks for the
>> first 16KB, and we have no idea how large the full object is (it could be
>> e.g. 512GB)? Do we defer dealing with it until we have collected enough
>> data to make an intelligent decision?
>>
>> Also, blindly caching every Range: request could potentially completely fill
>> the cache with responses that partially overlap (there are no restrictions
>> on how the client can form the Range requests :/).
> Are we at cross-purposes here?
>
> If the client requests the first 16kb, then a rangeless request to the
> backend fetches that 16k first, so the client can be satisfied while
> the proxy continues filling the cache.  That requires decoupling the
> client and server requests.  I wasn't suggesting caching any ranged
> request!

That could be difficult to do I think, because of the way the producer / 
consumer relationships are done in the core (but maybe it's doable, I 
haven't looked at it from that perspective).

>
> Thinking about it, perhaps an optimal heuristic is, on receipt of a
> range request, to make two requests to the backend: the request as-is,
> and (if cacheable) a second background request for everything-but the
> range.  The background request grabs a mutex so we don't duplicate

Right, that's almost exactly what my initial proposal did, except it wouldn't 
kick off the second "full" request until it gets the response header (to 
avoid sending the full request for an uncacheable object). Maybe we could 
issue a HEAD request initially, until we figure out whether the object is 
cacheable or not? The "mutex" is more or less implicit in the way our cache 
works: only one producer can hold the cached object for write. This is also 
how read-while-writer works (one producer writing to the cache, and there 
can be multiple client consumers even before the cache has finished writing).
> That's actually looking a lot like your original proposal!
>

Brilliant minds ... :).

-- leif


Re: range request problem

Posted by Nick Kew <ni...@apache.org>.
On Tue, 12 Jun 2012 16:51:42 -0600
Leif Hedstrom <zw...@apache.org> wrote:

> On 6/11/12 11:20 PM, Nick Kew wrote:
> > On 12 Jun 2012, at 02:49, Leif Hedstrom wrote:
> >
> >> Another option that I've been pondering, which is more suitable for large
> >> contents I think, is to kick off background requests for the full objects. A
> >> plugin that does something like:
> >>
> >> Thoughts?
> > I seem to recollect discussing approaches to caching ranges recently
> > (was it with you?)
> >
> > What you outline makes sense where the resource is much bigger than
> > the requested range, and fetching the whole thing (in a rangeless
> > request to the backend) would be too much overhead.  But to enforce
> > it on all range requests could be overkill.  I wonder if there's a case
> > for adding a heuristic to examine the client's ranges, and fetch
> > the whole thing while the client waits UNLESS the number of
> > bytes the client wants to skip exceeds some threshold - which
> > then triggers what you describe?
> >
> 
> Yeah, I totally agree. There are some potential alternatives here, such as 
> fixed-size chunking of content perhaps. It's still a difficult problem to 
> solve optimally for every type of request. Your suggested heuristic is 
> probably reasonable for many cases, but what if the client asks for the 
> first 16KB, and we have no idea how large the full object is (it could be 
> e.g. 512GB)? Do we defer dealing with it until we have collected enough 
> data to make an intelligent decision?
> 
> Also, blindly caching every Range: request could potentially completely fill 
> the cache with responses that partially overlap (there are no restrictions 
> on how the client can form the Range requests :/).

Are we at cross-purposes here?

If the client requests the first 16kb, then a rangeless request to the
backend fetches that 16k first, so the client can be satisfied while
the proxy continues filling the cache.  That requires decoupling the
client and server requests.  I wasn't suggesting caching any ranged
request!

Thinking about it, perhaps an optimal heuristic is, on receipt of a
range request, to make two requests to the backend: the request as-is,
and (if cacheable) a second background request for everything-but the
range.  The background request grabs a mutex so we don't duplicate
requests to a URL, and when done it reassembles the entire response
in cache.  Any other ranged requests for the same URL arriving while
the URL is mutexed just run without caching.

That's actually looking a lot like your original proposal!

-- 
Nick Kew

Re: range request problem

Posted by Leif Hedstrom <zw...@apache.org>.
On 6/11/12 11:20 PM, Nick Kew wrote:
> On 12 Jun 2012, at 02:49, Leif Hedstrom wrote:
>
>> Another option that I've been pondering, which is more suitable for large
>> contents I think, is to kick off background requests for the full objects. A
>> plugin that does something like:
>>
>> Thoughts?
> I seem to recollect discussing approaches to caching ranges recently
> (was it with you?)
>
> What you outline makes sense where the resource is much bigger than
> the requested range, and fetching the whole thing (in a rangeless
> request to the backend) would be too much overhead.  But to enforce
> it on all range requests could be overkill.  I wonder if there's a case
> for adding a heuristic to examine the client's ranges, and fetch
> the whole thing while the client waits UNLESS the number of
> bytes the client wants to skip exceeds some threshold - which
> then triggers what you describe?
>

Yeah, I totally agree. There are some potential alternatives here, such as 
fixed-size chunking of content perhaps. It's still a difficult problem to 
solve optimally for every type of request. Your suggested heuristic is 
probably reasonable for many cases, but what if the client asks for the 
first 16KB, and we have no idea how large the full object is (it could be 
e.g. 512GB)? Do we defer dealing with it until we have collected enough 
data to make an intelligent decision?

Also, blindly caching every Range: request could potentially completely fill 
the cache with responses that partially overlap (there are no restrictions 
on how the client can form the Range requests :/).

Cheers,

-- leif


Re: range request problem

Posted by Nick Kew <ni...@apache.org>.
On 12 Jun 2012, at 02:49, Leif Hedstrom wrote:

> Another option that I've been pondering, which is more suitable for large
> contents I think, is to kick off background requests for the full objects. A
> plugin that does something like:
> 
> Thoughts?

I seem to recollect discussing approaches to caching ranges recently
(was it with you?)

What you outline makes sense where the resource is much bigger than
the requested range, and fetching the whole thing (in a rangeless
request to the backend) would be too much overhead.  But to enforce
it on all range requests could be overkill.  I wonder if there's a case
for adding a heuristic to examine the client's ranges, and fetch
the whole thing while the client waits UNLESS the number of
bytes the client wants to skip exceeds some threshold - which
then triggers what you describe?

-- 
Nick Kew

Re: range request problem

Posted by 오재경 <ge...@gmail.com>.
>
> 1) In read-response-header hook, if the object is a Range response, *and*
> it's cacheable, schedule a background load on the task threads. This
> obviously has to be done in a way that the same request is only fetched
> once,
> effectively "locking" it.


How do I schedule a background load on the task threads? What are "the
task threads"? Is there an API for that? I can't find any newly added
APIs in the SDK documentation; it isn't up to date. If you give me a
hint, I'll search the source code.


> 2) On the task threads, we do a fetch through the SM for the URL, without a
> Range header. We simply discard the results.
>
>
Yes, we don't have to send users a response they didn't ask for, but how
do we discard it? Just close the buffer?


> 3) While the background fetch is happening, client Range: requests for
> those
> objects continue to be proxied as before.

Re: range request problem

Posted by Leif Hedstrom <zw...@apache.org>.
On 6/11/12 5:34 PM, James Peach wrote:
> On 11/06/2012, at 4:23 PM, 오재경 wrote:
>
>> Hi.
>>
>> We have a serious issue. We have a customer site whose service is online
>> learning; it provides registered members with a player, and the player
>> sends range requests.
>>
>> Now you can guess what problem I am confronting: we have to cache the
>> content, but because of the range requests Traffic Server can't cache
>> the whole object. It doesn't matter whether we set
>> background_fill_completed_threshold to 0.00 or not.
> Have you tested with 3.1.4? There have been some improvements to range request handling in that release.
>


Unfortunately, those fixes don't help here. The problem is documented in
at least one bug, e.g.

https://issues.apache.org/jira/browse/TS-683


Another option that I've been pondering, which is more suitable for large
contents I think, is to kick off background requests for the full objects. A
plugin that does something like:

1) In the read-response-header hook, if the object is a Range response, *and*
it's cacheable, schedule a background load on the task threads. This
obviously has to be done in a way that the same request is only fetched once,
effectively "locking" it.

2) On the task threads, we do a fetch through the SM for the URL, without a
Range header. We simply discard the results.

3) While the background fetch is happening, client Range: requests for those
objects continue to be proxied as before.

4) (possibly; not sure if it's doable without core changes, but we might
consider this): if an object is partially written to cache, and a Range:
request can be fully satisfied with what we already have in cache, serve it
out of the cache. This is similar to our read-while-writer feature.


A background fill feature like this is also useful as a "generic" tool, so
maybe it should go into the core. That way, a plugin for these range
request fills could use it, as could e.g. a serve-while-revalidate plugin.
I'm sure there are other reasons why someone would want to kick off a
background fill request that is disconnected from any client session
(i.e. there's only a blackhole consumer of the VC).

Thoughts?

-- leif


Re: range request problem

Posted by Eric Ahn <by...@gmail.com>.

> On 11/06/2012, at 4:23 PM, 오재경 wrote:
> 
>> Hi.
>> 
>> We have a serious issue. We have a customer site whose service is online
>> learning; it provides registered members with a player, and the player
>> sends range requests.
>> 
>> Now you can guess what problem I am confronting: we have to cache the
>> content, but because of the range requests Traffic Server can't cache
>> the whole object. It doesn't matter whether we set
>> background_fill_completed_threshold to 0.00 or not.
> 
> Have you tested with 3.1.4? There have been some improvements to range request handling in that release.
> 
> J

I've applied the patch on 3.0.4, but it still didn't cache range requests. Does the patch only work on 3.1.4?

Eric


> 
>> 
>> My questions are:
>> - is it possible to write a plugin that forces the whole content to be
>> cached (after the first range request, even if the player doesn't send
>> any further requests), as if it were doing a background fill?
>> - if not, what else can/should I do?
>> 
>> thank you and best regards.
> 

Re: range request problem

Posted by James Peach <jp...@apache.org>.
On 11/06/2012, at 4:23 PM, 오재경 wrote:

> Hi.
> 
> We have a serious issue. We have a customer site whose service is online
> learning; it provides registered members with a player, and the player
> sends range requests.
> 
> Now you can guess what problem I am confronting: we have to cache the
> content, but because of the range requests Traffic Server can't cache
> the whole object. It doesn't matter whether we set
> background_fill_completed_threshold to 0.00 or not.

Have you tested with 3.1.4? There have been some improvements to range request handling in that release.

J

> 
> My questions are:
> - is it possible to write a plugin that forces the whole content to be
> cached (after the first range request, even if the player doesn't send
> any further requests), as if it were doing a background fill?
> - if not, what else can/should I do?
> 
> thank you and best regards.