Posted to dev@shindig.apache.org by Brian Eaton <be...@google.com> on 2008/07/11 20:31:17 UTC

content rewriting as a chained fetcher?

Hey -

I just discovered the content rewriting logic in AbstractHttpCache.
I'm not going to mess with it right now, but I wanted to test the
waters: how do people feel about making the content rewriting part of
the content fetching chain instead of burying it down in caching
logic?

I see that Kevin has considered moving this into gadget rendering:
https://issues.apache.org/jira/browse/SHINDIG-398.

Cheers,
Brian
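
As a rough illustration of what "content rewriting as part of the content
fetching chain" could mean, a rewriting fetcher can simply wrap the next
fetcher in the chain and rewrite responses on the way back; the types below
(Fetcher, Request, Response, ContentRewriter) are placeholders for the
sketch, not Shindig's actual classes:

  // Sketch only: placeholder types standing in for Shindig's http
  // request/response objects and rewriter.
  class Request {}
  class Response {}

  interface ContentRewriter {
    Response rewrite(Request request, Response response);
  }

  interface Fetcher {
    Response fetch(Request request) throws Exception;
  }

  // A chained fetcher that delegates to the rest of the chain and rewrites
  // the response on the way back, keeping rewriting out of the cache code.
  class RewritingFetcher implements Fetcher {
    private final Fetcher next;
    private final ContentRewriter rewriter;

    RewritingFetcher(Fetcher next, ContentRewriter rewriter) {
      this.next = next;
      this.rewriter = rewriter;
    }

    public Response fetch(Request request) throws Exception {
      return rewriter.rewrite(request, next.fetch(request));
    }
  }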

Re: content rewriting as a chained fetcher?

Posted by Ian Boston <ie...@tfd.co.uk>.
Also, moving the parts that relate directly to the cache into the
cache would make more sense; e.g., from a 'cache' point of view,
element-driven expiry in the cache should not need to know that it's
an http element. It's relatively easy to implement below the cache
abstraction layer if the interface to the cache has the concept of
element-driven expiry.
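
A cache interface with element-driven expiry along those lines might look
roughly like the sketch below; the names are illustrative, not an existing
Shindig interface:

  import java.util.Map;
  import java.util.concurrent.ConcurrentHashMap;

  // A cache whose entries carry their own expiry, so the cache layer never
  // needs to know whether a value came from an http response.
  interface ExpiringCache<K, V> {
    void put(K key, V value, long expirationMillis);  // absolute expiry time
    V get(K key);                                     // null if absent/expired
  }

  class SimpleExpiringCache<K, V> implements ExpiringCache<K, V> {
    private static final class Entry<T> {
      final T value;
      final long expirationMillis;
      Entry(T value, long expirationMillis) {
        this.value = value;
        this.expirationMillis = expirationMillis;
      }
    }

    private final Map<K, Entry<V>> map = new ConcurrentHashMap<K, Entry<V>>();

    public void put(K key, V value, long expirationMillis) {
      map.put(key, new Entry<V>(value, expirationMillis));
    }

    public V get(K key) {
      Entry<V> entry = map.get(key);
      if (entry == null) {
        return null;
      }
      if (entry.expirationMillis < System.currentTimeMillis()) {
        map.remove(key);  // element-driven expiry: the entry knows its own TTL
        return null;
      }
      return entry.value;
    }
  }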

Did the cache abstraction patches get applied? (I can't remember
immediately, but will go and look.)
Ian

On 11 Jul 2008, at 19:43, Kevin Brown wrote:

> The whole http fetcher stack is a mess right now. Adding more chaining
> isn't going to help the situation.
>
> Rewriting being done at the http fetcher level has also caused several
> new issues -- notably, rewriting gets done before message bundle
> substitution, which breaks a lot of things. Moving it out of the http
> stack entirely seems like it would be cleaner.
>
> On Fri, Jul 11, 2008 at 11:31 AM, Brian Eaton <be...@google.com> wrote:
>
>> Hey -
>>
>> I just discovered the content rewriting logic in AbstractHttpCache.
>> I'm not going to mess with it right now, but I wanted to test the
>> waters: how do people feel about making the content rewriting part of
>> the content fetching chain instead of burying it down in caching
>> logic?
>>
>> I see that Kevin has considered moving this into gadget rendering:
>> https://issues.apache.org/jira/browse/SHINDIG-398.
>>
>> Cheers,
>> Brian
>>


Re: content rewriting as a chained fetcher?

Posted by Kevin Brown <et...@google.com>.
The whole http fetcher stack is a mess right now. Adding more chaining isn't
going to help the situation.

Rewriting being done at the http fetcher level has also caused several new
issues -- notably, rewriting gets done before message bundle substitution,
which breaks a lot of things. Moving it out of the http stack entirely seems
like it would be cleaner.
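
To make the ordering problem concrete: a rewriter sitting inside the http
stack sees content before message bundle substitution has run, whereas at
the rendering layer the steps can be sequenced explicitly. A minimal sketch
of that sequencing, using placeholder interfaces rather than the actual
GadgetRenderingTask code:

  // Placeholder interfaces for the sketch; not Shindig's actual classes.
  interface MessageBundle {
    String substitute(String content);  // expands __MSG_name__ placeholders
  }

  interface Rewriter {
    String rewrite(String content);     // link proxying, Caja, etc.
  }

  class RenderPipeline {
    // Substitution runs before rewriting, so the rewriter sees the same
    // markup the browser would.
    String render(String rawContent, MessageBundle bundle, Rewriter rewriter) {
      String substituted = bundle.substitute(rawContent);
      return rewriter.rewrite(substituted);
    }
  }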

On Fri, Jul 11, 2008 at 11:31 AM, Brian Eaton <be...@google.com> wrote:

> Hey -
>
> I just discovered the content rewriting logic in AbstractHttpCache.
> I'm not going to mess with it right now, but I wanted to test the
> waters: how do people feel about making the content rewriting part of
> the content fetching chain instead of burying it down in caching
> logic?
>
> I see that Kevin has considered moving this into gadget rendering:
> https://issues.apache.org/jira/browse/SHINDIG-398.
>
> Cheers,
> Brian
>

Re: content rewriting as a chained fetcher?

Posted by Kevin Brown <et...@google.com>.
On Mon, Jul 14, 2008 at 1:44 PM, Brian Eaton <be...@google.com> wrote:

> On Mon, Jul 14, 2008 at 12:19 PM, Kevin Brown <et...@google.com> wrote:
> > Coupling rewriting to http fetching further just makes the problem worse.
>
> It has to be coupled somewhere.  The question is what kind of coupling
> is cleanest.
>
> > There's really no reason why the only thing being cached has to be an
> > HttpResponse, or, indeed, why there can't be multiple caches for different
> > types of data.
>
> Somewhere has to be aware of the semantics of the HTTP/1.1 caching
> spec.  AbstractHttpCache seems like a reasonable place for that logic.
>  The code below that can (should?) be reasonably independent of the
> type of data being cached, modulo the kind of element driven expiry
> abstraction that Ian proposed.
>
> > Right now, the callers of the fetchers have to know a lot about every type
> > of request. That calls for separate interfaces, not globbing more stuff into
> > the same interface.
>
> The caller of the fetcher has to know about the new interfaces. =)
> There is no way to avoid something fairly close to MakeRequestHandler
> having to know something about OAuth and signed fetch.  I like the way
> this was handled for oauth.  MakeRequestHandler has very few OAuth
> specific lines of code.  There is:
>
>      case OAUTH:
>        return contentFetcherFactory.getOAuthFetcher(token,
>            new OAuthRequestParams(request));
>
> And later:
>      // Merge in additional response data
>      for (Map.Entry<String, String> entry : results.getMetadata().entrySet()) {
>        resp.put(entry.getKey(), entry.getValue());
>      }
>
> That's it.


That and the supporting classes like ContentFetcherFactory =). This would be
cleaner if there were just 3 separate interfaces (HttpFetcher,
SignedFetcher, OAuthFetcher). We'll never eliminate the switching logic in
the make request handler or the preloader (or in the upcoming proxied
content handler), so we may as well embrace it.
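
A hedged sketch of what those three separate interfaces could look like;
the method signatures here are guesses for illustration, not the actual
Shindig interfaces:

  // Placeholder types for the sketch.
  class HttpRequest {}
  class HttpResponse {}
  class SecurityToken {}
  class OAuthRequestParams {}

  // One interface per request type, so callers pick an interface rather
  // than threading auth-specific flags through a single catch-all fetch.
  interface HttpFetcher {
    HttpResponse fetch(HttpRequest request) throws Exception;
  }

  interface SignedFetcher {
    HttpResponse fetchSigned(HttpRequest request, SecurityToken token)
        throws Exception;
  }

  interface OAuthFetcher {
    HttpResponse fetchWithOAuth(HttpRequest request, SecurityToken token,
        OAuthRequestParams params) throws Exception;
  }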


> OAuthRequestParams handles the parsing of the various
> OAuth related parameters to makeRequest, and the OAuth logic is
> encapsulated in a content fetcher.  The switch from opensocial 0.7 to
> opensocial 0.8 OAuth syntax did not impact MakeRequestHandler at all.
> That's the kind of independence we should shoot for.
>
> What freaks me out about content-rewriting being munged in with
> caching is that I've got *no idea* how the OAuth changes for caching
> impact content rewriting.  The interface to the cache is fuzzy, and
> that makes it hard to reuse both the caching and the content rewriting
> code.


I agree -- but I think the notion of adding caching as a part of the chain
just makes things more complicated. I'd rather rewriting be done at the
gadget rendering and proxy layers than at the http layer, with rewritten
content cached independently of http responses. There are many types of
"rewriting" that might take place in the long run (including Caja), so we
may as well have that rewriting happen in a standard place rather than
scattered all over the code. We actually already have this partially
implemented in GadgetRenderingTask.
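
As a rough illustration of "rewritten content cached independently of http
responses", the rendering layer could keep its own cache keyed on the gadget
and the rewriter rules; everything below is a sketch with invented names:

  import java.util.Map;
  import java.util.concurrent.ConcurrentHashMap;

  interface Rewriter {
    String rewrite(String content);
  }

  // Rewritten output lives in its own cache, keyed on the gadget and the
  // rewriter version, so it never collides with cached http responses.
  class RewrittenContentCache {
    private final Map<String, String> cache = new ConcurrentHashMap<String, String>();

    String getOrRewrite(String gadgetUrl, String rewriterVersion,
                        String fetchedContent, Rewriter rewriter) {
      String key = gadgetUrl + "|" + rewriterVersion;
      String rewritten = cache.get(key);
      if (rewritten == null) {
        rewritten = rewriter.rewrite(fetchedContent);
        cache.put(key, rewritten);
      }
      return rewritten;
    }
  }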


>
>
> Cheers,
> Brian
>

Re: content rewriting as a chained fetcher?

Posted by Brian Eaton <be...@google.com>.
On Mon, Jul 14, 2008 at 12:19 PM, Kevin Brown <et...@google.com> wrote:
> Coupling rewriting to http fetching further just makes the problem worse.

It has to be coupled somewhere.  The question is what kind of coupling
is cleanest.

> There's really no reason why the only thing being cached has to be an
> HttpResponse, or, indeed, why there can't be multiple caches for different
> types of data.

Somewhere has to be aware of the semantics of the HTTP/1.1 caching
spec.  AbstractHttpCache seems like a reasonable place for that logic.
 The code below that can (should?) be reasonably independent of the
type of data being cached, modulo the kind of element driven expiry
abstraction that Ian proposed.
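
For example, the HTTP-aware piece is essentially the code that turns response
headers into an expiry time, roughly along these lines (a simplified sketch,
not the actual AbstractHttpCache logic):

  // Simplified sketch of HTTP/1.1 caching awareness: turn Cache-Control
  // into an absolute expiry. Real logic also handles Expires, Pragma,
  // Vary, and so on.
  class HttpCachingPolicy {
    long getExpiration(String cacheControl, long nowMillis, long defaultTtlMillis) {
      if (cacheControl != null) {
        String value = cacheControl.toLowerCase();
        if (value.contains("no-cache") || value.contains("no-store")) {
          return nowMillis;  // effectively uncacheable
        }
        int index = value.indexOf("max-age=");
        if (index >= 0) {
          String rest = value.substring(index + "max-age=".length());
          int comma = rest.indexOf(',');
          String seconds = (comma >= 0 ? rest.substring(0, comma) : rest).trim();
          try {
            return nowMillis + Long.parseLong(seconds) * 1000L;
          } catch (NumberFormatException e) {
            // malformed max-age: fall through to the default TTL
          }
        }
      }
      return nowMillis + defaultTtlMillis;
    }
  }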

> Right now, the callers of the fetchers have to know a lot about every type
> of request. That calls for separate interfaces, not globbing more stuff into
> the same interface.

The caller of the fetcher has to know about the new interfaces. =)
There is no way to avoid something fairly close to MakeRequestHandler
having to know something about OAuth and signed fetch.  I like the way
this was handled for oauth.  MakeRequestHandler has very few OAuth
specific lines of code.  There is:

      case OAUTH:
        return contentFetcherFactory.getOAuthFetcher(token,
            new OAuthRequestParams(request));

And later:
      // Merge in additional response data
      for (Map.Entry<String, String> entry : results.getMetadata().entrySet()) {
        resp.put(entry.getKey(), entry.getValue());
      }

That's it.  OAuthRequestParams handles the parsing of the various
OAuth related parameters to makeRequest, and the OAuth logic is
encapsulated in a content fetcher.  The switch from opensocial 0.7 to
opensocial 0.8 OAuth syntax did not impact MakeRequestHandler at all.
That's the kind of independence we should shoot for.

What freaks me out about content-rewriting being munged in with
caching is that I've got *no idea* how the OAuth changes for caching
impact content rewriting.  The interface to the cache is fuzzy, and
that makes it hard to reuse both the caching and the content rewriting
code.

Cheers,
Brian

Re: content rewriting as a chained fetcher?

Posted by Kevin Brown <et...@google.com>.
On Mon, Jul 14, 2008 at 10:25 AM, Brian Eaton <be...@google.com> wrote:

> On Mon, Jul 14, 2008 at 10:08 AM, Louis Ryan <lr...@google.com> wrote:
> > Brian
> > This was actually how the original implementation worked and was later
> > changed because we needed the ability to associate rewritten content with
> > the cacheable version of the request and not the executed version of the
> > request. See the OAuth and signing fetchers for examples of why this is
> > important. Having the chained fetchers do this just spreads the cache
> > lookups throughout the code, which isn't very DRY. A potential refactoring
> > would be to allow for injecting cache entry pre/post processors into the
> > cache.
>
> The way I'd like to do this would work as follows:
> - any content fetcher that needs special handling of cached content
> (which is pretty much all of them) exposes an interface to generate an
> appropriate cache key
> - we thwack a CacheContentFetcher on the front of the fetching chain
> that knows what to do with cache keys.
>
> So the CacheContentFetcher looks something like this:
>
>   key = nextFetcher.getCacheKey(request)
>   if (key in cache) {
>      return cached content
>   }
>   content = nextFetcher.fetch(request)
>   cache.put(key, content)
>
> The rewriting content fetcher would sit between the cached content
> fetcher and the other fetchers in the chain (e.g. signing, oauth, or
> anything else).  Its getCacheKey method would look something like
> this:
>
>   key = nextFetcher.getCacheKey(request)
>   if (shouldRewrite(request)) {
>       key.addParameter("rewritten", "true")
>   }
>   return key;
>
> So the fetching chain would look like this:
>   caching -> rewriting -> [authentication] -> remote content fetcher
>
> This would be pleasant because then we wouldn't need to sprinkle cache
> lookups throughout all of the fetchers.  Each fetcher would need to
> know how to generate an appropriate cache key, but wouldn't need a
> handle to the actual cache and wouldn't need to worry about how to do
> lookups.


Coupling rewriting to http fetching further just makes the problem worse.
There's really no reason why the only thing being cached has to be an
HttpResponse, or, indeed, why there can't be multiple caches for different
types of data.

Right now, the callers of the fetchers have to know a lot about every type
of request. That calls for separate interfaces, not globbing more stuff into
the same interface.


>
> Cheers,
> Brian
>

Re: content rewriting as a chained fetcher?

Posted by Brian Eaton <be...@google.com>.
On Mon, Jul 14, 2008 at 10:08 AM, Louis Ryan <lr...@google.com> wrote:
> Brian
> This was actually how the original implementation worked and was later
> changed because we needed the ability to associate rewritten content with the
> cacheable version of the request and not the executed version of the
> request. See the OAuth and signing fetchers for examples of why this is
> important. Having the chained fetchers do this just spreads the cache
> lookups throughout the code, which isn't very DRY. A potential refactoring
> would be to allow for injecting cache entry pre/post processors into the
> cache.

The way I'd like to do this would work as follows:
- any content fetcher that needs special handling of cached content
(which is pretty much all of them) exposes an interface to generate an
appropriate cache key
- we thwack a CacheContentFetcher on the front of the fetching chain
that knows what to do with cache keys.

So the CacheContentFetcher looks something like this:

   key = nextFetcher.getCacheKey(request)
   if (key in cache) {
      return cached content
   }
   content = nextFetcher.fetch(request)
   cache.put(key, content)

The rewriting content fetcher would sit between the cached content
fetcher and the other fetchers in the chain (e.g. signing, oauth, or
anything else).  Its getCacheKey method would look something like
this:

   key = nextFetcher.getCacheKey(request)
   if (shouldRewrite(request)) {
       key.addParameter("rewritten", "true")
   }
   return key;

So the fetching chain would look like this:
   caching -> rewriting -> [authentication] -> remote content fetcher

This would be pleasant because then we wouldn't need to sprinkle cache
lookups throughout all of the fetchers.  Each fetcher would need to
know how to generate an appropriate cache key, but wouldn't need a
handle to the actual cache and wouldn't need to worry about how to do
lookups.
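
Put together as compilable Java, that chain might look roughly like the
sketch below; the types and method names (ChainedFetcher, CachingFetcher,
and so on) are illustrative, not existing Shindig classes:

  import java.util.Map;
  import java.util.concurrent.ConcurrentHashMap;

  // Placeholder request/response types for the sketch.
  class Request {}
  class Response {}

  // Each link in the chain can both fetch and describe its cache key.
  interface ChainedFetcher {
    String getCacheKey(Request request);
    Response fetch(Request request) throws Exception;
  }

  // Sits at the front of the chain: asks the rest of the chain for a key,
  // serves from cache on a hit, otherwise fetches and stores the result.
  class CachingFetcher implements ChainedFetcher {
    private final ChainedFetcher next;
    private final Map<String, Response> cache = new ConcurrentHashMap<String, Response>();

    CachingFetcher(ChainedFetcher next) {
      this.next = next;
    }

    public String getCacheKey(Request request) {
      return next.getCacheKey(request);
    }

    public Response fetch(Request request) throws Exception {
      String key = next.getCacheKey(request);
      Response cached = cache.get(key);
      if (cached != null) {
        return cached;
      }
      Response fetched = next.fetch(request);
      cache.put(key, fetched);
      return fetched;
    }
  }

  // Sits between the cache and the authenticating fetchers: contributes a
  // "rewritten" marker to the key and rewrites the fetched response.
  class RewritingFetcher implements ChainedFetcher {
    private final ChainedFetcher next;

    RewritingFetcher(ChainedFetcher next) {
      this.next = next;
    }

    public String getCacheKey(Request request) {
      String key = next.getCacheKey(request);
      return shouldRewrite(request) ? key + ";rewritten=true" : key;
    }

    public Response fetch(Request request) throws Exception {
      Response response = next.fetch(request);
      return shouldRewrite(request) ? rewrite(response) : response;
    }

    private boolean shouldRewrite(Request request) {
      return true;  // placeholder policy for the sketch
    }

    private Response rewrite(Response response) {
      return response;  // placeholder rewriting for the sketch
    }
  }

Wiring it up would then be new CachingFetcher(new RewritingFetcher(authFetcher)),
matching the caching -> rewriting -> [authentication] -> remote order above.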

Cheers,
Brian

Re: content rewriting as a chained fetcher?

Posted by Louis Ryan <lr...@google.com>.
Brian
This was actually how the original implementation worked and was later
changed because we needed the ability to associate rewritten content with the
cacheable version of the request and not the executed version of the
request. See the OAuth and signing fetchers for examples of why this is
important. Having the chained fetchers do this just spreads the cache
lookups throughout the code, which isn't very DRY. A potential refactoring
would be to allow for injecting cache entry pre/post processors into the
cache.
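
The pre/post processor idea could be expressed roughly as below; the names
are invented for the sketch rather than taken from the Shindig codebase:

  // Sketch: processors injected into the cache run on the way in (e.g.
  // rewriting before storage) and on the way out (e.g. per-request fixup),
  // so the fetchers themselves never touch the cache.
  interface CacheEntryProcessor<V> {
    V beforeStore(V value);   // applied once, before the value is cached
    V afterLoad(V value);     // applied on every cache hit
  }

  class ProcessingCache<K, V> {
    private final java.util.Map<K, V> map =
        new java.util.concurrent.ConcurrentHashMap<K, V>();
    private final CacheEntryProcessor<V> processor;

    ProcessingCache(CacheEntryProcessor<V> processor) {
      this.processor = processor;
    }

    void put(K key, V value) {
      map.put(key, processor.beforeStore(value));
    }

    V get(K key) {
      V value = map.get(key);
      return value == null ? null : processor.afterLoad(value);
    }
  }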

-Louis

On Fri, Jul 11, 2008 at 11:31 AM, Brian Eaton <be...@google.com> wrote:

> Hey -
>
> I just discovered the content rewriting logic in AbstractHttpCache.
> I'm not going to mess with it right now, but I wanted to test the
> waters: how do people feel about making the content rewriting part of
> the content fetching chain instead of burying it down in caching
> logic?
>
> I see that Kevin has considered moving this into gadget rendering:
> https://issues.apache.org/jira/browse/SHINDIG-398.
>
> Cheers,
> Brian
>