You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@trafficserver.apache.org by Gancho Tenev <ga...@apache.org> on 2019/02/11 20:42:02 UTC

Re: [PROPOSAL] Rewrite url after running all remap plugins

I looked into the issues #4929, #4026, #2877 and PR #4531 which are all related to this proposal about TSRemapRequestInfo::requestUrl (requestUrl for short) and plugin ordering and I would like to revive the thread. (links to issues, PRs, IRC logs referenced at the end of this email)

I definitely support *not* having the first plugin special but would like to propose to rewrite the url *before* running all plugins instead of *after* which would guarantee that:
all plugins would get the same requestUrl (first plugin not special case)
all plugins would treat requestUrl the same way consistently as a *remapped* URL which makes the first plugin logic really no different from the rest
there would be a remapped URL default in case the remap rule had no plugins OR none of the plugins modified the remapped URL.




The following are some thoughts and information which I hope would justify the proposal.
 
Plugin design:
==============

Based on my understanding of the relevant code and the comments in it, the plugin execution by design allows the plugins to be ordered in a pipeline which results in “calculating” the final remapped URL. requestUrl is meant to be used in read/write mode as the *input* and the *output* for each individual plugin in the pipeline.

Each plugin from the pipeline could read requestUrl if it needs to check what would be the eventual remapped URL if not modified and to decide to modify it if necessary (as a whole or only parts of it by using TS API calls). Then the next plugin in the pipeline could follow the same read and/or modify action and so on and so forth. The pipeline progress can be continued or aborted based on any plugin return code which also marks if the remapped URL was modified.

At least this was the behavior before PR #4531, except that for some reason, not well understood, the special first plugin got the *pristine* URL as input and the rest of the plugins got the *remapped* URL.
 

Even after the PR #4531 fix the first plugin still feels special?
=================================================================
 
Now let us consider the following use case after the PR #4531 fix. 

The first plugin gets the *pristine* URL (as before instead of the *remapped* URL) which means it will have to treat its input as *pristine* URI. If the first plugin decides to check (read input) and modify (write output) requestUrl then the second plugin will have to treat the input requestUrl differently as *remapped* URL since the first plugin already modified it and if no other plugin modifies requestUrl the content of requestUrl will end up being the final *remapped* URL.

Wait! The first plugin gets *pristine* URL and the rest get the *remapped* URL? This still feels like inconsistent design since it makes the first plugin a special case again which was something that we tried to fix with PR #4531.

Also PR #4531 kept the backward compatibility for the first plugin and broke compatibility for the rest. We cannot avoid breaking compatibility but I would prefer to break compatibility for the one (first) plugin in the chain rather than for the rest (most) of the plugins. Based on what I have seen, most of the configurations run with much more than a single plugin per remap. 

Looking further into the usage, the overhead of setting a default by overwriting URL before all plugins run on every matching transaction will be negligible since most of the plugins don’t modify the requestUrl or if they do, it is the case for a relatively small number of the transactions, hence we would hit the overwrite most of the time regardless. The only exception are plugins like regex_remap which constantly overwrite URL but even then setting the default before regex_remap modifies the URL does not seem too expensive.

I have a PR #4964 which I believe should address all the aforementioned issues. 

Please let me know what you think!

Cheers,
--Gancho


Reference:
https://github.com/apache/trafficserver/issues/4929 - “Cachekey plugin assumes requestUrl in TSRemapRequestInfo is remapped URL”
https://github.com/apache/trafficserver/issues/4026 - “Plugin ordering changes remap URI even with empty plugin”
https://github.com/apache/trafficserver/issues/2877 - “Unexpected/wrong/undocumented behavior when using regex_remap plugin”
https://github.com/apache/trafficserver/pull/4531 - “Rewrite url after running all remap plugin”
https://github.com/apache/trafficserver/pull/4964 <https://github.com/apache/trafficserver/pull/4964> - "Rewrite url before all remap plugins run” (NEW)
https://wilderness.apache.org/channels/?f=traffic-server/2018-11-29 - IRC discussion about “Rewrite url after running all remap plugin”




> On Dec 3, 2018, at 12:09 PM, Alan Carroll <so...@oath.com.INVALID> wrote:
> 
> This came up again on IRC. The effect of the PR is
> 
> Before:
> If the first remap plugin does not do a remap, then the URL is remapped to
> the "from" URL in the remap rule, then the rest of the remap plugins run.
> 
> After:
> All remap plugins are run, and if *none* of them remap, then the "from" URL
> from the remap rule is used.
> 
> The primary point of this is to not have the first remap plugin be special.
> In particular this means a plugin that's change from first to second
> doesn't see different behavior.
> 
> 
> On Mon, Dec 3, 2018 at 5:50 AM Takuya Kitano <tk...@yahoo-corp.jp> wrote:
> 
>> Thank you. This is exactly what I want to fix.
>> 
>> 
>> 
>> 2018/12/02 21:59 に、"宋辰伟" <61...@qq.com> を書き込みました:
>> 
>>    Actually there is a problem that URL is only overwritten in first
>> plugin in current logic. That means the request_url is overwritten in first
>> plugin(which return NO_REMAP ) and never changed even if the remain plugin
>> set map_to or map_form, except you set request_url directly.  If my
>> understanding  is correctly.
>> 
>>    scw00 - Song Chenwei
>> 
>>> 在 2018年12月2日,上午11:26,Bret Palsson <br...@gmail.com> 写道:
>>> 
>>> I don’t normally comment, but feel like I need to chime in here.
>>> 
>>> I agree in keeping with the stack behavior. This change has the
>> potential to break a lot of plugin logic and will cause a lot of work
>> across many dev teams that maintain many plugins. The pristine URL should
>> be used in subsequent plugins as needed.
>>> 
>>> Having said that, I am also interested to know and understand the
>> cause for wanting this behavior.
>>> 
>>> -Bret
>>> 
>>>> On Dec 1, 2018, at 20:15, Yongming Zhao <mi...@gmail.com> wrote:
>>>> 
>>>> I’d like to take a little step back ask about what is the root
>> causing about this change:
>>>> 
>>>> in the origin design
>>>> 1, the remap plugin works in ordered, which means you can
>> stop(consider it done) remapping at any plugin if you want, and the
>> following plugins will not run after.
>>>> 2, the remap plugin works in stacks, which means the seconds plugin
>> will continue work on the URL which is rewritten in the first plugin.
>>>> 
>>>> when you talking about the origin URL, there is always a pristine
>> URL you can copy, if you really want, so the second plugin can see the
>> origin un-mapped URL. this pristine URL is design for logging, but I think
>> you can take it for your purpose if you like
>>>> 
>>>> 
>>>> so, here rise my question:
>>>> can the pristine URL full file your second plugin needs? if so, can
>> we keep the origin design?
>>>> 
>>>> 
>>>> - Yongming Zhao 赵永明
>>>> 
>>>>> 在 2018年11月5日,上午11:06,Takuya Kitano <tk...@yahoo-corp.jp> 写道:
>>>>> 
>>>>> Hi,
>>>>> 
>>>>> I’d like to propose a change about the timing that ATS runs remap
>> plugins.
>>>>> 
>>>>> Currently, we configure remap.config like the following snippet,
>>>>> and each plugins use `rri->requestUrl`,
>>>>> the behavior of a remap plugin as the first plugin is different
>> from that as the second plugin.
>>>>> 
>>>>> ```
>>>>> map http://before-remap.com/ http://after-remap.com/
>> @plugin=<first plugin>.so @pparam=... @plugin=<second plugin>.so @pparam=...
>>>>> ```
>>>>> 
>>>>> In detail, the first plugin get pre-remapped url information from
>> `rri->requestUrl`,
>>>>> but the second plugin get post-remapped one.
>>>>> 
>>>>> The cause of this behavior is ATS executes
>> `url_rewrite_remap_request` function after running the first remap plugin.
>>>>> 
>>>>> My proposal is that ATS should execute all remap plugins and then
>> rewrite url.
>>>>> 
>>>>> More ditails are in
>>>>> - Issue : https://github.com/apache/trafficserver/issues/2877
>>>>> - Pull Request : https://github.com/apache/trafficserver/pull/4531
>>>>> 
>>>>> 
>>>>> What do you think about this?
>>>>> 
>>>>> Thank you.
>>>>> 
>>>> 
>>> 
>>> .
>> 
>> 
>> 
>> 
>> 
>> 
> 
> -- 
> *Beware the fisherman who's casting out his line in to a dried up riverbed.*
> *Oh don't try to tell him 'cause he won't believe. Throw some bread to the
> ducks instead.*
> *It's easier that way. *- Genesis : Duke : VI 25-28