Posted to dev@trafficserver.apache.org by Réjean Bouchard <rb...@nexweb.ca> on 2013/06/20 14:49:16 UTC

Want to get the original URL

What I want to do with the regex lookup:

 

At the moment, we can use a regex to find content in the cache, and the
matches are shown on a result page.  That page lists, in a table, the URL of
each file that satisfies the regular expression.  There are some "errors" on
that page.

 

1 - The URLs in that list are the URLs after the remap is done (the domain
name and path of the original URL may have changed).  Maybe that is normal,
but I think they should be the original ones.

 

2 - The links in each table entry are remapped URLs too.  This is why the
links do not work.  They should be the original URLs too.

 

3 - The checkboxes have the same problem.

 

When we click one of those links, we get an "error page" with the remapped
URL written in a big red font and the following message: "Cache Lookup Failed,
or missing in cluster".  This is because the link uses the remapped URL
instead of the original one.  If we manually replace the link with the
original URL, everything works.

So what I need to do is make the links on that page the original ones
instead of the remapped ones.  That's why I need to know where I can find
the original URL in the structures available to the functions that display
these regex lookup results.

 

4 - Finally, the same problem occurs when we check a checkbox and click the
"DELETE" button.

 

So can anybody tell me where I can find those original URLs?

 

Thank you!

 

Réjean Bouchard

Nexweb

RE: Want to get the original URL

Posted by Réjean Bouchard <rb...@nexweb.ca>.

By the way, this is to fix TS-1767.

Réjean Bouchard
Nexweb

-----Original Message-----
From: Réjean Bouchard [mailto:rbouchard@nexweb.ca]
Sent: June 20, 2013 16:46
To: dev@trafficserver.apache.org
Cc: 'Leif Hedstrom'
Subject: RE: Want to get the original URL

The reason I'm looking for this is simple.  TS keeps multiple copies based
on the inbound domain.  Here is a way to prove it:

Create 2 domains, e.g. ts.mysite.com and ts2.mysite.com.
Remap those domains to www.mysite.com
Create a test.txt file with the text "first file"
Go to ts.mysite.com/test.txt  :  you will see "first file"
Change the test.txt content to "second file"
Go to ts2.mysite.com/test.txt  :  you will see "second file"
Clear the browser cache
Change the test.txt content to "third file"
Go to ts.mysite.com/test.txt  :  you will see "first file"
Go to ts2.mysite.com/test.txt  :  you will see "second file"
There is only one entry in the cache if you scan it with a regex search

So the reason you want to see the original request URL is to be able to
flush all the versions of test.txt.
Let's say that you have 15,000,000 cached images generated by users
and you want to purge from the cache every file that has certain values in
the URL (e.g. picture size 10X40).
Flushing the complete cache for that purpose can be trivial.  On the other
hand, having to generate a purge request for every image in the database is
not optimal and can be a pain.
Having the ability to purge by regex would be the best solution.
I'm fixing the web UI for this purpose.  Since the system returns only the
remapped URL, and it's not possible to purge a remapped URL, it's not very
useful.  I tried the HTTPInfo->request_url_get() function, but it returns
nothing, so I decided to ask here where the information is.

So, what would you think if I fixed TS so that this information is available
from that function?  Do you see a reason not to?

Réjean Bouchard
Nexweb

-----Original Message-----
From: Leif Hedstrom [mailto:zwoop@apache.org]
Sent: June 20, 2013 10:42
To: dev@trafficserver.apache.org
Cc: Réjean Bouchard
Subject: Re: Want to get the original URL

On 6/20/13 6:49 AM, Réjean Bouchard wrote:
> 4 - Finally, this is the same problem when we check the checkbox and 
> try to click on the "DELETE" button.
>
>   
>
> So does anybody tell me where i can find those originals URL?

Once in the cache, you cannot track it back to the "original URL" (I'm
fairly certain, at least). There's a simple reason for this: there are no
guarantees of a 1-to-1 mapping. It's entirely possible, and sometimes
likely, that 1,000 URLs can map to the same cache URL. Or 1,000,000
URLs...

If this is important to you, you can log both the pristine and remapped URL,
and build up some sort of relationship in an external system.

Cheers,

-- Leif
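
As a rough illustration of the "external system" idea above, here is a
minimal Python sketch that builds a reverse index from remapped URLs back to
the pristine client URLs seen for them.  The log format (one line per request,
pristine URL and remapped URL separated by whitespace) and the function name
build_reverse_index are assumptions made for this example; a real ATS log
format would need its own parsing.

```python
from collections import defaultdict


def build_reverse_index(log_lines):
    """Map each remapped (cache) URL to the set of pristine URLs seen for it.

    Assumes a hypothetical log format: "<pristine_url> <remapped_url>".
    """
    index = defaultdict(set)
    for line in log_lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        pristine, remapped = parts
        index[remapped].add(pristine)
    return index


# Example log: two client-facing hosts are remapped to the same origin URL.
log = [
    "http://ts.mysite.com/test.txt http://www.mysite.com/test.txt",
    "http://ts2.mysite.com/test.txt http://www.mysite.com/test.txt",
    "http://ts.mysite.com/a.png http://www.mysite.com/a.png",
]
index = build_reverse_index(log)
print(sorted(index["http://www.mysite.com/test.txt"]))
```

Because the mapping is many-to-one, each remapped URL maps to a set of
pristine URLs rather than to a single one.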

RE: Want to get the original URL

Posted by "Harris, Scott" <Sc...@sensis.com.au>.

I have been following this thread because I'm seeing the same results as Réjean; I also originally raised TS-1767, which Réjean mentioned earlier in the thread.

While running the same test as below, to validate that 2 different client URLs mapped to the same origin server cache only 1 object, I found that changing proxy.config.url_remap.pristine_host_hdr affects the outcome.

If proxy.config.url_remap.pristine_host_hdr = 0, the example below works and my issues with TS-1767 go away.
With proxy.config.url_remap.pristine_host_hdr = 1, I get an object cached for each client URL and TS-1767 is an issue.

Scott

-----Original Message-----
From: Leif Hedstrom [mailto:zwoop@apache.org] 
Sent: Friday, 21 June 2013 3:42 PM
To: Réjean Bouchard
Cc: dev@trafficserver.apache.org
Subject: Re: Want to get the original URL

On 6/20/13 2:46 PM, Réjean Bouchard wrote:
> The reason why I'm looking for this is simple.  The TS keep multiple 
> copies based on the inbound domain.  Here is a way to prouve this concept:
>
> Create 2 domain ex: ts.mysite.com and ts2.mysite.com.
> Remap those domains to www.mysite.com
> Create test.txt file with the text "first file"
> Go to ts.mysite.com/test.txt  :  you will see "first file"
> Change test.txt content to "second file"
> Go to ts2.mysite.com/test.txt  :  you will see "second file"
> Clear browser cache


Well, that's not how it's designed to behave, and I can not reproduce this in my own tests. This is what I have in my remap.config

map http://ts1.example.com  http://localhost:82
map http://ts2.example.com  http://localhost:82


I cleared the cache ("sudo traffic_server -Cclear"), and started it up:

$ curl -D - -H "Host: ts1.example.com" -H "Cache-Control: only-if-cached" http://localhost/test.txt
HTTP/1.1 504 Not Cached

$ curl -D - -H "Host: ts2.example.com" -H "Cache-Control: only-if-cached" http://localhost/test.txt
HTTP/1.1 504 Not Cached

Neither request gives a cache hit. Now I allow it to cache for the ts1.example.com domain:

$ curl -D - -H "Host: ts1.example.com" http://localhost/test.txt
HTTP/1.1 200 OK


Then the same tests as above:

$ curl -D - -H "Host: ts1.example.com" -H "Cache-Control: only-if-cached" http://localhost/test.txt
HTTP/1.1 200 OK

$ curl -D - -H "Host: ts2.example.com" -H "Cache-Control: only-if-cached" http://localhost/test.txt
HTTP/1.1 200 OK


I can also verify that both URLs give the same response. The Age:
headers (a good indicator) are identical, and I do not see an origin request for more than one request.


I have no idea why you are not getting this behavior. What you are
experiencing is simply not how it works. A *wild* guess is that you are
maybe doing Vary: on some headers, and that causes it to create
different entries for various requests (which is as it should be).

-- Leif



> Change test.txt content to "third file"
> Go to ts.mysite.com/test.txt  :  you will see "first file"
> Go to ts2.mysite.com/test.txt  :  you will see "second file"
> There is only one entry in cache if you scan it from regex search
>
> So, the reason why you want to be able to see the original URL request is to
> be able to flush all the version of test.txt.
> Let say that you have a 15,000,000 images cached that is generated by users
> and you want to purge the cache of every file that have some values in the
> URL (ex: picture size 10X40).
> Flushing the complete cache for that purpose can be trivial.  In the other
> hand, having to generate a purge request for every image in the database is
> not the optimal way and can be a pain.
> Now, having the ability to purge from a regex can be the optimal and the
> best solution.
> I'm fixing the webUI for this purpose.  And since the system return only the
> remapped URL and it's not possible to purge a remapped URL, it's not very
> usefull.  I try the HTTPInfo->request_url_get() function return nothing, I
> decided to ask here where the info was.
>
> So, what would think if I fix the TS so this information may be available by
> the function?  Do you see a reason why not?
>
>
> Réjean Bouchard
> Nexweb
>
>
> -----Original Message-----
> From: Leif Hedstrom [mailto:zwoop@apache.org]
> Sent: June 20, 2013 10:42
> To: dev@trafficserver.apache.org
> Cc: Réjean Bouchard
> Subject: Re: Want to get the original URL
>
> On 6/20/13 6:49 AM, Réjean Bouchard wrote:
>> 4 - Finally, this is the same problem when we check the checkbox and
>> try to click on the "DELETE" button.
>>
>>
>>
>> So does anybody tell me where i can find those originals URL?
>
> Once in the cache, you can not track it back to the "original URL" (I'm
> fairly certain at least). There's a simple reason for this: There are no
> guarantees of a 1-to-1 mapping. It's entirely possible, and sometimes
> likely,  that 1,000 URLs can map to the same cache URL. Or 1,000,000 million
> URLs...
>
> If this is important to you, you can log both the pristine and remapped URL,
> and build up some sort of relationship in an external system.
>
> Cheers,
>
> -- Leif
>
>

RE: Want to get the original URL

Posted by Réjean Bouchard <rb...@nexweb.ca>.

I have another question for you: I found a member of URLImpl named
"the_request".  It's apparently designed to contain the original request.  If
ENABLE_SAVE_ORIGINAL_REQUEST is defined to a positive value, TS is
already able to initialise it.  Is that a good way to save the information
I need?  If not, what do you think would be a good way to do that?

Thank you!
Réjean Bouchard
Nexweb

-----Original Message-----
From: Leif Hedstrom [mailto:zwoop@apache.org]
Sent: June 21, 2013 11:03
To: dev@trafficserver.apache.org
Cc: Réjean Bouchard
Subject: Re: Want to get the original URL

On 6/21/13 6:44 AM, Réjean Bouchard wrote:
> In my case,  the cache_vary_headers is off.  Here is all "vary" lines 
> in my records.config :
>
> CONFIG proxy.config.http.cache.enable_default_vary_headers INT 0
> CONFIG proxy.config.http.cache.vary_default_text STRING NULL
> CONFIG proxy.config.http.cache.vary_default_images STRING NULL
> CONFIG proxy.config.http.cache.vary_default_other STRING NULL
>
> I attach the entire file with this mail.  I currently use version 3.2.4.
> What version did you use?  The question is: how can we purge a file if 
> the original URL is not available?

If you send a "PURGE" request it also goes through the remap rules, and will
purge the appropriate entry in cache.

     curl -X PURGE http://ts1.example.com/test.txt

I don't know if the cache inspector does the same (I'd imagine it would, but
not sure, I never use it).

-- Leif
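
For anyone scripting this rather than using curl, a minimal Python sketch of
the same PURGE request follows.  It uses only the standard library; the helper
names build_purge_request and purge are made up for this example, and it
assumes, as with the curl command above, that the host name resolves to the
Traffic Server instance.

```python
import urllib.error
import urllib.request


def build_purge_request(url):
    """Build (but do not send) an HTTP request that uses the PURGE method."""
    return urllib.request.Request(url, method="PURGE")


def purge(url, timeout=5.0):
    """Send the PURGE request and return the HTTP status code."""
    req = build_purge_request(url)
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # Non-2xx answers are raised as exceptions; report their status code.
        return err.code


# Building the request needs no network access; purge() would actually send it.
req = build_purge_request("http://ts1.example.com/test.txt")
print(req.get_method(), req.full_url)
```

The request uses the client-facing URL, so it goes through the remap rules
just like a normal request.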

Re: Want to get the original URL

Posted by Leif Hedstrom <zw...@apache.org>.

On 6/26/13 10:04 AM, Réjean Bouchard wrote:
>
> Try that, you'll probably understand well.

yeah, I get it now. James pointed it out to me. The problem is that the 
cache inspector really ought to "delete" based on the cache key, and not a 
URL.... Can you file an RFE on that please? (Make sure there are no other 
bugs on a similar / same topic)

-- Leif

Re: Want to get the original URL

Posted by Leif Hedstrom <zw...@apache.org>.

On 6/26/13 10:04 AM, Réjean Bouchard wrote:
> Maybe my explanations are not clear enough.
>
> We always have a lot of images in cache.  That's why we have to do a regex
> search using the webUI ( http://exemple_domain/cache/lookup_regex_form ).
>  From that result page we obtain after the search, it's impossible to do a
> URL lookup by clicking the name of the file or delete a file by checking a
> box and clicking the "delete" button at the end of the page.  We not
> necessary know the full name of all of images.  That's why we want to use
> the regex search result page.

Fwiw, what I would do, if it was me, would be to log all client request
URLs, and produce a little DB (anything will do), and then apply the regex
to that list of URLs when you need to purge based on regexes. Then simply
PURGE those URLs that match.
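
A minimal Python sketch of that approach, under stated assumptions: the
"little DB" is just an in-memory list of logged client URLs, the function name
urls_to_purge is made up, and the URLs and the /10x40/ pattern are
illustrative only.

```python
import re


def urls_to_purge(logged_urls, pattern):
    """Return the distinct logged URLs matching the regex, in sorted order."""
    rx = re.compile(pattern)
    return sorted({url for url in logged_urls if rx.search(url)})


# Hypothetical client request URLs collected from the access log.
logged = [
    "http://ts.mysite.com/img/10x40/cat.jpg",
    "http://ts2.mysite.com/img/10x40/cat.jpg",
    "http://ts.mysite.com/img/200x200/cat.jpg",
    "http://ts.mysite.com/img/10x40/cat.jpg",  # duplicate hit on the same URL
]

# Print one PURGE command per matching client URL.
for url in urls_to_purge(logged, r"/10x40/"):
    print("curl -X PURGE", url)
```

The regex runs over the external URL list, not over the cache itself, so the
cost is independent of the cache size on disk.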

The "problem" with the ATS cache is that it consumes and stores minimal
amounts of RAM (10 bytes per cache object). This is certainly by design, to
allow for very large caches. Adding additional information, such as lists of
active URLs, is not part of the design (and hopefully never will be). For
these special cases, do special things outside of the cache :).

One interesting idea that GoDaddy implements in their CDN is to create
generation IDs for certain content. They do it per domain, but you could do
it based on e.g. a prefix of the URL path. It allows for instant (O(1)) cache
invalidation of large portions of the cache. This requires a custom plugin,
of course.

-- Leif

Re: Want to get the original URL

Posted by Leif Hedstrom <zw...@apache.org>.

On 6/26/13 10:04 AM, Réjean Bouchard wrote:
> Maybe my explanations are not clear enough.
>
> We always have a lot of images in cache.  That's why we have to do a regex
> search using the webUI ( http://exemple_domain/cache/lookup_regex_form ).
>  From that result page we obtain after the search, it's impossible to do a
> URL lookup by clicking the name of the file or delete a file by checking a
> box and clicking the "delete" button at the end of the page.  We not
> necessary know the full name of all of images.  That's why we want to use
> the regex search result page.

This is almost guaranteed to not work. Doing a regex lookup over any 
sizeable cache is incredibly expensive. You have to traverse the entire 
cache on every lookup.
>
> Try that, you'll probably understand well.

Ok, I'll take a look.

-- leif

RE: Want to get the original URL

Posted by Réjean Bouchard <rb...@nexweb.ca>.

Maybe my explanations are not clear enough.

We always have a lot of images in the cache.  That's why we have to do a regex
search using the web UI ( http://exemple_domain/cache/lookup_regex_form ).
From the result page we obtain after the search, it's impossible to do a
URL lookup by clicking the name of a file, or to delete a file by checking a
box and clicking the "delete" button at the end of the page.  We do not
necessarily know the full names of all the images.  That's why we want to use
the regex search result page.

Try that, and you'll probably understand.

Thanks for your help.
Réjean Bouchard
Nexweb

-----Original Message-----
From: Leif Hedstrom [mailto:zwoop@apache.org]
Sent: June 26, 2013 11:18
To: dev@trafficserver.apache.org
Cc: Réjean Bouchard
Subject: Re: Want to get the original URL

On 6/26/13 5:28 AM, Réjean Bouchard wrote:
> You're right!  But if you use the web UI, do a regex search and click 
> on one of the file name, it's impossible for the UI to do the same 
> PURGE, DELETE or URL_Lookup .  That why I need those "real file name" 
> to be saved in the cache.

So I still don't understand this. When I use the cache inspector, it works
just as expected. I search a URL, and it will show a page like this:

     http://people.apache.org/~zwoop/ATS/cache-inspector.png

If I click the "Delete URL" button, all those alternates are nuked. Doing a
quick search after doing the Delete, I get this from the CI:

     http://www.boot.org

     Cache Lookup Failed, or missing in cluster

/me confused. You really don't need the original URLs to purge items out of
the cache, but I'm guessing I'm missing something here ...

-- Leif

Re: Want to get the original URL

Posted by Leif Hedstrom <zw...@apache.org>.

On 6/26/13 5:28 AM, Réjean Bouchard wrote:
> You're right!  But if you use the web UI, do a regex search and click on one
> of the file name, it's impossible for the UI to do the same PURGE, DELETE or
> URL_Lookup .  That why I need those "real file name" to be saved in the
> cache.

So I still don't understand this. When I use the cache inspector, it works 
just as expected. I search a URL, and it will show a page like this:

     http://people.apache.org/~zwoop/ATS/cache-inspector.png

If I click the "Delete URL" button, all those alternates are nuked. Doing a 
quick search after doing the Delete, I get this from the CI:

     http://www.boot.org

     Cache Lookup Failed, or missing in cluster

/me confused. You really don't need the original URLs to purge items out of 
the cache, but I'm guessing I'm missing something here ...

-- Leif

Re: Want to get the original URL

Posted by Leif Hedstrom <zw...@apache.org>.

On 6/26/13 10:33 AM, Réjean Bouchard wrote:
> I understand your point but there is legal obligation that might require you
> to know what's is inside your cache.  Let's say some illegal content get
> cached but you might not know the full url of that.  If you want to purge
> that and are only aware of the domain.  You may want to purge only the
> content from that domain and not the full cache.  Let say you have 75000000
> images and the TS is caching dynamic resized image generator.  There is
> valid reason to do it on the fly and you might not be aware of what your
> cache currently hold and one of those image is simply illegal.  In that case
> it's really usefull to be able to search the cache to be sure to delete the
> file.

If you want to purge that much, and can do generation IDs by e.g. domains, 
you should do that. E.g.

     https://github.com/godaddy/ats-plugin-cache-key-genid

This is by far the best solution for massive purges, but puts constraints on 
how you partition your cache data (by domain, or by URL prefixes or 
something that you can impose a generation ID upon).
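
To make the generation ID idea concrete, here is a toy Python model.  It is
not the plugin's implementation; the class name GenerationKeys, the key layout
and the use of SHA-256 are all assumptions chosen for illustration.  The point
is only that the generation number is part of the cache key, so bumping it
makes every old key for that domain unreachable in one step.

```python
import hashlib


class GenerationKeys:
    """Toy model: cache keys that include a per-domain generation ID."""

    def __init__(self):
        self._generation = {}  # domain -> current generation ID

    def cache_key(self, domain, path):
        gen = self._generation.get(domain, 0)
        raw = f"{domain}|{gen}|{path}".encode("utf-8")
        return hashlib.sha256(raw).hexdigest()

    def invalidate(self, domain):
        # O(1): bump a single counter instead of visiting every object.
        self._generation[domain] = self._generation.get(domain, 0) + 1


keys = GenerationKeys()
cache = {}  # stand-in for the real cache: key -> object
cache[keys.cache_key("www.mysite.com", "/a.png")] = b"old image"

keys.invalidate("www.mysite.com")

# After the bump, lookups compute a different key and miss the old entry.
print(keys.cache_key("www.mysite.com", "/a.png") in cache)
```

In this model the old objects are never deleted explicitly; they simply stop
being found, and would have to age out of the cache on their own.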

If you have a cache of the size I think you have, the regex searches will
take days or even weeks to complete. Probably not what you want :). You can
still do what I suggested: presumably you log all requests, so building a
little DB over all URLs that have hit the cache is not difficult, and you can
manage that in some way (outside of ATS) such that it's optimal for your use
case.

-- leif

RE: Want to get the original URL

Posted by Réjean Bouchard <rb...@nexweb.ca>.

I understand your point, but there are legal obligations that might require you
to know what is inside your cache.  Let's say some illegal content gets
cached, but you do not know its full URL.  If you want to purge it and
are only aware of the domain, you may want to purge only the content from
that domain and not the full cache.  Let's say you have 75,000,000 images
and TS is caching a dynamic resized-image generator.  There are valid
reasons to generate images on the fly, and you might not be aware of what
your cache currently holds when one of those images is simply illegal.  In
that case it's really useful to be able to search the cache to be sure to
delete the file.
I think that fixing the admin is not that much work.  It's a great tool for
us and for others.  I agree that some sites with fully database-driven
content can do their purges one by one.  But that's not always the case.
Maybe there is another way to solve this, but for the moment, fixing the admin
is the fastest way.

Thanks
Réjean Bouchard
Nexweb

-----Original Message-----
From: James Peach [mailto:jpeach@apache.org]
Sent: June 26, 2013 11:42
To: dev
Subject: Re: Want to get the original URL

On Jun 26, 2013, at 4:28 AM, Réjean Bouchard <rb...@nexweb.ca> wrote:

> You're right!  But if you use the web UI, do a regex search and click 
> on one of the file name, it's impossible for the UI to do the same 
> PURGE, DELETE or URL_Lookup .  That why I need those "real file name" 
> to be saved in the cache.

Does your workflow really depend on being able to search the cache with a
regex? The recommended way to remove objects from the cache is to use the
PURGE method.

My opinion is that the web UI should be removed and that cache management
tools should be separated from the Traffic Server core. That's low priority
and a bit tricky due to some implementation details, but eventually I hope
to remove it :)

J

> 
> Thank You!
> Réjean Bouchard
> Nexweb
> 
> -----Original Message-----
> From: Leif Hedstrom [mailto:zwoop@apache.org]
> Sent: June 21, 2013 11:03
> To: dev@trafficserver.apache.org
> Cc: Réjean Bouchard
> Subject: Re: Want to get the original URL
> 
> On 6/21/13 6:44 AM, Réjean Bouchard wrote:
>> In my case,  the cache_vary_headers is off.  Here is all "vary" lines 
>> in my records.config :
>> 
>> CONFIG proxy.config.http.cache.enable_default_vary_headers INT 0
>> CONFIG proxy.config.http.cache.vary_default_text STRING NULL
>> CONFIG proxy.config.http.cache.vary_default_images STRING NULL
>> CONFIG proxy.config.http.cache.vary_default_other STRING NULL
>> 
>> I attach the entire file with this mail.  I currently use version 3.2.4.
>> What version did you use?  The question is: how can we purge a file 
>> if the original URL is not available?
> 
> If you send a "PURGE" request it also goes through the remap rules, 
> and will purge the appropriate entry in cache.
> 
>     curl -X PURGE http://ts1.example.com/test.txt
> 
> I don't know if the cache inspector does the same (I'd imagine it 
> would, but not sure, I never use it).
> 
> -- Leif

Re: Want to get the original URL

Posted by James Peach <jp...@apache.org>.

On Jun 26, 2013, at 4:28 AM, Réjean Bouchard <rb...@nexweb.ca> wrote:

> You're right!  But if you use the web UI, do a regex search and click on one
> of the file name, it's impossible for the UI to do the same PURGE, DELETE or
> URL_Lookup .  That why I need those "real file name" to be saved in the
> cache.

Does your workflow really depend on being able to search the cache with a regex? The recommended way to remove objects from the cache is to use the PURGE method.

My opinion is that the web UI should be removed and that cache management tools should be separated from the Traffic Server core. That's low priority and a bit tricky due to some implementation details, but eventually I hope to remove it :)

J

> 
> Thank You!
> Réjean Bouchard
> Nexweb
> 
> -----Original Message-----
> From: Leif Hedstrom [mailto:zwoop@apache.org]
> Sent: June 21, 2013 11:03
> To: dev@trafficserver.apache.org
> Cc: Réjean Bouchard
> Subject: Re: Want to get the original URL
> 
> On 6/21/13 6:44 AM, Réjean Bouchard wrote:
>> In my case,  the cache_vary_headers is off.  Here is all "vary" lines 
>> in my records.config :
>> 
>> CONFIG proxy.config.http.cache.enable_default_vary_headers INT 0
>> CONFIG proxy.config.http.cache.vary_default_text STRING NULL
>> CONFIG proxy.config.http.cache.vary_default_images STRING NULL
>> CONFIG proxy.config.http.cache.vary_default_other STRING NULL
>> 
>> I attach the entire file with this mail.  I currently use version 3.2.4.
>> What version did you use?  The question is: how can we purge a file if 
>> the original URL is not available?
> 
> If you send a "PURGE" request it also goes through the remap rules, and will
> purge the appropriate entry in cache.
> 
>     curl -X PURGE http://ts1.example.com/test.txt
> 
> I don't know if the cache inspector does the same (I'd imagine it would, but
> not sure, I never use it).
> 
> -- Leif

RE: Want to get the original URL

Posted by Réjean Bouchard <rb...@nexweb.ca>.

You're right!  But if you use the web UI, do a regex search and click on one
of the file names, it's impossible for the UI to do the same PURGE, DELETE or
URL_Lookup.  That's why I need those "real file names" to be saved in the
cache.

Thank You!
Réjean Bouchard
Nexweb

-----Original Message-----
From: Leif Hedstrom [mailto:zwoop@apache.org]
Sent: June 21, 2013 11:03
To: dev@trafficserver.apache.org
Cc: Réjean Bouchard
Subject: Re: Want to get the original URL

On 6/21/13 6:44 AM, Réjean Bouchard wrote:
> In my case,  the cache_vary_headers is off.  Here is all "vary" lines 
> in my records.config :
>
> CONFIG proxy.config.http.cache.enable_default_vary_headers INT 0
> CONFIG proxy.config.http.cache.vary_default_text STRING NULL
> CONFIG proxy.config.http.cache.vary_default_images STRING NULL
> CONFIG proxy.config.http.cache.vary_default_other STRING NULL
>
> I attach the entire file with this mail.  I currently use version 3.2.4.
> What version did you use?  The question is: how can we purge a file if 
> the original URL is not available?

If you send a "PURGE" request it also goes through the remap rules, and will
purge the appropriate entry in cache.

     curl -X PURGE http://ts1.example.com/test.txt

I don't know if the cache inspector does the same (I'd imagine it would, but
not sure, I never use it).

-- Leif

Re: Want to get the original URL

Posted by Leif Hedstrom <zw...@apache.org>.

On 6/21/13 6:44 AM, Réjean Bouchard wrote:
> In my case,  the cache_vary_headers is off.  Here is all "vary" lines in my
> records.config :
>
> CONFIG proxy.config.http.cache.enable_default_vary_headers INT 0
> CONFIG proxy.config.http.cache.vary_default_text STRING NULL
> CONFIG proxy.config.http.cache.vary_default_images STRING NULL
> CONFIG proxy.config.http.cache.vary_default_other STRING NULL
>
> I attach the entire file with this mail.  I currently use version 3.2.4.
> What version did you use?  The question is: how can we purge a file if the
> original URL is not available?

If you send a "PURGE" request it also goes through the remap rules, and 
will purge the appropriate entry in cache.

     curl -X PURGE http://ts1.example.com/test.txt

I don't know if the cache inspector does the same (I'd imagine it would, 
but not sure, I never use it).

-- Leif

RE: Want to get the original URL

Posted by Réjean Bouchard <rb...@nexweb.ca>.

In my case, cache_vary_headers is off.  Here are all the "vary" lines in my
records.config:

CONFIG proxy.config.http.cache.enable_default_vary_headers INT 0
CONFIG proxy.config.http.cache.vary_default_text STRING NULL
CONFIG proxy.config.http.cache.vary_default_images STRING NULL
CONFIG proxy.config.http.cache.vary_default_other STRING NULL

I attached the entire file to this mail.  I currently use version 3.2.4.
Which version did you use?  The question is: how can we purge a file if the
original URL is not available?

Thanks for your time and help.

Réjean Bouchard
Nexweb


-----Original Message-----
From: Leif Hedstrom [mailto:zwoop@apache.org]
Sent: June 21, 2013 01:42
To: Réjean Bouchard
Cc: dev@trafficserver.apache.org
Subject: Re: Want to get the original URL

On 6/20/13 2:46 PM, Réjean Bouchard wrote:
> The reason why I'm looking for this is simple.  The TS keep multiple 
> copies based on the inbound domain.  Here is a way to prouve this concept:
>
> Create 2 domain ex: ts.mysite.com and ts2.mysite.com.
> Remap those domains to www.mysite.com
> Create test.txt file with the text "first file"
> Go to ts.mysite.com/test.txt  :  you will see "first file"
> Change test.txt content to "second file"
> Go to ts2.mysite.com/test.txt  :  you will see "second file"
> Clear browser cache


Well, that's not how it's designed to behave, and I can not reproduce this
in my own tests. This is what I have in my remap.config

map http://ts1.example.com  http://localhost:82
map http://ts2.example.com  http://localhost:82


I cleared the cache ("sudo traffic_server -Cclear"), and started it up:

$ curl -D - -H "Host: ts1.example.com" -H "Cache-Control: only-if-cached" http://localhost/test.txt
HTTP/1.1 504 Not Cached

$ curl -D - -H "Host: ts2.example.com" -H "Cache-Control: only-if-cached" http://localhost/test.txt
HTTP/1.1 504 Not Cached

Neither request gives a cache hit. Now I allow it to cache for the
ts1.example.com domain:

$ curl -D - -H "Host: ts1.example.com" http://localhost/test.txt
HTTP/1.1 200 OK


Then the same tests as above:

$ curl -D - -H "Host: ts1.example.com" -H "Cache-Control: only-if-cached" http://localhost/test.txt
HTTP/1.1 200 OK

$ curl -D - -H "Host: ts2.example.com" -H "Cache-Control: only-if-cached" http://localhost/test.txt
HTTP/1.1 200 OK


I can also verify that both URLs give the same response. The Age:
headers (a good indicator) are identical, and I do not see an origin request
for more than one request.


I have no idea why you are not getting this behavior. What you are
experiencing is simply not how it works. A *wild* guess is that you are
maybe doing Vary: on some headers, and that causes it to create
different entries for various requests (which is as it should be).

-- Leif



> Change test.txt content to "third file"
> Go to ts.mysite.com/test.txt  :  you will see "first file"
> Go to ts2.mysite.com/test.txt  :  you will see "second file"
> There is only one entry in cache if you scan it from regex search
>
> So, the reason why you want to be able to see the original URL request is to
> be able to flush all the version of test.txt.
> Let say that you have a 15,000,000 images cached that is generated by users
> and you want to purge the cache of every file that have some values in the
> URL (ex: picture size 10X40).
> Flushing the complete cache for that purpose can be trivial.  In the other
> hand, having to generate a purge request for every image in the database is
> not the optimal way and can be a pain.
> Now, having the ability to purge from a regex can be the optimal and the
> best solution.
> I'm fixing the webUI for this purpose.  And since the system return only the
> remapped URL and it's not possible to purge a remapped URL, it's not very
> usefull.  I try the HTTPInfo->request_url_get() function return nothing, I
> decided to ask here where the info was.
>
> So, what would think if I fix the TS so this information may be available by
> the function?  Do you see a reason why not?
>
>
> Réjean Bouchard
> Nexweb
>
>
> -----Original Message-----
> From: Leif Hedstrom [mailto:zwoop@apache.org]
> Sent: June 20, 2013 10:42
> To: dev@trafficserver.apache.org
> Cc: Réjean Bouchard
> Subject: Re: Want to get the original URL
>
> On 6/20/13 6:49 AM, Réjean Bouchard wrote:
>> 4 - Finally, this is the same problem when we check the checkbox and
>> try to click on the "DELETE" button.
>>
>>
>>
>> So does anybody tell me where i can find those originals URL?
>
> Once in the cache, you can not track it back to the "original URL" (I'm
> fairly certain at least). There's a simple reason for this: There are no
> guarantees of a 1-to-1 mapping. It's entirely possible, and sometimes
> likely,  that 1,000 URLs can map to the same cache URL. Or 1,000,000 million
> URLs...
>
> If this is important to you, you can log both the pristine and remapped URL,
> and build up some sort of relationship in an external system.
>
> Cheers,
>
> -- Leif
>
>

Re: Want to get the original URL

Posted by Leif Hedstrom <zw...@apache.org>.

On 6/20/13 2:46 PM, Réjean Bouchard wrote:
> The reason why I'm looking for this is simple.  The TS keep multiple copies
> based on the inbound domain.  Here is a way to prouve this concept:
>
> Create 2 domain ex: ts.mysite.com and ts2.mysite.com.
> Remap those domains to www.mysite.com
> Create test.txt file with the text "first file"
> Go to ts.mysite.com/test.txt  :  you will see "first file"
> Change test.txt content to "second file"
> Go to ts2.mysite.com/test.txt  :  you will see "second file"
> Clear browser cache


Well, that's not how it's designed to behave, and I can not reproduce 
this in my own tests. This is what I have in my remap.config

map http://ts1.example.com  http://localhost:82
map http://ts2.example.com  http://localhost:82


I cleared the cache ("sudo traffic_server -Cclear"), and started it up:

$ curl -D - -H "Host: ts1.example.com" -H "Cache-Control: 
only-if-cached" http://localhost/test.txt
HTTP/1.1 504 Not Cached

$ curl -D - -H "Host: ts2.example.com" -H "Cache-Control: only-if-cached" http://localhost/test.txt
HTTP/1.1 504 Not Cached

Neither request gives a cache hit. Now I allow it to cache for the
ts1.example.com domain:

$ curl -D - -H "Host: ts1.example.com" http://localhost/test.txt
HTTP/1.1 200 OK


Then the same tests as above:

$ curl -D - -H "Host: ts1.example.com" -H "Cache-Control: only-if-cached" http://localhost/test.txt
HTTP/1.1 200 OK

$ curl -D - -H "Host: ts2.example.com" -H "Cache-Control: only-if-cached" http://localhost/test.txt
HTTP/1.1 200 OK


I can also verify that both URLs give the same response. The Age:
headers (a good indicator) are identical, and I see an origin request
for only one of the requests.


I have no idea why you are not getting this behavior. What you are
experiencing is simply not how it works. A *wild* guess is that you are
maybe doing Vary: on some headers, and that causes it to create
different entries for various requests (which is as it should be).
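
To illustrate the Vary: guess above, here is a minimal Python sketch of how a cache can end up with several variants of one URL. It is a simplified model of HTTP secondary cache keys, not Traffic Server's actual implementation; the function name `variant_key` and the sample headers are hypothetical.

```python
def variant_key(url, vary_header, request_headers):
    """Return (url, selected header values) as a simplified secondary cache key.

    Hypothetical helper: models the idea that a response carrying "Vary: X"
    is stored per distinct value of request header X. Not ATS code.
    """
    # Header names are case-insensitive, so normalize them first.
    lowered = {name.lower(): value for name, value in request_headers.items()}
    names = sorted(n.strip().lower() for n in vary_header.split(",") if n.strip())
    return (url, tuple((n, lowered.get(n, "")) for n in names))


url = "http://localhost:82/test.txt"  # remapped URL from the example above

key_gzip = variant_key(url, "Accept-Encoding", {"Accept-Encoding": "gzip"})
key_plain = variant_key(url, "Accept-Encoding", {})
# A different Host header does not matter unless Vary names it.
key_gzip_again = variant_key(
    url, "accept-encoding", {"accept-encoding": "gzip", "Host": "ts2.example.com"}
)

print(key_gzip == key_gzip_again)  # same variant
print(key_gzip == key_plain)       # different variant
```

In this model, two clients see different cached copies only when a header listed in Vary: differs between their requests, regardless of which inbound domain they used.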

-- Leif



> Change test.txt content to "third file"
> Go to ts.mysite.com/test.txt  :  you will see "first file"
> Go to ts2.mysite.com/test.txt  :  you will see "second file"
> There is only one entry in cache if you scan it from regex search
>
> So, the reason to see the original request URL is to be able to flush all
> the versions of test.txt.
> Let's say that you have 15,000,000 cached images that are generated by users
> and you want to purge from the cache every file that has some value in the
> URL (e.g. picture size 10X40).
> Flushing the complete cache for that purpose can be trivial.  On the other
> hand, having to generate a purge request for every image in the database is
> not the optimal way and can be a pain.
> Now, having the ability to purge from a regex can be the optimal and the
> best solution.
> I'm fixing the webUI for this purpose.  Since the system returns only the
> remapped URL and it's not possible to purge a remapped URL, it's not very
> useful.  I tried the HTTPInfo->request_url_get() function, but it returns
> nothing, so I decided to ask here where the info is.
>
> So, what would you think if I fix TS so this information is available from
> the function?  Do you see a reason why not?
>
>
> Réjean Bouchard
> Nexweb
>
>
> -----Original Message-----
> From: Leif Hedstrom [mailto:zwoop@apache.org]
> Sent: June 20, 2013 10:42
> To: dev@trafficserver.apache.org
> Cc: Réjean Bouchard
> Subject: Re: Want to get the original URL
>
> On 6/20/13 6:49 AM, Réjean Bouchard wrote:
>> 4 - Finally, this is the same problem when we check the checkbox and
>> try to click on the "DELETE" button.
>>
>>
>>
>> So does anybody tell me where i can find those originals URL?
>
> Once in the cache, you can not track it back to the "original URL" (I'm
> fairly certain at least). There's a simple reason for this: There are no
> guarantees of a 1-to-1 mapping. It's entirely possible, and sometimes
> likely,  that 1,000 URLs can map to the same cache URL. Or 1,000,000 million
> URLs...
>
> If this is important to you, you can log both the pristine and remapped URL,
> and build up some sort of relationship in an external system.
>
> Cheers,
>
> -- Leif
>
>

RE: Want to get the original URL

Posted by Réjean Bouchard <rb...@nexweb.ca>.

The reason why I'm looking for this is simple.  TS keeps multiple copies
based on the inbound domain.  Here is a way to prove this concept:

Create 2 domains, e.g. ts.mysite.com and ts2.mysite.com.
Remap those domains to www.mysite.com
Create test.txt file with the text "first file"
Go to ts.mysite.com/test.txt  :  you will see "first file"
Change test.txt content to "second file"
Go to ts2.mysite.com/test.txt  :  you will see "second file"
Clear browser cache
Change test.txt content to "third file"
Go to ts.mysite.com/test.txt  :  you will see "first file"
Go to ts2.mysite.com/test.txt  :  you will see "second file"
There is only one entry in cache if you scan it from regex search

So, the reason to see the original request URL is to be able to flush all
the versions of test.txt.
Let's say that you have 15,000,000 cached images that are generated by users
and you want to purge from the cache every file that has some value in the
URL (e.g. picture size 10X40).
Flushing the complete cache for that purpose can be trivial.  On the other
hand, having to generate a purge request for every image in the database is
not the optimal way and can be a pain.
Now, having the ability to purge from a regex can be the optimal and the
best solution.
I'm fixing the webUI for this purpose.  Since the system returns only the
remapped URL and it's not possible to purge a remapped URL, it's not very
useful.  I tried the HTTPInfo->request_url_get() function, but it returns
nothing, so I decided to ask here where the info is.
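
A rough Python sketch of the regex-driven purge described above, assuming a list of original (pre-remap) URLs is already available from somewhere. The helper name `build_purge_requests` and the sample image URLs are hypothetical; the sketch only builds PURGE requests and does not send them.

```python
import re
import urllib.request


def build_purge_requests(urls, pattern):
    """Return one unsent HTTP PURGE request per URL matching the regex."""
    regex = re.compile(pattern)
    return [
        urllib.request.Request(url, method="PURGE")
        for url in urls
        if regex.search(url)
    ]


# Hypothetical original URLs; in practice these would come from wherever
# the pre-remap URLs are recorded.
candidate_urls = [
    "http://ts.mysite.com/img/10x40/a.png",
    "http://ts.mysite.com/img/80x80/a.png",
    "http://ts2.mysite.com/img/10x40/b.png",
]

purge_requests = build_purge_requests(candidate_urls, r"/10x40/")
for req in purge_requests:
    print(req.get_method(), req.full_url)
```

Each request could then be sent to the proxy with `urllib.request.urlopen(req)`; that step is left out here because it depends on the deployment.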

So, what would you think if I fix TS so that this information is available
from the function?  Do you see a reason why not?

Réjean Bouchard
Nexweb

-----Original Message-----
From: Leif Hedstrom [mailto:zwoop@apache.org]
Sent: June 20, 2013 10:42
To: dev@trafficserver.apache.org
Cc: Réjean Bouchard
Subject: Re: Want to get the original URL

On 6/20/13 6:49 AM, Réjean Bouchard wrote:
> 4 - Finally, this is the same problem when we check the checkbox and 
> try to click on the "DELETE" button.
>
>   
>
> So does anybody tell me where i can find those originals URL?

Once in the cache, you can not track it back to the "original URL" (I'm
fairly certain at least). There's a simple reason for this: There are no
guarantees of a 1-to-1 mapping. It's entirely possible, and sometimes
likely,  that 1,000 URLs can map to the same cache URL. Or 1,000,000 million
URLs...

If this is important to you, you can log both the pristine and remapped URL,
and build up some sort of relationship in an external system.

Cheers,

-- Leif

Re: Want to get the original URL

Posted by Leif Hedstrom <zw...@apache.org>.

On 6/20/13 6:49 AM, Réjean Bouchard wrote:
> 4 - Finally, this is the same problem when we check the checkbox and try to
> click on the "DELETE" button.
>
>   
>
> So does anybody tell me where i can find those originals URL?

Once in the cache, you can not track it back to the "original URL" (I'm 
fairly certain at least). There's a simple reason for this: There are no 
guarantees of a 1-to-1 mapping. It's entirely possible, and sometimes 
likely,  that 1,000 URLs can map to the same cache URL. Or 1,000,000 
million URLs...

If this is important to you, you can log both the pristine and remapped 
URL, and build up some sort of relationship in an external system.
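
As a minimal sketch of that external relationship, the Python below indexes log lines by remapped URL. It assumes a hypothetical two-column log format ("pristine URL, space, remapped URL" per line); the function name `build_url_index` and the sample lines are illustrative, not an existing Traffic Server log format.

```python
from collections import defaultdict


def build_url_index(log_lines):
    """Map each remapped (cache) URL to the set of pristine URLs seen for it."""
    index = defaultdict(set)
    for line in log_lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        pristine, remapped = parts
        index[remapped].add(pristine)
    return index


# Hypothetical log lines: "<pristine URL> <remapped URL>"
sample_log = [
    "http://ts1.example.com/test.txt http://localhost:82/test.txt",
    "http://ts2.example.com/test.txt http://localhost:82/test.txt",
    "http://ts1.example.com/other.txt http://localhost:82/other.txt",
]

index = build_url_index(sample_log)
for remapped, pristine_urls in sorted(index.items()):
    print(remapped, "<-", sorted(pristine_urls))
```

Because many pristine URLs can share one cache URL, each entry holds a set rather than a single value.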

Cheers,

-- Leif