Posted to users@httpd.apache.org by "Vadim N. Lyalikov" <va...@yandex.ru> on 2004/11/02 22:02:43 UTC

[users@httpd] Re: [OBORONA-SPAM] RE: [users@httpd] escaped input mod_rewrite

Emyr Tomos writes:

>Surely if you are after search-engine-friendly URLs, the easiest way is
>to do it the other way round: have a bunch of "friendly" URLs, and
>either rewrite them to a script which takes the PATH_INFO and
>interprets it into your database-friendly strings or, easier still,
>simply put a script at the root of your site which takes in PATH_INFO.
>E.g. each URL is of the form
>http://server/script/cakes/wedding/3tier.html and /script gets
>/cakes/wedding/3tier.html, from which it extracts cakes, wedding and 3tier
>and turns them into a database-friendly URL. Where does the randomness
>come in? Surely the whole point is that you want specific database
>identifiers to be replaced by more generally understood terms like
>cakes, cars, golf clubs.
>Am I missing the point?
>  
>
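A minimal sketch of the PATH_INFO approach quoted above, written in Python (3.9+) for illustration. It assumes the handler receives PATH_INFO as a plain string and that the last segment carries a cosmetic ".html" suffix; the function name parse_path_info is hypothetical and not part of any Apache API:

```python
def parse_path_info(path_info: str) -> tuple[str, ...]:
    """Split a PATH_INFO value such as "/cakes/wedding/3tier.html"
    into its segments. Hypothetical helper for illustration."""
    # Drop empty segments caused by leading, trailing or doubled slashes.
    segments = [s for s in path_info.split("/") if s]
    if segments:
        # Strip the cosmetic ".html" suffix from the last segment.
        segments[-1] = segments[-1].removesuffix(".html")
    return tuple(segments)


print(parse_path_info("/cakes/wedding/3tier.html"))
# -> ('cakes', 'wedding', '3tier')
```

Mapping the returned terms to database identifiers is left to the application.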
Suppose I want the user (the baker) to be able to add *any* wedding cake
names he likes to the database dynamically, e.g. this pair of names:
"My-/?slashed-cake/name%2F" and
"My-/?slashed-cake%2Fname/",
exactly these names, awful for Apache people but beautiful to my
eccentric baker, with slashes, percent signs, question marks ...
Can I handle this situation with your scheme?
Vadim.
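For reference, a short Python sketch of what standard percent-encoding does with the two example names, using urllib.parse.quote with safe="" so that "/" is escaped too. The encoding round-trips, so the names stay distinct, but both results contain %2F. Apache 2.0.46 and later refuse request paths containing an encoded slash with a 404 unless AllowEncodedSlashes is set to On, which illustrates why arbitrary names are awkward to carry in the path:

```python
from urllib.parse import quote, unquote

names = [
    "My-/?slashed-cake/name%2F",
    "My-/?slashed-cake%2Fname/",
]

for name in names:
    # safe="" escapes "/" as well (quote() leaves "/" alone by default).
    encoded = quote(name, safe="")
    # Decoding gives back the exact original name.
    assert unquote(encoded) == name
    print(encoded)

# Output:
# My-%2F%3Fslashed-cake%2Fname%252F
# My-%2F%3Fslashed-cake%252Fname%2F
```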


---------------------------------------------------------------------
The official User-To-User support forum of the Apache HTTP Server Project.
See <URL:http://httpd.apache.org/userslist.html> for more info.
To unsubscribe, e-mail: users-unsubscribe@httpd.apache.org
   "   from the digest: users-digest-unsubscribe@httpd.apache.org
For additional commands, e-mail: users-help@httpd.apache.org


RE: [users@httpd] escaped input mod_rewrite

Posted by Tim Burden <ti...@burden.ca>.
I agree wholeheartedly. With respect to search engines, I have not seen any
evidence that they care about keywords in URLs, particularly Google. (I have
seen lots of people claim that they do, but no evidence.)

But they will certainly choke on special characters. Even if Google manages
to index the page, it will treat the page differently in terms of how it
passes on PageRank and so on. That's why base-64 isn't too good for URLs:
it has those '=' signs that spiders may think are part of dynamic URLs.
You're better off just using a numerical database ID in the URL. By the
way, this resolves a security issue as well, because you can be very
explicit about what kind of data you are passing to your database query.
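A minimal Python sketch of the numeric-ID approach, assuming URLs shaped like the /Cakes/78263478.htm example in this message; the regular expression and the name cake_id_from_path are hypothetical. Only ASCII digits are accepted, so whatever reaches the database is guaranteed to be an integer:

```python
import re
from typing import Optional

# Accept only "/Cakes/<ASCII digits>.htm"; re.ASCII stops \d from
# matching non-ASCII digits such as Arabic-Indic numerals.
CAKE_URL = re.compile(r"/Cakes/(\d+)\.htm", re.ASCII)


def cake_id_from_path(path: str) -> Optional[int]:
    """Return the numeric cake ID, or None if the path is not valid."""
    match = CAKE_URL.fullmatch(path)
    if match is None:
        return None
    return int(match.group(1))


print(cake_id_from_path("/Cakes/78263478.htm"))        # -> 78263478
print(cake_id_from_path("/Cakes/1;DROP TABLE x.htm"))  # -> None
```

The integer would still be passed to the query as a bound parameter rather than pasted into the SQL string.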

With respect to human-friendliness, there are really just two factors to
worry about: how long the URL is (e.g. does it break when you send it by
email?) and whether you can "hack" the URL to get back to some kind of
reasonable index. For example, if you had domain.com/Cakes/78263478.htm,
can you clip off the end piece and get a reasonable page at /Cakes/?
Beyond that, human users are not going to sit there and try to guess your
cake names any more than they are going to sit there and guess at
database IDs.


----- Original Message ----- 
From: "Robert Andersson" <ro...@profundis.nu>
To: <us...@httpd.apache.org>
Sent: Wednesday, November 03, 2004 11:43 AM
Subject: Re: [users@httpd] Re: [OBORONA-SPAM] Re: [users@httpd] Re:
[OBORONA-SPAM] RE: [users@httpd] escaped input mod_rewrite


> [plain text, please...]
>
> Vadim N. Lyalikov wrote:
> > I'll be glad to do something like base-64, but these URLs would not be
> > human friendly, and maybe not search engine friendly either. I think that
> > keywords (cake names ...) earn some points when a spider crawls the page.
>
> As pointed out, you will not be able to use arbitrary URIs. Even if some
> clients and Apache would deal with it, most of the search engines you care
> about will choke on them.
>
> My suggestion is to process the name (the key) first. E.g. make it all
> lower case, replace white space with "_" and other separators with "-",
> replace certain characters (umlauts etc.), and handle a few other special
> cases. Then, finally, remove any invalid characters.
>
> This should be done before inserting it in the database, where it must be
> unique, and then used when creating links.
>
> Regards,
> Robert Andersson


Re: [users@httpd] Re: [OBORONA-SPAM] Re: [users@httpd] Re: [OBORONA-SPAM] RE: [users@httpd] escaped input mod_rewrite

Posted by Robert Andersson <ro...@profundis.nu>.
[plain text, please...]

Vadim N. Lyalikov wrote:
> I'll be glad to do something like base-64, but these URLs would not be
> human friendly, and maybe not search engine friendly either. I think that
> keywords (cake names ...) earn some points when a spider crawls the page.

As pointed out, you will not be able to use arbitrary URIs. Even if some 
clients and Apache would deal with it, most of the search engines you care 
about will choke on them.

My suggestion is to process the name (the key) first. E.g. make it all
lower case, replace white space with "_" and other separators with "-",
replace certain characters (umlauts etc.), and handle a few other special
cases. Then, finally, remove any invalid characters.

This should be done before inserting it in the database, where it must be 
unique, and then used when creating links.
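A minimal Python (3.9+) sketch of the key-normalization steps suggested above. The transliteration table, the exact set of characters treated as separators, the "item" fallback and the -2, -3 suffix used to keep keys unique are all assumptions for illustration; in practice uniqueness would be enforced by a UNIQUE constraint in the database rather than an in-memory set:

```python
import re
import unicodedata

# Hypothetical transliteration table; the right choices are language-dependent.
SPECIAL = {"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"}


def make_key(name: str) -> str:
    """Turn a free-form name into a URL-safe key (illustrative only)."""
    key = unicodedata.normalize("NFC", name).lower()
    # Replace certain characters (umlauts etc.) with ASCII spellings.
    for char, repl in SPECIAL.items():
        key = key.replace(char, repl)
    # Drop any remaining accents, e.g. "é" -> "e".
    key = unicodedata.normalize("NFKD", key)
    key = "".join(c for c in key if not unicodedata.combining(c))
    # White space -> "_", runs of other separators -> "-".
    key = re.sub(r"\s+", "_", key)
    key = re.sub(r"[-/\\?%.,;:&+]+", "-", key)
    # Finally, remove anything outside [a-z0-9_-].
    key = re.sub(r"[^a-z0-9_-]", "", key)
    return key.strip("-_")


def unique_key(name: str, taken: set[str]) -> str:
    """Append -2, -3, ... until the key is unused (hypothetical scheme)."""
    base = make_key(name) or "item"
    key, n = base, 2
    while key in taken:
        key = f"{base}-{n}"
        n += 1
    taken.add(key)
    return key


print(make_key("My-/?slashed-cake/name%2F"))  # -> my-slashed-cake-name-2f
print(make_key("My-/?slashed-cake%2Fname/"))  # -> my-slashed-cake-2fname
```

The original spelling of the name can still be stored alongside the key for display.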

Regards,
Robert Andersson 




Re: [users@httpd] Re: [OBORONA-SPAM] RE: [users@httpd] escaped input mod_rewrite

Posted by Joshua Slive <js...@gmail.com>.
On Wed, 03 Nov 2004 00:02:43 +0300, Vadim N. Lyalikov
<va...@yandex.ru> wrote:
> Suppose I want the user (the baker) to be able to add *any* wedding cake
> names he likes to the database dynamically, e.g. this pair of names:
> "My-/?slashed-cake/name%2F" and
> "My-/?slashed-cake%2Fname/",
> exactly these names, awful for Apache people but beautiful to my
> eccentric baker, with slashes, percent signs, question marks ...
> Can I handle this situation with your scheme?

Have your app do its own encoding/escaping that replaces anything
dangerous. Something like base-64 encoding would probably be easy
and safe.
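A minimal Python sketch of the base-64 suggestion using the standard base64 module. It uses the URL-safe alphabet (RFC 4648, section 5) and strips the trailing "=" padding that spiders may mistake for query syntax; encode_name and decode_name are hypothetical names:

```python
import base64


def encode_name(name: str) -> str:
    """Encode an arbitrary name as a URL-safe token (illustrative only)."""
    # The URL-safe alphabet uses "-" and "_" instead of "+" and "/".
    token = base64.urlsafe_b64encode(name.encode("utf-8")).decode("ascii")
    # Drop the "=" padding so no "=" appears in the URL.
    return token.rstrip("=")


def decode_name(token: str) -> str:
    """Reverse encode_name(), restoring the padding first."""
    padded = token + "=" * (-len(token) % 4)
    return base64.urlsafe_b64decode(padded).decode("utf-8")


for name in ["My-/?slashed-cake/name%2F", "My-/?slashed-cake%2Fname/"]:
    token = encode_name(name)
    assert decode_name(token) == name
    print(token)
```

The tokens contain only letters, digits, "-" and "_", so they pass through Apache and mod_rewrite untouched, but they are opaque to human readers.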

Yes, this sounds a little irritating, but as I said, you can't expect
to use arbitrary strings in the pathname. And getting rid of complex
escaping will make your life much easier with mod_rewrite.

Joshua.
