Posted to user@lenya.apache.org by Jörn Nettingsmeier <po...@uni-duisburg.de> on 2006/01/25 01:01:12 UTC

please never abuse uris to contain parameters.

solprovider@apache.org wrote:
> Devs - One of the reasons to move Modules/Usecases to the Area part of
> the URL is so this is possible without using the querystring.  Instead
> of:
>   http://myserver/mypub/live/doc.html?lenya.module=xsl&xsl=differentxsl
> use:
>   http://myserver/mypub/xsl/differentxsl/doc.html
> (assuming the new module is named "xsl" and requires one parameter.) 
> Using my code, the "xsl" Module does its magic and passes processing
> to the "live" module.

not quite related to this thread, but...

please don't move parameters or usecase identifiers into the uri.
most search engines and robots handle uris and get params differently,
and they know why. controls that define the looks of a page (as opposed
to the content) should be in a GET parameter.
a uri is a resource identifier, note the "uniform" part. :-D
sticking presentation or session-related information into the uri is
just abysmally wrong.

imho, even the area part itself is wrong, but since the users won't ever
see the "authoring" part, i just shrug and live with it. but the url
kludge is definitely the least sexy part of lenya.

from solprovider's suggestion it's a small step to including session
ids in the uri. don't laugh, there are a couple of hare-brained cm
systems out there that do exactly that. how do i know? well, i tried to
mirror some sites of german electoral candidates for an empirical study.
i tried to do incremental dumps over a couple of weeks, but because every
visit got fresh session ids in the links, i ended up dumping the whole
tree each time without realizing it, and all the automated analysis stuff
i hacked up just fell to pieces.
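
to illustrate the mirroring problem, here is a minimal sketch (python, not
lenya code) of what a mirroring script would need to do to compare two dumps
of such a site. it assumes the session id shows up either as a
";jsessionid=..." path parameter, as java servlet containers commonly write
it, or in a query parameter; the parameter names in SESSION_PARAMS and the
function name strip_session are made up for the example.

```python
import re
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical names of query parameters that carry session state.
SESSION_PARAMS = {"sid", "sessionid", "phpsessid"}

# Matches a ";jsessionid=..." path parameter anywhere in the path.
_PATH_SESSION = re.compile(r";jsessionid=[^/?#;]*", re.IGNORECASE)


def strip_session(url: str) -> str:
    """Return url with session identifiers removed from path and querystring."""
    parts = urlsplit(url)
    path = _PATH_SESSION.sub("", parts.path)
    # Keep every query parameter except the ones assumed to hold a session id.
    query = urlencode(
        [(key, value)
         for key, value in parse_qsl(parts.query, keep_blank_values=True)
         if key.lower() not in SESSION_PARAMS]
    )
    return urlunsplit((parts.scheme, parts.netloc, path, query, parts.fragment))


# Two visits to the same page, each with a different session id in the link.
first = strip_session("http://myserver/mypub/live/index.html;jsessionid=AAA111")
second = strip_session("http://myserver/mypub/live/index.html;jsessionid=BBB222")
print(first == second)
```

without a step like this, every link looks new on every run, so an
incremental dump degenerates into a full dump.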

don't go down that path, please, or i'll switch to frontpage. (does that
make clear how serious i am?)


regards,

jörn




---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@lenya.apache.org
For additional commands, e-mail: user-help@lenya.apache.org


Re: please never abuse uris to contain parameters.

Posted by Jörn Nettingsmeier <po...@uni-duisburg.de>.
solprovider@apache.org wrote:
> On 1/25/06, Joern Nettingsmeier <po...@uni-duisburg.de> wrote:
>> solprovider@apache.org wrote:
>>>> please don't move parameters or usecase identifiers into the uri.
>> my wording was too general. see below for clarification.
>>>> most search engines and robots handle uris and get params
>>>> differently, and they know why. controls that define the looks of a
>>>> page (as opposed to the content) should be in a GET parameter. a
>>>> uri is a resource identifier, note the "uniform" part. :-D sticking
>>>> presentation or session-related information into the uri is just
>>>> abysmally wrong.
>>> Lenya uses parameters and usecase identifiers to create unique pages:
>> i'm aware of that, and i don't like it. a "usecase" should mean "here's
>> a base url (a *resource*), do something special with it", e.g. change
>> presentation, show me another subset of data, etc.
>>
>> of course there are usecases that are not related to pages and content
>> (such as login). in that case the base url is state information (i.e.
>> where to go back to after finishing the usecase). i don't like that
>> either. say i have a site with login. doing it with a usecase
>> querystring means that somebody who mirrors it gets to pull one login
>> page for every page that has a "login" link on it.
>>> http://myserver/mypub/live/index.html => Homepage
>>> http://myserver/mypub/live/index.html?lenya.usecase=login => Login page
>> much better would be a link to "login.html?context=somepage.html"
>> because now the content is in the url path and the usability details are
>> in the parameters.
>>
>>> http://myserver/mypub/live/index.html?lenya.usecase=map => Sitemap
>> that is really bad style imho, since the displayed output has nothing to
>> do with index.html at all. it should be sitemap.html?context=index.html
>> (if, say, you want to highlight the current page).
>>
>>> These "parameters" define distinct pages, and should be handled by
>>> search engines and robots as such.  This could be improved by
>>> changing the URLs to:
>>> http://myserver/mypub/live/index.html => Homepage
>>> http://myserver/mypub/login => Login page
>>> http://myserver/mypub/map => Sitemap
>> this is good, sorry for being imprecise (it was late after a long job
>> yesterday). as i said above, there are things currently expressed with
>> querystring params that should be part of the base url.
>> i was thinking of your example to add stylesheet information to the base
>> url when i wrote my reply, and that is definitely bad imho.
>>
>>> The example you quoted was about using different stylesheets for one
>>> document.  One could have the full navigation.  One stylesheet could
>>> be "printable" with no navigation.  One could show META information.
>>>  One could show only the Headers.  Each can have a different layout
>>> and different graphics.  A visitor would consider them to be distinct
>>>  pages, so why is an important part of the URI in the querystring?
>> well, ultimately it's a matter of taste, but i believe in strict
>> separation of content and presentation, and since there is a
>> "traditional" semantic difference between base urls and querystring
>> params (the base url being the "resource" itself and the params being
>> additional information), anything presentation-related should be kept
>> out of the base url as much as possible so that a resource is a resource :)
>> (of course there can't be precise rules. it's similar to accessibility
>> in that respect, you can't validate it, you have to "get the idea".)
> 
> I agree that the destination page should be a parameter so the login
> and map URLs use the querystring for the context.  (And yes, my "map"
> Usecase does highlight the current page for "You were here.")
> 
> I think stylesheet information belongs in Lenya's base URLs because it
> controls everything.    Lenya stylesheets do not distinguish between
> presentation and function.  Changing the stylesheet could just change
> the colors, but that is less important than that it change the
> function to present a completely new page.

ok, if the stylesheet introduces such fundamental changes that the page 
could be considered totally different (again, a matter of taste and good 
judgement), i can agree with that. but the common case of alternate 
presentations (present a printer-friendly view etc.) should be handled 
by a parameter (like in your usecase example: 
?lenya.usecase=xsl&xsl=print). at least i would hate to be forced to 
have such things in the base url.

> http://myserver/mypub/live/index.html?lenya.usecase=login is awful.
> http://myserver/mypub/login is much better.
> http://myserver/mypub/login/returnpage.html fits your objections.
> http://myserver/mypub/login?return=returnpage.html satisfies everybody.

:-D

> The important design change is to move the function (Module name)
> before the data.  We cannot make a decision about whether to use the
> path or the querystring for parameters until then.
> 
>>>> imho, even the area part itself is wrong, but since the users won't
>>>> ever see the "authoring" part, i just shrug and live with it. but
>>>> the url kludge is definitely the least sexy part of lenya.
>>> Yes, that is why many sites proxy to hide the "/pubname/live".  But
>>> the "?lenya.usecase=login" is more difficult to hide.
>> agreed.
>>
>>> I want to move "live" and "authoring" to Usecases, except we are
>>> calling them "Modules" now.
>> seconded.
>>
>>> I want the Module name to replace the
>>> Area.  I want the "live" Module to be the default so it is not
>>> required in the URL.  It should also be easy to configure one of the
>>> Publications as the default so the server's homepage is a
>>> Publication's "live" homepage without specifying either the
>>> publication name or "live" in the URL.  Would you consider these
>>> improvements?
>> definitely. for authoring, i don't really care about url semantics and
>> whether stuff is in the base url or the querystring, since it's all
>> internal anyway. but in live, urls must be clean.
> 
> So you like my ideas. :)

for varying values of "like", but generally, yes. :-D

> ===
>>> Lenya uses
>>> javax.servlet.http.HttpServletResponse.encodeRedirectURL(), which
>>> puts the session id in the URL unless it has received a valid Cookie
>>> from the browser.  All good architectures have an alternative for old
>>> or crippled client software.  Disable Cookies for a Lenya site if you
>>> want to test it.
>> oh my god. i did not know that. it's ok for authoring, but for live,
>> this is a horrible mistake - it's deadly for people running a recursive
>> wget (which unless told otherwise will silently reject cookies).
>> is this cookie mechanism active in live/ by default?
> 
> Yes, but it is only active when Lenya writes the URLs.  Most
> stylesheets do not include the SessionID in their links.  Even the
> navigation elements forget to maintain the SessionID, so browsers
> without Cookies will not maintain the session without much development
> work.  The SessionID is normally only added during <map:redirect
> session="true">.  The only place it is noticeable in the "default" pub
> is when match="/" is redirected to "/index.html".  It is unlikely to
> hurt you.

good to know.



-- 
"Open source takes the bullshit out of software."
	- Charles Ferguson on TechnologyReview.com

--
Jörn Nettingsmeier, EDV-Administrator
Institut für Politikwissenschaft
Universität Duisburg-Essen, Standort Duisburg
Mail: pol-admin@uni-duisburg.de, Telefon: 0203/379-2736



Re: please never abuse uris to contain parameters.

Posted by so...@apache.org.
On 1/25/06, Joern Nettingsmeier <po...@uni-duisburg.de> wrote:
> solprovider@apache.org wrote:
> >> please don't move parameters or usecase identifiers into the uri.
> my wording was too general. see below for clarification.
> >> most search engines and robots handle uris and get params
> >> differently, and they know why. controls that define the looks of a
> >> page (as opposed to the content) should be in a GET parameter. a
> >> uri is a resource identifier, note the "uniform" part. :-D sticking
> >> presentation or session-related information into the uri is just
> >> abysmally wrong.
> > Lenya uses parameters and usecase identifiers to create unique pages:
> i'm aware of that, and i don't like it. a "usecase" should mean "here's
> a base url (a *resource*), do something special with it", e.g. change
> presentation, show me another subset of data, etc.
>
> of course there are usecases that are not related to pages and content
> (such as login). in that case the base url is state information (i.e.
> where to go back to after finishing the usecase). i don't like that
> either. say i have a site with login. doing it with a usecase
> querystring means that somebody who mirrors it gets to pull one login
> page for every page that has a "login" link on it.
> > http://myserver/mypub/live/index.html => Homepage
> > http://myserver/mypub/live/index.html?lenya.usecase=login => Login page
> much better would be a link to "login.html?context=somepage.html"
> because now the content is in the url path and the usability details are
> in the parameters.
>
> > http://myserver/mypub/live/index.html?lenya.usecase=map => Sitemap
> that is really bad style imho, since the displayed output has nothing to
> do with index.html at all. it should be sitemap.html?context=index.html
> (if, say, you want to highlight the current page).
>
> > These "parameters" define distinct pages, and should be handled by
> > search engines and robots as such.  This could be improved by
> > changing the URLs to:
> > http://myserver/mypub/live/index.html => Homepage
> > http://myserver/mypub/login => Login page
> > http://myserver/mypub/map => Sitemap
> this is good, sorry for being imprecise (it was late after a long job
> yesterday). as i said above, there are things currently expressed with
> querystring params that should be part of the base url.
> i was thinking of your example to add stylesheet information to the base
> url when i wrote my reply, and that is definitely bad imho.
>
> > The example you quoted was about using different stylesheets for one
> > document.  One could have the full navigation.  One stylesheet could
> > be "printable" with no navigation.  One could show META information.
> >  One could show only the Headers.  Each can have a different layout
> > and different graphics.  A visitor would consider them to be distinct
> >  pages, so why is an important part of the URI in the querystring?
> well, ultimately it's a matter of taste, but i believe in strict
> separation of content and presentation, and since there is a
> "traditional" semantic difference between base urls and querystring
> params (the base url being the "resource" itself and the params being
> additional information), anything presentation-related should be kept
> out of the base url as much as possible so that a resource is a resource :)
> (of course there can't be precise rules. it's similar to accessibility
> in that respect, you can't validate it, you have to "get the idea".)

I agree that the destination page should be a parameter so the login
and map URLs use the querystring for the context.  (And yes, my "map"
Usecase does highlight the current page for "You were here.")

I think stylesheet information belongs in Lenya's base URLs because it
controls everything.    Lenya stylesheets do not distinguish between
presentation and function.  Changing the stylesheet could just change
the colors, but that is less important than that it change the
function to present a completely new page.

http://myserver/mypub/live/index.html?lenya.usecase=login is awful.
http://myserver/mypub/login is much better.
http://myserver/mypub/login/returnpage.html fits your objections.
http://myserver/mypub/login?return=returnpage.html satisfies everybody.
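
As a rough illustration of the last form, the link could be built and read
back along these lines.  This is only a sketch in Python, not Lenya code:
the function names login_url and return_target and the "index.html"
fallback are assumptions made for the example.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def login_url(pub_base: str, return_page: str) -> str:
    """Build a login link that keeps the return page in the querystring."""
    return pub_base.rstrip("/") + "/login?" + urlencode({"return": return_page})


def return_target(url: str, default: str = "index.html") -> str:
    """Read the return page back out of a login link, or fall back to default."""
    values = parse_qs(urlsplit(url).query).get("return")
    return values[0] if values else default


link = login_url("http://myserver/mypub", "returnpage.html")
print(link)
print(return_target(link))
```

The login page itself stays one resource (/mypub/login) however many pages
link to it; only the "where to go back to" detail varies.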

The important design change is to move the function (Module name)
before the data.  We cannot make a decision about whether to use the
path or the querystring for parameters until then.

> >> imho, even the area part itself is wrong, but since the users won't
> >> ever see the "authoring" part, i just shrug and live with it. but
> >> the url kludge is definitely the least sexy part of lenya.
> > Yes, that is why many sites proxy to hide the "/pubname/live".  But
> > the "?lenya.usecase=login" is more difficult to hide.
> agreed.
>
> > I want to move "live" and "authoring" to Usecases, except we are
> > calling them "Modules" now.
> seconded.
>
> > I want the Module name to replace the
> > Area.  I want the "live" Module to be the default so it is not
> > required in the URL.  It should also be easy to configure one of the
> > Publications as the default so the server's homepage is a
> > Publication's "live" homepage without specifying either the
> > publication name or "live" in the URL.  Would you consider these
> > improvements?
> definitely. for authoring, i don't really care about url semantics and
> whether stuff is in the base url or the querystring, since it's all
> internal anyway. but in live, urls must be clean.

So you like my ideas. :)

===
> > Lenya uses
> > javax.servlet.http.HttpServletResponse.encodeRedirectURL(), which
> > puts the session id in the URL unless it has received a valid Cookie
> > from the browser.  All good architectures have an alternative for old
> > or crippled client software.  Disable Cookies for a Lenya site if you
> > want to test it.
>
> oh my god. i did not know that. it's ok for authoring, but for live,
> this is a horrible mistake - it's deadly for people running a recursive
> wget (which unless told otherwise will silently reject cookies).
> is this cookie mechanism active in live/ by default?

Yes, but it is only active when Lenya writes the URLs.  Most
stylesheets do not include the SessionID in their links.  Even the
navigation elements forget to maintain the SessionID, so browsers
without Cookies will not maintain the session without much development
work.  The SessionID is normally only added during <map:redirect
session="true">.  The only place it is noticeable in the "default" pub
is when match="/" is redirected to "/index.html".  It is unlikely to
hurt you.

solprovider



Re: please never abuse uris to contain parameters.

Posted by Joern Nettingsmeier <po...@uni-duisburg.de>.
solprovider@apache.org wrote:

i wrote:
>> please don't move parameters or usecase identifiers into the uri. 

my wording was too general. see below for clarification.

>> most search engines and robots handle uris and get params
>> differently, and they know why. controls that define the looks of a
>> page (as opposed to the content) should be in a GET parameter. a
>> uri is a resource identifier, note the "uniform" part. :-D sticking
>> presentation or session-related information into the uri is just
>> abysmally wrong.
> 
> Lenya uses parameters and usecase identifiers to create unique pages:

i'm aware of that, and i don't like it. a "usecase" should mean "here's
a base url (a *resource*), do something special with it", e.g. change
presentation, show me another subset of data, etc.

of course there are usecases that are not related to pages and content
(such as login). in that case the base url is state information (i.e.
where to go back to after finishing the usecase). i don't like that
either. say i have a site with login. doing it with a usecase
querystring means that somebody who mirrors it gets to pull one login
page for every page that has a "login" link on it.

> http://myserver/mypub/live/index.html => Homepage
> http://myserver/mypub/live/index.html?lenya.usecase=login => Login page

much better would be a link to "login.html?context=somepage.html"
because now the content is in the url path and the usability details are
in the parameters.

> http://myserver/mypub/live/index.html?lenya.usecase=map => Sitemap

that is really bad style imho, since the displayed output has nothing to
do with index.html at all. it should be sitemap.html?context=index.html
(if, say, you want to highlight the current page).

> These "parameters" define distinct pages, and should be handled by
> search engines and robots as such.  This could be improved by
> changing the URLs to:
> http://myserver/mypub/live/index.html => Homepage
> http://myserver/mypub/login => Login page
> http://myserver/mypub/map => Sitemap
> 
> How is this bad?

this is good, sorry for being imprecise (it was late after a long job
yesterday). as i said above, there are things currently expressed with
querystring params that should be part of the base url.
i was thinking of your example to add stylesheet information to the base
url when i wrote my reply, and that is definitely bad imho.

> The example you quoted was about using different stylesheets for one 
> document.  One could have the full navigation.  One stylesheet could 
> be "printable" with no navigation.  One could show META information.
>  One could show only the Headers.  Each can have a different layout
> and different graphics.  A visitor would consider them to be distinct
>  pages, so why is an important part of the URI in the querystring?

well, ultimately it's a matter of taste, but i believe in strict
separation of content and presentation, and since there is a
"traditional" semantic difference between base urls and querystring
params (the base url being the "resource" itself and the params being
additional information), anything presentation-related should be kept
out of the base url as much as possible so that a resource is a resource :)
(of course there can't be precise rules. it's similar to accessibility
in that respect, you can't validate it, you have to "get the idea".)

> For the record, the "parameters" in a URL are called the
> "querystring" because their first use was searching.  But even the
> following URLs produce distinct pages: 
> http://www.google.com/search?q=apache 
> http://www.google.com/search?q=lenya

well, for highly volatile content (i.e. *queries*) querystrings are of
course fine. the resource in this case is the google index.

> The querystring is sent as part of the GET line of HTTP.  In the old 
> days, websites would use an exclamation mark rather than a question 
> mark before their parameters just so search engines and robots would 
> see distinct URLs.

and that's what i don't like. don't second-guess people's intention.

>> imho, even the area part itself is wrong, but since the users won't
>> ever see the "authoring" part, i just shrug and live with it. but
>> the url kludge is definitely the least sexy part of lenya.
> 
> Yes, that is why many sites proxy to hide the "/pubname/live".  But 
> the "?lenya.usecase=login" is more difficult to hide.

agreed.

> I want to move "live" and "authoring" to Usecases, except we are 
> calling them "Modules" now.

seconded.

> I want the Module name to replace the 
> Area.  I want the "live" Module to be the default so it is not 
> required in the URL.  It should also be easy to configure one of the 
> Publications as the default so the server's homepage is a 
> Publication's "live" homepage without specifying either the 
> publication name or "live" in the URL.  Would you consider these 
> improvements?

definitely. for authoring, i don't really care about url semantics and
whether stuff is in the base url or the querystring, since it's all
internal anyway. but in live, urls must be clean.

>> from solprovider's suggestion it's a small step to including
>> session cookies in the uri. don't laugh, there are a couple
>> hare-brained cm systems out there that do exactly that.
> 
> Lenya uses
> javax.servlet.http.HttpServletResponse.encodeRedirectURL(), which
> puts the session id in the URL unless it has received a valid Cookie
> from the browser.  All good architectures have an alternative for old
> or crippled client software.  Disable Cookies for a Lenya site if you
> want to test it.

oh my god. i did not know that. it's ok for authoring, but for live,
this is a horrible mistake - it's deadly for people running a recursive
wget (which unless told otherwise will silently reject cookies).

is this cookie mechanism active in live/ by default?


regards,

jörn



-- 
"Án nýrra verka, án nútimans, hættir fortíðin að vekja áhuga."
"Without new works, without the present the past will cease to be of
interest."
        - Ásmundur Sveinsson (1893-1982)

--
Jörn Nettingsmeier, EDV-Administrator
Institut für Politikwissenschaft
Universität Duisburg-Essen, Standort Duisburg
Mail: pol-admin@uni-duisburg.de, Telefon: 0203/379-2736




Re: please never abuse uris to contain parameters.

Posted by so...@apache.org.
On 1/24/06, Jörn Nettingsmeier <po...@uni-duisburg.de> wrote:
> solprovider@apache.org wrote:
> > Devs - One of the reasons to move Modules/Usecases to the Area part of
> > the URL is so this is possible without using the querystring.  Instead
> > of:
> >   http://myserver/mypub/live/doc.html?lenya.module=xsl&xsl=differentxsl
> > use:
> >   http://myserver/mypub/xsl/differentxsl/doc.html
> > (assuming the new module is named "xsl" and requires one parameter.)
> > Using my code, the "xsl" Module does its magic and passes processing
> > to the "live" module.
> please don't move parameters or usecase identifiers into the uri.
> most search engines and robots handle uris and get params differently,
> and they know why. controls that define the looks of a page (as opposed
> to the content) should be in a GET parameter.
> a uri is a resource identifier, note the "uniform" part. :-D
> sticking presentation or session-related information into the uri is
> just abysmally wrong.

Lenya uses parameters and usecase identifiers to create unique pages:
http://myserver/mypub/live/index.html => Homepage
http://myserver/mypub/live/index.html?lenya.usecase=login => Login page
http://myserver/mypub/live/index.html?lenya.usecase=map => Sitemap

These "parameters" define distinct pages, and should be handled by
search engines and robots as such.  This could be improved by changing
the URLs to:
http://myserver/mypub/live/index.html => Homepage
http://myserver/mypub/login => Login page
http://myserver/mypub/map => Sitemap

How is this bad?

The example you quoted was about using different stylesheets for one
document.  One could have the full navigation.  One stylesheet could
be "printable" with no navigation.  One could show META information. 
One could show only the Headers.  Each can have a different layout and
different graphics.  A visitor would consider them to be distinct
pages, so why is an important part of the URI in the querystring?

For the record, the "parameters" in a URL are called the "querystring"
because their first use was searching.  But even the following URLs
produce distinct pages:
http://www.google.com/search?q=apache
http://www.google.com/search?q=lenya

The querystring is sent as part of the GET line of HTTP.  In the old
days, websites would use an exclamation mark rather than a question
mark before their parameters just so search engines and robots would
see distinct URLs.

> imho, even the area part itself is wrong, but since the users won't ever
> see the "authoring" part, i just shrug and live with it. but the url
> kludge is definitely the least sexy part of lenya.

Yes, that is why many sites proxy to hide the "/pubname/live".  But
the "?lenya.usecase=login" is more difficult to hide.

I want to move "live" and "authoring" to Usecases, except we are
calling them "Modules" now.  I want the Module name to replace the
Area.  I want the "live" Module to be the default so it is not
required in the URL.  It should also be easy to configure one of the
Publications as the default so the server's homepage is a
Publication's "live" homepage without specifying either the
publication name or "live" in the URL.  Would you consider these
improvements?
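
To make the proposal concrete, here is a rough sketch of how such a URL
path might be resolved.  It is written in Python for brevity and is not
the actual code: the publication and module sets, the defaults, and the
names Route and resolve are all assumptions for the example.

```python
from typing import NamedTuple

# Hypothetical configuration; none of these values come from Lenya itself.
PUBLICATIONS = {"mypub", "otherpub"}
MODULES = {"live", "authoring", "login", "map"}
DEFAULT_PUBLICATION = "mypub"
DEFAULT_MODULE = "live"


class Route(NamedTuple):
    publication: str
    module: str
    document: str  # empty string means "the module's default page"


def resolve(path: str) -> Route:
    """Split a URL path into publication, module, and document."""
    segments = [segment for segment in path.split("/") if segment]
    # The publication name is optional; fall back to the default publication.
    publication = DEFAULT_PUBLICATION
    if segments and segments[0] in PUBLICATIONS:
        publication = segments.pop(0)
    # The module name is optional too; "live" is assumed when it is missing.
    module = DEFAULT_MODULE
    if segments and segments[0] in MODULES:
        module = segments.pop(0)
    return Route(publication, module, "/".join(segments))


print(resolve("/"))
print(resolve("/mypub/login"))
print(resolve("/mypub/live/index.html"))
```

One consequence of making both names optional is that a top-level document
named like a publication or module would be ambiguous, so those names
would have to be reserved.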

> from solprovider's suggestion it's a small step to including session
> cookies in the uri. don't laugh, there are a couple hare-brained cm
> systems out there that do exactly that.

Lenya uses javax.servlet.http.HttpServletResponse.encodeRedirectURL(),
which puts the session id in the URL unless it has received a valid
Cookie from the browser.  All good architectures have an alternative
for old or crippled client software.  Disable Cookies for a Lenya site
if you want to test it.
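
The fallback works roughly as sketched below.  This is a simplified model
in Python, not the servlet API or a container's implementation: the
function name, its parameters, and the exact placement rules are
assumptions, and ";jsessionid=" is the path parameter that servlet
containers commonly use.

```python
from typing import Optional


def encode_redirect_url(url: str, session_id: Optional[str],
                        cookie_received: bool) -> str:
    """Append the session id as a path parameter only when no cookie came back."""
    if session_id is None or cookie_received:
        # No session, or the browser returned the cookie: leave the URL clean.
        return url
    # Insert the path parameter before any querystring or fragment.
    cut = len(url)
    for separator in ("?", "#"):
        position = url.find(separator)
        if position != -1:
            cut = min(cut, position)
    return f"{url[:cut]};jsessionid={session_id}{url[cut:]}"


print(encode_redirect_url("/index.html", "ABC123", cookie_received=True))
print(encode_redirect_url("/index.html", "ABC123", cookie_received=False))
```

A client that never sends the cookie back gets the second form on every
redirect.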

solprovider
