Posted to dev@lenya.apache.org by "Gregor J. Rothfuss" <gr...@apache.org> on 2003/09/23 22:41:36 UTC

comments on site.xml proposal

Robert Koberg wrote:

 > p.s. any comment on my proposal for a site.xml, especially since you list it
 > as a RFC (even a 'no' I/we don't like this idea... my feelings will not be
 > hurt if you decide to tear it to shreds)...

ok, see below

this is from your original mail:

 >>>>>>>>
Yes, but it is not really new. This is something that can be stored in a
'site.xml' (I see you have an RFC for this). I would like to 
offer/propose you guys utilize our way of doing the site.xml (snippet 
below) with a Lenya namespace, of course. I can provide you with an XML 
Schema and some JavaScript objects for tree manipulation. I can also 
give you a basic set of XSL that uses the site.xml to transform pages.

I tried to get this adopted in the forrest project, which it was in a 
way, but in a different direction which I feel is very limited in 
several ways.

This is related to the RT in that XML fragments are referenced in 
//(page | folder | site)/regions/region. The region name identifies an 
HTML DIV (in our case) or what-have-you. Content assigned to folders 
cascades down to the pages in that folder.

In addition you will notice that some useful metadata/attributes are on 
the site, folder and page nodes. Hopefully they are meaningful. If not 
please ask for clarification.

If stored in an XMLDB that has XUpdate capabilities you can let the user
edit page/folder/etc 'properties', PUT it to the server as XML and 
update the DB. Users can go to a gui view of a content repository and 
assign content to regions in a page or folder and submit it to be 
updated on the server.
<<<<<<<<<<<<<<<

we have adopted something similar, with the main difference that we 
decided to *not* have GUIDs for pages due to performance concerns.
see
http://cvs.apache.org/viewcvs.cgi/cocoon-lenya/src/webapp/lenya/pubs/default/content/authoring/sitetree.xml?rev=HEAD&content-type=text/vnd.viewcvs-markup
http://cocoon.apache.org/lenya/docs/concepts/siteTree.html

 >>>>
The site.xml is used as the main XML Source in a transformation. Doing this
allows for any internal links to always be valid no matter how often the
site architecture is rearranged (this could solve the doc problem that
cocoon devs are currently experiencing). The metadata tells the
transformation things like: should the pages in a folder show a
snailtrail/pager, should a page show on the nav, should the page be
generated, should it have a print friendly page, etc.
<<<<<<

we currently do a mix of putting some metadata into the sitetree.xml 
(such as which languages are supported at a particular node) and 
determining the rest (like snailtrail yes/no) from the document itself. we 
match against the doctype or the root element to figure out which 
template to apply.

see http://cocoon.apache.org/lenya/docs/components/URIParametrizer.html

 >>>>>>>>>>>>>>
To generate a site instance (the site.xml is a virtual representation of 
a site), simply run the site.xml through some kind of ContentHandler 
that creates the folders and transforms the pages. The one I am 
currently using generates a 100 page site (with html print friendlies, 
external metadata) in about 4-5 seconds.
<<<<<<<<<<<<<<

you mean statically rendered?

 >>>>>>>>>>>>>
What I have been working on recently is having XMLFilters strip out
unnecessary attributes except for those on the site, parent folder and
target page nodes for a transformation. In one of the filters something 
like XInclude replaces the regions/region/item/@ref with the XML 
instance it refers to.
<<<<<<<<<<<<<

makes sense with site.xml being the central location that all requests 
have to go through (if i understood correctly)
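
A minimal sketch of the ref-replacement step from the quoted passage, in which each regions/region/item/@ref is replaced by the XML it points to. The proposal talks about SAX XMLFilters and something like XInclude; this sketch swaps in Python's ElementTree instead, purely for brevity. The FRAGMENTS store, the un-namespaced element names and the sample ids are assumptions made for illustration.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical content store keyed by the ids used in item/@ref.
FRAGMENTS = {
    "a1": ET.fromstring("<article><p>Welcome</p></article>"),
    "c1": ET.fromstring("<teaser>News</teaser>"),
}

def expand_refs(page):
    """Return a copy of `page` with each <item ref="..."> filled with its fragment."""
    page = copy.deepcopy(page)
    # Materialize the list first so that appending children cannot disturb iteration.
    for item in list(page.iter("item")):
        fragment = FRAGMENTS.get(item.get("ref"))
        if fragment is not None:
            item.append(copy.deepcopy(fragment))
    return page

page = ET.fromstring(
    '<page id="p1"><regions><region name="wide_col">'
    '<item ref="a1"/><item ref="c1"/><item ref="missing"/>'
    '</region></regions></page>'
)
expanded = expand_refs(page)
print(ET.tostring(expanded, encoding="unicode"))
```

Unknown refs are left empty here rather than raising an error; a real filter might prefer to fail loudly.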

 >>>>>>>>>>>>
I currently keep the site.xml unmarshalled into a JDOM Document and
manipulate it with JDOM methods. The XML fragments are brought in during 
the transformation using the XSL document function. I would like to 
replace this with a full SAX approach and XInclude.
<<<<<<<<<<<<

we are doing some fairly large sitetree.xml, and keeping it in a dom 
would be impractical for us. how many nodes do you typically have?

Here is a snippet of our site.xml for the liveSTORYBOARD site:

<lsb:page css="inherit" generate="1" id="p1395893912" 
name="Welcome.html" onnav="1" pgstatus="publish" print_friendly="0" 
xsl="xsl_homepage">
     <lsb:label>Home</lsb:label>
     <lsb:title>liveSTORYBOARD Content Management System: Simple, 
powerful and secure hosted Web Content Management</lsb:title>
     <lsb:regions>
       <lsb:region name="wide_col">
         <lsb:item ref="a1095201465"></lsb:item>
         <lsb:item ref="c404932357"></lsb:item>
       </lsb:region>
       <lsb:region name="narrow1_col">
         <lsb:item ref="c1109515213"></lsb:item>
         <lsb:item ref="c108879656"></lsb:item>
       </lsb:region>
     </lsb:regions>
   </lsb:page>

i think the regions stuff is pretty cool, especially if you have lots of 
block elements, like related pages etc. i think this format is 
especially useful for portal-type sites that have a lot of inclusion 
going on. we currently do xincludes as part of the content itself, 
usually with some facility in the schema to allow inclusions.
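
A rough sketch of the region cascade described in the quoted proposal (content assigned to folders cascades down to the pages in that folder). The namespace URI, the sample tree and the merge rule (ancestor items first, then the page's own items) are assumptions; the thread does not say whether a page's region overrides or extends the folder's.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI; the thread never shows the real one.
LSB = "urn:example:lsb"
NS = {"lsb": LSB}

SITE = f"""
<lsb:site xmlns:lsb="{LSB}">
  <lsb:folder id="f1" name="Products">
    <lsb:regions>
      <lsb:region name="narrow1_col"><lsb:item ref="c1"/></lsb:region>
    </lsb:regions>
    <lsb:page id="p1" name="Overview.html">
      <lsb:regions>
        <lsb:region name="wide_col"><lsb:item ref="a1"/></lsb:region>
      </lsb:regions>
    </lsb:page>
  </lsb:folder>
</lsb:site>
"""

def regions_for(root, page_id):
    """Collect {region name: [refs]} for a page, ancestors contributing first."""
    parent = {child: node for node in root.iter() for child in node}
    page = next(p for p in root.iter(f"{{{LSB}}}page") if p.get("id") == page_id)
    # Build the chain page -> folder -> ... -> site.
    chain = [page]
    while chain[-1] in parent:
        chain.append(parent[chain[-1]])
    merged = {}
    for node in reversed(chain):  # site first, page last
        for region in node.findall("lsb:regions/lsb:region", NS):
            refs = [item.get("ref") for item in region.findall("lsb:item", NS)]
            merged.setdefault(region.get("name"), []).extend(refs)
    return merged

root = ET.fromstring(SITE)
print(regions_for(root, "p1"))
```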

   <lsb:folder css="inherit" expand="0" id="f61944265" name="Products"
onnav="1" pager="0" snailtrail="1" xsl="inherit" index_page="p259623336">
     <lsb:label>Products</lsb:label>
     <lsb:title>liveSTORYBOARD CMS: Simple, powerful and secure Web 
Content Management</lsb:title>

we do not distinguish between pages and folders, every node is 
implicitly a folder which may contain subnodes.

while i am not a fan of GUID (performance concerns, transparency) i can 
see their advantages in your setup. we are gaining experience with 
sitetree.xml in different publications. one of the things on the agenda 
is to look at portal-like functionality for lenya, reusing s&n portal or 
some other technology. in that context, site.xml could play a role.
so, while we won't adopt it wholesale any time soon, we may move towards 
some of your ideas in the future.

-gregor

-- 
Gregor J. Rothfuss
Wyona Inc.  -   Open Source Content Management   -   Apache Lenya
http://wyona.com                   http://cocoon.apache.org/lenya
gregor.rothfuss@wyona.com                       gregor@apache.org

---------------------------------------------------------------------
To unsubscribe, e-mail: lenya-dev-unsubscribe@cocoon.apache.org
For additional commands, e-mail: lenya-dev-help@cocoon.apache.org

RE: comments on site.xml proposal

Posted by Robert Koberg <ro...@koberg.com>.

Thanks for reading and responding. [more inline] I realize I am trying to go
up a steep hill here, but let me keep trying :) I think we come at this from
two different angles: Lenya from the server-side focus and us from the
client-side.

> -----Original Message-----
> From: Gregor J. Rothfuss [mailto:gregor@apache.org]
> Sent: Tuesday, September 23, 2003 1:42 PM
> To: Lenya Developers List
> 
> Robert Koberg wrote:
> 
>  > p.s. any comment on my proposal for a site.xml, especially since you list it
>  > as a RFC (even a 'no' I/we don't like this idea... my feelings will not be
>  > hurt if you decide to tear it to shreds)...
> 
> ok, see below
> 
> this is from your original mail:
> 
>  >>>>>>>>
> Yes, but it is not really new. This is something that can be stored in a
> 'site.xml' (I see you have an RFC for this). I would like to
> offer/propose you guys utilize our way of doing the site.xml (snippet
> below) with a Lenya namespace, of course. I can provide you with an XML
> Schema and some JavaScript objects for tree manipulation. I can also
> give you a basic set of XSL that uses the site.xml to transform pages.
> 
> I tried to get this adopted in the forrest project, which it was in a
> way, but in a different direction which I feel is very limited in
> several ways.
> 
> This is related to the RT in that XML fragments are referenced in
> //(page | folder | site)/regions/region. The region name identifies an
> HTML DIV (in our case) or what-have-you. Content assigned to folders
> cascades down to the pages in that folder.
> 
> In addition you will notice that some useful metadata/attributes are on
> the site, folder and page nodes. Hopefully they are meaningful. If not
> please ask for clarification.
> 
> If stored in an XMLDB that has XUpdate capabilities you can let the user
> edit page/folder/etc 'properties', PUT it to the server as XML and
> update the DB. Users can go to a gui view of a content repository and
> assign content to regions in a page or folder and submit it to be
> updated on the server.
> <<<<<<<<<<<<<<<
> 
> we have adopted something similar, with the main difference that we
> decided to *not* have GUIDs for pages due to performance concerns.
> see
> http://cvs.apache.org/viewcvs.cgi/cocoon-lenya/src/webapp/lenya/pubs/default/content/authoring/sitetree.xml?rev=HEAD&content-type=text/vnd.viewcvs-markup
> http://cocoon.apache.org/lenya/docs/concepts/siteTree.html

OK, I think I am misunderstanding. The first URL you list above uses ID
attributes. The second uses HREF attributes. I don't see how using ID attrs
affects performance worse than HREFs. If the HREFs have to be generated on
each request, then yes, I see a small performance hit. But if they are
pregenerated I don't see any difference.

To me, using HREFs in this manner creates a brittle sitetree, at least
during the beginning stages of a site's development (and many times later
when other usability issues pop up).

If you have:

<node href="http://blah/clah">
  <node href="http://blah/clah/dlah.?"/>
  <node href="http://blah/clah/flah.?"/>
</node>

And after some usability testing, it is found that a better site layout is:

<node href="http://blah/clah">
  <node href="http://blah/clah/dlah.?"/>
  <node href="http://blah/clah/glah">
    <node href="http://blah/clah/flah.?"/>
  </node>
</node>

Wouldn't you have to rewrite the HREF or identifier? If so, wouldn't all
links to it break? I realize you don't /have to/, but it seems hacky in a
mod_rewrite kind of way.
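
To make the argument concrete, here is a small sketch of ID-based linking: links store a node ID, and the URL path is derived from the node's position in the tree at transformation time. The element names, the ids and the ".html" extension are made up for the example (the trees above leave the extension open as ".?").

```python
import xml.etree.ElementTree as ET

def build_paths(root):
    """Map each node id to the URL path implied by its position in the tree."""
    paths = {}
    def walk(node, prefix):
        for child in node:
            path = f"{prefix}/{child.get('name')}"
            paths[child.get("id")] = path
            walk(child, path)
    walk(root, "")
    return paths

# Layout before usability testing.
before = ET.fromstring(
    '<site><folder id="f1" name="clah">'
    '<page id="p1" name="dlah.html"/><page id="p2" name="flah.html"/>'
    '</folder></site>'
)
# Layout after: flah has moved into a new folder "glah".
after = ET.fromstring(
    '<site><folder id="f1" name="clah">'
    '<page id="p1" name="dlah.html"/>'
    '<folder id="f2" name="glah"><page id="p2" name="flah.html"/></folder>'
    '</folder></site>'
)

link_target = "p2"  # what an author's link stores
print(build_paths(before)[link_target])  # /clah/flah.html
print(build_paths(after)[link_target])   # /clah/glah/flah.html
```

The stored link never changes; only the generated path does, which is why moving a page does not break links that point to it.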

> 
>  >>>>
> The site.xml is used as the main XML Source in a transformation. Doing
> this
> allows for any internal links to always be valid no matter how often the
> site architecture is rearranged (this could solve the doc problem that
> cocoon devs are currently experiencing). The metadata tells the
> transformation things like: should the pages in a folder show a
> snailtrail/pager, should a page show on the nav, should the page be
> generated, should it have a print friendly page, etc.
> <<<<<<
> 
> we currently do a mix of putting some metadata into the sitetree.xml
> (such as which languages are supported at a particular node) and
> determining the rest (like snailtrail yes/no) from the document itself. we
> match against the doctype or the root element to figure out which
> template to apply.
> 
> see http://cocoon.apache.org/lenya/docs/components/URIParametrizer.html

But I thought you were considering (albeit in an RT) moving away from a
single XML source to multiple XML sources for a view. If so, and you still
relied on a document's doctype to determine the template, what happens if
the two sources' doctypes conflict with each other?

One benefit (performance-wise) of keeping this type of metadata separate
from the document itself is for creating site reports. The site.xml can be
transformed into a *wide* variety of generic reports and specific client
determined ones. This way you do not need to crawl all of your docs to
gather this info. 

Another example is in a search index. You can run through the site.xml with
a content handler keeping track of the current page and indexing just the
content/metadata at the page level.
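
A sketch of what such a content handler might look like, using Python's xml.sax as a stand-in for a Java SAX ContentHandler. It keeps a stack of open elements so that it always knows the current page, and records only page-level titles. The un-namespaced element names and the sample tree are assumptions for the example.

```python
import xml.sax

class PageIndexer(xml.sax.ContentHandler):
    """Collect {page id: {"name": ..., "title": ...}} in a single streaming pass."""

    def __init__(self):
        super().__init__()
        self.index = {}
        self._open = []    # stack of (element name, id) for currently open elements
        self._text = None  # text buffer, active only inside a page-level <title>

    def startElement(self, name, attrs):
        # Only titles whose parent is a page are indexed; folder titles are skipped.
        if name == "title" and self._open and self._open[-1][0] == "page":
            self._text = []
        self._open.append((name, attrs.get("id")))
        if name == "page":
            self.index[attrs.get("id")] = {"name": attrs.get("name"), "title": ""}

    def characters(self, content):
        if self._text is not None:
            self._text.append(content)

    def endElement(self, name):
        self._open.pop()
        if name == "title" and self._text is not None:
            page_id = self._open[-1][1]  # the enclosing page
            self.index[page_id]["title"] = "".join(self._text).strip()
            self._text = None

SITE = (
    b'<site><folder id="f1" name="Products"><title>Products</title>'
    b'<page id="p1" name="Welcome.html"><title>Home</title></page>'
    b'<page id="p2" name="Pricing.html"><title>Pricing</title></page>'
    b'</folder></site>'
)

indexer = PageIndexer()
xml.sax.parseString(SITE, indexer)
print(indexer.index)
```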

Also, you can use inheritance on the sub-nodes to eliminate duplication of
efforts.
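
As an illustration of that inheritance, the site.xml snippet uses the value "inherit" for attributes such as css and xsl. One possible way to resolve it is to walk up the tree until a concrete value is found. Treating a missing attribute the same as "inherit" is an assumption of this sketch, as are the sample values.

```python
import xml.etree.ElementTree as ET

def effective(root, node_id, attr):
    """Resolve an attribute, following "inherit" up through the ancestors."""
    parent = {child: node for node in root.iter() for child in node}
    node = next(n for n in root.iter() if n.get("id") == node_id)
    while node is not None:
        value = node.get(attr, "inherit")  # assumption: missing means inherit
        if value != "inherit":
            return value
        node = parent.get(node)
    return None  # nothing set anywhere on the path to the root

root = ET.fromstring(
    '<site css="main.css" xsl="xsl_default">'
    '<folder id="f1" css="inherit" xsl="xsl_products">'
    '<page id="p1" css="inherit" xsl="inherit"/>'
    '</folder></site>'
)
print(effective(root, "p1", "css"))  # main.css
print(effective(root, "p1", "xsl"))  # xsl_products
```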

> 
>  >>>>>>>>>>>>>>
> To generate a site instance (the site.xml is a virtual representation of
> a site), simply run the site.xml through some kind of ContentHandler
> that creates the folders and transforms the pages. The one I am
> currently using generates a 100 page site (with html print friendlies,
> external metadata) in about 4-5 seconds.
> <<<<<<<<<<<<<<
> 
> you mean statically rendered?

Yes and, in a way, no. Yes, in that an authorized user can generate a static
version of a site, folder or page. No, in that some of the things generated
could be JSP, PHP or, in our case, XSL. The wrapper L&F and linking are
guaranteed through the generation process, leaving some regions on the page
for a runtime transformation. An example is a portal page or a (Lucene) search,
where the L&F is guaranteed by generating an XSL that is almost a final
page but is used to transform well-formed DB results (SAX events).
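
A simplified sketch of the generation walk: traverse the site tree, create a directory per folder, and schedule a transformation for each page with generate="1", plus a print-friendly variant where print_friendly="1". It only builds a plan and touches no files. The "out" root, the "_print.html" naming and the un-namespaced element names are assumptions, not how liveSTORYBOARD actually does it.

```python
import xml.etree.ElementTree as ET
from pathlib import PurePosixPath

def plan(root, out=PurePosixPath("out")):
    """Return a list of (action, path) steps for generating a static site."""
    steps = []
    def walk(node, base):
        for child in node:
            if child.tag == "folder":
                path = base / child.get("name")
                steps.append(("mkdir", str(path)))
                walk(child, path)
            elif child.tag == "page" and child.get("generate") == "1":
                steps.append(("transform", str(base / child.get("name"))))
                if child.get("print_friendly") == "1":
                    stem = PurePosixPath(child.get("name")).stem
                    # Hypothetical naming scheme for the print-friendly page.
                    steps.append(("transform", str(base / f"{stem}_print.html")))
    walk(root, out)
    return steps

root = ET.fromstring(
    '<site>'
    '<page name="Welcome.html" generate="1" print_friendly="0"/>'
    '<folder name="Products">'
    '<page name="Index.html" generate="1" print_friendly="1"/>'
    '<page name="Draft.html" generate="0" print_friendly="0"/>'
    '</folder></site>'
)
for step in plan(root):
    print(step)
```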

> 
>  >>>>>>>>>>>>>
> What I have been working on recently is having XMLFilters strip out
> unnecessary attributes except for those on the site, parent folder and
> target page nodes for a transformation. In one of the filters something
> like XInclude replaces the regions/region/item/@ref with the XML
> instance it refers to.
> <<<<<<<<<<<<<
> 
> makes sense with site.xml being the central location that all requests
> have to go through (if i understood correctly)

It is very useful especially when used in conjunction with a client user
GUI. For example, when creating a link you can provide a structured
(optiongroup/option type of thing) dropdown where users can select the page
(they see the label, the app sees the ID) they want to link to.
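
A small sketch of how the data for such a dropdown could be pulled out of the site tree: one group per folder, one option per page, with the ID as the value and the label as the visible text. The element names and the sample tree are assumptions for the example.

```python
import xml.etree.ElementTree as ET

def link_options(root):
    """Return [(folder label, [(page id, page label), ...]), ...] for a link picker."""
    groups = []
    for folder in root.iter("folder"):
        pages = [(p.get("id"), p.findtext("label")) for p in folder.findall("page")]
        if pages:  # skip folders that contain no pages directly
            groups.append((folder.findtext("label"), pages))
    return groups

root = ET.fromstring(
    '<site><folder id="f1"><label>Products</label>'
    '<page id="p1"><label>Overview</label></page>'
    '<page id="p2"><label>Pricing</label></page>'
    '</folder></site>'
)
print(link_options(root))
```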

You can also present an entire site node view to the user. You can use
javascript to make this view editable so they can move pages/folders around,
preview for themselves or generate for a certain user-base, try it out,
move things around, try it out, etc... 

Also, it is simple to transform a folder or even the whole site to a print
friendly view.

> 
>  >>>>>>>>>>>>
> I currently keep the site.xml unmarshalled into a JDOM Document and
> manipulate it with JDOM methods. The XML fragments are brought in during
> the transformation using the XSL document function. I would like to
> replace this with a full SAX approach and XInclude.
> <<<<<<<<<<<<
> 
> we are doing some fairly large sitetree.xml, and keeping it in a dom
> would be impractical for us. how many nodes do you typically have?

JDOM is not DOM (less memory is required), but you are right. That is why I
am looking to SAX for a recent large project we have.

Currently, we have sites from 50 to 500 pages (not including folders or
print friendlies). Yes, I can see how a site with 10s of thousands of pages
would not work with this approach (Yes... this is where we would fall
down...).

However, could a site with that many pages be broken out into several
smaller sites (at a department level)? All could use the same XSL/CSS to
maintain a consistent L&F. The issue then arises where the client user wants
to link to a page or include a content piece in one of the broken out sites.
Performance would suffer when loading the other sites info, but it is not
that bad (few seconds if not already cached).

> 
> Here is a snippet of our site.xml for the liveSTORYBOARD site:
> 
> <lsb:page css="inherit" generate="1" id="p1395893912"
> name="Welcome.html" onnav="1" pgstatus="publish" print_friendly="0"
> xsl="xsl_homepage">
>      <lsb:label>Home</lsb:label>
>      <lsb:title>liveSTORYBOARD Content Management System: Simple,
> powerful and secure hosted Web Content Management</lsb:title>
>      <lsb:regions>
>        <lsb:region name="wide_col">
>          <lsb:item ref="a1095201465"></lsb:item>
>          <lsb:item ref="c404932357"></lsb:item>
>        </lsb:region>
>        <lsb:region name="narrow1_col">
>          <lsb:item ref="c1109515213"></lsb:item>
>          <lsb:item ref="c108879656"></lsb:item>
>        </lsb:region>
>      </lsb:regions>
>    </lsb:page>
> 
> i think the regions stuff is pretty cool, especially if you have lots of
> block elements, like related pages etc. i think this format is
> especially useful for portal-type sites that have a lot of inclusion
> going on. we currently do xincludes as part of the content itself,
> usually with some facility in the schema to allow inclusions.
> 
> 
>    <lsb:folder css="inherit" expand="0" id="f61944265" name="Products"
> onnav="1" pager="0" snailtrail="1" xsl="inherit" index_page="p259623336">
>      <lsb:label>Products</lsb:label>
>      <lsb:title>liveSTORYBOARD CMS: Simple, powerful and secure Web
> Content Management</lsb:title>
> 
> we do not distinguish between pages and folders, every node is
> implicitly a folder which may contain subnodes.

Then I guess those could be considered regions and you have a way to include
multiple content pieces in a page view. However, there are other reasons
that it is useful to distinguish a page from a folder: presenting a GUI,
reports, linking, folder index pages, pagers and others I will think of
after I send this :)

> 
> 
> while i am not a fan of GUID (performance concerns, transparency) i can
> see their advantages in your setup. we are gaining experience with
> sitetree.xml in different publications. one of the things on the agenda
> is to look at portal-like functionality for lenya, reusing s&n portal or
> some other technology. in that context, site.xml could play a role.
> so, while we won't adopt it wholesale any time soon, we may move towards
> some of your ideas in the future.

That sounds good. I was a little too strong/wrong in saying you should adopt
our way. I have been using this approach for 4 years now and am too biased. I
have not needed the massive scalability, in the past, which your approach
seems to go after.

We have been more focused on the client user's experience rather than the
developer's. I hope the two can be married in the future. Luckily we have a
4 year time frame to grow.

Thanks again for your response,
-Rob
