Posted to dev@lenya.apache.org by Bob Harner <bo...@gmail.com> on 2006/02/01 23:46:16 UTC

Re: Terminology [was: Rewriting LinkRewritingTransformer]

On 1/30/06, Andreas Hartmann <an...@apache.org> wrote:
> Bob Harner wrote:
> > As briefly discussed on the user list recently (subject: "Losing
> > hyperlinks - what xsl removes them?"), the LinkRewritingTransformer
> > seems to need some improvements so that it can rewrite all types of
> > links.  It currently only rewrites <a href="foo"> where foo is a
> > document-relative URI.  I'm sure I'm NOT the best person to do so
> > (being much less familiar with 1.4 than 1.2.x), but I've been looking
> > over the code and humbly offer the following initial thoughts.  Your
> > advice and guidance are eagerly sought...
> >
> > 1) <editorial>We have really overloaded the word "resource" in Lenya &
> > Cocoon, haven't we?  Sometimes it means "an asset or a CMS document"
> > (per http://wiki.apache.org/lenya/ProposalArchitecture), or sometimes
> > it specifically means just an asset (per Resource.java).  The word is also
> > used in sitemap files to refer to a reusable part of a pipeline.
> > Elsewhere it refers vaguely to a "miscellaneous related file" (the
> > lenya/resources dir).  Sometimes it means the amount of memory, hard
> > drive space, and CPU cycles available.  And Document Types are now
> > officially Resource Types.
>
> Actually, in the repo API I called them "Document Types" again. I'm
> still not sure if the term "Resource" or "Document" is appropriate for
> a "content item". Or maybe "content item" is really superior.
>
> The terms "content type" and "document type" are already taken. But IMO
> we should just use the same term as for content items, regardless of
> any existing usage.
>
> How about this hierarchy:
>
> - Publication
>    - Area
>      - Content
>        - ContentNode (belongs to a ContentType)
>          - ContentItem (a language version)
>            - (Content)Version (of the version history)
>      - Structures (more general than Sites)
>        - Structure
>          - StructureNode (references ContentNode or ContentItem)
>
>
> > This overloading of terminology makes it
> > harder to learn Lenya. I think "Content", "Content Item", and "Content
> > Type" are probably much better terms for a CMS to use. Precise and
> > unambiguous terminology is always a good thing.</editorial>
> >
> > 2) As Andreas said a couple weeks ago, "It's about time to handle
> > documents and assets in the same way".  I think there is a need for a
> > common interface shared by both CMS documents and assets, so both can
> > be handled uniformly -- particularly for link rewriting, where the
> > URIs of both CMS documents and assets need to be rewritten in the
> > same way.  This would be, perhaps, "ContentItem".  And both Document
> > and Resource (which maybe should be named Asset?) should implement
> > this interface and DefaultDocument and Resource should extend a
> > DefaultContentItem class.  Or is there a better idea?
>
> I'm not even sure if we need the separation between Documents and
> Assets. Maybe there is a way to handle both of them uniformly.
> I'd rather add specific functionality:
>
> - Can the content item input/output XML?
> - How is the content item rendered when it is referenced by another
>    content item?
> - What are the presentation options?
> - ...
>
> IMO additional, asset-specific functionality could be handled by an
> asset-management module or something like this, not by the core API.
>
>
> > 3) I think maybe the link rewriting should be done when a CMS document
> > is published, deactivated, or exported, rather than every time it is
> > displayed.
>
> The problem is that the document has to be updated when *another*
> document is changed/removed. This means when you deactivate a document,
> you have to remove the links from all documents which are referencing
> this document. I agree that this would be a good thing, but with the
> current architecture it is a very time-consuming operation.

Let me see if I'm following you:  the only reason for rewriting links
at display time (rather than when the CMS document is created/modified)
is so that we can remove any links that point to other CMS documents
that have been deleted or moved, right?

This seems like an I/O and CPU hog for pages with lots of links, and
the benefit seems minimal. Personally, I might rather have a broken
link than a removed link anyway :-) because at least I can use
external tools to detect broken links, but not if we remove them.  (I
know that in 1.4 such links to missing documents are displayed
specially, but not so in the live area.)

The reason I'm interested in this point is that I'd like to see
LinkRewritingTransformer do a much more thorough job, as I repeated in
another recent thread, but I wouldn't want it to slow down the display
of pages.

Having a link management capability would be the proper solution, of
course (so that when you delete or move a document then you could just
"look up" the list of documents that pointed to it), but that would be
very hard to do without a relational database.
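
To make the "look up" idea concrete, here is a minimal in-memory sketch of a
reverse link index. The class LinkIndex and its methods are made up for
illustration and are not Lenya API; documents are identified by plain strings,
persistence is left out entirely, and the code uses generics and lambdas from
later Java versions for brevity.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical reverse link index (assumption: not an existing Lenya class).
final class LinkIndex {

    // Maps a target document id to the ids of the documents that link to it.
    private final Map<String, Set<String>> referrers = new HashMap<>();

    // Record that sourceId contains a link pointing at targetId.
    void addLink(String sourceId, String targetId) {
        referrers.computeIfAbsent(targetId, k -> new TreeSet<>()).add(sourceId);
    }

    // Forget a link, e.g. after the source document was edited.
    void removeLink(String sourceId, String targetId) {
        Set<String> sources = referrers.get(targetId);
        if (sources != null) {
            sources.remove(sourceId);
            if (sources.isEmpty()) {
                referrers.remove(targetId);
            }
        }
    }

    // All documents that would need attention if targetId is moved or deleted.
    Set<String> referrersOf(String targetId) {
        Set<String> sources = referrers.get(targetId);
        return sources == null
                ? Collections.<String>emptySet()
                : Collections.unmodifiableSet(sources);
    }
}
```

With something like this, deleting or moving a document would only touch the
documents returned by referrersOf() instead of scanning every page. The hard
part remains keeping the index up to date and storing it somewhere durable.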

> > This change would be a performance boost for every page.
> > Or am I missing something in why it needs to be done at display time?
> >
> > 4) LinkRewritingTransformer relies heavily on the
> > DefaultDocumentBuilder class, whose isDocument() method simplistically
> > returns true for any URLs starting with "/lenya/mypub/authoring/"
> > even if the URL points to an asset, not a CMS document.  In contrast,
> > note that the sitemaps verify that the URL ends in ".html" before
> > assuming that a URL is really a CMS document.  Should
> > DefaultDocumentBuilder's isDocument() method be changed to look for
> > the ".html" ending?  (But do CMS documents *always* have an ".html"
> > ending?)
>
> No, we can't do this. This is another reason why I think that the
> DocumentBuilder concept is doomed (see the thread "Mapping URLs to
> documents"). This is quite a complex and fundamental issue; IMO we
> have to come to a decision here first.
>
> Thanks for bringing this up!
>
> -- Andreas
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@lenya.apache.org
> For additional commands, e-mail: dev-help@lenya.apache.org
>
>



Re: Terminology [was: Rewriting LinkRewritingTransformer]

Posted by Andreas Hartmann <an...@apache.org>.
Bob Harner wrote:

[...]

> Let me see if I'm following you:  the only reason for rewriting links
> at display time (rather than when the CMS document is created/modified)
> is so that we can remove any links that point to other CMS documents
> that have been deleted or moved, right?

Yes, unless I'm missing something.


> This seems like an I/O and CPU hog for pages with lots of links, and
> the benefit seems minimal. Personally, I might rather have a broken
> link than a removed link anyway :-)

OK, then it should be possible to switch this functionality on/off.

> because at least I can use
> external tools to detect broken links, but not if we remove them.  (I
> know that in 1.4 such links to missing documents are displayed
> specially, but not so in the live area.)
> 
> The reason I'm interested in this point is that I'd like to see
> LinkRewritingTransformer do a much more thorough job, as I repeated in
> another recent thread, but I wouldn't want it to slow down the display
> of pages.
> 
> Having a link management capability would be the proper solution, of
> course (so that when you delete or move a document then you could just
> "look up" the list of documents that pointed to it),

+1

> but that would be
> very hard to do without a relational database.

Using JCR and UUIDs would certainly make it easier.
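
A rough sketch of why stable identifiers help, under the assumption that links
store a UUID instead of a path: moving a document only changes one registry
entry, and a deleted target can be detected on lookup. The class
UuidLinkResolver is hypothetical; it is plain Java, not the JCR API and not
Lenya code.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.UUID;

// Hypothetical registry mapping stable ids to current paths (an assumption
// for illustration; a real repository would persist this).
final class UuidLinkResolver {

    private final Map<UUID, String> pathsById = new HashMap<>();

    // Register a new document and return the id that links should store.
    UUID register(String path) {
        UUID id = UUID.randomUUID();
        pathsById.put(id, path);
        return id;
    }

    // Moving a document updates one entry; stored links stay untouched.
    void move(UUID id, String newPath) {
        pathsById.replace(id, newPath); // no-op if the id is unknown
    }

    // Deleting a document makes later lookups come back empty.
    void delete(UUID id) {
        pathsById.remove(id);
    }

    // Resolve a stored link to the current path, if the target still exists.
    Optional<String> resolve(UUID id) {
        return Optional.ofNullable(pathsById.get(id));
    }
}
```

A transformer built on this could emit the resolved path when resolve() returns
a value, and either drop or flag the link when it is empty, depending on the
on/off switch mentioned above.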

Let's discuss our ideas and requirements in the LinkRewritingTransformer
thread and see what we'll come up with.

-- Andreas


-- 
Andreas Hartmann
Wyona Inc.  -   Open Source Content Management   -   Apache Lenya
http://www.wyona.com                      http://lenya.apache.org
andreas.hartmann@wyona.com                     andreas@apache.org

