Posted to user@lenya.apache.org by Bob Harner <bo...@gmail.com> on 2006/01/11 21:23:02 UTC

Losing hyperlinks - what xsl removes them?

Greetings everyone!

I'm having an odd problem with Lenya. Whenever my raw content files
(pubs/default/content/authoring/**/index_en.xml) have hyperlinks
like this:

<a href="/default/authoring/something.something">foo</a>

then when the page is rendered in Lenya I just get:

foo

In other words, the <a...> tag is completely removed.  I assume some
xsl file somewhere is doing it, but I can't figure out where and why.
Any idea what transformation would do this nasty thing?

I know hyperlink URIs in Lenya are normally like "page/index_en.xml"
(these links come out of the FCKeditor WYSIWYG editor like this), and
I assume the "/default/authoring/" part is triggering the hyperlink
removal code.  In fact, when I change "/default/authoring/" to
anything else, the hyperlink *doesn't* disappear.

Here's a typical index_en.xml file having the problem:

<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml"
 xmlns:i18n="http://apache.org/cocoon/i18n/2.1"
  xmlns:dcterms="http://purl.org/dc/terms/"
   xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:lenya="http://apache.org/cocoon/lenya/page-envelope/1.0">
    <lenya:meta><dc:title>dctitle</dc:title>
    <dc:creator>Levi Vanya</dc:creator>
    <dc:subject>dcsubject</dc:subject>
    <dc:publisher></dc:publisher>
    <dc:contributor></dc:contributor>
    <dc:date>Tue Dec 20 09:06:08 EST 2005</dc:date>
    <dc:type></dc:type><dc:format></dc:format>
    <dc:identifier></dc:identifier><dc:source></dc:source>
    <dc:language>en</dc:language><dc:relation></dc:relation>
    <dc:coverage></dc:coverage><dc:rights>dcrights</dc:rights>
    <dcterms:issued></dcterms:issued><dcterms:modified></dcterms:modified>
    </lenya:meta>
    <head><title>dctitle</title>
    </head>
    <body>
<h1>Default Publication</h1>

<p>Here's a <a href="/default/authoring/test/happy.jpg">Happy</a> link.
</p>

</body></html>

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@lenya.apache.org
For additional commands, e-mail: user-help@lenya.apache.org


Re: Losing hyperlinks - what xsl removes them?

Posted by Doug Chestnut <dh...@virginia.edu>.

Andreas Hartmann wrote:
> Bob Harner schrieb:
> 
>> Do you agree that removing such "/mypub/area/docid.html" links is a
>> bug?  Or is there good reason for them to be removed?
> 
> 
> If the document exists in the same area and the link is removed,
> then this is clearly a bug.
> 
> If the URL is a valid document URL and the document does not exist,
> then it is the correct behaviour of the LinkRewritingTransformer.
> 
> If the URL is not a valid document URL, then IMO the link should not
> be removed.

Yeah, there is no way to even know that the link exists unless you 
stumble upon it in the editor.  I can see disabling the link in the live 
area, but the authoring area should help the editors find broken links.

Lenya 1.4 highlights broken internal links (with CSS). It would be easy
to do this in 1.2 as well: make the LinkRewritingTransformer add a
class="broken" attribute to the element instead of removing it (in the
authoring area).

see: http://lenya.zones.apache.org:9999/default/authoring/index.html (I 
deleted the features doc)
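A minimal sketch of Doug's suggestion, assuming the transformer sees each link as a SAX startElement event: instead of dropping a broken <a>, copy its attributes and add class="broken" (or append "broken" to an existing class list) so a stylesheet can highlight it in the authoring area. BrokenLinkMarker and markBroken are hypothetical names, not Lenya code, and this does not reproduce how 1.4 actually does it; only org.xml.sax.helpers.AttributesImpl is a real JDK class.

```java
import org.xml.sax.Attributes;
import org.xml.sax.helpers.AttributesImpl;

/**
 * Hypothetical helper (not Lenya code): marks a broken link instead of
 * removing it, so authors can still see and fix it.
 */
final class BrokenLinkMarker {

    static final String BROKEN_CLASS = "broken";

    private BrokenLinkMarker() {}

    /** Returns a copy of the link's attributes with "broken" added to its class list. */
    static AttributesImpl markBroken(Attributes original) {
        // Copy so the caller's attribute list is left unchanged.
        AttributesImpl attrs = new AttributesImpl(original);
        int index = attrs.getIndex("class");
        if (index < 0) {
            // No class attribute yet: add an unqualified CDATA attribute.
            attrs.addAttribute("", "class", "class", "CDATA", BROKEN_CLASS);
        } else {
            // Keep any existing classes and append "broken".
            String existing = attrs.getValue(index).trim();
            attrs.setValue(index, existing.isEmpty() ? BROKEN_CLASS : existing + " " + BROKEN_CLASS);
        }
        return attrs;
    }
}
```

In the authoring area the transformer would then emit the <a> with markBroken(attrs) instead of suppressing it, while the live area could keep the current behaviour; a CSS rule on a.broken would do the highlighting.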

--Doug



Re: Losing hyperlinks - what xsl removes them?

Posted by Bob Harner <bo...@gmail.com>.
On 1/27/06, Michael Wechner <mi...@wyona.com> wrote:
> Bob Harner wrote:
> > Are links handled any better in 1.4?
>
> I am not sure, but I don't think so. It could definitely need
> improvement and patches are always very welcome ;-)
>
> Michi

Somehow I knew that would be the answer :-)



Re: Losing hyperlinks - what xsl removes them?

Posted by Michael Wechner <mi...@wyona.com>.
Bob Harner wrote:
> Are links handled any better in 1.4?

I am not sure, but I don't think so. It could definitely need 
improvement and patches are always very welcome ;-)

Michi




-- 
Michael Wechner
Wyona Inc.  -   Open Source Content Management   -   Apache Lenya
http://www.wyona.com              http://cocoon.apache.org/lenya/
michael.wechner@wyona.com                        michi@apache.org




Re: Losing hyperlinks - what xsl removes them?

Posted by Bob Harner <bo...@gmail.com>.
On 1/13/06, Bob Harner <bo...@gmail.com> wrote:
> On 1/13/06, Andreas Hartmann <an...@apache.org> wrote:
> > Josias Thoeny schrieb:
> > > On Thu, 2006-01-12 at 11:11 -0500, Bob Harner wrote:
> > >> The behavior seems to be correct for links to documents, but in my
> > >> case the link is actually to an asset (a JPEG file).  So even though
> > >> the file really exists, it is't actually a "document".  I believe
> > >> LinkRewritingTransformer should either ignore (leave in place) such
> > >> absolute links to assets or correctly check for their existence.  What
> > >> do you think?
> > >
> > > The easiest way probably is to extend the DefaultDocumentBuilder s.t. it
> > > recognizes only urls with .html as document urls. This way the
> > > LinkRewritingTransformer should leave all other urls (e.g. with .jpg
> > > extension) in place.
> >
> >  > However, if you plan to use the proxy mechanism and you want the asset
> >  > urls to be rewritten using a proxy url prefix, you have to do it the
> >  > other way (recognize the assets as documents). I'm not sure if this
> >  > can be done easily, though.
> >
> > It's about time to handle documents and assets in the same way ... :(
> >
> > -- Andreas
>
> We don't use the proxy mechanism, but we do export the pages to static
> files and serve them up through a separate web server.   We already
> extended StaticHtmlExporter to translate the URL's of such links (to
> both assets and documents) in the document.
>
> I think I will extend (subclass) DefaultDocumentBuilder.java and see
> how that goes.
>

I'm back on this finally and am finding it more difficult than
expected.  There are other problems with LinkRewritingTransformer in
1.2.4 & 1.2.x (and apparently 1.4 too).  It seems far too limited in
what it tries to do:

1) It only rewrites <a href="foo"> tags.  But there can be URLs
needing rewriting in several other tags as well:  <img src="foo">,
<script src="foo">, <object data="foo">, <meta http-equiv="refresh"
content="2;url=foo">, <link href="foo">, <embed src="foo">, <form
action="foo"> and probably others.  (And IIUC XHTML 2.0 (future) will
allow an href attribute on *any* element.)  Rewriting such links is an
assumed feature of any mature CMS, IMHO.  I don't think it is really
reasonable to prohibit URLs starting with "/".  In fact, I'd go
further and say that Lenya should even rewrite links whose URLs
contain the same host name as the page, to make them relative.

2) It relies heavily on the DefaultDocumentBuilder class, whose
isDocument() method simplistically returns true for any URL starting
with "/lenya/mypub/authoring/", even if the URL points to an asset, not
a CMS document.  In contrast, note that the sitemaps verify that the
URL ends in ".html" before assuming that a URL is really a CMS
document.
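A small sketch of the attribute table implied by point 1 above, assuming a rewriting transformer would first look up which attributes of an element can carry a URL. UrlAttributes is a hypothetical class, not Lenya API; the element list mirrors the message and is not exhaustive. The <meta http-equiv="refresh"> case needs its own parsing because the URL is embedded inside the content value.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical lookup of URL-bearing attributes for the elements listed above. */
final class UrlAttributes {

    private static final Map<String, List<String>> URL_ATTRIBUTES = new HashMap<>();

    static {
        URL_ATTRIBUTES.put("a", Arrays.asList("href"));
        URL_ATTRIBUTES.put("img", Arrays.asList("src"));
        URL_ATTRIBUTES.put("script", Arrays.asList("src"));
        URL_ATTRIBUTES.put("object", Arrays.asList("data"));
        URL_ATTRIBUTES.put("link", Arrays.asList("href"));
        URL_ATTRIBUTES.put("embed", Arrays.asList("src"));
        URL_ATTRIBUTES.put("form", Arrays.asList("action"));
    }

    private UrlAttributes() {}

    /** Names of attributes on this element that may hold a URL; empty if none are known. */
    static List<String> urlAttributesOf(String elementName) {
        List<String> names = URL_ATTRIBUTES.get(elementName.toLowerCase(Locale.ROOT));
        return names == null ? Collections.<String>emptyList() : names;
    }

    /**
     * Extracts the target from a meta refresh value such as "2;url=foo".
     * Returns null if the value has no "url=" part after the semicolon.
     */
    static String refreshTarget(String content) {
        int semicolon = content.indexOf(';');
        if (semicolon < 0) {
            return null;
        }
        String rest = content.substring(semicolon + 1).trim();
        // Case-insensitive check for the "url=" keyword.
        if (!rest.regionMatches(true, 0, "url=", 0, 4)) {
            return null;
        }
        return rest.substring(4).trim();
    }
}
```

A fuller version would also need to cope with quoted URLs and whitespace around "=" in the refresh value, which this sketch does not handle.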

Are links handled any better in 1.4?



Re: Losing hyperlinks - what xsl removes them?

Posted by Bob Harner <bo...@gmail.com>.
On 1/13/06, Andreas Hartmann <an...@apache.org> wrote:
> Josias Thoeny schrieb:
> > On Thu, 2006-01-12 at 11:11 -0500, Bob Harner wrote:
> >> The behavior seems to be correct for links to documents, but in my
> >> case the link is actually to an asset (a JPEG file).  So even though
> >> the file really exists, it isn't actually a "document".  I believe
> >> LinkRewritingTransformer should either ignore (leave in place) such
> >> absolute links to assets or correctly check for their existence.  What
> >> do you think?
> >
> > The easiest way probably is to extend the DefaultDocumentBuilder so that it
> > recognizes only urls with .html as document urls. This way the
> > LinkRewritingTransformer should leave all other urls (e.g. with .jpg
> > extension) in place.
>
> > However, if you plan to use the proxy mechanism and you want the asset
> > urls to be rewritten using a proxy url prefix, you have to do it the
> > other way (recognize the assets as documents). I'm not sure if this
> > can be done easily, though.
>
> It's about time to handle documents and assets in the same way ... :(
>
> -- Andreas

We don't use the proxy mechanism, but we do export the pages to static
files and serve them up through a separate web server.  We already
extended StaticHtmlExporter to translate the URLs of such links (to
both assets and documents) in the document.

I think I will extend (subclass) DefaultDocumentBuilder.java and see
how that goes.
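The thread doesn't show how the StaticHtmlExporter extension translates URLs, so the following is only one possible approach, sketched under the assumption that both the exported page and the link target are absolute, slash-separated paths in the same tree (for example under /default/live/). RelativeLinks.relativize is a hypothetical helper, not part of Lenya, and it ignores query strings and fragments.

```java
/**
 * Hypothetical helper: turns an absolute target path into a link that is
 * relative to the page containing it, e.g. for statically exported pages.
 */
final class RelativeLinks {

    private RelativeLinks() {}

    /**
     * pagePath and targetPath are absolute, slash-separated paths such as
     * "/default/live/news/item.html". Query strings and fragments are not handled.
     */
    static String relativize(String pagePath, String targetPath) {
        String[] from = pagePath.split("/", -1);
        String[] to = targetPath.split("/", -1);

        // The page's directory is every segment except its last (the file name).
        int fromDirs = from.length - 1;
        int toDirs = to.length - 1;

        // Count leading directory segments shared by page and target.
        int common = 0;
        while (common < fromDirs && common < toDirs && from[common].equals(to[common])) {
            common++;
        }

        StringBuilder link = new StringBuilder();
        // Climb out of each page directory that the target does not share.
        for (int i = common; i < fromDirs; i++) {
            link.append("../");
        }
        // Descend into the target's remaining segments.
        for (int i = common; i < to.length; i++) {
            link.append(to[i]);
            if (i < to.length - 1) {
                link.append('/');
            }
        }
        return link.toString();
    }
}
```

Relative links like these keep working wherever the exported tree is served from, which is one reason to prefer them over absolute "/default/live/..." paths in a static export.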



Re: Losing hyperlinks - what xsl removes them?

Posted by Andreas Hartmann <an...@apache.org>.
Josias Thoeny schrieb:
> On Thu, 2006-01-12 at 11:11 -0500, Bob Harner wrote:
>> The behavior seems to be correct for links to documents, but in my
>> case the link is actually to an asset (a JPEG file).  So even though
>> the file really exists, it isn't actually a "document".  I believe
>> LinkRewritingTransformer should either ignore (leave in place) such
>> absolute links to assets or correctly check for their existence.  What
>> do you think?
> 
> The easiest way probably is to extend the DefaultDocumentBuilder so that it
> recognizes only urls with .html as document urls. This way the
> LinkRewritingTransformer should leave all other urls (e.g. with .jpg
> extension) in place.

> However, if you plan to use the proxy mechanism and you want the asset
> urls to be rewritten using a proxy url prefix, you have to do it the
> other way (recognize the assets as documents). I'm not sure if this
> can be done easily, though.

It's about time to handle documents and assets in the same way ... :(

-- Andreas




Re: Losing hyperlinks - what xsl removes them?

Posted by Josias Thoeny <jo...@wyona.com>.
On Thu, 2006-01-12 at 11:11 -0500, Bob Harner wrote:
> The behavior seems to be correct for links to documents, but in my
> case the link is actually to an asset (a JPEG file).  So even though
> the file really exists, it isn't actually a "document".  I believe
> LinkRewritingTransformer should either ignore (leave in place) such
> absolute links to assets or correctly check for their existence.  What
> do you think?

The easiest way probably is to extend the DefaultDocumentBuilder so that it
recognizes only urls with .html as document urls. This way the
LinkRewritingTransformer should leave all other urls (e.g. with .jpg
extension) in place.
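A minimal sketch of this approach, done by subclassing (as Andreas suggests elsewhere in the thread) rather than by patching the class. The Lenya types are deliberately not reproduced: BaseDocumentBuilder is a hypothetical stand-in whose prefix-only isDocument mimics the behaviour described in this thread, and HtmlOnlyDocumentBuilder is a hypothetical subclass. The real DefaultDocumentBuilder.isDocument(Publication publication, String url) signature and any exceptions it declares should be taken from the 1.2.4 source.

```java
/**
 * Hypothetical stand-in for DefaultDocumentBuilder: any URL under
 * {contextPrefix}/{pubId}/{area}/ counts as a document URL, assets included.
 */
class BaseDocumentBuilder {
    protected final String contextPrefix;

    BaseDocumentBuilder(String contextPrefix) {
        this.contextPrefix = contextPrefix;
    }

    boolean isDocument(String pubId, String url) {
        String prefix = contextPrefix + "/" + pubId + "/";
        if (!url.startsWith(prefix)) {
            return false;
        }
        // Require an area segment followed by a non-empty path.
        String rest = url.substring(prefix.length());
        int slash = rest.indexOf('/');
        return slash > 0 && slash < rest.length() - 1;
    }
}

/** Hypothetical subclass: only URLs ending in ".html" are treated as documents. */
class HtmlOnlyDocumentBuilder extends BaseDocumentBuilder {

    HtmlOnlyDocumentBuilder(String contextPrefix) {
        super(contextPrefix);
    }

    @Override
    boolean isDocument(String pubId, String url) {
        // Assets such as .jpg files no longer count as document URLs.
        return url.endsWith(".html") && super.isDocument(pubId, url);
    }
}
```

With such a subclass, "/default/authoring/test/happy.jpg" is no longer classified as a document URL, so the transformer would be expected to leave that link alone while still checking .html links.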

However, if you plan to use the proxy mechanism and you want the asset
urls to be rewritten using a proxy url prefix, you have to do it the
other way (recognize the assets as documents). I'm not sure if this can
be done easily, though.

Josias



Re: Losing hyperlinks - what xsl removes them?

Posted by Bob Harner <bo...@gmail.com>.
The behavior seems to be correct for links to documents, but in my
case the link is actually to an asset (a JPEG file).  So even though
the file really exists, it isn't actually a "document".  I believe
LinkRewritingTransformer should either ignore (leave in place) such
absolute links to assets or correctly check for their existence.  What
do you think?

I'm at Lenya 1.2.4, by the way.

On 1/12/06, Andreas Hartmann <an...@apache.org> wrote:
> Bob Harner schrieb:
> > Do you agree that removing such "/mypub/area/docid.html" links is a
> > bug?  Or is there good reason for them to be removed?
>
> If the document exists in the same area and the link is removed,
> then this is clearly a bug.
>
> If the URL is a valid document URL and the document does not exist,
> then it is the correct behaviour of the LinkRewritingTransformer.
>
> If the URL is not a valid document URL, then IMO the link should not
> be removed.
>
> -- Andreas


Re: Losing hyperlinks - what xsl removes them?

Posted by Andreas Hartmann <an...@apache.org>.
Bob Harner schrieb:
> Do you agree that removing such "/mypub/area/docid.html" links is a
> bug?  Or is there good reason for them to be removed?

If the document exists in the same area and the link is removed,
then this is clearly a bug.

If the URL is a valid document URL and the document does not exist,
then it is the correct behaviour of the LinkRewritingTransformer.

If the URL is not a valid document URL, then IMO the link should not
be removed.
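The three cases above can be written down as a small decision function. This is a hypothetical model of the stated policy, not the LinkRewritingTransformer's actual code: isDocumentUrl stands in for the DocumentBuilder check and documentExists for the existence check in the current area.

```java
import java.util.function.Predicate;

/** Hypothetical model of the link policy; not Lenya's implementation. */
final class LinkPolicy {

    enum Action {
        /** Not a valid document URL: leave the <a> element untouched. */
        LEAVE_UNCHANGED,
        /** Valid document URL and the document exists: keep (and rewrite) the link. */
        KEEP_LINK,
        /** Valid document URL but the document is missing: drop the <a>, keep its text. */
        REMOVE_LINK
    }

    private LinkPolicy() {}

    static Action decide(String url,
                         Predicate<String> isDocumentUrl,
                         Predicate<String> documentExists) {
        if (!isDocumentUrl.test(url)) {
            return Action.LEAVE_UNCHANGED;
        }
        return documentExists.test(url) ? Action.KEEP_LINK : Action.REMOVE_LINK;
    }
}
```

Under this model, a link disappearing for an existing document points at a bug in the existence check, and an asset link disappearing points at isDocumentUrl accepting a URL it should not.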

-- Andreas



Re: Losing hyperlinks - what xsl removes them?

Posted by Bob Harner <bo...@gmail.com>.
Do you agree that removing such "/mypub/area/docid.html" links is a
bug?  Or is there good reason for them to be removed?

On 1/12/06, Andreas Hartmann <an...@apache.org> wrote:
> Josias Thoeny schrieb:
>
> > Maybe you could try to use a relative link instead, or patch the
> > DocumentBuilder.
>
> Just a note - instead of changing the DefaultDocumentBuilder, you
> can subclass it and declare your own class in publication.xconf.
>
> -- Andreas


Re: Losing hyperlinks - what xsl removes them?

Posted by Andreas Hartmann <an...@apache.org>.
Josias Thoeny schrieb:

> Maybe you could try to use a relative link instead, or patch the
> DocumentBuilder.

Just a note - instead of changing the DefaultDocumentBuilder, you
can subclass it and declare your own class in publication.xconf.

-- Andreas




Re: Losing hyperlinks - what xsl removes them?

Posted by Josias Thoeny <jo...@wyona.com>.
On Wed, 2006-01-11 at 15:23 -0500, Bob Harner wrote:
> Greetings everyone!
> 
> I'm having an odd problem with Lenya. Whenever my raw content files
> (pubs/default/content/authoring/**/index_en.xml) have hyperlinks
> like this:
> 
> <a href="/default/authoring/something.something">foo</a>
> 
> then when the page is rendered in Lenya I just get:
> 
> foo
> 
> In other words, the <a...> tag is completely removed.  I assume some
> xsl file somewhere is doing it, but I can't figure out where and why.
> Any idea what transformation would do this nasty thing?

The links are probably removed by the LinkRewritingTransformer
(src/java/org/apache/lenya/cms/cocoon/transformation/LinkRewritingTransformer.java)

It removes all links which start with {contextprefix}/{pubid} and are
not valid document urls.

Whether a url is a valid document url is decided in the DocumentBuilder.
Have a look at DefaultDocumentBuilder.java, method
isDocument(Publication publication, String url)

Maybe you could try to use a relative link instead, or patch the
DocumentBuilder.
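To see why only URLs beginning with the publication path are affected, here is a hypothetical parser that splits a URL of the form {contextprefix}/{pubid}/{area}/{rest}. PublicationUrl and its fields are illustrative names, not Lenya API. It shows why a relative link such as "test/happy.jpg" escapes the check entirely: it never matches the {contextprefix}/{pubid} prefix.

```java
/**
 * Hypothetical view of a URL of the form {contextPrefix}/{pubId}/{area}/{rest},
 * e.g. "/default/authoring/test/happy.jpg" with an empty context prefix.
 */
final class PublicationUrl {
    final String area;
    final String rest;

    private PublicationUrl(String area, String rest) {
        this.area = area;
        this.rest = rest;
    }

    /**
     * Returns null unless the URL starts with contextPrefix + "/" + pubId + "/"
     * followed by a non-empty area segment and a slash.
     */
    static PublicationUrl parse(String contextPrefix, String pubId, String url) {
        String prefix = contextPrefix + "/" + pubId + "/";
        if (!url.startsWith(prefix)) {
            return null; // relative or foreign link: not a candidate for removal
        }
        String remainder = url.substring(prefix.length());
        int slash = remainder.indexOf('/');
        if (slash <= 0) {
            return null; // no area segment
        }
        return new PublicationUrl(remainder.substring(0, slash), remainder.substring(slash + 1));
    }
}
```

Only URLs for which a parse like this succeeds would then be handed to the isDocument check.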

hth,
Josias
