Posted to users@cocoon.apache.org by Alexander Schatten <al...@gmx.at> on 2003/05/22 02:00:01 UTC

HTML serializer namespace problem

I use the HTML generator to read HTML documents and convert them to 
XHTML. The documents are then processed by the XSLT transformer and 
passed to the HTML serializer, like this:

<map:match pattern="**.html">
  <map:generate src="html/{1}.html" type="html"/>
  <map:transform src="style/default.xsl" type="xslt"/>
  <map:serialize type="html"/>
</map:match>

This should be clear and straightforward. Unfortunately, the generated 
documents are full of elements like this:

<p xmlns="http://www.w3.org/1999/xhtml" ...



This is obviously not correct: the HTML serializer should not generate 
XHTML namespace declarations, and if they are present in its input it 
should remove them. (Additionally, in the stylesheet the <xsl:out> 
defines html as the method...)

Is there anything that can be done about this?


thank you again!!


Alex




---------------------------------------------------------------------
To unsubscribe, e-mail: cocoon-users-unsubscribe@xml.apache.org
For additional commands, e-mail: cocoon-users-help@xml.apache.org


Re: HTML serializer namespace problem

Posted by Bruno Dumon <br...@outerthought.org>.
On Thu, 2003-05-22 at 10:21, Alexander Schatten wrote:
> Joerg Heinicke wrote:
> 
> > What about Bugzilla? A testcase would be really helpful.
> >
> o.k. can you tell me the url?

The HTML/XML serializers used in Cocoon are actually provided by Xalan,
so please post your bugs over there.

A related Xalan complaint is:
http://nagoya.apache.org/bugzilla/show_bug.cgi?id=19933

You might want to add your comments over there.

-- 
Bruno Dumon                             http://outerthought.org/
Outerthought - Open Source, Java & XML Competence Support Center
bruno@outerthought.org                          bruno@apache.org




Re: HTML serializer namespace problem

Posted by Alexander Schatten <al...@gmx.at>.
Joerg Heinicke wrote:

> What about Bugzilla? A testcase would be really helpful.
>
o.k. can you tell me the url?

btw.: do you mean the problems with 2.1 or the described left-over 
namespaces?


thanx


alex




Re: HTML serializer namespace problem

Posted by Joerg Heinicke <jo...@gmx.de>.
What about Bugzilla? A testcase would be really helpful.

Joerg

-- 

System Development
VIRBUS AG
Fon  +49(0)341-979-7419
Fax  +49(0)341-979-7409
joerg.heinicke@virbus.de
www.virbus.de




Re: HTML serializer namespace problem

Posted by Alexander Schatten <al...@gmx.at>.
rg1915@dslextreme.com wrote:

>I'm also seeing the same behavior in 2.1. I didn't see this in 2.0.4.  To
>get the html to match I have to use a namespace where I didn't in 2.0. 
>The HTMLGenerator generates html using the XHTML namespace. I strip off
>everything except what is in between the body tags. The first element
>after the body tag is a form tag and it comes out as
><form xmlns="http://www.w3.org/1999/xhtml" ...>
>  
>
Unfortunately the current milestone of 2.1 seems to have really weird 
HTML generator behaviour; at least, I did not manage to get it to work 
correctly. The problem is very hard to describe, though: it worked with 
namespaces in one case and did not work in an extremely similar, 
practically identical case...

>If you could generate a sample stylesheet that leaves what is between the
>body tags and doesn't exhibit this behavior I'd be most grateful.
>  
>
Yes, but this is only the second-best solution. The best solution 
obviously would be for the serializer to remove such namespace 
declarations itself, as they have nothing to do with standard HTML, 
don't you think?


alex




Re: HTML serializer namespace problem

Posted by rg...@dslextreme.com.
I'm also seeing the same behavior in 2.1. I didn't see this in 2.0.4.  To
get the html to match I have to use a namespace where I didn't in 2.0. 
The HTMLGenerator generates html using the XHTML namespace. I strip off
everything except what is in between the body tags. The first element
after the body tag is a form tag and it comes out as
<form xmlns="http://www.w3.org/1999/xhtml" ...>

If you could generate a sample stylesheet that leaves what is between the
body tags and doesn't exhibit this behavior I'd be most grateful.

Ralph





Re: HTML serializer namespace problem

Posted by Charles Yates <ce...@stanford.edu>.
My guess is that what you are seeing is the correct behavior of your 
stylesheets. Namespaces are one of the most difficult parts of XSLT. 
If you post default.xsl I _might_ be able to tell you what the problem 
is.



Re: HTML serializer namespace problem

Posted by Alexander Schatten <al...@gmx.at>.
J.Pietschmann wrote:

>             ...
> Stuff like <xsl:copy> can also be a cause for the problem.
> Look carefully for xmlns="" or other redeclarations of the
> default namespace.

I have checked this now: definitely no namespaces are declared, neither 
in the original HTML nor in the XSL file; the only namespace declared is 
obviously the XSLT namespace in the XSL file.

But it is true that I use <xsl:copy-of...>


Alex




Re: HTML serializer namespace problem

Posted by Charles Yates <ce...@stanford.edu>.
I think the xmlns declarations are coming from the HTMLGenerator (and 
jtidy). If this is not configurable then, AFAIK, the only way to keep 
them out of your result is to avoid <xsl:copy> and <xsl:copy-of> and 
instead do something like this:

<xsl:template match="*">
  <xsl:element name="{local-name()}">
    <xsl:copy-of select="@*"/>
    <xsl:apply-templates/>
  </xsl:element>
</xsl:template>

to create new elements of the same name.
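
A fuller sketch of this idea for the use case discussed in this thread 
(a generated frame with the original body content placed inside it). 
It assumes XSLT 1.0 and that the HTMLGenerator output is in the XHTML 
namespace, as reported elsewhere in the thread. The h: prefix, the 
"strip" mode name and the frame markup are illustrative assumptions and 
are not taken from the actual default.xsl.

```xml
<?xml version="1.0"?>
<!-- Sketch only: the h: prefix, the "strip" mode and the frame markup
     are hypothetical; adapt them to the real stylesheet. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:h="http://www.w3.org/1999/xhtml"
    exclude-result-prefixes="h">

  <!-- Build the page frame with literal result elements in no namespace. -->
  <xsl:template match="/h:html">
    <html>
      <head>
        <title><xsl:value-of select="h:head/h:title"/></title>
      </head>
      <body>
        <!-- Used instead of <xsl:copy-of select="h:body/node()"/>, which
             would carry the XHTML namespace along with every element. -->
        <xsl:apply-templates select="h:body/node()" mode="strip"/>
      </body>
    </html>
  </xsl:template>

  <!-- Re-create each element under its local name, in no namespace. -->
  <xsl:template match="*" mode="strip">
    <xsl:element name="{local-name()}">
      <xsl:copy-of select="@*"/>
      <xsl:apply-templates select="node()" mode="strip"/>
    </xsl:element>
  </xsl:template>

  <!-- Text and comments pass through unchanged. -->
  <xsl:template match="text()|comment()" mode="strip">
    <xsl:copy/>
  </xsl:template>

</xsl:stylesheet>
```

Because the stylesheet declares no default namespace, the elements 
created by xsl:element land in no namespace, so the serializer has no 
reason to emit an xmlns attribute for them. If the generator output 
turns out not to be namespaced, the /h:html match will not fire and 
the match patterns need adjusting.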

Check the HTMLGenerator page 
http://cocoon.apache.org/2.1/userdocs/generators/html-generator.html
and jtidy http://lempinen.net/sami/jtidy/ to see if you can configure 
them not to include xmlns (my quick look suggests not).

You may not like it, but the contract for serializers does not involve 
modifying the events they receive; a serializer expects them to be 
correct. In your case it may not seem to be asking much to have a 
serializer take care of the problem, but this violates a basic 
separation of concerns and in the end could cause a lot of problems.

Charles



Re: HTML serializer namespace problem

Posted by Alexander Schatten <al...@gmx.at>.
J.Pietschmann wrote:

> Alexander Schatten wrote:
>
>> However, I do not understand what you mean by:
>>
>>> You should not serialize XHTML as HTML,
>>
> ...
>
>> Why shouldn't I serialize XHTML as HTML? I mean, the reason I use 
>> Cocoon is precisely that I want to serialize different sources to 
>> correct HTML. And that is what the HTML serializer is for, no?
>
>
> XHTML is XML, not HTML. Your result sample had an XHTML
> namespace attached, therefore I assumed you wanted to
> generate XHTML instead of HTML.
>
> It seems you have XHTML snippets embedded in the source
> which were copied through somehow.
>
Yes, of course I have. This is the idea of Cocoon, isn't it? Inside the 
pipeline everything is X(HT)ML. I use an XSL document that generates the 
"frame" (navigation, header, footer), and the HTML generator creates 
XHTML documents out of the available HTML documents. Parts of those 
documents (everything inside the <BODY> in this case) are then copied 
(xsl:copy-of) into the new frame of navigation, header and body 
generated by the XSL...

So at the end of the pipeline I have XHTML, as always with Cocoon (when 
publishing to HTML). This is completely clear, but then I use the HTML 
serializer to create HTML out of this XHTML. So in my opinion the 
purpose of this serializer is (among others) to remove (or modify) 
everything in the document that is not HTML compliant and to generate a 
correct HTML document.

And this is precisely what it does not do under certain circumstances, 
as my code snippet shows.


Alex




Re: HTML serializer namespace problem

Posted by "J.Pietschmann" <j3...@yahoo.de>.
Alexander Schatten wrote:
> However, I do not understand what you mean by:
>> You should not serialize XHTML as HTML,
...
> Why shouldn't I serialize XHTML as HTML? I mean, the reason I use 
> Cocoon is precisely that I want to serialize different sources to 
> correct HTML. And that is what the HTML serializer is for, no?

XHTML is XML, not HTML. Your result sample had an XHTML
namespace attached, therefore I assumed you wanted to
generate XHTML instead of HTML.

It seems you have XHTML snippets embedded in the source
which were copied through somehow.

J.Pietschmann





Re: HTML serializer namespace problem

Posted by Alexander Schatten <al...@gmx.at>.
Thank you for the hints, particularly for

"This indicates there is a redefinition for the default namespace
somewhere. This can happen if you forgot a proper declaration
somewhere in the stylesheet, or created an element using xsl:element, "

I have to check this, maybe I have forgotten something...

However, I do not understand what you mean by:

J.Pietschmann wrote:

> Alexander Schatten wrote:
>
>>   <map:serialize type="html" />
>
>
> You should not serialize XHTML as HTML, use the XML serializer (or
> the XHTML serializer you can build yourself with Saxon 7.5, which
> will improve compatibility with older browsers).

Why shouldn't I serialize XHTML as HTML? I mean, the reason I use 
Cocoon is precisely that I want to serialize different sources to 
correct HTML. And that is what the HTML serializer is for, no?


thank you


Alex




Re: HTML serializer namespace problem

Posted by "J.Pietschmann" <j3...@yahoo.de>.
Alexander Schatten wrote:
>   <map:serialize type="html" />

You should not serialize XHTML as HTML, use the XML serializer (or
the XHTML serializer you can build yourself with Saxon 7.5, which
will improve compatibility with older browsers).

> so this should be clear and straightforward. unfortunately the generated 
> documents are full of statements like this
> 
> <p xmlns="http://www.w3.org/1999/xhtml" ...

This indicates there is a redefinition for the default namespace
somewhere. This can happen if you forgot a proper declaration
somewhere in the stylesheet, or created an element using xsl:element,
for example
<xsl:stylesheet xmlns:xsl="..."
   xmlns="http://www.w3.org/1999/xhtml">
   <xsl:template match="mydoc">
      <html>
        <body>
           <xsl:apply-templates/>
        </body>
      </html>
   </xsl:template>
   <xsl:template match="mysection">
      <xsl:element name="div" namespace="">
         <xsl:apply-templates/>
      </xsl:element>
   </xsl:template>
   <xsl:template match="mypara">
      <p>
         <xsl:apply-templates/>
      </p>
   </xsl:template>
which will get you
   <html xmlns="http://www.w3.org/1999/xhtml">
     <body>
       <div xmlns="">
         <p xmlns="http://www.w3.org/1999/xhtml">
             ...
Stuff like <xsl:copy> can also be a cause for the problem.
Look carefully for xmlns="" or other redeclarations of the
default namespace.
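
For comparison, a sketch of the same example in which every result 
element ends up in one namespace, so no redeclaration is needed on the 
inner elements. It assumes XSLT 1.0; mydoc, mysection and mypara are 
the placeholder names from the example above. The explicit namespace 
attribute on xsl:element is one way to make the intent unambiguous, not 
the only one.

```xml
<?xml version="1.0"?>
<!-- Sketch: mydoc, mysection and mypara are placeholder source
     element names carried over from the example above. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns="http://www.w3.org/1999/xhtml">

  <xsl:template match="mydoc">
    <html>
      <body>
        <xsl:apply-templates/>
      </body>
    </html>
  </xsl:template>

  <!-- Naming the namespace explicitly keeps the generated div in the
       same namespace as the literal result elements around it. -->
  <xsl:template match="mysection">
    <xsl:element name="div" namespace="http://www.w3.org/1999/xhtml">
      <xsl:apply-templates/>
    </xsl:element>
  </xsl:template>

  <xsl:template match="mypara">
    <p>
      <xsl:apply-templates/>
    </p>
  </xsl:template>

</xsl:stylesheet>
```

With html, body, div and p all in the XHTML namespace, the declaration 
only has to appear once, on the html element. The reverse choice works 
as well: drop the default namespace from the stylesheet and keep every 
result element in no namespace.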

> 
> in the stylesheet the <xsl:out> defines html as method...)
You mean xsl:output? It doesn't have any effect in Cocoon for
architectural reasons (at least as far as the output method
is concerned).
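
Output settings of this kind are instead given on the serializer 
declaration in the sitemap. A minimal sketch, assuming Cocoon 2.1-style 
sitemap syntax; the doctype and encoding values are illustrative, and 
the child element names (doctype-public, doctype-system, encoding) 
should be checked against the serializer documentation for the version 
in use.

```xml
<!-- Goes inside <map:components> of the sitemap.
     Values are examples, not taken from this thread. -->
<map:serializers default="html">
  <map:serializer name="html" mime-type="text/html"
                  src="org.apache.cocoon.serialization.HTMLSerializer">
    <doctype-public>-//W3C//DTD HTML 4.01 Transitional//EN</doctype-public>
    <doctype-system>http://www.w3.org/TR/html4/loose.dtd</doctype-system>
    <encoding>UTF-8</encoding>
  </map:serializer>
</map:serializers>
```

The pipeline then selects this serializer with 
<map:serialize type="html"/>, as in the original post.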

J.Pietschmann


