Posted to dev@xalan.apache.org by "Ushakov, Sergey N" <us...@int.com.ru> on 2005/06/17 06:14:14 UTC

XALANJ: occasional namespace corruption on result elements

Hi,

I've come across a case where the default namespace is occasionally
corrupted on some of the result elements.

I use the Xalan-J command line tool to transform MSWord-generated XML
documents into formal XML data structures. The resulting document normally
contains a tree of elements in a proprietary namespace, some of them
populated with XHTML. XHTML is the default namespace.

The problem is that some of the proprietary result elements (which normally
inherit the XHTML namespace as the default namespace) occasionally get an
extra explicit namespace attribute: xmlns="" . Another typical sort of
corruption is the appearance of an explicit XHTML namespace declaration in
contexts where one would expect it to be implicitly inherited.

The result elements are generated using <xsl:element> with no namespace
attributes. Whether the phenomenon shows up depends on the input MSWord
data: some input documents trigger this namespace corruption, some do not.
Some of the result elements on the same nesting level are corrupted more
often than others. The phenomenon is stable from run to run: subsequent
runs give the same result every time. My observation is that namespace
corruption is more likely to happen when relatively deep recursion takes
place.

Frankly, I can't see how this could be a fault of mine. This is somewhat
confirmed by the fact that the MS XML engine does not produce this problem
on the same data...

Nor do I have an idea of how to distill the problem, given its dependency
on the input data. And the data and the stylesheet are too big to be posted
to the list...

If anybody is in a position to have a look - I have placed all the data at
http://www.chemitech.ru/xslt/ . Any feedback is welcome :)

Regards,
Sergey Ushakov

PS Some comments on the XSLT stylesheet for those who may wish to have a
look. The stylesheet uses the "select-and-build" processing approach.
Meaningful source tree branches are selected first, and then the result tree
is built using the selected source data. The latter is processed by a
mechanism that uses deep recursion to convert Word-style lists into
HTML-style lists. So the main part of the processing takes place in
recursion over source text paragraph lists.


---------------------------------------------------------------------
To unsubscribe, e-mail: xalan-dev-unsubscribe@xml.apache.org
For additional commands, e-mail: xalan-dev-help@xml.apache.org


Re: XALANJ: occasional namespace corruption on result elements - distilled

Posted by da...@us.ibm.com.
> Two kinds of namespace corruption events are observed:
> - excessive default namespace redeclaration on nested elements that
> duplicates the top-level default namespace declaration;

Ugly, but not necessarily fatal.

> - unexpected default namespace undeclaration on result elements that
> _follow_ the result nodes that show the first problem.

That's a bug, because the p element child has the wrong namespace URI.

> 
> The test files illustrating the problem are appended below.
> 
> Does it look like a bug that deserves being filed? Or do I miss something?

Xalan-C 1.9 produces the following output:

<?xml version="1.0" encoding="UTF-8"?><t:root xmlns:t="urn:some-target-namespace" xmlns="http://www.w3.org/1999/xhtml">
<t:child-1>
<ul>
<li>qqq</li>
</ul>
</t:child-1>
<t:child-2>
<p>www</p>
</t:child-2>
</t:root>

That's identical to the output produced by Saxon 6.5.3 and MS XSL 3.0
(except for differences due to indenting).

You should file a Jira report and attach the sample input document and 
stylesheet document.

Thanks!

Dave



XALANJ: occasional namespace corruption on result elements - distilled

Posted by "Ushakov, Sergey N" <us...@int.com.ru>.
Hi, I have managed to distill the problem that I reported earlier. It turns
out that the problem is not "occasional" but quite reproducible.

Two kinds of namespace corruption events are observed:
- excessive default namespace redeclaration on nested elements that
duplicates the top-level default namespace declaration;
- unexpected default namespace undeclaration on result elements that
_follow_ the result nodes that show the first problem.

The test files illustrating the problem are appended below.

Does it look like a bug that deserves being filed? Or do I miss something?

Regards,
Sergey Ushakov


input xml file:
---------------
<?xml version="1.0" encoding="UTF-8" ?>
<s:some-root  xmlns:s="urn:some-source-namespace">
  <s:p>qqq</s:p>
</s:some-root>

xsl file:
---------
<?xml version="1.0" encoding="windows-1251" ?>
<xsl:transform version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:s="urn:some-source-namespace"
    xmlns:t="urn:some-target-namespace"
    xmlns="http://www.w3.org/1999/xhtml"
    exclude-result-prefixes="s"
    >
  <xsl:output method="xml" indent="yes" />
  <xsl:template match="/">
    <t:root>
      <t:child-1>
        <xsl:variable name="accum0" />
        <xsl:variable name="this-chain">
          <xsl:for-each select="/s:some-root/s:p[1]/node()"> <!-- important: for-each !!! -->
            <xsl:value-of select="." />
            </xsl:for-each>
          </xsl:variable>
        <xsl:variable name="accum">
          <xsl:copy-of select="$accum0" /> <!-- important: copy-of !!! -->
          </xsl:variable>
        <xsl:variable name="last-chain" select="$this-chain" />
        <ul>
          <xsl:copy-of select="$accum" />
          <xsl:if test="$last-chain">
            <li><xsl:copy-of select="$last-chain" /> <!-- important: copy-of !!! --></li>
            </xsl:if>
          </ul>
        </t:child-1>
      <t:child-2>
        <p>www</p>
        </t:child-2>
      </t:root>
    </xsl:template>
  </xsl:transform>
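
For anyone who would rather reproduce this from Java than from the command
line tool, a minimal sketch using the standard JAXP API might look like the
following. The class name Repro and the two file arguments are made up for
illustration. Which XSLT processor actually runs depends on what JAXP finds
(typically Xalan-J if its jars are on the classpath, otherwise the JDK's
built-in one), so the sketch prints the factory class to make that visible.

```java
import java.io.File;
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper, not part of Xalan-J: runs a stylesheet through JAXP.
final class Repro {

    // Applies the stylesheet to the input and returns the serialized result.
    static String transform(Source stylesheet, Source input) {
        try {
            TransformerFactory factory = TransformerFactory.newInstance();
            // Shows which processor JAXP picked up (depends on the classpath).
            System.err.println("XSLT processor: " + factory.getClass().getName());
            Transformer transformer = factory.newTransformer(stylesheet);
            StringWriter out = new StringWriter();
            transformer.transform(input, new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("transformation failed", e);
        }
    }

    // Convenience overload for in-memory documents.
    static String transformStrings(String xsl, String xml) {
        return transform(new StreamSource(new StringReader(xsl)),
                         new StreamSource(new StringReader(xml)));
    }

    public static void main(String[] args) {
        if (args.length != 2) {
            System.err.println("usage: java Repro <stylesheet.xsl> <input.xml>");
            return;
        }
        String result = transform(new StreamSource(new File(args[0])),
                                  new StreamSource(new File(args[1])));
        System.out.println(result);
        // Crude textual check for the default namespace undeclaration symptom.
        if (result.contains("xmlns=\"\"")) {
            System.err.println("result contains xmlns=\"\"");
        }
    }
}
```

Reading the files through StreamSource(File) rather than as strings lets the
parser honour the encoding declared in each file, which matters here because
the stylesheet declares windows-1251.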

result xml file:
----------------
<?xml version="1.0" encoding="UTF-8"?>
<t:root xmlns:t="urn:some-target-namespace"
xmlns="http://www.w3.org/1999/xhtml">
<t:child-1>
<ul><li xmlns="http://www.w3.org/1999/xhtml">qqq</li>
</ul>
</t:child-1>
<t:child-2 xmlns="">
<p>www</p>
</t:child-2>
</t:root>
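
The second symptom can be checked mechanically rather than by eye: with a
namespace-aware parser, the p element under t:child-2 should report the
XHTML namespace URI, whereas the xmlns="" undeclaration puts it in no
namespace at all. A rough sketch with the JDK's DOM parser follows. The
class and method names are invented here for illustration, and the two
strings are cut-down versions of the outputs quoted in this thread.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper: reports which namespace a result element ended up in.
final class NamespaceCheck {

    // Returns the namespace URI of the first element with the given local
    // name, or null if that element is in no namespace.
    static String namespaceOf(String xml, String localName) {
        Document doc;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // off by default in JAXP
            doc = factory.newDocumentBuilder()
                         .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not parse result document", e);
        }
        // "*" matches elements in any namespace, including none.
        NodeList hits = doc.getElementsByTagNameNS("*", localName);
        if (hits.getLength() == 0) {
            throw new IllegalArgumentException("no element named " + localName);
        }
        return hits.item(0).getNamespaceURI();
    }

    public static void main(String[] args) {
        String root = "<t:root xmlns:t=\"urn:some-target-namespace\""
                    + " xmlns=\"http://www.w3.org/1999/xhtml\">";
        // Shape of the output reported for the other processors.
        String expected = root + "<t:child-2><p>www</p></t:child-2></t:root>";
        // Shape of the Xalan-J output shown above.
        String observed = root + "<t:child-2 xmlns=\"\"><p>www</p></t:child-2></t:root>";

        System.out.println(namespaceOf(expected, "p")); // http://www.w3.org/1999/xhtml
        System.out.println(namespaceOf(observed, "p")); // null
    }
}
```

The redundant xmlns declaration on li would not show up in such a check,
since li keeps the same namespace URI either way.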


