Posted to dev@xalan.apache.org by "Brian Minchau (JIRA)" <xa...@xml.apache.org> on 2005/03/14 21:03:07 UTC

[jira] Commented: (XALANJ-2070) Xalan should support XML 1.1 for input/output XML and stylesheets themselves

     [ http://issues.apache.org/jira/browse/XALANJ-2070?page=comments#action_60777 ]
     
Brian Minchau commented on XALANJ-2070:
---------------------------------------

I have reviewed the second patch, XML11SupportPatch2.txt, but I do not approve it. I have these comments:

1) Because the serializer wants to be independent of Xalan (though not the other way around), I recognize that classes like XMLChar and XML11Char may have to be absorbed into the serializer itself.  This is not ideal, but I don't know of anything better.  I think the lesser of the evils is to have XMLChar and XML11Char in both org.apache.xml.serializer.utils and org.apache.xml.utils.

As maintenance progresses the two copies should be kept as close to each other as possible.  References to XMLChar and XML11Char inside Xalan proper (not the serializer) should use the definitions in org.apache.xml.utils.  References to these classes within the serializer, org.apache.xml.serializer.*, should use the copies inside the serializer.

2) Add @xsl.usage internal tags to XMLChar and XML11Char inside of the serializer.

3) Remove the @author tags in XML11Char and XMLChar in all locations.

4) The patch to WriterToUTF8Buffered has a bug. Its method:
  write(final char chars[], final int start, final int length)
cleverly (almost too cleverly) splits the incoming array into several smaller chunks and calls itself with each of them.  On the recursive call the chunks are small enough that the method actually does some work.

Before this patch each "character" was held in a single Java "char". With this patch a "character" can span 1 or 2 Java "char" values (a surrogate pair in the 2-char case). The old chunking didn't need to worry about this:
  final int chunks = 1 + length/CHARS_MAX;
  int end_chunk = start;
  for (int chunk = 1; chunk <= chunks; chunk++) {
    int start_chunk = end_chunk;
    end_chunk = start + 
                (int) ((((long) length) * chunk) / chunks);
    // FIX UP CHUNK HERE
    int len_chunk = (end_chunk - start_chunk);
    this.write(chars,start_chunk, len_chunk);
  }

So it is possible that a pair of Java chars that make up a character span the chunk boundary. This is the bug.

I suggest that end_chunk (and with it the next start_chunk) be adjusted for this at the spot where I've marked "FIX UP CHUNK HERE":

   if ( (end_chunk+1) < chars.length ) {
     // The last Java char in the chunk
     final char c = chars[end_chunk - 1];
     if (c >= 0xD800 && c <= 0xDBFF) {
       // Include next char in this chunk, 
       // to avoid spanning a Unicode character 
       // that is in two Java chars
       end_chunk++;
     }
   }

Stress test this spanning issue by temporarily reducing
BYTES_MAX to 3 or 6.
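
To make the stress test concrete, here is a minimal standalone sketch of the chunking loop with the fix-up applied. It is not the WriterToUTF8Buffered code: the class name ChunkBoundaryDemo, the method chunkEnds, the variable limit and the tiny CHARS_MAX value are all made up for illustration. It also differs from my snippet above in two ways that are my own assumptions: it bounds the fix-up by start + length rather than chars.length, and it guards against a computed end_chunk that falls behind start_chunk, which can happen once CHARS_MAX is very small and an earlier chunk was extended.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class, not part of Xalan or the serializer.
final class ChunkBoundaryDemo {
    // Deliberately tiny (an assumption for this sketch) so that
    // chunk boundaries fall on almost every character.
    static final int CHARS_MAX = 2;

    // Returns the exclusive end offset of every non-empty chunk
    // of chars[start .. start+length).
    static List<Integer> chunkEnds(final char[] chars, final int start, final int length) {
        final List<Integer> ends = new ArrayList<>();
        final int limit = start + length;
        final int chunks = 1 + length / CHARS_MAX;
        int endChunk = start;
        for (int chunk = 1; chunk <= chunks; chunk++) {
            final int startChunk = endChunk;
            endChunk = start + (int) ((((long) length) * chunk) / chunks);
            // An earlier fix-up may have moved the start past this
            // computed end; never let the chunk length go negative.
            if (endChunk < startChunk) {
                endChunk = startChunk;
            }
            // Fix-up: if the chunk ends on a high surrogate, pull the
            // matching low surrogate into the same chunk.
            if (endChunk > startChunk && endChunk < limit
                    && Character.isHighSurrogate(chars[endChunk - 1])) {
                endChunk++;
            }
            if (endChunk > startChunk) {
                ends.add(endChunk);
            }
        }
        return ends;
    }

    public static void main(String[] args) {
        // 'a', one supplementary character, 'b', another one, 'c'
        final char[] text = "a\uD83D\uDE00b\uD834\uDD1Ec".toCharArray();
        int from = 0;
        for (int end : chunkEnds(text, 0, text.length)) {
            System.out.println(from + ".." + end);
            from = end;
        }
    }
}
```

With the sample string in main, the third chunk would otherwise end at offset 5, right after the high surrogate at index 4; the fix-up extends it to offset 6 so the pair stays together.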

5) I can see the need for the static methods on XML11Char to be public, but the static fields should all be private.





> Xalan should support  XML 1.1 for input/output XML and stylesheets themselves
> -----------------------------------------------------------------------------
>
>          Key: XALANJ-2070
>          URL: http://issues.apache.org/jira/browse/XALANJ-2070
>      Project: XalanJ2
>         Type: New Feature
>   Components: Serialization, XSLTC, Xalan-interpretive
>     Versions: CurrentCVS
>     Reporter: Brian Minchau
>     Assignee: Yash Talwar
>      Fix For: CurrentCVS
>  Attachments: XML11SupportPatch.txt, XML11SupportPatch2.txt
>
> Xalan should have support for input XML documents that are XML 1.1,
> for output XML documents that are 1.1, and for stylesheets that are 
> themselves XML 1.1 documents.
> The serialization parameters should support XML 1.1:
> <xsl:output method="xml" version="1.1" />
> An input XML document to a transformation should be supported:
> <?xml version="1.1" ?>
> Having a stylesheet that is itself an XML 1.1 document should be supported:
> <?xml version="1.1" ?>
> which means:
> - write out XML 1.1 writing out NEL LSEP as the end-of-line sequence
> - IRI support in namespaces: namespace URIs can now include characters that the IRI specification allows (this is already there because we aren't doing any checking).
> - C0 and C1 range characters are now output as numeric character references.
> However:
> - undeclaration of namespaces shouldn't be done
> - don't have character normalization like NEL or LSEP normalized to whitespace
