Posted to derby-dev@db.apache.org by "A B (JIRA)" <ji...@apache.org> on 2006/11/21 18:27:02 UTC

[jira] Created: (DERBY-2106) Improve Derby SQL/XML processing to account for Xalan's use of platform-specific newlines when serializing.

Improve Derby SQL/XML processing to account for Xalan's use of platform-specific newlines when serializing.
-----------------------------------------------------------------------------------------------------------

                 Key: DERBY-2106
                 URL: http://issues.apache.org/jira/browse/DERBY-2106
             Project: Derby
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 10.2.1.6, 10.2.1.8, 10.3.0.0
            Reporter: A B
            Priority: Minor


Derby uses Apache Xalan to serialize XML data values.  As part of the serialization process Xalan converts the newline character ("\n") to a platform-specific line ending.  This conversion of line endings is allowed by XML serialization rules and therefore is not a bug in Xalan--see XALANJ-1137 for some discussion along those lines.  That said, though, this particular behavior means that an application which uses Derby to serialize XML values can end up with different characters on different platforms.  And further, since Derby currently writes serialized XML to disk, this means that insertion of an XML value on one platform (such as Windows) can lead to different line-ending characters on disk than insertion of that exact same XML value on another platform (such as Linux).

Discussion on the derby-dev list seems to indicate (based on lack of comments to the contrary) that this behavior in Derby is not a "bug" per se, but that it might be nice if Derby could somehow account for Xalan's treatment of newlines to provide consistent XML serialization results across platforms.  The relevant thread is here:

  http://thread.gmane.org/gmane.comp.apache.db.derby.devel/33170/focus=33170 

As indicated in that thread, one simple (but not fully tested) approach is to make a change in the "serializeToString()" method of SqlXmlUtil.java to do an explicit replacement of platform-specific line-endings with a simple newline.  Something like:

+        String eol = PropertyUtil.getSystemProperty("line.separator");
+        if (eol != null)
+            return sWriter.toString().replaceAll(eol, "\n");

        return sWriter.toString();

This small change seems to provide consistent results across all platforms, and appears to work correctly even if line endings are hard-coded in the XML file (e.g., if the literal "\r\n" occurs in the XML file, the above code will *not* replace it, which is good).  However, internally modifying user-supplied data is generally a risky proposition, so this particular approach would need more testing.
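For anyone who wants to experiment with the replacement idea outside of Derby, here is a minimal standalone sketch. It is not the actual SqlXmlUtil code: the class and method names are hypothetical, and plain System.getProperty() stands in for PropertyUtil.getSystemProperty(). It also uses String.replace() instead of replaceAll(), on the assumption that a literal (non-regex) match is what is wanted here.

```java
// Hypothetical helper, for illustration only; not part of Derby.
final class LineEndingNormalizer {
    private LineEndingNormalizer() {}

    /** Replaces every occurrence of the given separator with a single "\n". */
    static String normalize(String serialized, String eol) {
        // Nothing to do if the separator is unknown or already "\n".
        if (eol == null || eol.isEmpty() || eol.equals("\n")) {
            return serialized;
        }
        // String.replace matches literally, so the separator is never
        // interpreted as a regular expression.
        return serialized.replace(eol, "\n");
    }

    public static void main(String[] args) {
        // Simulate serializer output that uses the platform line ending.
        String eol = System.getProperty("line.separator");
        String serialized = "<a>" + eol + "<b/>" + eol + "</a>";
        System.out.println(normalize(serialized, eol).equals("<a>\n<b/>\n</a>"));
    }
}
```

Passing the separator in as a parameter makes it possible to exercise the Windows case ("\r\n") on any platform.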

Also, any changes to Derby serialization as a part of this issue would need to consider backward-compatibility issues--namely, how would the changes affect XML files that have already been inserted into the database (and therefore that already have platform-specific endings)?  Ideally treatment of existing and new XML data would be consistent.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Commented: (DERBY-2106) Improve Derby SQL/XML processing to account for Xalan's use of platform-specific newlines when serializing.

Posted by "Daniel John Debrunner (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/DERBY-2106?page=comments#action_12451828 ] 
            
Daniel John Debrunner commented on DERBY-2106:
----------------------------------------------

I'm trying to clear my thoughts on this issue. Forgetting about how XML values are stored within Derby, or how an application uses XML values, I'm concentrating on the pure SQL operator XMLSERIALIZE. Please see this as a dump of thoughts.

The XMLSERIALIZE() operator serializes an XML value to a character string type within Derby.
Character types within Derby are always a sequence of Unicode characters.

The behaviour of the XMLSERIALIZE operator is defined by the SQL standard, which refers to the page Army referenced:

http://www.w3.org/TR/xslt-xquery-serialization/

Section 5.1.2 of that link is the section with the comment about new line characters, and it refers to an encoding.

From the SQL/XML spec (6.7 GR 2d) the encoding is the character set of the target datatype, which is Unicode characters for Derby.

So this expression

XMLSERIALIZE(XMLPARSE(DOCUMENT '<copy>&#169; ASF 2006</copy>' PRESERVE WHITESPACE) AS VARCHAR(100));

returns a VARCHAR that includes the Unicode character for the copyright symbol (0x00A9) instead of the six characters '&#169;'.

And indeed, Derby does that. :-)
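The parsing half of this can be checked outside Derby. A minimal sketch, assuming the JDK's built-in JAXP parser as a stand-in for Derby's XML handling (the class and method names are made up for illustration); it shows only that the character reference becomes one Unicode character in the parsed value, not what XMLSERIALIZE itself returns:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class; not part of Derby.
final class CharRefDemo {
    private CharRefDemo() {}

    /** Parses the XML text and returns the text content of the root element. */
    static String parseText(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse: " + xml, e);
        }
    }

    public static void main(String[] args) {
        String text = parseText("<copy>&#169; ASF 2006</copy>");
        // The six characters "&#169;" have been resolved to U+00A9.
        System.out.println(text.charAt(0) == '\u00A9');
        System.out.println(text.length());
    }
}
```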

A new-line in Unicode characters is represented by the '\n' character. I would assert this is what any new-line must map to when using XMLSERIALIZE().
The behaviour of XMLSERIALIZE should not be affected by what platform Derby is running on. I think, though, that I still need to see what the SQL/XML and/or XML rules are on input of an XML document with newlines.


        

[jira] Commented: (DERBY-2106) Improve Derby SQL/XML processing to account for Xalan's use of platform-specific newlines when serializing.

Posted by "A B (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/DERBY-2106?page=comments#action_12451837 ] 
            
A B commented on DERBY-2106:
----------------------------

> Just as confused as I was at the beginning. :-(

I think we took different routes but ended up at the same place.  Thank you (a ton) for taking the time to look at the specs and offering your input, though.  I definitely appreciate it!

Instead of staring at this wall of specs for an indefinite period of time, I decided to just fix the test for now (as part of DERBY-1758) and therefore filed this as a separate issue.  Hopefully if enough people look at it we can come to some general agreement as to what the preferred solution should be...

Thanks again for looking at this, Dan.


        

[jira] Commented: (DERBY-2106) Improve Derby SQL/XML processing to account for Xalan's use of platform-specific newlines when serializing.

Posted by "Daniel John Debrunner (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/DERBY-2106?page=comments#action_12451832 ] 
            
Daniel John Debrunner commented on DERBY-2106:
----------------------------------------------

Though I have to say the phrase Army highlighted:

"When outputting a newline character in the instance of the data model, the serializer is free to represent it using any character sequence that will be normalized to a newline character by an XML parser"

together with the SQL Standard, which does not mandate that any expression be deterministic (only "possibly non-deterministic"), does allow Derby to keep its current behaviour.

Just as confused as I was at the beginning. :-(


        

[jira] Commented: (DERBY-2106) Improve Derby SQL/XML processing to account for Xalan's use of platform-specific newlines when serializing.

Posted by "Daniel John Debrunner (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/DERBY-2106?page=comments#action_12451830 ] 
            
Daniel John Debrunner commented on DERBY-2106:
----------------------------------------------

The XML specification says the 'XML processor MUST behave' as though all line endings have been converted to a 'single #xA character' (which is '\n'):

http://www.w3.org/TR/REC-xml/#sec-line-ends

I would say that means the XML processor in Derby should behave such that new-lines are converted to a single '\n' character when using XMLSERIALIZE.
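The input-side rule is easy to observe with any conforming parser. A small sketch, assuming the JDK's built-in JAXP parser (the class and method names are hypothetical, and this says nothing about what Xalan does on output): both "\r\n" and a lone "\r" in the input arrive in the parsed text as a single '\n'.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class; not part of Derby.
final class LineEndDemo {
    private LineEndDemo() {}

    /** Parses the XML text and returns the text content of the root element. */
    static String parseText(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse: " + xml, e);
        }
    }

    public static void main(String[] args) {
        // Input mixes a Windows-style "\r\n" and a bare "\r".
        String text = parseText("<a>x\r\ny\rz</a>");
        // After parsing, every line ending is a single '\n'.
        System.out.println(text.equals("x\ny\nz"));
    }
}
```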
