You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@royale.apache.org by GitBox <gi...@apache.org> on 2020/11/22 17:13:51 UTC

[GitHub] [royale-asjs] estanglerbm opened a new issue #947: XML throws parsing exception after it removes all whitespace between attributes if CRLF (\r\n) appears

estanglerbm opened a new issue #947:
URL: https://github.com/apache/royale-asjs/issues/947


   For XML text like this (i.e. e4x raw data during an HTTPService):
   
   <myxml><mynode red=\"value1\"\r\n green=\"value2\" blue=\"value3\" \r\nyellow=\"value4\" /></myxml>
   
   The XML object will remove "\r\n " or " \r\n", causing the attributes to be squished together, throwing a parsing error.
   
   The problem is trimXMLWhitespace():
   
   `		static private function trimXMLWhitespace(value:String):String
   		{
   			return value.replace(/^\s+|\s+$/gm,'');
   		}
   `
   
   The "^" and "$" are matching up with the literal newlines (\r\n).  Perhaps the original intent was to match just the beginning / end of the entire string.
   
   A quick workaround is to either return just "value" or replace with a single space instead of blank:
   
   `		static private function trimXMLWhitespace(value:String):String
   		{
   			return value.replace(/^\s+|\s+$/gm,' ');
   		}
   `
   
   I would imagine that CRLF embedded in actual text values are also getting stripped out unintentionally, but haven't tested that.
   
   Test case:
   [TestXML.mxml.txt](https://github.com/apache/royale-asjs/files/5579860/TestXML.mxml.txt)
   
   Test exception:
   
   Uncaught Error: XML Parsing Error: not well-formed
   Location: file:///C:/src/royale-test-xml/bin/js-debug/index.html
   Line Number 1, Column 39:<sourcetext xmlns="http://www.mozilla.org/newlayout/xml/parsererror.xml">&lt;parseRoot&gt;&lt;myxml&gt;&lt;mynode red="value1"green="value2" blue="value3"yellow="value4" /&gt;&lt;/myxml&gt;&lt;/parseRoot&gt;
   --------------------------------------^</sourcetext>
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [royale-asjs] Harbs commented on issue #947: XML throws parsing exception after it removes all whitespace between attributes if CRLF (\r\n) appears

Posted by GitBox <gi...@apache.org>.
Harbs commented on issue #947:
URL: https://github.com/apache/royale-asjs/issues/947#issuecomment-732031694


   @greg-dove It looks like you added the trim on creation of the XML about a year ago. Do you remember why?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [royale-asjs] estanglerbm commented on issue #947: XML throws parsing exception after it removes all whitespace between attributes if CRLF (\r\n) appears

Posted by GitBox <gi...@apache.org>.
estanglerbm commented on issue #947:
URL: https://github.com/apache/royale-asjs/issues/947#issuecomment-732261823


   My impression was that the places where trimXMLWhitespace() is used is solely for trimming leading and trailing space for the entire string (whether that's right or wrong), not embedded whitespace within the string (which seem to be handled already during parsing).  But I haven't traced through in detail.
   
   So something like StringUtil.trim(), as opposed to replace() matching newlines.   However, StringUtil.trim() is probably too slow.  Perhaps changing "/gm" to "/g" is enough (turn off multi-line in the regexp);  docs seem to indicate that in an example.  Or not trimming at all.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [royale-asjs] greg-dove commented on issue #947: XML throws parsing exception after it removes all whitespace between attributes if CRLF (\r\n) appears

Posted by GitBox <gi...@apache.org>.
greg-dove commented on issue #947:
URL: https://github.com/apache/royale-asjs/issues/947#issuecomment-732622610


   That should be working after the next build.
   Sorry I only looked back at your last comment here after I committed the change Harbs. I did not try with non-breaking spaces etc. We probably need some additional tests to include things like that.
   
   I removed that trimming support and went with String.trim for now because it is needed in some places. I added some extra tests, here:
   
   [https://github.com/apache/royale-asjs/blob/e0dcd7b199ac559709f9a35b301e49c75aaaa6d3/frameworks/projects/XML/src/test/royale/flexUnitTests/xml/XMLTesterGeneralTest.as#L161](url)


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [royale-asjs] estanglerbm commented on issue #947: XML throws parsing exception after it removes all whitespace between attributes if CRLF (\r\n) appears

Posted by GitBox <gi...@apache.org>.
estanglerbm commented on issue #947:
URL: https://github.com/apache/royale-asjs/issues/947#issuecomment-752954075


   The original issue is resolved.  The trim() fix is a slight bottleneck, but we're talking about 200ms on 15 MB of XML with many tabs to pretty-print and newlines after almost every node (see my mention of that in PR #1025), after preprocessing the data to remove "\r" characters.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [royale-asjs] greg-dove commented on issue #947: XML throws parsing exception after it removes all whitespace between attributes if CRLF (\r\n) appears

Posted by GitBox <gi...@apache.org>.
greg-dove commented on issue #947:
URL: https://github.com/apache/royale-asjs/issues/947#issuecomment-732035024


   I'll try and look at this quickly tomorrow too @Harbs. I hope there is a unit test that will fail if I remove it and that will tell me why. I'll add some new tests for the described examples above and figure it out.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [royale-asjs] Harbs commented on issue #947: XML throws parsing exception after it removes all whitespace between attributes if CRLF (\r\n) appears

Posted by GitBox <gi...@apache.org>.
Harbs commented on issue #947:
URL: https://github.com/apache/royale-asjs/issues/947#issuecomment-732071666


   There's two places it's used during parsing. One on line 571:
   `var xmlStr:String = ignoreWhitespace ? trimXMLWhitespace("" + xml) : "" + xml;`
   and a second time on line 637:
   `if (ignoreWhitespace) xml = trimXMLWhitespace( xml) ;`
   
   I cannot see a case where the first one is needed at all. I think the second time is always called.
   
   I commented out both cases and XML tests pass with no errors.
   
   We probably do need to strip out whitespace when parsing XML if `ignoreWhitespace` is true, but the way we're doing it is very simplistic and I'm not sure it's going to work. I'm not sure what the best way is though. Walking the tree and stripping out the whitespace after it's parsed is an expensive way of doing it and I don't see a way of stripping whitespace while parsing using `DOMParser`. We definitely do need unit tests for these cases either way...


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [royale-asjs] Harbs commented on issue #947: XML throws parsing exception after it removes all whitespace between attributes if CRLF (\r\n) appears

Posted by GitBox <gi...@apache.org>.
Harbs commented on issue #947:
URL: https://github.com/apache/royale-asjs/issues/947#issuecomment-732263959


   IIRC, the reason we're trimming is to prevent empty text nodes. It's clearly doing the wrong thing in this case, so we need to figure out the correct way to do that.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [royale-asjs] Harbs commented on issue #947: XML throws parsing exception after it removes all whitespace between attributes if CRLF (\r\n) appears

Posted by GitBox <gi...@apache.org>.
Harbs commented on issue #947:
URL: https://github.com/apache/royale-asjs/issues/947#issuecomment-732078953


   FYI, here's the discussion which triggered the change:
   https://lists.apache.org/list.html?dev@royale.apache.org:2019-6:Implicit%20casts%20and%20skipAsCercions
   
   Unfortunately the paste is no longer valid.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [royale-asjs] Harbs commented on issue #947: XML throws parsing exception after it removes all whitespace between attributes if CRLF (\r\n) appears

Posted by GitBox <gi...@apache.org>.
Harbs commented on issue #947:
URL: https://github.com/apache/royale-asjs/issues/947#issuecomment-732269050


   @greg-dove It looks like we're dropping empty nodes (and trimming) here:
   https://github.com/apache/royale-asjs/blob/develop/frameworks/projects/XML/src/main/royale/XML.as#L407
   
   Perhaps we can really remove both cases of `trimXMLWhitespace` during parsing.
   
   Do you agree?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [royale-asjs] Harbs commented on issue #947: XML throws parsing exception after it removes all whitespace between attributes if CRLF (\r\n) appears

Posted by GitBox <gi...@apache.org>.
Harbs commented on issue #947:
URL: https://github.com/apache/royale-asjs/issues/947#issuecomment-732272483


   Related: I'm not sure that `trim()` is doing the right thing. `trim()` treats no-break spaces as a space character, while the ECMA spec seems to only consider regular spaces. I also don't see where the ECMA spec says to remove leading and trailing space from text nodes (although I'm pretty sure that's correct)...


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org