Posted to commits@daffodil.apache.org by GitBox <gi...@apache.org> on 2021/03/04 01:46:30 UTC

[GitHub] [daffodil] mbeckerle opened a new pull request #495: Patterns to match strings with newlines.

mbeckerle opened a new pull request #495:
URL: https://github.com/apache/daffodil/pull/495


   Tests added illustrating this.
   
   Diagnostic messages about pattern facets improved
   to include things like &#xE000; when that's part of the pattern facet.
   
   Some cleanup of XMLUtils, where it was unclear to me whether the code was doing the right thing. (It was.)
   
   DAFFODIL-2474 
   
   https://issues.apache.org/jira/browse/DAFFODIL-2474


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [daffodil] mbeckerle commented on a change in pull request #495: Patterns to match strings with newlines.

Posted by GitBox <gi...@apache.org>.
mbeckerle commented on a change in pull request #495:
URL: https://github.com/apache/daffodil/pull/495#discussion_r587659476



##########
File path: daffodil-lib/src/main/scala/org/apache/daffodil/xml/XMLUtils.scala
##########
@@ -702,6 +706,17 @@ object XMLUtils {
     res
   }
 
+  def isXMLIllegalChar(i: Int) = {
+    // #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
+    i match {
+      case 0x9 | 0xA | 0xD => false
+      case z if (z >= 0x20 && z <= 0xD7FF) => false
+      case z if (z >= 0xE000 && z <= 0xFFFD ) => false
+      case z if (z >= 0x10000 && z <= 0x10FFFF) => false
+      case _ => true
+    }
+  }
+

Review comment:
       I will remove this.
   
   This is me once again not remembering that I already wrote something long ago. I started writing it again, then found what I needed. 
   
   I may add some scaladoc to these escapers, pua mappers, etc. to explain their usage, what they do (and don't do). 







[GitHub] [daffodil] mbeckerle commented on a change in pull request #495: Patterns to match strings with newlines.

Posted by GitBox <gi...@apache.org>.
mbeckerle commented on a change in pull request #495:
URL: https://github.com/apache/daffodil/pull/495#discussion_r587657267



##########
File path: daffodil-test/src/test/resources/org/apache/daffodil/section05/facets/PatternRanges.tdml
##########
@@ -0,0 +1,123 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+  Licensed to the Apache Software Foundation (ASF) under one or more
+  contributor license agreements.  See the NOTICE file distributed with
+  this work for additional information regarding copyright ownership.
+  The ASF licenses this file to You under the Apache License, Version 2.0
+  (the "License"); you may not use this file except in compliance with
+  the License.  You may obtain a copy of the License at
+
+      http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+-->
+<tdml:testSuite
+  xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/"
+  xmlns:dfdlx="http://www.ogf.org/dfdl/dfdl-1.0/extensions"
+  xmlns:tdml="http://www.ibm.com/xmlns/dfdl/testData"
+  xmlns:ex="http://example.com"
+  xmlns:xs="http://www.w3.org/2001/XMLSchema"
+  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+  defaultRoundTrip="onePass"
+  defaultValidation="on">
+
+  <tdml:defineSchema name="s">
+    <xs:include schemaLocation="org/apache/daffodil/xsd/DFDLGeneralFormat.dfdl.xsd"/>
+
+    <dfdl:format ref="ex:GeneralFormat"/>
+
+    <xs:simpleType name="singleByteCharsWithCodepointLessThan1F"
+                    dfdl:encoding="iso-8859-1">
+      <xs:restriction base="xs:string">
+        <!--
+        Using Daffodil's E000 remapping for the XML illegal characters from 00 to 08,
+        09 and 0A are legal, but we must use \t and \n for them since the pattern facet is an attribute value.
+        Using E000 remapping for XML illegal characters  0B and 0C
+        0D is legal (though remapped to LF), but we must use \r for that.
+        Using E000 remapping for XML illegal characters 0E to 1F
+        20 to 7F are legal.
+        -->
+        <!--
+          We cannot depend on Daffodil's mapping from E009 to 9, E00A to A, and E00D to D.
+          Because that won't work in Xerces with "full" validation.
+          We can't use numeric entities for the tab, LF, or CR, because those aren't allowed in attribute
+          values. So we use \t, \n, and \r for those control characters.
+          -->
+        <xs:pattern value="[&#xE000;-&#xE008;\t\n&#xE00B;&#xE00C;\r&#xE00E;-&#xE01F;&#x20;-&#x7F;]*"/>
+      </xs:restriction>
+    </xs:simpleType>
+
+
+    <xs:element name="str" type="ex:singleByteCharsWithCodepointLessThan1F"
+       dfdl:lengthKind="delimited"/>

Review comment:
       I'll add a comment noting that this element has no delimiter, so it is delimited by end-of-file.







[GitHub] [daffodil] mbeckerle commented on a change in pull request #495: Patterns to match strings with newlines.

Posted by GitBox <gi...@apache.org>.
mbeckerle commented on a change in pull request #495:
URL: https://github.com/apache/daffodil/pull/495#discussion_r587655436



##########
File path: daffodil-test/src/test/resources/org/apache/daffodil/section05/facets/PatternRanges.tdml
##########
@@ -0,0 +1,123 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+  Licensed to the Apache Software Foundation (ASF) under one or more
+  contributor license agreements.  See the NOTICE file distributed with
+  this work for additional information regarding copyright ownership.
+  The ASF licenses this file to You under the Apache License, Version 2.0
+  (the "License"); you may not use this file except in compliance with
+  the License.  You may obtain a copy of the License at
+
+      http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+-->
+<tdml:testSuite
+  xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/"
+  xmlns:dfdlx="http://www.ogf.org/dfdl/dfdl-1.0/extensions"
+  xmlns:tdml="http://www.ibm.com/xmlns/dfdl/testData"
+  xmlns:ex="http://example.com"
+  xmlns:xs="http://www.w3.org/2001/XMLSchema"
+  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+  defaultRoundTrip="onePass"
+  defaultValidation="on">
+
+  <tdml:defineSchema name="s">
+    <xs:include schemaLocation="org/apache/daffodil/xsd/DFDLGeneralFormat.dfdl.xsd"/>
+
+    <dfdl:format ref="ex:GeneralFormat"/>
+
+    <xs:simpleType name="singleByteCharsWithCodepointLessThan1F"
+                    dfdl:encoding="iso-8859-1">
+      <xs:restriction base="xs:string">
+        <!--
+        Using Daffodil's E000 remapping for the XML illegal characters from 00 to 08,
+        09 and 0A are legal, but we must use \t and \n for them since the pattern facet is an attribute value.
+        Using E000 remapping for XML illegal characters  0B and 0C
+        0D is legal (though remapped to LF), but we must use \r for that.
+        Using E000 remapping for XML illegal characters 0E to 1F
+        20 to 7F are legal.
+        -->
+        <!--
+          We cannot depend on Daffodil's mapping from E009 to 9, E00A to A, and E00D to D.
+          Because that won't work in Xerces with "full" validation.
+          We can't use numeric entities for the tab, LF, or CR, because those aren't allowed in attribute
+          values. So we use \t, \n, and \r for those control characters.
+          -->
+        <xs:pattern value="[&#xE000;-&#xE008;\t\n&#xE00B;&#xE00C;\r&#xE00E;-&#xE01F;&#x20;-&#x7F;]*"/>

Review comment:
       I'll take a look and fix. 







[GitHub] [daffodil] mbeckerle commented on a change in pull request #495: Patterns to match strings with newlines.

Posted by GitBox <gi...@apache.org>.
mbeckerle commented on a change in pull request #495:
URL: https://github.com/apache/daffodil/pull/495#discussion_r587654901



##########
File path: daffodil-core/src/main/scala/org/apache/daffodil/dsom/Facets.scala
##########
@@ -160,7 +162,7 @@ trait Facets { self: Restriction =>
           // The XSD numeric character entity &#xE000; can be used to match ASCII NUL
           // (char code 0).
           //
-          val remapped = XMLUtils.remapPUAToXMLIllegalCharacters(v)
+          val remapped: String = XMLUtils.remapPUAToXMLIllegalCharacters(v)

Review comment:
       Yes, I needed these in order to figure out what was going on here. Type aliases were confusing IDEA. I tend to leave that sort of thing in when it helps understanding; why take it back out?







[GitHub] [daffodil] mbeckerle commented on a change in pull request #495: Patterns to match strings with newlines.

Posted by GitBox <gi...@apache.org>.
mbeckerle commented on a change in pull request #495:
URL: https://github.com/apache/daffodil/pull/495#discussion_r587657803



##########
File path: daffodil-test/src/test/scala/org/apache/daffodil/section05/facets/TestPatternRanges.scala
##########
@@ -0,0 +1,40 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.daffodil.section05.facets
+
+import org.apache.daffodil.tdml.Runner
+import org.junit.AfterClass
+import org.junit.Test
+
+object TestPatternRanges {
+  lazy val runner = Runner("/org/apache/daffodil/section05/facets", "PatternRanges.tdml")
+
+  @AfterClass def shutDown = {
+    runner.reset
+  }
+}
+
+import org.apache.daffodil.section05.facets.TestPatternRanges._
+
+class TestPatternRanges {
+

Review comment:
       I'll hoist this in. 







[GitHub] [daffodil] mbeckerle commented on a change in pull request #495: Patterns to match strings with newlines.

Posted by GitBox <gi...@apache.org>.
mbeckerle commented on a change in pull request #495:
URL: https://github.com/apache/daffodil/pull/495#discussion_r587656475



##########
File path: daffodil-test/src/test/resources/org/apache/daffodil/section05/facets/PatternRanges.tdml
##########
@@ -0,0 +1,123 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+  Licensed to the Apache Software Foundation (ASF) under one or more
+  contributor license agreements.  See the NOTICE file distributed with
+  this work for additional information regarding copyright ownership.
+  The ASF licenses this file to You under the Apache License, Version 2.0
+  (the "License"); you may not use this file except in compliance with
+  the License.  You may obtain a copy of the License at
+
+      http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+-->
+<tdml:testSuite
+  xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/"
+  xmlns:dfdlx="http://www.ogf.org/dfdl/dfdl-1.0/extensions"
+  xmlns:tdml="http://www.ibm.com/xmlns/dfdl/testData"
+  xmlns:ex="http://example.com"
+  xmlns:xs="http://www.w3.org/2001/XMLSchema"
+  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+  defaultRoundTrip="onePass"
+  defaultValidation="on">
+
+  <tdml:defineSchema name="s">
+    <xs:include schemaLocation="org/apache/daffodil/xsd/DFDLGeneralFormat.dfdl.xsd"/>
+
+    <dfdl:format ref="ex:GeneralFormat"/>
+
+    <xs:simpleType name="singleByteCharsWithCodepointLessThan1F"
+                    dfdl:encoding="iso-8859-1">
+      <xs:restriction base="xs:string">
+        <!--
+        Using Daffodil's E000 remapping for the XML illegal characters from 00 to 08,
+        09 and 0A are legal, but we must use \t and \n for them since the pattern facet is an attribute value.
+        Using E000 remapping for XML illegal characters  0B and 0C
+        0D is legal (though remapped to LF), but we must use \r for that.
+        Using E000 remapping for XML illegal characters 0E to 1F
+        20 to 7F are legal.
+        -->
+        <!--
+          We cannot depend on Daffodil's mapping from E009 to 9, E00A to A, and E00D to D.
+          Because that won't work in Xerces with "full" validation.
+          We can't use numeric entities for the tab, LF, or CR, because those aren't allowed in attribute
+          values. So we use \t, \n, and \r for those control characters.
+          -->
+        <xs:pattern value="[&#xE000;-&#xE008;\t\n&#xE00B;&#xE00C;\r&#xE00E;-&#xE01F;&#x20;-&#x7F;]*"/>
+      </xs:restriction>
+    </xs:simpleType>
+
+
+    <xs:element name="str" type="ex:singleByteCharsWithCodepointLessThan1F"
+       dfdl:lengthKind="delimited"/>

Review comment:
       Yes, but endOfParent isn't supported yet by Daffodil. 







[GitHub] [daffodil] stevedlawrence commented on a change in pull request #495: Patterns to match strings with newlines.

Posted by GitBox <gi...@apache.org>.
stevedlawrence commented on a change in pull request #495:
URL: https://github.com/apache/daffodil/pull/495#discussion_r587433059



##########
File path: daffodil-lib/src/main/scala/org/apache/daffodil/xml/XMLUtils.scala
##########
@@ -702,6 +706,17 @@ object XMLUtils {
     res
   }
 
+  def isXMLIllegalChar(i: Int) = {
+    // #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
+    i match {
+      case 0x9 | 0xA | 0xD => false
+      case z if (z >= 0x20 && z <= 0xD7FF) => false
+      case z if (z >= 0xE000 && z <= 0xFFFD ) => false
+      case z if (z >= 0x10000 && z <= 0x10FFFF) => false
+      case _ => true
+    }
+  }
+

Review comment:
       It doesn't look like this is used anywhere. Is this just a helpful tool that we may need in the future, or was the intention to maybe replace some of the above code with it?
   
   Might also be worth modifying the comment to say "legal characters: ..." or something. Since the function is called isXMLIllegalChar, I assumed that comment was the list of illegal characters and was a bit confused.
   
   Also, just an FYI, I was curious to see how this match/case compiled to bytecode compared to if-statements, and they end up exactly the same. Not too surprised, but good to know.
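   
   For illustration only, here is a minimal sketch of the same check with the comment relabelled as the legal ranges and plain boolean tests in place of the match. `XmlCharSketch` and `isXMLLegalChar` are hypothetical names, not anything in XMLUtils; the ranges are the XML 1.0 Char production quoted in the diff above.

```scala
object XmlCharSketch {
  // Legal XML 1.0 characters:
  // #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
  // (hypothetical helper, not part of XMLUtils)
  def isXMLLegalChar(i: Int): Boolean =
    i == 0x9 || i == 0xA || i == 0xD ||
      (i >= 0x20 && i <= 0xD7FF) ||
      (i >= 0xE000 && i <= 0xFFFD) ||
      (i >= 0x10000 && i <= 0x10FFFF)

  // Same contract as the function in the diff: true for anything outside
  // the legal ranges listed above.
  def isXMLIllegalChar(i: Int): Boolean = !isXMLLegalChar(i)
}
```

   Defining the illegal check as the negation of the legal one keeps the comment and the code pointing the same way.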







[GitHub] [daffodil] mbeckerle merged pull request #495: Patterns to match strings with newlines.

Posted by GitBox <gi...@apache.org>.
mbeckerle merged pull request #495:
URL: https://github.com/apache/daffodil/pull/495


   





[GitHub] [daffodil] tuxji commented on a change in pull request #495: Patterns to match strings with newlines.

Posted by GitBox <gi...@apache.org>.
tuxji commented on a change in pull request #495:
URL: https://github.com/apache/daffodil/pull/495#discussion_r587509679



##########
File path: daffodil-core/src/main/scala/org/apache/daffodil/dsom/Facets.scala
##########
@@ -160,7 +162,7 @@ trait Facets { self: Restriction =>
           // The XSD numeric character entity &#xE000; can be used to match ASCII NUL
           // (char code 0).
           //
-          val remapped = XMLUtils.remapPUAToXMLIllegalCharacters(v)
+          val remapped: String = XMLUtils.remapPUAToXMLIllegalCharacters(v)

Review comment:
       I'm curious about these explicit types (lines 133, 136, 165).  What was the reason you added these explicit types?  I thought the Scala compiler inferred types for local values fairly well although sometimes IntelliJ IDEA gets a little confused itself (is that why?).

##########
File path: daffodil-test/src/test/resources/org/apache/daffodil/section05/facets/PatternRanges.tdml
##########
@@ -0,0 +1,123 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+  Licensed to the Apache Software Foundation (ASF) under one or more
+  contributor license agreements.  See the NOTICE file distributed with
+  this work for additional information regarding copyright ownership.
+  The ASF licenses this file to You under the Apache License, Version 2.0
+  (the "License"); you may not use this file except in compliance with
+  the License.  You may obtain a copy of the License at
+
+      http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+-->
+<tdml:testSuite
+  xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/"
+  xmlns:dfdlx="http://www.ogf.org/dfdl/dfdl-1.0/extensions"
+  xmlns:tdml="http://www.ibm.com/xmlns/dfdl/testData"
+  xmlns:ex="http://example.com"
+  xmlns:xs="http://www.w3.org/2001/XMLSchema"
+  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+  defaultRoundTrip="onePass"
+  defaultValidation="on">
+
+  <tdml:defineSchema name="s">
+    <xs:include schemaLocation="org/apache/daffodil/xsd/DFDLGeneralFormat.dfdl.xsd"/>
+
+    <dfdl:format ref="ex:GeneralFormat"/>
+
+    <xs:simpleType name="singleByteCharsWithCodepointLessThan1F"
+                    dfdl:encoding="iso-8859-1">
+      <xs:restriction base="xs:string">
+        <!--
+        Using Daffodil's E000 remapping for the XML illegal characters from 00 to 08,
+        09 and 0A are legal, but we must use \t and \n for them since the pattern facet is an attribute value.
+        Using E000 remapping for XML illegal characters  0B and 0C
+        0D is legal (though remapped to LF), but we must use \r for that.
+        Using E000 remapping for XML illegal characters 0E to 1F
+        20 to 7F are legal.
+        -->
+        <!--
+          We cannot depend on Daffodil's mapping from E009 to 9, E00A to A, and E00D to D.
+          Because that won't work in Xerces with "full" validation.
+          We can't use numeric entities for the tab, LF, or CR, because those aren't allowed in attribute
+          values. So we use \t, \n, and \r for those control characters.
+          -->
+        <xs:pattern value="[&#xE000;-&#xE008;\t\n&#xE00B;&#xE00C;\r&#xE00E;-&#xE01F;&#x20;-&#x7F;]*"/>
+      </xs:restriction>
+    </xs:simpleType>
+
+
+    <xs:element name="str" type="ex:singleByteCharsWithCodepointLessThan1F"
+       dfdl:lengthKind="delimited"/>

Review comment:
       I was a bit puzzled at seeing dfdl:lengthKind `delimited` on a solitary root element until I read the DFDL spec and realized the end of the data stream delimits an element as well.  However, the spec says we also can use the dfdl:lengthKind `endOfParent` to allow the last element to consume data up to the end of the data stream; might `endOfParent` be more self-documenting too?

##########
File path: daffodil-test/src/test/resources/org/apache/daffodil/section05/facets/PatternRanges.tdml
##########
@@ -0,0 +1,123 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+  Licensed to the Apache Software Foundation (ASF) under one or more
+  contributor license agreements.  See the NOTICE file distributed with
+  this work for additional information regarding copyright ownership.
+  The ASF licenses this file to You under the Apache License, Version 2.0
+  (the "License"); you may not use this file except in compliance with
+  the License.  You may obtain a copy of the License at
+
+      http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+-->
+<tdml:testSuite
+  xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/"
+  xmlns:dfdlx="http://www.ogf.org/dfdl/dfdl-1.0/extensions"
+  xmlns:tdml="http://www.ibm.com/xmlns/dfdl/testData"
+  xmlns:ex="http://example.com"
+  xmlns:xs="http://www.w3.org/2001/XMLSchema"
+  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+  defaultRoundTrip="onePass"
+  defaultValidation="on">
+
+  <tdml:defineSchema name="s">
+    <xs:include schemaLocation="org/apache/daffodil/xsd/DFDLGeneralFormat.dfdl.xsd"/>
+
+    <dfdl:format ref="ex:GeneralFormat"/>
+
+    <xs:simpleType name="singleByteCharsWithCodepointLessThan1F"
+                    dfdl:encoding="iso-8859-1">
+      <xs:restriction base="xs:string">
+        <!--
+        Using Daffodil's E000 remapping for the XML illegal characters from 00 to 08,
+        09 and 0A are legal, but we must use \t and \n for them since the pattern facet is an attribute value.
+        Using E000 remapping for XML illegal characters  0B and 0C
+        0D is legal (though remapped to LF), but we must use \r for that.
+        Using E000 remapping for XML illegal characters 0E to 1F
+        20 to 7F are legal.
+        -->
+        <!--
+          We cannot depend on Daffodil's mapping from E009 to 9, E00A to A, and E00D to D.
+          Because that won't work in Xerces with "full" validation.
+          We can't use numeric entities for the tab, LF, or CR, because those aren't allowed in attribute
+          values. So we use \t, \n, and \r for those control characters.
+          -->
+        <xs:pattern value="[&#xE000;-&#xE008;\t\n&#xE00B;&#xE00C;\r&#xE00E;-&#xE01F;&#x20;-&#x7F;]*"/>

Review comment:
       You have two redundant comments here - did you plan to keep both comments or merge them?
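   
   To check my own understanding of what the two schema comments describe, here is a rough sketch. It is not Daffodil's `remapPUAToXMLIllegalCharacters`; `PuaRemapSketch` and `remapPuaToC0` are hypothetical names, the sketch assumes only U+E000 to U+E01F are remapped, and it uses `java.util.regex` as a stand-in for whatever engine evaluates the facet.

```scala
import java.util.regex.Pattern

object PuaRemapSketch {
  // Simplifying assumption: only U+E000..U+E01F are remapped, each to the
  // C0 control character U+0000..U+001F with the same low bits.
  def remapPuaToC0(s: String): String =
    s.map { c =>
      if (c >= '\uE000' && c <= '\uE01F') (c - 0xE000).toChar else c
    }

  // The pattern facet value as it would look once the XML parser has
  // expanded the numeric character entities; \t, \n and \r remain as
  // regex escapes rather than literal characters.
  val facet: String =
    "[\uE000-\uE008\\t\\n\uE00B\uE00C\\r\uE00E-\uE01F\u0020-\u007F]*"

  // Remap first, then compile, so the character class holds the real
  // control characters instead of the private-use stand-ins.
  val compiled: Pattern = Pattern.compile(remapPuaToC0(facet))

  def matches(data: String): Boolean = compiled.matcher(data).matches()
}
```

   With this sketch, a string containing tabs, newlines, and NUL matches, while a character above 7F such as U+00E9 does not.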

##########
File path: daffodil-test/src/test/scala/org/apache/daffodil/section05/facets/TestPatternRanges.scala
##########
@@ -0,0 +1,40 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.daffodil.section05.facets
+
+import org.apache.daffodil.tdml.Runner
+import org.junit.AfterClass
+import org.junit.Test
+
+object TestPatternRanges {
+  lazy val runner = Runner("/org/apache/daffodil/section05/facets", "PatternRanges.tdml")
+
+  @AfterClass def shutDown = {
+    runner.reset
+  }
+}
+
+import org.apache.daffodil.section05.facets.TestPatternRanges._
+
+class TestPatternRanges {
+

Review comment:
       Most TDML tests would put a relative import at this line instead of an absolute-package import at line 31.

##########
File path: daffodil-lib/src/main/scala/org/apache/daffodil/xml/XMLUtils.scala
##########
@@ -702,6 +706,17 @@ object XMLUtils {
     res
   }
 
+  def isXMLIllegalChar(i: Int) = {
+    // #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
+    i match {
+      case 0x9 | 0xA | 0xD => false
+      case z if (z >= 0x20 && z <= 0xD7FF) => false
+      case z if (z >= 0xE000 && z <= 0xFFFD ) => false
+      case z if (z >= 0x10000 && z <= 0x10FFFF) => false
+      case _ => true
+    }
+  }
+

Review comment:
       Yes, I'm curious too.



