You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@daffodil.apache.org by GitBox <gi...@apache.org> on 2022/08/10 16:06:28 UTC

[GitHub] [daffodil] mbeckerle commented on a diff in pull request #821: Add support for new tunable "xmlOutputStyle"

mbeckerle commented on code in PR #821:
URL: https://github.com/apache/daffodil/pull/821#discussion_r942582315


##########
daffodil-runtime1/src/main/scala/org/apache/daffodil/debugger/InteractiveDebugger.scala:
##########
@@ -987,10 +987,10 @@ class InteractiveDebugger(runner: InteractiveDebuggerRunner, eCompilers: Express
             debugPrintln(_)
           }
           res match {
-            case ie: InfosetElement => debugPrettyPrintXML(ie)
+            case ie: InfosetElement => debugPrettyPrintXML(ie, state.tunable.xmlOutputStyle)

Review Comment:
   Should not be a string. Should be an enum.  Copy technique from 
   ```
   object UnqualifiedPathStepPolicy extends Enum[UnqualifiedPathStepPolicy] {
   ```
   which is another tunable that is an Enum. 
   
   This will require a pervasive replace of all points where you have ```xmlOutputStyle: String``` replacing it with ```xmlOutputStyle: XMLOutputStyle```, and even in the Java API you have ```String xmlOutputStyle``` to fix. 
   



##########
daffodil-sapi/src/test/scala/org/apache/daffodil/example/TestScalaAPI.scala:
##########
@@ -1167,4 +1168,80 @@ class TestScalaAPI {
       assertFalse(ur.isError)
       assertEquals(expectedData, bos.toString())
     }
+
+    @Test
+    def testScalaAPICDATA1(): Unit = {
+      val c = Daffodil.compiler().withTunable("xmlOutputStyle","prettyPrintSafe")

Review Comment:
   Factor out duplicate code here into a reused method in this test class. These tests all seem near identical. Just the expected result string and data file name are different. 
   Each test should look like:
   ```
   doXMLOutputStyleTest("<![CDATA[NO_WHITESPACE_AT_ALL]]>", "NO_WHITESPACE_AT_ALL")
   ````



##########
daffodil-runtime1/src/main/scala/org/apache/daffodil/infoset/JDOMInfosetOutputter.scala:
##########
@@ -48,14 +48,57 @@ class JDOMInfosetOutputter extends InfosetOutputter
     true
   }
 
-  def startSimple(diSimple: DISimple): Boolean = {
+  def startSimple(diSimple: DISimple, xmlOutputStyle: String): Boolean = {
 
     val elem = createElement(diSimple)
+    val correctFormat = new StringBuilder("");
+    val cdataIntro = "<![CDATA["
+    val cdataOutro = "]]>"
+    var charEntMode: Boolean = false
 
     if (diSimple.hasValue) {
       val text =
         if (diSimple.erd.optPrimType.get.isInstanceOf[NodeInfo.String.Kind]) {
-          remapped(diSimple.dataValueAsString)
+          val s = remapped(diSimple.dataValueAsString)
+
+            if(xmlOutputStyle == "prettyPrintSafe"){
+              val newData = s.replaceAll(">","&gt;").replace("\\r\\n", "&#xE00D;")

Review Comment:
   CRLFs and CR should already be replaced above in the remapped(....) call. So you don't need to do this replacement of "\\r\\n" at all. 
   
   You should not be replacing all ">" with "&gt;" because using CDATA brackets, the vast majority of "<" and ">" just get encapsulated by CDATA and can stay as is. 
   
   The only ">" you should replace with "&gt;" is the final ">" of "]]>" which is the terminator of a CDATA region. That you have to replace. So I think this wants to be ```replaceAll("]]>", "]]&gt;")``` 



##########
daffodil-runtime1/src/main/scala/org/apache/daffodil/infoset/JDOMInfosetOutputter.scala:
##########
@@ -48,14 +48,57 @@ class JDOMInfosetOutputter extends InfosetOutputter
     true
   }
 
-  def startSimple(diSimple: DISimple): Boolean = {
+  def startSimple(diSimple: DISimple, xmlOutputStyle: String): Boolean = {
 
     val elem = createElement(diSimple)
+    val correctFormat = new StringBuilder("");
+    val cdataIntro = "<![CDATA["
+    val cdataOutro = "]]>"
+    var charEntMode: Boolean = false
 
     if (diSimple.hasValue) {
       val text =
         if (diSimple.erd.optPrimType.get.isInstanceOf[NodeInfo.String.Kind]) {
-          remapped(diSimple.dataValueAsString)
+          val s = remapped(diSimple.dataValueAsString)
+
+            if(xmlOutputStyle == "prettyPrintSafe"){
+              val newData = s.replaceAll(">","&gt;").replace("\\r\\n", "&#xE00D;")
+              val readyForCDATA = newData.replaceAll("0x","&#x")

Review Comment:
   What's this "0x" stuff? Don't think this should be modified. 



##########
daffodil-sapi/src/test/resources/test/sapi/mySchemaCDATA1.dfdl.xsd:
##########
@@ -0,0 +1,32 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+  Licensed to the Apache Software Foundation (ASF) under one or more
+  contributor license agreements.  See the NOTICE file distributed with
+  this work for additional information regarding copyright ownership.
+  The ASF licenses this file to You under the Apache License, Version 2.0
+  (the "License"); you may not use this file except in compliance with
+  the License.  You may obtain a copy of the License at
+      http://www.apache.org/licenses/LICENSE-2.0
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+-->
+
+<schema xmlns="http://www.w3.org/2001/XMLSchema"
+  targetNamespace="http://example.com" xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/"
+  xmlns:xsd="http://www.w3.org/2001/XMLSchema"
+  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:tns="http://example.com">
+
+  <annotation>
+    <appinfo source="http://www.ogf.org/dfdl/">
+      <dfdl:format ref="tns:GeneralFormat" />
+    </appinfo>
+  </annotation>
+
+  <include schemaLocation="org/apache/daffodil/xsd/DFDLGeneralFormat.dfdl.xsd"/>
+
+  <element name="foo" type="xsd:string" dfdl:lengthKind="explicit" dfdl:length="20"/>

Review Comment:
   If you use dfdl:lengthKind='delimited' dfdl:terminator="$" then you can just add "$" to the end of your test data, and then reuse the same schema for all the tests. You won't need this different schema per test. 



##########
daffodil-runtime1/src/main/scala/org/apache/daffodil/infoset/JDOMInfosetOutputter.scala:
##########
@@ -48,14 +48,57 @@ class JDOMInfosetOutputter extends InfosetOutputter
     true
   }
 
-  def startSimple(diSimple: DISimple): Boolean = {
+  def startSimple(diSimple: DISimple, xmlOutputStyle: String): Boolean = {
 
     val elem = createElement(diSimple)
+    val correctFormat = new StringBuilder("");
+    val cdataIntro = "<![CDATA["
+    val cdataOutro = "]]>"
+    var charEntMode: Boolean = false
 
     if (diSimple.hasValue) {
       val text =
         if (diSimple.erd.optPrimType.get.isInstanceOf[NodeInfo.String.Kind]) {
-          remapped(diSimple.dataValueAsString)
+          val s = remapped(diSimple.dataValueAsString)
+
+            if(xmlOutputStyle == "prettyPrintSafe"){
+              val newData = s.replaceAll(">","&gt;").replace("\\r\\n", "&#xE00D;")
+              val readyForCDATA = newData.replaceAll("0x","&#x")
+              //Figure out what mode you have to be in
+              if(readyForCDATA(0) == '&') {
+                charEntMode = true
+              } else {
+                charEntMode = false
+                correctFormat.append(cdataIntro)
+              }
+
+              //Traverse over the string surrounding correct areas with CDATA info
+              for(c <- readyForCDATA) {
+                if(charEntMode) {
+                  correctFormat.append(c)
+                  if(c == ';'){
+                    correctFormat.append(cdataIntro)
+                    charEntMode = false
+                  }
+                } else {
+                  if(c == '&'){
+                    correctFormat.append(cdataOutro)
+                    charEntMode = true
+                  }
+                  correctFormat.append(c)
+                }
+              }
+
+              //You are done with the string. If you are still a non
+	            //char ent then close and finish up.

Review Comment:
   What if the string ends with "abcd&gt" i.e., no ";" at end of entity?
   You will create "<![CDATA[abcd]]>&gt" when really it should have been "<![CDATA[abcd&gt]]>". 
   I think your loop needs to grab whole char entities in a sub-loop, and only decide to end the CDATA with the cdataOutro and then set down the entity IF the ";" is found. If no ";" is found and end of string is hit, then you just include the characters inside the CDATA.
   
   Add a test for "abcd&gt" case.



##########
daffodil-runtime1/src/main/scala/org/apache/daffodil/infoset/JDOMInfosetOutputter.scala:
##########
@@ -48,14 +48,57 @@ class JDOMInfosetOutputter extends InfosetOutputter
     true
   }
 
-  def startSimple(diSimple: DISimple): Boolean = {
+  def startSimple(diSimple: DISimple, xmlOutputStyle: String): Boolean = {
 
     val elem = createElement(diSimple)
+    val correctFormat = new StringBuilder("");
+    val cdataIntro = "<![CDATA["
+    val cdataOutro = "]]>"
+    var charEntMode: Boolean = false
 
     if (diSimple.hasValue) {
       val text =
         if (diSimple.erd.optPrimType.get.isInstanceOf[NodeInfo.String.Kind]) {
-          remapped(diSimple.dataValueAsString)
+          val s = remapped(diSimple.dataValueAsString)
+
+            if(xmlOutputStyle == "prettyPrintSafe"){
+              val newData = s.replaceAll(">","&gt;").replace("\\r\\n", "&#xE00D;")
+              val readyForCDATA = newData.replaceAll("0x","&#x")
+              //Figure out what mode you have to be in
+              if(readyForCDATA(0) == '&') {
+                charEntMode = true
+              } else {
+                charEntMode = false
+                correctFormat.append(cdataIntro)
+              }
+
+              //Traverse over the string surrounding correct areas with CDATA info
+              for(c <- readyForCDATA) {
+                if(charEntMode) {
+                  correctFormat.append(c)
+                  if(c == ';'){
+                    correctFormat.append(cdataIntro)
+                    charEntMode = false
+                  }
+                } else {
+                  if(c == '&'){
+                    correctFormat.append(cdataOutro)
+                    charEntMode = true
+                  }
+                  correctFormat.append(c)
+                }
+              }
+
+              //You are done with the string. If you are still a non

Review Comment:
   indentation off



##########
daffodil-sapi/src/test/scala/org/apache/daffodil/example/TestScalaAPI.scala:
##########
@@ -1167,4 +1168,80 @@ class TestScalaAPI {
       assertFalse(ur.isError)
       assertEquals(expectedData, bos.toString())
     }
+
+    @Test
+    def testScalaAPICDATA1(): Unit = {
+      val c = Daffodil.compiler().withTunable("xmlOutputStyle","prettyPrintSafe")
+
+      val schemaFile = getResource("/test/sapi/mySchemaCDATA1.dfdl.xsd")
+      val pf = c.compileFile(schemaFile)
+      var dp = pf.onPath("/")
+      dp = reserializeDataProcessor(dp)
+
+      val file = getResource("/test/sapi/myDataCDATA1.dat")

Review Comment:
   You should get rid of these  ".dat" files, and just use literal strings in these tests. 
   You want to use this:
   ```
   new ByteArrayInputStream(dataString.getBytes("ascii"))
   ```



##########
daffodil-runtime1/src/main/scala/org/apache/daffodil/infoset/W3CDOMInfosetOutputter.scala:
##########
@@ -56,14 +56,56 @@ class W3CDOMInfosetOutputter extends InfosetOutputter
     true
   }
 
-  def startSimple(diSimple: DISimple): Boolean = {
+  def startSimple(diSimple: DISimple, xmlOutputStyle: String): Boolean = {
 
     val elem = createElement(diSimple)
 
+    val correctFormat = new StringBuilder("");
+    val cdataIntro = "<![CDATA["
+    val cdataOutro = "]]>"
+    var charEntMode: Boolean = false
+
     if (diSimple.hasValue) {
       val text =
         if (diSimple.erd.optPrimType.get.isInstanceOf[NodeInfo.String.Kind]) {
-          remapped(diSimple.dataValueAsString)
+          val s = remapped(diSimple.dataValueAsString)
+
+          if(xmlOutputStyle == "prettyPrintSafe"){

Review Comment:
   All this logic is duplicated across JDOM, ScalaXML and W3CDOM infoset outputters. 
   Factor out so it can be shared. You may want to put a final method on InfosetOutputter for this, or make a scala object just for this, called perhaps XMLPrettyPrintSafe to hold the method. 



##########
daffodil-runtime1/src/main/scala/org/apache/daffodil/infoset/JDOMInfosetOutputter.scala:
##########
@@ -48,14 +48,57 @@ class JDOMInfosetOutputter extends InfosetOutputter
     true
   }
 
-  def startSimple(diSimple: DISimple): Boolean = {
+  def startSimple(diSimple: DISimple, xmlOutputStyle: String): Boolean = {
 
     val elem = createElement(diSimple)
+    val correctFormat = new StringBuilder("");
+    val cdataIntro = "<![CDATA["
+    val cdataOutro = "]]>"
+    var charEntMode: Boolean = false
 
     if (diSimple.hasValue) {
       val text =
         if (diSimple.erd.optPrimType.get.isInstanceOf[NodeInfo.String.Kind]) {
-          remapped(diSimple.dataValueAsString)
+          val s = remapped(diSimple.dataValueAsString)
+
+            if(xmlOutputStyle == "prettyPrintSafe"){
+              val newData = s.replaceAll(">","&gt;").replace("\\r\\n", "&#xE00D;")
+              val readyForCDATA = newData.replaceAll("0x","&#x")
+              //Figure out what mode you have to be in
+              if(readyForCDATA(0) == '&') {
+                charEntMode = true
+              } else {
+                charEntMode = false
+                correctFormat.append(cdataIntro)
+              }
+
+              //Traverse over the string surrounding correct areas with CDATA info
+              for(c <- readyForCDATA) {
+                if(charEntMode) {
+                  correctFormat.append(c)
+                  if(c == ';'){
+                    correctFormat.append(cdataIntro)
+                    charEntMode = false
+                  }
+                } else {
+                  if(c == '&'){
+                    correctFormat.append(cdataOutro)
+                    charEntMode = true
+                  }
+                  correctFormat.append(c)
+                }
+              }
+
+              //You are done with the string. If you are still a non
+	            //char ent then close and finish up.
+              if(!charEntMode){
+                correctFormat.append(cdataOutro)
+              }
+
+              correctFormat.toString()
+            }else{

Review Comment:
   
   Suggest you pull the entire set of lines for the stringWithPrettyPrintSafe situation out to a private method. 
   (I mean lines 65 to 98 roughly)
   
   Then this if-then-else nest would fit on a screen and it would be clearer what end brace goes with what condition. 
   
   Also "}else{" should have spaces like "} else {"
   



##########
daffodil-propgen/src/main/resources/org/apache/daffodil/xsd/dafext.xsd:
##########
@@ -559,6 +559,17 @@
             </xs:restriction>
           </xs:simpleType>
         </xs:element>
+        <xs:element name="xmlOutputStyle" type="xs:string" default="none" minOccurs="0">
+          <xs:annotation>
+            <xs:documentation>
+              values is a whitespace separated list of tokens drawn from this set.
+              Values are:
+              - none: (Current behavior - ok if data is not being pretty printed, or will not be re-read in, or if whitespace is fungible in the actual data format),

Review Comment:
   Remove "Current behavior" as that is dated as soon as this gets merged. "Use if data is ..."
   
   Also add that this behavior applies only to type xs:string or types derived from it. 



##########
daffodil-runtime1/src/main/scala/org/apache/daffodil/infoset/JDOMInfosetOutputter.scala:
##########
@@ -48,14 +48,57 @@ class JDOMInfosetOutputter extends InfosetOutputter
     true
   }
 
-  def startSimple(diSimple: DISimple): Boolean = {
+  def startSimple(diSimple: DISimple, xmlOutputStyle: String): Boolean = {
 
     val elem = createElement(diSimple)
+    val correctFormat = new StringBuilder("");
+    val cdataIntro = "<![CDATA["
+    val cdataOutro = "]]>"
+    var charEntMode: Boolean = false
 
     if (diSimple.hasValue) {
       val text =
         if (diSimple.erd.optPrimType.get.isInstanceOf[NodeInfo.String.Kind]) {
-          remapped(diSimple.dataValueAsString)
+          val s = remapped(diSimple.dataValueAsString)
+
+            if(xmlOutputStyle == "prettyPrintSafe"){
+              val newData = s.replaceAll(">","&gt;").replace("\\r\\n", "&#xE00D;")
+              val readyForCDATA = newData.replaceAll("0x","&#x")
+              //Figure out what mode you have to be in
+              if(readyForCDATA(0) == '&') {
+                charEntMode = true

Review Comment:
   CodeCov says you need a test for this case. I'm not sure how that's possible, but it seems to be. 
   
   Add test that has to hit this like with "&#x31;" in it. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@daffodil.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org