You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@daffodil.apache.org by GitBox <gi...@apache.org> on 2022/08/12 18:49:42 UTC

[GitHub] [daffodil] stevedlawrence commented on a diff in pull request #821: Add support for new tunable "xmlOutputStyle"

stevedlawrence commented on code in PR #821:
URL: https://github.com/apache/daffodil/pull/821#discussion_r944740002


##########
daffodil-runtime1/src/main/scala/org/apache/daffodil/infoset/InfosetOutputter.scala:
##########
@@ -54,11 +54,12 @@ trait InfosetOutputter {
    * @param diSimple the simple element that is started. Various fields of
    *                 DISimple can be accessed to determine things like the
    *                 value, nil, name, namespace, etc.
+   * @param xmlOutputStyle the string that determines how to print data.
    * @return true on sucess, false if there was an error and Daffodil should stop all
    *         future calls to the InfosetOutputter
    */
 
-  def startSimple(diSimple: DISimple): Boolean
+  def startSimple(diSimple: DISimple, xmlOutputStyle: String): Boolean

Review Comment:
   I'm not sure I like this change to the `InfosetOutputter`.
   
   For one, it breaks API backwards compatibility, which isn't a huge deal since we don't know of other implementations outside of Daffodil, but it's still not great.
   
   Though, my main gripe with this is that this CDATA stuff is so specific to the `XMLTextInfosetOutputter`. The `JSONInfosetOutputter` definitely doesn't need it, and other outputters create actual java objects, and so whitespace isn't really an issue for them unless they are pretty printed. But in that case the pretty printers can figure out this logic which is outside the scope of Daffodil.
   
   Because this is specific to the `XMLTextInfosetOutputter`, I would suggest that this should instead be a parameter passed to its constructor, and it's up to the caller how to figure out how that gets in there. For API users that's easy since they can just supply the new parameter. For CLI users, it's a bit more tricky since we don't yet have a way for the CLI to provide options to infoset outputters. This is a similar issue as [DAFFODIL-2234](https://issues.apache.org/jira/browse/DAFFODIL-2234), so maybe we figure that out separately, and this is only a feature for API users until then?
   
   This would also man that you don't have to pass this xmlOutputStyle around all over the place just to get it to the one outputter that actually cares about it. It should reduce the changes quite a bit.



##########
daffodil-runtime1/src/main/scala/org/apache/daffodil/infoset/JDOMInfosetOutputter.scala:
##########
@@ -48,14 +48,57 @@ class JDOMInfosetOutputter extends InfosetOutputter
     true
   }
 
-  def startSimple(diSimple: DISimple): Boolean = {
+  def startSimple(diSimple: DISimple, xmlOutputStyle: String): Boolean = {
 
     val elem = createElement(diSimple)
+    val correctFormat = new StringBuilder("");
+    val cdataIntro = "<![CDATA["
+    val cdataOutro = "]]>"
+    var charEntMode: Boolean = false
 
     if (diSimple.hasValue) {
       val text =
         if (diSimple.erd.optPrimType.get.isInstanceOf[NodeInfo.String.Kind]) {
-          remapped(diSimple.dataValueAsString)
+          val s = remapped(diSimple.dataValueAsString)
+
+            if(xmlOutputStyle == "prettyPrintSafe"){
+              val newData = s.replaceAll(">","&gt;").replace("\\r\\n", "&#xE00D;")
+              val readyForCDATA = newData.replaceAll("0x","&#x")
+              //Figure out what mode you have to be in
+              if(readyForCDATA(0) == '&') {
+                charEntMode = true
+              } else {
+                charEntMode = false
+                correctFormat.append(cdataIntro)
+              }
+
+              //Traverse over the string surrounding correct areas with CDATA info
+              for(c <- readyForCDATA) {
+                if(charEntMode) {
+                  correctFormat.append(c)
+                  if(c == ';'){
+                    correctFormat.append(cdataIntro)
+                    charEntMode = false
+                  }
+                } else {
+                  if(c == '&'){
+                    correctFormat.append(cdataOutro)
+                    charEntMode = true
+                  }
+                  correctFormat.append(c)
+                }
+              }
+
+              //You are done with the string. If you are still a non
+	            //char ent then close and finish up.
+              if(!charEntMode){
+                correctFormat.append(cdataOutro)
+              }
+

Review Comment:
   There's a lot of special logic and edge cases here needed to convert a string for CDATA. Mike found some potential edge cases we are missing.
   
   I wonder if we can use one of the XML library dependencies we have to perform this logic. I would be surprised if one of them doesn't already have logic for something like this. Let's not reinvent the wheel if possible.



##########
daffodil-sapi/src/test/scala/org/apache/daffodil/example/TestScalaAPI.scala:
##########
@@ -1167,4 +1168,80 @@ class TestScalaAPI {
       assertFalse(ur.isError)
       assertEquals(expectedData, bos.toString())
     }
+
+    @Test
+    def testScalaAPICDATA1(): Unit = {
+      val c = Daffodil.compiler().withTunable("xmlOutputStyle","prettyPrintSafe")
+
+      val schemaFile = getResource("/test/sapi/mySchemaCDATA1.dfdl.xsd")
+      val pf = c.compileFile(schemaFile)
+      var dp = pf.onPath("/")
+      dp = reserializeDataProcessor(dp)
+
+      val file = getResource("/test/sapi/myDataCDATA1.dat")

Review Comment:
   One thing to I try to remember is to also specify a charset when converting a string to bytes. We've had issues where systems with different languages get a different character set and cause test failures. So this would instead be something like:
   ```scala
   new ByteArrayInputStream(dataString.getBytes("ascii", StandardCharsets.UTF_8))
   ```



##########
daffodil-runtime1/src/main/scala/org/apache/daffodil/infoset/W3CDOMInfosetOutputter.scala:
##########
@@ -56,14 +56,56 @@ class W3CDOMInfosetOutputter extends InfosetOutputter
     true
   }
 
-  def startSimple(diSimple: DISimple): Boolean = {
+  def startSimple(diSimple: DISimple, xmlOutputStyle: String): Boolean = {
 
     val elem = createElement(diSimple)
 
+    val correctFormat = new StringBuilder("");
+    val cdataIntro = "<![CDATA["
+    val cdataOutro = "]]>"
+    var charEntMode: Boolean = false
+
     if (diSimple.hasValue) {
       val text =
         if (diSimple.erd.optPrimType.get.isInstanceOf[NodeInfo.String.Kind]) {
-          remapped(diSimple.dataValueAsString)
+          val s = remapped(diSimple.dataValueAsString)
+
+          if(xmlOutputStyle == "prettyPrintSafe"){

Review Comment:
   Do those even need this logic? Since those are all Java objects there really isn't a need to convert things to CDATA. The only case where CDATA would be need is if those are pretty printed to strings, but that seems unlikely. The whole reason to use those objects is to avoid issues with pretty printed XML.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@daffodil.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org