You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@daffodil.apache.org by GitBox <gi...@apache.org> on 2022/09/02 19:46:43 UTC

[GitHub] [daffodil] stevedlawrence commented on a diff in pull request #836: Exi fixes

stevedlawrence commented on code in PR #836:
URL: https://github.com/apache/daffodil/pull/836#discussion_r961951075


##########
daffodil-cli/src/it/scala/org/apache/daffodil/parsing/TestCLIParsing.scala:
##########
@@ -1300,28 +1300,6 @@ class TestCLIparsing {
     }
   }
 
-  @Test def test_XXX_CLI_Parsing_SimpleParse_exi(): Unit = {
-
-    val schemaFile = Util.daffodilPath("daffodil-test/src/test/resources/org/apache/daffodil/section00/general/generalSchema.dfdl.xsd")
-    val (testSchemaFile) = if (Util.isWindows) (Util.cmdConvert(schemaFile)) else (schemaFile)
-
-    val shell = Util.start("")
-
-    try {
-      val cmd = String.format(Util.echoN("Hello") + "| %s parse -I exi -s %s -r e1 | md5sum", Util.binPath, testSchemaFile)
-
-      shell.sendLine(cmd)
-      shell.expect(contains("937b3f96ee0b5cd1ac9f537cf8ddc580"))
-
-      Util.expectExitCode(ExitCode.Success, shell)
-
-      shell.send("exit\n")
-      shell.expect(eof)
-    } finally {
-      shell.close()
-    }
-  }
-

Review Comment:
   Why remove his EXI test? Isn't this the only test that verifies EXI?



##########
daffodil-runtime1/src/main/scala/org/apache/daffodil/infoset/SAXInfosetOutputter.scala:
##########
@@ -63,8 +63,7 @@ class SAXInfosetOutputter(xmlReader: DFDL.DaffodilParseXMLReader,
       if (diSimple.hasValue) {
         val text =
           if (diSimple.erd.optPrimType.get.isInstanceOf[NodeInfo.String.Kind]) {
-            val s = remapped(diSimple.dataValueAsString)
-            scala.xml.Utility.escape(s)
+            remapped(diSimple.dataValueAsString)

Review Comment:
    Do we need to remove this same escaping for other XML infoset outputters that create objects? I think the Scala, W3C, and JDOM infoset outputters do the same thing. The InfosetInputters probably need to be changed to not unescape things as well?
   
   That doesn't necessarily need to be done in this commit, but we should create a bug to investigate it.



##########
daffodil-cli/src/main/scala/org/apache/daffodil/Main.scala:
##########
@@ -1458,9 +1463,11 @@ object Main {
     data: InputSourceDataInputStream,
     saxContentHandler: ContentHandler): ParseResult = {
     val saxXmlRdr = processor.newXMLReaderInstance
-    saxXmlRdr.setContentHandler(saxContentHandler)
+    // SAX_NAMESPACE_PREFIXES_FEATURE is needed to preserve nil attributes with EXI
+    saxXmlRdr.setFeature(XMLUtils.SAX_NAMESPACE_PREFIXES_FEATURE, true)
     saxXmlRdr.setProperty(XMLUtils.DAFFODIL_SAX_URN_BLOBDIRECTORY, blobDir)
     saxXmlRdr.setProperty(XMLUtils.DAFFODIL_SAX_URN_BLOBSUFFIX, blobSuffix)
+    saxXmlRdr.setContentHandler(saxContentHandler)

Review Comment:
   Are no changes needed to the `unparseWIthSax` function for schema unaware exi unparsing? Doesn't the `EXISource` `XMLReader` need to know the grammar to be able to deserialize the EXI file?
   
    Or does it just serialize the entire grammar into the EXI file? Seems like that would defeat the purpose of schema awareness so I imagine that's not the case?



##########
daffodil-runtime1/src/main/scala/org/apache/daffodil/processors/DaffodilParseOutputStreamContentHandler.scala:
##########
@@ -290,7 +290,9 @@ class DaffodilParseOutputStreamContentHandler(out: OutputStream, pretty: Boolean
   }
 
   override def characters(ch: Array[Char], start: Int, length: Int): Unit = {
-    writer.write(ch, start, length)
+    val str = new String(ch, start, length)
+    val escaped = scala.xml.Utility.escape(str)
+    writer.write(escaped.toArray, 0, escaped.length)

Review Comment:
   I believe `writer.write` has an overload that accepts a `String`. If we use that it might avoid `toArray`, which overhead with that needing to create a copy, I think?



##########
daffodil-cli/src/main/scala/org/apache/daffodil/Main.scala:
##########
@@ -894,7 +899,7 @@ object Main {
               case Some(file) => new FileOutputStream(file)
             }
             val infosetType = parseOpts.infosetType.toOption.get
-            val eitherOutputterOrHandler = getInfosetOutputter(infosetType, output)
+            val eitherOutputterOrHandler = getInfosetOutputter(infosetType, output, parseOpts.schema.toOption)

Review Comment:
   Is there any value to have a unique infoset kind value for exi schema aware, for example `-I exi` vs `-I exi-sa`? This would allow us to create both and test for things like size and speed.



##########
daffodil-cli/src/main/scala/org/apache/daffodil/Main.scala:
##########
@@ -695,9 +696,13 @@ object Main {
       case InfosetType.W3CDOM => Left(new W3CDOMInfosetOutputter())
       case InfosetType.SAX => Right(new DaffodilParseOutputStreamContentHandler(os, pretty=true))
       case InfosetType.EXI => {
-        val exiFactory = DefaultEXIFactory.newInstance()
-        exiFactory.getFidelityOptions.setFidelity(FidelityOptions.FEATURE_PREFIX, true)
-        val exiResult = new EXIResult()
+        val exiFactory = DefaultEXIFactory.newInstance
+        if (schemaOpt.isDefined) {
+          val grammarFactory = GrammarFactory.newInstance
+          val grammar = grammarFactory.createGrammars(schemaOpt.get.toURL.openStream(), DFDLCatalogResolver.get)

Review Comment:
   The `getInfosetOutputter` function is called for every iteration of a performance run. I imagine building this grammar is pretty expensive, so should probably be down outside the main loop and passed into this function when EXI is needed. In fact, maybe we should create the entire EXIFactory outside of the main loop and pass that into this function, so that this function only creates a new EXIResult/handler?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@daffodil.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org