You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@daffodil.apache.org by GitBox <gi...@apache.org> on 2021/03/05 21:28:45 UTC

[GitHub] [daffodil] mbeckerle commented on a change in pull request #478: Feature Aware DaffodilParseXMLReader and Prefixes Fixes

mbeckerle commented on a change in pull request #478:
URL: https://github.com/apache/daffodil/pull/478#discussion_r588612770



##########
File path: daffodil-lib/src/main/scala/org/apache/daffodil/xml/QNameBase.scala
##########
@@ -385,8 +387,8 @@ final case class LocalDeclQName(prefix: Option[String], local: String, namespace
 /**
  * QName for a global declaration or definition (of element, type, group, format, etc.)
  */
-final case class GlobalQName(prefix: Option[String], local: String, namespace: NS)
-  extends NamedQName(prefix, local, namespace) {
+final case class GlobalQName(lexicalPrefix: Option[String], local: String, namespace: NS)

Review comment:
       This feels improperly named to me. When you define something that is named, you never specify a prefix. The namespace is the target namespace of the schema, and there is no prefix specified. The target namespace doesn't even have to have *any* prefix defined. Other schemas that import/include this schema *may* supply a prefix most likely they do, but they could define it to be the default namespace and have no other means of access in a RefQName to one of the items in this schema. 
   
   Only RefQNames have lexicalPrefix. GlobalQNames never have a prefix, or if they do, it's because we're choosing one of the prefixes for the target namespace that we prefer. 
   
   So perhaps this paremter to GlobalQName should be called preferredTnsPrefix ?
   
   When we create a local element decl, the namespace is either no-namespace, or the TNS depending on the form of the element (or elementFormDefault of the schema, if the element has no form attribute, which is typical. I've never actually seen the form attribute used, and I'm not sure we support the form attribute. We do support elementFormDefault on the xs:schema however.) Again there is no explicit prefix. So I think the constructor for NamedQName should call it's prefix parameter preferredTnsPrefix. 

##########
File path: daffodil-runtime1/src/main/scala/org/apache/daffodil/processors/DaffodilParseOutputStreamContentHandler.scala
##########
@@ -66,13 +88,74 @@ class DaffodilParseOutputStreamContentHandler(out: OutputStream, pretty: Boolean
 
   override def startPrefixMapping(prefix: String, uri: String): Unit = {
     val _prefix = if (prefix == "") null else prefix
-    prefixMapping = NamespaceBinding(_prefix, uri, prefixMapping)
+    globalPrefixMapping = NamespaceBinding(_prefix, uri, globalPrefixMapping)

Review comment:
       As we call this startPrefixMapping repeatedly, the globalPrefixMapping var is no longer really the global one is it? 
   Or is this only called at the top level before we start handling elements?
   
   I expected this to be named currentPrefixMapping. 

##########
File path: daffodil-lib/src/main/scala/org/apache/daffodil/xml/XMLUtils.scala
##########
@@ -812,6 +812,8 @@ object XMLUtils {
     actual: Node,
     ignoreProcInstr: Boolean = true,
     checkPrefixes: Boolean = false,
+    // checkNamespaces isn't order specific; it only checks that they have the same (element-wise)

Review comment:
       What does "element wise" namespaces mean? Maybe just say same set of namespace bindings (unordered). 

##########
File path: daffodil-runtime1/src/main/scala/org/apache/daffodil/processors/DaffodilParseOutputStreamContentHandler.scala
##########
@@ -24,15 +24,37 @@ import scala.xml.NamespaceBinding
 
 import org.apache.daffodil.infoset.XMLInfosetOutputter
 import org.apache.daffodil.util.Indentable
+import org.apache.daffodil.util.MStackOf
 import org.apache.daffodil.util.MStackOfBoolean
+import org.apache.daffodil.util.Maybe
+import org.apache.daffodil.util.Maybe.Nope
+import org.apache.daffodil.xml.XMLUtils
 import org.xml.sax.Attributes
 import org.xml.sax.ContentHandler
 import org.xml.sax.Locator
 
 class DaffodilParseOutputStreamContentHandler(out: OutputStream, pretty: Boolean = false)

Review comment:
       Whole class Scala doc please. This is a SAX content handler for when you want to write textualized XML to a java output stream from SAX events created by Daffodil's parser. 
   
   Question. Why does this have Daffodil in its name. Isn't this a generic thing that woudl work with any source of SAX events? If that's the case, isn't there one of these in Xerces, or Apache Commons XML or something? 

##########
File path: daffodil-lib/src/main/scala/org/apache/daffodil/xml/QNameBase.scala
##########
@@ -425,8 +427,8 @@ final case class RefQName(prefix: Option[String], local: String, namespace: NS)
     }
   }
 
-  def toStepQName = StepQName(prefix, local, namespace)
-  def toGlobalQName = GlobalQName(prefix, local, namespace)
+  def toStepQName = StepQName(lexicalPrefix, local, namespace)
+  def toGlobalQName = GlobalQName(lexicalPrefix, local, namespace)

Review comment:
       Worth a coment here that for GlobalQNames, comparison of a RefQName to a GlobalQName to see if they match ignores the prefix. The only use of this lexicalPrefix parameter should be in diagnostic messages. 
   
   There's an argument that this should always be supplied as None, not the lexicalPrefix. But most schemas don't play subtle namespace games, so that may lead to diagnostic messages that are more verbose (using namespaces) when the prefix would have been more meaningful to the user. So I think keep the lexicalPrefix here anyway. 

##########
File path: daffodil-runtime1/src/main/scala/org/apache/daffodil/processors/DaffodilParseOutputStreamContentHandler.scala
##########
@@ -66,13 +88,74 @@ class DaffodilParseOutputStreamContentHandler(out: OutputStream, pretty: Boolean
 
   override def startPrefixMapping(prefix: String, uri: String): Unit = {
     val _prefix = if (prefix == "") null else prefix
-    prefixMapping = NamespaceBinding(_prefix, uri, prefixMapping)
+    globalPrefixMapping = NamespaceBinding(_prefix, uri, globalPrefixMapping)
+    currentElementPrefixMapping = NamespaceBinding(_prefix, uri, currentElementPrefixMapping)
   }
 
   override def endPrefixMapping(prefix: String): Unit = {
     // do nothing
   }
 
+  /**
+   * Uses Attributes, which is passed in to the startElement callback, to extract prefix mappings
+   * and other attributes and return a tuple of an updated globalPrefixMapping and a Seq of attribute=val
+   * String pairings
+   */
+  def processAttributes(atts: Attributes, prefixMapping: NamespaceBinding): (NamespaceBinding, Seq[String]) = {
+    var pm = prefixMapping
+    var attrPairings: Seq[String] = Seq()
+    var i = 0
+    var mappingsList: Seq[(String, String)] = Seq()
+    while (i < atts.getLength) {
+      val qName = atts.getQName(i)
+      val attrVal =  atts.getValue(i)
+      if (qName.nonEmpty) { //if qName is populated; as in when namespacePrefixes == true
+        if (qName.startsWith("xmlns:") || qName == "xmlns")  { // describing namespace mapping
+          // get prefix
+          val pre = if (qName.startsWith("xmlns:")) {
+            qName.substring(6)
+          } else {
+            null
+          }
+          // we make this call to check if the prefix already exists. If it doesn't exist, we get the
+          // npe so we can add it, but if it does exist, nothing happens and it doesn't get re-added
+          // and we instead proceed to the next item in Attributes
+          val prefixUri = XMLUtils.getURIOption(pm, pre)
+          prefixUri match {
+            case Some(_) => // already exists; do nothing
+            case None => // add it
+              mappingsList +:= (pre, attrVal)
+          }
+        } else {
+          // regular attribute
+          attrPairings +:= s" $qName=" + "\"" + s"${attrVal}" + "\""

Review comment:
       Can this happen? DFDL doesn't have attributes. Coverage says this isn't tested. Can this even happen? 
   Maybe this is Assert.invariantFailed("DFDL Infoset XML shouldn't have attributes.")
   
   Maybe we need an example with both ns bindings on an element and xsi:type="xs:string" or xsi:nil="true" so that this will be covered?

##########
File path: daffodil-runtime1/src/main/scala/org/apache/daffodil/infoset/SAXInfosetOutputter.scala
##########
@@ -144,29 +149,25 @@ class SAXInfosetOutputter(xmlReader: DFDL.DaffodilParseXMLReader)
   }
 
   private def doStartPrefixMapping(diElem: DIElement, contentHandler: ContentHandler): Unit = {
-    val nsbStart = diElem.erd.minimizedScope
-    val nsbEnd = if (diElem.isRoot) {
-      scala.xml.TopScope
-    } else {
-      diElem.diParent.erd.minimizedScope
-    }
+    val (nsbStart: NamespaceBinding, nsbEnd: NamespaceBinding) = getNsbStartAndEnd(diElem)
     var n = nsbStart
+    var mappingsList: Seq[(String, String)] = Seq()
     while (n != nsbEnd && n != null && n != scala.xml.TopScope) {
       val prefix = if (n.prefix == null) "" else n.prefix
       val uri = if (n.uri == null) "" else n.uri
-      contentHandler.startPrefixMapping(prefix, uri)
+      // we generate a list here by prepending so can build the prefixMapping in the
+      // same order as it is in minimizedScope when we call startPrefixMapping; we do this because
+      // getPrefix in the contentHandler is order dependent
+      mappingsList +:= (prefix, uri)
       n = n.parent
     }
+    mappingsList.foreach{ case (prefix, uri) =>
+      contentHandler.startPrefixMapping(prefix, uri)
+    }
   }
 
   private def doEndPrefixMapping(diElem: DIElement, contentHandler: ContentHandler): Unit = {
-    val nsbStart = diElem.erd.minimizedScope
-    val nsbEnd = if (diElem.isRoot) {
-      scala.xml.TopScope
-    } else {
-      diElem.diParent.erd
-        .minimizedScope
-    }
+    val (nsbStart: NamespaceBinding, nsbEnd: NamespaceBinding) = getNsbStartAndEnd(diElem)
     var n = nsbStart

Review comment:
       I think this code is depending on iterating down nsbStart's bindings eventually gets you to nsbEnd. I.e., nsbEnd is the tail of the nsbStart chain. 
   
   I am not sure what == and != mean on NamespaceBinding objects. But they might be doing elaborate comparisons to see if they define equivalent scopes (all same bindings, just different order). 
   
   I think your while could be while ( (n ne nsbEnd && .... i.e., using pointer equality here. This may actually be more correct than some sort of binding-scope equality, given what your algorthm is trying to achieve here which is to one by one remove the bindings that were added to the front. 
   

##########
File path: daffodil-runtime1/src/main/scala/org/apache/daffodil/infoset/SAXInfosetOutputter.scala
##########
@@ -175,26 +176,78 @@ class SAXInfosetOutputter(xmlReader: DFDL.DaffodilParseXMLReader)
     }
   }
 
-  private def doStartElement(diElem: DIElement, contentHandler: ContentHandler): Unit = {
-    val (ns: String, elemName: String, qName: String) = getNameSpaceElemNameAndQName(diElem)
-    doStartPrefixMapping(diElem, contentHandler)
+  /**
+   * Add the prefixes and uris from the element's NamespaceBinding to Attributes,
+   * when namespacePrefixes feature is true
+   */
+  private def doAttributesPrefixMapping(diElem: DIElement, attrs: AttributesImpl): AttributesImpl = {
+    val (nsbStart: NamespaceBinding, nsbEnd: NamespaceBinding) = getNsbStartAndEnd(diElem)
+    var n = nsbStart
+    while (n != nsbEnd && n != null && n != scala.xml.TopScope) {
+      val prefix = if (n.prefix == null) "xmlns" else s"xmlns:${n.prefix}"
+      val uri = if (n.uri == null) "" else n.uri
+      // uri and localname are always empty for NamespaceBinding attributes
+      attrs.addAttribute("", "", prefix, "CDATA", uri)
+      n = n.parent
+    }
+    attrs
+  }
 
-    val attrs = if (isNilled(diElem)) {
-      createNilAttribute()
+  private def getNsbStartAndEnd(diElem: DIElement) = {

Review comment:
       Is there any requirement about sharing between nsbStart and nsbEnd here?
   Seems to me we're depending on these being somehow one an extension of the other. Is that the case?
   
   then we should state that in a comment, and maybe check it as an Assert.invariant. 
   
   Alternatively, where minimized scope is computed, the invariant that each element's minimized scope is an extension of its parents minimized scope could be checked there. (maybe it is checked there?)
   
   I think your code is depending on this for correctness, so just let's be explicit about this sharing. 
   
   What I'm wondering about is if you compare two NamespaceBinding objects with ==, does it check for "equivalence" of the namespace bindings? I.e, are they the same list, then true else sort both sets of bindings, etc. etc.  Algorrithmically here, I think you are treating NamespaceBindings as linked chains, and when comparing them I think you basically always want eq or ne, not == or !=. 
   
   The Scala XML library really made a big mistake with these things. Namespace bindings should have been an ordinary List of Binding objects where a binding is just a single prefix + NS pair. So many idioms would have worked out better that way. Ordinary mapping, finding, etc. Instead we have these collection like things that are not standard scala collections. 

##########
File path: daffodil-lib/src/main/scala/org/apache/daffodil/xml/QNameBase.scala
##########
@@ -224,8 +224,7 @@ trait QNameBase extends Serializable {
    * must show up in diagnostic messages. Mistakes by having the wrong prefix
    * or by omitting one, are very common.
    */
-  def prefix: Option[String]
-  def prefixOrNull: String = prefix.orNull
+  def lexicalPrefix: Option[String]

Review comment:
       I think your scaladoc isn't complete here.
   
   I think you mean by lexical prefix, the prefix that was taken off of a QName reference to a named thing. 
   A QName is a:b where Some("a") is the prefix, or it's just b, in which case None is the lexicalPrefix.
   
   One more ambiguity to clarify.
   If a schema defines several prefixes for a namespace, which will be the lexicalPrefix?
   
   I've seen lots of things like
   ```
   targetNamespace="http://pcap.com"
   xmlns:pcap="http://pcap.com"
   xmlns:tns="http://pcap.com"
   xmlns="http://pcap.com"
   ```
   So the default namespace is the same as tns is the same as pcap. So a QName reference given this default such as
   ```
   <xs:element ref="foobar"/>
   ```
   will be to the pcap.com URL, but what will be the lexical prefix in this case? None, or Some("pcap") or Some("tns"), or undetermined, but one of those?
   

##########
File path: daffodil-runtime1/src/main/scala/org/apache/daffodil/processors/DaffodilParseXMLReader.scala
##########
@@ -43,35 +42,28 @@ class DaffodilParseXMLReader(dp: DataProcessor) extends DFDL.DaffodilParseXMLRea
   private var errorHandler: ErrorHandler = _
   private var dtdHandler: DTDHandler = _
   private var entityResolver: EntityResolver = _
-  var saxParseResultPropertyValue: ParseResult = _
-  var saxBlobDirectoryPropertyValue: Path = Paths.get(System.getProperty("java.io.tmpdir"))
-  var saxBlobPrefixPropertyValue: String = "daffodil-sax-"
-  var saxBlobSuffixPropertyValue: String = ".blob"
-
-  private val featureMap = mutable.Map[String, Boolean](
-    XMLUtils.SAX_NAMESPACES_FEATURE -> false,
-    XMLUtils.SAX_NAMESPACE_PREFIXES_FEATURE -> false
-  )
+  private var saxParseResultPropertyValue: ParseResult = _

Review comment:
       Class needs scaladoc. What is a DaffodilParseXMLReader? It reads data, creates SAX events from it, and calls the supplied content handler with the events yes?
   
   I am somehow now thinking this class is used only for testing, because it constructs and uses (when parse is called) a SAXInfosetOutputter which we use to test unparsing.  Is that right? 
   
   I'm sort of losing track of what is test rig and what is core SAX algorithm. If this used for testing only, can it live in src/test ?

##########
File path: daffodil-runtime1/src/main/scala/org/apache/daffodil/processors/DaffodilParseOutputStreamContentHandler.scala
##########
@@ -66,13 +88,74 @@ class DaffodilParseOutputStreamContentHandler(out: OutputStream, pretty: Boolean
 
   override def startPrefixMapping(prefix: String, uri: String): Unit = {
     val _prefix = if (prefix == "") null else prefix
-    prefixMapping = NamespaceBinding(_prefix, uri, prefixMapping)
+    globalPrefixMapping = NamespaceBinding(_prefix, uri, globalPrefixMapping)
+    currentElementPrefixMapping = NamespaceBinding(_prefix, uri, currentElementPrefixMapping)
   }
 
   override def endPrefixMapping(prefix: String): Unit = {
     // do nothing
   }
 
+  /**
+   * Uses Attributes, which is passed in to the startElement callback, to extract prefix mappings
+   * and other attributes and return a tuple of an updated globalPrefixMapping and a Seq of attribute=val
+   * String pairings
+   */
+  def processAttributes(atts: Attributes, prefixMapping: NamespaceBinding): (NamespaceBinding, Seq[String]) = {
+    var pm = prefixMapping
+    var attrPairings: Seq[String] = Seq()
+    var i = 0
+    var mappingsList: Seq[(String, String)] = Seq()
+    while (i < atts.getLength) {
+      val qName = atts.getQName(i)
+      val attrVal =  atts.getValue(i)
+      if (qName.nonEmpty) { //if qName is populated; as in when namespacePrefixes == true
+        if (qName.startsWith("xmlns:") || qName == "xmlns")  { // describing namespace mapping
+          // get prefix
+          val pre = if (qName.startsWith("xmlns:")) {
+            qName.substring(6)
+          } else {
+            null
+          }
+          // we make this call to check if the prefix already exists. If it doesn't exist, we get the
+          // npe so we can add it, but if it does exist, nothing happens and it doesn't get re-added

Review comment:
       Type npe here does not mean null pointer exception it is supposed to be None I think. 
   Unless this really does throw an NPE which is a really bad API, and we should change it. 

##########
File path: daffodil-runtime1/src/main/scala/org/apache/daffodil/processors/DaffodilParseOutputStreamContentHandler.scala
##########
@@ -116,9 +247,16 @@ class DaffodilParseOutputStreamContentHandler(out: OutputStream, pretty: Boolean
       }
     }
     writer.write("</")
-    writer.write(qName)
+    outputTagName(uri, localName, qName)
     writer.write(">")
     outputNewlineStack.pop()
+
+    if (!globalPrefixMappingContextStack.isEmpty) globalPrefixMappingContextStack.pop
+    if (globalPrefixMappingContextStack.isEmpty) {
+      globalPrefixMapping = null
+    } else {
+      globalPrefixMapping = globalPrefixMappingContextStack.top
+    }

Review comment:
       Ok. I didn't see a single thing in that class that has anything to do with DFDL/Daffodil. It's a generic SAX even to text content handler. Can't we just use one and not have to maintain this code?
   
   If not, what did I miss?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org