Posted to commits@daffodil.apache.org by "stevedlawrence (via GitHub)" <gi...@apache.org> on 2023/07/21 14:25:00 UTC

[GitHub] [daffodil] stevedlawrence commented on pull request #1049: Update scala-xml to 2.2.0

stevedlawrence commented on PR #1049:
URL: https://github.com/apache/daffodil/pull/1049#issuecomment-1645676989

   Digging into what is causing this failure, I found that changes to the scala-xml library now require a `FactoryAdapter` implementation (which is what our `DaffodilXMLLoader` is) to call the `loadDocument` function to do the actual parsing. Here's that function:
   
   https://github.com/scala/scala-xml/blob/main/shared/src/main/scala/scala/xml/parsing/FactoryAdapter.scala#L114-L144
   
   This is because that function sets the private `xmlReader` variable, which must be defined by the time `endDocument` is called; otherwise, `endDocument` throws a `NoSuchElementException`.
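
   To illustrate the failure mode, here is a minimal, self-contained sketch that uses only the JDK's SAX API. `MiniAdapter` and `MiniAdapterDemo` are hypothetical names. This is not scala-xml's actual `FactoryAdapter` code. It only models the pattern described above, where `loadDocument` records the reader and `endDocument` later requires it.

   ```scala
   import java.io.StringReader
   import javax.xml.parsers.SAXParserFactory
   import org.xml.sax.{InputSource, XMLReader}
   import org.xml.sax.helpers.DefaultHandler

   // Hypothetical stand-in for scala-xml's FactoryAdapter. It models only the
   // "loadDocument records the reader, endDocument requires it" pattern.
   class MiniAdapter extends DefaultHandler {
     private var xmlReader: Option[XMLReader] = None
     var finished: Boolean = false

     def loadDocument(source: InputSource, reader: XMLReader): Unit = {
       reader.setContentHandler(this)
       xmlReader = Some(reader) // the state endDocument depends on
       reader.parse(source)
     }

     override def endDocument(): Unit = {
       // Option.get throws NoSuchElementException when loadDocument was bypassed
       val reader = xmlReader.get
       finished = reader != null
     }
   }

   object MiniAdapterDemo {
     private def newReader(): XMLReader =
       SAXParserFactory.newInstance().newSAXParser().getXMLReader()

     private def source(): InputSource = new InputSource(new StringReader("<root/>"))

     // Going through loadDocument lets endDocument find the reader.
     def loadDocumentSucceeds(): Boolean = {
       val adapter = new MiniAdapter
       adapter.loadDocument(source(), newReader())
       adapter.finished
     }

     // Calling parse directly, like the old xrdr.parse(..) call, leaves
     // xmlReader unset, so endDocument throws NoSuchElementException.
     def directParseFails(): Boolean = {
       val reader = newReader()
       reader.setContentHandler(new MiniAdapter)
       try { reader.parse(source()); false }
       catch { case _: NoSuchElementException => true }
     }

     def main(args: Array[String]): Unit = {
       println(s"loadDocument succeeds: ${loadDocumentSucceeds()}")
       println(s"direct parse fails: ${directParseFails()}")
     }
   }
   ```

   In this model, only the `loadDocument` path leaves the adapter in the state `endDocument` expects. This mirrors why `DaffodilXMLLoader` now has to go through `loadDocument`.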
   
   Fortunately, it is a pretty straightforward change in the `DaffodilXMLLoader` to call `loadDocument` instead of directly calling `xmlReader.parse(..)`. It's just a one-liner change:
   ```diff
   daffodil-lib/src/main/scala/org/apache/daffodil/lib/xml/DaffodilXMLLoader.scala 
   @@ -687,9 +687,14 @@ class DaffodilXMLLoader(val errorHandler: org.xml.sax.ErrorHandler)
          val parser = parserFromURI(optSchemaURI)
          val xrdr = parser.getXMLReader()
          val saxSource = scala.xml.Source.fromSysId(source.uriForLoading.toString)
          try {
   -        xrdr.parse(saxSource)
   +        loadDocument(saxSource, xrdr)
          } catch {
            // can be thrown by the resolver if a schemaLocation of
            // an import/include cannot be resolved.
            // Regular Xerces doesn't report that as an error.
   ```
   
   Unfortunately, this leads to errors when the loader tries to validate the XML (which is the point of this bit of code), such as:
   > SAXParseException: cvc-elt.1.a: Cannot find the declaration of element 'tdml:testSuite'
    
   I've found that the following line in the `loadDocument` function is the culprit:
   
   https://github.com/scala/scala-xml/blob/main/shared/src/main/scala/scala/xml/parsing/FactoryAdapter.scala#L120
   
   The XML reader's entity resolver is actually null, so scala-xml sets one with `setEntityResolver`, which seems to break things. For validation to work, the resolver must instead be set as follows, which is what we do in our `DaffodilXMLLoader`:
   
   ```scala
   // Xerces-specific: register the resolver through the internal property
   xrdr.setProperty("http://apache.org/xml/properties/internal/entity-resolver", resolver)
   ```
   
   But I think that property is Xerces-specific, so it can't be part of scala-xml.
   
   But even with that property set, `setEntityResolver` still gets called, and that still breaks things.
   
   I'm not sure why an `EntityResolver` set with `setEntityResolver` doesn't work, and I don't see a good workaround. We could suggest a change to scala-xml so that it doesn't set an entity resolver at all, on the assumption that the caller sets one in an appropriate way (e.g., `setEntityResolver` vs. `setProperty`). Or maybe we're doing something wrong in how we set up the XML reader, and a change on our side would allow it to work via `setEntityResolver`?
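
   Here is a minimal sketch of the first option, using only the JDK's SAX API. `currentGuard` models the null check described above: it installs a resolver only when `getEntityResolver` reports none. `proposedGuard` adds an opt-out flag for callers that configure entity resolution themselves. All names here (`ResolverGuardSketch`, `fallback`, `currentGuard`, `proposedGuard`, `callerManagesResolver`) are hypothetical and are not part of scala-xml's API.

   ```scala
   import javax.xml.parsers.SAXParserFactory
   import org.xml.sax.{EntityResolver, InputSource, XMLReader}

   object ResolverGuardSketch {
     // Hypothetical fallback resolver; returning null asks the parser to use
     // its default resolution for the entity.
     val fallback: EntityResolver = new EntityResolver {
       override def resolveEntity(publicId: String, systemId: String): InputSource = null
     }

     // Models the behavior described above: install a resolver only when
     // getEntityResolver reports none. A resolver registered some other way
     // (e.g. through a parser-specific property) may not be visible here.
     def currentGuard(reader: XMLReader, resolver: EntityResolver): Unit =
       if (reader.getEntityResolver() == null) reader.setEntityResolver(resolver)

     // Hypothetical opt-out for the suggested change: when the caller manages
     // entity resolution itself, the reader is left untouched.
     def proposedGuard(
         reader: XMLReader,
         resolver: EntityResolver,
         callerManagesResolver: Boolean): Unit =
       if (!callerManagesResolver) currentGuard(reader, resolver)

     def newReader(): XMLReader =
       SAXParserFactory.newInstance().newSAXParser().getXMLReader()
   }
   ```

   With `callerManagesResolver = true`, `setEntityResolver` would never be called on a reader that `DaffodilXMLLoader` configured through the Xerces property.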


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.