You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@daffodil.apache.org by GitBox <gi...@apache.org> on 2021/03/31 15:17:05 UTC

[GitHub] [daffodil] mbeckerle opened a new pull request #519: WIP on doctype restriction

mbeckerle opened a new pull request #519:
URL: https://github.com/apache/daffodil/pull/519


   DAFFODIL-1422 is a ticket about restricting the XML we accept so that we do not allow DOCTYPE declarations in it.
   This is a security related provision, as use of DOCTYPEs leaves XML loaders subject to a variety of problems such as documents exploding in size as the DOCTYPE declarations are expanded. 
   DOCTYPEs are an old obsolete idea, and simply disallowing them entirely is the best option.
   
   There are other jira tickets about disallowing resolvers and loaders from dereferencing URIs treating them as internet URLs. 
   
   The upshot of all this is that "loading" XML is tricky, and needs to be done carefully via a centralized library that provides the various options different loading requires, while not exposing/allowing the security vulnerabilities. 
   
   This change set does not yet include establishing a single central library that all XML loading goes through. 
   
   Right now this is at the point where it is apparent such a library is needed, because there are too many places that are invoking XML loaders for them to just all be "done the right way". Under maintenance this is too likely to drift. 
   
   A key starting point is to survey every place Daffodil does XML loading. These include loading of:
   
   * DFDL schemas and the include/import schema files they reference
   ** This can include DFDL schemas, but also XSD for annotation languages (e.g., schematron annotations)
   ** Note that this validating loader is loading a schema, but loading it not as a schema, but as ordinary XML. This should be validated against the schema for DFDL schemas. 
   * XML Infosets being unparsed
   ** (Currently not a validating loader)
   * TDML files for testing - this can in turn lead to loading of DFDL schemas
   ** Test cases can load XML Infoset files. 
   ** validation here involves validating defineSchema elements which contain DFDL schema. 
   * Config files
   * daffodil-propgen loads the XSD files for DFDL annotations, tunables, etc.
   * Xerces validator loads a schema, and the include/import files it references
   * Xerces validator loads XML being validated 
   
   This may not be a comprehensive list. 
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [daffodil] mbeckerle commented on pull request #519: WIP on doctype restriction

Posted by GitBox <gi...@apache.org>.
mbeckerle commented on pull request #519:
URL: https://github.com/apache/daffodil/pull/519#issuecomment-833781688


   Note: if we disable DOCTYPE we are disabling general entities, as those can only be supplied in a doctype. 
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [daffodil] mbeckerle closed pull request #519: WIP on doctype restriction

Posted by GitBox <gi...@apache.org>.
mbeckerle closed pull request #519:
URL: https://github.com/apache/daffodil/pull/519


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [daffodil] mbeckerle commented on a change in pull request #519: WIP on doctype restriction

Posted by GitBox <gi...@apache.org>.
mbeckerle commented on a change in pull request #519:
URL: https://github.com/apache/daffodil/pull/519#discussion_r628389004



##########
File path: daffodil-cli/src/main/scala/org/apache/daffodil/Main.scala
##########
@@ -807,7 +807,18 @@ object Main extends Logging {
           case Left(bytes) => new ByteArrayInputStream(bytes)
           case Right(is) => is
         }
-        scala.xml.XML.load(is)
+        val parser: SAXParser = {
+          val f = SAXParserFactory.newInstance()
+          f.setNamespaceAware(false)
+          val p = f.newSAXParser()
+          p
+        }
+        //
+        // FIXME: This needs to use a Daffodil-created loader, so that we can
+        // insure that the XMLReader used from it has the setSecureDefaults treatment.
+        //
+        System.err.println("FIXME: Needs to use a Daffodil-created loader.")

Review comment:
       Still to fix. 

##########
File path: daffodil-lib/src/main/scala/org/apache/daffodil/externalvars/ExternalVariablesXMLValidator.scala
##########
@@ -17,22 +17,31 @@
 
 package org.apache.daffodil.externalvars
 
+import org.apache.daffodil.util.Misc
+import org.apache.daffodil.xml.DFDLCatalogResolver
+import org.apache.daffodil.xml.XMLUtils
+import org.apache.xerces.jaxp.validation.XMLSchemaFactory
+
 import javax.xml.transform.stream.StreamSource
 import org.xml.sax.SAXException
+
 import java.io.File
 
 object ExternalVariablesValidator {
 
   final val extVarXsd = {
-    val stream = this.getClass().getResourceAsStream("/xsd/dafext.xsd")
+    val uri = Misc.getRequiredResource("org/apache/daffodil/xsd/dafext.xsd")
+    val stream = uri.toURL.openStream()
     stream
   }
 
   def validate(xmlFile: File): Either[java.lang.Throwable, _] = {
     try {
-      val factory = new org.apache.xerces.jaxp.validation.XMLSchemaFactory()
+      val factory = new XMLSchemaFactory()
+      factory.setResourceResolver(DFDLCatalogResolver.get)
       val schema = factory.newSchema(new StreamSource(extVarXsd))
       val validator = schema.newValidator()
+      validator.setFeature(XMLUtils.XML_DISALLOW_DOCTYPE_FEATURE, true)

Review comment:
       move to XMLUtils.setSecureDefaults(v: Validator) ??

##########
File path: daffodil-lib/src/main/scala/org/apache/daffodil/api/DaffodilSchemaSource.scala
##########
@@ -109,6 +113,27 @@ class URISchemaSource protected (val fileOrResource: URI) extends DaffodilSchema
   }
 }
 
+/**
+ * Convenient for testing. Allows creating XML strings for loading that contain things
+ * not supported by scala XML syntax like DOCTYPE decls.
+ *
+ * @param str - string that is loaded as XML.
+ */
+case class StringSchemaSource(str: String)

Review comment:
       If for testing only, can live in src/test/scala

##########
File path: daffodil-runtime1/src/main/scala/org/apache/daffodil/externalvars/ExternalVariablesLoader.scala
##########
@@ -66,13 +68,15 @@ object ExternalVariablesLoader {
 
   def fileToBindings(file: File): Queue[Binding] = {
     Assert.usage(file ne null)
-    ExternalVariablesValidator.validate(file) match {
+    val validatorResult = ExternalVariablesValidator.validate(file)
+    validatorResult match {
       case Left(ex) => Assert.abort(ex)
       case Right(_) => // Success
     }
     val enc = determineEncoding(file) // The encoding is needed for ConstructingParser
     val input = scala.io.Source.fromURI(file.toURI)(enc)
-    val node = ConstructingParser.fromSource(input, true).document.docElem

Review comment:
       Add comment that this was changed from the Constructing loader because of a unexpected error, and reusing the DaffodlXMLLoader works, no need for a special case loader for external variable bindings. 

##########
File path: daffodil-lib/src/main/scala/org/apache/daffodil/xml/XMLUtils.scala
##########
@@ -450,6 +450,64 @@ object XMLUtils {
   val SAX_NAMESPACES_FEATURE = "http://xml.org/sax/features/namespaces"
   val SAX_NAMESPACE_PREFIXES_FEATURE = "http://xml.org/sax/features/namespace-prefixes"
 
+  /**
+   * Always enable this feature (which disables doctypes).
+   */
+  val XML_DISALLOW_DOCTYPE_FEATURE = "http://apache.org/xml/features/disallow-doctype-decl"
+
+  /**
+   * Always disable this. Might not be necessary if doctypes are disallowed.
+   */
+  val XML_EXTERNAL_PARAMETER_ENTITIES_FEATURE = "http://xml.org/sax/features/external-parameter-entities"
+
+  /**
+   * Always disable this. Might not be necessary if doctypes are disallowed.
+   */
+  val XML_EXTERNAL_GENERAL_ENTITIES_FEATURE = "http://xml.org/sax/features/external-general-entities"
+
+  /**
+   * Sets properties that disable insecure XML reader behaviors.
+   * @param xmlReader - the reader to change feature settings on.
+   */
+  def setSecureDefaults(xmlReader: XMLReader) : Unit = {
+    try {
+      xmlReader.setFeature(XMLUtils.XML_DISALLOW_DOCTYPE_FEATURE, true)
+      xmlReader.setFeature(XMLUtils.XML_EXTERNAL_PARAMETER_ENTITIES_FEATURE, false)

Review comment:
       Add comment that these next 2 restrictions are not strictly speaking necessary, as they are implied by disallowing doctypes. 

##########
File path: daffodil-lib/src/main/scala/org/apache/daffodil/xml/XMLUtils.scala
##########
@@ -450,6 +450,64 @@ object XMLUtils {
   val SAX_NAMESPACES_FEATURE = "http://xml.org/sax/features/namespaces"
   val SAX_NAMESPACE_PREFIXES_FEATURE = "http://xml.org/sax/features/namespace-prefixes"
 
+  /**
+   * Always enable this feature (which disables doctypes).
+   */
+  val XML_DISALLOW_DOCTYPE_FEATURE = "http://apache.org/xml/features/disallow-doctype-decl"
+
+  /**
+   * Always disable this. Might not be necessary if doctypes are disallowed.
+   */
+  val XML_EXTERNAL_PARAMETER_ENTITIES_FEATURE = "http://xml.org/sax/features/external-parameter-entities"
+
+  /**
+   * Always disable this. Might not be necessary if doctypes are disallowed.
+   */
+  val XML_EXTERNAL_GENERAL_ENTITIES_FEATURE = "http://xml.org/sax/features/external-general-entities"
+
+  /**
+   * Sets properties that disable insecure XML reader behaviors.
+   * @param xmlReader - the reader to change feature settings on.
+   */
+  def setSecureDefaults(xmlReader: XMLReader) : Unit = {
+    try {
+      xmlReader.setFeature(XMLUtils.XML_DISALLOW_DOCTYPE_FEATURE, true)
+      xmlReader.setFeature(XMLUtils.XML_EXTERNAL_PARAMETER_ENTITIES_FEATURE, false)
+      xmlReader.setFeature(XMLUtils.XML_EXTERNAL_GENERAL_ENTITIES_FEATURE, false)
+      // System.err.println("SAX XMLReader supports disallow properties: " + xmlReader)
+    } catch {
+      case e: SAXNotRecognizedException => {
+        // System.err.println("xmlReader: " + e.getMessage()) // good place for a breakpoint
+        throw e
+      }
+    }
+  }
+
+//  def setSecureDefaults(schemaFactory: org.apache.xerces.jaxp.validation.XMLSchemaFactory) : Unit = {

Review comment:
       Remove commented code. Add comment above that the disallowing of doctypes works for XMLReader and Validator, but not XMLSchemaFactory. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [daffodil] mbeckerle commented on pull request #519: WIP on doctype restriction

Posted by GitBox <gi...@apache.org>.
mbeckerle commented on pull request #519:
URL: https://github.com/apache/daffodil/pull/519#issuecomment-829553867


   We need tests that use doctype decls and general entities and verify that we reject them. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [daffodil] mbeckerle commented on pull request #519: WIP on doctype restriction

Posted by GitBox <gi...@apache.org>.
mbeckerle commented on pull request #519:
URL: https://github.com/apache/daffodil/pull/519#issuecomment-838686013


   Closing. Will open a new PR for this. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org