You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@daffodil.apache.org by "stevedlawrence (via GitHub)" <gi...@apache.org> on 2024/02/20 17:05:40 UTC

Re: [PR] De-personalize Diagnostic File URLs in Compiled Schemas [daffodil]

stevedlawrence commented on code in PR #1065:
URL: https://github.com/apache/daffodil/pull/1065#discussion_r1496045236


##########
daffodil-core/src/main/scala/org/apache/daffodil/core/dsom/DFDLSchemaFile.scala:
##########
@@ -45,16 +45,42 @@ class DFDLSchemaFileLoadErrorHandler(schemaFileLocation: SchemaFileLocation)
   private def loaderErrors = loaderErrors_
   private def loaderWarnings = loaderWarnings_
 
-  private def loaderSDEs: Seq[Diagnostic] = loaderErrors.map {
-    new SchemaDefinitionError(schemaFileLocation, "Error loading schema due to %s", _)
+  private def loaderSDEs: Seq[Diagnostic] = loaderErrors.map { err =>
+    val errMessage = err.getMessage

Review Comment:
   It's worth adding a comment why we are creating a custom SchemaFileLocation here instead of using the on passed in. I believe the reason is because these errors come from Xerces and it knows the details of line/column/file information, but the schemaFileLocation passed into this class only knows the file location. Creating a custom location context based on the Xerces error gives more information.
   
   In fact, should we just remove the `schemaFileLocation` parameter from the constructor? I think that parameter is no longer used and any errors in from Xerces probably do want to use this?



##########
daffodil-cli/src/main/scala/org/apache/daffodil/cli/Main.scala:
##########
@@ -1150,10 +1162,15 @@ class Main(
             }
 
             val infosetType = parseOpts.infosetType.toOption.get
+            val parseOptsSchema = if (parseOpts.schema.isDefined) {
+              Option(parseOpts.schema()._2)
+            } else {
+              None
+            }
             val infosetHandler = InfosetType.getInfosetHandler(
-              parseOpts.infosetType.toOption.get,
+              infosetType,
               processor,
-              parseOpts.schema.toOption,
+              parseOptsSchema,

Review Comment:
   I think you can do something like to get the URI:
   ```scala
   parseOpts.schema.map(_._2).toOption
   ```
   
   Note that we can also change `parseOpts.infosetType.toOption.get` to just `parseOpts.infosetType()`. In fact, if we do that, I suggest we just remove the `infosetType` variable--I don't think it really adds anything over `parseOpts.infosetType()` is only used in that one spot.
   
   Another thought, it might be useful to create a case class like this
   
   ```scala
   case class SchemaOption(raw: String, uri: URI)
   ```
   
   And then have the value converter return `SchmaOption(s, uri)` instead of a tuple.
   
   Then the we can use `.raw` or `.uri` to access what we want instead of `._1` or `._2`, some examples:
   ```scala
   parseOpts.schema.map(_.uri).toOption
   ```
   
   This CLI code is so spread out that it's hard to remember what the values of tuples represent, this makes that a little easier.



##########
daffodil-core/src/main/scala/org/apache/daffodil/core/dsom/DFDLSchemaFile.scala:
##########
@@ -45,16 +45,42 @@ class DFDLSchemaFileLoadErrorHandler(schemaFileLocation: SchemaFileLocation)
   private def loaderErrors = loaderErrors_
   private def loaderWarnings = loaderWarnings_
 
-  private def loaderSDEs: Seq[Diagnostic] = loaderErrors.map {
-    new SchemaDefinitionError(schemaFileLocation, "Error loading schema due to %s", _)
+  private def loaderSDEs: Seq[Diagnostic] = loaderErrors.map { err =>
+    val errMessage = err.getMessage
+    val lineNumber = err.getLineNumber
+    val columnNumber = err.getColumnNumber
+    val newSchemaLocation = SchemaFileLocation(
+      Option(lineNumber.toString),
+      Option(columnNumber.toString),
+      schemaFileLocation.uriString,
+      schemaFileLocation.diagnosticFilepath,
+      schemaFileLocation.toString,
+      "Schema File",

Review Comment:
   I did a test to see what this looks like and it's something like this:
   
   > Schema context: Schema File Location line 32 column 74 in daffodil-sapi/src/test/resources/test/sapi/mySchema6.dfdl.xsd
   
   I wonder if `Schema File` doesn't really add anything and can be removed? Normally this would be something like `Schema context: ex:someElement Location ...` where `ex:someElement` is helpful information, but we don't have access to that when errors come from Xerces--we just have to rely on Xerces include that in the error message.



##########
daffodil-core/src/main/scala/org/apache/daffodil/core/dsom/IIBase.scala:
##########
@@ -203,7 +203,13 @@ abstract class IIBase(
             relToAbs,
             s"Resolving relative schemaLocations absolutely is deprecated. Did you mean /$slText",
           )
-          URISchemaSource(uri)
+          val slTextUri = new URI(slText)
+          val schemaPath = if (slTextUri.isAbsolute) {
+            this.xsdArg.diagnosticFilepath

Review Comment:
   Isn't `this.xsdArg` the importing file, not the file being imported?
   
   Also, something to consider is `URI.isAbsolute` basically just tests if the URI has a scheme, it's different from the concept of "absolute" and "relative" paths. Very few schemaLocationProperties are absolute in the URI sense, and so this is just going to use the other branch.



##########
daffodil-core/src/main/scala/org/apache/daffodil/core/dsom/SchemaSet.scala:
##########
@@ -139,6 +139,8 @@ final class SchemaSet private (
    */
   override lazy val uriString: String = schemaSource.uriForLoading.toString
 
+  override lazy val diagnosticFilepath: String = schemaSource.diagnosticFilepath

Review Comment:
   I'm not sure I understand this since a schemaSet is a collection of schemas, I thought. But there is a uriString above, so I'm not sure why a SchemaSet would have a uriStrig either. I guess it's not clear to me what a SchemaSet represents.



##########
daffodil-lib/src/main/scala/org/apache/daffodil/lib/api/DaffodilSchemaSource.scala:
##########
@@ -167,7 +178,7 @@ protected sealed abstract class NodeSchemaSourceBase(
 ) extends URISchemaSource({
     val tempSchemaFile = XMLUtils.convertNodeToTempFile(node, tmpDir.orNull, nameHint)
     val tempURI = tempSchemaFile.toURI
-    tempURI
+    (Misc.stripJarPrefix(tempURI.getPath), tempURI)

Review Comment:
   Similar, NodeSchemaSources are only used for tests, so this might not matter. Using the tempURI.getPath is probably fine for these.



##########
daffodil-core/src/main/scala/org/apache/daffodil/core/compiler/Compiler.scala:
##########
@@ -325,7 +325,7 @@ class Compiler private (
     optRootName: Option[String] = None,
     optRootNamespace: Option[String] = None,
   ): ProcessorFactory = {
-    val source = URISchemaSource(file.toURI)
+    val source = URISchemaSource((file.getPath, file.toURI))

Review Comment:
   Thoughts on using separate parameters instead of a tuple for the URISchemaSource constructor?



##########
daffodil-lib/src/main/scala/org/apache/daffodil/lib/exceptions/SchemaFileLocatable.scala:
##########
@@ -35,24 +32,42 @@ trait HasSchemaFileLocation extends LookupLocation {
 }
 
 object SchemaFileLocation {
-  def apply(context: SchemaFileLocatable, tunables: DaffodilTunables) =
+  def apply(context: SchemaFileLocatable) =
     new SchemaFileLocation(
       context.lineNumber,
       context.columnNumber,
       context.uriString,
+      context.diagnosticFilepath,
       context.toString,
       context.diagnosticDebugName,
-      tunables.maxParentDirectoriesForDiagnostics,
     )
+
+  def apply(
+    lineNumber: Option[String],
+    columnNumber: Option[String],
+    uriString: String,
+    diagnosticFilepath: String,
+    contextToString: String,
+    diagnosticDebugName: String,
+  ) = {
+    new SchemaFileLocation(
+      lineNumber,
+      columnNumber,
+      uriString,
+      diagnosticFilepath,
+      contextToString,
+      diagnosticDebugName,
+    )
+  }

Review Comment:
   This essentially just makes the `SchemaFileLocation` constructor public. If we wanted to keep it that way we could create a custom `XercesSchmeaFileLocatable`. Then the places where we use this new constructor, we instead do something like
   ```scala
   val xsfl = new XerexSchemaFileLocatable(xercesError, ....)
   val schemaLocation = SchemaLocation(xsfl)
   new SchemaDefinitionError(schemaLocation, ...)
   ```
   ``` 



##########
daffodil-core/src/main/scala/org/apache/daffodil/core/dsom/Import.scala:
##########
@@ -101,7 +101,8 @@ final class Import(importNode: Node, xsd: XMLSchemaDocument, seenArg: IIMap)
         val uri = resolver.resolveURI(ns.toString)
         if (uri == null) None
         else {
-          val res = URISchemaSource(URI.create(uri))
+          val res =
+            URISchemaSource((schemaSet.schemaSource.diagnosticFilepath, URI.create(uri)))

Review Comment:
   Is using `schemaSet.schemaSource` correct? I think `schemaSet` contains references to all included schemas, and `schemaSet.schemaSource` is the root schema. The diagnostic for this imported schema should not be the root schema.
   
   That said, I'm not sure exactly what to use here. In this case, the URI is found using the "namespace" which is pretty different from how `schemaLocation` is resolved. Might need some more thought put into this part... 



##########
daffodil-lib/src/main/scala/org/apache/daffodil/lib/api/DaffodilSchemaSource.scala:
##########
@@ -158,6 +167,8 @@ class InputStreamSchemaSource(
     inSrc
   }
   override def uriForLoading = tempURI
+
+  override def diagnosticFilepath: String = Misc.stripJarPrefix(tempURI.getPath)

Review Comment:
   The tempURI will never be in a jar. I think the diagnostic file path for an InputStreamSchemaSource can just be the full URI path. I believe this is only used for tests so the diganostic probably doesn't matter that much.
   
   Alternativeyl, we could allow users to pass in a diagnostic path, but I'm not sure it matters.



##########
daffodil-core/src/main/scala/org/apache/daffodil/core/dsom/DFDLSchemaFile.scala:
##########
@@ -137,9 +163,16 @@ final class DFDLSchemaFile(
 
   override lazy val optXMLSchemaDocument = iiParent.optXMLSchemaDocument
 
-  override lazy val uriString = schemaSource.uriForLoading.toString
+  override lazy val uriString = {
+    val uriString_ = schemaSource.uriForLoading.toString
+    uriString_
+  }

Review Comment:
   Was this added for debugging?



##########
daffodil-cli/src/test/scala/org/apache/daffodil/cli/cliTest/TestCLISaveParser.scala:
##########
@@ -318,4 +318,21 @@ class TestCLISaveParser {
     }
   }
 
+  /**
+   * Attempted to save-parser an invalid schema so we can check the diagnostic error for leaked information
+   */
+  @Test def test_CLI_Saving_SaveParser_error(): Unit = {
+    val schema = path(
+      "daffodil-sapi/src/test/resources/test/sapi/mySchema6.dfdl.xsd",
+    )
+
+    withTempFile { parser =>
+      runCLI(args"save-parser -s $schema $parser") { cli =>
+        cli.expectErr("[error]")
+        cli.expectErr("Schema context")
+        cli.expectErr("daffodil-sapi")

Review Comment:
   Can we expect the full `Schema Context` line since this test is about making sure we aren't outputting absolute paths? For example, something like this:
   ```scala
   cli.expectErr("Schema context: ... daffodil-sapi/src/test/resources/test/sapi/mySchema6.dfdl.xsd")
   ```
   
   This way if we something ever changes and we output `file://some/absolute/path/daffodil-sapi/src/test/....` this test will fail and catch it? Right now this test would still pass even if that happened.



##########
daffodil-core/src/main/scala/org/apache/daffodil/core/dsom/SchemaDocIncludesAndImportsMixin.scala:
##########
@@ -144,6 +144,12 @@ trait SchemaDocIncludesAndImportsMixin { self: XMLSchemaDocument =>
     this.uriStringFromAttribute.getOrElse("file:unknown")
   }
 
+  override lazy val diagnosticFilepath = if (self.schemaFile.isDefined) {
+    self.schemaFile.get.diagnosticFilepath
+  } else {
+    self.schemaSet.diagnosticFilepath

Review Comment:
   Mentioned elsewhere, but are we sure this is correct? Isn't this just the filepath of the root schema? I think getting information from the schemaSet might not be correct? I'm not even sure the schemaSet should have a diagnosticFilePath since it's a set of schemas?



##########
daffodil-core/src/main/scala/org/apache/daffodil/core/dsom/DFDLSchemaFile.scala:
##########
@@ -137,9 +163,16 @@ final class DFDLSchemaFile(
 
   override lazy val optXMLSchemaDocument = iiParent.optXMLSchemaDocument
 
-  override lazy val uriString = schemaSource.uriForLoading.toString
+  override lazy val uriString = {
+    val uriString_ = schemaSource.uriForLoading.toString
+    uriString_
+  }
+
+  override def diagnosticFilepath = schemaSource.diagnosticFilepath
 
-  override protected lazy val diagnosticDebugNameImpl = schemaSource.uriForLoading.toString
+  override protected lazy val diagnosticDebugNameImpl = {
+    sset.schemaSource.diagnosticFilepath

Review Comment:
   Can this just be `lazy val diagnosticDebugNameImpl = diagnosticFilePath`



##########
daffodil-core/src/main/scala/org/apache/daffodil/core/dsom/IIBase.scala:
##########
@@ -203,7 +203,13 @@ abstract class IIBase(
             relToAbs,
             s"Resolving relative schemaLocations absolutely is deprecated. Did you mean /$slText",
           )
-          URISchemaSource(uri)
+          val slTextUri = new URI(slText)
+          val schemaPath = if (slTextUri.isAbsolute) {
+            this.xsdArg.diagnosticFilepath
+          } else {
+            Misc.stripJarPrefix(slTextUri.getPath)

Review Comment:
   Looking at stripJarPrefix, this just gets everything after `.jar!` in the URI. So if slTextURI isn't found a jar (e.g. schemaLocation is relative path and the enclossingSchema is a file, then it's going to be a full path, which might not be what we want.
   
   I'm not sure what the right solution is, but this feels like we might be missing some edge cases.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@daffodil.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org