You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@daffodil.apache.org by "stevedlawrence (via GitHub)" <gi...@apache.org> on 2023/06/22 18:05:55 UTC

[GitHub] [daffodil] stevedlawrence commented on a diff in pull request #1037: Allows include/import schemaLocations to have relative up path steps.

stevedlawrence commented on code in PR #1037:
URL: https://github.com/apache/daffodil/pull/1037#discussion_r1238857325


##########
daffodil-lib/src/main/scala/org/apache/daffodil/lib/xml/DaffodilXMLLoader.scala:
##########
@@ -250,7 +262,11 @@ class DFDLCatalogResolver private ()
         // use the systemId (which comes from the schemaLocation attribute)
         // and the classpath.
         val baseURI = if (baseURIString == null) None else Some(new URI(baseURIString))
-        val optURI = Misc.getResourceRelativeOption(sysId, baseURI)
+        // try it as a direct classpath resolution first, and if that fails,
+        // try removing all the upward path steps (if any).
+        val optURI = Misc.getResourceRelativeOption(sysId, baseURI).orElse(
+          Misc.getResourceRelativeOption(removeInitialUpwardPathSteps(sysId), baseURI)

Review Comment:
   I wonder if removing all up-steps might be too aggressive and could potentially hide incorrect includes?
   
   One example, say the importing schema was at `jar:file:/path/to/foo.jar!/xsd/main.dfdl.xsd` and it includes a schemaLocation of `../../../../../xsd/include.dfdl.xsd`, with the intention of including `/xsd/include.dfdl.xsd` from a different jar on the classpath. That's too many up-steps, but if `/xsd/include.dfdl.xsd` is on the classpath then it will still work with this change when using Java. But that include will not work when these schemas are extracted out of jars because the up-steps go too far up. This could potentially be confusing if this include worked in one case but not another, and it might be hard to track down the issue of being a bad include because of that?
   
   Another example, say we have`jar:file:/path/to/foo.jar!/xsd/main/main.dfdl.xsd`  and it imports `../common/formats.dfdl.xsd`, so intending to read `/xsd/common/formats.dfdl.xsd` out of the same jar because we never up-stepped out of that jar. If that fails to resolve, with this change we'll then look for `/common/formats.dfdl.xsd` on the classpath, which could potentially exist and include the wrong thing, when it should have been an error.
   
   So I guess what I'm wondering is if we need to be more careful about the default to looking at the root if something fails? If he base is a Jar URI and there are up-steps, then maybe it should be an error if we have too many up-steps? And if there are up-steps but the up-steps don't go out of the jar, then it should be an error and we shouldn't attempt to look anywhere else?
   
   I imagine this could become an enhancement to `getResourceRelativeOption`? For Jar URIs, if up-steps go too far it returns None, if up-steps go exactly to the jar it looks on the classpath (maybe looking at the current jar first?), otherwise it uses the normal relative resource behavior?
   
   In fact, the first thing `getResourceRelativeOption` does is look for the resource on the classpath and completely ignores the context unless tha fails. That feels wrong to me.
   
   Or maybe that extra complication isn't worth the effort for some probably rare edge cases?



##########
daffodil-schematron/src/test/resources/xsd/embedded-1.dfdl.xsd:
##########
@@ -18,7 +18,8 @@
 <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
            xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/">
 
-    <xs:include schemaLocation="org/apache/daffodil/xsd/DFDLGeneralFormat.dfdl.xsd"/>
+    <!-- schemaLocation can have upward relative path step, or can be from a classpath root. -->
+    <xs:include schemaLocation="../org/apache/daffodil/xsd/DFDLGeneralFormat.dfdl.xsd"/>

Review Comment:
   This raises an interesting case. We use this DFDLGeneralFormat include in lots of schemas and the path is wrong for all of them if jars are extracted. Other resolves would expect the `org` directory to be in the same directory as the including schema, which id never the case.
   
   If we go with my above suggestion, then at least everything will be broken for both Daffodil and things not using Daffodils resolver, so we're at least consistent. But that would mean every one of these DFDLGeneralFormat includes needs to be changed to have the right amount of up-steps, which is a pretty-invasive change. I guess we could special case imports starting with `org/apache/dafodil` and create a warning to suggest fixing this includes to use up-steps, with eventual deprecation.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@daffodil.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org