You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@daffodil.apache.org by GitBox <gi...@apache.org> on 2021/02/02 23:25:30 UTC

[GitHub] [incubator-daffodil] stevedlawrence commented on a change in pull request #481: ADD test for "%%" to bypass existing test for "%NL"

stevedlawrence commented on a change in pull request #481:
URL: https://github.com/apache/incubator-daffodil/pull/481#discussion_r569001136



##########
File path: daffodil-lib/src/main/scala/org/apache/daffodil/cookers/EntityReplacer.scala
##########
@@ -565,43 +565,59 @@ trait NoCharClassEntitiesMixin {
   protected def propName: String
 
   /**
-   * Override if set of prohibited class entities is different
+   * The raw string to test is supplied as the first parameter
+   * A list of disallowed character class entities is supplied as the 2nd parameter
+   * This list is added to the regex string below
+   * Ex regex: (?:^|[^%])(?:%%)*(%(?:NL|ES);)
+   * (?:^|[^%]) look for the beginning of the string or a non-% character
+   * (?:%%)* look for an even number of %'s
+   * (%(?:NL|ES);) look for any single % left after previous match along with and NL or ES followed by a ;
+   *
    */
-  protected def noCharClassEntities(raw: String, context: ThrowsSDE): Unit = {
-    context.schemaDefinitionUnless(!raw.contains("%NL;"), "Property dfdl:%s cannot contain %%NL;", propName)
-    context.schemaDefinitionUnless(!raw.contains("%ES;"), "Property dfdl:%s cannot contain %%ES;", propName)
-    context.schemaDefinitionUnless(!raw.contains("%WSP;"), "Property dfdl:%s cannot contain %%WSP;", propName)
-    context.schemaDefinitionUnless(!raw.contains("%WSP+;"), "Property dfdl:%s cannot contain %%WSP+;", propName)
-    context.schemaDefinitionUnless(!raw.contains("%WSP*;"), "Property dfdl:%s cannot contain %%WSP*;", propName)
-  }
-
-  protected def testRaw(raw: String, context: ThrowsSDE): Unit = {
-    noCharClassEntities(raw, context)
+  protected def noCharClassEntities(raw: String, disallowedCharClassEntities: String,
+                                    context: ThrowsSDE): Unit = {
+    // if there is an odd number of '%' continue with the SDE
+    val regexString = "(?:^|[^%])(?:%%)*(%(?:" + disallowedCharClassEntities + ");)"
+    val noCharClassEntitiesRegex = regexString.r
+    val matchedGroups = noCharClassEntitiesRegex.findAllMatchIn(raw).map {_.group(1)}
+    context.schemaDefinitionUnless(
+      matchedGroups.isEmpty,
+      "Property dfdl:%s contains disallowed character classes: %s" ,
+      propName,
+      matchedGroups.mkString(", "))
+  }
+
+  protected def testRaw(raw: String, disallowedCharClassEntities: String, context: ThrowsSDE): Unit = {
+    noCharClassEntities(raw, "NL|ES|WSP|WSP+|WSP*", context)
   }

Review comment:
       I would avoid passing in things to testRaw, that should really only passing in the raw string, and then implementations of testRaw need to have the information for how to perform the raw test provided some other way. I'm also not a huge fan of the parameter being a string of entities separated by pipes, since it sort of implies it's being inserted into a regex. I think a Seq of strings is a little easierto visually scan, and is easier to change it the regex changes in the future.
   
   What if instead we do something to how propName is used in this trait:
   
   ```scala
   trait DisallowedCharClassEntitiesMixin {
     protected def propName: String
     protected def disallowedCharClassEntities: Seq[String]
   
     private lazy val disalllowedRegex = {
       val disallowedCharClassMatch = disallowedCharClassEntities.mkstring("|")
       val regexString = "(?:^|[^%])(?:%%)*(%(?:" + disallowedCharClassMatch + ");)"
       regexString.r
     }
   
     protected def testRaw(raw:String, context: ThrowsSDE): Unit = {
       val matchedGroup = disallowedRegex.finaAllMatchIn(raw).map { _.group(1) }
       conext.schemaDefinitionUnless(...)
     }
   }
   ```
   So this renames the trait to be a bit more descriptive about what it's really doing, since it isn't about "No" entities, but "Disallowed" entities. It also adds adds an abstract def ``disallowedCharClassEntities`` member that things that mix in this trait must implement to say what the disallowed entities are. Using this member, it builds and saves a regex (so it doesn't have to rebuild and compile it every time it's used). And the testRaw function has all the logic for actually testing and creating SDE's. The noCharClassEntities function isn't really needed anymore.
   
   Then things that implement this trait just have to set that one member, e.g.
   
   ```scala
   class StringLiteralNoCharClassEntities
     extends ...
     with DisallowedCharClassEntitiesMixin {
     
     override val disallowedCharClassEntities = Seq("NL", "ES", "WSP", "WSP+", "WSP*")
   
     ...
   }
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org