You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@daffodil.apache.org by GitBox <gi...@apache.org> on 2021/04/01 03:47:48 UTC

[GitHub] [daffodil] jw3 opened a new pull request #520: Preserve raw validation output

jw3 opened a new pull request #520:
URL: https://github.com/apache/daffodil/pull/520


   The Validator API provides structured results in a form fitting translation to Daffodil diagnostics, but discards the original data in the native validator format.  For some applications data in this native format is a requirement and needs preserved by writing to disk in a predictable way.
   
   The changes here provide a means of passing the raw validation data through the API and writing it to disk from the CLI.
   
   DAFFODIL- 2482
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [daffodil] jw3 commented on pull request #520: Preserve raw validation output

Posted by GitBox <gi...@apache.org>.
jw3 commented on pull request #520:
URL: https://github.com/apache/daffodil/pull/520#issuecomment-814316321


   Windows tests fixed


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [daffodil] jw3 commented on a change in pull request #520: Preserve raw validation output

Posted by GitBox <gi...@apache.org>.
jw3 commented on a change in pull request #520:
URL: https://github.com/apache/daffodil/pull/520#discussion_r608160212



##########
File path: daffodil-schematron/src/main/scala/org/apache/daffodil/validation/schematron/SchematronResult.scala
##########
@@ -0,0 +1,45 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.daffodil.validation.schematron
+
+import org.apache.daffodil.api.RawValidationResult
+import org.apache.daffodil.api.ValidationFailure
+import org.apache.daffodil.api.ValidationResult
+import org.apache.daffodil.api.ValidationWarning
+
+import java.util
+
+/**
+ * Captures the output of Schematron validation as a Daffodil ValidationResult
+ * @param warnings Schematron warnings parsed into ValidationWarning objects
+ * @param errors Schematron errors parsed into ValidationFailure objects
+ * @param svrl Full SVRL output
+ */
+case class SchematronResult(warnings: util.Collection[ValidationWarning],
+                            errors: util.Collection[ValidationFailure],
+                            svrl: String) extends ValidationResult with RawValidationResult {
+
+  def rawValidationData(): Array[Byte] = svrl.getBytes
+}

Review comment:
       > You can see example implementations of this in the new CLI tests in this PR.
   
   The examples were in the JAPI and SAPI, but gone now.  But its just pattern matching on the result type without using the raw trait, nothing fancy




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [daffodil] stevedlawrence commented on a change in pull request #520: Preserve raw validation output

Posted by GitBox <gi...@apache.org>.
stevedlawrence commented on a change in pull request #520:
URL: https://github.com/apache/daffodil/pull/520#discussion_r608058402



##########
File path: daffodil-cli/src/main/scala/org/apache/daffodil/Main.scala
##########
@@ -347,6 +351,7 @@ class CLIConf(arguments: Array[String]) extends scallop.ScallopConf(arguments)
     val parser = opt[File](short = 'P', argName = "file", descr = "use a previously saved parser.")
     val output = opt[String](argName = "file", descr = "write output to a given file. If not given or is -, output is written to stdout.")
     val validate: ScallopOption[ValidationMode.Type] = opt[ValidationMode.Type](short = 'V', default = Some(ValidationMode.Off), argName = "mode", descr = "the validation mode. 'on', 'limited', 'off', or a validator plugin name.")
+    val valOutput = opt[File]("validate_output", descr = "path to file to write raw validation output.")

Review comment:
       I would reccomend changing the name to ``val validateOutput`` and removing the "validate_output" string. Scallop will then autogenerate a the long option that is consistent with others (i.e. ``--validate-output``).
   
   Also worth adding ``argName="file"`` so that the autogenerated --help output includes ``<file>`` instead of the default of ``<arg>``.
   
   Also, it looks like scallop is defaulting to a short option of -v. I'm concered that that could be confusing, since that usually means verbose. Maybe we should just disable the short option by adding "noshort" as a parameter?

##########
File path: daffodil-cli/src/main/scala/org/apache/daffodil/Main.scala
##########
@@ -888,6 +899,23 @@ object Main extends Logging {
               val loc = parseResult.resultState.currentLocation.asInstanceOf[DataLoc]
               displayDiagnostics(parseResult)
 
+              // allow raw validation output
+              if(!parseResult.isProcessingError) {

Review comment:
       This will get the validation result even if the --validate-output flag isn't set. Is is possible that validationResult() could do a decent amount of work that should be avoided if valOutput is not set?

##########
File path: daffodil-cli/src/main/scala/org/apache/daffodil/Main.scala
##########
@@ -888,6 +899,23 @@ object Main extends Logging {
               val loc = parseResult.resultState.currentLocation.asInstanceOf[DataLoc]
               displayDiagnostics(parseResult)
 
+              // allow raw validation output
+              if(!parseResult.isProcessingError) {
+                (parseOpts.valOutput.toOption, parseResult.validationResult()) match {
+                  case (Some(of), Some(vr: RawValidationResult)) =>
+                    Try(new FileOutputStream(of)) match {
+                      case Success(os) =>
+                        os.write(vr.rawValidationData())
+                        os.close()
+                      case Failure(ex) =>
+                        log(LogLevel.Error, "Failure writing raw validation data to file.", ex)

Review comment:
       This doesn't change the exit code. I wonder if this should just be a warning? Or if it should stay an error but make sure there is a non-zero exit code?

##########
File path: daffodil-runtime1/src/main/scala/org/apache/daffodil/processors/DataProcessor.scala
##########
@@ -685,6 +685,8 @@ class ParseResult(dp: DataProcessor, override val resultState: PState)
   extends DFDL.ParseResult
   with WithDiagnosticsImpl {
 
+  private var valres: Option[ValidationResult] = None

Review comment:
       Rather than making this a var, I wonder if we should instead do this validation prior to creating the ParseResult, and then our ParseResult become something like:
   ```scala
   class ParseResult(
     dp: DataProcessor,
     override val resultState: PState,
     val validationResult: Option[ValidationResult])
   ```
   Avoids the mutable state

##########
File path: daffodil-runtime1/src/main/scala/org/apache/daffodil/processors/DataProcessor.scala
##########
@@ -685,6 +685,8 @@ class ParseResult(dp: DataProcessor, override val resultState: PState)
   extends DFDL.ParseResult
   with WithDiagnosticsImpl {
 
+  private var valres: Option[ValidationResult] = None

Review comment:
       Also, I notice this doesn't update the Scala or Java API's. Is the intention to keep this a sort of internal/experimental API until all the new validation stuff is run through its paces?

##########
File path: daffodil-cli/src/main/scala/org/apache/daffodil/Main.scala
##########
@@ -328,7 +332,7 @@ class CLIConf(arguments: Array[String]) extends scallop.ScallopConf(arguments)
   object parse extends scallop.Subcommand("parse") {
     banner("""|Usage: daffodil parse (-s <schema> [-r [{namespace}]<root>] [-p <path>] |
               |                       -P <parser>)
-              |                      [--validate [mode]]
+              |                      [--validate [mode]] [--validate_output [file]]

Review comment:
       The square brackets around file generaly mean that the argument is optional, whereas angle brackets usually means the arg is required. In this case it's required, so should be angle brackets.

##########
File path: daffodil-schematron/src/main/scala/org/apache/daffodil/validation/schematron/SchematronResult.scala
##########
@@ -0,0 +1,45 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.daffodil.validation.schematron
+
+import org.apache.daffodil.api.RawValidationResult
+import org.apache.daffodil.api.ValidationFailure
+import org.apache.daffodil.api.ValidationResult
+import org.apache.daffodil.api.ValidationWarning
+
+import java.util
+
+/**
+ * Captures the output of Schematron validation as a Daffodil ValidationResult
+ * @param warnings Schematron warnings parsed into ValidationWarning objects
+ * @param errors Schematron errors parsed into ValidationFailure objects
+ * @param svrl Full SVRL output
+ */
+case class SchematronResult(warnings: util.Collection[ValidationWarning],
+                            errors: util.Collection[ValidationFailure],
+                            svrl: String) extends ValidationResult with RawValidationResult {
+
+  def rawValidationData(): Array[Byte] = svrl.getBytes
+}

Review comment:
       Is there a big advantage to having a special RawValidationResult trait? The CLI can easily just match/case the ValidationResult to a SchematronResult and get what it needs? I guess my concern is that that a user of this schemtron validator might want access to the Svrl String, or maybe even the svrl XML Document, but instead we're converting it to bytes just to be generic and forcing them to do something with it. But being generic doesn't seem to gain much since the user still needs to know what it is to be able to do anything with it.
   
   So I guess what's the advantage of this bytes thing over something like this in the CLI:
   
   ```scala
   validateResult match {
     case sr: SchematronResult => os.write(sr.svrl.toString)
     case _ => // noop, no additional information
   }
   ```
   This way if someone wants direct access to the svrl document, they can get it without any extra overhead of converting those generic bytes back to the original data. But if they just want to output it to a file (like the Daffodil CLI does), then we can do so with a pretty easy cast and toString.
   
   

##########
File path: daffodil-cli/src/main/scala/org/apache/daffodil/Main.scala
##########
@@ -356,6 +361,7 @@ class CLIConf(arguments: Array[String]) extends scallop.ScallopConf(arguments)
 
     requireOne(schema, parser) // must have one of --schema or --parser
     conflicts(parser, List(rootNS)) // if --parser is provided, cannot also provide --root
+    conflicts(stream, List(valOutput))
     validateFileIsFile(config) // --config must be a file that exists

Review comment:
       Do we need a coconstraint that --validate-output requires --validation be enabled? Something like
   ```scala
     dependsOnAll(valOutput, List(validate))
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [daffodil] jw3 commented on a change in pull request #520: Preserve raw validation output

Posted by GitBox <gi...@apache.org>.
jw3 commented on a change in pull request #520:
URL: https://github.com/apache/daffodil/pull/520#discussion_r608127858



##########
File path: daffodil-runtime1/src/main/scala/org/apache/daffodil/processors/DataProcessor.scala
##########
@@ -685,6 +685,8 @@ class ParseResult(dp: DataProcessor, override val resultState: PState)
   extends DFDL.ParseResult
   with WithDiagnosticsImpl {
 
+  private var valres: Option[ValidationResult] = None

Review comment:
       Yeah, Id thought about that sort of change too, not a fan of the var, but it seemed like it would grow the scope of this PR more than I liked so avoided it.
   
   Will take a quick look to see how much it would involve to do that refactor.  I wouldnt mind doing it.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [daffodil] mbeckerle commented on pull request #520: Preserve raw validation output

Posted by GitBox <gi...@apache.org>.
mbeckerle commented on pull request #520:
URL: https://github.com/apache/daffodil/pull/520#issuecomment-823289581


   Only this closing comment. You've identified some things maybe refactorable to improve. It is fine to park those at this point. Lots of Daffodil can be improved by refactoring in various ways. So I wouldn't delay getting this useful feature into 3.1.0 over that. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [daffodil] stevedlawrence commented on a change in pull request #520: Preserve raw validation output

Posted by GitBox <gi...@apache.org>.
stevedlawrence commented on a change in pull request #520:
URL: https://github.com/apache/daffodil/pull/520#discussion_r608153349



##########
File path: daffodil-japi/src/main/scala/org/apache/daffodil/japi/Daffodil.scala
##########
@@ -866,6 +867,12 @@ class ParseResult private[japi] (pr: SParseResult, deprecatedOutput: Maybe[JDOMI
    * @return true if any validation errors occurred, false otherwise.
    */
   def isValidationError(): Boolean = pr.isValidationError
+
+  /**
+   * Optional result of the infoset validation.
+   * @return optional ValidationResult
+   */
+  def validationResult(): Option[ValidationResult] = pr.validationResult()

Review comment:
       Ah, I somehow missed the sapi/japi changes. Two concerns here though:
   
   1. ``Option`` is a Scala thing that isn't familiar to Java users so should be avoided in the JAPI. This API should follow Java conventions, which I guess means return null if there is no ValidationResult?
   
   2. The ValidationResult that is returned is in the org.apache.daffodil.api package. But the Java/Scala doc that we generate only comes from the org.apache.daffodil.sapi/japi packages. This means no documentation will be generated for the ValidationResult, so users wont' really know how to use it. The way we have been maintaining the Java and Scala API is to create wrappers in sapi/japi that are essentially proxies for classes outside of those packages. So there would be a japi.ValidationResult and sapi.ValidationResult which wrap the api.ValidationResult, but have java/scaladoc that the user will see. It's kindof terrible, and we have tickets to fix this so there's only a single API and not all these proxy classes, but it's the standard for what we have right now. It's does also have a nice benefit in that it makes it very clear what is public API and what isn't (i.e. if it's in the org.apache.daffodil.sapi/japi then it's public), and clearly defines when we need to be concerned w
 ith backwards compatibility.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [daffodil] jw3 commented on a change in pull request #520: Preserve raw validation output

Posted by GitBox <gi...@apache.org>.
jw3 commented on a change in pull request #520:
URL: https://github.com/apache/daffodil/pull/520#discussion_r614752116



##########
File path: daffodil-lib/src/main/scala/org/apache/daffodil/api/Validator.scala
##########
@@ -54,20 +55,31 @@ trait ValidatorFactory {
 
 /**
  * Results of a validation execution
- * @param warnings [[org.apache.daffodil.api.ValidationWarning]] objects
- * @param errors [[org.apache.daffodil.api.ValidationFailure]] objects
  */
-final case class ValidationResult(warnings: java.util.Collection[ValidationWarning], errors: java.util.Collection[ValidationFailure])
+trait ValidationResult {
+  def warnings(): java.util.Collection[ValidationWarning]
+  def errors(): java.util.Collection[ValidationFailure]
+}
+
+/**
+ * Provider of raw validation output
+ */
+trait RawValidationResult {
+  def rawValidationData(): Array[Byte]
+}
 
 object ValidationResult {
   /**
    * an empty [[org.apache.daffodil.api.ValidationResult]]
    */
   val empty: ValidationResult = ValidationResult(Seq.empty, Seq.empty)
 
-  def apply(warnings: Seq[ValidationWarning], errors: Seq[ValidationFailure]): ValidationResult = {
+  def apply(w: Seq[ValidationWarning], e: Seq[ValidationFailure]): ValidationResult = {
     import scala.collection.JavaConverters.asJavaCollectionConverter
-    ValidationResult(warnings.asJavaCollection, errors.asJavaCollection)
+    new ValidationResult{
+      val warnings: java.util.Collection[ValidationWarning] = w.asJavaCollection
+      val errors: java.util.Collection[ValidationFailure] = e.asJavaCollection
+    }

Review comment:
       I did a refactor on this and it looks good, but grows pretty large considering the changes needed to keep the internal API object out of the external APIs.
   
   Going to follow up with a ticket like "Refactor Validation API to remove Java types"




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [daffodil] jw3 commented on pull request #520: Preserve raw validation output

Posted by GitBox <gi...@apache.org>.
jw3 commented on pull request #520:
URL: https://github.com/apache/daffodil/pull/520#issuecomment-822893572


   Any closing comments?  The summary is this is using some things that need a bit of refactoring (CLI config passing), but managed to avoid adding anything new that would also need refactored.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [daffodil] jw3 commented on a change in pull request #520: Preserve raw validation output

Posted by GitBox <gi...@apache.org>.
jw3 commented on a change in pull request #520:
URL: https://github.com/apache/daffodil/pull/520#discussion_r608603018



##########
File path: daffodil-japi/src/main/scala/org/apache/daffodil/japi/Daffodil.scala
##########
@@ -866,6 +867,12 @@ class ParseResult private[japi] (pr: SParseResult, deprecatedOutput: Maybe[JDOMI
    * @return true if any validation errors occurred, false otherwise.
    */
   def isValidationError(): Boolean = pr.isValidationError
+
+  /**
+   * Optional result of the infoset validation.
+   * @return optional ValidationResult
+   */
+  def validationResult(): Option[ValidationResult] = pr.validationResult()

Review comment:
       > Personally, I would like to see more use-cases of this rawValidation data before we make it part of the public API.
   
   Concur.
   
   > like maybe we want alternative syntax like --validate=foo.sch:output=details.svrl
   
   Ive went down this road, there was an implementation in the original validator refactorings that support syntax similar to this.
   
   Question is/was, does it scale?  I say not beyond a low number of shortish parameters.
   
   Still valuable to input arbitrary configuration to validators, so there is [support for this by passing a config file](https://github.com/apache/daffodil/blob/master/daffodil-cli/src/main/scala/org/apache/daffodil/Main.scala#L233) that was added in the validator work.
   
   But Id say that doesnt help here, because managing the output files down in the validator creates issues with reusing the validators, which is important part of that design.  So the data has to get up out of the validators to be written.
   
   > Thoughts?
   
   The raw stream wasnt the first choice, but its not so unappealing if you squint.  Conceptually there are two streams of data that come out of validators (1) parsed structured data by way of validation result (2) optional raw source data that the structured data was parsed from.
   
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [daffodil] jw3 commented on a change in pull request #520: Preserve raw validation output

Posted by GitBox <gi...@apache.org>.
jw3 commented on a change in pull request #520:
URL: https://github.com/apache/daffodil/pull/520#discussion_r608128224



##########
File path: daffodil-runtime1/src/main/scala/org/apache/daffodil/processors/DataProcessor.scala
##########
@@ -685,6 +685,8 @@ class ParseResult(dp: DataProcessor, override val resultState: PState)
   extends DFDL.ParseResult
   with WithDiagnosticsImpl {
 
+  private var valres: Option[ValidationResult] = None

Review comment:
       The Scala and Java APIs got updates to the ParseResult to expose the validation result.  What other updates were you looking for?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [daffodil] jw3 edited a comment on pull request #520: Preserve raw validation output

Posted by GitBox <gi...@apache.org>.
jw3 edited a comment on pull request #520:
URL: https://github.com/apache/daffodil/pull/520#issuecomment-821325697


   Updated PR to push most of the implementation details down into the Schematron validator.
   
   Got rid of the raw stream concept, instead using the existing ability to pass in a config file to get the SVRL output path into the validator.  Sending SVRL write errors back up as diagnostics.
   
   Much less surface area this way, but the config passing approach is not obvious.  I think it will do for now, can be refined later by [refactoring the validator CLI arguments](https://github.com/apache/daffodil/pull/520#discussion_r608633044) to further hide some of these details, will add a ticket and follow up in another PR.
   
   Also the r[efactoring of the java collections usage in the Validator API](https://github.com/apache/daffodil/pull/520#discussion_r609815124) will be followed up on.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [daffodil] jw3 edited a comment on pull request #520: Preserve raw validation output

Posted by GitBox <gi...@apache.org>.
jw3 edited a comment on pull request #520:
URL: https://github.com/apache/daffodil/pull/520#issuecomment-821583927


   I have to go to a windows machine and see what is up with those failures.  They look familiar to issues with windows paths when nested in a command line I encountered in a previous PR, but applying the change from memory is not working and CI is too long of a loop.  I will get to a machine or fire up a VM this weekend and get it fixed...


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [daffodil] jw3 commented on pull request #520: Preserve raw validation output

Posted by GitBox <gi...@apache.org>.
jw3 commented on pull request #520:
URL: https://github.com/apache/daffodil/pull/520#issuecomment-823326873


   Sounds good.  Will merge.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [daffodil] jw3 edited a comment on pull request #520: Preserve raw validation output

Posted by GitBox <gi...@apache.org>.
jw3 edited a comment on pull request #520:
URL: https://github.com/apache/daffodil/pull/520#issuecomment-821325697


   Updated PR to push most of the implementation details down into the Schematron validator.
   
   Got rid of the raw stream concept, instead using the existing ability to pass in a config file to get the SVRL output path into the validator.  Sending SVRL write errors back up as diagnostics.
   
   Much less surface area this way, but the config passing approach is not obvious.  I think it will do for now, can be refined later by [refactoring the validator CLI arguments](https://github.com/apache/daffodil/pull/520#discussion_r608633044) to further hide some of these details, will add a ticket and follow up in another PR.
   
   Also the [refactoring of the java collections usage in the Validator API](https://github.com/apache/daffodil/pull/520#discussion_r609815124) will be followed up on.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [daffodil] jw3 commented on a change in pull request #520: Preserve raw validation output

Posted by GitBox <gi...@apache.org>.
jw3 commented on a change in pull request #520:
URL: https://github.com/apache/daffodil/pull/520#discussion_r610071239



##########
File path: daffodil-schematron/src/main/scala/org/apache/daffodil/validation/schematron/SchematronResult.scala
##########
@@ -0,0 +1,45 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.daffodil.validation.schematron
+
+import org.apache.daffodil.api.RawValidationResult
+import org.apache.daffodil.api.ValidationFailure
+import org.apache.daffodil.api.ValidationResult
+import org.apache.daffodil.api.ValidationWarning
+
+import java.util
+
+/**
+ * Captures the output of Schematron validation as a Daffodil ValidationResult
+ * @param warnings Schematron warnings parsed into ValidationWarning objects
+ * @param errors Schematron errors parsed into ValidationFailure objects
+ * @param svrl Full SVRL output
+ */
+case class SchematronResult(warnings: util.Collection[ValidationWarning],
+                            errors: util.Collection[ValidationFailure],
+                            svrl: String) extends ValidationResult with RawValidationResult {
+
+  def rawValidationData(): Array[Byte] = svrl.getBytes
+}

Review comment:
       @tuxji, right on :+1: 
   
   > there would be nothing to stop someone from doing a match on ValidationResult to get a case Some(vr: MyCustomValidationResult 
   
   The only thing stopping you would be having a dependency to the artifact that contains MyCustomValidationResult.  Which in the case of the SchematronResult and the CLI there is no dependency there, but the CLI only needs the API level raw type.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [daffodil] stevedlawrence commented on pull request #520: Preserve raw validation output

Posted by GitBox <gi...@apache.org>.
stevedlawrence commented on pull request #520:
URL: https://github.com/apache/daffodil/pull/520#issuecomment-823332651


   +1 from me. Looks good! :+1: 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [daffodil] jw3 edited a comment on pull request #520: Preserve raw validation output

Posted by GitBox <gi...@apache.org>.
jw3 edited a comment on pull request #520:
URL: https://github.com/apache/daffodil/pull/520#issuecomment-814077996


   Need one update yet to actually test the output in the cli unit test...
   
   ---
   
   Windows paths in new tests broke, will fix...
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [daffodil] jw3 commented on a change in pull request #520: Preserve raw validation output

Posted by GitBox <gi...@apache.org>.
jw3 commented on a change in pull request #520:
URL: https://github.com/apache/daffodil/pull/520#discussion_r608106547



##########
File path: daffodil-cli/src/main/scala/org/apache/daffodil/Main.scala
##########
@@ -328,7 +332,7 @@ class CLIConf(arguments: Array[String]) extends scallop.ScallopConf(arguments)
   object parse extends scallop.Subcommand("parse") {
     banner("""|Usage: daffodil parse (-s <schema> [-r [{namespace}]<root>] [-p <path>] |
               |                       -P <parser>)
-              |                      [--validate [mode]]
+              |                      [--validate [mode]] [--validate_output [file]]

Review comment:
       :+1: 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [daffodil] jw3 edited a comment on pull request #520: Preserve raw validation output

Posted by GitBox <gi...@apache.org>.
jw3 edited a comment on pull request #520:
URL: https://github.com/apache/daffodil/pull/520#issuecomment-822893572


   Any closing comments?  The summary is this is using some things that need a bit of refactoring (CLI config passing), but managed to avoid exposing anything new that would also need refactored.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [daffodil] jw3 commented on a change in pull request #520: Preserve raw validation output

Posted by GitBox <gi...@apache.org>.
jw3 commented on a change in pull request #520:
URL: https://github.com/apache/daffodil/pull/520#discussion_r608140948



##########
File path: daffodil-cli/src/main/scala/org/apache/daffodil/Main.scala
##########
@@ -888,6 +899,23 @@ object Main extends Logging {
               val loc = parseResult.resultState.currentLocation.asInstanceOf[DataLoc]
               displayDiagnostics(parseResult)
 
+              // allow raw validation output
+              if(!parseResult.isProcessingError) {
+                (parseOpts.valOutput.toOption, parseResult.validationResult()) match {
+                  case (Some(of), Some(vr: RawValidationResult)) =>
+                    Try(new FileOutputStream(of)) match {
+                      case Success(os) =>
+                        os.write(vr.rawValidationData())
+                        os.close()
+                      case Failure(ex) =>
+                        log(LogLevel.Error, "Failure writing raw validation data to file.", ex)

Review comment:
       Ok, I like changing it to a warning.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [daffodil] jw3 commented on a change in pull request #520: Preserve raw validation output

Posted by GitBox <gi...@apache.org>.
jw3 commented on a change in pull request #520:
URL: https://github.com/apache/daffodil/pull/520#discussion_r608544414



##########
File path: daffodil-runtime1/src/main/scala/org/apache/daffodil/processors/DataProcessor.scala
##########
@@ -685,6 +685,8 @@ class ParseResult(dp: DataProcessor, override val resultState: PState)
   extends DFDL.ParseResult
   with WithDiagnosticsImpl {
 
+  private var valres: Option[ValidationResult] = None

Review comment:
       The ParseResult refactor went well




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [daffodil] jw3 commented on a change in pull request #520: Preserve raw validation output

Posted by GitBox <gi...@apache.org>.
jw3 commented on a change in pull request #520:
URL: https://github.com/apache/daffodil/pull/520#discussion_r608156166



##########
File path: daffodil-cli/src/main/scala/org/apache/daffodil/Main.scala
##########
@@ -347,6 +351,7 @@ class CLIConf(arguments: Array[String]) extends scallop.ScallopConf(arguments)
     val parser = opt[File](short = 'P', argName = "file", descr = "use a previously saved parser.")
     val output = opt[String](argName = "file", descr = "write output to a given file. If not given or is -, output is written to stdout.")
     val validate: ScallopOption[ValidationMode.Type] = opt[ValidationMode.Type](short = 'V', default = Some(ValidationMode.Off), argName = "mode", descr = "the validation mode. 'on', 'limited', 'off', or a validator plugin name.")
+    val valOutput = opt[File]("validate_output", descr = "path to file to write raw validation output.")

Review comment:
       Check, check, check.
   
   :+1: 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [daffodil] tuxji commented on a change in pull request #520: Preserve raw validation output

Posted by GitBox <gi...@apache.org>.
tuxji commented on a change in pull request #520:
URL: https://github.com/apache/daffodil/pull/520#discussion_r609801194



##########
File path: daffodil-cli/src/main/scala/org/apache/daffodil/Main.scala
##########
@@ -328,7 +332,7 @@ class CLIConf(arguments: Array[String]) extends scallop.ScallopConf(arguments)
   object parse extends scallop.Subcommand("parse") {
     banner("""|Usage: daffodil parse (-s <schema> [-r [{namespace}]<root>] [-p <path>] |
               |                       -P <parser>)
-              |                      [--validate [mode]]
+              |                      [--validate [mode]] [--validate_output file]

Review comment:
       Please use brackets around "file" for consistency with the other usage lines: `[--validate_output <file>]`

##########
File path: daffodil-schematron/src/main/scala/org/apache/daffodil/validation/schematron/SchematronResult.scala
##########
@@ -0,0 +1,45 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.daffodil.validation.schematron
+
+import org.apache.daffodil.api.RawValidationResult
+import org.apache.daffodil.api.ValidationFailure
+import org.apache.daffodil.api.ValidationResult
+import org.apache.daffodil.api.ValidationWarning
+
+import java.util
+
+/**
+ * Captures the output of Schematron validation as a Daffodil ValidationResult
+ * @param warnings Schematron warnings parsed into ValidationWarning objects
+ * @param errors Schematron errors parsed into ValidationFailure objects
+ * @param svrl Full SVRL output
+ */
+case class SchematronResult(warnings: util.Collection[ValidationWarning],
+                            errors: util.Collection[ValidationFailure],
+                            svrl: String) extends ValidationResult with RawValidationResult {
+
+  def rawValidationData(): Array[Byte] = svrl.getBytes
+}

Review comment:
       I see what you mean - lines 906-917 in Main.scala performs a match on ValidationResult to get a case Some(vr: RawValidationResult) and writes out vr.rawValidationData() accordingly.  I also think there would be nothing to stop someone from doing a match on ValidationResult to get a case Some(vr: MyCustomValidationResult) if they want to use some custom data without conversion to/from raw bytes, right?  So people are free to use both ways in their own custom validator implementation - extend RawValidatorResult to let the CLI write the file and also provide custom methods in MyCustomValidationResult, which seems fair to me.

##########
File path: daffodil-lib/src/main/scala/org/apache/daffodil/api/Validator.scala
##########
@@ -54,20 +55,31 @@ trait ValidatorFactory {
 
 /**
  * Results of a validation execution
- * @param warnings [[org.apache.daffodil.api.ValidationWarning]] objects
- * @param errors [[org.apache.daffodil.api.ValidationFailure]] objects
  */
-final case class ValidationResult(warnings: java.util.Collection[ValidationWarning], errors: java.util.Collection[ValidationFailure])
+trait ValidationResult {
+  def warnings(): java.util.Collection[ValidationWarning]
+  def errors(): java.util.Collection[ValidationFailure]
+}
+
+/**
+ * Provider of raw validation output
+ */
+trait RawValidationResult {
+  def rawValidationData(): Array[Byte]
+}
 
 object ValidationResult {
   /**
    * an empty [[org.apache.daffodil.api.ValidationResult]]
    */
   val empty: ValidationResult = ValidationResult(Seq.empty, Seq.empty)
 
-  def apply(warnings: Seq[ValidationWarning], errors: Seq[ValidationFailure]): ValidationResult = {
+  def apply(w: Seq[ValidationWarning], e: Seq[ValidationFailure]): ValidationResult = {
     import scala.collection.JavaConverters.asJavaCollectionConverter
-    ValidationResult(warnings.asJavaCollection, errors.asJavaCollection)
+    new ValidationResult{
+      val warnings: java.util.Collection[ValidationWarning] = w.asJavaCollection
+      val errors: java.util.Collection[ValidationFailure] = e.asJavaCollection
+    }

Review comment:
       Here's an interesting tidbit.  No other place in the Daffodil codebase uses java.util.Collection or the asJavaCollection converter from Seq to Collection.  I'm also surprised that an API class in daffodil-lib would use Collection instead of Seq, although doing so lets Java callers use ValidationResult directly.  If all this is a new and better practice, I suspect we could apply it in more places.  




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [daffodil] jw3 edited a comment on pull request #520: Preserve raw validation output

Posted by GitBox <gi...@apache.org>.
jw3 edited a comment on pull request #520:
URL: https://github.com/apache/daffodil/pull/520#issuecomment-814077996






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [daffodil] jw3 commented on a change in pull request #520: Preserve raw validation output

Posted by GitBox <gi...@apache.org>.
jw3 commented on a change in pull request #520:
URL: https://github.com/apache/daffodil/pull/520#discussion_r608107416



##########
File path: daffodil-cli/src/main/scala/org/apache/daffodil/Main.scala
##########
@@ -356,6 +361,7 @@ class CLIConf(arguments: Array[String]) extends scallop.ScallopConf(arguments)
 
     requireOne(schema, parser) // must have one of --schema or --parser
     conflicts(parser, List(rootNS)) // if --parser is provided, cannot also provide --root
+    conflicts(stream, List(valOutput))
     validateFileIsFile(config) // --config must be a file that exists

Review comment:
       Nice :+1: 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [daffodil] jw3 commented on pull request #520: Preserve raw validation output

Posted by GitBox <gi...@apache.org>.
jw3 commented on pull request #520:
URL: https://github.com/apache/daffodil/pull/520#issuecomment-814077996


   Need one update yet to actually test the output in the cli unit test...


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [daffodil] jw3 edited a comment on pull request #520: Preserve raw validation output

Posted by GitBox <gi...@apache.org>.
jw3 edited a comment on pull request #520:
URL: https://github.com/apache/daffodil/pull/520#issuecomment-821325697


   Updated PR to push most of the implementation details down into the Schematron validator.
   
   Got rid of the raw stream concept, instead using the existing ability to pass in a config file to get the SVRL output path into the validator.  Sending SVRL write errors back up as diagnostics.
   
   Much less surface area this way, but the config passing approach is not obvious.  I think it will do for now, can be refined later by [refactoring the validator CLI arguments](https://github.com/apache/daffodil/pull/520#discussion_r608633044) to further hide some of these details, will add a ticket and follow up in another PR.
   
   Also the refactoring of the java collections usage in the Validator API will be followed up on.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [daffodil] stevedlawrence commented on a change in pull request #520: Preserve raw validation output

Posted by GitBox <gi...@apache.org>.
stevedlawrence commented on a change in pull request #520:
URL: https://github.com/apache/daffodil/pull/520#discussion_r608633044



##########
File path: daffodil-japi/src/main/scala/org/apache/daffodil/japi/Daffodil.scala
##########
@@ -866,6 +867,12 @@ class ParseResult private[japi] (pr: SParseResult, deprecatedOutput: Maybe[JDOMI
    * @return true if any validation errors occurred, false otherwise.
    */
   def isValidationError(): Boolean = pr.isValidationError
+
+  /**
+   * Optional result of the infoset validation.
+   * @return optional ValidationResult
+   */
+  def validationResult(): Option[ValidationResult] = pr.validationResult()

Review comment:
       > Question is/was, does it scale? I say not beyond a low number of shortish parameters.
   
   Fair, but I don't think the current solution scales all that well either--it only supports a single option and a single output stream.
   
   As an alternative to what I suggested earlier, maybe we do something where the .conf file (not sure what the syntax is for specifying that, can' find any test) can/should be used for configurations that are static (e.g. an encoding option, enable/disable features), but additional options can be provided to override those, or when more dynamic options are needed like in this case. We could also use an alternative syntax that is maybe less ugly, for example:
   ```
   --validate foo.sh --validate-option config=foo.conf --validate-option output=details.svrl
   ```
   We have very similar syntax for defining external variables, e.g. ``-Dvariable1=value1 -Dvariable2=value2``
   
   So ``--validate`` can be provided once to specify the mode/validator plugin, and ``--validate-option`` is a key-value pair that can be specified multiple times for different options. One option could provide a config file to configure the validator, other options could override those when needed or to provide more dynamic options.
   
   An added benefit (depending on how you view it?) is that now validators are responsible for supporting/parsing the config option, and so they can use whatever configuration format makes sense for them. Maybe some validators already have a well defined and popular config file format that is different than what we currently support. If these are truely pluggable, we might not want to force a specific format.
   
   > Conceptually there are two streams of data that come out of validators
   
   While I agree that's probably the case, I'm not sure a byte array is always the most natural way to get the second thing that could come out of a validator. I'd argue when using the API, we never want a byte array--we usually want specific underlying data structures with casting probably being the best way to get to them. And when not using the API, then I'd rather assume the validator knows best what information should be output, where, and what the most efficient way to do it, and using some config stuff to tweak it.
   
   All that said, if you're not sold, I won't stand in the way of this change. This definitey provides a useful feature that we do need, and it is well done. And if we do eventually decide we need something different is this is run through its paces, we can always go through the normal deprecation process. For example, it would be pretty trivial to to change ``--validate-output=foo`` to ``--validate-option output=foo`` behind the scenes.
   
   +1 :+1:




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [daffodil] jw3 commented on a change in pull request #520: Preserve raw validation output

Posted by GitBox <gi...@apache.org>.
jw3 commented on a change in pull request #520:
URL: https://github.com/apache/daffodil/pull/520#discussion_r608152159



##########
File path: daffodil-cli/src/main/scala/org/apache/daffodil/Main.scala
##########
@@ -888,6 +899,23 @@ object Main extends Logging {
               val loc = parseResult.resultState.currentLocation.asInstanceOf[DataLoc]
               displayDiagnostics(parseResult)
 
+              // allow raw validation output
+              if(!parseResult.isProcessingError) {

Review comment:
       Yeah could be possible, it makes sense to guard that with a check that validation output was actually asked for.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [daffodil] jw3 commented on a change in pull request #520: Preserve raw validation output

Posted by GitBox <gi...@apache.org>.
jw3 commented on a change in pull request #520:
URL: https://github.com/apache/daffodil/pull/520#discussion_r614750565



##########
File path: daffodil-japi/src/main/scala/org/apache/daffodil/japi/Daffodil.scala
##########
@@ -866,6 +867,12 @@ class ParseResult private[japi] (pr: SParseResult, deprecatedOutput: Maybe[JDOMI
    * @return true if any validation errors occurred, false otherwise.
    */
   def isValidationError(): Boolean = pr.isValidationError
+
+  /**
+   * Optional result of the infoset validation.
+   * @return optional ValidationResult
+   */
+  def validationResult(): Option[ValidationResult] = pr.validationResult()

Review comment:
       Have wrestled with the refactors here, they grow the size of this PR pretty quick.  I am thinking this to get this functionality in but avoid adding things we dont need.
   
   1. Drop the raw stream, the validator can write the output based on config passed into it
   2. Revert all changes to CLI, we already have support for the conf file being passed.  Ill write a test for that and it can be used to configure the validator to output svrl for now.  The refactor to multiple validator args can be separate PR.
   3. Keep the other refactors in Schematron and Parse Result
   
   Ill follow up with a ticket something like "Refactor validator CLI argument passing" with the description from above.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [daffodil] jw3 merged pull request #520: Preserve raw validation output

Posted by GitBox <gi...@apache.org>.
jw3 merged pull request #520:
URL: https://github.com/apache/daffodil/pull/520


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [daffodil] stevedlawrence commented on a change in pull request #520: Preserve raw validation output

Posted by GitBox <gi...@apache.org>.
stevedlawrence commented on a change in pull request #520:
URL: https://github.com/apache/daffodil/pull/520#discussion_r608589320



##########
File path: daffodil-japi/src/main/scala/org/apache/daffodil/japi/Daffodil.scala
##########
@@ -866,6 +867,12 @@ class ParseResult private[japi] (pr: SParseResult, deprecatedOutput: Maybe[JDOMI
    * @return true if any validation errors occurred, false otherwise.
    */
   def isValidationError(): Boolean = pr.isValidationError
+
+  /**
+   * Optional result of the infoset validation.
+   * @return optional ValidationResult
+   */
+  def validationResult(): Option[ValidationResult] = pr.validationResult()

Review comment:
       The CLI uses plenty of internal API only stuff, so I have no problem with keeping this API internal for now.
   
   Personally, I would like to see more use-cases of this rawValidation data before we make it part of the public API. It just feels very specific to schematron, so almost feels like it should be a schematron specific option, like maybe we want alternative syntax like ``--validate=foo.sch:output=details.svrl``?
   
   I think what you have really nice and there's some nice refactorings too, I'm just hesitant because we've been burned in the past with creating a public API without enough thought towards use-cases and it's led to some ugly workarounds when we fix them but still need to maintain backwards compatibility.
   
   Which makes me think again about the new CLI options, which is sort of a public API. I wonder if maybe we really do need an alternative syntax, like mentioned above? Perhaps the --validate option wants to be a mode/pluggable validator followed by a delimiter separated list of key/value pairs to configure that validation? Then the CLI just parses and sends these options to the validator and that is responsible for implementing those options. So the CLI won't handle any of the output logic?
   
   And this easily extends to additional options. For example, maybe we also want a validation encoding option for how validation output is encoded, or to enable/disable certain validation capabilities? Rather than creating a new ``--validate-encoding`` we just support a new option in the list of ``--validate`` options, e.g.
   ```
   --validate=foo.sch:output=details.svrl:encoding=UTF-8
   ```
   
   This does mean the main CLI can't really do or know anything related to validators aside from initialize them and send them some configuration options, but it does allow these validators to be easily extendable without having to modify the CLI with more and more options or needing to change the API?
   
   Thoughts?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [daffodil] jw3 commented on a change in pull request #520: Preserve raw validation output

Posted by GitBox <gi...@apache.org>.
jw3 commented on a change in pull request #520:
URL: https://github.com/apache/daffodil/pull/520#discussion_r608117483



##########
File path: daffodil-schematron/src/main/scala/org/apache/daffodil/validation/schematron/SchematronResult.scala
##########
@@ -0,0 +1,45 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.daffodil.validation.schematron
+
+import org.apache.daffodil.api.RawValidationResult
+import org.apache.daffodil.api.ValidationFailure
+import org.apache.daffodil.api.ValidationResult
+import org.apache.daffodil.api.ValidationWarning
+
+import java.util
+
+/**
+ * Captures the output of Schematron validation as a Daffodil ValidationResult
+ * @param warnings Schematron warnings parsed into ValidationWarning objects
+ * @param errors Schematron errors parsed into ValidationFailure objects
+ * @param svrl Full SVRL output
+ */
+case class SchematronResult(warnings: util.Collection[ValidationWarning],
+                            errors: util.Collection[ValidationFailure],
+                            svrl: String) extends ValidationResult with RawValidationResult {
+
+  def rawValidationData(): Array[Byte] = svrl.getBytes
+}

Review comment:
       The validators are pluggable and assuming other implementations could have raw data that would be useful to write then you wouldnt want to have to wire those other projects into the CLI explicitly to allow that.
   
   > I guess my concern is that that a user of this schemtron validator might want access to the Svrl String, or maybe even the svrl XML Document, but instead we're converting it to bytes just to be generic and forcing them to do something with it. 
   
   With this PR you can do either.
   
   The raw trait uses bytes to be generic in handling _raw_ output across all possible validator implementations, since we dont know what an implementation in the wild might spit out.  The only raw consumer right now is the CLI where it is just dumping bytes to a file, and we use the raw trait to avoid depending on the schematron project explicitly.
   
   You can still extract SVRL without using the raw trait, but to do so you have to explicitly depend on the schematron implementation.  You could follow this pattern with any validator and consumer.
   
   You can see example implementations of this in the new CLI tests in this PR.
   
   Another approach to avoid the second trait would be to roll it into the ValidationResult trait providing a default empty raw property.  I thought about that when implementing this, but an empty raw property in the main trait seemed less appealing than another trait.
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [daffodil] jw3 commented on a change in pull request #520: Preserve raw validation output

Posted by GitBox <gi...@apache.org>.
jw3 commented on a change in pull request #520:
URL: https://github.com/apache/daffodil/pull/520#discussion_r608160212



##########
File path: daffodil-schematron/src/main/scala/org/apache/daffodil/validation/schematron/SchematronResult.scala
##########
@@ -0,0 +1,45 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.daffodil.validation.schematron
+
+import org.apache.daffodil.api.RawValidationResult
+import org.apache.daffodil.api.ValidationFailure
+import org.apache.daffodil.api.ValidationResult
+import org.apache.daffodil.api.ValidationWarning
+
+import java.util
+
+/**
+ * Captures the output of Schematron validation as a Daffodil ValidationResult
+ * @param warnings Schematron warnings parsed into ValidationWarning objects
+ * @param errors Schematron errors parsed into ValidationFailure objects
+ * @param svrl Full SVRL output
+ */
+case class SchematronResult(warnings: util.Collection[ValidationWarning],
+                            errors: util.Collection[ValidationFailure],
+                            svrl: String) extends ValidationResult with RawValidationResult {
+
+  def rawValidationData(): Array[Byte] = svrl.getBytes
+}

Review comment:
       > You can see example implementations of this in the new CLI tests in this PR.
   
   Uhm, not in the CLI tests, its over in the SAPI and JAPI tests in this PR.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [daffodil] jw3 commented on pull request #520: Preserve raw validation output

Posted by GitBox <gi...@apache.org>.
jw3 commented on pull request #520:
URL: https://github.com/apache/daffodil/pull/520#issuecomment-821325697


   Updated PR to push most of the implementation details down into the Schematron validator.
   
   Got rid of the raw stream concept, instead using the existing ability to pass in a config file get the SVRL output path into the validator.  Sending SVRL write errors back up as diagnostics.
   
   Much less surface area this way, but the config passing approach is not obvious.  I think it will do for now, can be refined later by [refactoring the validator CLI arguments](https://github.com/apache/daffodil/pull/520#discussion_r608633044) to further hide some of these details, will add a ticket and follow up in another PR.
   
   Also the refactoring of the java collections usage in the Validator API will be followed up on.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [daffodil] jw3 commented on a change in pull request #520: Preserve raw validation output

Posted by GitBox <gi...@apache.org>.
jw3 commented on a change in pull request #520:
URL: https://github.com/apache/daffodil/pull/520#discussion_r608805844



##########
File path: daffodil-japi/src/main/scala/org/apache/daffodil/japi/Daffodil.scala
##########
@@ -866,6 +867,12 @@ class ParseResult private[japi] (pr: SParseResult, deprecatedOutput: Maybe[JDOMI
    * @return true if any validation errors occurred, false otherwise.
    */
   def isValidationError(): Boolean = pr.isValidationError
+
+  /**
+   * Optional result of the infoset validation.
+   * @return optional ValidationResult
+   */
+  def validationResult(): Option[ValidationResult] = pr.validationResult()

Review comment:
       Thats all sane.  Ill take a look at refactoring towards what you describe.
   
   > Fair, but I don't think the current solution scales all that well either--it only supports a single option and a single output stream.
   
   You just arent squinting hard enough.  :grin: 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [daffodil] jw3 commented on a change in pull request #520: Preserve raw validation output

Posted by GitBox <gi...@apache.org>.
jw3 commented on a change in pull request #520:
URL: https://github.com/apache/daffodil/pull/520#discussion_r608301743



##########
File path: daffodil-japi/src/main/scala/org/apache/daffodil/japi/Daffodil.scala
##########
@@ -866,6 +867,12 @@ class ParseResult private[japi] (pr: SParseResult, deprecatedOutput: Maybe[JDOMI
    * @return true if any validation errors occurred, false otherwise.
    */
   def isValidationError(): Boolean = pr.isValidationError
+
+  /**
+   * Optional result of the infoset validation.
+   * @return optional ValidationResult
+   */
+  def validationResult(): Option[ValidationResult] = pr.validationResult()

Review comment:
       So looking at what is involved here, and thinking about the comment above...
   
   > Is the intention to keep this a sort of internal/experimental API until all the new validation stuff is run through its paces?
   
   The only immediate use of this raw capability is for the CLI.  Is there any value in publishing this API to the SAPI and JAPI now?  Or would it be just as well to let it internal until there is demand.
   
   I lean toward removing the additions I made to SAPI/JAPI and only exposing it through the CLI for now.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [daffodil] jw3 commented on pull request #520: Preserve raw validation output

Posted by GitBox <gi...@apache.org>.
jw3 commented on pull request #520:
URL: https://github.com/apache/daffodil/pull/520#issuecomment-821583927


   I have to go to a windows machine and see what is up with those failures.  They look familiar to issues with windows paths I encountered in a previous PR, but applying the change from memory is not working and CI is too long of a loop.  I will get to a machine or fire up a VM this weekend and get it fixed...


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [daffodil] jw3 commented on a change in pull request #520: Preserve raw validation output

Posted by GitBox <gi...@apache.org>.
jw3 commented on a change in pull request #520:
URL: https://github.com/apache/daffodil/pull/520#discussion_r608160212



##########
File path: daffodil-schematron/src/main/scala/org/apache/daffodil/validation/schematron/SchematronResult.scala
##########
@@ -0,0 +1,45 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.daffodil.validation.schematron
+
+import org.apache.daffodil.api.RawValidationResult
+import org.apache.daffodil.api.ValidationFailure
+import org.apache.daffodil.api.ValidationResult
+import org.apache.daffodil.api.ValidationWarning
+
+import java.util
+
+/**
+ * Captures the output of Schematron validation as a Daffodil ValidationResult
+ * @param warnings Schematron warnings parsed into ValidationWarning objects
+ * @param errors Schematron errors parsed into ValidationFailure objects
+ * @param svrl Full SVRL output
+ */
+case class SchematronResult(warnings: util.Collection[ValidationWarning],
+                            errors: util.Collection[ValidationFailure],
+                            svrl: String) extends ValidationResult with RawValidationResult {
+
+  def rawValidationData(): Array[Byte] = svrl.getBytes
+}

Review comment:
       > You can see example implementations of this in the new CLI tests in this PR.
   
   Uhm, well maybe not, but I can add some.  I must of been thinking of the SAPI test that does this.  But doesnt illustrate the point as well as a schematron test in the CLI would.
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [daffodil] jw3 commented on a change in pull request #520: Preserve raw validation output

Posted by GitBox <gi...@apache.org>.
jw3 commented on a change in pull request #520:
URL: https://github.com/apache/daffodil/pull/520#discussion_r610068467



##########
File path: daffodil-lib/src/main/scala/org/apache/daffodil/api/Validator.scala
##########
@@ -54,20 +55,31 @@ trait ValidatorFactory {
 
 /**
  * Results of a validation execution
- * @param warnings [[org.apache.daffodil.api.ValidationWarning]] objects
- * @param errors [[org.apache.daffodil.api.ValidationFailure]] objects
  */
-final case class ValidationResult(warnings: java.util.Collection[ValidationWarning], errors: java.util.Collection[ValidationFailure])
+trait ValidationResult {
+  def warnings(): java.util.Collection[ValidationWarning]
+  def errors(): java.util.Collection[ValidationFailure]
+}
+
+/**
+ * Provider of raw validation output
+ */
+trait RawValidationResult {
+  def rawValidationData(): Array[Byte]
+}
 
 object ValidationResult {
   /**
    * an empty [[org.apache.daffodil.api.ValidationResult]]
    */
   val empty: ValidationResult = ValidationResult(Seq.empty, Seq.empty)
 
-  def apply(warnings: Seq[ValidationWarning], errors: Seq[ValidationFailure]): ValidationResult = {
+  def apply(w: Seq[ValidationWarning], e: Seq[ValidationFailure]): ValidationResult = {
     import scala.collection.JavaConverters.asJavaCollectionConverter
-    ValidationResult(warnings.asJavaCollection, errors.asJavaCollection)
+    new ValidationResult{
+      val warnings: java.util.Collection[ValidationWarning] = w.asJavaCollection
+      val errors: java.util.Collection[ValidationFailure] = e.asJavaCollection
+    }

Review comment:
       Good point.  IIRC this path was taken for ease of transition in the JAPI, and it was done in the original refactor PR, not new here.
   
   But looking back now it probably should be a scala type and converted in the JAPI.  Ill see what a refactor here looks like.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org