Posted to users@daffodil.apache.org by Don Brutzman <br...@nps.edu> on 2020/12/28 20:16:45 UTC

testing CSV example under 3.0.0

Am trying to parse CSV example after installing latest version of 3.0.0.

* Apache Daffodil Examples
   https://daffodil.apache.org/examples

* DFDLSchemas / CSV
   https://github.com/DFDLSchemas/CSV

* csv.dfdl.xsd
   https://github.com/DFDLSchemas/CSV/blob/master/src/main/resources/com/tresys/csv/xsd/csv.dfdl.xsd

Had to go searching for

* <xs:include schemaLocation="org/apache/daffodil/xsd/DFDLGeneralFormat.dfdl.xsd" />

and found a copy at

* https://github.com/apache/incubator-daffodil/blob/master/daffodil-lib/src/main/resources/org/apache/daffodil/xsd/DFDLGeneralFormat.dfdl.xsd

and also looked around for example data, which matches excerpt on examples page:

* https://raw.githubusercontent.com/DFDLSchemas/CSV/master/src/test/resources/com/tresys/csv/data/simpleCSV.csv

Then, using Ant invocation as follows,

     <target name="daffodil.parse.csv"  description="daffodil.apache.org example">
         <echo>daffodil parse csv</echo>
         <exec executable="daffodil"  dir="." vmlauncher="false">
             <arg value="parse"/>
             <arg value="--schema"/>
             <arg value="examples/csv/csv.dfdl.xsd"/>
             <arg value="--output"/>
             <arg value="examples/csv/simpleCSV.parse.xml"/>
             <arg value="examples/csv/simpleCSV.csv"/>
         </exec>
     </target>

... was able to produce simpleCSV.parse.xml (attached), which in turn matches the following and the result on the HTML page,

* https://github.com/DFDLSchemas/CSV/blob/master/src/test/resources/com/tresys/csv/infosets/simpleCSV.xml

So that's good.  However am also getting a lot of Daffodil schema warnings:

daffodil.parse.csv:
daffodil parse csv
[warning] Schema Definition Warning: dfdl:encodingErrorPolicy="error" is not yet implemented. The 'replace' value will be used.
Schema context: title Location line 59 column 16 in RobodataDFDL/examples/csv/csv.dfdl.xsd
[warning] Schema Definition Warning: dfdl:encodingErrorPolicy="error" is not yet implemented. The 'replace' value will be used.
Schema context: item Location line 66 column 16 in RobodataDFDL/examples/csv/csv.dfdl.xsd
[warning] Schema Definition Warning: dfdl:encodingErrorPolicy="error" is not yet implemented. The 'replace' value will be used.
Schema context: sequence[1] Location line 54 column 8 in RobodataDFDL/examples/csv/csv.dfdl.xsd
[warning] Schema Definition Warning: dfdl:encodingErrorPolicy="error" is not yet implemented. The 'replace' value will be used.
Schema context: sequence[1] Location line 58 column 14 in RobodataDFDL/examples/csv/csv.dfdl.xsd
[warning] Schema Definition Warning: dfdl:encodingErrorPolicy="error" is not yet implemented. The 'replace' value will be used.
Schema context: sequence[1] Location line 65 column 14 in RobodataDFDL/examples/csv/csv.dfdl.xsd
[warning] Schema Definition Warning: dfdl:encodingErrorPolicy="error" is not yet implemented. The 'replace' value will be used.
Schema context: header Location line 55 column 10 in RobodataDFDL/examples/csv/csv.dfdl.xsd
[warning] Schema Definition Warning: dfdl:encodingErrorPolicy="error" is not yet implemented. The 'replace' value will be used.
Schema context: record Location line 63 column 10 in RobodataDFDL/examples/csv/csv.dfdl.xsd
[warning] Schema Definition Warning: dfdl:encodingErrorPolicy="error" is not yet implemented. The 'replace' value will be used.
Schema context: element reference ex:file Location line 36 in RobodataDFDL/examples/csv/csv.dfdl.xsd
BUILD SUCCESSFUL (total time: 3 seconds)

Wondering please:

a. am I using the correct DFDLGeneralFormat.dfdl.xsd file for this example,
b. is there a more recent version that is suitable for version 3.0.0?

Thanks for all feedback.

all the best, Don
-- 
Don Brutzman  Naval Postgraduate School, Code USW/Br       brutzman@nps.edu
Watkins 270,  MOVES Institute, Monterey CA 93943-5000 USA   +1.831.656.2149
X3D graphics, virtual worlds, navy robotics http://faculty.nps.edu/brutzman

Re: testing CSV example under 3.0.0

Posted by Steve Lawrence <sl...@apache.org>.
On 12/28/20 7:21 PM, Don Brutzman wrote:
> On 12/28/2020 12:49 PM, Steve Lawrence wrote:

-- clip --

> 
> I had felt the urge to add a local copy of the schema because
> double-checking with XMLSpy throws an error on the schema otherwise:

Ah, yes, if you want to use the schema as an XML schema, you may need to
include this file to suppress warnings/errors. Note that this file
doesn't have any actual XML schema information in it. It only contains
DFDL annotations, so if something like Ant or XML Spy were to ignore
this particular include, the schema should still behave correctly. That
might not be an option though.

> Perhaps it would be nice if there was a well-defined online URI that a
> DFDL engine might similarly provide (with online retrieval) via an XML
> catalog, that would allow validation of the schema to proceed properly.
> 
> * <xs:include
> schemaLocation="org/apache/daffodil/xsd/DFDLGeneralFormat.dfdl.xsd"/>

That's not a bad idea. We could perhaps do something like

  <xs:include
schemaLocation="http://daffodil.apache.org/xsd/DFDLGeneralFormat.dfdl.xsd"
/>

However, I think some schema validators don't actually download
schemaLocation URLs, and there are some security concerns related to
this (see DAFFODIL-602). But we could make it so Daffodil only looks in
jars and will not go off box for this special URI, and it makes it an
obvious place for people to find this file if they do need to manually
download it. I'll open a ticket for this.


>> b) This is the most recent version of the CSV schema. The issue you're
>> seeing is that the encodingErrorPolicy is set to "error" in
>> csv.dfdl.xsd, but Daffodil does not support this value--we only support
>> encodingErrorPolicy="replace".
> 
> Thanks for explanation.
> 
> Looks like this property is fully explained at
> 
> * DFDL Spec, 11.2.1 Property dfdl:encodingErrorPolicy
>   https://daffodil.apache.org/docs/dfdl/#_Toc54264422
> 
> Interoperability (via specification compliance) is of course helpful.
> 
> Warnings are also helpful, and should lead to fixes.  Eliminating
> warnings is best case, so that the presence of "expected" warnings
> doesn't hide the presence of unexpected warnings.
> 
> Curious what the long-term resolution of this might be, presumably
> - Apache Daffodil  support for encodingErrorPolicy="error"
> - Similar IBM DFDL support for encodingErrorPolicy="replace"
> 

Yep, we have a bug to support "error" (DAFFODIL-935). Mike Beckerle just
bumped up the priority of this, but it has a handful of subtle edge
cases to deal with, so I can't say when it will be fixed.

Unfortunately, I don't have any insight into IBM's plans to support
"replace".

Re: testing CSV example under 3.0.0

Posted by Don Brutzman <br...@nps.edu>.
On 12/28/2020 12:49 PM, Steve Lawrence wrote:
>
> Welcome!

Thanks Steve.  Have been watching Daffodil from afar for a while, am glad to finally be able to engage.

> a) Yes, that's the correct DFDLGeneralFormat.dfdl.xsd. Though I'm
> surprised you need that. That file is distributed in the Daffodil jars,
> and when Daffodil compiles a schema it should be able to find it inside
> a jar. So you shouldn't need to provide a separate copy of that file
> when using Daffodil. If that's not the case, let us know--it's likely a bug.

Aha, just tested - that is correct.

I had felt the urge to add a local copy of the schema because double-checking with XMLSpy throws an error on the schema otherwise:

=====================
File C:\x-nps-gitlab\NetworkOptionalWarfare\RobodataDFDL\examples\csv\csv.dfdl.xsd is valid but contains one or more warnings.
  Unable to load a schema from 'org/apache/daffodil/xsd/DFDLGeneralFormat.dfdl.xsd'.
   I/O operation on file 'C:\x-nps-gitlab\NetworkOptionalWarfare\RobodataDFDL\examples\csv\org\apache\daffodil\xsd\DFDLGeneralFormat.dfdl.xsd' failed.
    Details
     System Error 2: The system cannot find the file specified.
=====================

Not seeing a way to silence this XMLSpy error... but it appears to flag other errors first so that is helpful.

Perhaps it would be nice if there was a well-defined online URI that a DFDL engine might similarly provide (with online retrieval) via an XML catalog, that would allow validation of the schema to proceed properly.

* <xs:include schemaLocation="org/apache/daffodil/xsd/DFDLGeneralFormat.dfdl.xsd"/>
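
Until such a URI exists, a local XML catalog might serve as a stopgap. The following is only a sketch under assumptions: the catalog file name and the validation/ directory are hypothetical, and whether a given validator (XMLSpy, Ant, or another tool) consults a catalog when resolving xs:include has not been confirmed in this thread. It uses the systemSuffix entry from OASIS XML Catalogs 1.1 so that the match does not depend on how the tool expands the relative schemaLocation.

     <?xml version="1.0" encoding="UTF-8"?>
     <!-- catalog.xml (hypothetical file name) -->
     <catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
         <!-- Match any system identifier that ends with the include path
              and redirect it to a local copy (hypothetical location,
              resolved relative to this catalog file). -->
         <systemSuffix systemIdSuffix="org/apache/daffodil/xsd/DFDLGeneralFormat.dfdl.xsd"
                       uri="validation/DFDLGeneralFormat.dfdl.xsd"/>
     </catalog>

Daffodil itself should not need this, since it finds the file inside its own jars.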

Meanwhile added Ant schema validation of the CSV schema, which passes without giving a warning about the unavailable xs:include.  This took some configuration (not natively implemented in Apache Ant) and requires downloading XMLSchema.xsd plus other files.

* https://ant.apache.org/manual/Tasks/schemavalidate.html

         <echo message="xmlvalidate    examples/csv/csv.dfdl.xsd"/>
         <xmlvalidate    file="examples/csv/csv.dfdl.xsd" failonerror="false" warn="true" lenient="true"/>
         <echo message="schemavalidate examples/csv/csv.dfdl.xsd"/>
         <schemavalidate file="examples/csv/csv.dfdl.xsd" failonerror="true" warn="true" lenient="false" fullchecking="false">
             <schema namespace="http://www.w3.org/2001/XMLSchema" file="validation/XMLSchema.xsd"/>
         </schemavalidate>
         <echo message="... schemavalidate passed"/>

> b) This is the most recent version of the CSV schema. The issue you're
> seeing is that the encodingErrorPolicy is set to "error" in
> csv.dfdl.xsd, but Daffodil does not support this value--we only support
> encodingErrorPolicy="replace".
> 
> Normally encodingErrorPolicy="error" should be a schema definition error
> with a message like:
> 
>    Daffodil does not support encodingErrorPolicy="error", use "replace"
> instead
> 
> But we found that there are a lot of schemas out there that already use
> "error", particularly those created by IBM. (Note that IBM DFDL supports
> "error" but not "replace", so we are incompatible with each other in
> that regard).
> 
> Fortunately, most of the time this property doesn't actually matter--it
> really only has an effect if you are parsing data that has encoding
> errors, which is pretty uncommon. So because this property doesn't
> usually matter, but we still want to support all these schemas that
> already use "error", we decided to just ignore this property and always
> treat it as if it has a value of "replace". That way we can support
> schemas that use either "error" or "replace". But this does mean we are
> ignoring a property when it is "error", so we output some warnings just
> to make it clear that Daffodil might behave differently than the schema
> author intended.
> 
> In this case, the schema author set it to "error" simply so that it
> would work in both IBM DFDL and Daffodil.
> 
> So this warning can just be ignored. If you want to get rid of the
> warnings and you don't care about portability with IBM DFDL, then the
> easy solution is to just change encodingErrorPolicy to "replace" and the
> warnings will go away.

Thanks for explanation.

Looks like this property is fully explained at

* DFDL Spec, 11.2.1 Property dfdl:encodingErrorPolicy
   https://daffodil.apache.org/docs/dfdl/#_Toc54264422

Interoperability (via specification compliance) is of course helpful.

Warnings are also helpful, and should lead to fixes.  Eliminating warnings is best case, so that the presence of "expected" warnings doesn't hide the presence of unexpected warnings.

Curious what the long-term resolution of this might be, presumably
- Apache Daffodil  support for encodingErrorPolicy="error"
- Similar IBM DFDL support for encodingErrorPolicy="replace"

Again thanks for helping me understand Daffodil better.


all the best, Don
-- 
Don Brutzman  Naval Postgraduate School, Code USW/Br       brutzman@nps.edu
Watkins 270,  MOVES Institute, Monterey CA 93943-5000 USA   +1.831.656.2149
X3D graphics, virtual worlds, navy robotics http://faculty.nps.edu/brutzman

Re: testing CSV example under 3.0.0

Posted by Steve Lawrence <sl...@apache.org>.
Welcome!

a) Yes, that's the correct DFDLGeneralFormat.dfdl.xsd. Though I'm
surprised you need that. That file is distributed in the Daffodil jars,
and when Daffodil compiles a schema it should be able to find it inside
a jar. So you shouldn't need to provide a separate copy of that file
when using Daffodil. If that's not the case, let us know--it's likely a bug.

b) This is the most recent version of the CSV schema. The issue you're
seeing is that the encodingErrorPolicy is set to "error" in
csv.dfdl.xsd, but Daffodil does not support this value--we only support
encodingErrorPolicy="replace".

Normally encodingErrorPolicy="error" should be a schema definition error
with a message like:

  Daffodil does not support encodingErrorPolicy="error", use "replace"
instead

But we found that there are a lot of schemas out there that already use
"error", particularly those created by IBM. (Note that IBM DFDL supports
"error" but not "replace", so we are incompatible with each other in
that regard).

Fortunately, most of the time this property doesn't actually matter--it
really only has an effect if you are parsing data that has encoding
errors, which is pretty uncommon. So because this property doesn't
usually matter, but we still want to support all these schemas that
already use "error", we decided to just ignore this property and always
treat it as if it has a value of "replace". That way we can support
schemas that use either "error" or "replace". But this does mean we are
ignoring a property when it is "error", so we output some warnings just
to make it clear that Daffodil might behave differently than the schema
author intended.

In this case, the schema author set it to "error" simply so that it
would work in both IBM DFDL and Daffodil.

So this warning can just be ignored. If you want to get rid of the
warnings and you don't care about portability with IBM DFDL, then the
easy solution is to just change encodingErrorPolicy to "replace" and the
warnings will go away.
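
For reference, a minimal sketch of that change. It assumes the schema sets its defaults in a top-level dfdl:format annotation that references the GeneralFormat definition from DFDLGeneralFormat.dfdl.xsd, and that the xs, dfdl, and ex prefixes are bound as in csv.dfdl.xsd. The real annotation carries more properties than shown; only the one discussed here is included.

     <xs:annotation>
         <xs:appinfo source="http://www.ogf.org/dfdl/">
             <!-- Assumed shape of the default format block; other
                  properties are omitted from this sketch. -->
             <dfdl:format ref="ex:GeneralFormat"
                          encodingErrorPolicy="replace"/>
         </xs:appinfo>
     </xs:annotation>

With this value the schema is no longer portable to IBM DFDL, which accepts only "error".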

