You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@daffodil.apache.org by John Wass <jw...@gmail.com> on 2020/12/18 17:21:02 UTC

Embedded Schematron progress

The Embedded Schematron PR is moving along, hoping to get it out of WIP
soon.  https://github.com/apache/incubator-daffodil/pull/463

The JPEG and BMP schema repos are being used for testing now, and the PNG
looks like it would provide some great coverage.. maybe too great :/  Any
other noteworthy sources of sch+data that might be beneficial to test with?

Observations on embedding
- Behavior has been predictable, and errors have been clear
- There are multiple placement options for schematron rules in a schema
- The Validator API has held up well, but might be one issue to come out of
this effort

Examples at
https://github.com/jw3/incubator-daffodil/tree/validator_spi/embedded_schematron/daffodil-schematron/src/test/resources/xsd

Re: Embedded Schematron progress

Posted by Rege Nteligen <r....@gmail.com>.
I could put the dfdl schema and schematron schema that I was using to test GIF files somewhere if that would be helpful.

On 2020/12/18 17:21:02, John Wass <jw...@gmail.com> wrote: 
> The Embedded Schematron PR is moving along, hoping to get it out of WIP
> soon.  https://github.com/apache/incubator-daffodil/pull/463
> 
> The JPEG and BMP schema repos are being used for testing now, and the PNG
> looks like it would provide some great coverage.. maybe too great :/  Any
> other noteworthy sources of sch+data that might be beneficial to test with?
> 
> Observations on embedding
> - Behavior has been predictable, and errors have been clear
> - There are multiple placement options for schematron rules in a schema
> - The Validator API has held up well, but might be one issue to come out of
> this effort
> 
> Examples at
> https://github.com/jw3/incubator-daffodil/tree/validator_spi/embedded_schematron/daffodil-schematron/src/test/resources/xsd
> 

Re: Embedded Schematron progress

Posted by John Wass <jw...@gmail.com>.
Implemented that and added a test for it in
section02.schema_definition_errors.TestSDE.  The cli tests are in there as
well.


On Mon, Jan 18, 2021 at 9:57 AM John Wass <jw...@gmail.com> wrote:

> > I think this warning (or something like it) is actually somewhat useful
> and we shouldn't get rid of it completely.
>
> Concur.  Warn just seems a bit high of a "log level".
>
> > Seems like this should be good practice in general, but maybe we don't
> want to require it?
>
> So far it has only been a problem with embedded schematron.  Was trying to
> think of a way to only eliminate the warn in that case...
>
> > Change the logic to only output this warning if there is no source
> attribute AND the appinfo contains any elements with the dfdl namespace.
>
> I like this.  Will incorporate that to get rid of my local modifications
> and get tests passing in CI and we can see any other/better suggestions
> come about.
>
>
>
>
> On Mon, Jan 18, 2021 at 9:37 AM Steve Lawrence <sl...@apache.org>
> wrote:
>
>> I think this warning (or something like it) is actually somewhat useful
>> and we shouldn't get rid of it completely. If someone tries to add some
>> DFDL annotations but forgets the source in the appinfo, then removing
>> this would just cause Daffodil to silently ignore the block, which could
>> be confusing. Some potential solutions:
>>
>> 1) We recommend that a source attribute be used for all appinfo elements
>> in DFDL schemas, even if not necessary. Seems like this should be good
>> practice in general, but maybe we don't want to require it?
>>
>> 2) Change the logic to only output this warning if there is no source
>> attribute AND the appinfo contains any elements with the dfdl namespace.
>> This way, if someone does accidentally leave off the source attribute
>> for DFDL annotations, they'll get a helpful warning. We might even want
>> to go a step further and give a warning if an appinfo element contains
>> elements in the dfdl namespace but that does not have the dfdl source
>> attribute. This catches both the case where the source is left off, but
>> also catches the case where someone just typos the source. This way we
>> still allow appinfo's without sources, but we only silently ignore them
>> if they do not contain any dfdl elements.
>>
>>
>> On 1/18/21 7:08 AM, John Wass wrote:
>> > Writing some CLI tests showed a diagnostic warning has gone undetected
>> in
>> > the embedded schematron parse.  Using xs:appinfo to add schematron hits
>> > [AnnotatedSchemaComponent#L403](
>> >
>> https://github.com/apache/incubator-daffodil/blob/master/daffodil-core/src/main/scala/org/apache/daffodil/dsom/AnnotatedSchemaComponent.scala#L403
>> > ).
>> >
>> > `this.SDW(WarnID.AppinfoNoSource, """xs:appinfo without source
>> attribute.
>> > Is source="http://www.ogf.org/dfdl/" missing?""")`
>> >
>> > Thoughts on siliencing or lessening significance of this (eg. log
>> instead
>> > of diagnostic)?
>> >
>> >
>> > On Wed, Jan 6, 2021 at 9:19 AM John Wass <jw...@gmail.com> wrote:
>> >
>> >> The schema and tests for BMP/GIF/JPEG were moved into branches on those
>> >> DFDLSchemas repos.  After this PR is merged and a the next release is
>> >> pubished those tests could be added to each of those repos.  I suppose
>> the
>> >> embedded schematron schema could merged any time without the tests.
>> Those
>> >> repos would be a good context to continue and resolve the "best
>> practices
>> >> in the schematron" discussions.
>> >>
>> >> On Tue, Dec 22, 2020 at 9:53 AM John Wass <jw...@gmail.com> wrote:
>> >>
>> >>>> The second one is similar to examples in the GIF schema
>> >>> <
>> https://github.com/DFDLSchemas/GIF/blob/master/src/main/resources/com/mitre/gif/sch/GIF.sch#L74-L76
>> >.
>> >>> That schema can be added in the PR unit tests, to go along with the
>> BMP and
>> >>> JPEG.
>> >>>
>> >>> Added the gif schema to the tests, looking good.  Specifically looked
>> at
>> >>> rule `count(/GIF/Global_Color_Table/RGB) eq math:pow(2,
>> >>> ../number(Size_of_Global_Color_Table) + 1)`.
>> >>>
>> >>> Working on embedding the bmp schema now as the final integration test.
>> >>>
>> >>>
>> >>> On Mon, Dec 21, 2020 at 7:49 AM John Wass <jw...@gmail.com> wrote:
>> >>>
>> >>>>> Does the process create SVRL files when it completes?
>> >>>>
>> >>>> No, the svrl is consumed and converted into Daffodil diagnostics.
>> >>>>
>> >>>>
>> >>>>>  Is there a commandline option to direct the SVRL file to a specific
>> >>>> path and name?
>> >>>>
>> >>>> It doesn't, but is a good idea and certainly could.  Passing a flag
>> >>>> through the validator config could trigger writing the file.
>> >>>>
>> >>>> Probably be in a follow up PR.
>> >>>>
>> >>>>
>> >>>>> I'm curious of those type of tests will work with this process.
>> >>>>
>> >>>> They should.  The first can be checked in a unit test that matches a
>> >>>> byte.  The second one is similar to examples in the GIF schema
>> >>>> <
>> https://github.com/DFDLSchemas/GIF/blob/master/src/main/resources/com/mitre/gif/sch/GIF.sch#L74-L76
>> >.
>> >>>> That schema can be added in the PR unit tests, to go along with the
>> BMP and
>> >>>> JPEG.
>> >>>>
>> >>>>
>> >>>>
>> >>>>
>> >>>> On Fri, Dec 18, 2020 at 2:43 PM Rege Nteligen <r....@gmail.com>
>> >>>> wrote:
>> >>>>
>> >>>>> I took a look at the sample xsd's with the imbedded schematron
>> >>>>> asserts.  It looks good.  Does the process create SVRL files when it
>> >>>>> completes?  Is there a commandline option to direct the SVRL file
>> to a
>> >>>>> specific path and name?
>> >>>>>
>> >>>>> I was recently working with a modified daffodil GIF schema and
>> >>>>> schematron to report various findings with GIF files.  Several test
>> >>>>> involved testting that keyword were not in HEX blob fields.  I'm
>> curious of
>> >>>>> those type of tests will work with this process.  This is a sample
>> assert:
>> >>>>>              <sch:assert test="not(./LSD_Blob[matches(.,'73716C')])
>> and
>> >>>>> not(./LSD_Blob[matches(.,'53514C')])">
>> >>>>>                 GIF: FAIL: LSD_Blob: AFTER-HDR-REF-SQL: Possible
>> >>>>> malicious SQL reference between segemnts
>> >>>>>             </sch:assert>
>> >>>>>
>> >>>>> I've also done test to see if the count of bytes in one field
>> matched
>> >>>>> the size of the field value from another field:
>> >>>>>             <sch:assert test="if(/GIF/Global_Color_Table/RGB) then
>> >>>>> count(/GIF/Global_Color_Table/RGB) eq math:pow(2,
>> >>>>> Packed_Byte/number(Size_of_Global_Color_Table) + 1) else true()">
>> >>>>>                 GIF: RED: LSG_GCL: GCL-RGB-CNT: There must be
>> >>>>> Size_of_Global_Color_Table RGB values.
>> >>>>>             </sch:assert>
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>> On 2020/12/18 17:21:02, John Wass <jw...@gmail.com> wrote:
>> >>>>>> The Embedded Schematron PR is moving along, hoping to get it out of
>> >>>>> WIP
>> >>>>>> soon.  https://github.com/apache/incubator-daffodil/pull/463
>> >>>>>>
>> >>>>>> The JPEG and BMP schema repos are being used for testing now, and
>> the
>> >>>>> PNG
>> >>>>>> looks like it would provide some great coverage.. maybe too great
>> :/
>> >>>>> Any
>> >>>>>> other noteworthy sources of sch+data that might be beneficial to
>> test
>> >>>>> with?
>> >>>>>>
>> >>>>>> Observations on embedding
>> >>>>>> - Behavior has been predictable, and errors have been clear
>> >>>>>> - There are multiple placement options for schematron rules in a
>> >>>>> schema
>> >>>>>> - The Validator API has held up well, but might be one issue to
>> come
>> >>>>> out of
>> >>>>>> this effort
>> >>>>>>
>> >>>>>> Examples at
>> >>>>>>
>> >>>>>
>> https://github.com/jw3/incubator-daffodil/tree/validator_spi/embedded_schematron/daffodil-schematron/src/test/resources/xsd
>> >>>>>>
>> >>>>>
>> >>>>
>> >
>>
>>

Re: Embedded Schematron progress

Posted by John Wass <jw...@gmail.com>.
> I think this warning (or something like it) is actually somewhat useful
and we shouldn't get rid of it completely.

Concur.  Warn just seems a bit high of a "log level".

> Seems like this should be good practice in general, but maybe we don't
want to require it?

So far it has only been a problem with embedded schematron.  Was trying to
think of a way to only eliminate the warn in that case...

> Change the logic to only output this warning if there is no source
attribute AND the appinfo contains any elements with the dfdl namespace.

I like this.  Will incorporate that to get rid of my local modifications
and get tests passing in CI and we can see any other/better suggestions
come about.




On Mon, Jan 18, 2021 at 9:37 AM Steve Lawrence <sl...@apache.org> wrote:

> I think this warning (or something like it) is actually somewhat useful
> and we shouldn't get rid of it completely. If someone tries to add some
> DFDL annotations but forgets the source in the appinfo, then removing
> this would just cause Daffodil to silently ignore the block, which could
> be confusing. Some potential solutions:
>
> 1) We recommend that a source attribute be used for all appinfo elements
> in DFDL schemas, even if not necessary. Seems like this should be good
> practice in general, but maybe we don't want to require it?
>
> 2) Change the logic to only output this warning if there is no source
> attribute AND the appinfo contains any elements with the dfdl namespace.
> This way, if someone does accidentally leave off the source attribute
> for DFDL annotations, they'll get a helpful warning. We might even want
> to go a step further and give a warning if an appinfo element contains
> elements in the dfdl namespace but that does not have the dfdl source
> attribute. This catches both the case where the source is left off, but
> also catches the case where someone just typos the source. This way we
> still allow appinfo's without sources, but we only silently ignore them
> if they do not contain any dfdl elements.
>
>
> On 1/18/21 7:08 AM, John Wass wrote:
> > Writing some CLI tests showed a diagnostic warning has gone undetected in
> > the embedded schematron parse.  Using xs:appinfo to add schematron hits
> > [AnnotatedSchemaComponent#L403](
> >
> https://github.com/apache/incubator-daffodil/blob/master/daffodil-core/src/main/scala/org/apache/daffodil/dsom/AnnotatedSchemaComponent.scala#L403
> > ).
> >
> > `this.SDW(WarnID.AppinfoNoSource, """xs:appinfo without source attribute.
> > Is source="http://www.ogf.org/dfdl/" missing?""")`
> >
> > Thoughts on siliencing or lessening significance of this (eg. log instead
> > of diagnostic)?
> >
> >
> > On Wed, Jan 6, 2021 at 9:19 AM John Wass <jw...@gmail.com> wrote:
> >
> >> The schema and tests for BMP/GIF/JPEG were moved into branches on those
> >> DFDLSchemas repos.  After this PR is merged and a the next release is
> >> pubished those tests could be added to each of those repos.  I suppose
> the
> >> embedded schematron schema could merged any time without the tests.
> Those
> >> repos would be a good context to continue and resolve the "best
> practices
> >> in the schematron" discussions.
> >>
> >> On Tue, Dec 22, 2020 at 9:53 AM John Wass <jw...@gmail.com> wrote:
> >>
> >>>> The second one is similar to examples in the GIF schema
> >>> <
> https://github.com/DFDLSchemas/GIF/blob/master/src/main/resources/com/mitre/gif/sch/GIF.sch#L74-L76
> >.
> >>> That schema can be added in the PR unit tests, to go along with the
> BMP and
> >>> JPEG.
> >>>
> >>> Added the gif schema to the tests, looking good.  Specifically looked
> at
> >>> rule `count(/GIF/Global_Color_Table/RGB) eq math:pow(2,
> >>> ../number(Size_of_Global_Color_Table) + 1)`.
> >>>
> >>> Working on embedding the bmp schema now as the final integration test.
> >>>
> >>>
> >>> On Mon, Dec 21, 2020 at 7:49 AM John Wass <jw...@gmail.com> wrote:
> >>>
> >>>>> Does the process create SVRL files when it completes?
> >>>>
> >>>> No, the svrl is consumed and converted into Daffodil diagnostics.
> >>>>
> >>>>
> >>>>>  Is there a commandline option to direct the SVRL file to a specific
> >>>> path and name?
> >>>>
> >>>> It doesn't, but is a good idea and certainly could.  Passing a flag
> >>>> through the validator config could trigger writing the file.
> >>>>
> >>>> Probably be in a follow up PR.
> >>>>
> >>>>
> >>>>> I'm curious of those type of tests will work with this process.
> >>>>
> >>>> They should.  The first can be checked in a unit test that matches a
> >>>> byte.  The second one is similar to examples in the GIF schema
> >>>> <
> https://github.com/DFDLSchemas/GIF/blob/master/src/main/resources/com/mitre/gif/sch/GIF.sch#L74-L76
> >.
> >>>> That schema can be added in the PR unit tests, to go along with the
> BMP and
> >>>> JPEG.
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> On Fri, Dec 18, 2020 at 2:43 PM Rege Nteligen <r....@gmail.com>
> >>>> wrote:
> >>>>
> >>>>> I took a look at the sample xsd's with the imbedded schematron
> >>>>> asserts.  It looks good.  Does the process create SVRL files when it
> >>>>> completes?  Is there a commandline option to direct the SVRL file to
> a
> >>>>> specific path and name?
> >>>>>
> >>>>> I was recently working with a modified daffodil GIF schema and
> >>>>> schematron to report various findings with GIF files.  Several test
> >>>>> involved testting that keyword were not in HEX blob fields.  I'm
> curious of
> >>>>> those type of tests will work with this process.  This is a sample
> assert:
> >>>>>              <sch:assert test="not(./LSD_Blob[matches(.,'73716C')])
> and
> >>>>> not(./LSD_Blob[matches(.,'53514C')])">
> >>>>>                 GIF: FAIL: LSD_Blob: AFTER-HDR-REF-SQL: Possible
> >>>>> malicious SQL reference between segemnts
> >>>>>             </sch:assert>
> >>>>>
> >>>>> I've also done test to see if the count of bytes in one field matched
> >>>>> the size of the field value from another field:
> >>>>>             <sch:assert test="if(/GIF/Global_Color_Table/RGB) then
> >>>>> count(/GIF/Global_Color_Table/RGB) eq math:pow(2,
> >>>>> Packed_Byte/number(Size_of_Global_Color_Table) + 1) else true()">
> >>>>>                 GIF: RED: LSG_GCL: GCL-RGB-CNT: There must be
> >>>>> Size_of_Global_Color_Table RGB values.
> >>>>>             </sch:assert>
> >>>>>
> >>>>>
> >>>>>
> >>>>> On 2020/12/18 17:21:02, John Wass <jw...@gmail.com> wrote:
> >>>>>> The Embedded Schematron PR is moving along, hoping to get it out of
> >>>>> WIP
> >>>>>> soon.  https://github.com/apache/incubator-daffodil/pull/463
> >>>>>>
> >>>>>> The JPEG and BMP schema repos are being used for testing now, and
> the
> >>>>> PNG
> >>>>>> looks like it would provide some great coverage.. maybe too great :/
> >>>>> Any
> >>>>>> other noteworthy sources of sch+data that might be beneficial to
> test
> >>>>> with?
> >>>>>>
> >>>>>> Observations on embedding
> >>>>>> - Behavior has been predictable, and errors have been clear
> >>>>>> - There are multiple placement options for schematron rules in a
> >>>>> schema
> >>>>>> - The Validator API has held up well, but might be one issue to come
> >>>>> out of
> >>>>>> this effort
> >>>>>>
> >>>>>> Examples at
> >>>>>>
> >>>>>
> https://github.com/jw3/incubator-daffodil/tree/validator_spi/embedded_schematron/daffodil-schematron/src/test/resources/xsd
> >>>>>>
> >>>>>
> >>>>
> >
>
>

Re: Embedded Schematron progress

Posted by Steve Lawrence <sl...@apache.org>.
I think this warning (or something like it) is actually somewhat useful
and we shouldn't get rid of it completely. If someone tries to add some
DFDL annotations but forgets the source in the appinfo, then removing
this would just cause Daffodil to silently ignore the block, which could
be confusing. Some potential solutions:

1) We recommend that a source attribute be used for all appinfo elements
in DFDL schemas, even if not necessary. Seems like this should be good
practice in general, but maybe we don't want to require it?

2) Change the logic to only output this warning if there is no source
attribute AND the appinfo contains any elements with the dfdl namespace.
This way, if someone does accidentally leave off the source attribute
for DFDL annotations, they'll get a helpful warning. We might even want
to go a step further and give a warning if an appinfo element contains
elements in the dfdl namespace but that does not have the dfdl source
attribute. This catches both the case where the source is left off, but
also catches the case where someone just typos the source. This way we
still allow appinfo's without sources, but we only silently ignore them
if they do not contain any dfdl elements.


On 1/18/21 7:08 AM, John Wass wrote:
> Writing some CLI tests showed a diagnostic warning has gone undetected in
> the embedded schematron parse.  Using xs:appinfo to add schematron hits
> [AnnotatedSchemaComponent#L403](
> https://github.com/apache/incubator-daffodil/blob/master/daffodil-core/src/main/scala/org/apache/daffodil/dsom/AnnotatedSchemaComponent.scala#L403
> ).
> 
> `this.SDW(WarnID.AppinfoNoSource, """xs:appinfo without source attribute.
> Is source="http://www.ogf.org/dfdl/" missing?""")`
> 
> Thoughts on siliencing or lessening significance of this (eg. log instead
> of diagnostic)?
> 
> 
> On Wed, Jan 6, 2021 at 9:19 AM John Wass <jw...@gmail.com> wrote:
> 
>> The schema and tests for BMP/GIF/JPEG were moved into branches on those
>> DFDLSchemas repos.  After this PR is merged and a the next release is
>> pubished those tests could be added to each of those repos.  I suppose the
>> embedded schematron schema could merged any time without the tests.  Those
>> repos would be a good context to continue and resolve the "best practices
>> in the schematron" discussions.
>>
>> On Tue, Dec 22, 2020 at 9:53 AM John Wass <jw...@gmail.com> wrote:
>>
>>>> The second one is similar to examples in the GIF schema
>>> <https://github.com/DFDLSchemas/GIF/blob/master/src/main/resources/com/mitre/gif/sch/GIF.sch#L74-L76>.
>>> That schema can be added in the PR unit tests, to go along with the BMP and
>>> JPEG.
>>>
>>> Added the gif schema to the tests, looking good.  Specifically looked at
>>> rule `count(/GIF/Global_Color_Table/RGB) eq math:pow(2,
>>> ../number(Size_of_Global_Color_Table) + 1)`.
>>>
>>> Working on embedding the bmp schema now as the final integration test.
>>>
>>>
>>> On Mon, Dec 21, 2020 at 7:49 AM John Wass <jw...@gmail.com> wrote:
>>>
>>>>> Does the process create SVRL files when it completes?
>>>>
>>>> No, the svrl is consumed and converted into Daffodil diagnostics.
>>>>
>>>>
>>>>>  Is there a commandline option to direct the SVRL file to a specific
>>>> path and name?
>>>>
>>>> It doesn't, but is a good idea and certainly could.  Passing a flag
>>>> through the validator config could trigger writing the file.
>>>>
>>>> Probably be in a follow up PR.
>>>>
>>>>
>>>>> I'm curious of those type of tests will work with this process.
>>>>
>>>> They should.  The first can be checked in a unit test that matches a
>>>> byte.  The second one is similar to examples in the GIF schema
>>>> <https://github.com/DFDLSchemas/GIF/blob/master/src/main/resources/com/mitre/gif/sch/GIF.sch#L74-L76>.
>>>> That schema can be added in the PR unit tests, to go along with the BMP and
>>>> JPEG.
>>>>
>>>>
>>>>
>>>>
>>>> On Fri, Dec 18, 2020 at 2:43 PM Rege Nteligen <r....@gmail.com>
>>>> wrote:
>>>>
>>>>> I took a look at the sample xsd's with the imbedded schematron
>>>>> asserts.  It looks good.  Does the process create SVRL files when it
>>>>> completes?  Is there a commandline option to direct the SVRL file to a
>>>>> specific path and name?
>>>>>
>>>>> I was recently working with a modified daffodil GIF schema and
>>>>> schematron to report various findings with GIF files.  Several test
>>>>> involved testting that keyword were not in HEX blob fields.  I'm curious of
>>>>> those type of tests will work with this process.  This is a sample assert:
>>>>>              <sch:assert test="not(./LSD_Blob[matches(.,'73716C')]) and
>>>>> not(./LSD_Blob[matches(.,'53514C')])">
>>>>>                 GIF: FAIL: LSD_Blob: AFTER-HDR-REF-SQL: Possible
>>>>> malicious SQL reference between segemnts
>>>>>             </sch:assert>
>>>>>
>>>>> I've also done test to see if the count of bytes in one field matched
>>>>> the size of the field value from another field:
>>>>>             <sch:assert test="if(/GIF/Global_Color_Table/RGB) then
>>>>> count(/GIF/Global_Color_Table/RGB) eq math:pow(2,
>>>>> Packed_Byte/number(Size_of_Global_Color_Table) + 1) else true()">
>>>>>                 GIF: RED: LSG_GCL: GCL-RGB-CNT: There must be
>>>>> Size_of_Global_Color_Table RGB values.
>>>>>             </sch:assert>
>>>>>
>>>>>
>>>>>
>>>>> On 2020/12/18 17:21:02, John Wass <jw...@gmail.com> wrote:
>>>>>> The Embedded Schematron PR is moving along, hoping to get it out of
>>>>> WIP
>>>>>> soon.  https://github.com/apache/incubator-daffodil/pull/463
>>>>>>
>>>>>> The JPEG and BMP schema repos are being used for testing now, and the
>>>>> PNG
>>>>>> looks like it would provide some great coverage.. maybe too great :/
>>>>> Any
>>>>>> other noteworthy sources of sch+data that might be beneficial to test
>>>>> with?
>>>>>>
>>>>>> Observations on embedding
>>>>>> - Behavior has been predictable, and errors have been clear
>>>>>> - There are multiple placement options for schematron rules in a
>>>>> schema
>>>>>> - The Validator API has held up well, but might be one issue to come
>>>>> out of
>>>>>> this effort
>>>>>>
>>>>>> Examples at
>>>>>>
>>>>> https://github.com/jw3/incubator-daffodil/tree/validator_spi/embedded_schematron/daffodil-schematron/src/test/resources/xsd
>>>>>>
>>>>>
>>>>
> 


Re: Embedded Schematron progress

Posted by John Wass <jw...@gmail.com>.
Writing some CLI tests showed a diagnostic warning has gone undetected in
the embedded schematron parse.  Using xs:appinfo to add schematron hits
[AnnotatedSchemaComponent#L403](
https://github.com/apache/incubator-daffodil/blob/master/daffodil-core/src/main/scala/org/apache/daffodil/dsom/AnnotatedSchemaComponent.scala#L403
).

`this.SDW(WarnID.AppinfoNoSource, """xs:appinfo without source attribute.
Is source="http://www.ogf.org/dfdl/" missing?""")`

Thoughts on siliencing or lessening significance of this (eg. log instead
of diagnostic)?


On Wed, Jan 6, 2021 at 9:19 AM John Wass <jw...@gmail.com> wrote:

> The schema and tests for BMP/GIF/JPEG were moved into branches on those
> DFDLSchemas repos.  After this PR is merged and a the next release is
> pubished those tests could be added to each of those repos.  I suppose the
> embedded schematron schema could merged any time without the tests.  Those
> repos would be a good context to continue and resolve the "best practices
> in the schematron" discussions.
>
> On Tue, Dec 22, 2020 at 9:53 AM John Wass <jw...@gmail.com> wrote:
>
>> > The second one is similar to examples in the GIF schema
>> <https://github.com/DFDLSchemas/GIF/blob/master/src/main/resources/com/mitre/gif/sch/GIF.sch#L74-L76>.
>> That schema can be added in the PR unit tests, to go along with the BMP and
>> JPEG.
>>
>> Added the gif schema to the tests, looking good.  Specifically looked at
>> rule `count(/GIF/Global_Color_Table/RGB) eq math:pow(2,
>> ../number(Size_of_Global_Color_Table) + 1)`.
>>
>> Working on embedding the bmp schema now as the final integration test.
>>
>>
>> On Mon, Dec 21, 2020 at 7:49 AM John Wass <jw...@gmail.com> wrote:
>>
>>> > Does the process create SVRL files when it completes?
>>>
>>> No, the svrl is consumed and converted into Daffodil diagnostics.
>>>
>>>
>>> >  Is there a commandline option to direct the SVRL file to a specific
>>> path and name?
>>>
>>> It doesn't, but is a good idea and certainly could.  Passing a flag
>>> through the validator config could trigger writing the file.
>>>
>>> Probably be in a follow up PR.
>>>
>>>
>>> > I'm curious of those type of tests will work with this process.
>>>
>>> They should.  The first can be checked in a unit test that matches a
>>> byte.  The second one is similar to examples in the GIF schema
>>> <https://github.com/DFDLSchemas/GIF/blob/master/src/main/resources/com/mitre/gif/sch/GIF.sch#L74-L76>.
>>> That schema can be added in the PR unit tests, to go along with the BMP and
>>> JPEG.
>>>
>>>
>>>
>>>
>>> On Fri, Dec 18, 2020 at 2:43 PM Rege Nteligen <r....@gmail.com>
>>> wrote:
>>>
>>>> I took a look at the sample xsd's with the imbedded schematron
>>>> asserts.  It looks good.  Does the process create SVRL files when it
>>>> completes?  Is there a commandline option to direct the SVRL file to a
>>>> specific path and name?
>>>>
>>>> I was recently working with a modified daffodil GIF schema and
>>>> schematron to report various findings with GIF files.  Several test
>>>> involved testting that keyword were not in HEX blob fields.  I'm curious of
>>>> those type of tests will work with this process.  This is a sample assert:
>>>>              <sch:assert test="not(./LSD_Blob[matches(.,'73716C')]) and
>>>> not(./LSD_Blob[matches(.,'53514C')])">
>>>>                 GIF: FAIL: LSD_Blob: AFTER-HDR-REF-SQL: Possible
>>>> malicious SQL reference between segemnts
>>>>             </sch:assert>
>>>>
>>>> I've also done test to see if the count of bytes in one field matched
>>>> the size of the field value from another field:
>>>>             <sch:assert test="if(/GIF/Global_Color_Table/RGB) then
>>>> count(/GIF/Global_Color_Table/RGB) eq math:pow(2,
>>>> Packed_Byte/number(Size_of_Global_Color_Table) + 1) else true()">
>>>>                 GIF: RED: LSG_GCL: GCL-RGB-CNT: There must be
>>>> Size_of_Global_Color_Table RGB values.
>>>>             </sch:assert>
>>>>
>>>>
>>>>
>>>> On 2020/12/18 17:21:02, John Wass <jw...@gmail.com> wrote:
>>>> > The Embedded Schematron PR is moving along, hoping to get it out of
>>>> WIP
>>>> > soon.  https://github.com/apache/incubator-daffodil/pull/463
>>>> >
>>>> > The JPEG and BMP schema repos are being used for testing now, and the
>>>> PNG
>>>> > looks like it would provide some great coverage.. maybe too great :/
>>>> Any
>>>> > other noteworthy sources of sch+data that might be beneficial to test
>>>> with?
>>>> >
>>>> > Observations on embedding
>>>> > - Behavior has been predictable, and errors have been clear
>>>> > - There are multiple placement options for schematron rules in a
>>>> schema
>>>> > - The Validator API has held up well, but might be one issue to come
>>>> out of
>>>> > this effort
>>>> >
>>>> > Examples at
>>>> >
>>>> https://github.com/jw3/incubator-daffodil/tree/validator_spi/embedded_schematron/daffodil-schematron/src/test/resources/xsd
>>>> >
>>>>
>>>

Re: Embedded Schematron progress

Posted by John Wass <jw...@gmail.com>.
The schema and tests for BMP/GIF/JPEG were moved into branches on those
DFDLSchemas repos.  After this PR is merged and a the next release is
pubished those tests could be added to each of those repos.  I suppose the
embedded schematron schema could merged any time without the tests.  Those
repos would be a good context to continue and resolve the "best practices
in the schematron" discussions.

On Tue, Dec 22, 2020 at 9:53 AM John Wass <jw...@gmail.com> wrote:

> > The second one is similar to examples in the GIF schema
> <https://github.com/DFDLSchemas/GIF/blob/master/src/main/resources/com/mitre/gif/sch/GIF.sch#L74-L76>.
> That schema can be added in the PR unit tests, to go along with the BMP and
> JPEG.
>
> Added the gif schema to the tests, looking good.  Specifically looked at
> rule `count(/GIF/Global_Color_Table/RGB) eq math:pow(2,
> ../number(Size_of_Global_Color_Table) + 1)`.
>
> Working on embedding the bmp schema now as the final integration test.
>
>
> On Mon, Dec 21, 2020 at 7:49 AM John Wass <jw...@gmail.com> wrote:
>
>> > Does the process create SVRL files when it completes?
>>
>> No, the svrl is consumed and converted into Daffodil diagnostics.
>>
>>
>> >  Is there a commandline option to direct the SVRL file to a specific
>> path and name?
>>
>> It doesn't, but is a good idea and certainly could.  Passing a flag
>> through the validator config could trigger writing the file.
>>
>> Probably be in a follow up PR.
>>
>>
>> > I'm curious of those type of tests will work with this process.
>>
>> They should.  The first can be checked in a unit test that matches a
>> byte.  The second one is similar to examples in the GIF schema
>> <https://github.com/DFDLSchemas/GIF/blob/master/src/main/resources/com/mitre/gif/sch/GIF.sch#L74-L76>.
>> That schema can be added in the PR unit tests, to go along with the BMP and
>> JPEG.
>>
>>
>>
>>
>> On Fri, Dec 18, 2020 at 2:43 PM Rege Nteligen <r....@gmail.com>
>> wrote:
>>
>>> I took a look at the sample xsd's with the imbedded schematron asserts.
>>> It looks good.  Does the process create SVRL files when it completes?  Is
>>> there a commandline option to direct the SVRL file to a specific path and
>>> name?
>>>
>>> I was recently working with a modified daffodil GIF schema and
>>> schematron to report various findings with GIF files.  Several test
>>> involved testting that keyword were not in HEX blob fields.  I'm curious of
>>> those type of tests will work with this process.  This is a sample assert:
>>>              <sch:assert test="not(./LSD_Blob[matches(.,'73716C')]) and
>>> not(./LSD_Blob[matches(.,'53514C')])">
>>>                 GIF: FAIL: LSD_Blob: AFTER-HDR-REF-SQL: Possible
>>> malicious SQL reference between segemnts
>>>             </sch:assert>
>>>
>>> I've also done test to see if the count of bytes in one field matched
>>> the size of the field value from another field:
>>>             <sch:assert test="if(/GIF/Global_Color_Table/RGB) then
>>> count(/GIF/Global_Color_Table/RGB) eq math:pow(2,
>>> Packed_Byte/number(Size_of_Global_Color_Table) + 1) else true()">
>>>                 GIF: RED: LSG_GCL: GCL-RGB-CNT: There must be
>>> Size_of_Global_Color_Table RGB values.
>>>             </sch:assert>
>>>
>>>
>>>
>>> On 2020/12/18 17:21:02, John Wass <jw...@gmail.com> wrote:
>>> > The Embedded Schematron PR is moving along, hoping to get it out of WIP
>>> > soon.  https://github.com/apache/incubator-daffodil/pull/463
>>> >
>>> > The JPEG and BMP schema repos are being used for testing now, and the
>>> PNG
>>> > looks like it would provide some great coverage.. maybe too great :/
>>> Any
>>> > other noteworthy sources of sch+data that might be beneficial to test
>>> with?
>>> >
>>> > Observations on embedding
>>> > - Behavior has been predictable, and errors have been clear
>>> > - There are multiple placement options for schematron rules in a schema
>>> > - The Validator API has held up well, but might be one issue to come
>>> out of
>>> > this effort
>>> >
>>> > Examples at
>>> >
>>> https://github.com/jw3/incubator-daffodil/tree/validator_spi/embedded_schematron/daffodil-schematron/src/test/resources/xsd
>>> >
>>>
>>

Re: Embedded Schematron progress

Posted by John Wass <jw...@gmail.com>.
> The second one is similar to examples in the GIF schema
<https://github.com/DFDLSchemas/GIF/blob/master/src/main/resources/com/mitre/gif/sch/GIF.sch#L74-L76>.
That schema can be added in the PR unit tests, to go along with the BMP and
JPEG.

Added the gif schema to the tests, looking good.  Specifically looked at
rule `count(/GIF/Global_Color_Table/RGB) eq math:pow(2,
../number(Size_of_Global_Color_Table) + 1)`.

Working on embedding the bmp schema now as the final integration test.


On Mon, Dec 21, 2020 at 7:49 AM John Wass <jw...@gmail.com> wrote:

> > Does the process create SVRL files when it completes?
>
> No, the svrl is consumed and converted into Daffodil diagnostics.
>
>
> >  Is there a commandline option to direct the SVRL file to a specific
> path and name?
>
> It doesn't, but is a good idea and certainly could.  Passing a flag
> through the validator config could trigger writing the file.
>
> Probably be in a follow up PR.
>
>
> > I'm curious of those type of tests will work with this process.
>
> They should.  The first can be checked in a unit test that matches a
> byte.  The second one is similar to examples in the GIF schema
> <https://github.com/DFDLSchemas/GIF/blob/master/src/main/resources/com/mitre/gif/sch/GIF.sch#L74-L76>.
> That schema can be added in the PR unit tests, to go along with the BMP and
> JPEG.
>
>
>
>
> On Fri, Dec 18, 2020 at 2:43 PM Rege Nteligen <r....@gmail.com>
> wrote:
>
>> I took a look at the sample xsd's with the imbedded schematron asserts.
>> It looks good.  Does the process create SVRL files when it completes?  Is
>> there a commandline option to direct the SVRL file to a specific path and
>> name?
>>
>> I was recently working with a modified daffodil GIF schema and schematron
>> to report various findings with GIF files.  Several test involved testting
>> that keyword were not in HEX blob fields.  I'm curious of those type of
>> tests will work with this process.  This is a sample assert:
>>              <sch:assert test="not(./LSD_Blob[matches(.,'73716C')]) and
>> not(./LSD_Blob[matches(.,'53514C')])">
>>                 GIF: FAIL: LSD_Blob: AFTER-HDR-REF-SQL: Possible
>> malicious SQL reference between segemnts
>>             </sch:assert>
>>
>> I've also done test to see if the count of bytes in one field matched the
>> size of the field value from another field:
>>             <sch:assert test="if(/GIF/Global_Color_Table/RGB) then
>> count(/GIF/Global_Color_Table/RGB) eq math:pow(2,
>> Packed_Byte/number(Size_of_Global_Color_Table) + 1) else true()">
>>                 GIF: RED: LSG_GCL: GCL-RGB-CNT: There must be
>> Size_of_Global_Color_Table RGB values.
>>             </sch:assert>
>>
>>
>>
>> On 2020/12/18 17:21:02, John Wass <jw...@gmail.com> wrote:
>> > The Embedded Schematron PR is moving along, hoping to get it out of WIP
>> > soon.  https://github.com/apache/incubator-daffodil/pull/463
>> >
>> > The JPEG and BMP schema repos are being used for testing now, and the
>> PNG
>> > looks like it would provide some great coverage.. maybe too great :/
>> Any
>> > other noteworthy sources of sch+data that might be beneficial to test
>> with?
>> >
>> > Observations on embedding
>> > - Behavior has been predictable, and errors have been clear
>> > - There are multiple placement options for schematron rules in a schema
>> > - The Validator API has held up well, but might be one issue to come
>> out of
>> > this effort
>> >
>> > Examples at
>> >
>> https://github.com/jw3/incubator-daffodil/tree/validator_spi/embedded_schematron/daffodil-schematron/src/test/resources/xsd
>> >
>>
>

Re: Embedded Schematron progress

Posted by John Wass <jw...@gmail.com>.
> Does the process create SVRL files when it completes?

No, the svrl is consumed and converted into Daffodil diagnostics.


>  Is there a commandline option to direct the SVRL file to a specific path
and name?

It doesn't, but is a good idea and certainly could.  Passing a flag through
the validator config could trigger writing the file.

Probably be in a follow up PR.


> I'm curious of those type of tests will work with this process.

They should.  The first can be checked in a unit test that matches a byte.
The second one is similar to examples in the GIF schema
<https://github.com/DFDLSchemas/GIF/blob/master/src/main/resources/com/mitre/gif/sch/GIF.sch#L74-L76>.
That schema can be added in the PR unit tests, to go along with the BMP and
JPEG.




On Fri, Dec 18, 2020 at 2:43 PM Rege Nteligen <r....@gmail.com> wrote:

> I took a look at the sample xsd's with the imbedded schematron asserts.
> It looks good.  Does the process create SVRL files when it completes?  Is
> there a commandline option to direct the SVRL file to a specific path and
> name?
>
> I was recently working with a modified daffodil GIF schema and schematron
> to report various findings with GIF files.  Several test involved testting
> that keyword were not in HEX blob fields.  I'm curious of those type of
> tests will work with this process.  This is a sample assert:
>              <sch:assert test="not(./LSD_Blob[matches(.,'73716C')]) and
> not(./LSD_Blob[matches(.,'53514C')])">
>                 GIF: FAIL: LSD_Blob: AFTER-HDR-REF-SQL: Possible malicious
> SQL reference between segemnts
>             </sch:assert>
>
> I've also done test to see if the count of bytes in one field matched the
> size of the field value from another field:
>             <sch:assert test="if(/GIF/Global_Color_Table/RGB) then
> count(/GIF/Global_Color_Table/RGB) eq math:pow(2,
> Packed_Byte/number(Size_of_Global_Color_Table) + 1) else true()">
>                 GIF: RED: LSG_GCL: GCL-RGB-CNT: There must be
> Size_of_Global_Color_Table RGB values.
>             </sch:assert>
>
>
>
> On 2020/12/18 17:21:02, John Wass <jw...@gmail.com> wrote:
> > The Embedded Schematron PR is moving along, hoping to get it out of WIP
> > soon.  https://github.com/apache/incubator-daffodil/pull/463
> >
> > The JPEG and BMP schema repos are being used for testing now, and the PNG
> > looks like it would provide some great coverage.. maybe too great :/  Any
> > other noteworthy sources of sch+data that might be beneficial to test
> with?
> >
> > Observations on embedding
> > - Behavior has been predictable, and errors have been clear
> > - There are multiple placement options for schematron rules in a schema
> > - The Validator API has held up well, but might be one issue to come out
> of
> > this effort
> >
> > Examples at
> >
> https://github.com/jw3/incubator-daffodil/tree/validator_spi/embedded_schematron/daffodil-schematron/src/test/resources/xsd
> >
>

Re: Embedded Schematron progress

Posted by Rege Nteligen <r....@gmail.com>.
I took a look at the sample xsd's with the imbedded schematron asserts.  It looks good.  Does the process create SVRL files when it completes?  Is there a commandline option to direct the SVRL file to a specific path and name?

I was recently working with a modified daffodil GIF schema and schematron to report various findings with GIF files.  Several test involved testting that keyword were not in HEX blob fields.  I'm curious of those type of tests will work with this process.  This is a sample assert:
             <sch:assert test="not(./LSD_Blob[matches(.,'73716C')]) and not(./LSD_Blob[matches(.,'53514C')])">
                GIF: FAIL: LSD_Blob: AFTER-HDR-REF-SQL: Possible malicious SQL reference between segemnts
            </sch:assert>

I've also done test to see if the count of bytes in one field matched the size of the field value from another field:
            <sch:assert test="if(/GIF/Global_Color_Table/RGB) then count(/GIF/Global_Color_Table/RGB) eq math:pow(2, Packed_Byte/number(Size_of_Global_Color_Table) + 1) else true()">
                GIF: RED: LSG_GCL: GCL-RGB-CNT: There must be Size_of_Global_Color_Table RGB values.
            </sch:assert>



On 2020/12/18 17:21:02, John Wass <jw...@gmail.com> wrote: 
> The Embedded Schematron PR is moving along, hoping to get it out of WIP
> soon.  https://github.com/apache/incubator-daffodil/pull/463
> 
> The JPEG and BMP schema repos are being used for testing now, and the PNG
> looks like it would provide some great coverage.. maybe too great :/  Any
> other noteworthy sources of sch+data that might be beneficial to test with?
> 
> Observations on embedding
> - Behavior has been predictable, and errors have been clear
> - There are multiple placement options for schematron rules in a schema
> - The Validator API has held up well, but might be one issue to come out of
> this effort
> 
> Examples at
> https://github.com/jw3/incubator-daffodil/tree/validator_spi/embedded_schematron/daffodil-schematron/src/test/resources/xsd
>