Posted to dev@daffodil.apache.org by "Sloane, Brandon" <bs...@tresys.com> on 2019/04/05 21:12:26 UTC

Exposing latent SDEs

This is related to the previous thread with the subject "Further design difficulties with TypeValueCalculators". I believe I have solved the main issue of that thread by computing attributes that do not depend on the context in the SimpleTypeDefFactory instead of the instance class [0].


However, there is still an issue where I am changing the behaviour of Daffodil to compile aspects of simpleTypes regardless of whether they are used. We avoid the previous problem by limiting these aspects to those whose correctness does not depend on the local context. However, if an unused simpleType is just plain broken, it will now emit an SDE.


For instance, in section05/facets/Facets.tdml we have the following schema:

    <xs:simpleType name="enum_st1">
      <xs:restriction base="xs:string">
        <xs:enumeration value="Trout" />
        <xs:enumeration value="Bass" />
        <xs:enumeration value="Catfish" />
      </xs:restriction>
    </xs:simpleType>

    <xs:simpleType name="enum_st4">
      <xs:restriction base="ex:enum_st1">
        <xs:enumeration value="Trout" />
        <xs:enumeration value="Bass" />
        <xs:enumeration value="Carp" />
      </xs:restriction>
    </xs:simpleType>

As test case facetEnum06 verifies, enum_st4 is broken because "Local enumerations must be a subset of base enumerations"
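The rule quoted above amounts to a set-inclusion test between the derived type's enumeration values and the base type's. A minimal sketch, assuming a simplified model of a restriction; EnumRestriction and checkEnumSubset are hypothetical names for illustration, not Daffodil's implementation:

```scala
// Hypothetical model of a simpleType restriction that carries enumeration facets.
final case class EnumRestriction(name: String, values: Set[String])

object EnumFacetCheck {
  // Returns Left(message) when the derived type lists values the base does not allow.
  def checkEnumSubset(base: EnumRestriction, derived: EnumRestriction): Either[String, Unit] = {
    val extra = derived.values.diff(base.values)
    if (extra.isEmpty) Right(())
    else Left(
      s"${derived.name}: Local enumerations must be a subset of base enumerations " +
        s"(not in ${base.name}: ${extra.toSeq.sorted.mkString(", ")})"
    )
  }
}

object EnumFacetExample {
  // The two types from the schema fragment above.
  val enumSt1 = EnumRestriction("enum_st1", Set("Trout", "Bass", "Catfish"))
  val enumSt4 = EnumRestriction("enum_st4", Set("Trout", "Bass", "Carp"))
}
```

With these definitions, checking enum_st4 against enum_st1 yields a Left because "Carp" is not among the base values, whether or not any element uses enum_st4.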

The issue I am now running into is that all tests that use that schema are now failing due to this, even if they do not actually use enum_st4.

Abstractly, I don't mind calling this acceptable behaviour, as there is an SDE in any schema containing enum_st4, even if the original implementation ignored it, and I don't mind updating the relevant test files to isolate these broken types in their own schemas. But I wanted to verify that it is okay to make this sort of backwards-incompatible change.


[0] This involved a fair amount of refactoring. There is more refactoring that can be done along these lines (which I believe will help with our performance issue), but I only did what was needed to support the functionality I am adding.


Regards,


Brandon T. Sloane

Associate, Services

bsloane@tresys.com | tresys.com

Re: Exposing latent SDEs

Posted by "Beckerle, Mike" <mb...@tresys.com>.

To achieve Seq[=> T] you will need a call-by-name constructor. Traditionally these are named Delay:

class Delay[T](byNameValue: => T) {
   lazy val value = byNameValue
   def get = value
}

So you need to create a Seq[Delay[T]], not a Seq[=> T].

To force the evaluation you just call get or value on the delay object.

The import/include logic in Daffodil uses this trick. So there's precedent.
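As a rough illustration of how a Seq of Delay objects keeps an unused broken type from raising an error, here is a minimal sketch. Only the Delay class follows the shape given above; TypeFactory, compile, and lookup are hypothetical stand-ins, and the exception merely simulates an SDE:

```scala
// Same shape as the Delay class above: the by-name argument is evaluated at most once.
class Delay[T](byNameValue: => T) {
  lazy val value: T = byNameValue
  def get: T = value
}

// Hypothetical stand-in for a factory whose construction can fail with an SDE.
final case class TypeFactory(name: String)

object DelayDemo {
  // Records which types were actually compiled, most recent first.
  var compiled = List.empty[String]

  def compile(name: String, broken: Boolean): TypeFactory = {
    if (broken) throw new IllegalStateException(s"SDE in $name")
    compiled = name :: compiled
    TypeFactory(name)
  }

  // Building the Seq evaluates nothing: each compile call is deferred inside its Delay.
  val defs: Seq[(String, Delay[TypeFactory])] = Seq(
    "enum_st1" -> new Delay(compile("enum_st1", broken = false)),
    "enum_st4" -> new Delay(compile("enum_st4", broken = true))
  )

  // Forces only the Delay whose name matches.
  def lookup(name: String): Option[TypeFactory] =
    defs.collectFirst { case (n, d) if n == name => d.get }
}
```

Looking up enum_st1 compiles only that type; the simulated SDE for enum_st4 surfaces only if something asks for enum_st4.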

________________________________
From: Sloane, Brandon <bs...@tresys.com>
Sent: Monday, April 8, 2019 6:29 PM
To: dev@daffodil.apache.org
Subject: Re: Exposing latent SDEs

It appears that hacking around this is not as simple as I would like. The problem is being triggered from SchemaSet, when we evaluate the line:


  *   lazy val globalSimpleTypeDefs: Seq[GlobalSimpleTypeDefFactory] = schemas.flatMap(_.globalSimpleTypeDefs)


To suppress errors from unused SimpleTypes as suggested, we would want the resulting type to be:

  *   Seq[ => GlobalSimpleTypeDefFactory]

Which does not appear to be something that we can do (Scala does not recognize this signature as syntactically valid).

We could be even more explicit about it, and use the type:

  *   Seq[ () => GlobalSimpleTypeDefFactory]

Which should work, but seems even more hacky. (In particular, we would need to be careful that we actually cache the values if we want to maintain the at-most-once semantics we expect from lazy values.)


If you are curious, the actual issue (at least in the case I am looking at now) is being triggered by the "requiredEvaluations(defaultPropertySources)" line of AnnotatedSchemaComponent, which is a trait of GlobalSimpleTypeDefFactory. (Now that we are actually computing things on the factory, it needs access to some of the annotations.)


I don't really understand what the purpose of requiredEvaluations is, so I don't want to remove it.


Again, the only time this would be an issue is when we have schema which A) contains an error but B) happens to work if we ignore the error.


Given A), I would like to once again ask if it is acceptable to change our behavior to reject such schemas. This will involve refactoring a number of tests which deliberately include broken schema to test for error messages.

________________________________
From: Sloane, Brandon <bs...@tresys.com>
Sent: Friday, April 5, 2019 6:19:46 PM
To: dev@daffodil.apache.org
Subject: Re: Exposing latent SDEs

The issue is that we need to compile the map of GlobalSimpleTypeFactories, as that is the data structure that the compiler uses whenever it needs to look up a type by qname.


I suppose we could change the type of that data structure from (guessing at what the original structure looks like) Map[QName, GlobalSimpleTypeFactory] to Map[QName, => GlobalSimpleTypeFactory], which probably will do what we want, but we are then relying on laziness for our program to be correct, which always makes me a bit nervous.


The only thing this gets us is the ability to compile broken schema so long as the broken part is not being used. Apart from backwards compatibility concerns, I am not sure we are doing anyone any favors by allowing this.

________________________________
From: Beckerle, Mike <mb...@tresys.com>
Sent: Friday, April 5, 2019 5:59:12 PM
To: dev@daffodil.apache.org
Subject: Re: Exposing latent SDEs

Do we have to compile simple types even if unused? Can't we compile them lazily if used?

I am very happy to restrict expressions that use simple type qnames so that the qnames have to be literal constants. Then compiling the expressions would provide the qnames of the types actually being used.


Re: Exposing latent SDEs

Posted by "Sloane, Brandon" <bs...@tresys.com>.

Overall, I suspect that any exactly-once work we can do would be a net performance increase over our current behavior (which will repeat the work for every usage) in most cases, even for schemas that define many unused types (since, in many cases, there would also be a lot of generated schema that uses some of those types many times).

Switching to at-most-once semantics would probably be a performance improvement in most cases. All of the fields of GlobalSimpleTypeDefFactory are lazy, and I was able to move the requiredEvaluations off of the factory classes without breaking any tests, which seems to have solved some of the problems.

The only remaining issue we need to decide about is related to the actual typeCalculator implementation.

In theory, it is possible for someone to write silly expressions, such as { dfdl:typeInputCalcInt(../functionName, ../functionValue) }

which would require us to load the type calculator for all globalSimpleTypes which define one (and therefore to evaluate enough of every globalSimpleType to determine whether it has one). Currently, this should be what we are doing, as nothing else evaluates unused simple types, but I suspect the process of determining whether a type defines a typeCalc involves unnecessarily computing much of what the typeCalc would be (which should be fixable, but it is likely we would accidentally reintroduce a data dependency without noticing at some point in the future).

We could also, as Mike has suggested previously, not allow cases like the above, and insist that the function name parameter is always a constant. This would be a fair bit more work to implement in the compiler that I would prefer to avoid doing.
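To make the constant-only restriction concrete, here is a minimal sketch of how a compiler pass might collect the type names an expression refers to and reject a non-literal name. The Expr classes and referencedTypes are hypothetical and do not reflect Daffodil's expression compiler; the sketch also assumes the type name is the first argument:

```scala
// Hypothetical, heavily simplified expression tree.
sealed trait Expr
final case class StringLiteral(value: String) extends Expr
final case class PathExpr(path: String) extends Expr
final case class FunctionCall(name: String, args: List[Expr]) extends Expr

object TypeCalcRefs {
  // Assumption: functions whose first argument names a simpleType.
  private val typeCalcFunctions = Set("dfdl:typeInputCalcInt")

  // Right(names) lists the simpleTypes the expression uses;
  // Left(message) reports a type name that is not a literal constant.
  def referencedTypes(e: Expr): Either[String, Set[String]] = e match {
    case FunctionCall(fn, StringLiteral(qname) :: rest) if typeCalcFunctions(fn) =>
      collectAll(rest).map(_ + qname)
    case FunctionCall(fn, _) if typeCalcFunctions(fn) =>
      Left(s"$fn: type name argument must be a literal constant")
    case FunctionCall(_, args) => collectAll(args)
    case _ => Right(Set.empty)
  }

  // Combines the results for a list of sub-expressions, stopping at the first error.
  private def collectAll(es: List[Expr]): Either[String, Set[String]] =
    es.foldLeft[Either[String, Set[String]]](Right(Set.empty)) { (acc, e) =>
      for (a <- acc; b <- referencedTypes(e)) yield a ++ b
    }
}
```

Under this restriction, the set of types that need their type calculators loaded is known once the expressions are compiled, so no global scan of simpleTypes is needed.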

I think it is not unreasonable for a compiler to spend time looking at dead code. If there is a schema where that is a significant issue, they can add a pre-compilation step to strip out unused types. But I suspect the time spent partially analyzing all global types is not going to be significant. I would prefer to see profiling data to the contrary before spending time worrying about it.


Re: Exposing latent SDEs

Posted by Steve Lawrence <sl...@apache.org>.

I personally don't have any problem with detecting errors even if those
elements aren't used. It is a backwards-incompatible change, but no one
should complain about improved error detection.

I do have a concern that this is potentially a big compile-time
performance hit. I can easily imagine cases where someone generates a
large set of enumerations based on some specification but only some
small set are actually used in someone's use case. In this case, we
probably want to avoid compiling/checking every single enumeration if
we're never going to use it. Doing things lazily should avoid that. It
might make sense to have an option to allow checking everything, even
things not used, but I would prefer that option to default to off.
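One way to picture such an option is a flag that widens the set of types whose deferred checks get run. This is only a sketch under assumed names (SchemaCheck, diagnostics, checkUnused); it is not an existing Daffodil setting:

```scala
object SchemaCheck {
  // Hypothetical: each global type name maps to a deferred check that
  // returns Some(message) when the type has a schema definition error.
  def diagnostics(
      checks: Map[String, () => Option[String]],
      used: Set[String],
      checkUnused: Boolean = false): List[String] = {
    // By default only the types that are actually used get checked.
    val toCheck = if (checkUnused) checks.keySet else checks.keySet.intersect(used)
    toCheck.toList.sorted.flatMap(name => checks(name)().map(msg => s"$name: $msg"))
  }
}
```

With the default, a broken but unused type costs nothing and reports nothing; passing checkUnused = true runs every check and surfaces the latent error.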

- Steve




Re: Exposing latent SDEs

Posted by "Sloane, Brandon" <bs...@tresys.com>.

It appears that hacking around this is not as simple as I would like. The problem is being triggered from SchemaSet, when we evaluate the line:


  *   lazy val globalSimpleTypeDefs: Seq[GlobalSimpleTypeDefFactory] = schemas.flatMap(_.globalSimpleTypeDefs)


To suppress errors from unused SimpleTypes as suggested, we would want the resulting type to be:

  *   Seq[ => GlobalSimpleTypeDefFactory]

Which does not appear to be something that we can do (Scala does not recognize this signature as syntactically valid).

We could be even more explicit about it, and use the type:

  *   Seq[ () => GlobalSimpleTypeDefFactory]

Which should work, but seems even more hacky. (In particular, we would need to be careful that we actually cache the values if we want to maintain the at-most-once semantics we expect from lazy values.)
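The caching concern can be illustrated with a small wrapper that turns a plain () => T thunk into one that runs at most once. This is a sketch only; memoize is a hypothetical helper, not a Daffodil or standard-library function:

```scala
object Memo {
  // Wraps a thunk so its body runs at most once; later calls return the cached result.
  def memoize[T](thunk: () => T): () => T = {
    lazy val cached: T = thunk()
    () => cached
  }
}

object MemoDemo {
  // Counts how many times the underlying computation actually runs.
  var evaluations = 0

  // A bare thunk recomputes on every call.
  val plain: () => String = () => { evaluations += 1; "GlobalSimpleTypeDefFactory(enum_st1)" }

  // The memoized version evaluates `plain` at most once.
  val once: () => String = Memo.memoize(plain)
}
```

Calling MemoDemo.once() repeatedly increments the counter a single time, whereas each call to MemoDemo.plain() increments it again, which is the difference between lazy-val semantics and a bare function value.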


If you are curious, the actual issue (at least in the case I am looking at now) is being triggered by the "requiredEvaluations(defaultPropertySources)" line of AnnotatedSchemaComponent, which is a trait of GlobalSimpleTypeDefFactory. (Now that we are actually computing things on the factory, it needs access to some of the annotations.)


I don't really understand what the purpose of requiredEvaluations is, so I don't want to remove it.


Again, the only time this would be an issue is when we have schema which A) contains an error but B) happens to work if we ignore the error.


Given A), I would like to once again ask if it is acceptable to change our behavior to reject such schemas. This will involve refactoring a number of tests which deliberately include broken schema to test for error messages.


Re: Exposing latent SDEs

Posted by "Beckerle, Mike" <mb...@tresys.com>.

We rely on laziness for correctness all over the place in Daffodil. So I wouldn't worry about that.

There are tons of lazy "attribute" calculations which only make sense if a wide variety of invariants hold, and we're depending on the fact that if those invariants do NOT hold, nobody will demand this lazy value. E.g., there are tons of pieces of code that only make sense for simple types, and we depend on the fact that they are never requested when the element is of complex type.

________________________________
From: Sloane, Brandon <bs...@tresys.com>
Sent: Friday, April 5, 2019 6:19 PM
To: dev@daffodil.apache.org
Subject: Re: Exposing latent SDEs

The issue is that we need to compile the map of GlobalSimpleTypeFactories, as that is the data structure that the compiler uses whenever it needs to look up a type by qname.

I suppose we could change the type of that data structure from (guessing at what the original structure looks like) Map[QName, GlobalSimpleTypeFactory] tp Map[QName, => GlobalSimpleTypeFactory], which probably will do what we want, but we are then relying on lazyness for our program to be correct, which always makes me a bit nervous.

The only thing this gets us is the ability to compile broken schema so long as the broken part is not being used. Apart from backwards compatibility concerns, I am not sure we are doing anyone any favors by allowing this.

________________________________
From: Beckerle, Mike <mb...@tresys.com>
Sent: Friday, April 5, 2019 5:59:12 PM
To: dev@daffodil.apache.org
Subject: Re: Exposing latent SDEs

Do we have to compile simple types even if unused? Cant we compile them lazily if used.

I am very happy to restrict expressions that use simple type qnames for them to have to be literal constants. Then compiling the expressions would provide the qnames of the types actually being used.

Get Outlook for Android<https://aka.ms/ghei36>

________________________________
From: Sloane, Brandon <bs...@tresys.com>
Sent: Friday, April 5, 2019 5:12:26 PM
To: dev@daffodil.apache.org
Subject: Exposing latent SDEs

This is related to the previous thread with the subject "Further design difficulties with TypeValueCalculators". I believe I have solved the main issue of that thread by computing attributes that do not depend on the context in the SimpleTypeDefFactory instead of the instance class [0].

However, there is still an issue where I am changing the behaviour of Daffodil to compile aspects simpleTypes regardless of if they are used or not. We avoid the previous problem by making these aspects only those whose correctness does not depend on the local context. However, there is still an issue where if an unused simpleType is just plain broken, it will now emit an SDE.

For instance, in section05/facets/Facets.tdml we have the following schema:

    <xs:simpleType name="enum_st1">
      <xs:restriction base="xs:string">
        <xs:enumeration value="Trout" />
        <xs:enumeration value="Bass" />
        <xs:enumeration value="Catfish" />
      </xs:restriction>
    </xs:simpleType>

    <xs:simpleType name="enum_st4">
      <xs:restriction base="ex:enum_st1">
        <xs:enumeration value="Trout" />
        <xs:enumeration value="Bass" />
        <xs:enumeration value="Carp" />
      </xs:restriction>
    </xs:simpleType>

As test case facetEnum06 verifies, enum_st4 is broken because "Local enumerations must be a subset of base enumerations"
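
As a rough illustration of that rule (a sketch only, not Daffodil's actual check; EnumSubsetSketch and notInBase are hypothetical names), the subset test amounts to a set difference over the enumeration values of the two types:

```scala
object EnumSubsetSketch {
  // Returns the local enumeration values that the base type does not allow.
  // An empty result means the restriction is a valid subset.
  def notInBase(base: Set[String], local: Set[String]): Set[String] =
    local.diff(base)

  // Values copied from enum_st1 and enum_st4 in the schema above.
  val enumSt1: Set[String] = Set("Trout", "Bass", "Catfish")
  val enumSt4: Set[String] = Set("Trout", "Bass", "Carp")

  // "Carp" is the only value of enum_st4 that is absent from enum_st1.
  val offending: Set[String] = notInBase(enumSt1, enumSt4)
}
```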

The issue I am now running into is that all tests that use that schema are now failing due to this, even if they do not actually use enum_st4.

Abstractly, I don't mind calling this acceptable behaviour, as there is an SDE in any schema containing enum_st4, even if the original implementation ignored it; and I don't mind updating the relevant test files to isolate these broken types in their own schema, but I wanted to verify that it is okay to make this sort of backwards-incompatible change.

[0] This involved a fair amount of refactoring. There is more refactoring that can be done along these lines (which I believe will help with our performance issue), but I only did what was needed to support the functionality I am adding.

Regards,

Brandon T. Sloane

Associate, Services

bsloane@tresys.com | tresys.com

Re: Exposing latent SDEs

Posted by "Beckerle, Mike" <mb...@tresys.com>.

Do we have to compile simple types even if unused? Can't we compile them lazily, only if they are used?

I am very happy to restrict expressions that use simple type qnames so that those qnames must be literal constants. Then compiling the expressions would provide the qnames of the types actually being used.
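
A minimal sketch of what that could look like, assuming the expression compiler can hand back the set of literal qnames it encountered. All names here (UsedTypesSketch, globalTypes, compileUsed) are hypothetical, and qnames are modelled as plain strings for brevity:

```scala
object UsedTypesSketch {
  // A deferred compilation step: Right(result) on success, Left(SDE message)
  // on failure. Nothing runs until the function is called.
  type Compile = () => Either[String, String]

  // Hypothetical table of every global simple type in the schema.
  val globalTypes: Map[String, Compile] = Map(
    "ex:enum_st1" -> (() => Right("enum_st1 compiled")),
    "ex:enum_st4" ->
      (() => Left("Local enumerations must be a subset of base enumerations"))
  )

  // Compiles only the types named by literal qnames found in expressions.
  // Unknown qnames are skipped here; a real compiler would report them.
  def compileUsed(referenced: Set[String]): Map[String, Either[String, String]] =
    referenced.collect {
      case q if globalTypes.contains(q) => q -> globalTypes(q)()
    }.toMap
}
```

A schema whose expressions only mention ex:enum_st1 would then never touch enum_st4, so its SDE would stay latent.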


________________________________
From: Sloane, Brandon <bs...@tresys.com>
Sent: Friday, April 5, 2019 5:12:26 PM
To: dev@daffodil.apache.org
Subject: Exposing latent SDEs

This is related to the previous thread with the subject "Further design difficulties with TypeValueCalculators". I believe I have solved the main issue of that thread by computing attributes that do not depend on the context in the SimpleTypeDefFactory instead of the instance class [0].

However, there is still an issue where I am changing the behaviour of Daffodil to compile aspects of simpleTypes regardless of whether they are used. We avoid the previous problem by making these aspects only those whose correctness does not depend on the local context. However, there is still an issue where, if an unused simpleType is just plain broken, it will now emit an SDE.

For instance, in section05/facets/Facets.tdml we have the following schema:

    <xs:simpleType name="enum_st1">
      <xs:restriction base="xs:string">
        <xs:enumeration value="Trout" />
        <xs:enumeration value="Bass" />
        <xs:enumeration value="Catfish" />
      </xs:restriction>
    </xs:simpleType>

    <xs:simpleType name="enum_st4">
      <xs:restriction base="ex:enum_st1">
        <xs:enumeration value="Trout" />
        <xs:enumeration value="Bass" />
        <xs:enumeration value="Carp" />
      </xs:restriction>
    </xs:simpleType>

As test case facetEnum06 verifies, enum_st4 is broken because "Local enumerations must be a subset of base enumerations"

The issue I am now running into is that all tests that use that schema are now failing due to this, even if they do not actually use enum_st4.

Abstractly, I don't mind calling this acceptable behaviour, as there is an SDE in any schema containing enum_st4, even if the original implementation ignored it; and I don't mind updating the relevant test files to isolate these broken types in their own schema, but I wanted to verify that it is okay to make this sort of backwards-incompatible change.

[0] This involved a fair amount of refactoring. There is more refactoring that can be done along these lines (which I believe will help with our performance issue), but I only did what was needed to support the functionality I am adding.

Regards,

Brandon T. Sloane

Associate, Services

bsloane@tresys.com | tresys.com