You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@daffodil.apache.org by "Beckerle, Mike" <mb...@tresys.com> on 2020/03/25 14:55:24 UTC

Compiler.setExternalDFDLVariable(s) considered challenged

Why does the API for Daffodil have

Compiler.setExternalDFDLVariable(...)
and
Compiler.setExternalDFDLVariables(...)

on it.

I believe we should deprecate this.

Compilers are parameterized by some of the tunables I understand.

But the external DFDL variables? These cannot affect compilation. The schema compiler needs to know statically the information about variables found in the schema itself in the dfdl:defineVariable statement annotations.

But the compiler doesn't need external variable bindings. In fact if it did know and use them, it would be building assumptions into the compiled schema that it shouldn't be building in.

Setting external var bindings on the Compiler just causes problems when that compiler instance is reused in a context where those settings aren't appropriate. (JIRA DAFFODIL-2302 is one such problem)

I believe setExternalDFDLVariable(s) methods should be deprecated, and external variables bindings should be an optional argument to the parse/unparse methods.

The setters cause thread safety issues because the DP is stateful, even though we want multiple calls to parse/unparse to be executable on different threads.

Consider: if we allow ordinary setExternalDFDLVariables and add a resetExternalDFDLVariables to clear them, then imagine one wants to make two parse calls on separate threads with different external variables bindings:

so on main thread.....
     dp.setExternalVariables(...bindings 1...)
     spawn the thread 1
on the thread 1
     dp.parse(....)
back on main thread
     dp.resetExternalVariables() // race condition. Did the parse call read the external variables before this reset or not?
     dp.setExternalVariables(...bindings 2....)
    .....

However, if we make the external variable bindings an argument to parse, we avoid all of this.

Alternatively, since DataProcessor has setExternalDFDLVariable, we can prohibit multiple calls on the same DataProcessor object simultaneously. We can provide a clone() method that preserves the loaded/reloaded processor, but constructs another DataProcessor object, thereby allowing separate external variables state per DataProcessor instance.


Comments?





Mike Beckerle | Principal Engineer
[Owl Cyber Defense]<http://owlcyberdefense.com>
[cid:2a423dec-0558-414e-b369-14a24becc40f] is now a part of Owl<https://owlcyberdefense.com/news/owl-cyber-defense-tresys-technology-announce-merger/>
P +1-781-330-0412
W owlcyberdefense.com<http://www.owlcyberdefense.com>

Re: Compiler.setExternalDFDLVariable(s) considered challenged

Posted by Steve Lawrence <st...@gmail.com>.
Agreed on deprecating setting variables in the Compiler. It really just
doesn't make sense to be doing that.

Definitely seems reasonable to add the ability to set variables when
calling parse/unparse as that's the most threadsafe way to handle it,
and feels natural to me. Note that we still have other mutable state in
a DataProcessor like tuanbles, validationeMode, debbuging, that we might
want to think about as well. Though those do feel a little different
than variables.

The idea of having to call a clone() method to change a DataProcessors
variables seems a little unintuitive to me. But if we take a more
functional approach and make it entirely non-mutable, it maybe seems
better, something like:

Any thoughts on what the API might like going the parse route vs the
DataProcessor route?


On 3/25/20 10:55 AM, Beckerle, Mike wrote:
> Why does the API for Daffodil have
> 
> Compiler.setExternalDFDLVariable(...)
> and
> Compiler.setExternalDFDLVariables(...)
> 
> on it.
> 
> I believe we should deprecate this.
> 
> Compilers are parameterized by some of the tunables I understand.
> 
> But the external DFDL variables? These cannot affect compilation. The schema 
> compiler needs to know statically the information about variables found in the 
> schema itself in the dfdl:defineVariable statement annotations.
> 
> But the compiler doesn't need external variable bindings. In fact if it did know 
> and use them, it would be building assumptions into the compiled schema that it 
> shouldn't be building in.
> 
> Setting external var bindings on the Compiler just causes problems when that 
> compiler instance is reused in a context where those settings aren't 
> appropriate. (JIRA DAFFODIL-2302 is one such problem)
> 
> I believe setExternalDFDLVariable(s) methods should be deprecated, and external 
> variables bindings should be an optional argument to the parse/unparse methods.
> 
> The setters cause thread safety issues because the DP is stateful, even though 
> we want multiple calls to parse/unparse to be executable on different threads.
> 
> Consider: if we allow ordinary setExternalDFDLVariables and add a 
> resetExternalDFDLVariables to clear them, then imagine one wants to make two 
> parse calls on separate threads with different external variables bindings:
> 
> so on main thread.....
>       dp.setExternalVariables(...bindings 1...)
>       spawn the thread 1
> on the thread 1
>       dp.parse(....)
> back on main thread
>       dp.resetExternalVariables() // race condition. Did the parse call read the 
> external variables before this reset or not?
>       dp.setExternalVariables(...bindings 2....)
>      .....
> 
> However, if we make the external variable bindings an argument to parse, we 
> avoid all of this.
> 
> Alternatively, since DataProcessor has setExternalDFDLVariable, we can prohibit 
> multiple calls on the same DataProcessor object simultaneously. We can provide a 
> clone() method that preserves the loaded/reloaded processor, but constructs 
> another DataProcessor object, thereby allowing separate external variables state 
> per DataProcessor instance.
> 
> 
> Comments?
> 
> 
> 
> 
> 
> Mike Beckerle | Principal Engineer
> Owl Cyber Defense <http://owlcyberdefense.com>
> /is now a part of Owl 
> <https://owlcyberdefense.com/news/owl-cyber-defense-tresys-technology-announce-merger/>/
> P +1-781-330-0412
> W owlcyberdefense.com <http://www.owlcyberdefense.com>
> 


Re: Compiler.setExternalDFDLVariable(s) considered challenged

Posted by Steve Lawrence <sl...@apache.org>.
Sounds good to me. +1

On 3/26/20 11:52 AM, Beckerle, Mike wrote:
> You definitely cannot share a dp across threads this way.
> 
> I propose just deprecating the setExternalVariable API (along with the other setters) and that pattern generally, and a new more functional style API  for establishing tunables, external variables, and validation modes, that will keep us out of trouble.
> 
> 
> ________________________________
> From: Steve Lawrence <sl...@apache.org>
> Sent: Thursday, March 26, 2020 7:39 AM
> To: dev@daffodil.apache.org <de...@daffodil.apache.org>
> Subject: Re: Compiler.setExternalDFDLVariable(s) considered challenged
> 
> Even if we did add the locking, it's still possible to use these
> incorrectly, right? For example, say we had two threads like this:
> 
>   Thread {
>     dp.setExternalVariable("foo", 1)
>     dp.parse(foo1)
>   }
> 
>   Thread {
>     dp.setExternalVariable("foo", 2)
>     dp.parse(foo2)
>   }
> 
> The order of evaluation could be:
> 
>   dp.setExternalVariable("foo", 1)
>   dp.setExternalVariable("foo", 2)
>   dp.parse(foo1)
>   dp.parse(foo2)
> 
> So foo1 is parsed with foo=2. Even if we add locking to Daffodil, the
> use of setExternalVariable within threads can cause unexpected behavior.
> Locking needs to happen outside of Daffodil's control to make it work as
> expected. And if someone adds their own locking around everything, then
> our internal locking isn't necessary.
> 
> It feels to me like setExternalVariable is just fundamentally broken,
> and so trying to fix it is just going to add complication that still
> doesn't solve the fundamental problems.
> 
> 
> On 3/25/20 3:50 PM, Beckerle, Mike wrote:
>> It is possible to modify in a way that is 100% compatible with existing "correct" practice, which is to lock and unlock the object. Lock the state when any parse/unparse call is in flight, unlock when none are in flight.
>>
>> setters can then block waiting for the lock to be freed.
>>
>> That's guaranteed to be compatible with any correct current practice.
>>
>> I just hate to put all that complexity in there.
>> ________________________________
>> From: Steve Lawrence <st...@gmail.com>
>> Sent: Wednesday, March 25, 2020 3:44 PM
>> To: dev@daffodil.apache.org <de...@daffodil.apache.org>
>> Subject: Re: Compiler.setExternalDFDLVariable(s) considered challenged
>>
>> My one minor concern about the locking after parse/unparse is that it
>> changes behavior, and could potentially break things (I assume use after
>> lock is an assertion of some kind?).
>>
>> It's definitely possible to use the DataProcessor in a safe manner while
>> mutating the state currently if someone is careful, so to not allow
>> mutating it after calling parse/unparse could potentially break someones
>> existing use of Daffodil.
>>
>> What are your thoughts on @deprecating those functions, the deprecation
>> message can be something like "This is not thread-safe, be very careful
>> if you do not use it". And rather than locking and asserting, we could
>> just create a warning and say "You're using this in a potentially
>> dangerous manner, you should use the new API"?
>>
>> - Steve
>>
>> On 3/25/20 3:35 PM, Beckerle, Mike wrote:
>>> A fully functional API is probably good to have, and I'd like to define that also and cut over to using that within Daffodil.
>>>
>>> We would need a dataProcessor.reset() method that creates a new dp which has had all external vars, tunables, basically any of those formerly settable things, forgotten so you can start clean.
>>>
>>> (Would clean() be a better name than reset() ?)
>>>
>>> This reset() subsumes my clone() method idea.
>>>
>>> I want to deprecate (literally with "@deprecated" annotations) the setters generally, but my compromise is once you parse/unparse the object becomes immutable permanently. So the setters are just a different way to parameterize the object.
>>>
>>> The other thing that is impacted by this change it the OpenDFDL/ibmCrossTester that implements these same APIs but drives IBM DFDL with them.  That's ok for now. I can retrofit that later. It will still work, just get deprecation warnings.
>>>
>>> ________________________________
>>> From: Steve Lawrence <st...@gmail.com>
>>> Sent: Wednesday, March 25, 2020 12:05 PM
>>> To: dev@daffodil.apache.org <de...@daffodil.apache.org>
>>> Subject: Re: Compiler.setExternalDFDLVariable(s) considered challenged
>>>
>>> This feels like the right approach. Thoughts on instead of having to
>>> manually clone() and having some sort of locking mechanism when parse is
>>> called, we just make a DataProcessor completely immutable, and setting a
>>> variable will implicitly create a copy with modified state, something like:
>>>
>>> val dp = dataProcessor
>>>            .withTunable("tunableFoo", "1")
>>>            .withVariable("variableFoo", 2")
>>>            .withVariable("variableBar", 2")
>>>
>>> It's not as java-y, but feels safer. I can imagine a case where someone
>>> sets a bunch of state, calls parse and locks things, and then way down
>>> the line they get a "data processor is locked" error because they tried
>>> to change some state.
>>>
>>>
>>> On 3/25/20 11:50 AM, Beckerle, Mike wrote:
>>>> Here's my concrete proposal to fix DataProcesor's API.
>>>>
>>>> We add a clone() method. It copies the object sharing the big/expensive stuff like the compiled schema but gives you a separate copy of all the state that is settable via setValidationMode, setExternalDFDLVariable, and setTunable.
>>>>
>>>> So you get a DataProcessor either from a ProcessorFactory, or from Compiler.reload().
>>>>
>>>> You set state.
>>>> You can call parse/unparse on various threads
>>>>
>>>> If you want to change state settings for another parse call, you must call clone() to get another instance of the DataProcessor. This will share the expensive/big stuff like the compiled schema, but give you a separate copy of all the state that is settable via setValidationMode, setExternalDFDLVariable, and setTunable.
>>>>
>>>> Then you set state
>>>> You can call parse/unparse on various threads.
>>>>
>>>> I hate to get into locking and such, but once you call parse/unparse, that should lock the object state, and any further setter calls should fail/throw. You should have to clone it to "change" the settings.
>>>>
>>>> This makes the use of the setters a java-ish way of achieving the same reentrancy one would get if there were no setters and simply more arguments to the parse/unparse calls.
>>>>
>>>> ________________________________
>>>> From: Beckerle, Mike <mb...@tresys.com>
>>>> Sent: Wednesday, March 25, 2020 11:29 AM
>>>> To: dev@daffodil.apache.org <de...@daffodil.apache.org>
>>>> Subject: Re: Compiler.setExternalDFDLVariable(s) considered challenged
>>>>
>>>> To respond to my own thread here, I think given that we allow multiple simultaneous calls to parse/unparse from different threads, we must say that the DataProcessor object is immutable once parse or unparse are called.
>>>>
>>>> I suppose we could say that it is mutable, but behavior is undetermined if any parse or unparse calls are active on any thread.
>>>>
>>>> But this is just asking for trouble IMHO.
>>>>
>>>> I think we started out with a stateful, non-thread capable API. The idea is that one thread would be invoking a data processor at a time. A data procesor was the state-block of an execution.
>>>>
>>>> The need to share compiled processor reloads, because the compile schemas were expensive to create, tempted us to allow multiple parse/unparse calls on different threads.
>>>>
>>>> Fact is, I think we should have said no to this, provided a DataProcessor.clone() to create instances that shared the reloaded compiled schema binary, but otherwise had separate state, and said that parse/unparse were synchronized methods on their DP instance.
>>>>
>>>> Instead we're in a 1/2 way world where we don't have a thread-reasonable API due to mutable state in what turns out to be a cross-thread shared object.
>>>>
>>>> ________________________________
>>>> From: Beckerle, Mike <mb...@tresys.com>
>>>> Sent: Wednesday, March 25, 2020 10:55 AM
>>>> To: dev@daffodil.apache.org <de...@daffodil.apache.org>
>>>> Subject: Compiler.setExternalDFDLVariable(s) considered challenged
>>>>
>>>> Why does the API for Daffodil have
>>>>
>>>> Compiler.setExternalDFDLVariable(...)
>>>> and
>>>> Compiler.setExternalDFDLVariables(...)
>>>>
>>>> on it.
>>>>
>>>> I believe we should deprecate this.
>>>>
>>>> Compilers are parameterized by some of the tunables I understand.
>>>>
>>>> But the external DFDL variables? These cannot affect compilation. The schema compiler needs to know statically the information about variables found in the schema itself in the dfdl:defineVariable statement annotations.
>>>>
>>>> But the compiler doesn't need external variable bindings. In fact if it did know and use them, it would be building assumptions into the compiled schema that it shouldn't be building in.
>>>>
>>>> Setting external var bindings on the Compiler just causes problems when that compiler instance is reused in a context where those settings aren't appropriate. (JIRA DAFFODIL-2302 is one such problem)
>>>>
>>>> I believe setExternalDFDLVariable(s) methods should be deprecated, and external variables bindings should be an optional argument to the parse/unparse methods.
>>>>
>>>> The setters cause thread safety issues because the DP is stateful, even though we want multiple calls to parse/unparse to be executable on different threads.
>>>>
>>>> Consider: if we allow ordinary setExternalDFDLVariables and add a resetExternalDFDLVariables to clear them, then imagine one wants to make two parse calls on separate threads with different external variables bindings:
>>>>
>>>> so on main thread.....
>>>>      dp.setExternalVariables(...bindings 1...)
>>>>      spawn the thread 1
>>>> on the thread 1
>>>>      dp.parse(....)
>>>> back on main thread
>>>>      dp.resetExternalVariables() // race condition. Did the parse call read the external variables before this reset or not?
>>>>      dp.setExternalVariables(...bindings 2....)
>>>>     .....
>>>>
>>>> However, if we make the external variable bindings an argument to parse, we avoid all of this.
>>>>
>>>> Alternatively, since DataProcessor has setExternalDFDLVariable, we can prohibit multiple calls on the same DataProcessor object simultaneously. We can provide a clone() method that preserves the loaded/reloaded processor, but constructs another DataProcessor object, thereby allowing separate external variables state per DataProcessor instance.
>>>>
>>>>
>>>> Comments?
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> Mike Beckerle | Principal Engineer
>>>> [Owl Cyber Defense]<http://owlcyberdefense.com>
>>>> [cid:2a423dec-0558-414e-b369-14a24becc40f] is now a part of Owl<https://owlcyberdefense.com/news/owl-cyber-defense-tresys-technology-announce-merger/>
>>>> P +1-781-330-0412
>>>> W owlcyberdefense.com<http://www.owlcyberdefense.com>
>>>>
>>>
>>>
>>
>>
> 
> 


Re: Compiler.setExternalDFDLVariable(s) considered challenged

Posted by "Beckerle, Mike" <mb...@tresys.com>.
You definitely cannot share a dp across threads this way.

I propose just deprecating the setExternalVariable API (along with the other setters) and that pattern generally, and a new more functional style API  for establishing tunables, external variables, and validation modes, that will keep us out of trouble.


________________________________
From: Steve Lawrence <sl...@apache.org>
Sent: Thursday, March 26, 2020 7:39 AM
To: dev@daffodil.apache.org <de...@daffodil.apache.org>
Subject: Re: Compiler.setExternalDFDLVariable(s) considered challenged

Even if we did add the locking, it's still possible to use these
incorrectly, right? For example, say we had two threads like this:

  Thread {
    dp.setExternalVariable("foo", 1)
    dp.parse(foo1)
  }

  Thread {
    dp.setExternalVariable("foo", 2)
    dp.parse(foo2)
  }

The order of evaluation could be:

  dp.setExternalVariable("foo", 1)
  dp.setExternalVariable("foo", 2)
  dp.parse(foo1)
  dp.parse(foo2)

So foo1 is parsed with foo=2. Even if we add locking to Daffodil, the
use of setExternalVariable within threads can cause unexpected behavior.
Locking needs to happen outside of Daffodil's control to make it work as
expected. And if someone adds their own locking around everything, then
our internal locking isn't necessary.

It feels to me like setExternalVariable is just fundamentally broken,
and so trying to fix it is just going to add complication that still
doesn't solve the fundamental problems.


On 3/25/20 3:50 PM, Beckerle, Mike wrote:
> It is possible to modify in a way that is 100% compatible with existing "correct" practice, which is to lock and unlock the object. Lock the state when any parse/unparse call is in flight, unlock when none are in flight.
>
> setters can then block waiting for the lock to be freed.
>
> That's guaranteed to be compatible with any correct current practice.
>
> I just hate to put all that complexity in there.
> ________________________________
> From: Steve Lawrence <st...@gmail.com>
> Sent: Wednesday, March 25, 2020 3:44 PM
> To: dev@daffodil.apache.org <de...@daffodil.apache.org>
> Subject: Re: Compiler.setExternalDFDLVariable(s) considered challenged
>
> My one minor concern about the locking after parse/unparse is that it
> changes behavior, and could potentially break things (I assume use after
> lock is an assertion of some kind?).
>
> It's definitely possible to use the DataProcessor in a safe manner while
> mutating the state currently if someone is careful, so to not allow
> mutating it after calling parse/unparse could potentially break someones
> existing use of Daffodil.
>
> What are your thoughts on @deprecating those functions, the deprecation
> message can be something like "This is not thread-safe, be very careful
> if you do not use it". And rather than locking and asserting, we could
> just create a warning and say "You're using this in a potentially
> dangerous manner, you should use the new API"?
>
> - Steve
>
> On 3/25/20 3:35 PM, Beckerle, Mike wrote:
>> A fully functional API is probably good to have, and I'd like to define that also and cut over to using that within Daffodil.
>>
>> We would need a dataProcessor.reset() method that creates a new dp which has had all external vars, tunables, basically any of those formerly settable things, forgotten so you can start clean.
>>
>> (Would clean() be a better name than reset() ?)
>>
>> This reset() subsumes my clone() method idea.
>>
>> I want to deprecate (literally with "@deprecated" annotations) the setters generally, but my compromise is once you parse/unparse the object becomes immutable permanently. So the setters are just a different way to parameterize the object.
>>
>> The other thing that is impacted by this change it the OpenDFDL/ibmCrossTester that implements these same APIs but drives IBM DFDL with them.  That's ok for now. I can retrofit that later. It will still work, just get deprecation warnings.
>>
>> ________________________________
>> From: Steve Lawrence <st...@gmail.com>
>> Sent: Wednesday, March 25, 2020 12:05 PM
>> To: dev@daffodil.apache.org <de...@daffodil.apache.org>
>> Subject: Re: Compiler.setExternalDFDLVariable(s) considered challenged
>>
>> This feels like the right approach. Thoughts on instead of having to
>> manually clone() and having some sort of locking mechanism when parse is
>> called, we just make a DataProcessor completely immutable, and setting a
>> variable will implicitly create a copy with modified state, something like:
>>
>> val dp = dataProcessor
>>            .withTunable("tunableFoo", "1")
>>            .withVariable("variableFoo", 2")
>>            .withVariable("variableBar", 2")
>>
>> It's not as java-y, but feels safer. I can imagine a case where someone
>> sets a bunch of state, calls parse and locks things, and then way down
>> the line they get a "data processor is locked" error because they tried
>> to change some state.
>>
>>
>> On 3/25/20 11:50 AM, Beckerle, Mike wrote:
>>> Here's my concrete proposal to fix DataProcesor's API.
>>>
>>> We add a clone() method. It copies the object sharing the big/expensive stuff like the compiled schema but gives you a separate copy of all the state that is settable via setValidationMode, setExternalDFDLVariable, and setTunable.
>>>
>>> So you get a DataProcessor either from a ProcessorFactory, or from Compiler.reload().
>>>
>>> You set state.
>>> You can call parse/unparse on various threads
>>>
>>> If you want to change state settings for another parse call, you must call clone() to get another instance of the DataProcessor. This will share the expensive/big stuff like the compiled schema, but give you a separate copy of all the state that is settable via setValidationMode, setExternalDFDLVariable, and setTunable.
>>>
>>> Then you set state
>>> You can call parse/unparse on various threads.
>>>
>>> I hate to get into locking and such, but once you call parse/unparse, that should lock the object state, and any further setter calls should fail/throw. You should have to clone it to "change" the settings.
>>>
>>> This makes the use of the setters a java-ish way of achieving the same reentrancy one would get if there were no setters and simply more arguments to the parse/unparse calls.
>>>
>>> ________________________________
>>> From: Beckerle, Mike <mb...@tresys.com>
>>> Sent: Wednesday, March 25, 2020 11:29 AM
>>> To: dev@daffodil.apache.org <de...@daffodil.apache.org>
>>> Subject: Re: Compiler.setExternalDFDLVariable(s) considered challenged
>>>
>>> To respond to my own thread here, I think given that we allow multiple simultaneous calls to parse/unparse from different threads, we must say that the DataProcessor object is immutable once parse or unparse are called.
>>>
>>> I suppose we could say that it is mutable, but behavior is undetermined if any parse or unparse calls are active on any thread.
>>>
>>> But this is just asking for trouble IMHO.
>>>
>>> I think we started out with a stateful, non-thread capable API. The idea is that one thread would be invoking a data processor at a time. A data procesor was the state-block of an execution.
>>>
>>> The need to share compiled processor reloads, because the compile schemas were expensive to create, tempted us to allow multiple parse/unparse calls on different threads.
>>>
>>> Fact is, I think we should have said no to this, provided a DataProcessor.clone() to create instances that shared the reloaded compiled schema binary, but otherwise had separate state, and said that parse/unparse were synchronized methods on their DP instance.
>>>
>>> Instead we're in a 1/2 way world where we don't have a thread-reasonable API due to mutable state in what turns out to be a cross-thread shared object.
>>>
>>> ________________________________
>>> From: Beckerle, Mike <mb...@tresys.com>
>>> Sent: Wednesday, March 25, 2020 10:55 AM
>>> To: dev@daffodil.apache.org <de...@daffodil.apache.org>
>>> Subject: Compiler.setExternalDFDLVariable(s) considered challenged
>>>
>>> Why does the API for Daffodil have
>>>
>>> Compiler.setExternalDFDLVariable(...)
>>> and
>>> Compiler.setExternalDFDLVariables(...)
>>>
>>> on it.
>>>
>>> I believe we should deprecate this.
>>>
>>> Compilers are parameterized by some of the tunables I understand.
>>>
>>> But the external DFDL variables? These cannot affect compilation. The schema compiler needs to know statically the information about variables found in the schema itself in the dfdl:defineVariable statement annotations.
>>>
>>> But the compiler doesn't need external variable bindings. In fact if it did know and use them, it would be building assumptions into the compiled schema that it shouldn't be building in.
>>>
>>> Setting external var bindings on the Compiler just causes problems when that compiler instance is reused in a context where those settings aren't appropriate. (JIRA DAFFODIL-2302 is one such problem)
>>>
>>> I believe setExternalDFDLVariable(s) methods should be deprecated, and external variables bindings should be an optional argument to the parse/unparse methods.
>>>
>>> The setters cause thread safety issues because the DP is stateful, even though we want multiple calls to parse/unparse to be executable on different threads.
>>>
>>> Consider: if we allow ordinary setExternalDFDLVariables and add a resetExternalDFDLVariables to clear them, then imagine one wants to make two parse calls on separate threads with different external variables bindings:
>>>
>>> so on main thread.....
>>>      dp.setExternalVariables(...bindings 1...)
>>>      spawn the thread 1
>>> on the thread 1
>>>      dp.parse(....)
>>> back on main thread
>>>      dp.resetExternalVariables() // race condition. Did the parse call read the external variables before this reset or not?
>>>      dp.setExternalVariables(...bindings 2....)
>>>     .....
>>>
>>> However, if we make the external variable bindings an argument to parse, we avoid all of this.
>>>
>>> Alternatively, since DataProcessor has setExternalDFDLVariable, we can prohibit multiple calls on the same DataProcessor object simultaneously. We can provide a clone() method that preserves the loaded/reloaded processor, but constructs another DataProcessor object, thereby allowing separate external variables state per DataProcessor instance.
>>>
>>>
>>> Comments?
>>>
>>>
>>>
>>>
>>>
>>> Mike Beckerle | Principal Engineer
>>> [Owl Cyber Defense]<http://owlcyberdefense.com>
>>> [cid:2a423dec-0558-414e-b369-14a24becc40f] is now a part of Owl<https://owlcyberdefense.com/news/owl-cyber-defense-tresys-technology-announce-merger/>
>>> P +1-781-330-0412
>>> W owlcyberdefense.com<http://www.owlcyberdefense.com>
>>>
>>
>>
>
>


Re: Compiler.setExternalDFDLVariable(s) considered challenged

Posted by Steve Lawrence <sl...@apache.org>.
Even if we did add the locking, it's still possible to use these
incorrectly, right? For example, say we had two threads like this:

  Thread {
    dp.setExternalVariable("foo", 1)
    dp.parse(foo1)
  }

  Thread {
    dp.setExternalVariable("foo", 2)
    dp.parse(foo2)
  }

The order of evaluation could be:

  dp.setExternalVariable("foo", 1)
  dp.setExternalVariable("foo", 2)
  dp.parse(foo1)
  dp.parse(foo2)

So foo1 is parsed with foo=2. Even if we add locking to Daffodil, the
use of setExternalVariable within threads can cause unexpected behavior.
Locking needs to happen outside of Daffodil's control to make it work as
expected. And if someone adds their own locking around everything, then
our internal locking isn't necessary.

It feels to me like setExternalVariable is just fundamentally broken,
and so trying to fix it is just going to add complication that still
doesn't solve the fundamental problems.


On 3/25/20 3:50 PM, Beckerle, Mike wrote:
> It is possible to modify in a way that is 100% compatible with existing "correct" practice, which is to lock and unlock the object. Lock the state when any parse/unparse call is in flight, unlock when none are in flight.
> 
> setters can then block waiting for the lock to be freed.
> 
> That's guaranteed to be compatible with any correct current practice.
> 
> I just hate to put all that complexity in there.
> ________________________________
> From: Steve Lawrence <st...@gmail.com>
> Sent: Wednesday, March 25, 2020 3:44 PM
> To: dev@daffodil.apache.org <de...@daffodil.apache.org>
> Subject: Re: Compiler.setExternalDFDLVariable(s) considered challenged
> 
> My one minor concern about the locking after parse/unparse is that it
> changes behavior, and could potentially break things (I assume use after
> lock is an assertion of some kind?).
> 
> It's definitely possible to use the DataProcessor in a safe manner while
> mutating the state currently if someone is careful, so to not allow
> mutating it after calling parse/unparse could potentially break someones
> existing use of Daffodil.
> 
> What are your thoughts on @deprecating those functions, the deprecation
> message can be something like "This is not thread-safe, be very careful
> if you do not use it". And rather than locking and asserting, we could
> just create a warning and say "You're using this in a potentially
> dangerous manner, you should use the new API"?
> 
> - Steve
> 
> On 3/25/20 3:35 PM, Beckerle, Mike wrote:
>> A fully functional API is probably good to have, and I'd like to define that also and cut over to using that within Daffodil.
>>
>> We would need a dataProcessor.reset() method that creates a new dp which has had all external vars, tunables, basically any of those formerly settable things, forgotten so you can start clean.
>>
>> (Would clean() be a better name than reset() ?)
>>
>> This reset() subsumes my clone() method idea.
>>
>> I want to deprecate (literally with "@deprecated" annotations) the setters generally, but my compromise is once you parse/unparse the object becomes immutable permanently. So the setters are just a different way to parameterize the object.
>>
>> The other thing that is impacted by this change it the OpenDFDL/ibmCrossTester that implements these same APIs but drives IBM DFDL with them.  That's ok for now. I can retrofit that later. It will still work, just get deprecation warnings.
>>
>> ________________________________
>> From: Steve Lawrence <st...@gmail.com>
>> Sent: Wednesday, March 25, 2020 12:05 PM
>> To: dev@daffodil.apache.org <de...@daffodil.apache.org>
>> Subject: Re: Compiler.setExternalDFDLVariable(s) considered challenged
>>
>> This feels like the right approach. Thoughts on instead of having to
>> manually clone() and having some sort of locking mechanism when parse is
>> called, we just make a DataProcessor completely immutable, and setting a
>> variable will implicitly create a copy with modified state, something like:
>>
>> val dp = dataProcessor
>>            .withTunable("tunableFoo", "1")
>>            .withVariable("variableFoo", 2")
>>            .withVariable("variableBar", 2")
>>
>> It's not as java-y, but feels safer. I can imagine a case where someone
>> sets a bunch of state, calls parse and locks things, and then way down
>> the line they get a "data processor is locked" error because they tried
>> to change some state.
>>
>>
>> On 3/25/20 11:50 AM, Beckerle, Mike wrote:
>>> Here's my concrete proposal to fix DataProcesor's API.
>>>
>>> We add a clone() method. It copies the object sharing the big/expensive stuff like the compiled schema but gives you a separate copy of all the state that is settable via setValidationMode, setExternalDFDLVariable, and setTunable.
>>>
>>> So you get a DataProcessor either from a ProcessorFactory, or from Compiler.reload().
>>>
>>> You set state.
>>> You can call parse/unparse on various threads
>>>
>>> If you want to change state settings for another parse call, you must call clone() to get another instance of the DataProcessor. This will share the expensive/big stuff like the compiled schema, but give you a separate copy of all the state that is settable via setValidationMode, setExternalDFDLVariable, and setTunable.
>>>
>>> Then you set state
>>> You can call parse/unparse on various threads.
>>>
>>> I hate to get into locking and such, but once you call parse/unparse, that should lock the object state, and any further setter calls should fail/throw. You should have to clone it to "change" the settings.
>>>
>>> This makes the use of the setters a java-ish way of achieving the same reentrancy one would get if there were no setters and simply more arguments to the parse/unparse calls.
>>>
>>> ________________________________
>>> From: Beckerle, Mike <mb...@tresys.com>
>>> Sent: Wednesday, March 25, 2020 11:29 AM
>>> To: dev@daffodil.apache.org <de...@daffodil.apache.org>
>>> Subject: Re: Compiler.setExternalDFDLVariable(s) considered challenged
>>>
>>> To respond to my own thread here, I think given that we allow multiple simultaneous calls to parse/unparse from different threads, we must say that the DataProcessor object is immutable once parse or unparse are called.
>>>
>>> I suppose we could say that it is mutable, but behavior is undetermined if any parse or unparse calls are active on any thread.
>>>
>>> But this is just asking for trouble IMHO.
>>>
>>> I think we started out with a stateful, non-thread capable API. The idea is that one thread would be invoking a data processor at a time. A data procesor was the state-block of an execution.
>>>
>>> The need to share compiled processor reloads, because the compile schemas were expensive to create, tempted us to allow multiple parse/unparse calls on different threads.
>>>
>>> Fact is, I think we should have said no to this, provided a DataProcessor.clone() to create instances that shared the reloaded compiled schema binary, but otherwise had separate state, and said that parse/unparse were synchronized methods on their DP instance.
>>>
>>> Instead we're in a 1/2 way world where we don't have a thread-reasonable API due to mutable state in what turns out to be a cross-thread shared object.
>>>
>>> ________________________________
>>> From: Beckerle, Mike <mb...@tresys.com>
>>> Sent: Wednesday, March 25, 2020 10:55 AM
>>> To: dev@daffodil.apache.org <de...@daffodil.apache.org>
>>> Subject: Compiler.setExternalDFDLVariable(s) considered challenged
>>>
>>> Why does the API for Daffodil have
>>>
>>> Compiler.setExternalDFDLVariable(...)
>>> and
>>> Compiler.setExternalDFDLVariables(...)
>>>
>>> on it.
>>>
>>> I believe we should deprecate this.
>>>
>>> Compilers are parameterized by some of the tunables I understand.
>>>
>>> But the external DFDL variables? These cannot affect compilation. The schema compiler needs to know statically the information about variables found in the schema itself in the dfdl:defineVariable statement annotations.
>>>
>>> But the compiler doesn't need external variable bindings. In fact if it did know and use them, it would be building assumptions into the compiled schema that it shouldn't be building in.
>>>
>>> Setting external var bindings on the Compiler just causes problems when that compiler instance is reused in a context where those settings aren't appropriate. (JIRA DAFFODIL-2302 is one such problem)
>>>
>>> I believe setExternalDFDLVariable(s) methods should be deprecated, and external variables bindings should be an optional argument to the parse/unparse methods.
>>>
>>> The setters cause thread safety issues because the DP is stateful, even though we want multiple calls to parse/unparse to be executable on different threads.
>>>
>>> Consider: if we allow ordinary setExternalDFDLVariables and add a resetExternalDFDLVariables to clear them, then imagine one wants to make two parse calls on separate threads with different external variables bindings:
>>>
>>> so on main thread.....
>>>      dp.setExternalVariables(...bindings 1...)
>>>      spawn the thread 1
>>> on the thread 1
>>>      dp.parse(....)
>>> back on main thread
>>>      dp.resetExternalVariables() // race condition. Did the parse call read the external variables before this reset or not?
>>>      dp.setExternalVariables(...bindings 2....)
>>>     .....
>>>
>>> However, if we make the external variable bindings an argument to parse, we avoid all of this.
>>>
>>> Alternatively, since DataProcessor has setExternalDFDLVariable, we can prohibit multiple calls on the same DataProcessor object simultaneously. We can provide a clone() method that preserves the loaded/reloaded processor, but constructs another DataProcessor object, thereby allowing separate external variables state per DataProcessor instance.
>>>
>>>
>>> Comments?
>>>
>>>
>>>
>>>
>>>
>>> Mike Beckerle | Principal Engineer
>>> [Owl Cyber Defense]<http://owlcyberdefense.com>
>>> [cid:2a423dec-0558-414e-b369-14a24becc40f] is now a part of Owl<https://owlcyberdefense.com/news/owl-cyber-defense-tresys-technology-announce-merger/>
>>> P +1-781-330-0412
>>> W owlcyberdefense.com<http://www.owlcyberdefense.com>
>>>
>>
>>
> 
> 


Re: Compiler.setExternalDFDLVariable(s) considered challenged

Posted by "Beckerle, Mike" <mb...@tresys.com>.
It is possible to modify in a way that is 100% compatible with existing "correct" practice, which is to lock and unlock the object. Lock the state when any parse/unparse call is in flight, unlock when none are in flight.

setters can then block waiting for the lock to be freed.

That's guaranteed to be compatible with any correct current practice.

I just hate to put all that complexity in there.
________________________________
From: Steve Lawrence <st...@gmail.com>
Sent: Wednesday, March 25, 2020 3:44 PM
To: dev@daffodil.apache.org <de...@daffodil.apache.org>
Subject: Re: Compiler.setExternalDFDLVariable(s) considered challenged

My one minor concern about the locking after parse/unparse is that it
changes behavior, and could potentially break things (I assume use after
lock is an assertion of some kind?).

It's definitely possible to use the DataProcessor in a safe manner while
mutating the state currently if someone is careful, so to not allow
mutating it after calling parse/unparse could potentially break someones
existing use of Daffodil.

What are your thoughts on @deprecating those functions, the deprecation
message can be something like "This is not thread-safe, be very careful
if you do not use it". And rather than locking and asserting, we could
just create a warning and say "You're using this in a potentially
dangerous manner, you should use the new API"?

- Steve

On 3/25/20 3:35 PM, Beckerle, Mike wrote:
> A fully functional API is probably good to have, and I'd like to define that also and cut over to using that within Daffodil.
>
> We would need a dataProcessor.reset() method that creates a new dp which has had all external vars, tunables, basically any of those formerly settable things, forgotten so you can start clean.
>
> (Would clean() be a better name than reset() ?)
>
> This reset() subsumes my clone() method idea.
>
> I want to deprecate (literally with "@deprecated" annotations) the setters generally, but my compromise is once you parse/unparse the object becomes immutable permanently. So the setters are just a different way to parameterize the object.
>
> The other thing that is impacted by this change it the OpenDFDL/ibmCrossTester that implements these same APIs but drives IBM DFDL with them.  That's ok for now. I can retrofit that later. It will still work, just get deprecation warnings.
>
> ________________________________
> From: Steve Lawrence <st...@gmail.com>
> Sent: Wednesday, March 25, 2020 12:05 PM
> To: dev@daffodil.apache.org <de...@daffodil.apache.org>
> Subject: Re: Compiler.setExternalDFDLVariable(s) considered challenged
>
> This feels like the right approach. Thoughts on instead of having to
> manually clone() and having some sort of locking mechanism when parse is
> called, we just make a DataProcessor completely immutable, and setting a
> variable will implicitly create a copy with modified state, something like:
>
> val dp = dataProcessor
>            .withTunable("tunableFoo", "1")
>            .withVariable("variableFoo", 2")
>            .withVariable("variableBar", 2")
>
> It's not as java-y, but feels safer. I can imagine a case where someone
> sets a bunch of state, calls parse and locks things, and then way down
> the line they get a "data processor is locked" error because they tried
> to change some state.
>
>
> On 3/25/20 11:50 AM, Beckerle, Mike wrote:
>> Here's my concrete proposal to fix DataProcesor's API.
>>
>> We add a clone() method. It copies the object sharing the big/expensive stuff like the compiled schema but gives you a separate copy of all the state that is settable via setValidationMode, setExternalDFDLVariable, and setTunable.
>>
>> So you get a DataProcessor either from a ProcessorFactory, or from Compiler.reload().
>>
>> You set state.
>> You can call parse/unparse on various threads
>>
>> If you want to change state settings for another parse call, you must call clone() to get another instance of the DataProcessor. This will share the expensive/big stuff like the compiled schema, but give you a separate copy of all the state that is settable via setValidationMode, setExternalDFDLVariable, and setTunable.
>>
>> Then you set state
>> You can call parse/unparse on various threads.
>>
>> I hate to get into locking and such, but once you call parse/unparse, that should lock the object state, and any further setter calls should fail/throw. You should have to clone it to "change" the settings.
>>
>> This makes the use of the setters a java-ish way of achieving the same reentrancy one would get if there were no setters and simply more arguments to the parse/unparse calls.
>>
>> ________________________________
>> From: Beckerle, Mike <mb...@tresys.com>
>> Sent: Wednesday, March 25, 2020 11:29 AM
>> To: dev@daffodil.apache.org <de...@daffodil.apache.org>
>> Subject: Re: Compiler.setExternalDFDLVariable(s) considered challenged
>>
>> To respond to my own thread here, I think given that we allow multiple simultaneous calls to parse/unparse from different threads, we must say that the DataProcessor object is immutable once parse or unparse are called.
>>
>> I suppose we could say that it is mutable, but behavior is undetermined if any parse or unparse calls are active on any thread.
>>
>> But this is just asking for trouble IMHO.
>>
>> I think we started out with a stateful, non-thread capable API. The idea is that one thread would be invoking a data processor at a time. A data procesor was the state-block of an execution.
>>
>> The need to share compiled processor reloads, because the compile schemas were expensive to create, tempted us to allow multiple parse/unparse calls on different threads.
>>
>> Fact is, I think we should have said no to this, provided a DataProcessor.clone() to create instances that shared the reloaded compiled schema binary, but otherwise had separate state, and said that parse/unparse were synchronized methods on their DP instance.
>>
>> Instead we're in a 1/2 way world where we don't have a thread-reasonable API due to mutable state in what turns out to be a cross-thread shared object.
>>
>> ________________________________
>> From: Beckerle, Mike <mb...@tresys.com>
>> Sent: Wednesday, March 25, 2020 10:55 AM
>> To: dev@daffodil.apache.org <de...@daffodil.apache.org>
>> Subject: Compiler.setExternalDFDLVariable(s) considered challenged
>>
>> Why does the API for Daffodil have
>>
>> Compiler.setExternalDFDLVariable(...)
>> and
>> Compiler.setExternalDFDLVariables(...)
>>
>> on it.
>>
>> I believe we should deprecate this.
>>
>> Compilers are parameterized by some of the tunables I understand.
>>
>> But the external DFDL variables? These cannot affect compilation. The schema compiler needs to know statically the information about variables found in the schema itself in the dfdl:defineVariable statement annotations.
>>
>> But the compiler doesn't need external variable bindings. In fact if it did know and use them, it would be building assumptions into the compiled schema that it shouldn't be building in.
>>
>> Setting external var bindings on the Compiler just causes problems when that compiler instance is reused in a context where those settings aren't appropriate. (JIRA DAFFODIL-2302 is one such problem)
>>
>> I believe setExternalDFDLVariable(s) methods should be deprecated, and external variables bindings should be an optional argument to the parse/unparse methods.
>>
>> The setters cause thread safety issues because the DP is stateful, even though we want multiple calls to parse/unparse to be executable on different threads.
>>
>> Consider: if we allow ordinary setExternalDFDLVariables and add a resetExternalDFDLVariables to clear them, then imagine one wants to make two parse calls on separate threads with different external variables bindings:
>>
>> so on main thread.....
>>      dp.setExternalVariables(...bindings 1...)
>>      spawn the thread 1
>> on the thread 1
>>      dp.parse(....)
>> back on main thread
>>      dp.resetExternalVariables() // race condition. Did the parse call read the external variables before this reset or not?
>>      dp.setExternalVariables(...bindings 2....)
>>     .....
>>
>> However, if we make the external variable bindings an argument to parse, we avoid all of this.
>>
>> Alternatively, since DataProcessor has setExternalDFDLVariable, we can prohibit multiple calls on the same DataProcessor object simultaneously. We can provide a clone() method that preserves the loaded/reloaded processor, but constructs another DataProcessor object, thereby allowing separate external variables state per DataProcessor instance.
>>
>>
>> Comments?
>>
>>
>>
>>
>>
>> Mike Beckerle | Principal Engineer
>> [Owl Cyber Defense]<http://owlcyberdefense.com>
>> [cid:2a423dec-0558-414e-b369-14a24becc40f] is now a part of Owl<https://owlcyberdefense.com/news/owl-cyber-defense-tresys-technology-announce-merger/>
>> P +1-781-330-0412
>> W owlcyberdefense.com<http://www.owlcyberdefense.com>
>>
>
>


Re: Compiler.setExternalDFDLVariable(s) considered challenged

Posted by Steve Lawrence <st...@gmail.com>.
My one minor concern about the locking after parse/unparse is that it
changes behavior, and could potentially break things (I assume use after
lock is an assertion of some kind?).

It's definitely possible to use the DataProcessor in a safe manner while
mutating the state currently if someone is careful, so to not allow
mutating it after calling parse/unparse could potentially break someones
existing use of Daffodil.

What are your thoughts on @deprecating those functions, the deprecation
message can be something like "This is not thread-safe, be very careful
if you do not use it". And rather than locking and asserting, we could
just create a warning and say "You're using this in a potentially
dangerous manner, you should use the new API"?

- Steve

On 3/25/20 3:35 PM, Beckerle, Mike wrote:
> A fully functional API is probably good to have, and I'd like to define that also and cut over to using that within Daffodil.
> 
> We would need a dataProcessor.reset() method that creates a new dp which has had all external vars, tunables, basically any of those formerly settable things, forgotten so you can start clean.
> 
> (Would clean() be a better name than reset() ?)
> 
> This reset() subsumes my clone() method idea.
> 
> I want to deprecate (literally with "@deprecated" annotations) the setters generally, but my compromise is once you parse/unparse the object becomes immutable permanently. So the setters are just a different way to parameterize the object.
> 
> The other thing that is impacted by this change it the OpenDFDL/ibmCrossTester that implements these same APIs but drives IBM DFDL with them.  That's ok for now. I can retrofit that later. It will still work, just get deprecation warnings.
> 
> ________________________________
> From: Steve Lawrence <st...@gmail.com>
> Sent: Wednesday, March 25, 2020 12:05 PM
> To: dev@daffodil.apache.org <de...@daffodil.apache.org>
> Subject: Re: Compiler.setExternalDFDLVariable(s) considered challenged
> 
> This feels like the right approach. Thoughts on instead of having to
> manually clone() and having some sort of locking mechanism when parse is
> called, we just make a DataProcessor completely immutable, and setting a
> variable will implicitly create a copy with modified state, something like:
> 
> val dp = dataProcessor
>            .withTunable("tunableFoo", "1")
>            .withVariable("variableFoo", 2")
>            .withVariable("variableBar", 2")
> 
> It's not as java-y, but feels safer. I can imagine a case where someone
> sets a bunch of state, calls parse and locks things, and then way down
> the line they get a "data processor is locked" error because they tried
> to change some state.
> 
> 
> On 3/25/20 11:50 AM, Beckerle, Mike wrote:
>> Here's my concrete proposal to fix DataProcesor's API.
>>
>> We add a clone() method. It copies the object sharing the big/expensive stuff like the compiled schema but gives you a separate copy of all the state that is settable via setValidationMode, setExternalDFDLVariable, and setTunable.
>>
>> So you get a DataProcessor either from a ProcessorFactory, or from Compiler.reload().
>>
>> You set state.
>> You can call parse/unparse on various threads
>>
>> If you want to change state settings for another parse call, you must call clone() to get another instance of the DataProcessor. This will share the expensive/big stuff like the compiled schema, but give you a separate copy of all the state that is settable via setValidationMode, setExternalDFDLVariable, and setTunable.
>>
>> Then you set state
>> You can call parse/unparse on various threads.
>>
>> I hate to get into locking and such, but once you call parse/unparse, that should lock the object state, and any further setter calls should fail/throw. You should have to clone it to "change" the settings.
>>
>> This makes the use of the setters a java-ish way of achieving the same reentrancy one would get if there were no setters and simply more arguments to the parse/unparse calls.
>>
>> ________________________________
>> From: Beckerle, Mike <mb...@tresys.com>
>> Sent: Wednesday, March 25, 2020 11:29 AM
>> To: dev@daffodil.apache.org <de...@daffodil.apache.org>
>> Subject: Re: Compiler.setExternalDFDLVariable(s) considered challenged
>>
>> To respond to my own thread here, I think given that we allow multiple simultaneous calls to parse/unparse from different threads, we must say that the DataProcessor object is immutable once parse or unparse are called.
>>
>> I suppose we could say that it is mutable, but behavior is undetermined if any parse or unparse calls are active on any thread.
>>
>> But this is just asking for trouble IMHO.
>>
>> I think we started out with a stateful, non-thread capable API. The idea is that one thread would be invoking a data processor at a time. A data procesor was the state-block of an execution.
>>
>> The need to share compiled processor reloads, because the compile schemas were expensive to create, tempted us to allow multiple parse/unparse calls on different threads.
>>
>> Fact is, I think we should have said no to this, provided a DataProcessor.clone() to create instances that shared the reloaded compiled schema binary, but otherwise had separate state, and said that parse/unparse were synchronized methods on their DP instance.
>>
>> Instead we're in a 1/2 way world where we don't have a thread-reasonable API due to mutable state in what turns out to be a cross-thread shared object.
>>
>> ________________________________
>> From: Beckerle, Mike <mb...@tresys.com>
>> Sent: Wednesday, March 25, 2020 10:55 AM
>> To: dev@daffodil.apache.org <de...@daffodil.apache.org>
>> Subject: Compiler.setExternalDFDLVariable(s) considered challenged
>>
>> Why does the API for Daffodil have
>>
>> Compiler.setExternalDFDLVariable(...)
>> and
>> Compiler.setExternalDFDLVariables(...)
>>
>> on it.
>>
>> I believe we should deprecate this.
>>
>> Compilers are parameterized by some of the tunables I understand.
>>
>> But the external DFDL variables? These cannot affect compilation. The schema compiler needs to know statically the information about variables found in the schema itself in the dfdl:defineVariable statement annotations.
>>
>> But the compiler doesn't need external variable bindings. In fact if it did know and use them, it would be building assumptions into the compiled schema that it shouldn't be building in.
>>
>> Setting external var bindings on the Compiler just causes problems when that compiler instance is reused in a context where those settings aren't appropriate. (JIRA DAFFODIL-2302 is one such problem)
>>
>> I believe setExternalDFDLVariable(s) methods should be deprecated, and external variables bindings should be an optional argument to the parse/unparse methods.
>>
>> The setters cause thread safety issues because the DP is stateful, even though we want multiple calls to parse/unparse to be executable on different threads.
>>
>> Consider: if we allow ordinary setExternalDFDLVariables and add a resetExternalDFDLVariables to clear them, then imagine one wants to make two parse calls on separate threads with different external variables bindings:
>>
>> so on main thread.....
>>      dp.setExternalVariables(...bindings 1...)
>>      spawn the thread 1
>> on the thread 1
>>      dp.parse(....)
>> back on main thread
>>      dp.resetExternalVariables() // race condition. Did the parse call read the external variables before this reset or not?
>>      dp.setExternalVariables(...bindings 2....)
>>     .....
>>
>> However, if we make the external variable bindings an argument to parse, we avoid all of this.
>>
>> Alternatively, since DataProcessor has setExternalDFDLVariable, we can prohibit multiple calls on the same DataProcessor object simultaneously. We can provide a clone() method that preserves the loaded/reloaded processor, but constructs another DataProcessor object, thereby allowing separate external variables state per DataProcessor instance.
>>
>>
>> Comments?
>>
>>
>>
>>
>>
>> Mike Beckerle | Principal Engineer
>> [Owl Cyber Defense]<http://owlcyberdefense.com>
>> [cid:2a423dec-0558-414e-b369-14a24becc40f] is now a part of Owl<https://owlcyberdefense.com/news/owl-cyber-defense-tresys-technology-announce-merger/>
>> P +1-781-330-0412
>> W owlcyberdefense.com<http://www.owlcyberdefense.com>
>>
> 
> 


Re: Compiler.setExternalDFDLVariable(s) considered challenged

Posted by "Beckerle, Mike" <mb...@tresys.com>.
A fully functional API is probably good to have, and I'd like to define that also and cut over to using that within Daffodil.

We would need a dataProcessor.reset() method that creates a new dp which has had all external vars, tunables, basically any of those formerly settable things, forgotten so you can start clean.

(Would clean() be a better name than reset() ?)

This reset() subsumes my clone() method idea.

I want to deprecate (literally with "@deprecated" annotations) the setters generally, but my compromise is once you parse/unparse the object becomes immutable permanently. So the setters are just a different way to parameterize the object.

The other thing that is impacted by this change it the OpenDFDL/ibmCrossTester that implements these same APIs but drives IBM DFDL with them.  That's ok for now. I can retrofit that later. It will still work, just get deprecation warnings.

________________________________
From: Steve Lawrence <st...@gmail.com>
Sent: Wednesday, March 25, 2020 12:05 PM
To: dev@daffodil.apache.org <de...@daffodil.apache.org>
Subject: Re: Compiler.setExternalDFDLVariable(s) considered challenged

This feels like the right approach. Thoughts on instead of having to
manually clone() and having some sort of locking mechanism when parse is
called, we just make a DataProcessor completely immutable, and setting a
variable will implicitly create a copy with modified state, something like:

val dp = dataProcessor
           .withTunable("tunableFoo", "1")
           .withVariable("variableFoo", 2")
           .withVariable("variableBar", 2")

It's not as java-y, but feels safer. I can imagine a case where someone
sets a bunch of state, calls parse and locks things, and then way down
the line they get a "data processor is locked" error because they tried
to change some state.


On 3/25/20 11:50 AM, Beckerle, Mike wrote:
> Here's my concrete proposal to fix DataProcesor's API.
>
> We add a clone() method. It copies the object sharing the big/expensive stuff like the compiled schema but gives you a separate copy of all the state that is settable via setValidationMode, setExternalDFDLVariable, and setTunable.
>
> So you get a DataProcessor either from a ProcessorFactory, or from Compiler.reload().
>
> You set state.
> You can call parse/unparse on various threads
>
> If you want to change state settings for another parse call, you must call clone() to get another instance of the DataProcessor. This will share the expensive/big stuff like the compiled schema, but give you a separate copy of all the state that is settable via setValidationMode, setExternalDFDLVariable, and setTunable.
>
> Then you set state
> You can call parse/unparse on various threads.
>
> I hate to get into locking and such, but once you call parse/unparse, that should lock the object state, and any further setter calls should fail/throw. You should have to clone it to "change" the settings.
>
> This makes the use of the setters a java-ish way of achieving the same reentrancy one would get if there were no setters and simply more arguments to the parse/unparse calls.
>
> ________________________________
> From: Beckerle, Mike <mb...@tresys.com>
> Sent: Wednesday, March 25, 2020 11:29 AM
> To: dev@daffodil.apache.org <de...@daffodil.apache.org>
> Subject: Re: Compiler.setExternalDFDLVariable(s) considered challenged
>
> To respond to my own thread here, I think given that we allow multiple simultaneous calls to parse/unparse from different threads, we must say that the DataProcessor object is immutable once parse or unparse are called.
>
> I suppose we could say that it is mutable, but behavior is undetermined if any parse or unparse calls are active on any thread.
>
> But this is just asking for trouble IMHO.
>
> I think we started out with a stateful, non-thread capable API. The idea is that one thread would be invoking a data processor at a time. A data procesor was the state-block of an execution.
>
> The need to share compiled processor reloads, because the compile schemas were expensive to create, tempted us to allow multiple parse/unparse calls on different threads.
>
> Fact is, I think we should have said no to this, provided a DataProcessor.clone() to create instances that shared the reloaded compiled schema binary, but otherwise had separate state, and said that parse/unparse were synchronized methods on their DP instance.
>
> Instead we're in a 1/2 way world where we don't have a thread-reasonable API due to mutable state in what turns out to be a cross-thread shared object.
>
> ________________________________
> From: Beckerle, Mike <mb...@tresys.com>
> Sent: Wednesday, March 25, 2020 10:55 AM
> To: dev@daffodil.apache.org <de...@daffodil.apache.org>
> Subject: Compiler.setExternalDFDLVariable(s) considered challenged
>
> Why does the API for Daffodil have
>
> Compiler.setExternalDFDLVariable(...)
> and
> Compiler.setExternalDFDLVariables(...)
>
> on it.
>
> I believe we should deprecate this.
>
> Compilers are parameterized by some of the tunables I understand.
>
> But the external DFDL variables? These cannot affect compilation. The schema compiler needs to know statically the information about variables found in the schema itself in the dfdl:defineVariable statement annotations.
>
> But the compiler doesn't need external variable bindings. In fact if it did know and use them, it would be building assumptions into the compiled schema that it shouldn't be building in.
>
> Setting external var bindings on the Compiler just causes problems when that compiler instance is reused in a context where those settings aren't appropriate. (JIRA DAFFODIL-2302 is one such problem)
>
> I believe setExternalDFDLVariable(s) methods should be deprecated, and external variables bindings should be an optional argument to the parse/unparse methods.
>
> The setters cause thread safety issues because the DP is stateful, even though we want multiple calls to parse/unparse to be executable on different threads.
>
> Consider: if we allow ordinary setExternalDFDLVariables and add a resetExternalDFDLVariables to clear them, then imagine one wants to make two parse calls on separate threads with different external variables bindings:
>
> so on main thread.....
>      dp.setExternalVariables(...bindings 1...)
>      spawn the thread 1
> on the thread 1
>      dp.parse(....)
> back on main thread
>      dp.resetExternalVariables() // race condition. Did the parse call read the external variables before this reset or not?
>      dp.setExternalVariables(...bindings 2....)
>     .....
>
> However, if we make the external variable bindings an argument to parse, we avoid all of this.
>
> Alternatively, since DataProcessor has setExternalDFDLVariable, we can prohibit multiple calls on the same DataProcessor object simultaneously. We can provide a clone() method that preserves the loaded/reloaded processor, but constructs another DataProcessor object, thereby allowing separate external variables state per DataProcessor instance.
>
>
> Comments?
>
>
>
>
>
> Mike Beckerle | Principal Engineer
> [Owl Cyber Defense]<http://owlcyberdefense.com>
> [cid:2a423dec-0558-414e-b369-14a24becc40f] is now a part of Owl<https://owlcyberdefense.com/news/owl-cyber-defense-tresys-technology-announce-merger/>
> P +1-781-330-0412
> W owlcyberdefense.com<http://www.owlcyberdefense.com>
>


Re: Compiler.setExternalDFDLVariable(s) considered challenged

Posted by Steve Lawrence <st...@gmail.com>.
This feels like the right approach. Thoughts on instead of having to
manually clone() and having some sort of locking mechanism when parse is
called, we just make a DataProcessor completely immutable, and setting a
variable will implicitly create a copy with modified state, something like:

val dp = dataProcessor
           .withTunable("tunableFoo", "1")
           .withVariable("variableFoo", 2")
           .withVariable("variableBar", 2")

It's not as java-y, but feels safer. I can imagine a case where someone
sets a bunch of state, calls parse and locks things, and then way down
the line they get a "data processor is locked" error because they tried
to change some state.


On 3/25/20 11:50 AM, Beckerle, Mike wrote:
> Here's my concrete proposal to fix DataProcesor's API.
> 
> We add a clone() method. It copies the object sharing the big/expensive stuff like the compiled schema but gives you a separate copy of all the state that is settable via setValidationMode, setExternalDFDLVariable, and setTunable.
> 
> So you get a DataProcessor either from a ProcessorFactory, or from Compiler.reload().
> 
> You set state.
> You can call parse/unparse on various threads
> 
> If you want to change state settings for another parse call, you must call clone() to get another instance of the DataProcessor. This will share the expensive/big stuff like the compiled schema, but give you a separate copy of all the state that is settable via setValidationMode, setExternalDFDLVariable, and setTunable.
> 
> Then you set state
> You can call parse/unparse on various threads.
> 
> I hate to get into locking and such, but once you call parse/unparse, that should lock the object state, and any further setter calls should fail/throw. You should have to clone it to "change" the settings.
> 
> This makes the use of the setters a java-ish way of achieving the same reentrancy one would get if there were no setters and simply more arguments to the parse/unparse calls.
> 
> ________________________________
> From: Beckerle, Mike <mb...@tresys.com>
> Sent: Wednesday, March 25, 2020 11:29 AM
> To: dev@daffodil.apache.org <de...@daffodil.apache.org>
> Subject: Re: Compiler.setExternalDFDLVariable(s) considered challenged
> 
> To respond to my own thread here, I think given that we allow multiple simultaneous calls to parse/unparse from different threads, we must say that the DataProcessor object is immutable once parse or unparse are called.
> 
> I suppose we could say that it is mutable, but behavior is undetermined if any parse or unparse calls are active on any thread.
> 
> But this is just asking for trouble IMHO.
> 
> I think we started out with a stateful, non-thread capable API. The idea is that one thread would be invoking a data processor at a time. A data procesor was the state-block of an execution.
> 
> The need to share compiled processor reloads, because the compile schemas were expensive to create, tempted us to allow multiple parse/unparse calls on different threads.
> 
> Fact is, I think we should have said no to this, provided a DataProcessor.clone() to create instances that shared the reloaded compiled schema binary, but otherwise had separate state, and said that parse/unparse were synchronized methods on their DP instance.
> 
> Instead we're in a 1/2 way world where we don't have a thread-reasonable API due to mutable state in what turns out to be a cross-thread shared object.
> 
> ________________________________
> From: Beckerle, Mike <mb...@tresys.com>
> Sent: Wednesday, March 25, 2020 10:55 AM
> To: dev@daffodil.apache.org <de...@daffodil.apache.org>
> Subject: Compiler.setExternalDFDLVariable(s) considered challenged
> 
> Why does the API for Daffodil have
> 
> Compiler.setExternalDFDLVariable(...)
> and
> Compiler.setExternalDFDLVariables(...)
> 
> on it.
> 
> I believe we should deprecate this.
> 
> Compilers are parameterized by some of the tunables I understand.
> 
> But the external DFDL variables? These cannot affect compilation. The schema compiler needs to know statically the information about variables found in the schema itself in the dfdl:defineVariable statement annotations.
> 
> But the compiler doesn't need external variable bindings. In fact if it did know and use them, it would be building assumptions into the compiled schema that it shouldn't be building in.
> 
> Setting external var bindings on the Compiler just causes problems when that compiler instance is reused in a context where those settings aren't appropriate. (JIRA DAFFODIL-2302 is one such problem)
> 
> I believe setExternalDFDLVariable(s) methods should be deprecated, and external variables bindings should be an optional argument to the parse/unparse methods.
> 
> The setters cause thread safety issues because the DP is stateful, even though we want multiple calls to parse/unparse to be executable on different threads.
> 
> Consider: if we allow ordinary setExternalDFDLVariables and add a resetExternalDFDLVariables to clear them, then imagine one wants to make two parse calls on separate threads with different external variables bindings:
> 
> so on main thread.....
>      dp.setExternalVariables(...bindings 1...)
>      spawn the thread 1
> on the thread 1
>      dp.parse(....)
> back on main thread
>      dp.resetExternalVariables() // race condition. Did the parse call read the external variables before this reset or not?
>      dp.setExternalVariables(...bindings 2....)
>     .....
> 
> However, if we make the external variable bindings an argument to parse, we avoid all of this.
> 
> Alternatively, since DataProcessor has setExternalDFDLVariable, we can prohibit multiple calls on the same DataProcessor object simultaneously. We can provide a clone() method that preserves the loaded/reloaded processor, but constructs another DataProcessor object, thereby allowing separate external variables state per DataProcessor instance.
> 
> 
> Comments?
> 
> 
> 
> 
> 
> Mike Beckerle | Principal Engineer
> [Owl Cyber Defense]<http://owlcyberdefense.com>
> [cid:2a423dec-0558-414e-b369-14a24becc40f] is now a part of Owl<https://owlcyberdefense.com/news/owl-cyber-defense-tresys-technology-announce-merger/>
> P +1-781-330-0412
> W owlcyberdefense.com<http://www.owlcyberdefense.com>
> 


Re: Compiler.setExternalDFDLVariable(s) considered challenged

Posted by "Beckerle, Mike" <mb...@tresys.com>.
Here's my concrete proposal to fix DataProcesor's API.

We add a clone() method. It copies the object sharing the big/expensive stuff like the compiled schema but gives you a separate copy of all the state that is settable via setValidationMode, setExternalDFDLVariable, and setTunable.

So you get a DataProcessor either from a ProcessorFactory, or from Compiler.reload().

You set state.
You can call parse/unparse on various threads

If you want to change state settings for another parse call, you must call clone() to get another instance of the DataProcessor. This will share the expensive/big stuff like the compiled schema, but give you a separate copy of all the state that is settable via setValidationMode, setExternalDFDLVariable, and setTunable.

Then you set state
You can call parse/unparse on various threads.

I hate to get into locking and such, but once you call parse/unparse, that should lock the object state, and any further setter calls should fail/throw. You should have to clone it to "change" the settings.

This makes the use of the setters a java-ish way of achieving the same reentrancy one would get if there were no setters and simply more arguments to the parse/unparse calls.

________________________________
From: Beckerle, Mike <mb...@tresys.com>
Sent: Wednesday, March 25, 2020 11:29 AM
To: dev@daffodil.apache.org <de...@daffodil.apache.org>
Subject: Re: Compiler.setExternalDFDLVariable(s) considered challenged

To respond to my own thread here, I think given that we allow multiple simultaneous calls to parse/unparse from different threads, we must say that the DataProcessor object is immutable once parse or unparse are called.

I suppose we could say that it is mutable, but behavior is undetermined if any parse or unparse calls are active on any thread.

But this is just asking for trouble IMHO.

I think we started out with a stateful, non-thread capable API. The idea is that one thread would be invoking a data processor at a time. A data procesor was the state-block of an execution.

The need to share compiled processor reloads, because the compile schemas were expensive to create, tempted us to allow multiple parse/unparse calls on different threads.

Fact is, I think we should have said no to this, provided a DataProcessor.clone() to create instances that shared the reloaded compiled schema binary, but otherwise had separate state, and said that parse/unparse were synchronized methods on their DP instance.

Instead we're in a 1/2 way world where we don't have a thread-reasonable API due to mutable state in what turns out to be a cross-thread shared object.

________________________________
From: Beckerle, Mike <mb...@tresys.com>
Sent: Wednesday, March 25, 2020 10:55 AM
To: dev@daffodil.apache.org <de...@daffodil.apache.org>
Subject: Compiler.setExternalDFDLVariable(s) considered challenged

Why does the API for Daffodil have

Compiler.setExternalDFDLVariable(...)
and
Compiler.setExternalDFDLVariables(...)

on it.

I believe we should deprecate this.

Compilers are parameterized by some of the tunables I understand.

But the external DFDL variables? These cannot affect compilation. The schema compiler needs to know statically the information about variables found in the schema itself in the dfdl:defineVariable statement annotations.

But the compiler doesn't need external variable bindings. In fact if it did know and use them, it would be building assumptions into the compiled schema that it shouldn't be building in.

Setting external var bindings on the Compiler just causes problems when that compiler instance is reused in a context where those settings aren't appropriate. (JIRA DAFFODIL-2302 is one such problem)

I believe setExternalDFDLVariable(s) methods should be deprecated, and external variables bindings should be an optional argument to the parse/unparse methods.

The setters cause thread safety issues because the DP is stateful, even though we want multiple calls to parse/unparse to be executable on different threads.

Consider: if we allow ordinary setExternalDFDLVariables and add a resetExternalDFDLVariables to clear them, then imagine one wants to make two parse calls on separate threads with different external variables bindings:

so on main thread.....
     dp.setExternalVariables(...bindings 1...)
     spawn the thread 1
on the thread 1
     dp.parse(....)
back on main thread
     dp.resetExternalVariables() // race condition. Did the parse call read the external variables before this reset or not?
     dp.setExternalVariables(...bindings 2....)
    .....

However, if we make the external variable bindings an argument to parse, we avoid all of this.

Alternatively, since DataProcessor has setExternalDFDLVariable, we can prohibit multiple calls on the same DataProcessor object simultaneously. We can provide a clone() method that preserves the loaded/reloaded processor, but constructs another DataProcessor object, thereby allowing separate external variables state per DataProcessor instance.


Comments?





Mike Beckerle | Principal Engineer
[Owl Cyber Defense]<http://owlcyberdefense.com>
[cid:2a423dec-0558-414e-b369-14a24becc40f] is now a part of Owl<https://owlcyberdefense.com/news/owl-cyber-defense-tresys-technology-announce-merger/>
P +1-781-330-0412
W owlcyberdefense.com<http://www.owlcyberdefense.com>

Re: Compiler.setExternalDFDLVariable(s) considered challenged

Posted by "Beckerle, Mike" <mb...@tresys.com>.
To respond to my own thread here, I think given that we allow multiple simultaneous calls to parse/unparse from different threads, we must say that the DataProcessor object is immutable once parse or unparse are called.

I suppose we could say that it is mutable, but behavior is undetermined if any parse or unparse calls are active on any thread.

But this is just asking for trouble IMHO.

I think we started out with a stateful, non-thread capable API. The idea is that one thread would be invoking a data processor at a time. A data procesor was the state-block of an execution.

The need to share compiled processor reloads, because the compile schemas were expensive to create, tempted us to allow multiple parse/unparse calls on different threads.

Fact is, I think we should have said no to this, provided a DataProcessor.clone() to create instances that shared the reloaded compiled schema binary, but otherwise had separate state, and said that parse/unparse were synchronized methods on their DP instance.

Instead we're in a 1/2 way world where we don't have a thread-reasonable API due to mutable state in what turns out to be a cross-thread shared object.

________________________________
From: Beckerle, Mike <mb...@tresys.com>
Sent: Wednesday, March 25, 2020 10:55 AM
To: dev@daffodil.apache.org <de...@daffodil.apache.org>
Subject: Compiler.setExternalDFDLVariable(s) considered challenged

Why does the API for Daffodil have

Compiler.setExternalDFDLVariable(...)
and
Compiler.setExternalDFDLVariables(...)

on it.

I believe we should deprecate this.

Compilers are parameterized by some of the tunables I understand.

But the external DFDL variables? These cannot affect compilation. The schema compiler needs to know statically the information about variables found in the schema itself in the dfdl:defineVariable statement annotations.

But the compiler doesn't need external variable bindings. In fact if it did know and use them, it would be building assumptions into the compiled schema that it shouldn't be building in.

Setting external var bindings on the Compiler just causes problems when that compiler instance is reused in a context where those settings aren't appropriate. (JIRA DAFFODIL-2302 is one such problem)

I believe setExternalDFDLVariable(s) methods should be deprecated, and external variables bindings should be an optional argument to the parse/unparse methods.

The setters cause thread safety issues because the DP is stateful, even though we want multiple calls to parse/unparse to be executable on different threads.

Consider: if we allow ordinary setExternalDFDLVariables and add a resetExternalDFDLVariables to clear them, then imagine one wants to make two parse calls on separate threads with different external variables bindings:

so on main thread.....
     dp.setExternalVariables(...bindings 1...)
     spawn the thread 1
on the thread 1
     dp.parse(....)
back on main thread
     dp.resetExternalVariables() // race condition. Did the parse call read the external variables before this reset or not?
     dp.setExternalVariables(...bindings 2....)
    .....

However, if we make the external variable bindings an argument to parse, we avoid all of this.

Alternatively, since DataProcessor has setExternalDFDLVariable, we can prohibit multiple calls on the same DataProcessor object simultaneously. We can provide a clone() method that preserves the loaded/reloaded processor, but constructs another DataProcessor object, thereby allowing separate external variables state per DataProcessor instance.


Comments?





Mike Beckerle | Principal Engineer
[Owl Cyber Defense]<http://owlcyberdefense.com>
[cid:2a423dec-0558-414e-b369-14a24becc40f] is now a part of Owl<https://owlcyberdefense.com/news/owl-cyber-defense-tresys-technology-announce-merger/>
P +1-781-330-0412
W owlcyberdefense.com<http://www.owlcyberdefense.com>