Posted to user@uima.apache.org by Mario Juric <mj...@unsilo.ai> on 2017/10/29 16:49:03 UTC

Erratic block variable behaviour in Ruta

Hi Peter,

We encountered a problem with a Ruta rule behaving erratically in a multithreaded environment. We isolated the problem to the following rule, shown in pseudo form:
BLOCK(ForEach) EnclosingAnnotation.property=="something" {} {
    BOOLEAN ignore = false;
    EnclosedAnnotation.property=="something else"{FEATURE("value", "ignorable") -> ASSIGN(ignore, true)};
    EnclosingAnnotation.name=="Hello"{IF(ignore == false) -> CREATE(AnotherAnnotation, "name" = "World")};
}
We identified about 1000 documents where “AnotherAnnotation” above should be created, and we reprocessed them several times on EC2 using Oracle JDK build 1.8.0_151 with both Ruta 2.5 and UIMA 2.9 as well as Ruta 2.6.1 and UIMA 2.10.1. The share of inconsistent rule firings varied erratically across runs of the 1K documents, from approximately 16% down to approximately 0.5%, but there were inconsistencies in every run. Removing the ignore condition of course made the issue disappear entirely, e.g.:

BLOCK(ForEach) EnclosingAnnotation.property=="something" {} {
    EnclosingAnnotation.name=="Hello"{ -> CREATE(AnotherAnnotation, "name" = "World")};
}
We haven’t experienced the issue in a single-threaded environment yet, so we are not entirely sure whether it is related to multithreading. The nature of the problem could point to a thread-safety issue around shared data inside Ruta, but that is just a guess. In any case, our workaround was to rewrite the rule as follows:
BOOLEAN ignore = false;
BLOCK(ForEach) EnclosingAnnotation.property=="something" {-> ASSIGN(ignore, false)} {
    EnclosedAnnotation.property=="something else"{FEATURE("value", "ignorable") -> ASSIGN(ignore, true)};
    EnclosingAnnotation.name=="Hello"{IF(ignore == false) -> CREATE(AnotherAnnotation, "name" = "World")};
}
I assume the BLOCK(ForEach) action happens for every occurrence, but I haven’t actually verified that yet, since there is usually only one occurrence in this particular case. I was hoping you might be able to shed some light on this, and on the problems we experienced with the variable declaration inside the block.

Thanks
Mario

Re: Erratic block variable behaviour in Ruta

Posted by Mario Juric <mj...@unsilo.ai>.
Hi Peter,

Thanks for the explanation, and no problem with the delayed response. I’ll let you know the outcome of the change as soon as possible, but I have the feeling that your suggestion will work as expected.

Best
Mario

> On 6 Nov 2017, at 17:01, Peter Klügl <pe...@averbis.com> wrote:
> 
> Hi Mario,
> 
> 
> Sorry for the delayed response... I was travelling.
>
> First of all, there should be no multithreading issues in Ruta (in
> normal usage); at least, I am quite confident about that.
>
> My first guess would be that the problem is caused by the nature of
> variables and their initialization in Ruta.
>
> The initialization of a variable with a value (e.g., BOOLEAN ignore =
> false;) does not reset its actual value during a loop like BLOCK, because
> variables are declared only once and are always global. The given value
> only defines the initial value to which the variable is reset when the
> complete environment is reset (e.g., for a different CAS). The
> declaration is actually ignored in the execution of the block.
>
> So, you need to reset the value to false for each iteration in BLOCK. I
> wonder whether your solution with the ASSIGN in the head rule of the
> block will work. The head rule is applied in order to get a list of
> annotations (windows for the block), and so its action is already
> applied before the actual iteration starts.
> 
> Could you try something like this:
>
> BLOCK(ForEach) EnclosingAnnotation.property=="something" {} {
>     BOOLEAN ignore = false;
>     ASSIGN(ignore, false);
>     EnclosedAnnotation.property=="something else"{FEATURE("value", "ignorable") -> ASSIGN(ignore, true)};
>     EnclosingAnnotation.name=="Hello"{IF(ignore == false) -> CREATE(AnotherAnnotation, "name" = "World")};
> }
> 
> 
> 
> Best,
> 
> Peter
> 
> 

Re: Erratic block variable behaviour in Ruta

Posted by Peter Klügl <pe...@averbis.com>.
Hi Mario,


Sorry for the delayed response... I was travelling.

First of all, there should be no multithreading issues in Ruta (in
normal usage); at least, I am quite confident about that.

My first guess would be that the problem is caused by the nature of
variables and their initialization in Ruta.

The initialization of a variable with a value (e.g., BOOLEAN ignore =
false;) does not reset its actual value during a loop like BLOCK, because
variables are declared only once and are always global. The given value
only defines the initial value to which the variable is reset when the
complete environment is reset (e.g., for a different CAS). The
declaration is actually ignored in the execution of the block.

So, you need to reset the value to false for each iteration in BLOCK. I
wonder whether your solution with the ASSIGN in the head rule of the
block will work. The head rule is applied in order to get a list of
annotations (windows for the block), and so its action is already
applied before the actual iteration starts.

Could you try something like this:

BLOCK(ForEach) EnclosingAnnotation.property=="something" {} {
    BOOLEAN ignore = false;
    ASSIGN(ignore, false);
    EnclosedAnnotation.property=="something else"{FEATURE("value", "ignorable") -> ASSIGN(ignore, true)};
    EnclosingAnnotation.name=="Hello"{IF(ignore == false) -> CREATE(AnotherAnnotation, "name" = "World")};
}
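
To make the difference between the three variants easier to see, here is a rough sketch in plain Python (not Ruta, and not how Ruta is implemented). It only models the behaviour described in this message: the declaration inside the block is a no-op during iteration, the head rule's action runs for all windows before the body is iterated, and only an environment reset restores the initial value. The names Variable and run_block, and the list of windows, are made up for illustration.

```python
class Variable:
    """Toy model of a script variable: declared once, with an initial value."""

    def __init__(self, initial):
        self.initial = initial
        self.value = initial

    def reset(self):
        # Assumption taken from the description above: only a reset of the
        # whole environment (e.g. a different CAS) restores the initial value.
        self.value = self.initial


def run_block(windows, ignore, head_assign_false=False, body_assign_false=False):
    """Return, per window, whether the CREATE rule would fire.

    windows: list of booleans, True if the window contains the "ignorable"
    EnclosedAnnotation.
    """
    # The head rule is matched first to collect all windows, so an action on
    # it has already run for every window before the body is iterated.
    if head_assign_false:
        for _ in windows:
            ignore.value = False

    fired = []
    for has_ignorable in windows:
        # "BOOLEAN ignore = false;" in the body is modelled as doing nothing.
        if body_assign_false:
            ignore.value = False          # ASSIGN(ignore, false);
        if has_ignorable:
            ignore.value = True           # ASSIGN(ignore, true)
        fired.append(not ignore.value)    # IF(ignore == false) -> CREATE(...)
    return fired


# First window contains an ignorable annotation, the next two do not.
windows = [True, False, False]

print(run_block(windows, Variable(False)))                          # [False, False, False]
print(run_block(windows, Variable(False), head_assign_false=True))  # [False, False, False]
print(run_block(windows, Variable(False), body_assign_false=True))  # [False, True, True]
```

In this model, only the explicit assignment inside the block body gives each window a fresh value; the stale true from the first window otherwise suppresses the rule in the later ones. Whether real Ruta behaves the same way in the head-rule case is exactly the open question above.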



Best,

Peter

