Posted to user@uima.apache.org by Erik Fäßler <er...@uni-jena.de> on 2021/01/06 07:29:26 UTC

RUTA: Copy features into new annotation

Hello everyone (and a happy new year :-)),

I have been working on the following issue: whenever two entities are joined by a conjunction in the text (e.g. [...] Biden and Trump ran for president [...]), I create a new annotation spanning both entities and the conjunction ([Biden and Trump]_coordination). This part works fine.
However, my entities (Biden and Trump) also have an ID feature. The new annotation should receive both IDs from the Biden and Trump annotations, but I couldn’t manage to do this.

I have rules like this:

(Person
    ("," Person)*
    ","? PennBioIEPOSTag.value=="CC"
    Person
) {-> MARK(PersonEnumeration)};

So an enumeration of Persons is covered by a new annotation of type “PersonEnumeration”. Now “PersonEnumeration” should receive all the ID features from the covered Person annotations. How can I do this?

Best,

Erik

Re: RUTA: Copy features into new annotation

Posted by Peter Klügl <pe...@averbis.com>.

Hi,

On 13.01.2021 at 12:04, Erik Fäßler wrote:
>> :-)
>>
>> I was looking at the Person definition there, but didn't find matching
>> features.
> Oh, sorry, I did not express myself clearly enough: in my real use case I don’t have Person annotations but Organism annotations, which are derived from ConceptMentions. And ConceptMentions have the resourceEntryList feature.
> I apologize for the confusion. For the sake of simplicity I made up the Person example in my initial e-mail, and now it bit me in the a** ;-)


Ah no, all fine. When I prepared the first example rules, I wondered
about the range type of the id feature. I assumed you were using the
JCoRe type systems, as your question indicated some non-trivial real-world
use case. I had a quick look (1 min) to see if I could identify the range
of the ids of Person annotations in these type systems, but failed... so I
simply used String as the range :-)



>>
>> In general, I find it better to create additional annotations for
>> complex structures instead of merging the information into an existing
>> annotation, simply for maintainability reasons. It's easier to
>> inspect unintended behavior several months later that way ...
> Great, I am with you here; it feels like I did it the recommended way.
>>
>>> So actually, there is one step missing now: I need to replace merged Organism entries with the covering OrganismEnumeration (Person and PersonEnumeration in my example).
>>
>> I am not sure what the input/output behavior should be. Don't you have
>> two separate annotations, and isn't the enum the merge of the semantics?
> You’re right. And I think I will leave it this way. I was overcomplicating things.
>>
>> Labels and inlined rules are the two best language features I added to
>> Ruta, really useful. Let me know if you want to learn more about them
>> and if there is information missing in the documentation.
>>
> No, it’s all great. It’s just not that trivial and, honestly, while I had a look at the base syntax, I got quite far by cherry-picking what I needed from the documentation. I did not study the syntax in great detail because I could always make things work by just trying them out. That’s my bad. But this time I didn’t know where to start, so I asked. And you helped me a lot, thank you so much.
> RUTA is a great tool. My only trouble is regular exceptions in the Eclipse Workbench, but I got used to them, and I have probably combined the wrong versions of RUTA and Eclipse or something.


There have been several reports of problems lately that were caused by
the use of different Java versions.



Best,


Peter



>>>> On 10. Jan 2021, at 16:21, Peter Klügl <pe...@averbis.com> wrote:
>>>>
>>>> Hi,
>>>>
>>>>
>>>> On 07.01.2021 at 14:55, Erik Fäßler wrote:
>>>>> Hi Peter and thank you once again for your excellent support of your excellent RUTA software!
>>>> You are welcome :-)
>>>>
>>>>
>>>>> Your second example was very much what I needed. Thank you so far!
>>>>> I have one last bump in the road:
>>>>>
>>>>> My Person#id feature is an FSArray with ID annotations instead of a plain uima.cas.String. So, one Person annotation might have multiple IDs per the type system.
>>>>> The ID type has a feature “entryId”.
>>>>> In my particular case I actually have only one entry in the id array. Still, I need to access this entry somehow.
>>>>> Is that at all possible in RUTA? I would need something like
>>>>>
>>>>>
>>>>> // collect ids of all covered Persons using an extra list
>>>>> STRINGLIST ids;
>>>>> pe:PersonEnumeration{-> pe.personIds = ids}
>>>>>   <-{p:Person{-> ADD(ids,p.id[0].entryId)};};
>>>>>
>>>>> This does not seem to be covered by the FeatureExpression grammar in RUTA. Is there a workaround? Otherwise I will have to solve it some other way.
>>>> There are actually "indexed" expressions like Person.ids[0], but this is not
>>>> yet an "official" and stable feature. However, I think it's not even
>>>> necessary.
>>>>
>>>>
>>>> Is your typesystem available somewhere? JCoRe?
>>>>
>>>> Is this a solution for you?
>>>>
>>>>
>>>> PACKAGE uima.ruta;
>>>>
>>>> // mock types
>>>> DECLARE CC, EnumCC;
>>>> DECLARE Person (FSArray ids);
>>>> DECLARE PersonId (String personId);
>>>> DECLARE PersonEnumeration (StringArray personIds);
>>>>
>>>> // mock annotations
>>>> "Trump" -> Person;
>>>> "Biden" -> Person;
>>>> "and" -> CC;
>>>> INT counter = 1;
>>>> p:Person{-> pid:CREATE(PersonId, "personId" = "id_" + (counter)),
>>>> counter = counter +1, p.ids = pid};
>>>>
>>>> (COMMA? @CC){-> EnumCC};
>>>>
>>>> // identify enum span
>>>> (Person (COMMA Person)* EnumCC Person){-> PersonEnumeration};
>>>>
>>>> // collect ids of all covered Persons using an extra list
>>>> STRINGLIST ids;
>>>> pe:PersonEnumeration{-> pe.personIds = ids}
>>>>    <-{p:Person{-> ADD(ids,p.ids.personId)};};
>>>>
>>>>
>>>> Best,
>>>>
>>>>
>>>> Peter
>>>>
>>>>
>>>>
>>>>> Many thanks,
>>>>>
>>>>> Erik
>>>>>
>>>>>> On 7. Jan 2021, at 10:47, Peter Klügl <peter.kluegl@averbis.com> wrote:
>>>>>>
>>>>>> Hi Erik,
>>>>>>
>>>>>>
>>>>>> it depends on how you want to represent the information of the ids of
>>>>>> the covered Person annotations. You somehow need to represent the values
>>>>>> in the PersonEnumeration annotation. I assume that the ID feature of
>>>>>> Person is uima.cas.String? PersonEnumeration could either use one String
>>>>>> feature, a StringArray feature or an FSArray feature (pointing to the
>>>>>> Person annotations which provide the IDs).
>>>>>>
>>>>>> Here are two examples:
>>>>>>
>>>>>>
>>>>>> PACKAGE uima.ruta;
>>>>>>
>>>>>> // mock types
>>>>>> DECLARE CC, EnumCC;
>>>>>> DECLARE Person (STRING id);
>>>>>> DECLARE PersonEnumeration (FSArray persons);
>>>>>>
>>>>>> // mock annotations
>>>>>> "Trump" -> Person ("id" = "1");
>>>>>> "Biden" -> Person ("id" = "2");
>>>>>> "and" -> CC;
>>>>>>
>>>>>> (COMMA? @CC){-> EnumCC};
>>>>>>
>>>>>> // identify enum span
>>>>>> (Person (COMMA Person)* EnumCC Person){-> PersonEnumeration};
>>>>>>
>>>>>> // collect all covered Persons
>>>>>> pe:PersonEnumeration{-> pe.persons = Person};
>>>>>>
>>>>>> ########################
>>>>>>
>>>>>> ########################
>>>>>>
>>>>>> PACKAGE uima.ruta;
>>>>>>
>>>>>> // mock types
>>>>>> DECLARE CC, EnumCC;
>>>>>> DECLARE Person (STRING id);
>>>>>> DECLARE PersonEnumeration (StringArray personIds);
>>>>>>
>>>>>> // mock annotations
>>>>>> "Trump" -> Person ("id" = "1");
>>>>>> "Biden" -> Person ("id" = "2");
>>>>>> "and" -> CC;
>>>>>>
>>>>>> (COMMA? @CC){-> EnumCC};
>>>>>>
>>>>>> // identify enum span
>>>>>> (Person (COMMA Person)* EnumCC Person){-> PersonEnumeration};
>>>>>>
>>>>>> // collect ids of all covered Persons using an extra list
>>>>>> STRINGLIST ids;
>>>>>> pe:PersonEnumeration{-> pe.personIds = ids}
>>>>>>   <-{p:Person{-> ADD(ids,p.id)};};
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> Best,
>>>>>>
>>>>>>
>>>>>> Peter
>>>>>>
>>>>>>
>>>>>> -- 
>>>>>> Dr. Peter Klügl
>>>>>> Head of Text Mining/Machine Learning
>>>>>>
>>>>>> Averbis GmbH
>>>>>> Salzstr. 15
>>>>>> 79098 Freiburg
>>>>>> Germany
>>>>>>
>>>>>> Fon: +49 761 708 394 0
>>>>>> Fax: +49 761 708 394 10
>>>>>> Email: peter.kluegl@averbis.com
>>>>>> Web: https://averbis.com
>>>>>>
>>>>>> Headquarters: Freiburg im Breisgau
>>>>>> Register Court: Amtsgericht Freiburg im Breisgau, HRB 701080
>>>>>> Managing Directors: Dr. med. Philipp Daumke, Dr. Kornél Markó
>>>>>>

Re: RUTA: Copy features into new annotation

Posted by Erik Fäßler <er...@uni-jena.de>.

> 
> :-)
> 
> I was looking at the Person definition there, but didn't find matching
> features.

Oh, sorry, I did not express myself clearly enough: in my real use case I don’t have Person annotations but Organism annotations, which are derived from ConceptMentions. And ConceptMentions have the resourceEntryList feature.
I apologize for the confusion. For the sake of simplicity I made up the Person example in my initial e-mail, and now it bit me in the a** ;-)
>
>
> In general, I find it better to create additional annotations for
> complex structures instead of merging the information into an existing
> annotation, simply for maintainability reasons. It's easier to
> inspect unintended behavior several months later that way ...

Great, I am with you here; it feels like I did it the recommended way.
>
>
>>
>> So actually, there is one step missing now: I need to replace merged Organism entries with the covering OrganismEnumeration (Person and PersonEnumeration in my example).
>
>
> I am not sure what the input/output behavior should be. Don't you have
> two separate annotations, and isn't the enum the merge of the semantics?

You’re right. And I think I will leave it this way. I was overcomplicating things.
>
>
> Labels and inlined rules are the two best language features I added to
> Ruta, really useful. Let me know if you want to learn more about them
> and if there is information missing in the documentation.
>

No, it’s all great. It’s just not that trivial and, honestly, while I had a look at the base syntax, I got quite far by cherry-picking what I needed from the documentation. I did not study the syntax in great detail because I could always make things work by just trying them out. That’s my bad. But this time I didn’t know where to start, so I asked. And you helped me a lot, thank you so much.
RUTA is a great tool. My only trouble is regular exceptions in the Eclipse Workbench, but I got used to them, and I have probably combined the wrong versions of RUTA and Eclipse or something.

Thank you!

Erik


Re: RUTA: Copy features into new annotation

Posted by Peter Klügl <pe...@averbis.com>.

Hi,

On 11.01.2021 at 08:13, Erik Fäßler wrote:
> Hello Peter,
>
> thank you again for putting so much thought into it.
> I am a bit embarrassed to say that I already had the solution in my script when I just opened Eclipse again. I think I just didn’t really try it because I didn’t expect it to work.
> This works now, thank you!
>
> To help you better understand my case, here are some details:
> My type system is indeed the JCoRe TS.
> And I am not working with Person annotations but with Organism mentions; I just wanted to keep things simple. Organism mentions are derived from ConceptMentions:
> https://github.com/JULIELab/jcore-base/blob/90b4e56d80d774add1997db29f4eb27d6b394309/jcore-types/src/main/resources/de/julielab/jcore/types/jcore-semantics-mention-types.xml#L125
>
> Those have the “resourceEntryList” feature, which is an FSArray of ResourceEntry instances:
> https://github.com/JULIELab/jcore-base/blob/90b4e56d80d774add1997db29f4eb27d6b394309/jcore-types/src/main/resources/de/julielab/jcore/types/jcore-basic-types.xml#L44
>
> The ResourceEntry, finally, has a feature named “entryId”.


:-)

I was looking at the Person definition there, but didn't find matching
features.



>
> The entryIds are set in a separate annotator (JCoRe Linnaeus annotator). My goal is to connect multiple mentions of Organisms (“mouse and human”) into a single expression for a downstream annotator that checks the Organism mentions directly in front of gene mentions. However, in the example “mouse and human” it would always detect “human” but disregard “mouse”. So I thought I would create new annotations to “merge” the originals.
>
> Is this how you would do it? Alternatively, I could also have merged the two existing Organism annotations. I would even prefer that. But I would not know how to organize this so that, in the end, instead of two single Organism annotations with two resourceEntries there would be only one Organism annotation with both resourceEntries.


It's hard to tell without taking a closer look.

In general, I find it better to create additional annotations for
complex structures instead of merging the information into an existing
annotation, simply for maintainability reasons. It's easier to
inspect unintended behavior several months later that way ...


>
> So actually, there is one step missing now: I need to replace merged Organism entries with the covering OrganismEnumeration (Person and PersonEnumeration in my example).


I am not sure what the input/output behavior should be. Don't you have
two separate annotations, and isn't the enum the merge of the semantics?

If you can give me an example, I'll write a rule for you :-)



> Is there a way to do this better in RUTA? I have to say that I have not yet fully mastered the syntax; I would not have been able to come up with the
> // collect ids of all covered Persons using an extra list
> STRINGLIST ids;
> pe:PersonEnumeration{-> pe.personIds = ids}
>     <-{p:Person{-> ADD(ids,p.ids.personId)};};


Labels and inlined rules are the two best language features I added to
Ruta, really useful. Let me know if you want to learn more about them
and if there is information missing in the documentation.



Best,


Peter



>
> construction so this enumeration-annotation-merging might actually be easy and I just don’t see it.
>
> Thank you so much!
>
> Erik
>

Re: RUTA: Copy features into new annotation

Posted by Erik Fäßler <er...@uni-jena.de>.

Hello Peter,

thank you again for putting so much thought into it.
I am a bit embarrassed to say that I already had the solution in my script when I just opened Eclipse again. I think I just didn’t really try it because I didn’t expect it to work.
This works now, thank you!

To help you better understand my case, here are some details:
My type system is indeed the JCoRe TS.
And I am not working with Person annotations but with Organism mentions; I just wanted to keep things simple. Organism mentions are derived from ConceptMentions:
https://github.com/JULIELab/jcore-base/blob/90b4e56d80d774add1997db29f4eb27d6b394309/jcore-types/src/main/resources/de/julielab/jcore/types/jcore-semantics-mention-types.xml#L125

Those have the “resourceEntryList” feature, which is an FSArray of ResourceEntry instances:
https://github.com/JULIELab/jcore-base/blob/90b4e56d80d774add1997db29f4eb27d6b394309/jcore-types/src/main/resources/de/julielab/jcore/types/jcore-basic-types.xml#L44

The ResourceEntry, finally, has a feature named “entryId”.
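
For illustration, the collection rule from Peter's mock example might carry over to the types described above roughly as follows. This is only a sketch under assumptions: OrganismEnumeration and its organismIds feature are hypothetical names, the JCoRe type system is assumed to be imported already so that Organism resolves, and an earlier rule is assumed to have created the OrganismEnumeration annotations.

```ruta
// Sketch only, not tested against the JCoRe type system.
// Hypothetical covering type; its name and feature are made up for illustration.
DECLARE OrganismEnumeration (StringArray organismIds);

// Collect the entryId of each ResourceEntry of every covered Organism.
// The inlined condition block (<-) fills the list before the assignment runs.
STRINGLIST ids;
oe:OrganismEnumeration{-> oe.organismIds = ids}
    <-{o:Organism{-> ADD(ids, o.resourceEntryList.entryId)};};
```

Note that ids is a single script-level list that is never reset here, so with more than one enumeration per document the values would presumably accumulate; Ruta's CLEAR action might be a way to empty it between matches.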

The entryIds are set in a separate annotator (JCoRe Linnaeus annotator). My goal is to connect multiple mentions of Organisms (“mouse and human”) into a single expression for a downstream annotator that checks the Organism mentions directly in front of gene mentions. However, in the example “mouse and human” it would always detect “human” but disregard “mouse”. So I thought I would create new annotations to “merge” the originals.

Is this how you would do it? Alternatively, I could also have merged the two existing Organism annotations. I would even prefer that. But I would not know how to organize this so that, in the end, instead of two single Organism annotations with two resourceEntries there would be only one Organism annotation with both resourceEntries.

So actually, there is one step missing now: I need to replace the merged Organism annotations with the covering OrganismEnumeration (Person and PersonEnumeration in my example).
Is there a better way to do this in RUTA? I have to say that I have not yet fully grasped the syntax. I would not have been able to come up with the

// collect ids of all covered Persons using an extra list
STRINGLIST ids;
pe:PersonEnumeration{-> pe.personIds = ids}
    <-{p:Person{-> ADD(ids,p.ids.personId)};};

construction myself, so this enumeration-annotation merging might actually be easy and I just don’t see it.
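
To make the intended end result concrete, the merge step described above can be modeled outside of RUTA in plain Python. This is only a sketch under stated assumptions: the Organism class, its entry_ids field, and merge_enumeration are hypothetical stand-ins, not JCoRe, UIMA, or RUTA API, and the offsets and entry IDs are sample values chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Organism:
    """Hypothetical stand-in for an Organism mention with its entry IDs."""
    begin: int
    end: int
    entry_ids: list = field(default_factory=list)


def merge_enumeration(organisms, begin, end):
    """Replace all organisms inside [begin, end) by one merged annotation.

    The merged annotation spans the whole enumeration and carries the
    entry IDs of every covered organism, in text order.
    """
    inside, outside = [], []
    for o in organisms:
        (inside if begin <= o.begin and o.end <= end else outside).append(o)
    if len(inside) < 2:
        # Nothing to merge: keep the input as it is, sorted by position.
        return sorted(organisms, key=lambda o: o.begin)
    inside.sort(key=lambda o: o.begin)
    merged = Organism(begin, end, [i for o in inside for i in o.entry_ids])
    return sorted(outside + [merged], key=lambda o: o.begin)


# "mouse and human" covers offsets 0-15; a third mention lies outside of it.
organisms = [
    Organism(10, 15, ["9606"]),   # human
    Organism(0, 5, ["10090"]),    # mouse
    Organism(30, 33, ["10116"]),  # unrelated mention further right
]
result = merge_enumeration(organisms, 0, 15)
```

After the call, result holds one annotation spanning offsets 0 to 15 with both entry IDs, followed by the untouched third mention, which is the state the downstream annotator would need to see.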

Thank you so much!

Erik


Re: RUTA: Copy features into new annotation

Posted by Peter Klügl <pe...@averbis.com>.

Hi,


On 07.01.2021 at 14:55, Erik Fäßler wrote:
> Hi Peter and thank you once again for your excellent support of your excellent RUTA software!


You are welcome :-)


>
> Your second example was very much what I needed. Thank you so far!
> I have one last bump in the road:
>
> My Person#id feature is an FSArray with ID annotations instead of a plain uima.cas.String. So, one Person annotation might have multiple IDs per the type system.
> The ID type has a feature “entryId”.
> In my particular case I actually have only one entry in the id array. Still, I need to access this entry somehow.
> Is that at all possible in RUTA? I would need something like
>
>
> // collect ids of all covered Persons using an extra list
> STRINGLIST ids;
> pe:PersonEnumeration{-> pe.personIds = ids}
>     <-{p:Person{-> ADD(ids,p.id[0].entryId)};};
>
> This does not seem to be covered by the FeatureExpression grammar in RUTA. Is there a workaround? Otherwise I will have to solve it some other way.


there are actually "indexed" expressions like Person.ids[0], but they are
not yet an "official" and stable feature. However, I think they are not
even necessary here.


Is your type system available somewhere? JCoRe?

Is this a solution for you?


PACKAGE uima.ruta;

// mock types
DECLARE CC, EnumCC;
DECLARE Person (FSArray ids);
DECLARE PersonId (String personId);
DECLARE PersonEnumeration (StringArray personIds);

// mock annotations
"Trump" -> Person;
"Biden" -> Person;
"and" -> CC;
INT counter = 1;
p:Person{-> pid:CREATE(PersonId, "personId" = "id_" + (counter)),
counter = counter +1, p.ids = pid};

(COMMA? @CC){-> EnumCC};

// identify enum span
(Person (COMMA Person)* EnumCC Person){-> PersonEnumeration};

// collect ids of all covered Persons using an extra list
STRINGLIST ids;
pe:PersonEnumeration{-> pe.personIds = ids}
    <-{p:Person{-> ADD(ids,p.ids.personId)};};
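
For readers who want to check the expected content of personIds without running the script, the collection step can be modeled in plain Python. This is a rough sketch, assuming that every PersonId referenced by a covered Person contributes its personId value. The classes and collect_ids are hypothetical mirrors of the mock types above, not UIMA or Ruta API.

```python
from dataclasses import dataclass, field


@dataclass
class PersonId:
    """Mirror of the mock PersonId type (assumption, not UIMA API)."""
    person_id: str


@dataclass
class Person:
    """Mirror of the mock Person type; ids plays the role of the FSArray."""
    begin: int
    end: int
    ids: list = field(default_factory=list)


def collect_ids(persons, begin, end):
    """Gather the personId of every PersonId of every Person in [begin, end)."""
    inside = sorted(
        (p for p in persons if begin <= p.begin and p.end <= end),
        key=lambda p: p.begin,
    )
    # One flat list, in text order, however many ids each Person holds.
    return [pid.person_id for p in inside for pid in p.ids]


persons = [
    Person(0, 5, [PersonId("id_1")]),
    Person(10, 15, [PersonId("id_2"), PersonId("id_3")]),
    Person(40, 45, [PersonId("id_4")]),  # lies outside the enumeration
]
person_ids = collect_ids(persons, 0, 15)
```

The point of the model is that no index such as [0] is needed: the nested values are flattened into one list, so a Person with several ids is handled the same way as one with a single id.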


Best,


Peter




Re: RUTA: Copy features into new annotation

Posted by Erik Fäßler <er...@uni-jena.de>.

Hi Peter and thank you once again for your excellent support of your excellent RUTA software!

Your second example was very much what I needed. Thank you so far!
I have one last bump in the road:

My Person#id feature is an FSArray with ID annotations instead of a plain uima.cas.String. So, one Person annotation might have multiple IDs per the type system.
The ID type has a feature “entryId”.
In my particular case I actually have only one entry in the id array. Still, I need to access this entry somehow.
Is that at all possible in RUTA? I would need something like


// collect ids of all covered Persons using an extra list
STRINGLIST ids;
pe:PersonEnumeration{-> pe.personIds = ids}
    <-{p:Person{-> ADD(ids,p.id[0].entryId)};};

This does not seem to be covered by the FeatureExpression grammar in RUTA. Is there a workaround? Otherwise I will have to solve it some other way.

Many thanks,

Erik


Re: RUTA: Copy features into new annotation

Posted by Peter Klügl <pe...@averbis.com>.

Hi Erik,


it depends on how you want to represent the IDs of the covered Person
annotations. You need to store the values somewhere in the
PersonEnumeration annotation. I assume that the ID feature of Person is
a uima.cas.String? PersonEnumeration could then use either a single
String feature, a StringArray feature, or an FSArray feature (pointing
to the Person annotations which provide the IDs).

Here are two examples:


PACKAGE uima.ruta;

// mock types
DECLARE CC, EnumCC;
DECLARE Person (STRING id);
DECLARE PersonEnumeration (FSArray persons);

// mock annotations
"Trump" -> Person ("id" = "1");
"Biden" -> Person ("id" = "2");
"and" -> CC;

COMMA? @CC{-> EnumCC};

// identify enum span
(Person (COMMA Person)* EnumCC Person){-> PersonEnumeration};

// collect all covered Persons
pe:PersonEnumeration{-> pe.persons = Person};

########################

########################

PACKAGE uima.ruta;

// mock types
DECLARE CC, EnumCC;
DECLARE Person (STRING id);
DECLARE PersonEnumeration (StringArray personIds);

// mock annotations
"Trump" -> Person ("id" = "1");
"Biden" -> Person ("id" = "2");
"and" -> CC;

COMMA? @CC{-> EnumCC};

// identify enum span
(Person (COMMA Person)* EnumCC Person){-> PersonEnumeration};

// collect ids of all covered Persons using an extra list
STRINGLIST ids;
pe:PersonEnumeration{-> pe.personIds = ids}
    <-{p:Person{-> ADD(ids,p.id)};};
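
The difference between the two representations can also be illustrated outside of Ruta. The following plain-Python sketch models both variants on the example sentence. The classes and the covered helper are hypothetical stand-ins for the mock types, not UIMA or Ruta API, and the offsets are computed by hand for this one sentence.

```python
from dataclasses import dataclass, field


@dataclass
class Person:
    """Stand-in for the mock Person type with a plain string id."""
    begin: int
    end: int
    id: str


@dataclass
class PersonEnumeration:
    """Stand-in holding both variants side by side for comparison."""
    begin: int
    end: int
    persons: list = field(default_factory=list)     # like the FSArray variant
    person_ids: list = field(default_factory=list)  # like the StringArray variant


def covered(outer, annotations):
    """Return the annotations lying completely inside outer, in text order."""
    inside = [a for a in annotations
              if outer.begin <= a.begin and a.end <= outer.end]
    return sorted(inside, key=lambda a: a.begin)


text = "Biden and Trump ran for president"
persons = [Person(10, 15, "1"), Person(0, 5, "2")]  # Trump, Biden
enum = PersonEnumeration(0, 15)                     # "Biden and Trump"

enum.persons = covered(enum, persons)               # variant 1: keep references
enum.person_ids = [p.id for p in enum.persons]      # variant 2: copy the id values
```

Keeping references (variant 1) leaves all features of the covered annotations reachable, while copying the values (variant 2) makes the enumeration self-contained, which matters if the covered annotations are removed later.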




Best,


Peter


