Posted to user@uima.apache.org by Bonnie MacKellar <bk...@gmail.com> on 2016/06/17 19:21:44 UTC

question on Ruta Query View

Hi

I am trying to use Ruta Query View to get a view of all matches for a
particular annotation type across a large set of .xmi files. However, I am
noticing something strange about Ruta Query View - it doesn't report lots
of matches that are shown in the Annotation browser (and which I believe
are correct matches). For example, a given annotation type tsCurrent has 4
matches in the file NCT0036712, but these matches do not appear at all in
the list of results in Ruta Query View when I query for tsCurrent.  For
some files, though, the results for all matches do show up, and for other
files, only a partial set of matches is in the query results. I cannot
understand why this is happening. Perhaps my query syntax is wrong?  I can
only find the one example in the manual, which isn't much to go on.

I am attaching a screenshot showing the AnnotationBrowser on the top right
in Eclipse, with all of the matches for tsCurrent, and the Ruta Query view
on bottom, which does not contain those matches. I think it is easier to
see the problem visually.

Also, ultimately I am just trying to get a count of the number of times
certain annotations are made across all of my files. Is there a better way
to do that instead of Ruta Query View?  I can't find another way to total
matches across lots of files.

thanks,
Bonnie MacKellar


Re: question on Ruta Query View

Posted by Peter Klügl <pe...@averbis.com>.
Sorry for the delayed response.


It all depends on the filtering settings of the analysis engine and the
script. You can create annotations that are normally not visible to
ordinary Ruta rules. The Annotation Browser simply displays all
annotations, but that does not mean that the rules can match on them. The
Query View uses a default Ruta analysis engine with the default filtering
settings, which means that annotations starting or ending with
whitespace, line breaks, or markup are not visible and will be skipped.
As far as I know, it is not yet possible to reconfigure the Query View
analysis engine. As I mentioned before, the Query View does not list the
annotations of the type you query, but returns the matches of the rule
you query. Can you check whether the missing annotations start or end
with something invisible? Just in case the problem is caused by
something else...
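
One way to run this check outside of Eclipse is a small script that reads
an .xmi file and lists the annotations of one type whose covered text
starts or ends with whitespace. The following is only a rough sketch in
Python, not part of Ruta or UIMA. The function name boundary_whitespace
and the sample document are made up for illustration. It assumes the usual
XMI layout (a Sofa element carrying the text in sofaString, and one element
per annotation with begin and end attributes), it only looks at the first
Sofa, and it does not detect markup. UIMA offsets count UTF-16 code units,
so the slices can be off for text with characters outside the Basic
Multilingual Plane.

```python
import xml.etree.ElementTree as ET

# Tiny made-up XMI document: "abc" is clean, "def\n" ends with a line break.
SAMPLE_XMI = (
    '<xmi:XMI xmlns:xmi="http://www.omg.org/XMI" '
    'xmlns:cas="http:///uima/cas.ecore" xmlns:t="http:///t.ecore">'
    '<cas:Sofa xmi:id="1" sofaNum="1" sofaID="_InitialView" '
    'sofaString="abc def&#10;ghi"/>'
    '<t:tsCurrent xmi:id="2" sofa="1" begin="0" end="3"/>'
    '<t:tsCurrent xmi:id="3" sofa="1" begin="4" end="8"/>'
    '</xmi:XMI>'
)


def local_name(element):
    """Return the tag without its "{namespace}" prefix."""
    return element.tag.rsplit("}", 1)[-1]


def boundary_whitespace(xmi_text, type_name):
    """List (begin, end, covered_text) for annotations of `type_name`
    whose covered text starts or ends with whitespace."""
    root = ET.fromstring(xmi_text)
    # Assumption: the document text is the sofaString of the first Sofa.
    text = next(
        (el.get("sofaString") for el in root.iter()
         if local_name(el) == "Sofa" and el.get("sofaString") is not None),
        "",
    )
    hits = []
    for el in root.iter():
        if local_name(el) != type_name:
            continue
        if el.get("begin") is None or el.get("end") is None:
            continue  # not an annotation element
        begin, end = int(el.get("begin")), int(el.get("end"))
        covered = text[begin:end]
        if covered and (covered[0].isspace() or covered[-1].isspace()):
            hits.append((begin, end, covered))
    return hits
```

Calling boundary_whitespace(SAMPLE_XMI, "tsCurrent") reports only the
second annotation, the one whose span ends on the line break. Annotations
reported this way are candidates for the ones the Query View skips.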


Best,


Peter



On 19.06.2016 at 15:15, Bonnie MacKellar wrote:
> I am sorry, I am now really confused. I have a Ruta script which annotates
> a bunch of text files, resulting in .xmi files which I assume contain the
> annotations. When I open an .xmi file in the Annotation Browser, it shows
> all of the annotations produced by my script, right? It certainly looks
> correct.  I have checked them pretty carefully.
> Since I must specify .xmi files for the query view as well, I was assuming
> it is also listing the annotations in those same files.
>
> Yes, I know I can use uimaFIT but since I have a lot of types, I am
> dreading the configuration task. I just wanted some quick totals, and had
> hoped I could do it in a few minutes with the query view. Why are
> annotations made to be invisible if they end with a line break? That caused
> me no end of grief when I was developing my script. It seems unexpected.
>
> thanks,
> Bonnie MacKellar


Re: question on Ruta Query View

Posted by Bonnie MacKellar <bk...@gmail.com>.
I am sorry, I am now really confused. I have a Ruta script which annotates
a bunch of text files, resulting in .xmi files which I assume contain the
annotations. When I open an .xmi file in the Annotation Browser, it shows
all of the annotations produced by my script, right? It certainly looks
correct.  I have checked them pretty carefully.
Since I must specify .xmi files for the query view as well, I was assuming
it is also listing the annotations in those same files.

Yes, I know I can use uimaFIT but since I have a lot of types, I am
dreading the configuration task. I just wanted some quick totals, and had
hoped I could do it in a few minutes with the query view. Why are
annotations made to be invisible if they end with a line break? That caused
me no end of grief when I was developing my script. It seems unexpected.

thanks,
Bonnie MacKellar

On Sun, Jun 19, 2016 at 8:52 AM, Peter Klügl <pe...@averbis.com>
wrote:

> Hi,
>
>
> The annotation browser just lists all annotations in the CAS; it is
> completely independent of the Ruta language and is just an extension of the
> CAS Editor. The query view applies rules to a CAS and lists the rule
> matches. So the query view is much more powerful than the annotation
> browser, since it can use the complete expressiveness of the language.
> However, that is also why it is sensitive to the visibility
> concept.
>
>
> Best,
>
>
> Peter

Re: question on Ruta Query View

Posted by Peter Klügl <pe...@averbis.com>.
Hi,


The annotation browser just lists all annotations in the CAS; it is
completely independent of the Ruta language and is just an extension of the
CAS Editor. The query view applies rules to a CAS and lists the rule
matches. So the query view is much more powerful than the annotation
browser, since it can use the complete expressiveness of the language.
However, that is also why it is sensitive to the visibility
concept.


Best,


Peter


On 19.06.2016 at 14:39, Bonnie MacKellar wrote:
> The idea that spaces are making the annotations invisible is totally
> plausible. But why does the AnnotationBrowser see them then? The
> annotations are there - they haven't been skipped - just the query view is
> not picking them up. What is different about Annotation Browser that would
> make those annotations not visible?
>
> thanks,
> Bonnie MacKellar


Re: question on Ruta Query View

Posted by Bonnie MacKellar <bk...@gmail.com>.
The idea that spaces are making the annotations invisible is totally
plausible. But why does the AnnotationBrowser see them then? The
annotations are there - they haven't been skipped - just the query view is
not picking them up. What is different about Annotation Browser that would
make those annotations not visible?

thanks,
Bonnie MacKellar

On Sun, Jun 19, 2016 at 8:03 AM, Peter Klügl <pe...@averbis.com>
wrote:

> Hi,
>
>
> Attachments are removed on this mailing list.
>
>
> I would bet that some annotations are not visible to the rules, so they
> are simply skipped, and the query view returns no matches.
>
>
> In Ruta, annotations are invisible if their begin or end is covered by
> something invisible, i.e., by any annotation of a type that is filtered.
> Most often, the annotations are missed because they start or end with a
> space or line break.
>
>
> You can trim annotations, e.g., with
>
>
> RETAINTYPE(SPACE,BREAK);
>
> tsCurrent{-> TRIM(SPACE,BREAK)};
>
> RETAINTYPE;
>
>
>
> You can use the query view for this use case. I have to mention that the
> query view was built to serve as a tool during rule engineering: to get
> a quick overview of the annotated documents. It does not scale with
> the number of documents since there is no indexing across CASes and you
> need to deserialize all CASes.
>
> If it is fast enough, it is totally fine for counting annotations with
> the query view.
>
> You can also write a simple uimaFIT analysis engine and add it to the
> pipeline or the Ruta script. The analysis engine counts the
> annotations in process() and outputs the aggregated result in
> collectionProcessingComplete() (or the overridden method with the
> correct name). If you want to parallelize it, you need a different
> solution with a resource or something.
>
> Best,
>
> Peter

Re: question on Ruta Query View

Posted by Peter Klügl <pe...@averbis.com>.
Hi,


Attachments are removed on this mailing list.


I would bet that some annotations are not visible to the rules, so they
are simply skipped, and the query view returns no matches.


In Ruta, annotations are invisible if their begin or end is covered by
something invisible, i.e., by any annotation of a type that is filtered.
Most often, the annotations are missed because they start or end with a
space or line break.


You can trim annotations, e.g., with


RETAINTYPE(SPACE,BREAK);

tsCurrent{-> TRIM(SPACE,BREAK)};

RETAINTYPE;



You can use the query view for this use case. I have to mention that the
query view was built to serve as a tool during rule engineering: to get
a quick overview of the annotated documents. It does not scale with
the number of documents since there is no indexing across CASes and you
need to deserialize all CASes.

If it is fast enough, it is totally fine for counting annotations with
the query view.

You can also write a simple uimaFIT analysis engine and add it to the
pipeline or the Ruta script. The analysis engine counts the
annotations in process() and outputs the aggregated result in
collectionProcessingComplete() (or the overridden method with the
correct name). If you want to parallelize it, you need a different
solution with a resource or something.
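
If all that is needed is a total per type across many .xmi files, another
option that avoids both the query view and a uimaFIT component is to count
the XML elements in the files directly. The sketch below is an illustration
in Python under stated assumptions, not the uimaFIT approach described
above. count_elements and count_in_dir are hypothetical helper names, and
the code relies on the XMI serialization writing each annotation as an
element whose local name is the short type name. It ignores namespaces, so
types with the same short name in different packages are counted together,
and a feature serialized as a child element with the same name would be
counted as well. Like the annotation browser, it counts every annotation in
the file regardless of Ruta's visibility settings.

```python
from collections import Counter
from pathlib import Path
import xml.etree.ElementTree as ET


def count_elements(root, type_names):
    """Count elements under `root` whose local name is in `type_names`."""
    wanted = set(type_names)
    counts = Counter()
    for el in root.iter():
        local = el.tag.rsplit("}", 1)[-1]  # strip the "{namespace}" prefix
        if local in wanted:
            counts[local] += 1
    return counts


def count_in_dir(directory, type_names):
    """Sum the per-file counts over all *.xmi files in `directory`."""
    total = Counter()
    for path in sorted(Path(directory).glob("*.xmi")):
        root = ET.parse(str(path)).getroot()
        total.update(count_elements(root, type_names))
    return total
```

For example, count_in_dir("output", ["tsCurrent"]) would return the total
for tsCurrent over all .xmi files in a hypothetical folder named output.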

Best,

Peter


