Posted to dev@uima.apache.org by Peter Klügl <pk...@uni-wuerzburg.de> on 2012/05/04 15:36:24 UTC

Experiences with CAS Editor and large documents

  Hi,

Can anyone share some experience with how much the CAS Editor can handle?

I am trying to open an XMI CAS with about 0.5M words and 1M annotations,
but my attempts have not been very successful.

After a short look at the implementation, I think the bottleneck is the
annotation model. I am not really familiar with the code. Jörn, is it
necessary to add a jface.Annotation for each uima.Annotation even if it
isn't displayed?
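
To make the question concrete, here is a minimal sketch of the alternative it
hints at: creating editor-side objects only for the annotation types that are
currently shown. This is not the CAS Editor implementation. The classes
CasAnnotation and DisplayAnnotation and the method forShownTypes are
hypothetical stand-ins, not UIMA or JFace types.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical stand-ins only; these are NOT the UIMA or JFace classes.
final class VisibleAnnotationSketch {

    /** Stand-in for a UIMA annotation: a type name plus a text span. */
    static final class CasAnnotation {
        final String type;
        final int begin;
        final int end;

        CasAnnotation(String type, int begin, int end) {
            this.type = type;
            this.begin = begin;
            this.end = end;
        }
    }

    /** Stand-in for the editor-side object created per highlighted span. */
    static final class DisplayAnnotation {
        final String type;
        final int offset;
        final int length;

        DisplayAnnotation(String type, int offset, int length) {
            this.type = type;
            this.offset = offset;
            this.length = length;
        }
    }

    /** Builds display objects only for annotations whose type is currently shown. */
    static List<DisplayAnnotation> forShownTypes(List<CasAnnotation> all, Set<String> shownTypes) {
        List<DisplayAnnotation> result = new ArrayList<>();
        for (CasAnnotation a : all) {
            if (shownTypes.contains(a.type)) {
                result.add(new DisplayAnnotation(a.type, a.begin, a.end - a.begin));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<CasAnnotation> cas = List.of(
                new CasAnnotation("Token", 0, 5),
                new CasAnnotation("Sentence", 0, 20),
                new CasAnnotation("Token", 6, 11));
        // Only the shown type gets display objects; the rest of the CAS is skipped.
        System.out.println(forShownTypes(cas, Set.of("Sentence")).size());
    }
}
```

With 1M annotations and only a few types shown, the number of display objects
would then depend on the shown types rather than on the size of the CAS.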

Best,

Peter

-- 
---------------------------------------------------------------------
Dipl.-Inf. Peter Klügl
Universität Würzburg        Tel.: +49-(0)931-31-86741
Am Hubland                  Fax.: +49-(0)931-31-86732
97074 Würzburg              mail: pkluegl@informatik.uni-wuerzburg.de
      http://www.is.informatik.uni-wuerzburg.de/en/staff/kluegl_peter/
---------------------------------------------------------------------


Re: Experiences with CAS Editor and large documents

Posted by Peter Klügl <pk...@uni-wuerzburg.de>.
  Hi,

it is not only a performance problem, but also a memory problem.

If I comment out line 1244 in AnnotationEditor.java, that is, the call to
IAnnotationModelExtension.replaceAnnotations() in the method
AnnotationEditor.syncAnnotations(), it takes a while, but the CAS Editor
is finally able to display the XMI CAS. Of course, the annotations can
then no longer be highlighted.

Peter



On 04.05.2012 16:30, Jörn Kottmann wrote:
> On 05/04/2012 04:25 PM, Thilo Goetz wrote:
>> I have not looked at the code, but just in case: I have found
>> that from a performance perspective, it is very important to
>> add JFace annotations in batches.  You're probably doing that
>> already...
>
> Yes, that was changed in many places, and it made operations on many
> annotations feasible. Before that, deleting something could easily
> take tens of seconds, or even minutes.
>
> There are certain things which can still be done, and might be necessary
> for very large CASes, e.g. using virtual tables.
>
> I hope that profiling it can give us more insight about what is slow.
>
> Jörn


Re: Experiences with CAS Editor and large documents

Posted by Peter Klügl <pk...@uni-wuerzburg.de>.
I don't need it for the TextMarker release. If I need to deactivate it
for some projects, I can do it in my workspace for now. So there is no
hurry at all.

My thought was to add such a preference, like the one for loading the CAS
leniently. However, if there is a possibility that we get the
annotations working with large CASes, I'd really prefer that.

Peter

On 12.07.2012 10:30, Jörn Kottmann wrote:
> Yes, I think that will break many things. I still haven't looked into
> the profiling sample you sent me off list.
>
> If you need this for your release and we can't easily make it faster,
> we could add the preference to do that and display a warning to the user.
>
> What do you think?
>
> Jörn


Re: Experiences with CAS Editor and large documents

Posted by Jörn Kottmann <ko...@gmail.com>.
Yes, I think that will break many things. I still haven't looked into
the profiling sample you sent me off list.

If you need this for your release and we can't easily make it faster,
we could add the preference to do that and display a warning to the user.

What do you think?

Jörn

On 07/12/2012 10:24 AM, Peter Klügl wrote:
>  Hi,
>
> do you have an opinion about a preference to deactivate the annotation 
> concept in the CAS Editor?
>
> Peter



Re: Experiences with CAS Editor and large documents

Posted by Peter Klügl <pk...@uni-wuerzburg.de>.
  Hi,

do you have an opinion about a preference to deactivate the annotation 
concept in the CAS Editor?

Peter

On 25.05.2012 15:20, Peter Klügl wrote:
>  Hi,
>
> Jörn, do you have an idea yet how to improve the CAS Editor performance?
>
> Until we find a solution, if we find one at all, I would propose adding
> an option in the preferences to deactivate the complete Eclipse
> annotation concept in the CAS Editor. We would lose the nice
> highlighting, but we would still have the selection concept to
> identify UIMA annotations.
>
> What do you think?
>
> Peter


Re: Experiences with CAS Editor and large documents

Posted by Peter Klügl <pk...@uni-wuerzburg.de>.
  Hi,

Jörn, do you have an idea yet how to improve the CAS Editor performance?

Until we find a solution, if we find one at all, I would propose adding
an option in the preferences to deactivate the complete Eclipse annotation
concept in the CAS Editor. We would lose the nice highlighting, but we
would still have the selection concept to identify UIMA annotations.
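
A rough sketch of what such an option could amount to, assuming a single
boolean setting that is checked before any display annotations are built.
The class HighlightToggleSketch and its sync method are hypothetical and
only illustrate the control flow; they are not the AnnotationEditor code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; the names here are illustrative, not CAS Editor API.
final class HighlightToggleSketch {

    private final boolean highlightingEnabled;
    private final List<String> displayed = new ArrayList<>();

    HighlightToggleSketch(boolean highlightingEnabled) {
        this.highlightingEnabled = highlightingEnabled;
    }

    /**
     * Rebuilds the displayed annotations from the CAS annotations and returns
     * how many were created. With highlighting disabled, nothing is created.
     */
    int sync(List<String> casAnnotations) {
        displayed.clear();
        if (!highlightingEnabled) {
            return 0; // skip the expensive step entirely
        }
        displayed.addAll(casAnnotations);
        return displayed.size();
    }

    public static void main(String[] args) {
        List<String> cas = List.of("Token[0,5]", "Token[6,11]");
        System.out.println(new HighlightToggleSketch(true).sync(cas));
        System.out.println(new HighlightToggleSketch(false).sync(cas));
    }
}
```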

What do you think?

Peter


On 04.05.2012 16:30, Jörn Kottmann wrote:
> On 05/04/2012 04:25 PM, Thilo Goetz wrote:
>> I have not looked at the code, but just in case: I have found
>> that from a performance perspective, it is very important to
>> add JFace annotations in batches.  You're probably doing that
>> already...
>
> Yes, that was changed in many places, and it made operations on many
> annotations feasible. Before that, deleting something could easily
> take tens of seconds, or even minutes.
>
> There are certain things which can still be done, and might be necessary
> for very large CASes, e.g. using virtual tables.
>
> I hope that profiling it can give us more insight about what is slow.
>
> Jörn


Re: Experiences with CAS Editor and large documents

Posted by Jörn Kottmann <ko...@gmail.com>.
On 05/04/2012 04:25 PM, Thilo Goetz wrote:
> I have not looked at the code, but just in case: I have found
> that from a performance perspective, it is very important to
> add JFace annotations in batches.  You're probably doing that
> already...

Yes, that was changed in many places, and it made operations on many
annotations feasible. Before that, deleting something could easily
take tens of seconds, or even minutes.

There are certain things which can still be done, and might be necessary
for very large CASes, e.g. using virtual tables.
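
The thread does not spell out what virtual tables would look like in the
CAS Editor. The general idea, as with the SWT.VIRTUAL table style in SWT,
is to create row content only when a row is actually requested. Below is a
minimal sketch of that idea in plain Java; LazyRows and its loader are
hypothetical names and do not use SWT.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntFunction;

// Hypothetical sketch of on-demand row creation; no SWT involved.
final class VirtualTableSketch {

    static final class LazyRows<T> {
        private final int rowCount;
        private final IntFunction<T> loader;
        private final Map<Integer, T> materialized = new HashMap<>();

        LazyRows(int rowCount, IntFunction<T> loader) {
            this.rowCount = rowCount;
            this.loader = loader;
        }

        /** Creates the row on first access and reuses it afterwards. */
        T row(int index) {
            if (index < 0 || index >= rowCount) {
                throw new IndexOutOfBoundsException("row " + index);
            }
            return materialized.computeIfAbsent(index, i -> loader.apply(i));
        }

        int rowCount() {
            return rowCount;
        }

        /** Number of rows that have actually been created so far. */
        int materializedCount() {
            return materialized.size();
        }
    }

    public static void main(String[] args) {
        LazyRows<String> rows = new LazyRows<>(1_000_000, i -> "annotation #" + i);
        // Only the rows that are "scrolled into view" get created.
        for (int i = 500; i < 520; i++) {
            rows.row(i);
        }
        System.out.println(rows.materializedCount() + " of " + rows.rowCount() + " rows created");
    }
}
```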

I hope that profiling it can give us more insight about what is slow.

Jörn

Re: Experiences with CAS Editor and large documents

Posted by Thilo Goetz <tw...@gmx.de>.
I have not looked at the code, but just in case: I have found
that from a performance perspective, it is very important to
add JFace annotations in batches.  You're probably doing that
already...
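
Why batching matters can be shown with a toy model that only counts change
notifications. This is a sketch under the assumption that each notification
triggers work in the listeners (for example a repaint). CountingModel is a
hypothetical class, not the JFace annotation model. In JFace itself, the
batch call discussed in this thread is
IAnnotationModelExtension.replaceAnnotations().

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model that only counts notifications; not the JFace model.
final class BatchingSketch {

    static final class CountingModel {
        private final List<String> items = new ArrayList<>();
        private int notifications;

        /** Adds one item and notifies listeners, as a per-item API would. */
        void add(String item) {
            items.add(item);
            notifications++;
        }

        /** Adds all items and notifies listeners once for the whole batch. */
        void addAll(List<String> batch) {
            items.addAll(batch);
            notifications++;
        }

        int size() {
            return items.size();
        }

        int notifications() {
            return notifications;
        }
    }

    public static void main(String[] args) {
        List<String> batch = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            batch.add("annotation-" + i);
        }

        CountingModel oneByOne = new CountingModel();
        for (String s : batch) {
            oneByOne.add(s);
        }

        CountingModel batched = new CountingModel();
        batched.addAll(batch);

        System.out.println("one by one: " + oneByOne.notifications() + " notifications");
        System.out.println("batched: " + batched.notifications() + " notifications");
    }
}
```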

--Thilo

On 04/05/12 16:18, Jörn Kottmann wrote:
> Hello,
> 
> I already fixed many performance issues for large CASes, but
> if the CAS gets larger there might be more things which become
> problematic.
> 
> Well, I don't really know what is slow there ... so I suggest we do some
> profiling to identify what can be improved.
> 
> Jörn


Re: Experiences with CAS Editor and large documents

Posted by Jörn Kottmann <ko...@gmail.com>.
Hello,

I already fixed many performance issues for large CASes, but
if the CAS gets larger there might be more things which become
problematic.

Well, I don't really know what is slow there ... so I suggest we do some
profiling to identify what can be improved.

Jörn

On 05/04/2012 03:36 PM, Peter Klügl wrote:
>  Hi,
>
> Can anyone share some experience with how much the CAS Editor can handle?
>
> I am trying to open an XMI CAS with about 0.5M words and 1M annotations,
> but my attempts have not been very successful.
>
> After a short look at the implementation, I think the bottleneck is
> the annotation model. I am not really familiar with the code. Jörn, is
> it necessary to add a jface.Annotation for each uima.Annotation even
> if it isn't displayed?
>
> Best,
>
> Peter
>