Posted to user@uima.apache.org by Danai Wiriyayanyongsuk <da...@gmail.com> on 2007/08/21 03:06:17 UTC

Questions regarding the Heap class and the heap size.

Hi Folks,

I'm trying to understand how the Heap class works and would like to ask a few
questions, as follows:

1) Is the following statement correct?
"The class org.apache.uima.cas.impl.Heap has an internal integer array named
"heap" which maintains the type system structure. The required space for the
"heap" depends solely on the type system structure (basically, the number of
features). It does not depend on, say, the size of String (or any) arrays or
the length of the document text (Sofa)."

2) If the above statement is correct, should we reduce the default minimum
heap size (Heap.MIN_PAGE_SIZE) from 1000 to, say, 100, and the default
initial heap size (Heap.DEFAULT_PAGE_SIZE and
CASImpl.DEFAULT_INITIAL_HEAP_SIZE) from 500000 to, say, 1000, since the
current defaults might be too high just to maintain a type system? In my
experiment, a CAS whose type system has around 30 feature definitions uses
only about 78 items (integers) of the "heap".
I could be totally wrong about these. Any opinions or advice would be very
much appreciated.
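Whatever the heap ends up holding, picking its initial size is the usual trade-off for a growable array: start too small and the array is reallocated and copied several times; start too large and memory is reserved that may never be used. The Java sketch below illustrates that trade-off with a hypothetical GrowableIntHeap class and an assumed doubling policy. It is not UIMA's Heap class, whose actual growth rules may differ.

```java
import java.util.Arrays;

/**
 * Hypothetical growable int heap, NOT UIMA's Heap class. It assumes a simple
 * doubling policy to show the trade-off behind an initial heap size.
 */
final class GrowableIntHeap {

    private int[] cells;
    private int used = 0;
    private int reallocations = 0;

    GrowableIntHeap(int initialSize) {
        // At least one cell, so that doubling always makes progress.
        cells = new int[Math.max(1, initialSize)];
    }

    /** Reserves n consecutive cells and returns the index of the first one. */
    int allocate(int n) {
        while (used + n > cells.length) {
            // Assumed policy: double the capacity and copy the old contents.
            cells = Arrays.copyOf(cells, cells.length * 2);
            reallocations++;
        }
        int address = used;
        used += n;
        return address;
    }

    int used() { return used; }
    int capacity() { return cells.length; }
    int reallocations() { return reallocations; }

    public static void main(String[] args) {
        // 1,000 and 500,000 are the two sizes mentioned in the question;
        // 20,000 six-cell allocations are an arbitrary, made-up workload.
        for (int initial : new int[] {1_000, 500_000}) {
            GrowableIntHeap heap = new GrowableIntHeap(initial);
            for (int i = 0; i < 20_000; i++) {
                heap.allocate(6);
            }
            System.out.printf("initial=%,d used=%,d capacity=%,d reallocations=%d%n",
                    initial, heap.used(), heap.capacity(), heap.reallocations());
        }
    }
}
```

With 120,000 cells of data, an initial size of 1,000 needs seven doublings in this model, while 500,000 needs none but leaves most of the array unused.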

Thanks in advance,

Danai Wiriyayanyongsuk

Re: Questions regarding the Heap class and the heap size.

Posted by Adam Lally <al...@alum.rpi.edu>.
On 8/22/07, Danai Wiriyayanyongsuk <da...@gmail.com> wrote:
> The sizes of the following array types *impact* the size of the heap:
>   - integer and float (as the values will be directly stored in the heap)
>   - String and other feature structures (as each element is stored in the
> heap as an address pointing to where the element's data actually lives)
>
> On the other hand, the sizes of the following array types do *not* impact
> the size of the heap:
>   - boolean, byte, short, long and double (as their data is stored in
> separate heaps)
>
> Is my understanding correct?
>

Yes, that's correct.  Those data types have their own heaps with
different-sized cells.
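A rough way to picture this answer: in the simplified model below, an array of int, float, String or feature-structure elements costs one main-heap cell per element, while an array of boolean, byte, short, long or double elements keeps its data in a separate heap and leaves only a fixed-size stub on the main heap. The header layout (type code, length, and for auxiliary arrays one pointer cell) and the class and method names are assumptions for illustration, not UIMA's API.

```java
import java.util.Set;

/**
 * Simplified model (an assumption, not UIMA's actual code) of how many
 * int cells an array feature structure occupies on the main heap.
 */
final class ArrayHeapCost {

    // Assumed header: one cell for the type code, one for the array length.
    static final int HEADER_CELLS = 2;

    // Element types whose values (or references) sit directly in main-heap cells.
    private static final Set<String> MAIN_HEAP_ELEMENTS =
            Set.of("int", "float", "String", "FeatureStructure");

    // Element types whose data lives in a separate, differently sized heap.
    private static final Set<String> AUX_HEAP_ELEMENTS =
            Set.of("boolean", "byte", "short", "long", "double");

    static int mainHeapCells(String elementType, int length) {
        if (MAIN_HEAP_ELEMENTS.contains(elementType)) {
            // One cell per element: the value itself (int, float bits)
            // or a reference (string index, feature structure address).
            return HEADER_CELLS + length;
        }
        if (AUX_HEAP_ELEMENTS.contains(elementType)) {
            // Assumed: a single extra cell pointing at the data in the other heap.
            return HEADER_CELLS + 1;
        }
        throw new IllegalArgumentException("unknown element type: " + elementType);
    }

    public static void main(String[] args) {
        for (String type : new String[] {"int", "String", "byte", "double"}) {
            System.out.println(type + "[1000] -> "
                    + mainHeapCells(type, 1000) + " main-heap cells");
        }
    }
}
```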

-Adam

Re: Questions regarding the Heap class and the heap size.

Posted by Danai Wiriyayanyongsuk <da...@gmail.com>.
Adam Lally wrote:
>  Just to clarify: a notable exception is that Strings aren't stored on
>  the heap.  So storing long strings (including the document text) will
>  not increase the heap size.  However, using large arrays *will*
>  increase the heap size.

Thanks Thilo and Adam for the helpful information.

I'd like to understand a bit more about the statement "However, using large
arrays *will* increase the heap size". As I understand it, not every array
type's size affects the size of the heap (Heap.heap). Specifically:

The sizes of the following array types *impact* the size of the heap:
  - integer and float (as the values will be directly stored in the heap)
  - String and other feature structures (as each element is stored in the
heap as an address pointing to where the element's data actually lives)

On the other hand, the sizes of the following array types do *not* impact
the size of the heap:
  - boolean, byte, short, long and double (as their data is stored in
separate heaps)

Is my understanding correct?

Thanks,
Danai Wiriyayanyongsuk

Re: Questions regarding the Heap class and the heap size.

Posted by Adam Lally <al...@alum.rpi.edu>.
On 8/22/07, Thilo Goetz <tw...@gmx.de> wrote:
> All the data that your analysis generates (with a few exceptions) lives
> on the heap.  So depending on how many annotations you create, the heap
> may grow very large.  It is usually several times the size of the input
> document.  I've personally had applications where the CAS (most of which
> is the heap) would on average be about 50 times the size of the input
> document.

Just to clarify: a notable exception is that Strings aren't stored on
the heap.  So storing long strings (including the document text) will
not increase the heap size.  However, using large arrays *will*
increase the heap size.
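Why long strings leave the heap alone can be sketched with a toy model: a string-valued feature takes a single int cell on the main heap, holding an index into a separate string table, so the length of the string itself never shows up there. The class below is hypothetical and only mimics that idea; UIMA's internal string storage is not reproduced here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model (hypothetical): strings live outside the int heap, which stores only indexes. */
final class StringOffHeapDemo {

    private int[] mainHeap = new int[4];
    private int mainHeapUsed = 0;
    // Strings are kept here, outside the int array; a stand-in for a separate string heap.
    private final List<String> stringTable = new ArrayList<>();

    /** Stores the string in the side table and writes only its index into the main heap. */
    int addStringFeature(String value) {
        stringTable.add(value);
        int index = stringTable.size() - 1;
        if (mainHeapUsed == mainHeap.length) {
            mainHeap = Arrays.copyOf(mainHeap, mainHeap.length * 2);
        }
        mainHeap[mainHeapUsed++] = index; // one int cell, regardless of string length
        return index;
    }

    int mainHeapCellsUsed() { return mainHeapUsed; }

    /** Follows the index stored in a main-heap cell back to the string. */
    String stringAt(int cell) { return stringTable.get(mainHeap[cell]); }

    public static void main(String[] args) {
        StringOffHeapDemo demo = new StringOffHeapDemo();
        demo.addStringFeature("short");
        demo.addStringFeature("x".repeat(1_000_000)); // a 1M-character "document text"
        System.out.println("main-heap cells used: " + demo.mainHeapCellsUsed());
        System.out.println("length of string in cell 1: " + demo.stringAt(1).length());
    }
}
```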

-Adam

Re: Questions regarding the Heap class and the heap size.

Posted by Thilo Goetz <tw...@gmx.de>.
Danai Wiriyayanyongsuk wrote:
> Thanks, Marshall and Thilo, for shedding some light.
>
> Besides the instances of feature structures (which I guess usually do not
> require much of the "Heap.heap" space), are there any kinds of information
> that might require big chunks of the "Heap.heap" space, e.g. arrays with
> hundreds of elements, that I should be aware of?
[...]

All the data that your analysis generates (with a few exceptions) lives
on the heap.  So depending on how many annotations you create, the heap
may grow very large.  It is usually several times the size of the input
document.  I've personally had applications where the CAS (most of which
is the heap) would on average be about 50 times the size of the input
document.
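To get a feel for how annotation counts translate into heap space, here is a back-of-the-envelope calculation in Java. It assumes that each annotation takes one type-code cell plus one 4-byte cell per feature (sofa, begin, end, plus any custom features) and that the document text is held as 2-byte Java chars; the document size and annotation counts are made-up inputs. It counts only the annotations themselves and ignores everything else an analysis may create, so it is a lower bound under its own assumptions, not a prediction.

```java
/**
 * Back-of-the-envelope estimate under stated assumptions (not measured
 * UIMA figures): one type-code cell plus one 4-byte cell per feature.
 */
final class HeapEstimate {

    static final int BYTES_PER_CELL = Integer.BYTES;     // 4
    static final int TYPE_CODE_CELLS = 1;
    static final int BUILT_IN_ANNOTATION_FEATURES = 3;   // sofa, begin, end (assumed)

    /** Estimated main-heap cells for count annotations with extraFeatures custom features each. */
    static long cellsFor(long count, int extraFeatures) {
        return count * (TYPE_CODE_CELLS + BUILT_IN_ANNOTATION_FEATURES + extraFeatures);
    }

    public static void main(String[] args) {
        long documentChars = 10_000;   // hypothetical input size
        long tokens = 2_000;           // hypothetical: about 5 chars per token
        long sentences = 100;          // hypothetical
        long cells = cellsFor(tokens, 2)   // e.g. part-of-speech and lemma features
                   + cellsFor(sentences, 0);
        long heapBytes = cells * BYTES_PER_CELL;
        long textBytes = documentChars * Character.BYTES; // Java chars are 2 bytes
        System.out.printf("~%d cells, ~%d bytes of heap vs %d bytes of text (ratio %.1f)%n",
                cells, heapBytes, textBytes, (double) heapBytes / textBytes);
    }
}
```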

Unfortunately there is no good way to get at this data via APIs.  The
way I got this information was by triggering Java heap dumps and looking
at the size of the data structures on the Java heap.
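For reference, one way to trigger such a dump from inside a running application on a HotSpot-based JVM is com.sun.management.HotSpotDiagnosticMXBean, sketched below; the file name is a placeholder. The resulting .hprof file can then be opened in a heap analyzer to see how much memory the CAS data structures occupy. From the command line, jmap -dump:live,format=b,file=heap.hprof <pid> does the same for an already running process.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;

/** Writes a Java heap dump; requires a HotSpot-based JVM (com.sun.management API). */
final class HeapDumper {

    static Path dump(Path target, boolean liveObjectsOnly) throws IOException {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        // dumpHeap fails if the file already exists, so remove any stale dump first.
        Files.deleteIfExists(target);
        bean.dumpHeap(target.toString(), liveObjectsOnly);
        return target;
    }

    public static void main(String[] args) throws IOException {
        // Call this at the point of interest, e.g. right after processing a large document.
        Path file = dump(Path.of("cas-analysis.hprof"), true);
        System.out.println("Heap dump written to " + file.toAbsolutePath()
                + " (" + Files.size(file) + " bytes)");
    }
}
```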

HTH,
Thilo


Re: Questions regarding the Heap class and the heap size.

Posted by Danai Wiriyayanyongsuk <da...@gmail.com>.
Thanks, Marshall and Thilo, for shedding some light.

Besides the instances of feature structures (which I guess usually do not
require much of the "Heap.heap" space), are there any kinds of information
that might require big chunks of the "Heap.heap" space, e.g. arrays with
hundreds of elements, that I should be aware of?

The reason I'm asking these kinds of questions is that I'm trying to figure
out the proper initial heap size to set for my application.

Thanks,

Danai Wiriyayanyongsuk


On 8/21/07, Thilo Goetz <tw...@gmx.de> wrote:
>
> Marshall Schor wrote:
> > Danai Wiriyayanyongsuk wrote:
> >> Hi Folks,
> >>
> >> I'm trying to understand how the Heap class works and would like to ask
> >> a few questions, as follows:
> >>
> >> 1) Is the following statement correct?
> >> "The class org.apache.uima.cas.impl.Heap has an internal integer array
> >> named "heap" which maintains the type system structure. The required
> >> space for the "heap" depends solely on the type system structure
> >> (basically, the number of features). It does not depend on, say, the
> >> size of String (or any) arrays or the length of the document text
> >> (Sofa)."
> >>
> > Actually, the heap is used to store all kinds of things, but most
> > particularly, instances of feature structures made
> > by your application, such as annotations, etc.
> >
> > -Marshall
>
> And to be explicit, the heap does *not* maintain the type
> system.  That's done in org.apache.uima.cas.impl.TypeSystemImpl.
>
> --Thilo
>

Re: Questions regarding the Heap class and the heap size.

Posted by Thilo Goetz <tw...@gmx.de>.
Marshall Schor wrote:
> Danai Wiriyayanyongsuk wrote:
>> Hi Folks,
>>
>> I'm trying to understand how the Heap class works and would like to ask
>> a few questions, as follows:
>>
>> 1) Is the following statement correct?
>> "The class org.apache.uima.cas.impl.Heap has an internal integer array
>> named "heap" which maintains the type system structure. The required
>> space for the "heap" depends solely on the type system structure
>> (basically, the number of features). It does not depend on, say, the
>> size of String (or any) arrays or the length of the document text
>> (Sofa)."
>>   
> Actually, the heap is used to store all kinds of things, but most
> particularly, instances of feature structures made
> by your application, such as annotations, etc.
> 
> -Marshall

And to be explicit, the heap does *not* maintain the type
system.  That's done in org.apache.uima.cas.impl.TypeSystemImpl.

--Thilo

Re: Questions regarding the Heap class and the heap size.

Posted by Marshall Schor <ms...@schor.com>.
Danai Wiriyayanyongsuk wrote:
> Hi Folks,
>
> I'm trying to understand how the Heap class works and would like to ask a few
> questions, as follows:
>
> 1) Is the following statement correct?
> "The class org.apache.uima.cas.impl.Heap has an internal integer array named
> "heap" which maintains the type system structure. The required space for the
> "heap" depends solely on the type system structure (basically, the number of
> features). It does not depend on, say, the size of String (or any) arrays or
> the length of the document text (Sofa)."
>   
Actually, the heap is used to store all kinds of things, but most 
particularly, instances of feature structures made
by your application, such as annotations, etc.
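To make the distinction concrete, here is a toy model in Java: the type system is a separate, fixed description (here just a map from type code to feature count), while the int array grows by one type-code cell plus one cell per feature for every feature structure created. The layout and all names are assumptions for illustration; UIMA's actual CASImpl and Heap classes differ in detail.

```java
import java.util.Arrays;
import java.util.Map;

/**
 * Toy model (hypothetical): a fixed type system kept apart from an int heap
 * that grows with every feature structure. Assumed layout per FS: one
 * type-code cell followed by one cell per feature.
 */
final class ToyCas {

    // Type system, separate from the heap: type code -> number of features.
    private final Map<Integer, Integer> featureCountByType;
    private int[] heap = new int[8];
    private int top = 1; // cell 0 left unused so that address 0 can mean "null"

    ToyCas(Map<Integer, Integer> featureCountByType) {
        this.featureCountByType = Map.copyOf(featureCountByType);
    }

    /** Creates a feature structure of the given type and returns its heap address. */
    int createFs(int typeCode) {
        Integer features = featureCountByType.get(typeCode);
        if (features == null) {
            throw new IllegalArgumentException("unknown type " + typeCode);
        }
        int size = 1 + features;
        if (top + size > heap.length) {
            heap = Arrays.copyOf(heap, Math.max(heap.length * 2, top + size));
        }
        int address = top;
        heap[address] = typeCode; // first cell records the FS's type
        top += size;              // feature cells follow, initialized to 0
        return address;
    }

    void setFeature(int address, int featureIndex, int value) {
        heap[address + 1 + featureIndex] = value;
    }

    int getFeature(int address, int featureIndex) {
        return heap[address + 1 + featureIndex];
    }

    int cellsUsed() { return top; }

    public static void main(String[] args) {
        // Hypothetical type 1 = annotation with features sofa(0), begin(1), end(2).
        ToyCas cas = new ToyCas(Map.of(1, 3));
        for (int begin = 0; begin < 1_000; begin += 5) {
            int fs = cas.createFs(1);
            cas.setFeature(fs, 1, begin);
            cas.setFeature(fs, 2, begin + 4);
        }
        System.out.println("cells used: " + cas.cellsUsed());
    }
}
```

Creating 200 three-feature annotations uses 801 cells in this model (cell 0 is kept unused so that address 0 can stand for null), and the count keeps growing with every new annotation while the type system stays the same size.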

-Marshall