Posted to dev@uima.apache.org by Tommaso Teofili <to...@gmail.com> on 2009/03/08 15:58:04 UTC

Re: UIMA : java.lang.OutOfMemoryError: Java heap space.....

2009/3/8 Marshall Schor <ms...@schor.com>

> Balkrishnan.V wrote:
> > Marshall Schor <ms...@...> writes:
> >
> >
> >> Did you try increasing your java heap size?  What size are you running with?
> >>
> >> -Marshall
> >>
> >>
> >
> > Hi Marshall,
> >
> > Thank you for your time.
> >
> > Yes, I tried resetting the JVM heap size to 1327MB, and now I am able to
> > run the CVD successfully with a 101MB text log file.
> >
> > Can you please let me know:
> > (i) Is this maximum heap-size setting (1327MB) enough to handle a
> > text file of size 2GB (say)?
> It depends on what you're running in UIMA.  But, since 1327MB = about
> 1.3 GB, and that's less than the size of the text file you're hoping to
> process (2GB), maybe it isn't big enough.  But, it really depends on
> the components you're running within UIMA.  For instance, you might be
> running components which you've written, which read a small part of the
> 2GB text file, as a "chunk", and then process that through the pipeline
> of UIMA components, and then the next small chunk could be processed,
> etc.  It really just depends on the particular processing you're doing.
> > If not, do you have any idea of the approximate maximum text-file size
> > that can be handled with this heap-size setting?
> >
> No, again it depends not on UIMA, but more on the components UIMA is
> running as annotators.
> > (ii) Why am I unable to increase the JVM heap size beyond 1327MB?
> >
> I don't know.  It may be a limitation of your particular machine.  We
> have seen users who have 64-bit Linux machines, running with heap sizes
> of 16 GB, for instance.


This could be the common issue with the JVM on 32-bit architectures: the Java
heap space is required to be contiguous but, on a 32-bit processor, at most
4 GB of contiguous memory can be addressed. Depending on the OS and the
physical memory you have, the amount of addressable user-space memory can be
much less than the total physical memory available.
When I raised this problem with Sun's JVM developers, the only answer I got
was "Buy a 64-bit machine", which suggests they do not intend to make it
possible for the JVM to address non-contiguous space.
So, as Marshall said, I also think that splitting your big problem into
several lighter problems could be the solution.
Regards,
Tommaso
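
To see which limit a given JVM is actually running under, a minimal
sketch along these lines may help. The class name HeapInfo and the helper
toMiB are made up for illustration. Runtime.maxMemory() and the os.arch
property are standard Java; sun.arch.data.model is specific to Sun-derived
JVMs and may be null elsewhere.

```java
public class HeapInfo {

    /** Converts a byte count to whole mebibytes (hypothetical helper). */
    public static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();

        // Upper bound the JVM will attempt to use for the heap (roughly -Xmx).
        System.out.println("Max heap (MiB): " + toMiB(rt.maxMemory()));

        // Standard property: the architecture the JVM was built for.
        System.out.println("os.arch: " + System.getProperty("os.arch"));

        // Assumption: vendor-specific property, "32" or "64" on Sun-derived
        // JVMs; it can be null on other implementations.
        System.out.println("sun.arch.data.model: "
                + System.getProperty("sun.arch.data.model"));
    }
}
```

Running it once with the current settings and once with a larger -Xmx
value shows whether the requested heap was actually granted.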



>
> > (iii) What should I do to handle the text files that cannot be handled
> > within the bounds of the heap-size setting of 1327MB?
> >
> A good approach is to break these things up into more manageable
> work-units, and process each one separately.
>
> -Marshall
> > Kind Regards,
> > Balkrishnan.V
> >
> >
> >
> >
>

Re: UIMA : java.lang.OutOfMemoryError: Java heap space.....

Posted by Thilo Goetz <tw...@gmx.de>.
Marshall Schor wrote:
> Thanks, Thilo, good points!
> 
> Another fine point below
> 
> Thilo Goetz wrote:
[...]
>> * On 32-bit Windows, a process may use up to 2GB of
>> memory, not 4GB.  Subtract from that the memory that
>> the JVM needs, and you get to some number around 1.4GB
>> as the maximum JVM heap space you can allocate.
>>   
> Actually, there seems to be a way to get Windows XP and Server to let
> users have 3GB, not 2GB, but you have to change a setting.  See
> http://msdn.microsoft.com/en-us/library/ms791558.aspx

This switch has cost me weeks of my working life,
with random software failures which finally turned
out to be caused by Windows running out of resource
handles very quickly because of this switch.  So I
wouldn't recommend it ;-)

--Thilo

Re: UIMA : java.lang.OutOfMemoryError: Java heap space.....

Posted by Marshall Schor <ms...@schor.com>.
Thanks, Thilo, good points!

Another fine point below

Thilo Goetz wrote:
> Just a few more points on this fascinating topic.
>
> * The JVM internally represents characters as UTF-16.
> This means that any ASCII text will use twice as much
> memory in the JVM as on disk.
>
> * While reading in the file, you will likely do some
> copying.  Even if you allocate a char[] of the right
> size ahead of time and use that as a buffer to read
> in your file, you'll copy that data when you create
> a string out of it.  So you'll need double the
> amount of the final String memory while reading it
> in.  To the best of my knowledge, there is no way
> around this issue, at least if you want to end up
> with a regular Java string.
>
> * Strings in the JVM use a char[] internally.  So you
> are not only constrained by the maximum heap size, but
> also by the maximum array size on the particular JVM
> implementation you're using.  This detail is buried
> deep down in your JVM documentation.  I don't know
> what the numbers are nowadays, but they used to be
> quite low in the Java 1.4 days.  This may have changed.
>
> * On 32-bit Windows, a process may use up to 2GB of
> memory, not 4GB.  Subtract from that the memory that
> the JVM needs, and you get to some number around 1.4GB
> as the maximum JVM heap space you can allocate.
>   
Actually, there seems to be a way to get Windows XP and Server to let
users have 3GB, not 2GB, but you have to change a setting.  See
http://msdn.microsoft.com/en-us/library/ms791558.aspx

-Marshall
> So the upshot is that on 32-bit Windows, you can't
> read in ASCII files into a String that are larger
> than 350MB or so.  The number may be a lot smaller,
> depending on your JVM and how clever your implementation
> is.
>
> In addition, you want to do some UIMA analysis.
> Consider that this needs space, too.  Depending on
> your analysis, the size of the CAS may easily be
> 10 times the size of your text, or more.
>
> So read in your large files in chunks no larger than
> 5 MB, is my recommendation.  If you have files that
> big, you're probably not concerned with the fact that
> you may be cutting up a word here and there.  Still,
> you can try to place splits at end-of-sentence
> characters or whitespace.
>
> --Thilo
>
>
>   

Re: UIMA : java.lang.OutOfMemoryError: Java heap space.....

Posted by Thilo Goetz <tw...@gmx.de>.
Just a few more points on this fascinating topic.

* The JVM internally represents characters as UTF-16.
This means that any ASCII text will use twice as much
memory in the JVM as on disk.

* While reading in the file, you will likely do some
copying.  Even if you allocate a char[] of the right
size ahead of time and use that as a buffer to read
in your file, you'll copy that data when you create
a string out of it.  So you'll need double the
amount of the final String memory while reading it
in.  To the best of my knowledge, there is no way
around this issue, at least if you want to end up
with a regular Java string.

* Strings in the JVM use a char[] internally.  So you
are not only constrained by the maximum heap size, but
also by the maximum array size on the particular JVM
implementation you're using.  This detail is buried
deep down in your JVM documentation.  I don't know
what the numbers are nowadays, but they used to be
quite low in the Java 1.4 days.  This may have changed.

* On 32-bit Windows, a process may use up to 2GB of
memory, not 4GB.  Subtract from that the memory that
the JVM needs, and you get to some number around 1.4GB
as the maximum JVM heap space you can allocate.

So the upshot is that on 32-bit Windows, you can't
read in ASCII files into a String that are larger
than 350MB or so.  The number may be a lot smaller,
depending on your JVM and how clever your implementation
is.
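
The 350MB figure follows from the points above: each ASCII byte on disk
becomes a 2-byte char in the JVM, and the data exists twice while the
String is being built. A small sketch of that arithmetic, with a made-up
class and method name, and assuming nothing else of significance is on
the heap:

```java
public class StringBudget {

    /**
     * Rough upper bound, in MB, on the ASCII file size that can be read
     * into a single String, given the maximum heap in MB.
     * Hypothetical helper; it only encodes the reasoning in this message.
     */
    public static long maxAsciiFileMb(long heapMb) {
        int bytesPerChar = 2; // UTF-16: one ASCII byte on disk -> 2 bytes in memory
        int copies = 2;       // the char[] read buffer plus the String's own copy
        return heapMb / (bytesPerChar * copies);
    }

    public static void main(String[] args) {
        // About 1.4GB of heap, as estimated for 32-bit Windows.
        System.out.println(maxAsciiFileMb(1400)); // prints 350
    }
}
```

The real limit will be lower once the JVM's other allocations and any
UIMA data structures are counted.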

In addition, you want to do some UIMA analysis.
Consider that this needs space, too.  Depending on
your analysis, the size of the CAS may easily be
10 times the size of your text, or more.

So my recommendation is to read in your large files in
chunks no larger than 5 MB.  If you have files that
big, you're probably not concerned with the fact that
you may be cutting up a word here and there.  Still,
you can try to place splits at end-of-sentence
characters or whitespace.

--Thilo
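
The chunking suggested above could be sketched roughly as follows. This
is only an illustration, not code from UIMA: the class ChunkReader and
its methods are made-up names, and the splitting rule (cut after the last
whitespace in the buffer, or hard-cut if there is none) is one simple
choice among many. Chunks are handed to a callback one at a time so that
only one chunk needs to be in memory.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class ChunkReader {

    /**
     * Reads from 'in' and passes chunks of at most maxChars characters to
     * 'sink'. A chunk ends just after the last whitespace in the buffer
     * when there is one, so words are not normally cut in half.
     */
    public static void forEachChunk(Reader in, int maxChars, Consumer<String> sink)
            throws IOException {
        if (maxChars <= 0) {
            throw new IllegalArgumentException("maxChars must be positive");
        }
        char[] buf = new char[maxChars];
        int filled = 0; // number of valid chars currently in buf
        while (true) {
            // Top up the buffer; read() may return fewer chars than requested.
            int n;
            while (filled < maxChars
                    && (n = in.read(buf, filled, maxChars - filled)) != -1) {
                filled += n;
            }
            if (filled == 0) {
                return; // nothing left to emit
            }
            boolean eof = filled < maxChars; // inner loop stopped on end of input
            int cut = filled;
            if (!eof) {
                // Search backwards for whitespace to split on.
                int i = filled - 1;
                while (i > 0 && !Character.isWhitespace(buf[i])) {
                    i--;
                }
                if (i > 0) {
                    cut = i + 1; // keep the whitespace with this chunk
                }
                // Otherwise: no usable whitespace, fall back to a hard cut.
            }
            sink.accept(new String(buf, 0, cut));
            // Move the unfinished tail to the front for the next round.
            System.arraycopy(buf, cut, buf, 0, filled - cut);
            filled -= cut;
            if (eof) {
                return;
            }
        }
    }

    /** Convenience wrapper for small in-memory inputs, e.g. for trying it out. */
    public static List<String> splitString(String text, int maxChars) {
        List<String> chunks = new ArrayList<>();
        try {
            forEachChunk(new StringReader(text), maxChars, chunks::add);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return chunks;
    }

    public static void main(String[] args) {
        // Tiny limit of 5 chars, just to make the splitting visible.
        System.out.println(splitString("aaa bbb ccc", 5));
    }
}
```

For a real file, the Reader would be a buffered file reader with the
right character encoding, and maxChars would be about 5 * 1024 * 1024
for 5 MB of ASCII text. In a UIMA pipeline, each chunk could be set as
the document text of a CAS (CAS.setDocumentText), processed, and the CAS
reset before the next chunk; that wiring is left out here.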