Posted to user@tika.apache.org by Wayne W <wa...@gmail.com> on 2012/01/17 06:32:53 UTC
Tika memory leak?
Hi,
we're using Solr running on Tomcat with 1GB in production, and of late
we've been having a huge number of OutOfMemory issues. From what I can
tell, this is coming from the Tika extraction (tika-0.2.jar) of the
content. I've processed the Java dump file using a memory analyzer and
it's pretty clear, at least as to the classes involved. It seems like a
leak to me, as we don't parse any files larger than 20M, and these
objects are taking up ~700M.
You can see screen shots here:
http://dl.dropbox.com/u/6550402/Screen%20shot%202012-01-14%20at%2018.36.27.png
http://dl.dropbox.com/u/6550402/Screen%20shot%202012-01-14%20at%2018.39.04.png
But to summarize:

Class                                           Objects  Used heap (bytes)  Retained heap (bytes)
org.apache.xmlbeans.impl.store.Xob$ElementXObj  838,993         80,533,728            604,606,040
org.apache.poi.openxml4j.opc.ZipPackage               2                112             87,009,848
char[]                                              587         32,216,960             38,216,950
We're really desperate to find a solution to this - any ideas or help
is greatly appreciated.
I didn't realize we'd got so far behind on the version we have. However,
I need to see whether the latest version will work with Solr (I have a
feeling it won't).
Wayne
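
As a quick sanity check on the "~700M" figure, the retained heap sizes from the summary in the message above can be added up. This is only a rough sketch: it assumes the sizes are in bytes and that "1GB" means a 1 GiB maximum heap (neither is stated in the message), and the name `summarize` is made up for illustration. Retained sets can overlap (for example, some char[] instances may be retained by the XMLBeans objects), so the sum is best read as an upper bound.

```python
# Retained heap sizes, copied from the summary in the message above.
# Assumption: the analyzer reports these values in bytes.
retained = {
    "org.apache.xmlbeans.impl.store.Xob$ElementXObj": 604_606_040,
    "org.apache.poi.openxml4j.opc.ZipPackage": 87_009_848,
    "char[]": 38_216_950,
}

MIB = 1024 * 1024
# Assumption: "1GB" in the message means a 1 GiB maximum heap.
ASSUMED_HEAP_BYTES = 1024 * MIB


def summarize(retained_by_class, heap_bytes):
    """Return (total bytes, total MiB, fraction of heap) for the given classes."""
    total = sum(retained_by_class.values())
    return total, total / MIB, total / heap_bytes


total_bytes, total_mib, heap_fraction = summarize(retained, ASSUMED_HEAP_BYTES)
print(f"retained total: {total_bytes:,} bytes ({total_mib:.0f} MiB)")
print(f"share of assumed heap: {heap_fraction:.0%}")
```

Under these assumptions the three classes together account for roughly two thirds of the heap, with the XMLBeans element objects alone making up most of it.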
Re: Tika memory leak?
Posted by Wayne W <wa...@gmail.com>.
Thanks, Daan.
On Tue, Jan 17, 2012 at 9:08 PM, Daan de Wit <d....@o3spaces.com> wrote:
> Hi Wayne,
>
> Older versions of Tika have memory issues with parsing certain types of
> Excel sheets. It would be best to upgrade your version of Tika to the latest
> stable version.
>
> Best,
> Daan
Re: Tika memory leak?
Posted by Daan de Wit <d....@o3spaces.com>.
Hi Wayne,
Older versions of Tika have memory issues with parsing certain types of
Excel sheets. It would be best to upgrade your version of Tika to the
latest stable version.
Best,
Daan