Posted to user@tika.apache.org by Wayne W <wa...@gmail.com> on 2012/01/17 06:32:53 UTC

Tika memory leak?

Hi,

We're using Solr running on Tomcat with 1GB in production, and of late
we've been having a huge number of OutOfMemory issues. From what I can
tell, this is coming from the Tika extraction (tika-0.2.jar) of the
content. I've processed the Java dump file using a memory analyzer and
it's pretty clear, at least as to the classes involved. It looks like a
leak to me, as we don't parse any files larger than 20M, and these
objects are taking up ~700M.

You can see screen shots here:
http://dl.dropbox.com/u/6550402/Screen%20shot%202012-01-14%20at%2018.36.27.png
http://dl.dropbox.com/u/6550402/Screen%20shot%202012-01-14%20at%2018.39.04.png


But to summarize:

Class                                           Objects   Used heap size   Retained heap size
org.apache.xmlbeans.impl.store.Xob$ElementXObj  838,993       80,533,728          604,606,040
org.apache.poi.openxml4j.opc.ZipPackage               2              112           87,009,848
char[]                                              587       32,216,960           38,216,950
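
As a rough sanity check on the ~700M figure, the retained sizes above can be added up. This is only a sketch: it assumes the sizes are in bytes, that the three retained sets do not overlap (a memory analyzer does not guarantee this), and that "1GB" means a 1024 MiB heap. The class and method names are made up for illustration.

```java
// Rough arithmetic on the heap figures quoted above (illustrative sketch only).
class HeapSummary {
    // Retained heap sizes, copied from the summary; assumed to be bytes.
    static final long ELEMENT_XOBJ = 604_606_040L;
    static final long ZIP_PACKAGE = 87_009_848L;
    static final long CHAR_ARRAYS = 38_216_950L;

    static final long MIB = 1024L * 1024L;

    // Sum of the three retained sizes; assumes the retained sets do not overlap.
    static long totalRetainedBytes() {
        return ELEMENT_XOBJ + ZIP_PACKAGE + CHAR_ARRAYS;
    }

    // Convert a byte count to MiB.
    static double toMiB(long bytes) {
        return (double) bytes / MIB;
    }

    // Fraction of the given heap size taken up by the given byte count.
    static double shareOfHeap(long bytes, long heapBytes) {
        return (double) bytes / heapBytes;
    }

    public static void main(String[] args) {
        long total = totalRetainedBytes();
        long heap = 1024L * MIB; // assumption: "1GB" means a 1024 MiB heap
        System.out.printf("total retained: %.0f MiB%n", toMiB(total));
        System.out.printf("share of heap:  %.0f%%%n", 100 * shareOfHeap(total, heap));
    }
}
```

Under those assumptions the three entries add up to about 696 MiB, roughly two thirds of the heap, which is consistent with the ~700M mentioned above.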


We're really desperate to find a solution to this - any ideas or help
is greatly appreciated.

I didn't realize we'd fallen so far behind on the version we have. I
need to check, however, whether the latest version will work with Solr
(I have a feeling it won't).

Wayne

Re: Tika memory leak?

Posted by Wayne W <wa...@gmail.com>.
Thanks, Daan.

On Tue, Jan 17, 2012 at 9:08 PM, Daan de Wit <d....@o3spaces.com> wrote:
> Hi Wayne,
>
> Older versions of Tika have memory issues with parsing certain types of
> Excel sheets. It would be best to upgrade your version of Tika to the latest
> stable version.
>
> Best,
> Daan

Re: Tika memory leak?

Posted by Daan de Wit <d....@o3spaces.com>.
Hi Wayne,

Older versions of Tika have memory issues with parsing certain types of
Excel sheets. It would be best to upgrade your version of Tika to the
latest stable version.

Best,
Daan
