Posted to users@jackrabbit.apache.org by Johannes Utzig <ma...@jutzig.de> on 2010/02/08 20:18:13 UTC

Out Of Memory Error while indexing

Hi,

I am using Jackrabbit as a version control system for files (like CVS/SVN).
I was doing some testing with large binary files and keep getting an
out-of-memory error.
This is most likely a configuration mistake on my side, but I can't seem to
figure out what I'm doing wrong, so hopefully someone is able to point me in
the right direction.
The file I'm trying to commit (as the jcr:data property of an nt:resource)
is an ISO image of about 700 MB.
As the jcr:mimeType I used application/octet-stream in the hope that this
would prevent Lucene from indexing the file (I don't want files that large
to be indexed). I would prefer not to disable indexing completely. Is there
a way to prevent Jackrabbit from indexing extraordinarily large properties?
As for the environment, I use Jackrabbit 1.6.0. The Jackrabbit webapp is
deployed in Tomcat and the clients connect via RMI.
The Tomcat heap is set to 512 MB.
When I try to commit this file, both the client and the Tomcat heap
consumption stay pretty low for most of the time. Then (I guess once all the
contents have been transferred) the Tomcat heap grows rapidly until I get an
out-of-memory error with the stack trace provided further down.
My repository.xml is the default one except for an added DataStore.
I was hoping that, since I did not define any TextExtractors, Jackrabbit
would not try to index this huge binary property, but that's apparently
where it crashes.

Thanks in advance for any pointer.

Best regards,
Johannes

Repository Config:


<!DOCTYPE Repository PUBLIC "-//The Apache Software Foundation//DTD Jackrabbit 1.6//EN"
                            "http://jackrabbit.apache.org/dtd/repository-1.6.dtd">
<Repository>

    <FileSystem class="org.apache.jackrabbit.core.fs.local.LocalFileSystem">
        <param name="path" value="${rep.home}/repository"/>
    </FileSystem>

    <Security appName="Jackrabbit">
        <SecurityManager class="org.apache.jackrabbit.core.security.simple.SimpleSecurityManager"
                         workspaceName="security"/>
        <AccessManager class="org.apache.jackrabbit.core.security.simple.SimpleAccessManager"/>
        <LoginModule class="org.apache.jackrabbit.core.security.simple.SimpleLoginModule">
            <param name="anonymousId" value="anonymous"/>
            <param name="adminId" value="admin"/>
        </LoginModule>
    </Security>

    <Workspaces rootPath="${rep.home}/workspaces" defaultWorkspace="default"/>

    <Workspace name="${wsp.name}">
        <FileSystem class="org.apache.jackrabbit.core.fs.local.LocalFileSystem">
            <param name="path" value="${wsp.home}"/>
        </FileSystem>
        <PersistenceManager class="org.apache.jackrabbit.core.persistence.bundle.DerbyPersistenceManager">
            <param name="url" value="jdbc:derby:${wsp.home}/db;create=true"/>
            <param name="schemaObjectPrefix" value="${wsp.name}_"/>
        </PersistenceManager>
        <SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
            <param name="path" value="${wsp.home}/index"/>
            <param name="extractorPoolSize" value="2"/>
            <param name="supportHighlighting" value="true"/>
        </SearchIndex>
    </Workspace>

    <Versioning rootPath="${rep.home}/version">
        <FileSystem class="org.apache.jackrabbit.core.fs.local.LocalFileSystem">
            <param name="path" value="${rep.home}/version"/>
        </FileSystem>
        <PersistenceManager class="org.apache.jackrabbit.core.persistence.bundle.DerbyPersistenceManager">
            <param name="url" value="jdbc:derby:${rep.home}/version/db;create=true"/>
            <param name="schemaObjectPrefix" value="version_"/>
        </PersistenceManager>
    </Versioning>

    <SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
        <param name="path" value="${rep.home}/repository/index"/>
        <param name="extractorPoolSize" value="2"/>
        <param name="supportHighlighting" value="true"/>
    </SearchIndex>

    <DataStore class="org.apache.jackrabbit.core.data.FileDataStore">
        <param name="path" value="${rep.home}/repository/datastore"/>
        <param name="minRecordLength" value="100"/>
    </DataStore>
</Repository>


Stacktrace:

05.02.2010 14:48:16 *WARN * LazyTextExtractorField: Exception reading value for field: Stream closed (LazyTextExtractorField.java, line 94)
Exception in thread "Timer-1" java.lang.IllegalStateException: this writer hit an OutOfMemoryError; cannot commit
        at org.apache.lucene.index.IndexWriter.prepareCommit(IndexWriter.java:3353)
        at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:3408)
        at org.apache.jackrabbit.core.query.lucene.AbstractIndex.commit(AbstractIndex.java:363)
        at org.apache.jackrabbit.core.query.lucene.VolatileIndex.commit(VolatileIndex.java:141)
        at org.apache.jackrabbit.core.query.lucene.PersistentIndex.copyIndex(PersistentIndex.java:105)
        at org.apache.jackrabbit.core.query.lucene.MultiIndex$VolatileCommit.execute(MultiIndex.java:1984)
        at org.apache.jackrabbit.core.query.lucene.MultiIndex.executeAndLog(MultiIndex.java:1000)
        at org.apache.jackrabbit.core.query.lucene.MultiIndex.commitVolatileIndex(MultiIndex.java:1048)
        at org.apache.jackrabbit.core.query.lucene.MultiIndex.flush(MultiIndex.java:893)
        at org.apache.jackrabbit.core.query.lucene.MultiIndex.checkFlush(MultiIndex.java:1164)
        at org.apache.jackrabbit.core.query.lucene.MultiIndex.access$100(MultiIndex.java:80)
        at org.apache.jackrabbit.core.query.lucene.MultiIndex$1.run(MultiIndex.java:317)
        at java.util.TimerThread.mainLoop(Timer.java:512)
        at java.util.TimerThread.run(Timer.java:462)
05.02.2010 14:48:16 *ERROR* SearchManager: Error indexing node. (SearchManager.java, line 490)
java.io.IOException: Java heap space
        at org.apache.jackrabbit.core.query.lucene.Util.createIOException(Util.java:114)
        at org.apache.jackrabbit.core.query.lucene.AbstractIndex.addDocuments(AbstractIndex.java:199)
        at org.apache.jackrabbit.core.query.lucene.VolatileIndex.commitPending(VolatileIndex.java:171)
        at org.apache.jackrabbit.core.query.lucene.VolatileIndex.addDocuments(VolatileIndex.java:82)
        at org.apache.jackrabbit.core.query.lucene.MultiIndex$AddNode.execute(MultiIndex.java:1599)
        at org.apache.jackrabbit.core.query.lucene.MultiIndex.executeAndLog(MultiIndex.java:1000)
        at org.apache.jackrabbit.core.query.lucene.MultiIndex.update(MultiIndex.java:429)
        at org.apache.jackrabbit.core.query.lucene.SearchIndex.updateNodes(SearchIndex.java:588)
        at org.apache.jackrabbit.core.SearchManager.onEvent(SearchManager.java:486)
        at org.apache.jackrabbit.core.observation.EventConsumer.consumeEvents(EventConsumer.java:244)
        at org.apache.jackrabbit.core.observation.ObservationDispatcher.dispatchEvents(ObservationDispatcher.java:201)
        at org.apache.jackrabbit.core.observation.EventStateCollection.dispatch(EventStateCollection.java:464)
        at org.apache.jackrabbit.core.observation.DelegatingObservationDispatcher.dispatch(DelegatingObservationDispatcher.java:127)
        at org.apache.jackrabbit.core.observation.DelegatingObservationDispatcher.dispatchEvents(DelegatingObservationDispatcher.java:99)
        at org.apache.jackrabbit.core.observation.EventStateCollection.dispatch(EventStateCollection.java:464)
        at org.apache.jackrabbit.core.state.SharedItemStateManager$Update.end(SharedItemStateManager.java:760)
        at org.apache.jackrabbit.core.state.SharedItemStateManager.update(SharedItemStateManager.java:1115)
        at org.apache.jackrabbit.core.state.LocalItemStateManager.update(LocalItemStateManager.java:351)
        at org.apache.jackrabbit.core.state.LocalItemStateManager.update(LocalItemStateManager.java:326)
        at org.apache.jackrabbit.core.version.AbstractVersionManager$WriteOperation.save(AbstractVersionManager.java:189)
        at org.apache.jackrabbit.core.version.AbstractVersionManager.checkin(AbstractVersionManager.java:442)
        at org.apache.jackrabbit.core.version.VersionManagerImpl$2.run(VersionManagerImpl.java:290)
        at org.apache.jackrabbit.core.version.VersionManagerImpl$DynamicESCFactory.doSourced(VersionManagerImpl.java:586)
        at org.apache.jackrabbit.core.version.VersionManagerImpl.checkin(VersionManagerImpl.java:281)
        at org.apache.jackrabbit.core.version.XAVersionManager.checkin(XAVersionManager.java:180)
        at org.apache.jackrabbit.core.NodeImpl.checkin(NodeImpl.java:3367)
        at org.apache.jackrabbit.core.NodeImpl.checkin(NodeImpl.java:3346)
        at org.apache.jackrabbit.rmi.server.ServerNode.checkin(ServerNode.java:335)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:305)
        at sun.rmi.transport.Transport$1.run(Transport.java:159)
        at java.security.AccessController.doPrivileged(Native Method)
        at sun.rmi.transport.Transport.serviceCall(Transport.java:155)
        at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:535)
        at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:790)
        at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:649)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
        at java.lang.Thread.run(Thread.java:619)
Caused by: java.lang.OutOfMemoryError: Java heap space
        at java.util.Arrays.copyOf(Arrays.java:2882)
        at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:100)
        at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:515)
        at java.lang.StringBuffer.append(StringBuffer.java:306)
        at org.apache.jackrabbit.core.query.lucene.LazyTextExtractorField.stringValue(LazyTextExtractorField.java:91)
        at org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:109)
        at org.apache.lucene.index.DocFieldConsumersPerField.processFields(DocFieldConsumersPerField.java:36)
        at org.apache.lucene.index.DocFieldProcessorPerThread.processDocument(DocFieldProcessorPerThread.java:234)
        at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:765)
        at org.apache.lucene.index.DocumentsWriter.addDocument(DocumentsWriter.java:743)
        at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1917)
        at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1895)
        at org.apache.jackrabbit.core.query.lucene.AbstractIndex$1.call(AbstractIndex.java:183)
        at org.apache.jackrabbit.core.query.lucene.DynamicPooledExecutor$1.call(DynamicPooledExecutor.java:109)
        at EDU.oswego.cs.dl.util.concurrent.FutureResult$1.run(Unknown Source)
        at EDU.oswego.cs.dl.util.concurrent.PooledExecutor$RunWhenBlocked.blockedAction(Unknown Source)
        at EDU.oswego.cs.dl.util.concurrent.PooledExecutor.execute(Unknown Source)
        at org.apache.jackrabbit.core.query.lucene.DynamicPooledExecutor.executeAndWait(DynamicPooledExecutor.java:113)
        at org.apache.jackrabbit.core.query.lucene.AbstractIndex.addDocuments(AbstractIndex.java:188)
        at org.apache.jackrabbit.core.query.lucene.VolatileIndex.commitPending(VolatileIndex.java:171)
        at org.apache.jackrabbit.core.query.lucene.VolatileIndex.addDocuments(VolatileIndex.java:82)
        at org.apache.jackrabbit.core.query.lucene.MultiIndex$AddNode.execute(MultiIndex.java:1599)
        at org.apache.jackrabbit.core.query.lucene.MultiIndex.executeAndLog(MultiIndex.java:1000)
        at org.apache.jackrabbit.core.query.lucene.MultiIndex.update(MultiIndex.java:429)
        at org.apache.jackrabbit.core.query.lucene.SearchIndex.updateNodes(SearchIndex.java:588)
        at org.apache.jackrabbit.core.SearchManager.onEvent(SearchManager.java:486)
        at org.apache.jackrabbit.core.observation.EventConsumer.consumeEvents(EventConsumer.java:244)
        at org.apache.jackrabbit.core.observation.ObservationDispatcher.dispatchEvents(ObservationDispatcher.java:201)
        at org.apache.jackrabbit.core.observation.EventStateCollection.dispatch(EventStateCollection.java:464)
        at org.apache.jackrabbit.core.observation.DelegatingObservationDispatcher.dispatch(DelegatingObservationDispatcher.java:127)
        at org.apache.jackrabbit.core.observation.DelegatingObservationDispatcher.dispatchEvents(DelegatingObservationDispatcher.java:99)
        at org.apache.jackrabbit.core.observation.EventStateCollection.dispatch(EventStateCollection.java:464)


Re: Out Of Memory Error while indexing

Posted by ma...@jutzig.de.
Hi Alexander,

please see comments inline

On Tue, 9 Feb 2010 11:52:59 +0100, Alexander Klimetschek <ak...@day.com>
wrote:
> On Tue, Feb 9, 2010 at 08:57, Thomas Müller <th...@day.com>
> wrote:
>>> the clients connect with RMI.
>>
>> I'm not sure, but that might be the problem.
> 
> Ideally (and AFAIK when using DataStore), the binary property should
> be stored in a temporary file before it is persisted, and that file
> stream can be used by the indexer. (Right?) Maybe in case of RMI that
> is not the case. Or maybe the config could be changed to a DataStore
> to avoid the problem.
> 

Yes, that is what I observed. It creates a temp file, and therefore the
memory consumption stays low on both the client and the server side.
Before I used a DataStore, Jackrabbit ran out of memory while still
transferring the file. After I added a DataStore, I can successfully transfer
the file, but then I get the OutOfMemoryError from my first post.
So I don't think RMI is the problem here, because the error stays the same
whether I use RMI or DavEx.

> In the worst case, the specific full text indexer loads everything
> into memory and is the actual problem.
> 

To me it looks as if that's exactly what's happening, and that's what the
heap dump indicates.
The question is, what can I do about it :)

Thanks for your reply and best regards,
Johannes

Re: Out Of Memory Error while indexing

Posted by Alexander Klimetschek <ak...@day.com>.
On Tue, Feb 9, 2010 at 08:57, Thomas Müller <th...@day.com> wrote:
>> the clients connect with RMI.
>
> I'm not sure, but that might be the problem.

Ideally (and AFAIK when using DataStore), the binary property should
be stored in a temporary file before it is persisted, and that file
stream can be used by the indexer. (Right?) Maybe in case of RMI that
is not the case. Or maybe the config could be changed to a DataStore
to avoid the problem.

In the worst case, the specific full text indexer loads everything
into memory and is the actual problem.

Regards,
Alex

-- 
Alexander Klimetschek
alexander.klimetschek@day.com

Re: Out Of Memory Error while indexing

Posted by Guo Du <mr...@gmail.com>.
On Tue, Feb 9, 2010 at 1:23 PM,  <ma...@jutzig.de> wrote:
>
> Oh my... sorry everybody for wasting your time, the problem is solved.
>
> I tried the solution with the search index configuration. It didn't work
> for some reason, so I did some more debugging.
> During that I found out that there was a bug in my code that caused the
> mimeType of this binary file to default to text/plain.
> No wonder it ran out of memory when trying to read a line of text from a
> huge binary file...
>
> Sorry again for this and thank you all for the great support.

I wonder whether this kind of dangerous indexing activity could be logged as
a WARN-level message. It might help with troubleshooting what's going on
inside.

-Guo

Re: Out Of Memory Error while indexing

Posted by ma...@jutzig.de.
Oh my... sorry everybody for wasting your time, the problem is solved.

I tried the solution with the search index configuration. It didn't work
for some reason, so I did some more debugging.
During that I found out that there was a bug in my code that caused the
mimeType of this binary file to default to text/plain.
No wonder it ran out of memory when trying to read a line of text from a
huge binary file...
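
The lesson for me is to always set jcr:mimeType explicitly when writing the
content. Roughly like this (a simplified sketch, not my actual code;
"parent" and "in" are placeholders):

    Node file = parent.addNode("huge.iso", "nt:file");
    Node content = file.addNode("jcr:content", "nt:resource");
    // never let the MIME type fall back to a default like text/plain
    content.setProperty("jcr:mimeType", "application/octet-stream");
    content.setProperty("jcr:lastModified", java.util.Calendar.getInstance());
    content.setProperty("jcr:data", in); // java.io.InputStream, spooled via the DataStore
    parent.getSession().save();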

Sorry again for this and thank you all for the great support.


Best regards,
Johannes

Re: Out Of Memory Error while indexing

Posted by Tomasz Łazarecki <to...@gmail.com>.
Hello,

Your solutions seem much better than mine with the new node type. Check them
out and write back with some info if it works. :)

.T

2010/2/9 <ma...@jutzig.de>

>
> Hi Tomasz,
>
> comments below
>
> On Tue, 9 Feb 2010 11:38:56 +0100, Tomasz Łazarecki
> <to...@gmail.com> wrote:
> > Hello,
> > That's only an example, not the greatest solution.
> >
> > So, here we go...
> > 1. I would register my own node type, VersionControlNode.
> > 2. Next I would add two content properties to the VersionControlNode: one
> > for the smaller and a second for the bigger files.
> > 3. I'm providing the client API, so I would check
> > if (fileSize > MAXIMUM_SMALL_FILE_SIZE) and put the content into
> > jcr:bigFile, else into jcr:smallFile.
> > 4. Because Jackrabbit can tell Lucene which content should be indexed and
> > analyzed, I'm doing that via the proper configuration. By that I mean
> > jcr:smallFile is indexed, jcr:bigFile is not.
> >
> > As I get it from your e-mail, you are providing the client API.
> > PS: I didn't find a solution for the default configuration...
> >
> > Best regards/Pozdrawiam
> > Tomasz Łazarecki
> >
> >
> Could one of these solutions work as well?
> 1. I'm not very familiar with the syntax, so I don't know if this is
> correct...
>   <index-rule nodeType="nt:file"
>              condition="@jcr:mimeType != 'bigFile'">
>    <property>jcr:data</property>
>  </index-rule>
>
> 2. Create my own TextExtractor for mimeType 'bigFile' that returns a fake
> reader and therefore prevents the file from being loaded into memory.
>
> Best regards,
> Johannes
>

Re: Out Of Memory Error while indexing

Posted by ma...@jutzig.de.
Hi Tomasz,

comments below

On Tue, 9 Feb 2010 11:38:56 +0100, Tomasz Łazarecki
<to...@gmail.com> wrote:
> Hello,
> That's only an example, not the greatest solution.
>
> So, here we go...
> 1. I would register my own node type, VersionControlNode.
> 2. Next I would add two content properties to the VersionControlNode: one
> for the smaller and a second for the bigger files.
> 3. I'm providing the client API, so I would check
> if (fileSize > MAXIMUM_SMALL_FILE_SIZE) and put the content into
> jcr:bigFile, else into jcr:smallFile.
> 4. Because Jackrabbit can tell Lucene which content should be indexed and
> analyzed, I'm doing that via the proper configuration. By that I mean
> jcr:smallFile is indexed, jcr:bigFile is not.
>
> As I get it from your e-mail, you are providing the client API.
> PS: I didn't find a solution for the default configuration...
>
> Best regards/Pozdrawiam
> Tomasz Łazarecki
> 
> 
Could one of these solutions work as well?
1. I'm not very familiar with the syntax, so I don't know if this is
correct...
   <index-rule nodeType="nt:file"
              condition="@jcr:mimeType != 'bigFile'">
    <property>jcr:data</property>
  </index-rule>

2. Create my own TextExtractor for mimeType 'bigFile' that returns a fake
reader and therefore prevents the file from being loaded into memory.
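
For option 1, judging from the IndexingConfiguration wiki page, I suspect
the condition has to be an equality test and the rule has to match
nt:resource (the node that actually carries jcr:data), and if I read the
wiki correctly the first matching rule wins. So maybe something like this
(untested; 'application/x-bigfile' is a made-up placeholder type):

    <!-- first matching rule wins: big files get only the mime type indexed -->
    <index-rule nodeType="nt:resource"
                condition="@jcr:mimeType = 'application/x-bigfile'">
      <property>jcr:mimeType</property>
    </index-rule>
    <!-- everything else keeps full-text indexing of jcr:data -->
    <index-rule nodeType="nt:resource">
      <property>jcr:mimeType</property>
      <property>jcr:data</property>
    </index-rule>

For option 2, I imagine something along these lines, registered via the
textFilterClasses parameter of the SearchIndex (also an untested sketch):

    import java.io.IOException;
    import java.io.InputStream;
    import java.io.Reader;
    import java.io.StringReader;

    import org.apache.jackrabbit.extractor.TextExtractor;

    /** Returns no text at all, so nothing gets buffered for indexing. */
    public class EmptyBigFileExtractor implements TextExtractor {
        public String[] getContentTypes() {
            return new String[] { "application/x-bigfile" }; // placeholder type
        }
        public Reader extractText(InputStream stream, String type, String encoding)
                throws IOException {
            stream.close(); // read nothing from the binary
            return new StringReader("");
        }
    }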

Best regards,
Johannes

Re: Out Of Memory Error while indexing

Posted by Tomasz Łazarecki <to...@gmail.com>.
Hello,
That's only an example, not the greatest solution.

So, here we go...
1. I would register my own node type, VersionControlNode.
2. Next I would add two content properties to the VersionControlNode: one for
the smaller and a second for the bigger files.
3. I'm providing the client API, so I would check
if (fileSize > MAXIMUM_SMALL_FILE_SIZE) and put the content into jcr:bigFile,
else into jcr:smallFile.
4. Because Jackrabbit can tell Lucene which content should be indexed and
analyzed, I'm doing that via the proper configuration (see the sketch below).
By that I mean jcr:smallFile is indexed, jcr:bigFile is not.
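
A rough sketch of the index rule for step 4, using the IndexingConfiguration
syntax from the wiki (the node type and property names are examples only,
and this is untested):

    <index-rule nodeType="my:versionControlNode">
      <!-- only the properties listed here are indexed for matching nodes -->
      <property>jcr:smallFile</property>
      <!-- jcr:bigFile is deliberately not listed, so it is never indexed -->
    </index-rule>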

As I get it from your e-mail, you are providing the client API.
PS: I didn't find a solution for the default configuration...

Best regards/Pozdrawiam
Tomasz Łazarecki




2010/2/9 <ma...@jutzig.de>

>
> Hi Tomasz,
>
> thank you for your reply. Please see comments below
>
>
> > One of the solutions I have in mind is:
> > 1. You may register your own type like nt:BigAs**File; of course the name
> > should be different.
> > 2. Register it on the server side by copying the custom_nodetypes.xml file
> > to ${repo.home}/repository/nodetypes.
> > 3. Configure it so that the jcr:bigAs**content property is not indexed.
> >
>
> As stated earlier, I'm using Jackrabbit as a version control system. The
> client is implemented as an Eclipse team provider similar to the built-in
> CVS team provider. Now if a user commits a file test.txt with just a few
> bytes of content, I do want that to be indexed. So I create a nt:file, set
> the contents, and set the mime type. Later the user changes test.txt and
> commits it. This time the test.txt has 500MB. It's still the same node
> (nt:file) and I'd rather not delete the node and add a new one of another
> kind.
> Now the question for me is, how can I prevent Jackrabbit from crashing on
> these large files?
>
> I have seen the wiki links you gave me before, but so far I wasn't able to
> apply that to my use case.
> What would a configuration look like for
> 'stick to the default settings, but don't index properties larger than N
> bytes?'
> And if that doesn't work, how would I write:
> 'index everything except jcr:data where jcr:mimeType equals my:bigAssFile?'
>
> Thanks for the support and best regards,
> Johannes
>
>
>

Re: Out Of Memory Error while indexing

Posted by ma...@jutzig.de.
Hi Tomasz,

thank you for your reply. Please see comments below


> One of the solutions I have in mind is:
> 1. You may register your own type like nt:BigAs**File; of course the name
> should be different.
> 2. Register it on the server side by copying the custom_nodetypes.xml file
> to ${repo.home}/repository/nodetypes.
> 3. Configure it so that the jcr:bigAs**content property is not indexed.
> 

As stated earlier, I'm using Jackrabbit as a version control system. The
client is implemented as an Eclipse team provider similar to the built-in
CVS team provider. Now if a user commits a file test.txt with just a few
bytes of content, I do want that to be indexed. So I create a nt:file, set
the contents, and set the mime type. Later the user changes test.txt and
commits it. This time the test.txt has 500MB. It's still the same node
(nt:file) and I'd rather not delete the node and add a new one of another
kind. 
Now the question for me is, how can I prevent Jackrabbit from crashing on
these large files?

I have seen the wiki links you gave me before, but so far I wasn't able to
apply that to my use case.
What would a configuration look like for
'stick to the default settings, but don't index properties larger than N
bytes?'
And if that doesn't work, how would I write:
'index everything except jcr:data where jcr:mimeType equals my:bigAssFile?'
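
Would something like this be the right direction for hooking such rules up
(a wild guess based on the IndexingConfiguration wiki page, untested)?

    <SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
        <param name="path" value="${wsp.home}/index"/>
        <param name="extractorPoolSize" value="2"/>
        <param name="supportHighlighting" value="true"/>
        <param name="indexingConfiguration"
               value="${rep.home}/indexing_configuration.xml"/>
    </SearchIndex>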

Thanks for the support and best regards,
Johannes



Re: Out Of Memory Error while indexing

Posted by Tomasz Łazarecki <to...@gmail.com>.
Hello,

I don't see a reason to index this content. In Lucene it's easy to turn
indexing off by setting the proper flag on Field:
http://lucene.apache.org/java/3_0_0/api/core/org/apache/lucene/document/Field.html
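
For illustration, at the Lucene level that flag looks roughly like this
(just the idea, not how Jackrabbit wires it internally; "contents" is a
placeholder String):

    import org.apache.lucene.document.Field;
    // stored with the document, but neither tokenized nor indexed:
    Field data = new Field("jcr:data", contents, Field.Store.YES, Field.Index.NO);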

I guess these links should show you how to do that in Jackrabbit:
http://wiki.apache.org/jackrabbit/Search
http://wiki.apache.org/jackrabbit/IndexingConfiguration

One of the solutions I have in mind is:
1. You may register your own type like nt:BigAs**File; of course the name
should be different.
2. Register it on the server side by copying the custom_nodetypes.xml file to
${repo.home}/repository/nodetypes.
3. Configure it so that the jcr:bigAs**content property is not indexed.

Pozdrawiam
Tomasz Łazarecki

Re: Out Of Memory Error while indexing

Posted by ma...@jutzig.de.
Hi Thomas,

thank you for your fast reply. Please see comments below.

On Tue, 9 Feb 2010 08:57:06 +0100, Thomas Müller <th...@day.com>
wrote:
> Hi,
> 
>> the clients connect with RMI.
> 
> I'm not sure, but that might be the problem.
> 
> To analyze the problem, I suggest using the command-line option
> -XX:+HeapDumpOnOutOfMemoryError and then use a memory analysis tool
> such as the Eclipse Memory Analyzer (MAT): http://www.eclipse.org/mat
> 
> There is an alternative to using RMI:
> http://wiki.apache.org/jackrabbit/RemoteAccess#DavEx
> 

I do know about the DavEx alternative, but the way I'm using the API, the
DavEx implementation has been a lot slower than the RMI implementation (in
the 1.6.0 version of Jackrabbit at least).
Anyway, I did the same with a DavEx connection instead of RMI but the
problem remains. It takes a while to transfer the iso file, memory stays
rather constant on both client and server, then, when the file is complete,
the memory of the Jackrabbit process increases rapidly until it hits an
OutOfMemoryError.
Stacktrace below.

The Memory Analyzer has only one leak suspect:

The thread org.apache.tomcat.util.threads.ThreadWithAttributes @ 0x45dd178
http-8080-Processor24 keeps local variables with total size 134.502.112
(85,57%) bytes.

The memory is accumulated in one instance of "char[]" loaded by "<system
class loader>". Keywords: char[]

Is there any other information I can provide you with?


Best regards,
Johannes

Exception in thread "Timer-1" java.lang.IllegalStateException: this writer hit an OutOfMemoryError; cannot commit
        at org.apache.lucene.index.IndexWriter.prepareCommit(IndexWriter.java:3353)
        at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:3408)
        at org.apache.jackrabbit.core.query.lucene.AbstractIndex.commit(AbstractIndex.java:363)
        at org.apache.jackrabbit.core.query.lucene.VolatileIndex.commit(VolatileIndex.java:141)
        at org.apache.jackrabbit.core.query.lucene.PersistentIndex.copyIndex(PersistentIndex.java:105)
        at org.apache.jackrabbit.core.query.lucene.MultiIndex$VolatileCommit.execute(MultiIndex.java:1984)
        at org.apache.jackrabbit.core.query.lucene.MultiIndex.executeAndLog(MultiIndex.java:1000)
        at org.apache.jackrabbit.core.query.lucene.MultiIndex.commitVolatileIndex(MultiIndex.java:1048)
        at org.apache.jackrabbit.core.query.lucene.MultiIndex.flush(MultiIndex.java:893)
        at org.apache.jackrabbit.core.query.lucene.MultiIndex.checkFlush(MultiIndex.java:1164)
        at org.apache.jackrabbit.core.query.lucene.MultiIndex.access$100(MultiIndex.java:80)
        at org.apache.jackrabbit.core.query.lucene.MultiIndex$1.run(MultiIndex.java:317)
        at java.util.TimerThread.mainLoop(Timer.java:512)
        at java.util.TimerThread.run(Timer.java:462)
09.02.2010 09:37:57 *ERROR* SearchManager: Error indexing node. (SearchManager.java, line 490)
java.io.IOException: Java heap space
        at org.apache.jackrabbit.core.query.lucene.Util.createIOException(Util.java:114)
        at org.apache.jackrabbit.core.query.lucene.AbstractIndex.addDocuments(AbstractIndex.java:199)
        at org.apache.jackrabbit.core.query.lucene.VolatileIndex.commitPending(VolatileIndex.java:171)
        at org.apache.jackrabbit.core.query.lucene.VolatileIndex.addDocuments(VolatileIndex.java:82)
        at org.apache.jackrabbit.core.query.lucene.MultiIndex$AddNode.execute(MultiIndex.java:1599)
        at org.apache.jackrabbit.core.query.lucene.MultiIndex.executeAndLog(MultiIndex.java:1000)
        at org.apache.jackrabbit.core.query.lucene.MultiIndex.update(MultiIndex.java:429)
        at org.apache.jackrabbit.core.query.lucene.SearchIndex.updateNodes(SearchIndex.java:588)
        at org.apache.jackrabbit.core.SearchManager.onEvent(SearchManager.java:486)
        at org.apache.jackrabbit.core.observation.EventConsumer.consumeEvents(EventConsumer.java:244)
        at org.apache.jackrabbit.core.observation.ObservationDispatcher.dispatchEvents(ObservationDispatcher.java:201)
        at org.apache.jackrabbit.core.observation.EventStateCollection.dispatch(EventStateCollection.java:464)
        at org.apache.jackrabbit.core.observation.DelegatingObservationDispatcher.dispatch(DelegatingObservationDispatcher.java:127)
        at org.apache.jackrabbit.core.observation.DelegatingObservationDispatcher.dispatchEvents(DelegatingObservationDispatcher.java:99)
        at org.apache.jackrabbit.core.observation.EventStateCollection.dispatch(EventStateCollection.java:464)
        at org.apache.jackrabbit.core.state.SharedItemStateManager$Update.end(SharedItemStateManager.java:760)
        at org.apache.jackrabbit.core.state.SharedItemStateManager.update(SharedItemStateManager.java:1115)
        at org.apache.jackrabbit.core.state.LocalItemStateManager.update(LocalItemStateManager.java:351)
        at org.apache.jackrabbit.core.state.LocalItemStateManager.update(LocalItemStateManager.java:326)
        at org.apache.jackrabbit.core.version.AbstractVersionManager$WriteOperation.save(AbstractVersionManager.java:189)
        at org.apache.jackrabbit.core.version.AbstractVersionManager.checkin(AbstractVersionManager.java:442)
        at org.apache.jackrabbit.core.version.VersionManagerImpl$2.run(VersionManagerImpl.java:290)
        at org.apache.jackrabbit.core.version.VersionManagerImpl$DynamicESCFactory.doSourced(VersionManagerImpl.java:586)
        at org.apache.jackrabbit.core.version.VersionManagerImpl.checkin(VersionManagerImpl.java:281)
        at org.apache.jackrabbit.core.version.XAVersionManager.checkin(XAVersionManager.java:180)
        at org.apache.jackrabbit.core.NodeImpl.checkin(NodeImpl.java:3367)
        at org.apache.jackrabbit.core.NodeImpl.checkin(NodeImpl.java:3346)
        at org.apache.jackrabbit.webdav.jcr.VersionControlledItemCollection.checkin(VersionControlledItemCollection.java:268)
        at org.apache.jackrabbit.webdav.server.AbstractWebdavServlet.doCheckin(AbstractWebdavServlet.java:997)
        at org.apache.jackrabbit.webdav.server.AbstractWebdavServlet.execute(AbstractWebdavServlet.java:292)
        at org.apache.jackrabbit.webdav.server.AbstractWebdavServlet.service(AbstractWebdavServlet.java:196)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:729)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:269)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:188)
        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:172)
        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:117)
        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:108)
        at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:174)
        at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:873)
        at org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:665)
        at org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:528)
        at org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:81)
        at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:689)
        at java.lang.Thread.run(Thread.java:619)
Caused by: java.lang.OutOfMemoryError: Java heap space
        at java.util.Arrays.copyOf(Arrays.java:2882)
        at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:100)
        at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:515)
        at java.lang.StringBuffer.append(StringBuffer.java:306)
        at org.apache.jackrabbit.core.query.lucene.LazyTextExtractorField.stringValue(LazyTextExtractorField.java:91)
        at org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:109)
        at org.apache.lucene.index.DocFieldConsumersPerField.processFields(DocFieldConsumersPerField.java:36)
        at org.apache.lucene.index.DocFieldProcessorPerThread.processDocument(DocFieldProcessorPerThread.java:234)
        at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:765)
        at org.apache.lucene.index.DocumentsWriter.addDocument(DocumentsWriter.java:743)
        at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1917)
        at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1895)
        at org.apache.jackrabbit.core.query.lucene.AbstractIndex$1.call(AbstractIndex.java:183)
        at org.apache.jackrabbit.core.query.lucene.DynamicPooledExecutor$1.call(DynamicPooledExecutor.java:109)
        at EDU.oswego.cs.dl.util.concurrent.FutureResult$1.run(Unknown Source)
        at EDU.oswego.cs.dl.util.concurrent.PooledExecutor$Worker.run(Unknown Source)
        ... 1 more


Re: Out Of Memory Error while indexing

Posted by Thomas Müller <th...@day.com>.
Hi,

> the clients connect with RMI.

I'm not sure, but that might be the problem.

To analyze the problem, I suggest using the command-line option
-XX:+HeapDumpOnOutOfMemoryError and then use a memory analysis tool
such as the Eclipse Memory Analyzer (MAT): http://www.eclipse.org/mat
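
For a Tomcat deployment that could look like this in the startup environment
(the heap size matches your setup; the dump path is just an example):

    CATALINA_OPTS="-Xmx512m -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/jackrabbit-dumps"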

There is an alternative to using RMI:
http://wiki.apache.org/jackrabbit/RemoteAccess#DavEx

If the Jackrabbit RMI implementation is really the problem, we should
document that.

Regards,
Thomas