Posted to solr-user@lucene.apache.org by "Gargate, Siddharth" <sg...@ptc.com> on 2009/02/16 07:00:58 UTC

Outofmemory error for large files

I am trying to index a roughly 150 MB text file with a 1024 MB max
heap, but I get an OutOfMemoryError in the SolrJ code:

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
	at java.util.Arrays.copyOf(Arrays.java:2882)
	at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:100)
	at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:572)
	at java.lang.StringBuffer.append(StringBuffer.java:320)
	at java.io.StringWriter.write(StringWriter.java:60)
	at org.apache.solr.common.util.XML.escape(XML.java:206)
	at org.apache.solr.common.util.XML.escapeCharData(XML.java:79)
	at org.apache.solr.common.util.XML.writeXML(XML.java:149)
	at org.apache.solr.client.solrj.util.ClientUtils.writeXML(ClientUtils.java:115)
	at org.apache.solr.client.solrj.request.UpdateRequest.writeXML(UpdateRequest.java:200)
	at org.apache.solr.client.solrj.request.UpdateRequest.getXML(UpdateRequest.java:178)
	at org.apache.solr.client.solrj.request.UpdateRequest.getContentStreams(UpdateRequest.java:173)
	at org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:136)
	at org.apache.solr.client.solrj.request.UpdateRequest.process(UpdateRequest.java:243)
	at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:63)


I modified the UpdateRequest class to initialize the StringWriter
object in UpdateRequest.getXML with an initial size, and cleared the
SolrInputDocument that holds the reference to the file text. Then I
get the OOM below:


Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
	at java.util.Arrays.copyOf(Arrays.java:2786)
	at java.lang.StringCoding.safeTrim(StringCoding.java:64)
	at java.lang.StringCoding.access$300(StringCoding.java:34)
	at java.lang.StringCoding$StringEncoder.encode(StringCoding.java:251)
	at java.lang.StringCoding.encode(StringCoding.java:272)
	at java.lang.String.getBytes(String.java:947)
	at org.apache.solr.common.util.ContentStreamBase$StringStream.getStream(ContentStreamBase.java:142)
	at org.apache.solr.common.util.ContentStreamBase$StringStream.getReader(ContentStreamBase.java:154)
	at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:61)
	at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
	at org.apache.solr.core.SolrCore.execute(SolrCore.java:1333)
	at org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:139)
	at org.apache.solr.client.solrj.request.UpdateRequest.process(UpdateRequest.java:249)
	at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:63)


After I increase the heap size to 1250 MB, I get this OOM:

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
	at java.util.Arrays.copyOfRange(Arrays.java:3209)
	at java.lang.String.<init>(String.java:216)
	at java.lang.StringBuffer.toString(StringBuffer.java:585)
	at com.ctc.wstx.util.TextBuffer.contentsAsString(TextBuffer.java:403)
	at com.ctc.wstx.sr.BasicStreamReader.getText(BasicStreamReader.java:821)
	at org.apache.solr.handler.XMLLoader.readDoc(XMLLoader.java:276)
	at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:139)
	at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:69)
	at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
	at org.apache.solr.core.SolrCore.execute(SolrCore.java:1333)
	at org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:139)
	at org.apache.solr.client.solrj.request.UpdateRequest.process(UpdateRequest.java:249)
	at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:63)


So it looks like I won't be able to get past these OOMs.
Is there any way to avoid them? One option I see is to break the file
into chunks, but then I won't be able to search for multiple words if
they are distributed across different documents.
Also, can somebody tell me the minimum heap size required, relative to
file size, for a document to get indexed successfully?

Thanks,
Siddharth
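The chunking option mentioned above could look roughly like this. This is only a sketch: the chunk size, the `id`/`text` field names, and the SolrJ calls shown in comments are illustrative assumptions, not anything prescribed in this thread.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: break a large body of text into fixed-size chunks so each
// chunk can be indexed as its own small Solr document, instead of
// buffering one 150 MB document through the XML update path.
public class ChunkedIndexer {

    static List<String> chunk(String text, int chunkChars) {
        List<String> chunks = new ArrayList<>();
        for (int start = 0; start < text.length(); start += chunkChars) {
            int end = Math.min(text.length(), start + chunkChars);
            chunks.add(text.substring(start, end));
        }
        return chunks;
    }

    public static void main(String[] args) {
        // Tiny stand-in for the 150 MB file; real code would read the
        // file incrementally rather than load it all at once.
        List<String> chunks = chunk("abcdefghij", 4);
        // Each chunk would then become one document, e.g. (hypothetical):
        //   SolrInputDocument doc = new SolrInputDocument();
        //   doc.addField("id", fileId + "_" + i);
        //   doc.addField("text", chunks.get(i));
        //   server.add(doc);   // one small update request per chunk
        System.out.println(chunks); // [abcd, efgh, ij]
    }
}
```

Because each `add` request then carries only one small chunk, the in-memory XML buffer stays bounded by the chunk size rather than by the file size.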

Re: Outofmemory error for large files

Posted by Shalin Shekhar Mangar <sh...@gmail.com>.
On Tue, Feb 17, 2009 at 1:10 PM, Otis Gospodnetic <
otis_gospodnetic@yahoo.com> wrote:

> Right.  But I was trying to point out that a single 150MB Document is not
> in fact what the o.p. wants to do.  For example, if your 150MB represents,
> say, a whole book, should that really be a single document?  Or should
> individual chapters be separate documents, for example?
>
>
Yes, a 150 MB document is probably not a good idea. I am only trying
to point out that even if he writes multiple documents in a 150 MB
batch, he may still hit the OOME, because all of the XML is written to
memory first and only then sent to the server.

-- 
Regards,
Shalin Shekhar Mangar.

Re: Outofmemory error for large files

Posted by Otis Gospodnetic <ot...@yahoo.com>.
Right.  But I was trying to point out that a single 150MB Document is not in fact what the o.p. wants to do.  For example, if your 150MB represents, say, a whole book, should that really be a single document?  Or should individual chapters be separate documents, for example?

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch





Re: Outofmemory error for large files

Posted by Shalin Shekhar Mangar <sh...@gmail.com>.
On Tue, Feb 17, 2009 at 10:26 AM, Otis Gospodnetic <
otis_gospodnetic@yahoo.com> wrote:

> Siddharth,
>
> But does your 150MB file represent a single Document?  That doesn't sound
> right.
>

Otis, Solrj writes the whole XML in memory before writing it to server. That
may be one reason behind Sidhharth's OOME. See
https://issues.apache.org/jira/browse/SOLR-973

-- 
Regards,
Shalin Shekhar Mangar.
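Shalin's point can be made concrete with some back-of-the-envelope arithmetic. The per-copy multipliers below are simplifying assumptions (2 bytes per in-memory char, no growth from XML escaping, the listed copies alive simultaneously), not measurements:

```java
// Rough estimate of the peak heap needed just to push one huge
// document through SolrJ's XML update path, based on the copies
// visible in the stack traces: the String of file text, the
// StringWriter buffer built in UpdateRequest.getXML, and the UTF-8
// byte[] produced by String.getBytes().
public class HeapEstimate {

    static long peakBytes(long fileBytes) {
        long asString = 2 * fileBytes; // file text as a UTF-16 String
        long xmlCopy  = 2 * fileBytes; // escaped copy in the StringWriter
        long utf8     = fileBytes;     // getBytes() for the content stream
        return asString + xmlCopy + utf8;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // ~750 MB for a 150 MB file, before the server-side XML parser
        // (and the rest of the application) uses any heap at all.
        System.out.println(peakBytes(150 * mb) / mb + " MB");
    }
}
```

Even this lower-bound estimate leaves little headroom in a 1024 MB heap, which is consistent with the OOMEs moving deeper into the stack as the heap grows.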

Re: Outofmemory error for large files

Posted by Otis Gospodnetic <ot...@yahoo.com>.
Siddharth,

But does your 150MB file represent a single Document?  That doesn't sound right.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch





RE: Outofmemory error for large files

Posted by "Gargate, Siddharth" <sg...@ptc.com>.
 Otis,

I haven't tried it yet, but what I meant is this: if we divide the content into multiple parts, words will be split across different Solr documents. If the main document contains 'Hello World', those two words might get indexed in two different documents, and searching for 'Hello World' won't return the required result unless I use OR in the query.

Thanks,
Siddharth
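The boundary case Siddharth describes can be mitigated by chunking with a small overlap, so any phrase shorter than the overlap still occurs whole in at least one chunk and an ordinary phrase query can match it. A sketch with illustrative sizes (the chunk and overlap widths are assumptions):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch: split text into word-based chunks that overlap by a few
// words, so a phrase straddling a chunk boundary is repeated intact
// at the start of the next chunk.
public class OverlappingChunker {

    static List<String> chunk(String text, int wordsPerChunk, int overlapWords) {
        String[] words = text.split("\\s+");
        List<String> chunks = new ArrayList<>();
        int step = wordsPerChunk - overlapWords; // advance less than a full chunk
        for (int start = 0; start < words.length; start += step) {
            int end = Math.min(words.length, start + wordsPerChunk);
            chunks.add(String.join(" ", Arrays.copyOfRange(words, start, end)));
            if (end == words.length) break;
        }
        return chunks;
    }

    public static void main(String[] args) {
        // "Hello World" straddles the first 4-word boundary, but the
        // 2-word overlap keeps it intact in the second chunk.
        List<String> chunks = chunk("a b c Hello World x y z", 4, 2);
        boolean phraseSurvives =
                chunks.stream().anyMatch(c -> c.contains("Hello World"));
        System.out.println(chunks + " -> " + phraseSurvives);
    }
}
```

The cost is indexing the overlap words twice, which is usually negligible next to avoiding a single 150 MB document.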


Re: Outofmemory error for large files

Posted by Otis Gospodnetic <ot...@yahoo.com>.
Siddharth,

At the end of your email you said:
"One option I see is to break the file in chunks, but with this, I won't be able to search with multiple words if they are distributed in different documents."

Unless I'm missing something unusual about your application, I don't think the above is technically correct.  Have you tried doing this and have you then tried your searches?  Everything should still work, even if you index one document at a time.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



