Posted to java-user@lucene.apache.org by Harini Raghavan <ha...@insideview.com> on 2005/07/25 13:30:07 UTC

OutOfMemory errors while indexing large documents

Hi All,
I am using Lucene to index large documents (HTML pages). The application is
running on JBoss and MySQL on UNIX. Indexing throws OutOfMemory
errors beyond a certain point, and I am not sure why. I am
using the default IndexWriter properties, but the Lucene documentation
mentions setting the max field length on the IndexWriter to some
optimum value for large documents. Is anyone aware of optimum settings
for maxFieldLength, mergeFactor, minMergeDocs and maxMergeDocs?
Thanks,
Harini 


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: OutOfMemory errors while indexing large documents

Posted by Chris Lu <ch...@gmail.com>.
Also, be careful with the MySQL JDBC driver.

Depending on how you use MySQL, you could get OutOfMemory errors that
are not caused by Lucene or the parser.
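
One common way this can happen with MySQL Connector/J: by default the
driver reads a query's entire result set into memory before returning the
first row, so selecting many large HTML documents in one query can exhaust
the heap before Lucene is involved. The sketch below asks the driver to
stream rows instead. The class and method names are made up for
illustration; the forward-only, read-only statement with a fetch size of
Integer.MIN_VALUE is the combination Connector/J documents for row-by-row
streaming, but check the documentation for your driver version.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical helper, not part of Lucene or the MySQL driver.
final class StreamingQuery {

    /**
     * Creates a statement configured so that MySQL Connector/J streams
     * rows one at a time instead of buffering the whole result set.
     */
    static Statement streamingStatement(Connection conn) throws SQLException {
        // Streaming requires a forward-only, read-only statement.
        Statement stmt = conn.createStatement(
                ResultSet.TYPE_FORWARD_ONLY,
                ResultSet.CONCUR_READ_ONLY);
        // Integer.MIN_VALUE is the driver-specific signal for row-by-row streaming.
        stmt.setFetchSize(Integer.MIN_VALUE);
        return stmt;
    }
}
```

With a live connection, iterate over the ResultSet returned by
executeQuery on this statement and index each row as it arrives. The
result set has to be read to the end or closed before the same connection
is used for another query.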
-- 
Chris Lu
---------------------
Full-Text Search on Any Database
http://www.dbsight.net

On 7/25/05, Harini Raghavan <ha...@insideview.com> wrote:
> I am using org.htmlparser.parserapplications.StringExtractor to parse the
> html pages,  I guess the OutOfMemory occurs while parsing the large HTML
> pages and not while indexing. Sorry about the confusion.



Re: OutOfMemory errors while indexing large documents

Posted by Harini Raghavan <ha...@insideview.com>.
I am using org.htmlparser.parserapplications.StringExtractor to parse the
HTML pages. I guess the OutOfMemory error occurs while parsing the large
HTML pages and not while indexing. Sorry about the confusion.

----- Original Message ----- 
From: "Erik Hatcher" <er...@ehatchersolutions.com>
To: <ja...@lucene.apache.org>
Sent: Monday, July 25, 2005 6:43 PM
Subject: Re: OutOfMemory errors while indexing large documents


> Could you be more specific about where the OutOfMemory error is 
> happening?  Do you have a complete stack trace?
>
> As for maxFieldLength - in my use of Lucene, it is necessary to index  the 
> entire document and not just the first 10,000 or so terms - I set 
> maxFieldLength to Integer.MAX_VALUE.
>
>     Erik




Re: OutOfMemory errors while indexing large documents

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
Could you be more specific about where the OutOfMemory error is  
happening?  Do you have a complete stack trace?
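
If no stack trace is at hand, one way to narrow this down is to run
parsing and indexing as separately named stages and report which one ran
out of heap. This is a minimal sketch: OomLocator and runStage are
hypothetical names, and the two stages in main are stand-ins for the real
HTML parsing and IndexWriter calls. Note that logging after an
OutOfMemoryError can itself fail if the heap is completely exhausted.

```java
import java.util.function.Supplier;

// Hypothetical helper for finding which pipeline stage exhausts the heap.
final class OomLocator {

    /** Runs one stage; on OutOfMemoryError, reports the stage name and rethrows. */
    static <T> T runStage(String stageName, Supplier<T> stage) {
        try {
            return stage.get();
        } catch (OutOfMemoryError e) {
            long maxHeapMb = Runtime.getRuntime().maxMemory() / (1024 * 1024);
            System.err.println("OutOfMemoryError in stage '" + stageName
                    + "' (max heap " + maxHeapMb + " MB)");
            e.printStackTrace(); // the complete stack trace to post to the list
            throw e;
        }
    }

    public static void main(String[] args) {
        // Stand-in for extracting text from an HTML page.
        String text = runStage("parse", () -> "extracted page text");
        // Stand-in for adding the document to the index.
        int indexedChars = runStage("index", () -> text.length());
        System.out.println("indexed " + indexedChars + " characters");
    }
}
```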

As for maxFieldLength: in my use of Lucene, it is necessary to index
the entire document and not just the first 10,000 or so terms, so I set
maxFieldLength to Integer.MAX_VALUE.
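
To make the effect of the limit concrete, here is a small stand-alone
illustration. It does not use Lucene and is not how IndexWriter is
implemented; FieldLengthSketch and indexedTerms are hypothetical names,
and whitespace splitting stands in for a real analyzer. It only shows
that terms past the limit never reach the index, so they cannot be
searched.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: mimics the idea of a per-field term limit.
final class FieldLengthSketch {

    /** Returns at most maxFieldLength whitespace-separated terms from text. */
    static List<String> indexedTerms(String text, int maxFieldLength) {
        List<String> terms = new ArrayList<>();
        for (String term : text.trim().split("\\s+")) {
            if (terms.size() >= maxFieldLength) {
                break; // everything after the limit is silently dropped
            }
            if (!term.isEmpty()) {
                terms.add(term);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        String doc = "alpha beta gamma delta";
        // With a limit of 2, "gamma" and "delta" are never indexed.
        System.out.println(indexedTerms(doc, 2));                 // [alpha, beta]
        // With Integer.MAX_VALUE, the whole document is kept.
        System.out.println(indexedTerms(doc, Integer.MAX_VALUE)); // [alpha, beta, gamma, delta]
    }
}
```

Raising the limit means every term of a large document is processed, so
it does not by itself reduce memory use.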

     Erik



