Posted to java-user@lucene.apache.org by Harini Raghavan <ha...@insideview.com> on 2005/07/25 13:30:07 UTC
OutOfMemory errors while indexing large documents
Hi All,
I am using Lucene to index large documents (HTML pages). The application is
running on JBoss and MySQL on UNIX. The indexing is throwing OutOfMemory
errors beyond a certain point, and I am not sure why this is happening. I am
using the default IndexWriter properties, but the Lucene documentation
mentions setting the max field length on the IndexWriter to some
optimum value for large documents. Is anyone aware of any optimum settings
for maxFieldLength, mergeFactor, minMergeDocs and maxMergeDocs?
Thanks,
Harini
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
Re: OutOfMemory errors while indexing large documents
Posted by Chris Lu <ch...@gmail.com>.
Also, be careful with the MySQL JDBC driver.
Depending on how you use MySQL, you could get OutOfMemory errors that
are not a problem with Lucene or the parsers at all.
--
Chris Lu
---------------------
Full-Text Search on Any Database
http://www.dbsight.net
On 7/25/05, Harini Raghavan <ha...@insideview.com> wrote:
> I am using org.htmlparser.parserapplications.StringExtractor to parse the
> HTML pages, so I guess the OutOfMemory error occurs while parsing the large
> HTML pages and not while indexing. Sorry about the confusion.
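One well-known case: by default, MySQL Connector/J reads an entire result set into memory before returning the first row. A minimal sketch, assuming the pages are read from MySQL through Connector/J; StreamingStatements and openStreaming are hypothetical names. It requests row-by-row streaming using the forward-only, read-only statement with an Integer.MIN_VALUE fetch size that the Connector/J documentation describes.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical helper (not part of any library): opens a Statement that asks
// MySQL Connector/J to stream rows one at a time instead of buffering the
// whole result set in the JVM heap.
final class StreamingStatements {
    private StreamingStatements() {}

    static Statement openStreaming(Connection conn) throws SQLException {
        // Connector/J streams only forward-only, read-only statements...
        Statement stmt = conn.createStatement(
                ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY);
        // ...whose fetch size is Integer.MIN_VALUE. This is a driver-specific
        // signal; other JDBC drivers may reject a negative fetch size.
        stmt.setFetchSize(Integer.MIN_VALUE);
        return stmt;
    }
}
```

While such a streaming result set is open, Connector/J does not allow other queries on the same connection, so read all rows (or close the result set) before issuing the next query.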
Re: OutOfMemory errors while indexing large documents
Posted by Harini Raghavan <ha...@insideview.com>.
I am using org.htmlparser.parserapplications.StringExtractor to parse the
HTML pages, so I guess the OutOfMemory error occurs while parsing the large
HTML pages and not while indexing. Sorry about the confusion.
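If the parser holds a whole page (and its extracted text) in memory, very large pages can exhaust the heap. One way to keep memory flat is to strip tags while streaming. The following is a deliberately naive sketch; StreamingTagStripper is a hypothetical class, not part of htmlparser, and it ignores comments, <script>/<style> contents and character entities.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;

// Hypothetical, deliberately naive helper (not part of htmlparser): copies
// the text outside <...> tags from `in` to `out` one character at a time,
// so memory use stays constant no matter how large the page is.
// It does NOT handle comments, <script>/<style> bodies, CDATA or entities.
final class StreamingTagStripper {
    private StreamingTagStripper() {}

    static void strip(Reader in, Writer out) throws IOException {
        boolean insideTag = false;
        int c;
        while ((c = in.read()) != -1) {
            if (c == '<') {
                insideTag = true;          // a tag starts: stop copying
            } else if (insideTag) {
                if (c == '>') {
                    insideTag = false;     // tag ends: resume copying
                    out.write(' ');        // keep words on either side of a tag apart
                }
            } else {
                out.write(c);              // ordinary text outside any tag
            }
        }
        out.flush();
    }
}
```

In practice you would wrap the input file in a BufferedReader and write to a temporary file; Lucene 1.x can also index a field from a java.io.Reader (Field.Text(String, Reader)), so the extracted text never has to be held as one String.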
----- Original Message -----
From: "Erik Hatcher" <er...@ehatchersolutions.com>
To: <ja...@lucene.apache.org>
Sent: Monday, July 25, 2005 6:43 PM
Subject: Re: OutOfMemory errors while indexing large documents
> Could you be more specific about where the OutOfMemory error is
> happening? Do you have a complete stack trace?
>
> As for maxFieldLength - in my use of Lucene, it is necessary to index the
> entire document and not just the first 10,000 or so terms - I set
> maxFieldLength to Integer.MAX_VALUE.
>
> Erik
Re: OutOfMemory errors while indexing large documents
Posted by Erik Hatcher <er...@ehatchersolutions.com>.
Could you be more specific about where the OutOfMemory error is
happening? Do you have a complete stack trace?
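Besides the stack trace, it can help to log which document was being processed and how full the heap was just before each step; the last line logged before the OutOfMemoryError then points at the offending page. A minimal sketch using only java.lang.Runtime; HeapReport and its methods are hypothetical names.

```java
import java.util.Locale;

// Hypothetical diagnostic helper: builds a one-line summary of the current
// JVM heap so it can be logged before each page is parsed and indexed.
final class HeapReport {
    private static final long MB = 1024L * 1024L;

    private HeapReport() {}

    // Bytes currently in use: memory reserved by the JVM minus the free part.
    static long usedBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    static String describe(String docId, long docChars) {
        // maxMemory() is roughly the -Xmx ceiling (Long.MAX_VALUE if unlimited).
        long maxMb = Runtime.getRuntime().maxMemory() / MB;
        return String.format(Locale.ROOT, "doc=%s chars=%d heapUsedMB=%d heapMaxMB=%d",
                docId, docChars, usedBytes() / MB, maxMb);
    }
}
```

For example, log HeapReport.describe(url, html.length()) before parsing each page. If heapMaxMB is far below the machine's memory, the -Xmx setting JBoss is started with may simply be too small.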
As for maxFieldLength - in my use of Lucene, it is necessary to index
the entire document and not just the first 10,000 or so terms - I set
maxFieldLength to Integer.MAX_VALUE.
Erik
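In Lucene 1.x, IndexWriter stops adding terms to a field once it has seen maxFieldLength of them (10,000 by default), and the rest of the text is not indexed. The stdlib-only sketch below mimics that cap with a whitespace split, purely to show the effect; FieldLengthCap is a hypothetical class, not Lucene's analyzer or indexing code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of a per-field term cap like Lucene's
// maxFieldLength. It is NOT Lucene's analyzer: it just splits on whitespace.
final class FieldLengthCap {
    // Same value as Lucene 1.x's IndexWriter.DEFAULT_MAX_FIELD_LENGTH.
    static final int DEFAULT_MAX_FIELD_LENGTH = 10000;

    private FieldLengthCap() {}

    // Returns the terms that would be indexed; everything past the cap is dropped.
    static List<String> indexedTerms(String text, int maxFieldLength) {
        List<String> terms = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue;               // blank input splits into a single ""
            }
            if (terms.size() >= maxFieldLength) {
                break;                  // cap reached: the rest is never indexed
            }
            terms.add(token);
        }
        return terms;
    }
}
```

Setting the cap to Integer.MAX_VALUE, as described above, makes every term of a page searchable, but it can also increase the memory needed to index a single very large page.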