Posted to dev@lucene.apache.org by Amit Kapur <am...@newgen.co.in> on 2003/04/03 07:13:38 UTC

Problem while indexing

Hi all,

I am facing the problems mentioned below while indexing. If anyone has any
help to offer, I would be obliged.
**** couldn't rename segments.new to segments ****
**** F:\Program Files\OmniDocs Server\ftstest\_3cf.fnm (Too many open
files)****

I am trying to index documents with Lucene, generating an optimized index of
about 30 MB, which could grow to about 100 MB or more (but that would be on a
high-end server machine).

Description of the current case:
#---Each document has four fields (one Text field and three Keyword fields).
#---The analyzer is based on a StopFilter and a PorterStemFilter.
#---I am using a Compaq PIII, 650 MHz, 128 MB RAM.
#---mergeFactor is set to 25, and I optimize the index after about every
20 documents added.
#---Using Lucene release 1.2.

Problem faced:
After adding about 4000 documents, producing an index of 30 MB, I first got
an error saying **** couldn't rename segments.new to segments ****, after
which neither an IndexReader nor an IndexWriter could be opened on the index.

Then I changed a couple of settings:
#---mergeFactor=20, and optimize was called after every 10 documents.
#---Using Lucene release 1.3.

Problem faced:
After adding about 1500 documents, producing an index of 10 MB, I first got
an error saying **** F:\Program Files\OmniDocs Server\ftstest\_3cf.fnm
(Too many open files) ****, after which an IndexWriter could not be opened on
the index.

My requirements call for a much, much larger index in practice, and I am at
the point where these errors occur unpredictably.

Could anyone please guide me on this ASAP?
Thanks in advance.

Regards
Amit

PS: I have already read articles in the mail archive
http://www.mail-archive.com/lucene-user@jakarta.apache.org/msg02815.html.


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org

Re: Problem while indexing

Posted by Amit Kapur <am...@newgen.co.in>.

Thanks, Terry.

Today I again made a few changes to the architecture of the component in
which I am using Lucene. I changed the way I use the IndexReader and, as you
said, made sure that all readers are closed. mergeFactor is back to the
default (10).

The test run is under way and is working pretty well so far: I have about
1000 documents and an index of about 10 MB, and counting :) I hope things go
better this time. Thanks for your reply; it made me feel that this has been
done successfully before and can be done now. I appreciate the way you
replied.

Thanks,
I will get back to you later ...
Cheers!!
Amit


----- Original Message -----
From: Terry Steichen
To: Lucene Users Group
Sent: Thursday, April 03, 2003 7:38 PM
Subject: Re: Problem while indexing


Amit,

I don't exactly know what your problem is, but I'm using a configuration not
too different from yours with no problems - so at least you know it's
possible.

I have an index of about 125MB which I use on various machines, including an
old Windows98/SE 400MHz notebook.  I used the default MergeFactor (10, I
think) and do a daily merge (the daily addition represents about 200
documents added to a total of over 58,000).  Each document (XML format) has
about 15 fields of various types.  I'm using release 1.3 dev 1.

At one point I too had a problem of too many open files - turned out that I
wasn't closing the IndexReader.  Fixed that, and the number of open files
usually stays below 500 (without Lucene, there are typically about 300-400
open files just for the system).

HTH,

Terry





---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org



Re: Problem while indexing

Posted by Terry Steichen <te...@net-frame.com>.

Amit,

I don't exactly know what your problem is, but I'm using a configuration not
too different from yours with no problems - so at least you know it's
possible.

I have an index of about 125MB which I use on various machines, including an
old Windows98/SE 400MHz notebook.  I used the default MergeFactor (10, I
think) and do a daily merge (the daily addition represents about 200
documents added to a total of over 58,000).  Each document (XML format) has
about 15 fields of various types.  I'm using release 1.3 dev 1.

At one point I too had a problem of too many open files - turned out that I
wasn't closing the IndexReader.  Fixed that, and the number of open files
usually stays below 500 (without Lucene, there are typically about 300-400
open files just for the system).
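
A minimal sketch of the pattern described above (not the code from my own
setup): open a reader, use it, and close it in a finally block so that its
files are released even if an exception is thrown. To keep the sketch runnable
without Lucene on the classpath, TrackedReader is a hypothetical stand-in that
only counts open handles; with Lucene itself the same shape would apply to an
IndexReader and its close() method. The path "ftstest" is just the directory
name from the error message.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a Lucene IndexReader: counts handles that are still open. */
final class TrackedReader implements AutoCloseable {
    static final AtomicInteger OPEN = new AtomicInteger();
    private boolean closed;

    private TrackedReader() {
        OPEN.incrementAndGet();
    }

    /** Plays the role of opening a reader on an index directory; the path is not read. */
    static TrackedReader open(String indexPath) {
        return new TrackedReader();
    }

    /** Dummy query method; fails if the reader has already been closed. */
    int numDocs() {
        if (closed) {
            throw new IllegalStateException("reader is closed");
        }
        return 0;
    }

    @Override
    public void close() {
        if (!closed) {
            closed = true;
            OPEN.decrementAndGet();
        }
    }
}

final class ReaderLeakDemo {
    /** Leaky variant: the reader is never closed, so its handle stays open. */
    static int countDocsLeaky(String indexPath) {
        TrackedReader reader = TrackedReader.open(indexPath);
        return reader.numDocs();
    }

    /** Safe variant: close() runs even if numDocs() throws. */
    static int countDocsSafely(String indexPath) {
        TrackedReader reader = TrackedReader.open(indexPath);
        try {
            return reader.numDocs();
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) {
        for (int i = 0; i < 100; i++) {
            countDocsSafely("ftstest");
        }
        System.out.println("open after safe calls:  " + TrackedReader.OPEN.get());
        for (int i = 0; i < 100; i++) {
            countDocsLeaky("ftstest");
        }
        System.out.println("open after leaky calls: " + TrackedReader.OPEN.get());
    }
}
```

In the leaky variant every call leaves one more handle open, which is the kind
of slow build-up that eventually ends in "Too many open files".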

HTH,

Terry




Re: Problem while indexing

Posted by Otis Gospodnetic <ot...@yahoo.com>.

Amit,

Please don't mail both lucene-user and lucene-dev.  This is a message
suitable for lucene-user.
If people are not answering your message, try providing more
information or different information, rather than re-sending the same message.

I'm not a big Windows user, so I don't know how you can increase the
limit for the maximum number of open files on Windows.  It looks like
that is at least a part of the problem.
What's the current number of open files on your machine?
Don't ask how to find that out, I don't know, that would be a Windows
question.
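
For a rough feel of how mergeFactor relates to the number of files involved,
here is a back-of-the-envelope sketch. It rests on assumptions that are not
confirmed anywhere in this thread: that each segment consists of roughly seven
core files plus one norms file per indexed field (my reading of the index
format before compound files, so check the file-format documentation for your
release), and that a merge reads mergeFactor segments at once. The class and
method names are made up for illustration.

```java
/** Rough estimate only; the per-segment file counts below are assumptions. */
final class OpenFileEstimate {
    /** Assumed number of per-segment files that do not depend on the field count. */
    static final int CORE_FILES_PER_SEGMENT = 7;

    /** Files in one segment: the core files plus one norms file per indexed field. */
    static int filesPerSegment(int indexedFields) {
        return CORE_FILES_PER_SEGMENT + indexedFields;
    }

    /** Rough count of files read at once when this many segments are merged together. */
    static int filesOpenDuringMerge(int segments, int indexedFields) {
        return segments * filesPerSegment(indexedFields);
    }

    public static void main(String[] args) {
        int fields = 4; // one Text field and three Keyword fields, as in the original post
        for (int mergeFactor : new int[] {10, 20, 25}) {
            System.out.println("mergeFactor=" + mergeFactor + " -> about "
                    + filesOpenDuringMerge(mergeFactor, fields) + " files per merge");
        }
    }
}
```

Under these assumptions the count grows linearly with mergeFactor, and any
reader that is left open would hold its own set of files on top of that.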

Otis



Re: Highlight Search Result

Posted by Michael Wechner <mi...@wyona.org>.

Lixin Meng wrote:
> When I was looking for a solution that can highlight the query terms in the
> search result, I came across the following one.
> 
> http://www.iq-computing.de/lucene/highlight.jsp
> 
> It sounds like a good solution to me. However, to make it work, one needs to
> modify the Lucene source code (such as changing some private declarations to
> public). I guess you guys already know about it. I just wonder if there is
> any plan (or any procedure) to incorporate the suggestions into the Lucene
> code base?
> 
> If the answer is no, does anybody know of another solution for highlighting
> that doesn't require a code change?

We integrated Lucene into Apache Lenya (formerly known as Wyona CMS)
and also offer highlighting: during crawling we dump the "htdocs" onto
the filesystem, and after the search we read the files corresponding to
the hits and generate the excerpts with highlighting.
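
The excerpt step can be sketched roughly as follows. This is not the Lenya
code, only a minimal illustration under my own assumptions: the text of a hit
has already been read from the filesystem into a String, the query terms are
plain words, and ExcerptHighlighter, excerpt and the <b> markup are names and
choices made up for this example. It does no HTML escaping.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ExcerptHighlighter {
    /** Builds a case-insensitive whole-word pattern matching any of the query terms. */
    static Pattern termPattern(List<String> terms) {
        StringBuilder alternation = new StringBuilder();
        for (String term : terms) {
            if (alternation.length() > 0) {
                alternation.append('|');
            }
            alternation.append(Pattern.quote(term));
        }
        return Pattern.compile("\\b(?:" + alternation + ")\\b", Pattern.CASE_INSENSITIVE);
    }

    /**
     * Returns a window of up to `context` characters either side of the first hit,
     * with every hit inside the window wrapped in <b>..</b>.
     * Returns "" when there are no terms or none of them occurs in the text.
     */
    static String excerpt(String text, List<String> terms, int context) {
        if (terms.isEmpty()) {
            return "";
        }
        Pattern pattern = termPattern(terms);
        Matcher first = pattern.matcher(text);
        if (!first.find()) {
            return "";
        }
        int start = Math.max(0, first.start() - context);
        int end = Math.min(text.length(), first.end() + context);

        // Search only inside the window, but let \b see the characters around it,
        // so a word cut off by the window edge is not matched as a whole word.
        Matcher m = pattern.matcher(text).region(start, end).useTransparentBounds(true);
        StringBuilder out = new StringBuilder(start > 0 ? "..." : "");
        int last = start;
        while (m.find()) {
            out.append(text, last, m.start()).append("<b>").append(m.group()).append("</b>");
            last = m.end();
        }
        out.append(text, last, end);
        if (end < text.length()) {
            out.append("...");
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String doc = "Apache Lenya is a content management system. Lenya uses Lucene for search.";
        System.out.println(excerpt(doc, Arrays.asList("lenya"), 20));
    }
}
```

Because the text comes from files outside the index, this approach needs no
change to Lucene itself; the trade-off is that the dumped files have to be
kept in sync with the index.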

You can see it in action at

http://www.oscom.org/search/go?publication-id=all&queryString=Lenya&fields=all&find=Search

You can download the code from:

http://www.wyona.org/download/downloading_and_installing.html

I think Doug Cutting wrote on the mailing list some time ago that you
shouldn't put the content into the index as a field, because the index
will blow up and search performance will be bad.
But I guess that with small documents and only a few of them it
wouldn't matter that much. Hence we will probably enhance our solution so
that you can set a flag for where the content is stored, either within
the index or on the filesystem.

HTH

Michael

> 
> I am hesitant to maintain a variation of the Lucene mainstream, since I would
> have to patch it every time Lucene has a new release. After all, I just want
> to use it.
> 
> Regards,
> Lixin

Highlight Search Result

Posted by Lixin Meng <li...@fulldegree.com>.

When I was looking for a solution that can highlight the query terms in the
search result, I came across the following one.

http://www.iq-computing.de/lucene/highlight.jsp

It sounds like a good solution to me. However, to make it work, one needs to
modify the Lucene source code (such as changing some private declarations to
public). I guess you guys already know about it. I just wonder if there is
any plan (or any procedure) to incorporate the suggestions into the Lucene
code base?

If the answer is no, does anybody know of another solution for highlighting
that doesn't require a code change?

I am hesitant to maintain a variation of the Lucene mainstream, since I would
have to patch it every time Lucene has a new release. After all, I just want
to use it.

Regards,
Lixin


