Posted to dev@lucene.apache.org by Daniel Shane <sh...@LEXUM.UMontreal.CA> on 2009/09/25 17:22:16 UTC

Lucene 2.9.0-rc5 : Reader stays open after IndexWriter.updateDocument(), is that possible?

I'm trying to track down a bug in my application using Lucene 2.9.0-rc5; it's 
regarding Readers. I've noticed that when I index, not every reader gets 
closed, so I eventually run out of available file descriptors.

Before trying to reproduce this problem using the smallest code 
possible, I'd like to know if lucene is supposed to close every reader 
in a Document after the IndexWriter.updateDocument(Term, Document) has 
been called?

Is there a path where lucene may "wait" before closing the readers? 
Maybe after it indexes some other documents?

In my case, I am using one Reader in my field and it is a 
BufferedReader(), but I don't think that should make any difference 
(I'll re-try with a standard reader).

Can someone confirm that after an updateDocument all readers in the 
document should be closed by lucene?

Daniel Shane



---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


Re: Lucene 2.9.0-rc5 : Reader stays open after IndexWriter.updateDocument(), is that possible?

Posted by John Wang <jo...@gmail.com>.
Oops, I completely misunderstood the question. I thought this is about
IndexReaders :)
-John

On Sun, Sep 27, 2009 at 11:14 AM, John Wang <jo...@gmail.com> wrote:

> AFAIK, the application has always assumed responsibility for closing
> IndexReader instances.
> However, 2.9 is the first release in which an IndexReader can be obtained
> via a getter on IndexWriter.
>
> Previously, IndexReaders were usually constructed via the IndexReader.open
> factory method. Having a getter on IndexWriter makes the closing
> responsibility unclear from the API point of view.
>
> IndexReader instances are encapsulated inside IndexWriters, and
> IndexWriters sort of own the IndexReaders, which can be a good thing:
> IndexWriter may want to pool these IndexReader instances etc.
>
> Of course, this is nothing some simple JavaDoc can't fix :)
>
> My $0.02
>
> -John

Re: Lucene 2.9.0-rc5 : Reader stays open after IndexWriter.updateDocument(), is that possible?

Posted by Mark Miller <ma...@gmail.com>.
Wrong Reader ;)

- Mark

http://www.lucidimagination.com (mobile)

On Sep 26, 2009, at 11:14 PM, John Wang <jo...@gmail.com> wrote:

> AFAIK, the application has always assumed responsibility for closing
> IndexReader instances.
>
> However, 2.9 is the first release in which an IndexReader can be
> obtained via a getter on IndexWriter.
>
> Previously, IndexReaders were usually constructed via the
> IndexReader.open factory method. Having a getter on IndexWriter
> makes the closing responsibility unclear from the API point of view.
>
> IndexReader instances are encapsulated inside IndexWriters, and  
> IndexWriters sort of own the IndexReaders, which can be a good  
> thing: IndexWriter may want to pool these IndexReader instances etc.
>
> Of course, this is nothing some simple JavaDoc can't fix :)
>
> My $0.02
>
> -John

Re: Lucene 2.9.0-rc5 : Reader stays open after IndexWriter.updateDocument(), is that possible?

Posted by John Wang <jo...@gmail.com>.
AFAIK, the application has always assumed responsibility for closing
IndexReader instances.
However, 2.9 is the first release in which an IndexReader can be obtained
via a getter on IndexWriter.

Previously, IndexReaders were usually constructed via the IndexReader.open
factory method. Having a getter on IndexWriter makes the closing
responsibility unclear from the API point of view.

IndexReader instances are encapsulated inside IndexWriters, and IndexWriters
sort of own the IndexReaders, which can be a good thing: IndexWriter may
want to pool these IndexReader instances etc.

Of course, this is nothing some simple JavaDoc can't fix :)

My $0.02

-John

On Fri, Sep 25, 2009 at 11:41 PM, Daniel Shane <sh...@lexum.umontreal.ca>wrote:

> Thanks Mark for the pointer, I thought somehow that lucene closed them as a
> convenience, I don't know if it did that in previous releases (aka 2.4.1)
> but I'll close them myself from now on.
>
> Daniel Shane
>
>
> Mark Miller wrote:
>
>> Standard convention is that you close your own readers, not the methods
>> you pass them into.

Re: Lucene 2.9.0-rc5 : Reader stays open after IndexWriter.updateDocument(), is that possible?

Posted by Chris Hostetter <ho...@fucit.org>.
: However, there may be something with the fact that Lucene's Analyzers
: automatically close the reader when it's done analyzing. I think this
: encourages people not to explicitly close them, and creates the potential of
: having open fd's if an exception is thrown in the middle of the analysis or
: before addDocument/updateDocument is called.

It's always been the case that users should close their own Readers -- 
Lucene's docs have never indicated that they will close the reader for 
you, it's just a helpful side effect that once IndexWriter has consumed 
all the chars from a Reader it calls close() -- the caller should still 
close() explicitly for precisely the reasons you listed, but there's 
really no downside to multiple close calls.

Even if we weren't worried about breaking existing client code (where 
people never call close themselves) it would still be a good idea to leave 
the close() calls in because the sooner the Readers are closed the sooner 
the descriptor can be released -- no reason to wait (ie: during a 
serialized merge for example) until addDocument is done if the Reader has 
been completely exhausted.
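
The point about multiple close calls can be illustrated with a small
sketch. It uses only java.io: consumeAndClose is a hypothetical stand-in
for whatever consumes the Reader (it is not Lucene's API), and
CountingReader is an illustrative helper in the spirit of the
CloseStateReader from the test posted in this thread.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

class DoubleCloseSketch {

    /** Illustrative Reader that counts how often close() is called. */
    static class CountingReader extends StringReader {
        private int closeCount = 0;

        CountingReader(String s) {
            super(s);
        }

        @Override
        public synchronized void close() {
            closeCount++;
            super.close();
        }

        synchronized int getCloseCount() {
            return closeCount;
        }
    }

    /** Hypothetical consumer: reads every char, then closes the Reader itself. */
    static int consumeAndClose(Reader r) throws IOException {
        int chars = 0;
        while (r.read() != -1) {
            chars++;
        }
        r.close(); // first close, done by the consumer as a side effect
        return chars;
    }

    /** Returns the number of close() calls seen by the Reader. */
    static int run() {
        CountingReader reader = new CountingReader("foo");
        try {
            consumeAndClose(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            reader.close(); // second close, done by the caller; harmless
        }
        return reader.getCloseCount();
    }

    public static void main(String[] args) {
        System.out.println("close() calls: " + run());
    }
}
```

The Reader sees two close() calls and no exception is raised, which is
consistent with the java.io.Reader.close() contract that closing a
previously closed stream has no effect.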


-Hoss




Re: Lucene 2.9.0-rc5 : Reader stays open after IndexWriter.updateDocument(), is that possible?

Posted by Daniel Shane <sh...@LEXUM.UMontreal.CA>.
Oh boy!

It seems like I have found the problem in my case, which, afaik, has 
nothing to do with Lucene but rather with the library we use to tokenize 
HTML documents. It's just that we changed our HTML parser at the same time 
as the version of Lucene, and nekoHTML (cyberneko) does not close its 
HTML reader even when we call parser.abort()/parser.close() (which is 
placed in the close() of the Lucene Tokenizer).

Before that, the HTML parser would close the reader, so I wrongly 
thought it was the change of version of Lucene that caused this.

Bad news is that I had you all worked up for nothing, but good news is 
you don't have any bugs here.

However, there may be something with the fact that Lucene's Analyzers 
automatically close the reader when it's done analyzing. I think this 
encourages people not to explicitly close them, and creates the 
potential of having open fd's if an exception is thrown in the middle of 
the analysis or before addDocument/updateDocument is called.

I don't think changing the API of Field to accept a "ReaderFactory" 
would solve anything because there are cases where you must index a 
reader that is already opened (like a network connection) and wrapping 
it with a dummy readerFactory does not look very good.

Daniel Shane



Re: Lucene 2.9.0-rc5 : Reader stays open after IndexWriter.updateDocument(), is that possible?

Posted by Mark Miller <ma...@gmail.com>.
Yeah, I agree Hoss - we shouldn't pull the rug in either case - if you
could count on it before and it's worked out well, it's more of a hassle
to push out a surprise than to keep the known behavior.

More of an aside from me ;)



-- 
- Mark

http://www.lucidimagination.com






RE: Lucene 2.9.0-rc5 : Reader stays open after IndexWriter.updateDocument(), is that possible?

Posted by Chris Hostetter <ho...@fucit.org>.
: That is my opinion, too. Closing the readers should be done by the caller in

I don't disagree with either of you, but...

: a finally block and not automatically by the IW. I only wanted to confirm,
: that the behaviour of 2.9 did not change. Closing readers two times is not a

...I wanted to try and confirm that as well.  If we consciously decide that 
IndexWriter is going to *stop* closing all Readers that's fine with me, 
but in the absence of a specific statement like that in the release notes 
we should strive for no surprises.  (That doesn't have to come in the form 
of code changes, it can simply be an announcement on java-user and a 
documented caveat in the applicable code.)  But as yet we don't have 
confirmation that any behavior change exists.


-Hoss




RE: Lucene 2.9.0-rc5 : Reader stays open after IndexWriter.updateDocument(), is that possible?

Posted by Uwe Schindler <uw...@thetaphi.de>.
> I guess the fact that Lucene closes it is a legacy we may be stuck with
> - but I think it encourages bad practice code. There is plenty of room
> for an exception to be thrown before the Reader is sure to be closed in
> a finally block.
> 
> Part of why it's standard convention to close your own I think, rather
> than somewhere in the method you pass it into. Best practice is to close it
> in a finally block yourself, with the try starting right after you make
> it.

That is my opinion, too. Closing the readers should be done by the caller in
a finally block and not automatically by the IW. I only wanted to confirm
that the behaviour of 2.9 did not change. Closing readers twice is not a
problem, so I would suggest that all implementors do a close after adding
the document to the IW.

> Not a big deal though. If it was, we'd know about it by now ;)
> 


Re: Lucene 2.9.0-rc5 : Reader stays open after IndexWriter.updateDocument(), is that possible?

Posted by Mark Miller <ma...@gmail.com>.
I guess the fact that Lucene closes it is a legacy we may be stuck with
- but I think it encourages bad practice code. There is plenty of room
for an exception to be thrown before the Reader is sure to be closed in
a finally block.

Part of why it's standard convention to close your own I think, rather
than somewhere in the method you pass it into. Best practice is to close it
in a finally block yourself, with the try starting right after you make it.
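
A minimal sketch of that advice, using only java.io. The failingAnalysis
method is a hypothetical stand-in for an analysis step that throws before
anything has closed the Reader; it is not part of Lucene.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

class FinallyCloseSketch {

    /** Illustrative Reader that remembers whether close() was called. */
    static class TrackingReader extends StringReader {
        private boolean closed = false;

        TrackingReader(String s) {
            super(s);
        }

        @Override
        public synchronized void close() {
            closed = true;
            super.close();
        }

        synchronized boolean isClosed() {
            return closed;
        }
    }

    /** Hypothetical analysis step that fails before closing the Reader. */
    static void failingAnalysis(Reader r) throws IOException {
        r.read();
        throw new IOException("simulated failure in the middle of analysis");
    }

    /** The try starts right after the Reader is created; finally closes it. */
    static boolean closedWithFinally() {
        TrackingReader reader = new TrackingReader("foo");
        try {
            failingAnalysis(reader);
        } catch (IOException expected) {
            // ignored here; a real application would report it
        } finally {
            reader.close();
        }
        return reader.isClosed();
    }

    /** Relies on the consumer to close the Reader, which never happens. */
    static boolean closedWithoutFinally() {
        TrackingReader reader = new TrackingReader("foo");
        try {
            failingAnalysis(reader);
        } catch (IOException expected) {
            // the Reader is left open on this path
        }
        return reader.isClosed();
    }

    public static void main(String[] args) {
        System.out.println("with finally:    closed=" + closedWithFinally());
        System.out.println("without finally: closed=" + closedWithoutFinally());
    }
}
```

Only the variant with the finally block leaves the Reader closed after
the simulated failure.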


Not a big deal though. If it was, we'd know about it by now ;)



-- 
- Mark

http://www.lucidimagination.com






RE: Lucene 2.9.0-rc5 : Reader stays open after IndexWriter.updateDocument(), is that possible?

Posted by Chris Hostetter <ho...@fucit.org>.
: So in 2.9, the Reader is correctly closed, if the TokenStream chain is
: correctly set up, passing all close() calls to the delegate.

Thanks for digging into that Uwe.

So Daniel: The ball is in your court here: what analyzer / 
tokenizer+tokenfilters is your app using in the cases where you see 
Readers not getting closed by Lucene -- if they involve your own custom 
Tokenizers then that may be where the problem is, but if all the Analysis 
pieces you are using come out of the box with Lucene please let us know so 
we can check them.


-Hoss




RE: Lucene 2.9.0-rc5 : Reader stays open after IndexWriter.updateDocument(), is that possible?

Posted by Uwe Schindler <uw...@thetaphi.de>.
Sorry, my last reply was nonsense. Even if you reuse TokenStreams, the
consumer should close the stream at the end of operations.

And this is for sure done by the DocInverterPerField (a finally block
calling TokenStream.close()). Maybe the user with the problem has created a
TokenFilter or something like that which does not pass TS.close() to the
underlying TS? We had such a problem with one of the Solr "TokenFilters",
which was a subclass of Tokenizer rather than TokenFilter and failed to
override the close/end/... methods.

So in 2.9, the Reader is correctly closed if the TokenStream chain is
correctly set up, passing all close() calls to the delegate.
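
The delegation requirement can be sketched with plain java.io classes.
Everything here is hypothetical and simplified: Stream, Source,
ForwardingFilter and ForgetfulFilter only mimic the roles of TokenStream,
Tokenizer and TokenFilter, and are not Lucene classes.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

class CloseDelegationSketch {

    /** Simplified stand-in for a token stream; close() is a no-op by default. */
    static abstract class Stream {
        void close() {
        }
    }

    /** Source stage: owns the Reader and closes it when it is closed. */
    static class Source extends Stream {
        private final Reader input;

        Source(Reader input) {
            this.input = input;
        }

        @Override
        void close() {
            try {
                input.close();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    /** Well-behaved wrapper: forwards close() to the stage it wraps. */
    static class ForwardingFilter extends Stream {
        private final Stream delegate;

        ForwardingFilter(Stream delegate) {
            this.delegate = delegate;
        }

        @Override
        void close() {
            delegate.close();
        }
    }

    /** Broken wrapper: inherits the no-op close(), so the chain is cut. */
    static class ForgetfulFilter extends Stream {
        private final Stream delegate;

        ForgetfulFilter(Stream delegate) {
            this.delegate = delegate;
        }
    }

    /** Illustrative Reader that remembers whether close() was called. */
    static class TrackingReader extends StringReader {
        private boolean closed = false;

        TrackingReader(String s) {
            super(s);
        }

        @Override
        public synchronized void close() {
            closed = true;
            super.close();
        }

        synchronized boolean isClosed() {
            return closed;
        }
    }

    /** Builds a two-stage chain, closes the outer stage, reports the Reader state. */
    static boolean readerClosedBy(boolean forwarding) {
        TrackingReader reader = new TrackingReader("foo");
        Stream source = new Source(reader);
        Stream chain = forwarding ? new ForwardingFilter(source) : new ForgetfulFilter(source);
        chain.close(); // what the consumer does when it is finished
        return reader.isClosed();
    }

    public static void main(String[] args) {
        System.out.println("forwarding filter: closed=" + readerClosedBy(true));
        System.out.println("forgetful filter:  closed=" + readerClosedBy(false));
    }
}
```

With the forwarding wrapper the Reader ends up closed. With the wrapper
that does not override close(), the consumer's close() call never reaches
the Reader.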

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de






RE: Lucene 2.9.0-rc5 : Reader stays open after IndexWriter.updateDocument(), is that possible?

Posted by Uwe Schindler <uw...@thetaphi.de>.
I think I know what is different from earlier versions:

The close of the Reader is done by Tokenizer.close(). As in 2.9.0 we did
much work to get all the Tokenizers reusable, there is the following
problem:

The Reader is only closed if you call Tokenizer.close(), but not if you
call Tokenizer.reset(newReader). This has always been so (and has nothing to
do with the new TokenStream API), but in earlier versions not all analyzers
were able to reuse the TokenStreams.

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de

> -----Original Message-----
> From: Uwe Schindler [mailto:uwe@thetaphi.de]
> Sent: Sunday, September 27, 2009 10:42 AM
> To: java-dev@lucene.apache.org
> Subject: RE: Lucene 2.9.0-rc5 : Reader stays open after
> IndexWriter.updateDocument(), is that possible?
> 
> How does your test perform with 2.4.1 ?
> 
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: uwe@thetaphi.de
> 
> > -----Original Message-----
> > From: Chris Hostetter [mailto:hossman_lucene@fucit.org]
> > Sent: Sunday, September 27, 2009 3:34 AM
> > To: Lucene Dev
> > Subject: Re: Lucene 2.9.0-rc5 : Reader stays open after
> > IndexWriter.updateDocument(), is that possible?
> >
> >
> > : Thanks Mark for the pointer, I thought somehow that lucene closed them
> > : as a convenience, I don't know if it did that in previous releases (aka
> > : 2.4.1) but I'll close them myself from now on.
> >
> > FWIW: As far as I know, Lucene has always closed the Reader for you when
> > calling addDocument/updateDocument -- BUT -- the docs never promised
> > that Lucene would close any Readers used in Fields.  In fact the Field
> > constructor docs say "you may not close the Reader until addDocument has
> > been called" suggesting that you should close it yourself.
> > (Reader.close() is very clear that there should be no effect on closing
> > a Reader multiple times, so this is safe no matter what Lucene does)
> >
> > That said: If the behavior has changed in 2.9, this could easily bite
> > lots of people in the ass if they haven't been closing their readers and
> > now they run out of file handles.  I wrote a quick test to try and
> > reproduce the problem you're describing, but as far as I can tell 2.9.0
> > (final) still seems to close the Reader for you.
> >
> > Can anyone else reproduce this problem of Readers in Fields not
> > getting closed?  (my test is below)
> >
> > --BEGIN--
> > package org.apache.lucene;
> >
> > /**
> >  * Licensed to the Apache Software Foundation (ASF) under one or more
> >  * contributor license agreements.  See the NOTICE file distributed with
> >  * this work for additional information regarding copyright ownership.
> >  * The ASF licenses this file to You under the Apache License, Version 2.0
> >  * (the "License"); you may not use this file except in compliance with
> >  * the License.  You may obtain a copy of the License at
> >  *
> >  *     http://www.apache.org/licenses/LICENSE-2.0
> >  *
> >  * Unless required by applicable law or agreed to in writing, software
> >  * distributed under the License is distributed on an "AS IS" BASIS,
> >  * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
> >  * See the License for the specific language governing permissions and
> >  * limitations under the License.
> >  */
> >
> > import org.apache.lucene.analysis.KeywordAnalyzer;
> > import org.apache.lucene.index.*;
> > import org.apache.lucene.document.*;
> > import org.apache.lucene.util.LuceneTestCase;
> > import org.apache.lucene.store.RAMDirectory;
> >
> > import java.io.*;
> >
> > public class TestFieldWithReaderClosing extends LuceneTestCase {
> >
> >   IndexWriter writer = null;
> >   Document d = null;
> >   CloseStateReader reader;
> >   public void setUp() throws Exception {
> >     writer = new IndexWriter(new RAMDirectory(),
> >                              new KeywordAnalyzer(), true,
> >                              IndexWriter.MaxFieldLength.LIMITED);
> >     d = new Document();
> >     d.add(new Field("id", "x", Field.Store.YES, Field.Index.ANALYZED));
> >     reader = new CloseStateReader("foo");
> >     d.add(new Field("contents", reader));
> >   }
> >   public void tearDown() throws Exception {
> >     writer.close();
> >     writer = null;
> >     reader.close();
> >     reader = null;
> >   }
> >
> >   public void testAdd() throws Exception {
> >     writer.addDocument(d);
> >     assertEquals("close count should be 1", 1, reader.getCloseCount());
> >     writer.close();
> >     assertEquals("close count should still be 1", 1,
> > reader.getCloseCount());
> >   }
> >   public void testEmptyUpdate() throws Exception {
> >     writer.updateDocument(new Term("id","x"), d);
> >     assertEquals("close count should be 1", 1, reader.getCloseCount());
> >     writer.close();
> >     assertEquals("close count should still be 1", 1,
> > reader.getCloseCount());
> >   }
> >   public void testAddAndUpdate() throws Exception {
> >     writer.addDocument(d);
> >     assertEquals("close count should be 1", 1, reader.getCloseCount());
> >     d = new Document();
> >     d.add(new Field("id", "x", Field.Store.YES, Field.Index.ANALYZED));
> >     reader = new CloseStateReader("foo");
> >     d.add(new Field("contents", reader));
> >     writer.updateDocument(new Term("id","x"), d);
> >     assertEquals("new close count should be 1", 1,
> > reader.getCloseCount());
> >     writer.close();
> >     assertEquals("new close count should still be 1", 1,
> > reader.getCloseCount());
> >   }
> >
> >
> >   static class CloseStateReader extends StringReader {
> >     private int closeCount = 0;
> >     public CloseStateReader(String s) {
> >       super(s);
> >     }
> >     public synchronized void close() {
> >       closeCount++;
> >       super.close();
> >     }
> >     public int getCloseCount() {
> >       return closeCount;
> >     }
> >   }
> > }
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-dev-help@lucene.apache.org
> 
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


RE: Lucene 2.9.0-rc5 : Reader stays open after IndexWriter.updateDocument(), is that possible?

Posted by Uwe Schindler <uw...@thetaphi.de>.
How does your test perform with 2.4.1?

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de

> -----Original Message-----
> From: Chris Hostetter [mailto:hossman_lucene@fucit.org]
> Sent: Sunday, September 27, 2009 3:34 AM
> To: Lucene Dev
> Subject: Re: Lucene 2.9.0-rc5 : Reader stays open after
> IndexWriter.updateDocument(), is that possible?


Re: Lucene 2.9.0-rc5 : Reader stays open after IndexWriter.updateDocument(), is that possible?

Posted by Chris Hostetter <ho...@fucit.org>.
: Thanks Mark for the pointer, I thought somehow that lucene closed them as a
: convenience, I don't know if it did that in previous releases (aka 2.4.1) but
: I'll close them myself from now on.

FWIW: As far as I know, Lucene has always closed the Reader for you when
calling addDocument/updateDocument -- BUT -- the docs never promised
that Lucene would close any Readers used in Fields.  In fact, the Field
constructor docs say "you may not close the Reader until addDocument has
been called", suggesting that you should close it yourself.
(The Reader.close() javadoc is very clear that closing an already-closed
Reader has no effect, so this is safe no matter what Lucene does.)
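
The double-close point can be sketched with java.io alone. This is only an
illustration, not Lucene code: CountingReader and consume() are hypothetical
names, and consume() merely stands in for any library call that happens to
close the Reader it is given. The caller closes the Reader again in a finally
block, and the second close() is harmless.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Illustrative sketch only; every name here is hypothetical.
class DoubleCloseSketch {

  /** A StringReader that records how many times close() was called. */
  static class CountingReader extends StringReader {
    private int closeCount = 0;

    CountingReader(String s) {
      super(s);
    }

    @Override
    public void close() {
      closeCount++;
      super.close();
    }

    int getCloseCount() {
      return closeCount;
    }
  }

  /** Stand-in for a library method that drains the Reader and closes it. */
  static int consume(Reader r) throws IOException {
    int chars = 0;
    try {
      while (r.read() != -1) {
        chars++;
      }
    } finally {
      r.close();
    }
    return chars;
  }

  /** The caller closes the Reader too; the second close() has no effect. */
  static int consumeAndClose(CountingReader r) throws IOException {
    try {
      return consume(r);
    } finally {
      r.close();
    }
  }

  public static void main(String[] args) throws IOException {
    CountingReader r = new CountingReader("foo");
    int chars = consumeAndClose(r);
    System.out.println(chars + " chars read, close() called "
        + r.getCloseCount() + " times");
  }
}
```

Closing in the caller's finally block costs nothing when the callee has
already closed the Reader, and it still releases the resource when the callee
has not.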

That said: if the behavior has changed in 2.9, this could easily bite lots
of people in the ass if they haven't been closing their readers and now
they run out of file handles.  I wrote a quick test to try to reproduce
the problem you're describing, but as far as I can tell 2.9.0
(final) still seems to close the Reader for you.

Can anyone else reproduce this problem of Readers in Fields not getting
closed?  (My test is below.)

--BEGIN--
package org.apache.lucene;

/**
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

import org.apache.lucene.analysis.KeywordAnalyzer;
import org.apache.lucene.index.*;
import org.apache.lucene.document.*;
import org.apache.lucene.util.LuceneTestCase;
import org.apache.lucene.store.RAMDirectory;

import java.io.*;

public class TestFieldWithReaderClosing extends LuceneTestCase {

  IndexWriter writer = null;
  Document d = null;
  CloseStateReader reader;

  public void setUp() throws Exception {
    super.setUp();
    writer = new IndexWriter(new RAMDirectory(),
                             new KeywordAnalyzer(), true,
                             IndexWriter.MaxFieldLength.LIMITED);
    d = new Document();
    d.add(new Field("id", "x", Field.Store.YES, Field.Index.ANALYZED));
    // the Reader-valued field whose close() calls are counted
    reader = new CloseStateReader("foo");
    d.add(new Field("contents", reader));
  }

  public void tearDown() throws Exception {
    // closing again here is harmless: every assertion has already run
    writer.close();
    writer = null;
    reader.close();
    reader = null;
    super.tearDown();
  }

  /** addDocument alone should close the Reader exactly once. */
  public void testAdd() throws Exception {
    writer.addDocument(d);
    assertEquals("close count should be 1", 1, reader.getCloseCount());
    writer.close();
    assertEquals("close count should still be 1", 1, reader.getCloseCount());
  }

  /** updateDocument on an empty index (nothing to delete). */
  public void testEmptyUpdate() throws Exception {
    writer.updateDocument(new Term("id","x"), d);
    assertEquals("close count should be 1", 1, reader.getCloseCount());
    writer.close();
    assertEquals("close count should still be 1", 1, reader.getCloseCount());
  }

  /** updateDocument replacing a previously added document. */
  public void testAddAndUpdate() throws Exception {
    writer.addDocument(d);
    assertEquals("close count should be 1", 1, reader.getCloseCount());
    // build a second document with a fresh Reader for the update
    d = new Document();
    d.add(new Field("id", "x", Field.Store.YES, Field.Index.ANALYZED));
    reader = new CloseStateReader("foo");
    d.add(new Field("contents", reader));
    writer.updateDocument(new Term("id","x"), d);
    assertEquals("new close count should be 1", 1, reader.getCloseCount());
    writer.close();
    assertEquals("new close count should still be 1", 1, reader.getCloseCount());
  }

  /** A StringReader that counts how many times close() is called. */
  static class CloseStateReader extends StringReader {
    private int closeCount = 0;
    public CloseStateReader(String s) {
      super(s);
    }
    public synchronized void close() {
      closeCount++;
      super.close();
    }
    public int getCloseCount() {
      return closeCount;
    }
  }
}




Re: Lucene 2.9.0-rc5 : Reader stays open after IndexWriter.updateDocument(), is that possible?

Posted by Daniel Shane <sh...@LEXUM.UMontreal.CA>.
Thanks Mark for the pointer. I somehow thought that Lucene closed them
as a convenience. I don't know if it did that in previous releases (e.g.
2.4.1), but I'll close them myself from now on.

Daniel Shane

Mark Miller wrote:
> Standard convention is that you close your own readers; the methods
> you pass them into do not.


Re: Lucene 2.9.0-rc5 : Reader stays open after IndexWriter.updateDocument(), is that possible?

Posted by Mark Miller <ma...@gmail.com>.
Standard convention is that you close your own readers; the methods
you pass them into do not.
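
A minimal sketch of that convention, using only java.io: the code that opens
the Reader also closes it, in a finally block around the call it hands the
Reader to. The names TrackedReader, index() and indexAndClose() are
hypothetical, and index() is just a stand-in for an indexing call that reads
the Reader without closing it. The BufferedReader wrapper is included because
closing the wrapper also closes the Reader underneath it.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Illustrative sketch only; every name here is hypothetical.
class CallerClosesSketch {

  /** Records whether close() reached the underlying reader. */
  static class TrackedReader extends StringReader {
    private boolean closed = false;

    TrackedReader(String s) {
      super(s);
    }

    @Override
    public void close() {
      closed = true;
      super.close();
    }

    boolean isClosed() {
      return closed;
    }
  }

  /** Stand-in for an indexing call: reads everything, never closes. */
  static int index(Reader r) throws IOException {
    int chars = 0;
    while (r.read() != -1) {
      chars++;
    }
    return chars;
  }

  /** Wraps, hands off, and closes the reader in one scope. */
  static int indexAndClose(TrackedReader underlying) throws IOException {
    Reader buffered = new BufferedReader(underlying);
    try {
      return index(buffered);
    } finally {
      buffered.close(); // closing the wrapper closes the wrapped reader
    }
  }

  public static void main(String[] args) throws IOException {
    TrackedReader underlying = new TrackedReader("hello");
    int chars = indexAndClose(underlying);
    System.out.println(chars + " chars indexed, underlying closed: "
        + underlying.isClosed());
  }
}
```

With this shape, the file descriptor is released whether or not the callee
closes the Reader, and also when the callee throws.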


-- 
- Mark

http://www.lucidimagination.com



