Posted to dev@lucene.apache.org by Tony Schwartz <to...@simpleobjects.com> on 2005/07/20 14:04:57 UTC

Re: if delete all docs in segment - when is segment deleted

I added the following code to the close() method of IndexWriter so that, on close, it
detects any segment whose documents have all been deleted and removes it.  Does anyone
see any problem with this?

=================================
  public synchronized void close() throws IOException {
    flushRamSegments();
    ramDirectory.close();

    if ( directory instanceof FSDirectory && closeDir ) {
      ///////////////////////////
      // check for any segments that have all docs deleted and remove them.
      final Vector deletable = new Vector();
      int len = segmentInfos.size();
      SegmentReader reader;
      for ( int i = 0 ; i < len ; i++ ) {
        reader = SegmentReader.get( segmentInfos.info( i ) );
        if ( reader.numDocs() <= 0 ) {           // numDocs excludes deleted docs
          deletable.add( reader );
        }
      }
      synchronized (directory) {                 // in- & inter-process sync
        new Lock.With(directory.makeLock(COMMIT_LOCK_NAME), COMMIT_LOCK_TIMEOUT) {
          public Object doBody() throws IOException {
            segmentInfos.write( directory );     // commit before deleting
            deleteSegments( deletable );         // delete now-unused segments
            return null;
          }
        }.run();
      }
    }

    if (writeLock != null) {
      writeLock.release();                       // release write lock
      writeLock = null;
    }
    if (closeDir)
      directory.close();
  }
=================================

Tony Schwartz
tony@simpleobjects.com
"What we need is more cowbell."




> If every doc in a segment is deleted, when does the segment go away?
> Without me having to dig too deep, I was hoping someone could help me prepare for this
> eventuality.  I have an index that grows infinitely.  Old docs are deleted each day just
> before new docs for that day are added.  If I set MaxMergeDocs to some number, say 1
> million, and the segment has 1 million docs in it, and every doc in that segment is
> deleted, will the segment ever be deleted?  If not, how difficult would it be to add
> some type of trigger to detect this "all deleted in segment" condition so lucene could
> remove the huge segment to free disk space.  I'm concerned the segment will never be
> deleted.
>
> Tony Schwartz
> tony@simpleobjects.com
> There are 10 types of people in this world.  Ones that understand binary and ones that
> don't.
>
>
>


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


Re: if delete all docs in segment - when is segment deleted

Posted by Tony Schwartz <to...@simpleobjects.com>.
Actually, there is no need for the closeDir check in the line:
if ( directory instanceof FSDirectory && closeDir ) {

I could also check whether deletable is empty before taking the commit lock, so the
lock is only acquired when there are actually segments to delete.
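
To make the "check before locking" idea concrete, here is a minimal, self-contained
sketch of the control flow.  It does not use the Lucene API: Segment,
pruneEmptySegments and lockAcquisitions are hypothetical stand-ins invented for
illustration, and a plain synchronized block stands in for the commit lock.  The step
that drops the empty segments from the list before the commit is an assumption about
what a complete version would need, not something the patch above does.

```java
import java.util.ArrayList;
import java.util.List;

public class EmptySegmentPruner {

    /** Hypothetical stand-in for a segment: a name plus its count of live (non-deleted) docs. */
    static final class Segment {
        final String name;
        final int liveDocs;

        Segment(String name, int liveDocs) {
            this.name = name;
            this.liveDocs = liveDocs;
        }
    }

    /** Counts how often the (simulated) commit lock is taken, to make the early exit visible. */
    static int lockAcquisitions = 0;

    /**
     * Removes segments with no live docs from the list and returns their names.
     * The lock is taken only when at least one such segment exists.
     */
    static List<String> pruneEmptySegments(List<Segment> segments) {
        // Pass 1: collect fully deleted segments without holding any lock.
        List<Segment> deletable = new ArrayList<>();
        for (Segment s : segments) {
            if (s.liveDocs <= 0) {
                deletable.add(s);
            }
        }

        List<String> removed = new ArrayList<>();
        if (deletable.isEmpty()) {
            return removed; // nothing to delete, so skip the lock entirely
        }

        // Pass 2: under the lock, drop the segments from the list first
        // (assumption: the committed list should no longer reference them),
        // then record which segments would have their files deleted.
        synchronized (segments) {
            lockAcquisitions++;
            segments.removeAll(deletable);
            for (Segment s : deletable) {
                removed.add(s.name);
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        List<Segment> segments = new ArrayList<>();
        segments.add(new Segment("_a", 0));   // every doc deleted
        segments.add(new Segment("_b", 42));  // still has live docs
        System.out.println(pruneEmptySegments(segments)); // prints [_a]
    }
}
```

Calling pruneEmptySegments a second time on the same list returns an empty list and
leaves lockAcquisitions unchanged, which is the whole point of the early check.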


Tony Schwartz
tony@simpleobjects.com
"What we need is more cowbell."


