You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@lucene.apache.org by Christoph Goller <go...@detego-software.de> on 2003/11/19 16:51:41 UTC

Re: File timestamps / subclassing of IndexReader

Sorry for the delay. I was busy with other things and it took some
time to fix the problems that Doug mentioned.

Furthermore, thanks for the links concerning dublicate checking.

I would like to commit my timestamp patch in the next few days.
So my question goes to  the mailing list whether there are any
objections.

Doug Cutting schrieb:
> I like this approach, but I have two concerns:
> 
> 1. For back-compatibility, the version number should be written at the 
> end of the "segments" file, not at its start.  This may make things a 
> big slower, but we should not incompatibly change the file format: 
> existing indexes should still open correctly.

done.
I also made some tests. Reading and old index and changing it works now.

> 
> 2. There should be public IndexReader.getVersion methods, correspodning 
> to the existing IndexReader.getLastModified methods.  The lastModified 
> methods should be deprecated, with a comment indicating that the 
> getVersion methods should be used instead.
> 

done

please have a look at the latest version of this patch in this mail
(again to be applied to index directory).

Some thoughts about FilterIndexReader:
I noticed that all synchronization for a FilterIndexReader is done
by the FilterIndexReader itself, more precisely by its superclass
IndexReader and its methods delete and close. The synchronization
is not done by FilterIndexReader's member in. Since I introduced
a SegmentInfos member in IndexReader for synchronization purposes
it is needed there also for the superclass of a FilterIndexReader.
I changed the constructor of FilterIndexReader to achieve that.


Christoph



Re: File timestamps / subclassing of IndexReader

Posted by Christoph Goller <go...@detego-software.de>.
Doug Cutting schrieb:
> Christoph Goller wrote:
> 
>> Sorry for the delay. I was busy with other things and it took some
>> time to fix the problems that Doug mentioned.
> 
> 
> No problem.  Thanks so much for working on this patch.  It looks really 
> good, and solves a real problem.  We can't demand that volunteers work 
> on a schedule!

My contribution is small compared to giving the whole thing to the public.
There are a couple of people and small companies doing
consulting and product development based on Lucene (including myself).
All this would not be possible without your work. Thank you Doug.

> My only concerns now are aesthetic.  You add segmentInfos as a protected 
> field of IndexReader, which is a public class.  This means it will 
> appear in the javadoc.  Yet this field is of type SegmentInfos, a 
> package-private class, which will not be in the javadoc.
> 
> I try hard to make the javadoc only show things which are meant to be a 
> part of the public API.  So the question is, should SegmentInfos be a 
> public class, i.e., do we think that folks might need to reference it in 
> their application code, or should it remain package private?
> 
> My rule is that, if you cannot imagine a reasonable use for something, 
> or if there's an already another public way of doing it, don't make it 
> public.  Keeping more of the implementation private makes it easier to 
> compatibly evolve the implementation.  If they really want, folks can 
> change things in their private copy of the sources.  What we want to 
> avoid is incompatible changes for folks who only use public APIs.  Over 
> the long term, a smaller public API is easier to maintain.
> 
> My preference is that SegmentInfos remain package private.  The easiest 
> way to do this would just be to make the segmentInfos field of 
> IndexReader package private (i.e. with no public/private declaration). 
> The only problem is that then IndexReader would not be correctly 
> subclassable from outside the package.
> 
> Alternately, segmentInfos could be made private, will all access from 
> IndexReader only.  Can you see a way to get things to work correctly 
> this way?

I have to think thoroughly about the whole problem. This will take some time.

> 
> Finally, you add a new constructor to SegmentsReader but the old 
> constructor is no longer used (except by the new constructor).  It would 
> be better to simply replace the existing constructor with a new 
> constructor.  Less code is better.

ok.

> 
> If you are tired of working on this, and/or have no more time, please 
> say so.  If you are unable, I will try to address the above issues myself.
> 

No, I would like to finish that.

 > As far as I can tell, this patch looks good.
 > I have a question, though.  You changed the constructor to
 > SegmentReader to take both SegmentInfos and a SegmentInfo.
 > Is there a need to pass both?
 > Also, in SegmentReader you do:
 >
 > -            directory().touchFile("segments");
 > +            if(segmentInfos != null)
 > +              segmentInfos.write(directory());
 >
 > Can segmentInfos ever be == null at this point?

In some of the test cases SegmentReader is used directly without
getting it via IndexReader.open. In those cases segmentInfos might be
null.

 >
 > I don't have the source code here to check what SegmentInfos'
 > write(...) method does, so I can't tell why it was added here.

It writes the segments file, which now (my patch) contains also the index
version number. The version number is incremented each time the index
is changes by either a IndexWriter or a IndexReader.



---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org


Re: File timestamps / subclassing of IndexReader

Posted by Doug Cutting <cu...@lucene.com>.
Christoph Goller wrote:
> Ps: Patch with segmentInfos field made package private is attached.

+1

Thanks for your persistence.

Doug


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org


Re: File timestamps / subclassing of IndexReader

Posted by Christoph Goller <go...@detego-software.de>.
Doug Cutting schrieb:
> My preference is that SegmentInfos remain package private.  The easiest 
> way to do this would just be to make the segmentInfos field of 
> IndexReader package private (i.e. with no public/private declaration). 
> The only problem is that then IndexReader would not be correctly 
> subclassable from outside the package.
> 
> Alternately, segmentInfos could be made private, will all access from 
> IndexReader only.  Can you see a way to get things to work correctly 
> this way?

Making segmentInfos field of IndexReader package private would be
easy to do. I think FilterIndexReader would still work if we enforce
that all subclasses of FilterIndexReader (even from outside the package)
call super(IndexReader) in their constructors. Note that I call
super.segmentInfos = in.segmentInfos in FilterIndexReader(IndexReader).
Thus all sysnchronization is done by FilterIndexReader superclass
IndexReader using in.segmentInfos.

I don't see a problem for subclassing IndexReader directly either,
even from outside the package. With the segmentInfos field made package
private such classes simply would not see segmentInfos. It would be null
and consequently would not be used by delete(int). Of course such
subclasses would have to take care of versioning themselves. I am not
sure what kind of subclasses of IndexReader you have in mind. With the
segmentInfos field made package private I can imagine very general
subclasses e.g. implemented by a totally different full-text index.
However, IndexReader and all its subclasses need a directory and
synchronization (write.lock) is done by IndexReader in the final
methods delete(int) and close(). However that restriction is an old
one.

This is a very complex subject. Maybe I am missing the point.

Christoph

Ps: Patch with segmentInfos field made package private is attached.

Re: File timestamps / subclassing of IndexReader

Posted by Doug Cutting <cu...@lucene.com>.
Christoph Goller wrote:
> Sorry for the delay. I was busy with other things and it took some
> time to fix the problems that Doug mentioned.

No problem.  Thanks so much for working on this patch.  It looks really 
good, and solves a real problem.  We can't demand that volunteers work 
on a schedule!

My only concerns now are aesthetic.  You add segmentInfos as a protected 
field of IndexReader, which is a public class.  This means it will 
appear in the javadoc.  Yet this field is of type SegmentInfos, a 
package-private class, which will not be in the javadoc.

I try hard to make the javadoc only show things which are meant to be a 
part of the public API.  So the question is, should SegmentInfos be a 
public class, i.e., do we think that folks might need to reference it in 
their application code, or should it remain package private?

My rule is that, if you cannot imagine a reasonable use for something, 
or if there's an already another public way of doing it, don't make it 
public.  Keeping more of the implementation private makes it easier to 
compatibly evolve the implementation.  If they really want, folks can 
change things in their private copy of the sources.  What we want to 
avoid is incompatible changes for folks who only use public APIs.  Over 
the long term, a smaller public API is easier to maintain.

My preference is that SegmentInfos remain package private.  The easiest 
way to do this would just be to make the segmentInfos field of 
IndexReader package private (i.e. with no public/private declaration). 
The only problem is that then IndexReader would not be correctly 
subclassable from outside the package.

Alternately, segmentInfos could be made private, will all access from 
IndexReader only.  Can you see a way to get things to work correctly 
this way?

Finally, you add a new constructor to SegmentsReader but the old 
constructor is no longer used (except by the new constructor).  It would 
be better to simply replace the existing constructor with a new 
constructor.  Less code is better.

If you are tired of working on this, and/or have no more time, please 
say so.  If you are unable, I will try to address the above issues myself.

Thanks again for all your contributions,

Doug


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org


Re: File timestamps / subclassing of IndexReader

Posted by Otis Gospodnetic <ot...@yahoo.com>.
Hello Christoph,

As far as I can tell, this patch looks good.
I have a question, though.  You changed the constructor to
SegmentReader to take both SegmentInfos and a SegmentInfo.
Is there a need to pass both?
Also, in SegmentReader you do:

-            directory().touchFile("segments");
+            if(segmentInfos != null)
+              segmentInfos.write(directory());

Can segmentInfos ever be == null at this point?

I don't have the source code here to check what SegmentInfos'
write(...) method does, so I can't tell why it was added here.

I also spotted a few instances of a typo in the Javadoc: "segmets"
instead of "segments".

Looks good, though.  Oh, I don't know enough about FilterIndexReader,
so I can't make a call about that segmentInfos, but I Doug's comment
about the 'protected' access modifier sounds valid to me.

Otis


--- Christoph Goller <go...@detego-software.de> wrote:
> Sorry for the delay. I was busy with other things and it took some
> time to fix the problems that Doug mentioned.
> 
> Furthermore, thanks for the links concerning dublicate checking.
> 
> I would like to commit my timestamp patch in the next few days.
> So my question goes to  the mailing list whether there are any
> objections.
> 
> Doug Cutting schrieb:
> > I like this approach, but I have two concerns:
> > 
> > 1. For back-compatibility, the version number should be written at
> the 
> > end of the "segments" file, not at its start.  This may make things
> a 
> > big slower, but we should not incompatibly change the file format: 
> > existing indexes should still open correctly.
> 
> done.
> I also made some tests. Reading and old index and changing it works
> now.
> 
> > 
> > 2. There should be public IndexReader.getVersion methods,
> correspodning 
> > to the existing IndexReader.getLastModified methods.  The
> lastModified 
> > methods should be deprecated, with a comment indicating that the 
> > getVersion methods should be used instead.
> > 
> 
> done
> 
> please have a look at the latest version of this patch in this mail
> (again to be applied to index directory).
> 
> Some thoughts about FilterIndexReader:
> I noticed that all synchronization for a FilterIndexReader is done
> by the FilterIndexReader itself, more precisely by its superclass
> IndexReader and its methods delete and close. The synchronization
> is not done by FilterIndexReader's member in. Since I introduced
> a SegmentInfos member in IndexReader for synchronization purposes
> it is needed there also for the superclass of a FilterIndexReader.
> I changed the constructor of FilterIndexReader to achieve that.
> 
> 
> Christoph
> 
> 
> > Index: FilterIndexReader.java
> ===================================================================
> RCS file:
>
/home/cvs/jakarta-lucene/src/java/org/apache/lucene/index/FilterIndexReader.java,v
> retrieving revision 1.3
> diff -u -r1.3 FilterIndexReader.java
> --- FilterIndexReader.java	6 Nov 2003 15:30:57 -0000	1.3
> +++ FilterIndexReader.java	19 Nov 2003 14:23:36 -0000
> @@ -114,6 +114,7 @@
>  
>    public FilterIndexReader(IndexReader in) {
>      super(in.directory());
> +    segmentInfos = in.segmentInfos;
>      this.in = in;
>    }
>  
> Index: IndexReader.java
> ===================================================================
> RCS file:
>
/home/cvs/jakarta-lucene/src/java/org/apache/lucene/index/IndexReader.java,v
> retrieving revision 1.22
> diff -u -r1.22 IndexReader.java
> --- IndexReader.java	21 Oct 2003 18:24:23 -0000	1.22
> +++ IndexReader.java	19 Nov 2003 14:23:36 -0000
> @@ -83,14 +83,14 @@
>  public abstract class IndexReader {
>    protected IndexReader(Directory directory) {
>      this.directory = directory;
> -    segmentInfosAge = Long.MAX_VALUE;
> +    stale = false;
> +    segmentInfos = null;
>    }
>  
>    private Directory directory;
>    private Lock writeLock;
> -
> -  //used to determine whether index has chaged since reader was
> opened
> -  private long segmentInfosAge;
> +  protected SegmentInfos segmentInfos;
> +  private boolean stale;
>    
>    /** Returns an IndexReader reading the index in an FSDirectory in
> the named
>    path. */
> @@ -111,21 +111,16 @@
>            directory.makeLock(IndexWriter.COMMIT_LOCK_NAME),
>            IndexWriter.COMMIT_LOCK_TIMEOUT) {
>            public Object doBody() throws IOException {
> -            IndexReader result = null;
> -            
>              SegmentInfos infos = new SegmentInfos();
>              infos.read(directory);
>              if (infos.size() == 1) {		  // index is optimized
> -                result = new SegmentReader(infos.info(0), true);
> +              return new SegmentReader(infos, infos.info(0), true);
>              } else {
>                  SegmentReader[] readers = new
> SegmentReader[infos.size()];
>                  for (int i = 0; i < infos.size(); i++)
> -                  readers[i] = new SegmentReader(infos.info(i),
> i==infos.size()-1);
> -                result =  new SegmentsReader(directory, readers);
> +                  readers[i] = new SegmentReader(infos,
> infos.info(i), i==infos.size()-1);
> +                return new SegmentsReader(infos, directory,
> readers);
>              }
> -        
> -            result.segmentInfosAge = lastModified(directory);
> -            return result;
>            }
>          }.run();
>      }
> @@ -134,20 +129,89 @@
>    /** Returns the directory this index resides in. */
>    public Directory directory() { return directory; }
>  
> -  /** Returns the time the index in the named directory was last
> modified. */
> +  /** 
> +   * Returns the time the index in the named directory was last
> modified. 
> +   * 
> +   * <p>Synchronization of IndexReader and IndexWriter instances is 
> +   * no longer done via time stamps of the segmets file since the
> time resolution 
> +   * depends on the hardware platform. Instead, a version number is
> maintained
> +   * within the segments file, which is incremented everytime when
> the index is
> +   * changed.</p>
> +   * 
> +   * @deprecated  Replaced by {@link #getCurrentVersion(String)}
> +   * */
>    public static long lastModified(String directory) throws
> IOException {
>      return lastModified(new File(directory));
>    }
>  
> -  /** Returns the time the index in the named directory was last
> modified. */
> +  /** 
> +   * Returns the time the index in the named directory was last
> modified. 
> +   * 
> +   * <p>Synchronization of IndexReader and IndexWriter instances is 
> +   * no longer done via time stamps of the segmets file since the
> time resolution 
> +   * depends on the hardware platform. Instead, a version number is
> maintained
> +   * within the segments file, which is incremented everytime when
> the index is
> +   * changed.</p>
> +   * 
> +   * @deprecated  Replaced by {@link #getCurrentVersion(File)}
> +   * */
>    public static long lastModified(File directory) throws IOException
> {
>      return FSDirectory.fileModified(directory, "segments");
>    }
>  
> -  /** Returns the time the index in this directory was last
> modified. */
> +  /** 
> +   * Returns the time the index in the named directory was last
> modified. 
> +   * 
> +   * <p>Synchronization of IndexReader and IndexWriter instances is 
> +   * no longer done via time stamps of the segmets file since the
> time resolution 
> +   * depends on the hardware platform. Instead, a version number is
> maintained
> +   * within the segments file, which is incremented everytime when
> the index is
> +   * changed.</p>
> +   * 
> +   * @deprecated  Replaced by {@link #getCurrentVersion(Directory)}
> +   * */
>    public static long lastModified(Directory directory) throws
> IOException {
>      return directory.fileModified("segments");
>    }
> +  
> +  /**
> +   * Reads version number from segments files. The version number
> counts the
> +   * number of changes of the index.
> +   * 
> +   * @param directory where the index resides.
> +   * @return version number.
> +   * @throws IOException if segments file cannot be read
> +   */
> +  public static long getCurrentVersion(String directory) throws
> IOException {
> +    return getCurrentVersion(new File(directory));
> +  }
> +  
> +  /**
> +   * Reads version number from segments files. The version number
> counts the
> +   * number of changes of the index.
> +   * 
> +   * @param directory where the index resides.
> +   * @return version number.
> +   * @throws IOException if segments file cannot be read
> +   */
> +  public static long getCurrentVersion(File directory) throws
> IOException {
> +    Directory dir = FSDirectory.getDirectory(directory, false);
> +    long version = getCurrentVersion(dir);
> +    dir.close();
> +    return version;
> +  }
> +  
> +  /**
> +   * Reads version number from segments files. The version number
> counts the
> +   * number of changes of the index.
> +   * 
> +   * @param directory where the index resides.
> +   * @return version number.
> +   * @throws IOException if segments file cannot be read.
> +   */
> +  public static long getCurrentVersion(Directory directory) throws
> IOException {
> +    return SegmentInfos.readCurrentVersion(directory);
> +  }
>  
>    /**
>     * Returns <code>true</code> if an index exists at the specified
> directory.
> @@ -274,6 +338,9 @@
>      this will be corrected eventually as the index is further
> modified.
>    */
>    public final synchronized void delete(int docNum) throws
> IOException {
> +    if(stale)
> +      throw new IOException("IndexReader out of date and no longer
> valid for deletion");
> +      
>      if (writeLock == null) {
>        Lock writeLock =
> directory.makeLock(IndexWriter.WRITE_LOCK_NAME);
>        if (!writeLock.obtain(IndexWriter.WRITE_LOCK_TIMEOUT)) //
> obtain write lock
> @@ -282,11 +349,11 @@
>  
>        // we have to check whether index has changed since this
> reader was opened.
>        // if so, this reader is no longer valid for deletion
> -      if(lastModified(directory) > segmentInfosAge){
> +      if(segmentInfos != null  &&
> SegmentInfos.readCurrentVersion(directory) >
> segmentInfos.getVersion()){
> +          stale = true;
>            this.writeLock.release();
>            this.writeLock = null;
> -          throw new IOException(
> -            "IndexReader out of date and no longer valid for
> deletion");
> +          throw new IOException("IndexReader out of date and no
> longer valid for deletion");
>        }
>      }
>      doDelete(docNum);
> Index: SegmentInfos.java
> ===================================================================
> RCS file:
>
/home/cvs/jakarta-lucene/src/java/org/apache/lucene/index/SegmentInfos.java,v
> retrieving revision 1.2
> diff -u -r1.2 SegmentInfos.java
> --- SegmentInfos.java	18 Nov 2003 11:35:57 -0000	1.2
> +++ SegmentInfos.java	19 Nov 2003 14:23:36 -0000
> @@ -61,22 +61,28 @@
>  import org.apache.lucene.store.OutputStream;
>  
>  final class SegmentInfos extends Vector {
> -  public int counter = 0;			  // used to name new segments
> +  public int counter = 0;    // used to name new segments
> +  private long version = 0; //counts how often the index has been
> changed by adding or deleting docs
>  
>    public final SegmentInfo info(int i) {
> -    return (SegmentInfo)elementAt(i);
> +    return (SegmentInfo) elementAt(i);
>    }
>  
>    public final void read(Directory directory) throws IOException {
>      InputStream input = directory.openFile("segments");
>      try {
> -      counter = input.readInt();		          // read counter
> +      counter = input.readInt(); // read counter
>        for (int i = input.readInt(); i > 0; i--) { // read
> segmentInfos
> -        SegmentInfo si = new SegmentInfo(input.readString(),
> input.readInt(),
> -          directory);
> +        SegmentInfo si =
> +          new SegmentInfo(input.readString(), input.readInt(),
> directory);
>          addElement(si);
>        }
> -    } finally {
> +      if (input.getFilePointer() >= input.length())
> +        version = 0; // old file format without version number
> +      else
> +        version = input.readLong(); // read version
> +    }
> +    finally {
>        input.close();
>      }
>    }
> @@ -84,18 +90,41 @@
>    public final void write(Directory directory) throws IOException {
>      OutputStream output = directory.createFile("segments.new");
>      try {
> -      output.writeInt(counter);			  // write counter
> -      output.writeInt(size());			  // write infos
> +      output.writeInt(counter); // write counter
> +      output.writeInt(size()); // write infos
>        for (int i = 0; i < size(); i++) {
>          SegmentInfo si = info(i);
>          output.writeString(si.name);
>          output.writeInt(si.docCount);
>        }
> -    } finally {
> +      output.writeLong(++version); // every write changes the index 
>        
> +    }
> +    finally {
>        output.close();
>      }
>  
>      // install new segment info
>      directory.renameFile("segments.new", "segments");
> +  }
> +
> +  /**
> +   * version number when this SegmentInfos was generated.
> +   */
> +  public long getVersion() {
> +    return version;
> +  }
> +
> +  /**
> +   * Current version number from segments file.
> +   */
> +  public static long readCurrentVersion(Directory directory)
> +    throws IOException {
> +
> +    // We cannot be sure whether the segments file is in the old
> format or the new one.
> +    // Therefore we have to read the whole file and cannot simple
> seek to the version entry.
> +
> +    SegmentInfos sis = new SegmentInfos();
> +    sis.read(directory);
> +    return sis.getVersion();
>    }
>  }
> Index: SegmentReader.java
> ===================================================================
> RCS file:
>
/home/cvs/jakarta-lucene/src/java/org/apache/lucene/index/SegmentReader.java,v
> retrieving revision 1.15
> diff -u -r1.15 SegmentReader.java
> --- SegmentReader.java	21 Oct 2003 18:24:23 -0000	1.15
> +++ SegmentReader.java	19 Nov 2003 14:23:37 -0000
> @@ -98,10 +98,11 @@
>    }
>    private Hashtable norms = new Hashtable();
>  
> -  SegmentReader(SegmentInfo si, boolean closeDir)
> +  SegmentReader(SegmentInfos sis, SegmentInfo si, boolean closeDir)
>      throws IOException {
>      this(si);
>      closeDirectory = closeDir;
> +    segmentInfos = sis;
>    }
>  
>    SegmentReader(SegmentInfo si)
> @@ -141,7 +142,10 @@
>            public Object doBody() throws IOException {
>              deletedDocs.write(directory(), segment + ".tmp");
>              directory().renameFile(segment + ".tmp", segment +
> ".del");
> -            directory().touchFile("segments");
> +            if(segmentInfos != null)
> +              segmentInfos.write(directory());
> +            else
> +              directory().touchFile("segments");
>              return null;
>            }
>          }.run();
> Index: SegmentsReader.java
> ===================================================================
> RCS file:
>
/home/cvs/jakarta-lucene/src/java/org/apache/lucene/index/SegmentsReader.java,v
> retrieving revision 1.15
> diff -u -r1.15 SegmentsReader.java
> --- SegmentsReader.java	31 Oct 2003 09:46:54 -0000	1.15
> +++ SegmentsReader.java	19 Nov 2003 14:23:37 -0000
> @@ -78,6 +78,11 @@
>    private int numDocs = -1;
>    private boolean hasDeletions = false;
>  
> +  SegmentsReader(SegmentInfos sis, Directory directory,
> SegmentReader[] r) throws IOException {
> +      this(directory, r);
> +      segmentInfos = sis;
> +  }
> +  
>    SegmentsReader(Directory directory, SegmentReader[] r) throws
> IOException {
>      super(directory);
>      readers = r;
> 
> >
---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-dev-help@jakarta.apache.org


__________________________________
Do you Yahoo!?
Protect your identity with Yahoo! Mail AddressGuard
http://antispam.yahoo.com/whatsnewfree

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org