Posted to dev@lucene.apache.org by Marvin Humphrey <ma...@rectangular.com> on 2009/01/13 22:51:55 UTC
Re: Pluggable IndexReader (was 2.9/3.0 plan & Java 1.5)

On Mon, Dec 15, 2008 at 07:04:08AM -0500, Michael McCandless wrote:

> These are good points: it may be exposing too much if we fully expose
> SegmentReader now, since some components (deletion tombstones) may
> want to skip that API and operate directly on lower level files.

After thinking things over, I no longer worry about this seeming
contradiction.  Even if the tombstone deletions reader, the stored fields
reader, or some other component reads files which were not written in one
batch along with the original collection of segment files, those files still
relate to the same *logical* segment.

We wouldn't ever limit the set of files which a SegmentReader is allowed to
read from to the original segment files.  Defining the collection of valid
files for a given point-in-time view of the index is the role of the Snapshot
in KS and the segments_NNN file in Lucene.  It's up to the SegmentReader to
determine which files within the snapshot it should read from.
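
To make that division of labor concrete, here is a rough sketch.  Everything
in it is an assumption made up for illustration: the class names, the
"segment name plus dot" file naming convention, and the idea that a later
tombstone file simply shows up in a newer snapshot.  It is not how KS or
Lucene actually lay out their files.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical stand-in for a KS Snapshot or a Lucene segments_NNN file.
// Its only job: record which files are valid for one point-in-time view.
class Snapshot {
    private final SortedSet<String> files;

    Snapshot(Collection<String> files) {
        this.files = new TreeSet<>(files);
    }

    SortedSet<String> files() {
        return Collections.unmodifiableSortedSet(files);
    }
}

// Hypothetical reader for one logical segment.  It decides for itself which
// of the snapshot's files it cares about, whether or not they were written
// in the same batch as the original segment files.
class SegmentReaderSketch {
    private final List<String> files = new ArrayList<>();

    SegmentReaderSketch(String segName, Snapshot snapshot) {
        // Assumed convention: a segment's files are named "<segName>.<rest>".
        String prefix = segName + ".";
        for (String file : snapshot.files()) {
            if (file.startsWith(prefix)) {
                files.add(file);
            }
        }
    }

    List<String> files() {
        return Collections.unmodifiableList(files);
    }
}
```

Given a snapshot listing "seg_1.postings", "seg_1.tombstones-3", and
"seg_2.postings", a reader for "seg_1" would pick up the first two, even
though the tombstones file was written long after the postings file.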

> >So, how about an IndexArchitecture or IndexPlan class?
> >
> > class MyArchitecture extends IndexArchitecture {
> >   public PostingsWriter PostingsWriter() {
> >     return new PForDeltaPostingsWriter();
> >   }
> >   public PostingsReader PostingsReader() {
> >     return new PForDeltaPostingsReader();
> >   }
> >   public DeletionsWriter DeletionsWriter() {
> >     return new TombstoneWriter();
> >   }
> >   public DeletionsReader DeletionsReader() {
> >     return new TombstoneReader();
> >   }
> > }

> > class MySchema extends Schema {
> >   public MySchema() {
> >     initField("title", "text");
> >     initField("content", "text");
> >   }
> >   public IndexArchitecture indexArchitecture() {
> >     return new MyArchitecture();
> >   }
> >   public Analyzer analyzer() {
> >     return new PolyAnalyzer("en");
> >   }
> > }
> >
> > IndexWriter writer = new IndexWriter(MySchema.open("/path/to/index"));
> 
> I think this is a reasonable approach.  I might name it IndexCodec(s)
> though, and I agree conceptually it's orthogonal to a "schema".

FWIW, I've gone forward with "Architecture".

>>> Decouple rollback, commit, IndexDeletionPolicy from DirectoryIndexReader
>>> into a class like SegmentsVersionSystem which could act as the controller
>>> for reopen types of methods.  There could be a SegmentVersionSystem that
>>> manages the versioning of a single segment.
>>
>> I like it. :)
>>
>> Sometimes you want to change up the merge policy for different writers
>> against the same index.  How does that fit into your plan?
>>
>> My thought is that merge-policies would be application-specific
>> rather than index-specific.
> 
> This one I'm a little hazy on.  It would be nice to have a single
> source for IndexWriter & IndexReader-acting-as-writer to share this
> logic, but then we are [very, very slowly] migrating towards
> IndexWriter being the only thing that writes to an index so it seems
> like eventually it's OK if this logic is managed via the IndexWriter.

I'm thinking of calling this one "UpdatePolicy".  It would collect together
MergePolicy, DeletionsPolicy, LockFactory, etc -- all the app-specific
behaviors related to interacting with existing data and files.

A Schema.makeUpdatePolicy() factory method can serve as the single, shared
source for this logic.  However, the IndexWriter and IndexReader constructors
would allow the default UpdatePolicy to be overridden with an argument.
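
Roughly, the shape might look like the sketch below.  All of it is
hypothetical: the string fields stand in for real MergePolicy and LockFactory
objects, the "Sketch" classes stand in for IndexWriter and IndexReader, and
none of the signatures mirror existing Lucene or KS code.

```java
// Sketch only: every class here is a hypothetical stand-in.

// Collects the app-specific behaviors for one index session.
class UpdatePolicy {
    private final String mergePolicy;  // stands in for a MergePolicy object
    private final String lockFactory;  // stands in for a LockFactory object

    UpdatePolicy(String mergePolicy, String lockFactory) {
        this.mergePolicy = mergePolicy;
        this.lockFactory = lockFactory;
    }

    String mergePolicy() { return mergePolicy; }
    String lockFactory() { return lockFactory; }
}

class Schema {
    // Single, shared source of the default policy for readers and writers.
    public UpdatePolicy makeUpdatePolicy() {
        return new UpdatePolicy("default-merge", "default-lock");
    }
}

class IndexWriterSketch {
    private final UpdatePolicy updatePolicy;

    // Common case: take whatever the Schema supplies.
    IndexWriterSketch(Schema schema) {
        this(schema, schema.makeUpdatePolicy());
    }

    // Power-user case: override the default for this session only.
    IndexWriterSketch(Schema schema, UpdatePolicy override) {
        this.updatePolicy = override;
    }

    UpdatePolicy updatePolicy() { return updatePolicy; }
}

class IndexReaderSketch {
    private final UpdatePolicy updatePolicy;

    IndexReaderSketch(Schema schema) {
        this(schema, schema.makeUpdatePolicy());
    }

    IndexReaderSketch(Schema schema, UpdatePolicy override) {
        this.updatePolicy = override;
    }

    UpdatePolicy updatePolicy() { return updatePolicy; }
}
```

Two writers opened against the same index can then differ only in the
UpdatePolicy argument, which is what keeps merge policy application-specific
rather than index-specific.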

We end up with the following hierarchy:

  * Architecture: Stuff that never changes for the life of the index.
    Defining an Architecture subclass is roughly analogous to choosing a
    storage engine in MySQL (MyISAM vs. InnoDB, etc).
  * Schema: Roughly analogous to an SQL table definition.
  * UpdatePolicy: Stuff that can change from one index session to the next.

Of those three classes, the only one that most users would encounter would be
Schema.  Architecture and UpdatePolicy would isolate power-user functionality,
making it easier to grok and master basic indexing technique.

Marvin Humphrey

