Posted to dev@lucene.apache.org by "Nicolas Lalevée (JIRA)" <ji...@apache.org> on 2006/09/24 14:32:24 UTC

[jira] Updated: (LUCENE-662) Extendable writer and reader of field data

     [ http://issues.apache.org/jira/browse/LUCENE-662?page=all ]

Nicolas Lalevée updated LUCENE-662:
-----------------------------------

    Attachment: generic-fieldIO-2.patch

I think I got it. What was disturbing in the last patch was the notion of FieldData I added, so I removed it. Let's summarize the differences between the trunk and my patch :

* The concepts :
** an IndexFormat defines which FieldsWriter and FieldsReader to use
** an IndexFormat defines the file extensions in use, so the user can add their own files
** the format of an index is attached to the Directory
** the whole index format isn't customizable, only part of it. Some functions are private or "default" (package-private), so the Lucene user won't have access to them : they are Lucene internals. Others are public or protected : they can be overridden.
** Lucene now provides an API to add files that are tables of data, as FieldInfos is
** it is up to the FieldsWriter implementation to check that the field to write is of the expected format (basically with an instanceof check).
** the user can add information at the document level, and provide their own implementation of Document
** the user can define how the data of a field is stored and retrieved, and provide their own implementation of Fieldable
** the reading of field data is done in the Fieldable
** the writing of the field is done in the FieldsWriter
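
To make the concepts above concrete, here is a minimal sketch of how a format could act as a factory and how a writer could check the field type. Every class shape and method name in it (getFieldsWriter, getIndexExtensions, UriField, the "uri" extension) is an assumption made for illustration and is not taken from the patch; the reader side is left out for brevity.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: these signatures are illustrative, not the patch's API.
class IndexFormatSketch {

    /** Stand-in for the abstract Fieldable class. */
    static abstract class Fieldable {
        final String name;
        Fieldable(String name) { this.name = name; }
    }

    /** A user-defined field type carrying extra data (here, a URI). */
    static class UriField extends Fieldable {
        final String uri;
        UriField(String name, String uri) { super(name); this.uri = uri; }
    }

    /** A field type the custom writer does not understand. */
    static class PlainField extends Fieldable {
        PlainField(String name) { super(name); }
    }

    static abstract class FieldsWriter {
        abstract void writeField(Fieldable field, DataOutputStream out) throws IOException;
    }

    /** The writer itself checks that the field is of the format it can store. */
    static class UriFieldsWriter extends FieldsWriter {
        @Override
        void writeField(Fieldable field, DataOutputStream out) throws IOException {
            if (!(field instanceof UriField)) {
                throw new IllegalArgumentException("unsupported field type: " + field.getClass().getName());
            }
            out.writeUTF(field.name);
            out.writeUTF(((UriField) field).uri);
        }
    }

    /** The format is a factory and declares the file extensions it uses. */
    static abstract class IndexFormat {
        abstract FieldsWriter getFieldsWriter();
        abstract List<String> getIndexExtensions();
    }

    static class UriIndexFormat extends IndexFormat {
        @Override FieldsWriter getFieldsWriter() { return new UriFieldsWriter(); }
        // "uri" is a made-up extension standing for a user-added file.
        @Override List<String> getIndexExtensions() { return Arrays.asList("fdt", "fdx", "uri"); }
    }

    /** Returns true if the format's writer accepted the field. */
    static boolean accepts(IndexFormat format, Fieldable field) {
        try {
            format.getFieldsWriter().writeField(field, new DataOutputStream(new ByteArrayOutputStream()));
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        IndexFormat format = new UriIndexFormat();
        System.out.println(accepts(format, new UriField("subject", "http://example.org/a")));
        System.out.println(accepts(format, new PlainField("body")));
    }
}
```

In this sketch the writer rejects a field it does not recognise by throwing; whether the real implementation throws, skips the field, or falls back to a default writer is a design choice this example does not settle.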

* API change :
** There are new constructors for the directory : constructors that take an IndexFormat
** new Entry and EntryTable : a generic API for managing a table of data in a file
** FieldInfos now extends EntryTable
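
As a rough illustration of what a generic "table of data in a file" could look like, the sketch below keeps entries addressable by name and by number and serializes them to a stream. The fields, method names and on-disk layout are assumptions for the example only; the FieldInfos class here is a stand-in, not Lucene's.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: class shapes and method names are assumptions, not the patch's API.
class EntryTableSketch {

    /** One row of the table, identified by name and by position. */
    static class Entry {
        final String name;
        final int number;
        Entry(String name, int number) { this.name = name; this.number = number; }
    }

    /** A named, numbered table that can be saved to and loaded from a stream. */
    static class EntryTable {
        private final List<Entry> byNumber = new ArrayList<>();
        private final Map<String, Entry> byName = new HashMap<>();

        /** Adds the name if it is new; returns the existing entry otherwise. */
        Entry add(String name) {
            Entry entry = byName.get(name);
            if (entry == null) {
                entry = new Entry(name, byNumber.size());
                byNumber.add(entry);
                byName.put(name, entry);
            }
            return entry;
        }

        int size() { return byNumber.size(); }

        Entry get(int number) { return byNumber.get(number); }

        /** Layout used here: an int count followed by each name in order. */
        void write(DataOutputStream out) throws IOException {
            out.writeInt(byNumber.size());
            for (Entry entry : byNumber) {
                out.writeUTF(entry.name);
            }
        }

        void read(DataInputStream in) throws IOException {
            int count = in.readInt();
            for (int i = 0; i < count; i++) {
                add(in.readUTF());
            }
        }
    }

    /** A FieldInfos-like table is then just a specialised EntryTable. */
    static class FieldInfos extends EntryTable { }

    /** Writes the table to memory and reads it back into a fresh instance. */
    static FieldInfos roundTrip(FieldInfos table) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            table.write(new DataOutputStream(bytes));
            FieldInfos copy = new FieldInfos();
            copy.read(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```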

* Code changes :
** AbstractField becomes Fieldable (Fieldable is no longer an interface).
** FieldsWriter has been split into the abstract class FieldsWriter and its default implementation DefaultFieldsWriter. Likewise for FieldsReader and DefaultFieldsReader.
** lazy loading has been moved from FieldsReader to Fieldable
** IndexOutput can now write directly from an input stream
** If a field was loaded lazily, the DefaultFieldsWriter copies the source input stream directly to the output stream
** the IndexFileNameFilter now takes its list of known file extensions from the index format
** each time a temporary RAM directory is created, the index format has to be passed : see the diff for CompoundFileReader or IndexWriter
** Some private and/or final members have been made public
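
The direct stream copy mentioned in the list can be pictured with the following sketch. It uses plain java.io streams as stand-ins for Lucene's IndexInput and IndexOutput, and the method name copyBytes is an assumption for illustration, not the name used in the patch.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;

// Hypothetical sketch: java.io streams stand in for IndexInput/IndexOutput.
class StreamCopySketch {

    /** Copies exactly 'length' bytes from in to out through a fixed-size buffer. */
    static void copyBytes(InputStream in, OutputStream out, long length) throws IOException {
        byte[] buffer = new byte[1024];
        long remaining = length;
        while (remaining > 0) {
            int chunk = (int) Math.min(buffer.length, remaining);
            int read = in.read(buffer, 0, chunk);
            if (read < 0) {
                throw new EOFException("stream ended with " + remaining + " bytes left to copy");
            }
            out.write(buffer, 0, read);
            remaining -= read;
        }
    }

    /** Convenience wrapper for in-memory data: copies the first 'length' bytes of source. */
    static byte[] copyPrefix(byte[] source, int length) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            copyBytes(new ByteArrayInputStream(source), out, length);
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The point of copying this way is that a lazily loaded field never has to be decoded into a String or byte array just to be written back out during a merge.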

* Last worries :
** quite a big one in fact, and I don't know how to handle it : every RMI test fails because of :
{noformat}
error unmarshalling return; nested exception is:
    [junit]     java.io.InvalidClassException: org.apache.lucene.document.Field; no valid constructor
    [junit] java.rmi.UnmarshalException: error unmarshalling return; nested exception is:
    [junit]     java.io.InvalidClassException: org.apache.lucene.document.Field; no valid constructor
    [junit]     at sun.rmi.server.UnicastRef.invoke(UnicastRef.java:157)
{noformat}
** one function is public and shouldn't be : see Fieldable.setLazyData()
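
For reference, Java serialization raises "no valid constructor" when the closest non-serializable superclass of a Serializable class has no accessible no-arg constructor. Whether that is what happens to Field in the patch is an assumption (it would fit if the new abstract Fieldable class is not Serializable and only has constructors with arguments). The sketch below reproduces the error with stand-in classes; none of the class names come from Lucene.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidClassException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical sketch: stand-in classes, not Lucene's Field/Fieldable.
class NoValidConstructorSketch {

    /** Non-serializable base class without a no-arg constructor. */
    static abstract class BaseWithoutNoArg {
        protected String name;
        protected BaseWithoutNoArg(String name) { this.name = name; }
    }

    /** Serializable subclass: writing works, reading fails. */
    static class BrokenField extends BaseWithoutNoArg implements Serializable {
        BrokenField(String name) { super(name); }
    }

    /** One possible fix: make the base class itself Serializable. */
    static abstract class SerializableBase implements Serializable {
        protected String name;
        protected SerializableBase(String name) { this.name = name; }
    }

    static class WorkingField extends SerializableBase {
        WorkingField(String name) { super(name); }
    }

    /** Serializes and deserializes the object; reports whether reading succeeded. */
    static String roundTrip(Object value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                in.readObject();
            }
            return "ok";
        } catch (InvalidClassException e) {
            return "InvalidClassException";
        } catch (IOException | ClassNotFoundException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(new BrokenField("f")));   // reading fails
        System.out.println(roundTrip(new WorkingField("f")));  // reading succeeds
    }
}
```

If this is indeed the cause, the two usual remedies are to make the abstract base class implement Serializable, or to give it an accessible no-arg constructor; in the second case the base class fields are not restored by default serialization.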

I have added an example implementation in the patch that uses this feature : look at org.apache.lucene.index.rdf

I know this is a big patch, but I think the API has not been broken, and I would appreciate comments on it.

> Extendable writer and reader of field data
> ------------------------------------------
>
>                 Key: LUCENE-662
>                 URL: http://issues.apache.org/jira/browse/LUCENE-662
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Store
>            Reporter: Nicolas Lalevée
>            Priority: Minor
>         Attachments: generic-fieldIO-2.patch, generic-fieldIO.patch
>
>
> As discussed on the dev mailing list, I have modified Lucene to allow defining how the data of a field is written and read in the index.
> Basically, I have introduced the notion of IndexFormat. It is in fact a factory of FieldsWriter and FieldsReader. So the IndexReader, the IndexWriter and the SegmentMerger use this factory instead of doing a "new FieldsReader/Writer()".
> I have also introduced the notion of FieldData. It handles all the data of a field, as well as writing it to and reading it from a stream. I did it this way because in the current design of Lucene, Fieldable is an interface, so methods with protected or package visibility cannot be defined.
> A FieldsWriter just writes data into a stream via the FieldData of the field.
> A FieldsReader instantiates a FieldData depending on the field name. Then it uses the field data to read the stream. Finally, it instantiates a Field with the field data.
> About compatibility, I think it is kept, as I have written a DefaultIndexFormat that provides a DefaultFieldsWriter and a DefaultFieldsReader. These implementations do exactly the job that is done today.
> To achieve this modification, some classes and methods had to be changed from private and/or final to public or protected.
> About the lazy fields, I have implemented them in a more general way in the abstract class FieldData, so it will be totally transparent for the Lucene user who extends FieldData. The stream is kept in the FieldData and used as soon as the stringValue (or something else) is called. Implementing it this way allowed me to handle the recently introduced LOAD_FOR_MERGE; it is just a lazy field data, and when read() is called on this lazy field data, the saved input stream is directly copied to the output stream.
> I have one last issue with this patch. The current design allows reading an index in an old format and just doing a writer.addIndexes() into a new format. With the new design, you cannot, because the writer will use the FieldData.write provided by the reader.
> enjoy !
