Posted to dev@lucene.apache.org by karl wettin <ka...@gmail.com> on 2007/07/26 21:56:27 UTC
Last attempt
Some time ago I tried to introduce LUCENE-581, a new consumer top
layer containing the core changes required by LUCENE-550, my
InstantiatedIndex. I would still like to see this become part of the
core. It is completely backwards compatible but contains a few small
changes that seem to be controversial, and I'm honestly not sure why:
* Complete definalization of Term, Document and IndexReader.
* IndexWriterInterface
In my eyes, the only thing the final modifiers and the missing writer
interface do is limit Lucene development to the file-centric
Directory store. There is nothing wrong with Directory; I just want
to be able to use the same code for any store design of my choice. I
want uniform index handling, no matter the implementation: one line
of code that switches between Directory, BDB, MemoryIndex,
InstantiatedIndex or what not.
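To make "one line of code that switches between" stores concrete,
here is a minimal sketch under stated assumptions. None of these
types are Lucene API: `Index`, `IndexReaderView`, `ScanningIndex`,
`InvertedIndex` and `indexAndCount` are hypothetical names, and
`IndexWriterInterface` here is only a one-method stand-in for the
proposed interface, not the code in the patch. The toy stores count
whitespace-separated terms rather than keeping real postings. What it
shows is that client code written against the abstraction does not
change when the backing store does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class UnisonIndexSketch {

    /** Hypothetical write interface, loosely modelled on the proposed IndexWriterInterface. */
    interface IndexWriterInterface {
        void addDocument(String text);
    }

    /** Hypothetical read view; deliberately not Lucene's IndexReader. */
    interface IndexReaderView {
        int numDocs();
        int docFreq(String term);
    }

    /** Hypothetical factory: every store hands out a writer and a reader. */
    interface Index {
        IndexWriterInterface writer();
        IndexReaderView reader();
    }

    /** Toy store that keeps raw documents and scans them at query time. */
    static class ScanningIndex implements Index {
        private final List<String> docs = new ArrayList<>();

        public IndexWriterInterface writer() {
            return docs::add; // List.add's boolean result is discarded
        }

        public IndexReaderView reader() {
            return new IndexReaderView() {
                public int numDocs() { return docs.size(); }
                public int docFreq(String term) {
                    int n = 0;
                    for (String d : docs) {
                        if (Arrays.asList(d.split("\\s+")).contains(term)) n++;
                    }
                    return n;
                }
            };
        }
    }

    /** Toy store that inverts terms to document ids at write time. */
    static class InvertedIndex implements Index {
        private final Map<String, Set<Integer>> postings = new HashMap<>();
        private int docCount = 0;

        public IndexWriterInterface writer() {
            return text -> {
                int docId = docCount++;
                for (String term : text.split("\\s+")) {
                    postings.computeIfAbsent(term, k -> new HashSet<>()).add(docId);
                }
            };
        }

        public IndexReaderView reader() {
            return new IndexReaderView() {
                public int numDocs() { return docCount; }
                public int docFreq(String term) {
                    return postings.getOrDefault(term, Collections.emptySet()).size();
                }
            };
        }
    }

    /** Client code sees only the abstraction, never the concrete store. */
    static int indexAndCount(Index index, String term, String... docs) {
        IndexWriterInterface writer = index.writer();
        for (String d : docs) writer.addDocument(d);
        return index.reader().docFreq(term);
    }

    public static void main(String[] args) {
        // Switching the backing store is this one line.
        Index index = new InvertedIndex(); // or: new ScanningIndex()
        System.out.println(indexAndCount(index, "lucene", "apache lucene", "lucene index", "memory")); // prints 2
    }
}
```

Both toy stores return the same answer; only the line that constructs
`index` differs, which is the kind of switch the final classes and
the missing writer interface make hard with the real types.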
This post is about InstantiatedIndex and the things I built upon it.
As time passed I just gave up on keeping them up to date. It is in
use at one place where it just keeps running with no need to
update, stuck on Lucene 2.0 or so. We are now getting close to Lucene
3.0 and I would hate to see this code get lost in time.
It has so many neat features. Being really, really fast on small
corpora is just one.
In essence the design is similar to contrib/MemoryIndex, but it can
hold multiple documents.
The definalization and interface also allows for index insert/delete/
optimization notifications.
These two features combined yielded an active cache (not really
used in any project, just a proof of concept I experimented with on a
site where a lot of users issue the exact same query) that updates
cached results only when they are affected by new data. This could be
done with MemoryIndex too, but not as fast, since InstantiatedIndex
can handle batches of documents.
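The notification idea and the active cache built on it can be
sketched roughly as follows. This is a hypothetical illustration, not
the code from the LUCENE-550 patch: `Store`, `ListStore`,
`NotifiableIndex`, `IndexListener` and `ActiveCache` are names chosen
here, queries are reduced to a single term, and documents are plain
strings. The decorator forwards each write to the wrapped store and
then notifies listeners; the cache recomputes a cached result only
when the new document contains that query's term, so unrelated
inserts leave cached entries alone.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ActiveCacheSketch {

    /** Hypothetical minimal store: an ordered list of whitespace-tokenised documents. */
    interface Store {
        void addDocument(String text);
        List<String> documents();
    }

    static class ListStore implements Store {
        private final List<String> docs = new ArrayList<>();
        public void addDocument(String text) { docs.add(text); }
        public List<String> documents() { return docs; }
    }

    /** Hypothetical callback fired after a document has been inserted. */
    interface IndexListener {
        void documentAdded(String text);
    }

    /** Decoration layer: forwards writes to the wrapped store, then notifies listeners. */
    static class NotifiableIndex implements Store {
        private final Store delegate;
        private final List<IndexListener> listeners = new ArrayList<>();

        NotifiableIndex(Store delegate) { this.delegate = delegate; }

        void addListener(IndexListener listener) { listeners.add(listener); }

        public void addDocument(String text) {
            delegate.addDocument(text);
            for (IndexListener l : listeners) l.documentAdded(text);
        }

        public List<String> documents() { return delegate.documents(); }
    }

    /** Caches hit counts per single-term query; refreshes only entries a new document can affect. */
    static class ActiveCache implements IndexListener {
        private final Store store;
        private final Map<String, Integer> hits = new HashMap<>();
        int recomputations = 0; // how many times a result was computed from the store

        ActiveCache(Store store) { this.store = store; }

        int search(String term) {
            return hits.computeIfAbsent(term, this::count);
        }

        private int count(String term) {
            recomputations++;
            int n = 0;
            for (String d : store.documents()) {
                if (tokens(d).contains(term)) n++;
            }
            return n;
        }

        public void documentAdded(String text) {
            List<String> newTerms = tokens(text);
            // Entries whose term is absent from the new document keep their cached value.
            hits.replaceAll((term, cached) -> newTerms.contains(term) ? count(term) : cached);
        }

        private static List<String> tokens(String text) {
            return Arrays.asList(text.split("\\s+"));
        }
    }

    public static void main(String[] args) {
        NotifiableIndex index = new NotifiableIndex(new ListStore());
        ActiveCache cache = new ActiveCache(index);
        index.addListener(cache);

        index.addDocument("apache lucene");
        cache.search("lucene"); // computed from the store
        cache.search("memory"); // computed from the store

        index.addDocument("lucene index"); // refreshes "lucene", leaves "memory" untouched
        System.out.println(cache.search("lucene") + " " + cache.search("memory") + " " + cache.recomputations); // prints "2 0 3"
    }
}
```

The third recomputation is the refresh of "lucene"; "memory" is
served from the cache even after the insert. A real version would
also have to react to delete and optimize notifications.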
One can, however, do a lot of other things with it.
In LUCENE-626 I also use InstantiatedIndex, getting response times
from my contrib/spellcheck augmentation some 10-20 times faster than
when using a RAMDirectory.
There are more features and potentially cool things one might want to
consider in the 550-patch/UML diagram.
Would the core changes that InstantiatedIndex requires ever be
committed? If so, I could sit down and bring these patches up to date.
Otherwise I'll just let them become some deprecated artifact I use
for a couple of things such as spellchecking, rather than a neat
augmentation of Lucene I could use for any future development.
--
karl
---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org
Re: Last attempt
Posted by karl wettin <ka...@gmail.com>.
On 27 Jul 2007, at 02:18, Grant Ingersoll wrote:
> or maybe there is a way to separate out the interface changes from
> the InstantiatedIndex stuff?
One thing that came to mind is to use the IndexReader/IndexWriter
"pipe" available in InstantiatedIndex, i.e. create a Directory, add
documents via IndexWriter, and then pass a new IndexReader to
InstantiatedIndex for merging. Even though this will slow down adding
documents to an index, I think it is an acceptable solution.
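The pipe can be illustrated with toy stand-ins. This is a sketch
under stated assumptions, not Lucene code: `DocumentReader`,
`DirectoryBackedStore` and `InstantiatedLikeIndex` are hypothetical
names for the Directory-side reader, the Directory-backed index and
InstantiatedIndex, and using a fresh store per batch is an assumption
made here so that no document is merged twice. Writes go through the
store's own writer; the in-memory index is populated only by walking
a newly opened reader, and that extra copy pass is where the slowdown
on insert comes from.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class ReaderPipeSketch {

    /** Hypothetical read-only view, standing in for an IndexReader opened on a Directory. */
    interface DocumentReader {
        int maxDoc();
        String document(int docId);
    }

    /** Stand-in for a Directory-backed index that is filled through its own writer. */
    static class DirectoryBackedStore {
        private final List<String> docs = new ArrayList<>();

        void addDocument(String text) { docs.add(text); }

        /** Returns a point-in-time view, like a freshly opened reader. */
        DocumentReader openReader() {
            List<String> snapshot = new ArrayList<>(docs);
            return new DocumentReader() {
                public int maxDoc() { return snapshot.size(); }
                public String document(int docId) { return snapshot.get(docId); }
            };
        }
    }

    /** Stand-in for InstantiatedIndex: in-memory postings, filled only by merging readers. */
    static class InstantiatedLikeIndex {
        private final Map<String, Set<Integer>> postings = new HashMap<>();
        private int docCount = 0;

        /** Copies every document the reader exposes; this extra pass is the added cost of the pipe. */
        void merge(DocumentReader reader) {
            for (int i = 0; i < reader.maxDoc(); i++) {
                int docId = docCount++;
                for (String term : reader.document(i).split("\\s+")) {
                    postings.computeIfAbsent(term, k -> new HashSet<>()).add(docId);
                }
            }
        }

        int docFreq(String term) {
            Set<Integer> docs = postings.get(term);
            return docs == null ? 0 : docs.size();
        }

        int numDocs() { return docCount; }
    }

    public static void main(String[] args) {
        InstantiatedLikeIndex mem = new InstantiatedLikeIndex();

        // Batch 1: write through the "Directory" side, then merge a new reader.
        DirectoryBackedStore batch1 = new DirectoryBackedStore();
        batch1.addDocument("apache lucene");
        batch1.addDocument("lucene index");
        mem.merge(batch1.openReader());

        // Batch 2: a fresh store per batch, so nothing is merged twice.
        DirectoryBackedStore batch2 = new DirectoryBackedStore();
        batch2.addDocument("memory index");
        mem.merge(batch2.openReader());

        System.out.println(mem.numDocs() + " " + mem.docFreq("lucene") + " " + mem.docFreq("index")); // prints "3 2 2"
    }
}
```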
That would only remove the IndexWriterInterface though. It would
still require the definalization of Document, Term and IndexReader.
And by removing IndexWriterInterface one somewhat cripples the
NotifiableIndex.
All in all, I don't think we would gain anything.
--
karl
Re: Last attempt
Posted by karl wettin <ka...@gmail.com>.
On 27 Jul 2007, at 02:18, Grant Ingersoll wrote:
> I think one thing I wonder about is if there is a way it could be a
> standalone contrib package or maybe there is a way to separate out
> the interface changes from the InstantiatedIndex stuff? That way
> you could lobby for InstIndex as a contrib, and then a separate
> patch for the API changes.
That is sort of what I tried with 581. I want to point out that one
should no longer be looking at that patch. Since it was resolved as
Won't Fix and merged back into 550, I have replaced the generalization
with aggregation (i.e. Directory does not extend Index; instead there
is a new class, DirectoryIndex, that points at an instance of
Directory) and made some other changes in order to minimize the
impact on core. This is visible in the UML diagram.
I'd be more than happy to spend the week required to bring 550 up to
date with trunk, clean it up and split it into multiple patches, but
only if I knew that the core changes would be accepted.
Something like this:
* Core changes (complete definalization of Term, Document and
IndexReader + IndexWriterInterface)
* Index (factory class for readers and writers, for uniform index handling)
* InstantiatedIndex (extends Index)
* NotifiableIndex (decoration layer on top of index)
* Active results cache and the other stuff that is just ideas on top
of the two prior items.
> By the looks of the issue, you had a lot of comments and good
> input; do you feel all the issues have been addressed? Just
> asking...
I do. At least I did the last time I was looking at the code, 6
months ago or so.
>
> Also, do Mike M's changes affect how you would do these things?
Not sure what you are referring to. 550 is, however, fairly isolated
from the rest of Lucene.
--
karl
Re: Last attempt
Posted by Grant Ingersoll <gs...@apache.org>.
Hi Karl,
I have seen this and have always thought I should spend some time on
it, but then didn't get to it. That isn't to say it isn't useful. One
thing I wonder about is whether it could be a standalone contrib
package, or whether there is a way to separate out the interface
changes from the InstantiatedIndex stuff. That way you could lobby for
InstIndex as a contrib, and then submit a separate patch for the API
changes. And please feel free to tell me they can't be separated; I am
just wondering out loud here, trying to find a path to take so it
isn't lost.
I think there are some reasons Document is final, although I am not
sure they can't be handled as a buyer-beware issue. If you search the
archives for Document and final, I think you will see the arguments.
There is also an issue in JIRA related to it
(https://issues.apache.org/jira/browse/LUCENE-778), so you are not the
only one asking for it (I see you commented on that one).
By the looks of the issue, you had a lot of comments and good input;
do you feel all the issues have been addressed? Just asking...
Also, do Mike M's changes affect how you would do these things?
Mostly just me trying to figure out this patch. I, too, would hate
to see it wither, but I can't make any promises on time, either. By
the way, the Flexible Indexing stuff from Nicolas et al. is in the
same boat in my mind. Would love to have 'em in Lucene, but don't
have the cycles to do it. Sigh.
-Grant
--------------------------
Grant Ingersoll
Center for Natural Language Processing
http://www.cnlp.org/tech/lucene.asp
Read the Lucene Java FAQ at http://wiki.apache.org/lucene-java/LuceneFAQ