You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@lucene.apache.org by Earwin Burrfoot <ea...@gmail.com> on 2009/04/13 01:34:38 UTC

IndexReader plugins

To support my dream of kicking fieldCache out of the core and to add
some extensibility to Lucene, I want to introduce IndexReaderPlugins.
Rough pseudocode follows:

interface IndexReaderPlugin {
	void attach(SegmentReader reader);
	void detach(SegmentReader reader);

	void attach(MultiSegmentReader reader);
	void detach(MultiSegmentReader reader);
}

IndexReader.java:
private Map<Class, IndexReaderPlugin> plugins;

on opening/closing toplevel/segment reader we iterate over plugins:
for(IndexReaderPlugin plugin : plugins)
    plugin.attach(reader);

the map is passed to toplevel reader initially, and then shared with
lowlevel readers, we can also retrieve plugins:
public <T> T plugin(Class<T> pluginType);

then we can do something like:
indexReader.plugin(ValueSource.class).doSomething // lucene code
indexReader.plugin(FieldsCache.class).forField(LAST_UPDATE_TIME).doSomething
// my code
filter.apply(indexReader.plugin(FilterCache.class)) // my code

Benefits are numerous. We get rid of alien code like:
+++ src/java/org/apache/lucene/index/SegmentReader.java	(working copy)
@@ -83,6 +86,8 @@
+  protected ValueSource valueSource;
+
@@ -555,6 +560,8 @@
+
+      valueSource = new CachingValueSource(this, new
UninversionValueSource(this));

If I don't need ValueSource attached to my readers, I won't have it.
If I need my custom caches attached to my readers, I can do it in a
natural way instead of hacking around MergeScheduler, or comparing
subreader lists.
If I want, I can replace Lucene's native ValueSource with my own
implementation, and all Lucene classes that use it will happily accept
it.

On second thought, we shouldn't share plugin map across subreaders. If
we allow attach(SegmentReader reader) to return an instance of plugin
(plugin decides if it is the same instance always, or per-reader), and
populate the map for subreader with results of attach invoked on
toplevel reader map, we'll turn this code:
segmentReader.plugin(SomeClass.class).segmentReaderDependentMethod(segmentReader);
into:
segmentReader.plugin(SomeClass.class).segmentReaderDependentMethod();
which makes more sense

Any way the general idea is still the same.

-- 
Kirill Zakharenko/Кирилл Захаренко (earwin@gmail.com)
Home / Mobile: +7 (495) 683-567-4 / +7 (903) 5-888-423
ICQ: 104465785

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

Re: IndexReader plugins

Posted by Mark Miller <ma...@gmail.com>.

Earwin Burrfoot wrote:
> Michael McCandless wrote:
>   
>> I gave the example to show the init vs inflight distinction, because
>> inflight makes me nervous.
>>     
>
> I'm thinking of some (bad name follows) PluginBundle, that has
> add/remove/inspect methods and constructor/method for filling it with
> default Lucene components.
> Then instead of component map you pass this bundle to IndexReader ctor.
>
> You gain typesafe method signatures I have a soft spot for, and don't
> have to allow inflight reconfiguration for it. The only remaining
> problem of 'init' way - static factory explosion, well, the more - the
> merrier :)
>
>   
+1 Earwin. That sounds great. Static factory explosion is an ugly 
annoyance, but overall a better trade I think - inflight is complicated 
or brittle.

-- 
- Mark

http://www.lucidimagination.com




---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

Re: IndexReader plugins

Posted by Earwin Burrfoot <ea...@gmail.com>.

Michael McCandless wrote:
> I gave the example to show the init vs inflight distinction, because
> inflight makes me nervous.

I'm thinking of some (bad name follows) PluginBundle, that has
add/remove/inspect methods and constructor/method for filling it with
default Lucene components.
Then instead of component map you pass this bundle to IndexReader ctor.

You gain typesafe method signatures I have a soft spot for, and don't
have to allow inflight reconfiguration for it. The only remaining
problem of 'init' way - static factory explosion, well, the more - the
merrier :)

-- 
Kirill Zakharenko/Кирилл Захаренко (earwin@gmail.com)
Home / Mobile: +7 (495) 683-567-4 / +7 (903) 5-888-423
ICQ: 104465785

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

Re: IndexReader plugins

Posted by Michael McCandless <lu...@mikemccandless.com>.

On Tue, Apr 14, 2009 at 2:22 PM, Earwin Burrfoot <ea...@gmail.com> wrote:

> What destroys the whole reason is the factory Michael offered - it
> hardcodes component types that you can possibly bind to the reader.

I had been focused entirely on componentizing Lucene, internally.  I
agree that approach is a non-starter if we want to include arbitrary
(non-core) outside components (which would be great).

I gave the example to show the init vs inflight distinction, because
inflight makes me nervous.

Mike

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

Re: IndexReader plugins

Posted by Mark Miller <ma...@gmail.com>.

>> LuceneGuice :)  (Thats completely a throwaway, so don't draw any conclusions
>> from it :))
>>     
> I'm absolutely not going to build some uberframework. Something
> simple. A hashmap. And a pair of helper methods. Maybe one class. And
> a bunch of interfaces.
>
>   
You concluded anyway! I forgive you though.

-- 
- Mark

http://www.lucidimagination.com




---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

Re: IndexReader plugins

Posted by Mark Miller <ma...@gmail.com>.

Earwin Burrfoot wrote:
>
> A hashmap. And a pair of helper methods. Maybe one class. And
> a bunch of interfaces.
>
>   
Cool - def don't be afraid to start small. If a prototype hits 
reasonably soon, I would love to work it into 831. I won't get back to 
it for a while, but when I do (unless someone else jumps on it), I think 
I've got to get a ValueSource back into the segment reader, and it would 
be nicer to jump on this than ram into the static factories - that will 
just create more of them (deprecated or not) when/if this lands.

-- 
- Mark

http://www.lucidimagination.com




---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

Re: IndexReader plugins

Posted by Earwin Burrfoot <ea...@gmail.com>.

On Wed, Apr 15, 2009 at 00:55, Mark Miller <ma...@gmail.com> wrote:
> Earwin Burrfoot wrote:
>>
>> On Wed, Apr 15, 2009 at 00:15, Mark Miller <ma...@gmail.com> wrote:
>>
>>>
>>> Mark Miller wrote:
>>>
>>>>
>>>> Earwin Burrfoot wrote:
>>>>
>>>>>
>>>>> Mark Miller wrote:
>>>>>
>>>>>
>>>>>>
>>>>>> The distinction I am making with core is that we will have to call
>>>>>> known
>>>>>> methods on those
>>>>>> core 'modules' that are not very generic? Doesn't that keep it from
>>>>>> playing
>>>>>> nice with the very generic 'attach this to this segment'?
>>>>>>
>>>>>>
>>>>>
>>>>> Genericity spans binding, notifications and retrieval of a component
>>>>> from a reader.
>>>>>
>>>>> For binding and notifications you can do whatever you want inside your
>>>>> component, because you're provided with the full public IndexReader
>>>>> interface.
>>>>> When you retrieve a component, you get your very own interface with
>>>>> all your needed and totally nongeneric methods.
>>>>>
>>>>> ObscureComponent c = reader.plugin(ObscureComponent.class);
>>>>> c.performSomeObscureNonGenericLogic();
>>>>>
>>>>> This very code can be located both in userland, dealing with custom
>>>>> components and in coreland, dealing with well-known Lucene core
>>>>> components.
>>>>>
>>>>>
>>>>>
>>>>
>>>> Thats sounding interesting. From what you and Mike are saying, its a
>>>> dependency injection framework basically then? Impls are plugged in from
>>>> a
>>>> configuration class? Except you can plug in your own random stuff thats
>>>> not
>>>> used internally, so its also kind of an Impl cache?
>>>>
>>>>
>>>> So IndexReader Plugins would be: per Reader impl configuration and
>>>> caching
>>>> - with call backs too though - I guess you could add other callbacks and
>>>> get
>>>> involved in some more interesting stuff as well, if you wanted?
>>>>
>>>> It keeps growing on me anyway.
>>>>
>>>>
>>>
>>> Or its per Reader dependency injection with singleton support only and
>>> support for lifecycle callbacks?
>>>
>>
>> I'm not really going to inject dependencies, so it's more like
>> "service locator" than "dependency injection".
>>
>
> I don't see the distinction really - unless your saying its that a service
> locator may be happy ignoring a missing implementation. From the core
> modules perspective, it seems more like straight dependency injection to me.
It's a matter of who has control.
service locator = components ask for their dependencies
dependency injection = components are stuffed full with their dependencies
For that distinction to become more visible we need to introduce
chained dependencies, alternate dependencies, missing/optional
dependencies, etc. Not going to do this :)

>> So far we discussed per-reader component instances (I can't force
>> myself to call them singletons)
> Right - I don't mean to just call them singletons, it was just part of the
> analogy. In dependency injection you generally have the option of getting
> fresh instances or a singleton. In that way, I was comparing it to a per
> reader dependency injection framework set to singleton rather than new
> instance. Its not *really* a singleton, because its per reader.
Then, in your terminology, they are singletons, yes.

>> , with lifecycle callbacks. These
>> instances also serve as a factory for instances attached to
>> subreaders.
>>
> I don't quite understand that. Do you mean, back to my confusing analogy,
> that we will have both per reader singletons, as well as the option to ask
> for new instances then?
We'll have per-reader singletons for toplevel readers which will be
asked for instances of themselves for newly created subreaders.
I'm not myself sure of exact design yet.

> LuceneGuice :)  (Thats completely a throwaway, so don't draw any conclusions
> from it :))
I'm absolutely not going to build some uberframework. Something
simple. A hashmap. And a pair of helper methods. Maybe one class. And
a bunch of interfaces.

-- 
Kirill Zakharenko/Кирилл Захаренко (earwin@gmail.com)
Home / Mobile: +7 (495) 683-567-4 / +7 (903) 5-888-423
ICQ: 104465785

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

Re: IndexReader plugins

Posted by Mark Miller <ma...@gmail.com>.

Earwin Burrfoot wrote:
> On Wed, Apr 15, 2009 at 00:15, Mark Miller <ma...@gmail.com> wrote:
>   
>> Mark Miller wrote:
>>     
>>> Earwin Burrfoot wrote:
>>>       
>>>> Mark Miller wrote:
>>>>
>>>>         
>>>>> The distinction I am making with core is that we will have to call known
>>>>> methods on those
>>>>> core 'modules' that are not very generic? Doesn't that keep it from
>>>>> playing
>>>>> nice with the very generic 'attach this to this segment'?
>>>>>
>>>>>           
>>>> Genericity spans binding, notifications and retrieval of a component
>>>> from a reader.
>>>>
>>>> For binding and notifications you can do whatever you want inside your
>>>> component, because you're provided with the full public IndexReader
>>>> interface.
>>>> When you retrieve a component, you get your very own interface with
>>>> all your needed and totally nongeneric methods.
>>>>
>>>> ObscureComponent c = reader.plugin(ObscureComponent.class);
>>>> c.performSomeObscureNonGenericLogic();
>>>>
>>>> This very code can be located both in userland, dealing with custom
>>>> components and in coreland, dealing with well-known Lucene core
>>>> components.
>>>>
>>>>
>>>>         
>>> Thats sounding interesting. From what you and Mike are saying, its a
>>> dependency injection framework basically then? Impls are plugged in from a
>>> configuration class? Except you can plug in your own random stuff thats not
>>> used internally, so its also kind of an Impl cache?
>>>
>>>
>>> So IndexReader Plugins would be: per Reader impl configuration and caching
>>> - with call backs too though - I guess you could add other callbacks and get
>>> involved in some more interesting stuff as well, if you wanted?
>>>
>>> It keeps growing on me anyway.
>>>
>>>       
>> Or its per Reader dependency injection with singleton support only and
>> support for lifecycle callbacks?
>>     
> I'm not really going to inject dependencies, so it's more like
> "service locator" than "dependency injection".
>   
I don't see the distinction really - unless your saying its that a 
service locator may be happy ignoring a missing implementation. From the 
core modules perspective, it seems more like straight dependency 
injection to me.
> So far we discussed per-reader component instances (I can't force
> myself to call them singletons)
Right - I don't mean to just call them singletons, it was just part of 
the analogy. In dependency injection you generally have the option of 
getting fresh instances or a singleton. In that way, I was comparing it 
to a per reader dependency injection framework set to singleton rather 
than new instance. Its not *really* a singleton, because its per reader.

> , with lifecycle callbacks. These
> instances also serve as a factory for instances attached to
> subreaders.
>   
I don't quite understand that. Do you mean, back to my confusing 
analogy, that we will have both per reader singletons, as well as the 
option to ask for new instances then?

LuceneGuice :)  (Thats completely a throwaway, so don't draw any 
conclusions from it :))

-- 
- Mark

http://www.lucidimagination.com




---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

Re: IndexReader plugins

Posted by Earwin Burrfoot <ea...@gmail.com>.

On Wed, Apr 15, 2009 at 00:15, Mark Miller <ma...@gmail.com> wrote:
> Mark Miller wrote:
>>
>> Earwin Burrfoot wrote:
>>>
>>> Mark Miller wrote:
>>>
>>>>
>>>> The distinction I am making with core is that we will have to call known
>>>> methods on those
>>>> core 'modules' that are not very generic? Doesn't that keep it from
>>>> playing
>>>> nice with the very generic 'attach this to this segment'?
>>>>
>>>
>>> Genericity spans binding, notifications and retrieval of a component
>>> from a reader.
>>>
>>> For binding and notifications you can do whatever you want inside your
>>> component, because you're provided with the full public IndexReader
>>> interface.
>>> When you retrieve a component, you get your very own interface with
>>> all your needed and totally nongeneric methods.
>>>
>>> ObscureComponent c = reader.plugin(ObscureComponent.class);
>>> c.performSomeObscureNonGenericLogic();
>>>
>>> This very code can be located both in userland, dealing with custom
>>> components and in coreland, dealing with well-known Lucene core
>>> components.
>>>
>>>
>>
>> Thats sounding interesting. From what you and Mike are saying, its a
>> dependency injection framework basically then? Impls are plugged in from a
>> configuration class? Except you can plug in your own random stuff thats not
>> used internally, so its also kind of an Impl cache?
>>
>>
>> So IndexReader Plugins would be: per Reader impl configuration and caching
>> - with call backs too though - I guess you could add other callbacks and get
>> involved in some more interesting stuff as well, if you wanted?
>>
>> It keeps growing on me anyway.
>>
> Or its per Reader dependency injection with singleton support only and
> support for lifecycle callbacks?
I'm not really going to inject dependencies, so it's more like
"service locator" than "dependency injection".
So far we discussed per-reader component instances (I can't force
myself to call them singletons), with lifecycle callbacks. These
instances also serve as a factory for instances attached to
subreaders.

-- 
Kirill Zakharenko/Кирилл Захаренко (earwin@gmail.com)
Home / Mobile: +7 (495) 683-567-4 / +7 (903) 5-888-423
ICQ: 104465785

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

Re: IndexReader plugins

Posted by Mark Miller <ma...@gmail.com>.

Mark Miller wrote:
> Earwin Burrfoot wrote:
>> Mark Miller wrote:
>>  
>>> The distinction I am making with core is that we will have to call 
>>> known
>>> methods on those
>>> core 'modules' that are not very generic? Doesn't that keep it from 
>>> playing
>>> nice with the very generic 'attach this to this segment'?
>>>     
>> Genericity spans binding, notifications and retrieval of a component
>> from a reader.
>>
>> For binding and notifications you can do whatever you want inside your
>> component, because you're provided with the full public IndexReader
>> interface.
>> When you retrieve a component, you get your very own interface with
>> all your needed and totally nongeneric methods.
>>
>> ObscureComponent c = reader.plugin(ObscureComponent.class);
>> c.performSomeObscureNonGenericLogic();
>>
>> This very code can be located both in userland, dealing with custom
>> components and in coreland, dealing with well-known Lucene core
>> components.
>>
>>   
> Thats sounding interesting. From what you and Mike are saying, its a 
> dependency injection framework basically then? Impls are plugged in 
> from a configuration class? Except you can plug in your own random 
> stuff thats not used internally, so its also kind of an Impl cache?
>
>
> So IndexReader Plugins would be: per Reader impl configuration and 
> caching - with call backs too though - I guess you could add other 
> callbacks and get involved in some more interesting stuff as well, if 
> you wanted?
>
> It keeps growing on me anyway.
>
Or its per Reader dependency injection with singleton support only and 
support for lifecycle callbacks?

-- 
- Mark

http://www.lucidimagination.com




---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

Re: IndexReader plugins

Posted by Mark Miller <ma...@gmail.com>.

Earwin Burrfoot wrote:
> Mark Miller wrote:
>   
>> The distinction I am making with core is that we will have to call known
>> methods on those
>> core 'modules' that are not very generic? Doesn't that keep it from playing
>> nice with the very generic 'attach this to this segment'?
>>     
> Genericity spans binding, notifications and retrieval of a component
> from a reader.
>
> For binding and notifications you can do whatever you want inside your
> component, because you're provided with the full public IndexReader
> interface.
> When you retrieve a component, you get your very own interface with
> all your needed and totally nongeneric methods.
>
> ObscureComponent c = reader.plugin(ObscureComponent.class);
> c.performSomeObscureNonGenericLogic();
>
> This very code can be located both in userland, dealing with custom
> components and in coreland, dealing with well-known Lucene core
> components.
>
>   
Thats sounding interesting. From what you and Mike are saying, its a 
dependency injection framework basically then? Impls are plugged in from 
a configuration class? Except you can plug in your own random stuff 
thats not used internally, so its also kind of an Impl cache?


So IndexReader Plugins would be: per Reader impl configuration and 
caching - with call backs too though - I guess you could add other 
callbacks and get involved in some more interesting stuff as well, if 
you wanted?

It keeps growing on me anyway.

-- 
- Mark

http://www.lucidimagination.com




---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

Re: IndexReader plugins

Posted by Earwin Burrfoot <ea...@gmail.com>.

Mark Miller wrote:
> The distinction I am making with core is that we will have to call known
> methods on those
> core 'modules' that are not very generic? Doesn't that keep it from playing
> nice with the very generic 'attach this to this segment'?
Genericity spans binding, notifications and retrieval of a component
from a reader.

For binding and notifications you can do whatever you want inside your
component, because you're provided with the full public IndexReader
interface.
When you retrieve a component, you get your very own interface with
all your needed and totally nongeneric methods.

ObscureComponent c = reader.plugin(ObscureComponent.class);
c.performSomeObscureNonGenericLogic();

This very code can be located both in userland, dealing with custom
components and in coreland, dealing with well-known Lucene core
components.

-- 
Kirill Zakharenko/Кирилл Захаренко (earwin@gmail.com)
Home / Mobile: +7 (495) 683-567-4 / +7 (903) 5-888-423
ICQ: 104465785

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

Re: IndexReader plugins

Posted by Michael McCandless <lu...@mikemccandless.com>.

On Tue, Apr 14, 2009 at 2:53 PM, Mark Miller <ma...@gmail.com> wrote:

> The distinction I am making with core is that we will have to call known
> methods on those
> core 'modules' that are not very generic? Doesn't that keep it from playing
> nice with the very generic 'attach this to this segment'?

I think the way it'd work is SegmentReader would no longer expose much
API at all; you'd have to pull the component you want and interact
with that.

EG to execute a search, we'd have to pull the TermsDictComponent (to
get toplevel stats for TermWeight), the NormsComponent and the
PostintsComponent, and TermQuery would then interact with these three
to do its searching.

I think?

Mike

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

Re: IndexReader plugins

Posted by Mark Miller <ma...@gmail.com>.

Earwin Burrfoot wrote:
>
> I just don't see any compelling reason to go hybrid here. If you have
> a method to bind 'any' plugin, why do you want to have a special
> method to bind ABC plugin, no matter how 'core' is it?
>   
Thanks Earin, that was very helpful. Comment regarding the above:

The distinction I am making with core is that we will have to call known 
methods on those
core 'modules' that are not very generic? Doesn't that keep it from 
playing nice with the very generic 'attach this to this segment'?

-- 
- Mark

http://www.lucidimagination.com




---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

Re: IndexReader plugins

Posted by Earwin Burrfoot <ea...@gmail.com>.

> The original example justification was to avoid putting a ValueSource in the
> IndexReader (I guess avoiding the funky init code? valueSource = new
> CachingValueSource(this, new UninversionValueSource(this))
That was a bit of drama for the sake of drama, I couldn't restrain myself :)
My justification is that besides ValueSource, I want to bind a pair of
my own classes to segments, and other lucene developers/users will
probably have similar needs later.
Lucene developers will add references to their components directly on
readers, users will produce ugly code to keep track of segments or
work off patched Lucene, what a mess!

> Filter caches, Query caches, etc seem easy enough to implement external to Lucene just
> using the Reader instance? I guess it helps with sharing (possible merging
> in RAM)?
Yes, I currently have a bunch of caches bound to reader, and they do a
full warmup after each reopen, before this reader starts being used
for searching.
I get ~0-30ms search time for ugly requests with ranges and lots of
and/or/notted filters with nontrivial sorts and clustering, but pay
with ~10-40s reopens.
Per-segment warmup will definetly help here, especially considering
the layout of my segments.

I'm not saying I can't do it without reader plugins. They will simply
save me (and, hopefully, not only me) from writing strange
segment-tracking code.

> Mike appears to have driven it towards modularizing the current pieces in
> SegmentReader now. Thats not so flexible though (as a generic plugin
> framework), and it seems like a bit of a leap over the original proposal.
I believe the api could handle both concerns - modularizing Lucene
innards and allowing external extensibility. Splitting two concerns
will produce two instances of code that does essentially the same
thing.

> The gain if we get this is possibly sharing the components and being able to
> plug in new impls (do we really want to support that, would it make sense
> based on current customization needs?)?
More like new types, not impls, I made several examples up there.
Switching impls for core components is nice for experimentation, but I
don't see absolutely winning cases right now.

> And I don't understand how initing vs inflight destroys the whole reason for
> the API - if you don't necessarily care that its reconfigurable inflight,
> whats the problem? The static constructors are not beautiful, but a leaky
> API would be worse - I'd rather unsuspecting Lucene users were not hit by
> trains :)
initing vs inflight is largely a matter of preference. Initing, I must
admit, is way more kosher, but inflight looks better :) I'm currently
thinking of a way to take the best of both.
What destroys the whole reason is the factory Michael offered - it
hardcodes component types that you can possibly bind to the reader.

> So Marvin has mentioned that that leads us to a bit of a hybrid approach,
> but Earwin doesn't like that?
I just don't see any compelling reason to go hybrid here. If you have
a method to bind 'any' plugin, why do you want to have a special
method to bind ABC plugin, no matter how 'core' is it?

> Hopefully I have not muddied the water further, but I can't get a clear grasp on this. Happy to wait for example code, but if someone can unify the discussion in a bit more a succinct way, I wouldnt complain ;)
I 'hope' to have some time this weekend, so I can provide examples.

-- 
Kirill Zakharenko/Кирилл Захаренко (earwin@gmail.com)
Home / Mobile: +7 (495) 683-567-4 / +7 (903) 5-888-423
ICQ: 104465785

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

Re: IndexReader plugins

Posted by Mark Miller <ma...@gmail.com>.

Any chance on a brief summary of the current motivations for IndexReader 
plugins (now that all of this discussion has taken place)?

The original example justification was to avoid putting a ValueSource in 
the IndexReader (I guess avoiding the funky init code? valueSource = new 
CachingValueSource(this, new UninversionValueSource(this))

I never intended on keeping that bit of ugly code (it was a test way to 
get the thing inited, and since I've taken valueSource out of the 
Reader), but I guess your idea would allow me to stick my
ValueCache in using a more flexible API - 
IndexReader.bind(ValueSource.class, valueSource). I don't know how 
necessary that is for other pieces though. Now that I am trying to put 
the valueSource back in the Reader (831 still in heavy flux), this is 
looking like it could be nice for that though. But I think in flight 
binding would be wrong (we shouldn't take the view of not worrying about 
our users). But unless it makes sense to bind other things like that 
(not the built in stuff and other than ValueSource), I dont know that it 
makes sense yet.  Filter caches, Query caches, etc seem easy enough to 
implement external to Lucene just using the Reader instance? I guess it 
helps with sharing (possible merging in RAM)?

Mike appears to have driven it towards modularizing the current pieces 
in SegmentReader now. Thats not so flexible though (as a generic plugin 
framework), and it seems like a bit of a leap over the original 
proposal. The gain if we get this is possibly sharing the components and 
being able to plug in new impls (do we really want to support that, 
would it make sense based on current customization needs?)?

So Marvin has mentioned that that leads us to a bit of a hybrid 
approach, but Earwin doesn't like that?

And I don't understand how initing vs inflight destroys the whole reason 
for the API - if you don't necessarily care that its reconfigurable 
inflight, whats the problem? The static constructors are not beautiful, 
but a leaky API would be worse - I'd rather unsuspecting Lucene users 
were not hit by trains :)

Hopefully I have not muddied the water further, but I can't get a clear 
grasp on this. Happy to wait for example code, but if someone can unify 
the discussion in a bit more a succinct way, I wouldnt complain ;)






---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

Re: IndexReader plugins

Posted by Earwin Burrfoot <ea...@gmail.com>.

>> > With the early binding approach, you wouldn't pass all plugins during
>> > creation; you'd pass a factory object that exposes methods like:
>> >
>> >  getPostingsComponent(SegmentInfo)
>> >  getStoredFieldsComponent(SegmentInfo)
>> >  getValueSourceComponent(SegmentInfo)
>>
>> That basically kills the whole idea.
>
> Heh.  I can confirm that that approach turns out to be less flexible than we
> might hope.
You're speaking of getPostings/StoredFields/ValueSourceCompontent
factory, or my initial proposal?

>> My initial reason for adding index plugins was to support user-written
>> components that have strong 1-1 binding with segments.
>
> I consider solving this problem crucial for Lucy.
>
> Earwin/Kirill, I wasn't able to think of any way to pull this off except to
> install components using a hash table and retrieve them by string identifier.
> Can you think of any other options?
That is exactly what I am offering. Except I key by Class<?> and not
String, it's a little bit faster and allows type-safe component
retrieval method.
I don't see any issues with this design, except some strange people
retrieving a component per-hit while searching. I do not pity them :)

>> Filter caches, Query caches, Value caches, Sort caches, Clustering caches,
>> whatever.  The same plugin system can also support lucene-internal
>> components with similarily strong binding to segments/indexes.
>
> Can you elaborate on that?
What exactly do you want to hear? Description for each mentioned component?

>> If you introduce that factory to create components, you hardcode
>> component types once again, and one can't add a new type of component
>> without patching Lucene.
> Not necessarily.  The list of fixed components can be augmented with an
> auxilliary list.
If you have a generic API that works well enough, what is the point in
making partial specializations? API should be minimal.

> However, in Lucy, I'm tempted to strip down the API for SegReader so that you
> would almost always access data by grabbing a component first.  Keeping the
> interface minimal makes supporting wildly disparate back ends more
> straightforward.
Yup! You read my thoughts.

>> Also, I strongly believe components should receive a reader when binding.
>> If they need segmentInfo - they should get it from reader, if they need
>> anything else and it is private - there should be a getter for it.
>
> Hmm.  This is how I've been thinking about having the factory methods work in
> Lucy (translated to Java):
>
>  public class MyArchitecture extends Architecture {
>    public DataReader makeDocReader(Schema schema, Folder folder,
>                                    Snapshot snapshot, Segment segment) {
>      return new ZlibDocReader(schema, folder, snapshot, segment);
>    }
>    // ...
>  }
>
> Do you think something like this would work better?
>
>  public class MyArchitecture extends Architecture {
>    public void registerDocReader(SegReader reader) {
>      ZlibDocReader docReader = new ZlibDocReader(reader.getSchema(),
>        reader.getFolder(), reader.getSnapshot(), reader.getSegment());
>      reader.register("DocReader", docReader);
>    }
>    // ...
>  }
Absolutely. Your next DocReader implementation will most probably need
something off SegReader you forgot to include here ->
makeDocReader(Schema schema, Folder folder, Snapshot snapshot, Segment
segment).
Even more, some other (non-DocReader) component will require its own
totally different stuff, or maybe it will need only create/destroy
notifications and no data.
I'm trying to build a generic API that will also be partially immune
to Lucene's dreaded "We no longer need this stuff, let's deprecate it
and waste time, manpower and API clarity to support it for the three
upcoming years".

-- 
Kirill Zakharenko/Кирилл Захаренко (earwin@gmail.com)
Home / Mobile: +7 (495) 683-567-4 / +7 (903) 5-888-423
ICQ: 104465785

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

Re: IndexReader plugins

Posted by Marvin Humphrey <ma...@rectangular.com>.

On Tue, Apr 14, 2009 at 03:52:39PM +0400, Earwin Burrfoot wrote:

> > With the early binding approach, you wouldn't pass all plugins during
> > creation; you'd pass a factory object that exposes methods like:
> >
> >  getPostingsComponent(SegmentInfo)
> >  getStoredFieldsComponent(SegmentInfo)
> >  getValueSourceComponent(SegmentInfo)
> 
> That basically kills the whole idea.

Heh.  I can confirm that that approach turns out to be less flexible than we
might hope.  

In KinoSearch svn trunk, SegReader is now made up entirely of pluggable
components.  These components are loaded via factory methods from the
Architecture class.  It's very nice to be able to override
architecture.makeDocReader() and install your own custom class.  

However, the system doesn't provide a way to install custom components.  That
causes problems for custom Query subclasses that might rely on specialized
data -- for example, an RTreeQuery which needs data from an RTreeReader.

> My initial reason for adding index plugins was to support user-written
> components that have strong 1-1 binding with segments. 

I consider solving this problem crucial for Lucy.

Earwin/Kirill, I wasn't able to think of any way to pull this off except to
install components using a hash table and retrieve them by string identifier.
Can you think of any other options?

Discussion on lucy-dev at...
<http://mail-archives.apache.org/mod_mbox/lucene-lucy-dev/200903.mbox/%3C20090322185122.GA32236@rectangular.com%3E>.

> Filter caches, Query caches, Value caches, Sort caches, Clustering caches,
> whatever.  The same plugin system can also support lucene-internal
> components with similarily strong binding to segments/indexes.

Can you elaborate on that?

> If you introduce that factory to create components, you hardcode
> component types once again, and one can't add a new type of component
> without patching Lucene.

Not necessarily.  The list of fixed components can be augmented with an
auxilliary list.

However, in Lucy, I'm tempted to strip down the API for SegReader so that you
would almost always access data by grabbing a component first.  Keeping the
interface minimal makes supporting wildly disparate back ends more
straightforward.

> Also, I strongly believe components should receive a reader when binding.
> If they need segmentInfo - they should get it from reader, if they need
> anything else and it is private - there should be a getter for it.

Hmm.  This is how I've been thinking about having the factory methods work in
Lucy (translated to Java):

  public class MyArchitecture extends Architecture {
    public DataReader makeDocReader(Schema schema, Folder folder, 
                                    Snapshot snapshot, Segment segment) {
      return new ZlibDocReader(schema, folder, snapshot, segment);
    }
    // ...
  }

Do you think something like this would work better?

  public class MyArchitecture extends Architecture {
    public void registerDocReader(SegReader reader) {
      ZlibDocReader docReader = new ZlibDocReader(reader.getSchema(), 
        reader.getFolder(), reader.getSnapshot(), reader.getSegment());
      reader.register("DocReader", docReader);
    }
    // ...
  }

Marvin Humphrey

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

Re: IndexReader plugins

Posted by Earwin Burrfoot <ea...@gmail.com>.

>> IndexReader.java is littered with the likes of:
>> public static IndexReader open(final Directory directory,
>> IndexDeletionPolicy deletionPolicy) throws CorruptIndexException,
>> IOException;
> But I don't understand why is this a problem...
Doubling the number of factory methods? We have to keep old signatures.

> With the early binding approach, you wouldn't pass all plugins during
> creation; you'd pass a factory object that exposes methods like:
>
>  getPostingsComponent(SegmentInfo)
>  getStoredFieldsComponent(SegmentInfo)
>  getValueSourceComponent(SegmentInfo)
That basically kills the whole idea.
My initial reason for adding index plugins was to support user-written
components that have strong 1-1 binding with segments. Filter caches,
Query caches, Value caches, Sort caches, Clustering caches, whatever.
The same plugin system can also support lucene-internal components
with similarily strong binding to segments/indexes.
If you introduce that factory to create components, you hardcode
component types once again, and one can't add a new type of component
without patching Lucene.

Also, I strongly believe components should receive a reader when
binding. If they need segmentInfo - they should get it from reader, if
they need anything else and it is private - there should be a getter
for it.

> But since you seem quite convinced of its worth, maybe code up a
> simple demonstration (patch on Lucene's trunk) of what it buys us,
> where early binding would not buy us that benefit, and then we can
> iterate from there?
I'll do both and then we can compare approaches.

-- 
Kirill Zakharenko/Кирилл Захаренко (earwin@gmail.com)
Home / Mobile: +7 (495) 683-567-4 / +7 (903) 5-888-423
ICQ: 104465785

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

Re: IndexReader plugins

Posted by Michael McCandless <lu...@mikemccandless.com>.

On Mon, Apr 13, 2009 at 9:44 AM, Earwin Burrfoot <ea...@gmail.com> wrote:
> On Mon, Apr 13, 2009 at 17:14, Michael McCandless
> <lu...@mikemccandless.com> wrote:
>> On Mon, Apr 13, 2009 at 9:02 AM, Earwin Burrfoot <ea...@gmail.com> wrote:
>>
>>>> I think I'd lean towards only at construction.  Seems dangerous to
>>>> allow swap in/out at some later time.
>>> I have several points pro-runtime:
>>> 1. We have a lot of static factory methods, it's gonna be a problem
>>> passing plugins everywhere on creation.
>>
>> "We" is who here?  Couldn't "they refactor their code to callback into
>> those static factory methods?
> "We" as in "(un)fortunate users of Lucene".
> IndexReader.java is littered with the likes of:
> public static IndexReader open(final Directory directory,
> IndexDeletionPolicy deletionPolicy) throws CorruptIndexException,
> IOException;

But I don't understand why is this a problem...

In the early binding approach (where component is established at
creation of a SegmentReader), you'd pass something that can create new
components, regardless of whether we use static factory vs ctors to
birth a new reader.  Ie, this seems non-differentiating to the early
vs late binding discussion.

>> Presumably you'd pass into Lucene a "CreatesXYZComponent" instance
>> which given a SegmentInfo (or a public'd version of it, with
>> extensibility, etc) returns an XYZComponent.
> I want to pass a component into toplevel reader, it will be
> consequently notified of subreader creation/deletion and given a
> chance to reuse its instance for them, or provide a new one.

I still feel this is backwards, on "first blush".

Meaning, for each SegmentReader I'd like to ask your "creator" to give
me a component that can read this segment, right now (early binding).
Ie, each SegmentReader has a private final instance of the
"PostingsComponent" (say).

Sure, it could be sometimes you'll share a given component instance
across more than one segment, but I'd expect that to be the exception
not the rule.

In Lucene today, everything is a "private" component, though for
reading the doc stores, this is wasteful in the case where the doc
store files are shared across segments. So that'd be one case today
where sharing would be useful.  Maybe you could focus on that as your
first proof point for this change?

>>> 2. public <T extends IRP> void bindPlugin(Class<? super T> type, T
>>> instance) is ..er.. typesafer than passing Map<Class, IRP> in
>>> constructor.
>>
>> Can you make this more concrete?  (I'm having trouble following... the
>> generics).
> Let's define the following:
> interface Plugin {}
> interface Service1 {}
> interface Service2 {}
> class Impl implements Plugin, Service1, Service2 {}
>
> IndexReader: public <T extends Plugin> void bindPlugin(Class<? super
> T> type, T instance);
>
> Now we can invoke bindPlugin like this:
> indexReader.bindPlugin(Service1.class, new Impl());
> indexReader.bindPlugin(Service2.class, new Impl());
>
> Nice points:
> It won't compile if Impl doesn't implement Plugin (which we need for
> attach/detach notifications).
> It won't compile if Impl doesn't implement the service interface we're
> binding it to.
> Service interface doesn't have to extend Plugin, so attach/detach
> methods that are a contract between a plugin and its host won't be
> visible to service consumers that will get an instance through:
> IndexReader: public <T> T getPlugin(Class<T> type);
> Service1 srv = indexReader.getPlugin(Service1.class);
>
> If you're passing all plugins as a bunch during creation, you have to
> use wither Map, or a List of Pairs.
> Map<Class, Plugin> plugins;
> new SegmentReader(somestuff, plugins);
> That way you're able to do less type checking than in previous example.

With the early binding approach, you wouldn't pass all plugins during
creation; you'd pass a factory object that exposes methods like:

  getPostingsComponent(SegmentInfo)
  getStoredFieldsComponent(SegmentInfo)
  getValueSourceComponent(SegmentInfo)

SegmentReader during creation would ask your factory for all the
components it needs, and hold them privately, as final.

We'd still have our strong type checking?  So, "strong type checking"
also seems non-differentiating wrt early vs late binding decision.

>>> 3. Plugins support attachment/detachment and they don't need to know
>>> exact reason they were attached/detached for - there should be no
>>> difference between closing a reader and removing plugin from it.
>> But what's the use case here?  Why would a SegmentReader need to
>> detach/attach a new XYZComponent?
> We already have it for free. A component should receive notification
> when SegmentReader is created. A component should also receive
> notification when SegmentReader is close()d. Nobody prevents us from
> using these notifications at a different point in time.

I understand we got something for free... it's that I don't see the
value of that "something", but I do see added API complexity.

>> And... why not simply open a new SegmentReader?  With this change,
>> opening a new reader can be an extremely low-cost operation since you
>> could share already opened XYZComponents.  A SegmentReader simply
>> becomes a set of components held together.
> I lean to post-construction attach/detach vs passing plugins in ctor
> not because I want to abuse it for inflight indexReader
> reconfiguration, it's just simpler to use with the way IndexReaders
> are created.

I guess net/net I still can't see what we gain by decoupling
instantiation of a SegmentReader from binding of its components (late
binding).  As best I can tell it adds complexity to the API without
gaining us anything (at least anything I understand so far).

But since you seem quite convinced of its worth, maybe code up a
simple demonstration (patch on Lucene's trunk) of what it buys us,
where early binding would not buy us that benefit, and then we can
iterate from there?

Mike

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

Re: IndexReader plugins

Posted by Earwin Burrfoot <ea...@gmail.com>.

On Mon, Apr 13, 2009 at 17:14, Michael McCandless
<lu...@mikemccandless.com> wrote:
> On Mon, Apr 13, 2009 at 9:02 AM, Earwin Burrfoot <ea...@gmail.com> wrote:
>
>>> I think I'd lean towards only at construction.  Seems dangerous to
>>> allow swap in/out at some later time.
>> I have several points pro-runtime:
>> 1. We have a lot of static factory methods, it's gonna be a problem
>> passing plugins everywhere on creation.
>
> "We" is who here?  Couldn't "they refactor their code to callback into
> those static factory methods?
"We" as in "(un)fortunate users of Lucene".
IndexReader.java is littered with the likes of:
public static IndexReader open(final Directory directory,
IndexDeletionPolicy deletionPolicy) throws CorruptIndexException,
IOException;

> Presumably you'd pass into Lucene a "CreatesXYZComponent" instance
> which given a SegmentInfo (or a public'd version of it, with
> extensibility, etc) returns an XYZComponent.
I want to pass a component into toplevel reader, it will be
consequently notified of subreader creation/deletion and given a
chance to reuse its instance for them, or provide a new one.

>> 2. public <T extends IRP> void bindPlugin(Class<? super T> type, T
>> instance) is ..er.. typesafer than passing Map<Class, IRP> in
>> constructor.
>
> Can you make this more concrete?  (I'm having trouble following... the
> generics).
Let's define the following:
interface Plugin {}
interface Service1 {}
interface Service2 {}
class Impl implements Plugin, Service1, Service2 {}

IndexReader: public <T extends Plugin> void bindPlugin(Class<? super
T> type, T instance);

Now we can invoke bindPlugin like this:
indexReader.bindPlugin(Service1.class, new Impl());
indexReader.bindPlugin(Service2.class, new Impl());

Nice points:
It won't compile if Impl doesn't implement Plugin (which we need for
attach/detach notifications).
It won't compile if Impl doesn't implement the service interface we're
binding it to.
Service interface doesn't have to extend Plugin, so attach/detach
methods that are a contract between a plugin and its host won't be
visible to service consumers that will get an instance through:
IndexReader: public <T> T getPlugin(Class<T> type);
Service1 srv = indexReader.getPlugin(Service1.class);

If you're passing all plugins as a bunch during creation, you have to
use wither Map, or a List of Pairs.
Map<Class, Plugin> plugins;
new SegmentReader(somestuff, plugins);
That way you're able to do less type checking than in previous example.

>> 3. Plugins support attachment/detachment and they don't need to know
>> exact reason they were attached/detached for - there should be no
>> difference between closing a reader and removing plugin from it.
> But what's the use case here?  Why would a SegmentReader need to
> detach/attach a new XYZComponent?
We already have it for free. A component should receive notification
when SegmentReader is created. A component should also receive
notification when SegmentReader is close()d. Nobody prevents us from
using these notifications at a different point in time.

> And... why not simply open a new SegmentReader?  With this change,
> opening a new reader can be an extremely low-cost operation since you
> could share already opened XYZComponents.  A SegmentReader simply
> becomes a set of components held together.
I lean to post-construction attach/detach vs passing plugins in ctor
not because I want to abuse it for inflight indexReader
reconfiguration, it's just simpler to use with the way IndexReaders
are created.


-- 
Kirill Zakharenko/Кирилл Захаренко (earwin@gmail.com)
Home / Mobile: +7 (495) 683-567-4 / +7 (903) 5-888-423
ICQ: 104465785

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

Re: IndexReader plugins

Posted by Michael McCandless <lu...@mikemccandless.com>.

On Mon, Apr 13, 2009 at 9:02 AM, Earwin Burrfoot <ea...@gmail.com> wrote:

>> I think I'd lean towards only at construction.  Seems dangerous to
>> allow swap in/out at some later time.
> I have several points pro-runtime:
> 1. We have a lot of static factory methods, it's gonna be a problem
> passing plugins everywhere on creation.

"We" is who here?  Couldn't "they refactor their code to callback into
those static factory methods?

Presumably you'd pass into Lucene a "CreatesXYZComponent" instance
which given a SegmentInfo (or a public'd version of it, with
extensibility, etc) returns an XYZComponent.

> 2. public <T extends IRP> void bindPlugin(Class<? super T> type, T
> instance) is ..er.. typesafer than passing Map<Class, IRP> in
> constructor.

Can you make this more concrete?  (I'm having trouble following... the
generics).

> 3. Plugins support attachment/detachment and they don't need to know
> exact reason they were attached/detached for - there should be no
> difference between closing a reader and removing plugin from it.

But what's the use case here?  Why would a SegmentReader need to
detach/attach a new XYZComponent?

And... why not simply open a new SegmentReader?  With this change,
opening a new reader can be an extremely low-cost operation since you
could share already opened XYZComponents.  A SegmentReader simply
becomes a set of components held together.

Mike

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

Re: IndexReader plugins

Posted by Earwin Burrfoot <ea...@gmail.com>.

>> Can we outline some requirements for the plugin API?
>>
>> Do we want to attach/detach them to IndexReader after it is created,
>> or only during construction?
>
> I think I'd lean towards only at construction.  Seems dangerous to
> allow swap in/out at some later time.
I have several points pro-runtime:
1. We have a lot of static factory methods, it's gonna be a problem
passing plugins everywhere on creation.
2. public <T extends IRP> void bindPlugin(Class<? super T> type, T
instance) is ..er.. typesafer than passing Map<Class, IRP> in
constructor.
3. Plugins support attachment/detachment and they don't need to know
exact reason they were attached/detached for - there should be no
difference between closing a reader and removing plugin from it.

The only danger is that you can remove/change some plugin in the midst
of indexReader operation. Well, bad for you, you jumped the rails and
got hit by a train.
Another possible issue is synchronization, but synced copy-on-write
proved itself countless times :)

>> We probably want them to support (know the difference between)
>> SegmentReader/MultiSegmentReader.
>
> Yeah... and I think it's fine if some components only "operate" at the
> SegmentReader level.
>
>> What about ParallelReader (does anybody use it at all?),
>> FilterIndexReader, MultiReader?
>
> Presumably these would use "aggregator components" that simply take N
> components under the hood and merge their APIs.  (Like MultiTermEnum,
> MultiTermDocs).
>
>> For a hierarchy of readers, API should probably support the notion of
>> different plugin instances per-subreader.
>
> Yes, eg it needs to be easy for the N segments discovered on opening a
> MultiSegmentReader to each ask for their own instance of
> PostingsComponent.
>
>> Do we want plugins supporting more than one interface, or is it an
>> unnecessary complication?
>> Like:
>> indexReader.bindPlugin(instance).to(Iface1.class, Iface2.class);
>> And then:
>> indexReader.plugin(Iface1.class) == indexReader.plugin(Iface2.class)
>
> I would start without that, unless we have a clear "now" example that
> requires it?


-- 
Kirill Zakharenko/Кирилл Захаренко (earwin@gmail.com)
Home / Mobile: +7 (495) 683-567-4 / +7 (903) 5-888-423
ICQ: 104465785

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

Re: IndexReader plugins

Posted by Michael McCandless <lu...@mikemccandless.com>.

On Mon, Apr 13, 2009 at 7:31 AM, Earwin Burrfoot <ea...@gmail.com> wrote:
>> I think this (truly componentizing SegmentReader) makes tons of sense.
>>  After all, a SegmentReader is just a bunch of separate components
>> handling different parts of the index.
>>
>> This is really orthogonal to LUCENE-831 (the field cache is just one
>> component).  They can land in either order...
>>
>> Earwin do you want to take an initial stab (patch) at this?
> Okay. The only problem is, I'll take my time, can't cut on sleeping
> any more than now :)

I hear you ;)  And, that's no problem... each feature moves at its own
natural rate.  This is the natural way of open source.

>> I think it'll be interesting how the components API handles near
>> real-time search, because we want/expect components to be able to
>> merge themselves efficiently "in RAM" when possible.  EG if field
>> cache already has certain fields loaded, they can be merged in RAM; if
>> not, they should be merged on disk.  If field cache has pending
>> changes (in a future world when CSF makes it possible to suddenly
>> change say the price of certain documents), then the components must
>> properly implement clone (ideally incremental copy-on-write cloning).
> Can we outline some requirements for the plugin API?
>
> Do we want to attach/detach them to IndexReader after it is created,
> or only during construction?

I think I'd lean towards only at construction.  Seems dangerous to
allow swap in/out at some later time.

> We probably want them to support (know the difference between)
> SegmentReader/MultiSegmentReader.

Yeah... and I think it's fine if some components only "operate" at the
SegmentReader level.

> What about ParallelReader (does anybody use it at all?),
> FilterIndexReader, MultiReader?

Presumably these would use "aggregator components" that simply take N
components under the hood and merge their APIs.  (Like MultiTermEnum,
MultiTermDocs).

> For a hierarchy of readers, API should probably support the notion of
> different plugin instances per-subreader.

Yes, eg it needs to be easy for the N segments discovered on opening a
MultiSegmentReader to each ask for their own instance of
PostingsComponent.

> Do we want plugins supporting more than one interface, or is it an
> unnecessary complication?
> Like:
> indexReader.bindPlugin(instance).to(Iface1.class, Iface2.class);
> And then:
> indexReader.plugin(Iface1.class) == indexReader.plugin(Iface2.class)

I would start without that, unless we have a clear "now" example that
requires it?

Mike

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

Re: IndexReader plugins

Posted by Earwin Burrfoot <ea...@gmail.com>.

> I think this (truly componentizing SegmentReader) makes tons of sense.
>  After all, a SegmentReader is just a bunch of separate components
> handling different parts of the index.
>
> This is really orthogonal to LUCENE-831 (the field cache is just one
> component).  They can land in either order...
>
> Earwin do you want to take an initial stab (patch) at this?
Okay. The only problem is, I'll take my time, can't cut on sleeping
any more than now :)

> I think it'll be interesting how the components API handles near
> real-time search, because we want/expect components to be able to
> merge themselves efficiently "in RAM" when possible.  EG if field
> cache already has certain fields loaded, they can be merged in RAM; if
> not, they should be merged on disk.  If field cache has pending
> changes (in a future world when CSF makes it possible to suddenly
> change say the price of certain documents), then the components must
> properly implement clone (ideally incremental copy-on-write cloning).
Can we outline some requirements for the plugin API?

Do we want to attach/detach them to IndexReader after it is created,
or only during construction?
We probably want them to support (know the difference between)
SegmentReader/MultiSegmentReader.
What about ParallelReader (does anybody use it at all?),
FilterIndexReader, MultiReader?
For a hierarchy of readers, API should probably support the notion of
different plugin instances per-subreader.
Do we want plugins supporting more than one interface, or is it an
unnecessary complication?
Like:
indexReader.bindPlugin(instance).to(Iface1.class, Iface2.class);
And then:
indexReader.plugin(Iface1.class) == indexReader.plugin(Iface2.class)

> Mike
>
> On Sun, Apr 12, 2009 at 7:34 PM, Earwin Burrfoot <ea...@gmail.com> wrote:
>> To support my dream of kicking fieldCache out of the core and to add
>> some extensibility to Lucene, I want to introduce IndexReaderPlugins.
>> Rough pseudocode follows:
>>
>> interface IndexReaderPlugin {
>>        void attach(SegmentReader reader);
>>        void detach(SegmentReader reader);
>>
>>        void attach(MultiSegmentReader reader);
>>        void detach(MultiSegmentReader reader);
>> }
>>
>> IndexReader.java:
>> private Map<Class, IndexReaderPlugin> plugins;
>>
>> on opening/closing toplevel/segment reader we iterate over plugins:
>> for(IndexReaderPlugin plugin : plugins)
>>    plugin.attach(reader);
>>
>> the map is passed to toplevel reader initially, and then shared with
>> lowlevel readers, we can also retrieve plugins:
>> public <T> T plugin(Class<T> pluginType);
>>
>> then we can do something like:
>> indexReader.plugin(ValueSource.class).doSomething // lucene code
>> indexReader.plugin(FieldsCache.class).forField(LAST_UPDATE_TIME).doSomething
>> // my code
>> filter.apply(indexReader.plugin(FilterCache.class)) // my code
>>
>> Benefits are numerous. We get rid of alien code like:
>> +++ src/java/org/apache/lucene/index/SegmentReader.java (working copy)
>> @@ -83,6 +86,8 @@
>> +  protected ValueSource valueSource;
>> +
>> @@ -555,6 +560,8 @@
>> +
>> +      valueSource = new CachingValueSource(this, new
>> UninversionValueSource(this));
>>
>> If I don't need ValueSource attached to my readers, I won't have it.
>> If I need my custom caches attached to my readers, I can do it in a
>> natural way instead of hacking around MergeScheduler, or comparing
>> subreader lists.
>> If I want, I can replace Lucene's native ValueSource with my own
>> implementation, and all Lucene classes that use it will happily accept
>> it.
>>
>> On second thought, we shouldn't share plugin map across subreaders. If
>> we allow attach(SegmentReader reader) to return an instance of plugin
>> (plugin decides if it is the same instance always, or per-reader), and
>> populate the map for subreader with results of attach invoked on
>> toplevel reader map, we'll turn this code:
>> segmentReader.plugin(SomeClass.class).segmentReaderDependentMethod(segmentReader);
>> into:
>> segmentReader.plugin(SomeClass.class).segmentReaderDependentMethod();
>> which makes more sense
>>
>> Any way the general idea is still the same.
>>
>> --
>> Kirill Zakharenko/Кирилл Захаренко (earwin@gmail.com)
>> Home / Mobile: +7 (495) 683-567-4 / +7 (903) 5-888-423
>> ICQ: 104465785
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>
>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
>
>



-- 
Kirill Zakharenko/Кирилл Захаренко (earwin@gmail.com)
Home / Mobile: +7 (495) 683-567-4 / +7 (903) 5-888-423
ICQ: 104465785

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

Re: IndexReader plugins

Posted by Mark Miller <ma...@gmail.com>.

Michael McCandless wrote:
> This is really orthogonal to LUCENE-831 (the field cache is just one
> component).  They can land in either order...
>   
Yes, and the API does look interesting. I didn't mean to try and tie the 
discussion of it to LUCENE-831. I just wanted Earwin to know we are not 
going to leave him out to dry there. The code given as an example of 
what the proposed API tries to solve has already been factored out of a 
very early patch :) None of this being a big deal about anything, but I 
got the impression Earwin is quite worried the new FieldCache API will 
be limiting for his needs, and I wanted to ensure to him that it won't 
end up being the case.

IndexReader plugin makes sense either way. I'm only beginning to digest 
it though.

-- 
- Mark

http://www.lucidimagination.com




---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

Re: IndexReader plugins

Posted by Michael McCandless <lu...@mikemccandless.com>.

I think this (truly componentizing SegmentReader) makes tons of sense.
 After all, a SegmentReader is just a bunch of separate components
handling different parts of the index.

This is really orthogonal to LUCENE-831 (the field cache is just one
component).  They can land in either order...

Earwin do you want to take an initial stab (patch) at this?

I think it'll be interesting how the components API handles near
real-time search, because we want/expect components to be able to
merge themselves efficiently "in RAM" when possible.  EG if field
cache already has certain fields loaded, they can be merged in RAM; if
not, they should be merged on disk.  If field cache has pending
changes (in a future world when CSF makes it possible to suddenly
change say the price of certain documents), then the components must
properly implement clone (ideally incremental copy-on-write cloning).

Mike

On Sun, Apr 12, 2009 at 7:34 PM, Earwin Burrfoot <ea...@gmail.com> wrote:
> To support my dream of kicking fieldCache out of the core and to add
> some extensibility to Lucene, I want to introduce IndexReaderPlugins.
> Rough pseudocode follows:
>
> interface IndexReaderPlugin {
>        void attach(SegmentReader reader);
>        void detach(SegmentReader reader);
>
>        void attach(MultiSegmentReader reader);
>        void detach(MultiSegmentReader reader);
> }
>
> IndexReader.java:
> private Map<Class, IndexReaderPlugin> plugins;
>
> on opening/closing toplevel/segment reader we iterate over plugins:
> for(IndexReaderPlugin plugin : plugins)
>    plugin.attach(reader);
>
> the map is passed to toplevel reader initially, and then shared with
> lowlevel readers, we can also retrieve plugins:
> public <T> T plugin(Class<T> pluginType);
>
> then we can do something like:
> indexReader.plugin(ValueSource.class).doSomething // lucene code
> indexReader.plugin(FieldsCache.class).forField(LAST_UPDATE_TIME).doSomething
> // my code
> filter.apply(indexReader.plugin(FilterCache.class)) // my code
>
> Benefits are numerous. We get rid of alien code like:
> +++ src/java/org/apache/lucene/index/SegmentReader.java (working copy)
> @@ -83,6 +86,8 @@
> +  protected ValueSource valueSource;
> +
> @@ -555,6 +560,8 @@
> +
> +      valueSource = new CachingValueSource(this, new
> UninversionValueSource(this));
>
> If I don't need ValueSource attached to my readers, I won't have it.
> If I need my custom caches attached to my readers, I can do it in a
> natural way instead of hacking around MergeScheduler, or comparing
> subreader lists.
> If I want, I can replace Lucene's native ValueSource with my own
> implementation, and all Lucene classes that use it will happily accept
> it.
>
> On second thought, we shouldn't share plugin map across subreaders. If
> we allow attach(SegmentReader reader) to return an instance of plugin
> (plugin decides if it is the same instance always, or per-reader), and
> populate the map for subreader with results of attach invoked on
> toplevel reader map, we'll turn this code:
> segmentReader.plugin(SomeClass.class).segmentReaderDependentMethod(segmentReader);
> into:
> segmentReader.plugin(SomeClass.class).segmentReaderDependentMethod();
> which makes more sense
>
> Any way the general idea is still the same.
>
> --
> Kirill Zakharenko/Кирилл Захаренко (earwin@gmail.com)
> Home / Mobile: +7 (495) 683-567-4 / +7 (903) 5-888-423
> ICQ: 104465785
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

Re: IndexReader plugins

Posted by Mark Miller <ma...@gmail.com>.

Earwin Burrfoot wrote:
> But it doesn't make the proposed API any less useful.
Indeed! And I don't mean to slow down that discussion.

-- 
- Mark

http://www.lucidimagination.com




---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

Re: IndexReader plugins

Posted by Earwin Burrfoot <ea...@gmail.com>.

> Earwin Burrfoot wrote:
>>
>> Benefits are numerous. We get rid of alien code like:
>> +++ src/java/org/apache/lucene/index/SegmentReader.java (working copy)
>> @@ -83,6 +86,8 @@
>> +  protected ValueSource valueSource;
>> +
>> @@ -555,6 +560,8 @@
>> +
>> +      valueSource = new CachingValueSource(this, new
>> UninversionValueSource(this));
>>
>
> No need to rid Lucene of code thats not there :) Irregardless of the merits
> of IndexReader plugins, no need to design an API to solve experimental code
> in a rough, early patch for a given issue;)
>
> Design of the new FieldCache is happening at 831, and no comments are
> ignored in that discussion. We are not committing something tonight that
> will not meet your needs. No worries. Shoddy work does not appear to slip
> past McCandless when its time to commit :) (and it may be a long time yet
> for 831)

I probably went a bit overboard with dramatism. :)
But it doesn't make the proposed API any less useful.

-- 
Kirill Zakharenko/Кирилл Захаренко (earwin@gmail.com)
Home / Mobile: +7 (495) 683-567-4 / +7 (903) 5-888-423
ICQ: 104465785

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

Re: IndexReader plugins

Posted by Mark Miller <ma...@gmail.com>.

Earwin Burrfoot wrote:
> Benefits are numerous. We get rid of alien code like:
> +++ src/java/org/apache/lucene/index/SegmentReader.java	(working copy)
> @@ -83,6 +86,8 @@
> +  protected ValueSource valueSource;
> +
> @@ -555,6 +560,8 @@
> +
> +      valueSource = new CachingValueSource(this, new
> UninversionValueSource(this));
>   
No need to rid Lucene of code thats not there :) Irregardless of the 
merits of IndexReader plugins, no need to design an API to solve 
experimental code in a rough, early patch for a given issue;)

Design of the new FieldCache is happening at 831, and no comments are 
ignored in that discussion. We are not committing something tonight that 
will not meet your needs. No worries. Shoddy work does not appear to 
slip past McCandless when its time to commit :) (and it may be a long 
time yet for 831)

-- 
- Mark

http://www.lucidimagination.com




---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org