You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@lucene.apache.org by Shai Erera <se...@gmail.com> on 2010/05/18 09:06:17 UTC

ReadTask and its hierarchy needs some house cleaning

Hi

I wanted to run a benchmark .alg which will take a Filter into account.
However, ReadTask, which is the base for a variety of search related tasks,
does not support a Filter. When I reviewed the class, to understand how I
can easily add such Filter support, I discovered a whole set of classes
which IMO are completely unnecessary. ReadTask defines some with*() methods,
such as withSearch, withTraverse etc. and many classes override ReadTask
just to return true/false in those methods. WarmTask for example, returns
true in withWarm() and false otherwise, while SearchTask returns true in
withSearch and false otherwise.

This created a whole set of extensions that you either need to run in
sequence (e.g. Warm, SearchWithCollector) or create your own extension just
to get the right recipe for the operations to perform.

I suggest we do the following changes:
* Rename ReadTask to SearchTask -- that's because RT uses IndexSearcher,
QueryMaker -- all that suggests it's about Searching and not Reading. It's
only semantics, I know, but I think SearchTask is clearer than ReadTask
* Get rid of all the with*() methods, and instead move to use properties:
search.with.warm, search.with.traverse, search.with.collector etc.
* Introduce protected createCollector, createFilter, createSort, for custom
extensions
* Create a completely new hierarchy for this task, throwing away everything
that can be handled through properties only (like SearchTask, WarmTask etc.)

If we do this, then extensions of the new SearchTask will need to ask
themselves "do I want to search w/ a Collector/Filter/custom Sort?" and not
"do I Warm to be executed?" The core operation behind this task is
IndexSearcher.search. The rest are just settings, or configuration, as well
as some added ops like warm, and traverse. If it makes sense, I can factor
warm() and traverse() into their own protected methods, for extensions to
override as well. It might make sense for warm because custom warms is
something I'm sure will be needed.

This will also allow running algorithms with rounds - different properties
for different rounds.

This approach does not prevent one from creating MySearchTask with
pre-defined and hard-coded settings. But for many others, the question of
which task to execute will go away - you execute SearchTask for the basic
search operations, or w/ the default Collector/Sort, and you control it via
properties. To create your own *SearchTask extension which hard-codes a
recipe, you'll need access to all the do<OP> members, so I'll make them
protected. But that's IMO is a rare requirement, than say running a search
with warm + traverse, and you shouldn't be forced to create a ReadTask
extension for that.

What do you think?

Shai

Re: ReadTask and its hierarchy needs some house cleaning

Posted by Doron Cohen <cd...@gmail.com>.

On Tue, May 18, 2010 at 1:39 PM, Shai Erera <se...@gmail.com> wrote:

> Doron, I like the idea of using params as properties, but I'm not sure I
> want to make it generic for all tasks. Some tasks, like AddDoc, receive a
> size as parameter. Moving from AddDoc(100) to AddDoc(size=100) seems
> redundant to me, although it's not "the end of the world" if we'll do that.
>

Good... a task that has a single parameter can still parse it as today if
getParams() is preserved (no reason not to), though I like the explicity of
including parameters names.

 In addition, I don't understand what did you mean by Read/Search
> differences and sharing IndexReaders ... It seems like IndexReader is
> reused, if you call getIndexReader(), same like getIndexWriter. Therefore
> I'm not sure the name ReadTask is related to that at all.
>

A ReadTask closes the reader if it opened it, but if an OpenReader task ran
explicitly prior to that, the ReadTask would not open nor close the reader,
and all those tasks would share the reader (until a CloseReader task takes
place).


> About SearchTrav - this should go away, and trav become a property?
>

Yes, but a parameter option is my preference, so a Search(traverse) would
achieve this.


> Mike - I like separating Warm from Search. I don't think it's related. I
> can do Warm once, then many searches.
>

While I like this separation, it is equivalent only when readers are shared,
right? Otherwise each task opens its own reader...

Today, readers are either shared or not shared, and it is possible for each
read task to have its own reader (without adding explicit open/close reader
tasks).
Separating the above tasks to not be related to a read task means that
today's alg like this:

{code}
   SearchTraverse
   Search
{code}

become something like this:

{code}
   OpenReader
   VisitHits
   CloseReader
   OpenReader
   Search
   CloseReader
{code}

Seems a lost of some expressiveness.

ReadTask always starts with opening a reader - unless already opened - and
ends with closing the reader - unless it was already opened. In between it
does things that need a reader.

Perhaps a nicer packaging for this would be for the ReadTask (rename to
ReaderTask?) to define an abstract method doReadLogic()  that it would call,
and perhaps even finalize its doLogic() impl, and so all subclasses can
provide the things done between opening and closing the readers. This way,
the hits of Searcher can be made a protected class member, and therefore
available for subclasses to call. Things become trickier though for optional
task actions taking place while looping the results, but then similarly
SearchTask can finalize its doReadLogic() and call doHitsLogic() which it
will define and so forth...  Hmm... doHitsLogic() is similar to Shai's
suggestion of handleHits()...

Regarding the warm() method itself - that's why I wanted to make it
> extendable. I don't think the current warming is good.
>
> Regarding Hits traversal - hits are per query. So It seems reasonable to me
> to have SearchTask which is configured to either traverse hits or not. I
> don't quite see how we can separate it from Search itself, unless we store
> all TopDocs somewhere, for later processing. Also, decoupling the two seems
> wrong to me - if you measure search perf and you want to include hits
> traversal, then those two should be done like the app does it - one
> following the other. Not "run all searches first" then "traverse the hits of
> each". We'll run into other problems like matching hit traversal w/ a query,
> and summing up the times ...
>
> Instead, if SearchTask provides a handleHits() method which by default, if
> configured, loads the doc, you can extend SearchTask and provide your own
> handleHits() impl, which takes into account FS or a Higlighter. Or ... we
> can introduce a HitsProcessor/Handler class which can have different impls
> for Highlighting + FS + "whatever else you want"? We can even separate the
> two (loading fields and highlighting) and invoke the right one if the
> property is set.
>
> Shai
>
>
> On Tue, May 18, 2010 at 1:20 PM, Michael McCandless <
> lucene@mikemccandless.com> wrote:
>
>> I agree we should do some house cleaning here...
>>
>> Can't we make "warm", "search", "trav" separate tasks?
>>
>> In fact what is now done by "warm" (just calling .document on all
>> non-deleted docs) is not usually how warming is done.  I would rather
>> rename this to a "LoadAllDocsTask".  We could add other specific
>> warming tasks -- say PopulateFieldCacheTask.  But... typical warming
>> is just to run certain targeted searches on the newly opened reader.
>> I don't think the Search task should have any notion of warming
>> itself.
>>
>> I think "trav" should be renamed to something like "VisitHitsTask",
>> and it can take options like "number of hits to visit", "load Document
>> or not", "run highlighter or not", etc.  Really... ideally, these
>> would be separate tasks as well, and the alg would let me, say,
>> iterate over N hits, invoking the per-hit tasks that I'd like to do
>> for each.
>>
>> Mike
>>
>> On Tue, May 18, 2010 at 5:43 AM, Doron Cohen <cd...@gmail.com> wrote:
>> >> Yes, such algorithms will be affected, but not necessarily deleted. So
>> if
>> >> a WarmReader task is required, one can write it, but it doesn't need to
>> >> extend SearchTask, or it can, but hard-code all the other properties to
>> >> false. Though in most cases you can run SearchTask, w/ warm set to true
>> and
>> >> after warm has been done, queries will be executed.
>> >
>> > Say you want to keep allowing doing only warm and also allowing doing
>> search
>> > with warm. Since each opens a reader, forcing search with warm to be
>> made by
>> > two tasks would mean you either can only use the shared reader (in which
>> > case the reader will only be opened once) or each of them opens its own
>> > reader - and then you are opening two readers, which is a noise in the
>> perf
>> > run.
>> >
>> >> The other search lines look like they can be rewritten w/ rounds?
>> >
>> > Making the package weaker...
>> >
>> >> Anyway, this shows a perfect example for my argument: Search,
>> SearchTrav
>> >> and SearchTravRet only differ by their config options, yet an entire
>> class
>> >> had to be created.
>> >
>> >
>> > Extending to modify behavior is pretty common... in regular Java code
>> > anonymous classes would do it for you (no need to create that class in a
>> > file) just that in bm tasks are created by reflection, so a concrete
>> class
>> > is needed.
>> >
>> >> If such sequence Search tasks is so crucial to be allowed, we can have
>> >> SearchTask accept parameters, like Search(trav=false,warm=true) etc. in
>> >> addition to the static .alg ones. I think personally it's an overkill,
>> but
>> >> could be a nice addition. It will definitely allow the above tasks to
>> run in
>> >> sequence, right?
>> >
>> > I actually like this, but slightly different, see below.
>> >
>> >>
>> >> Point is, if you have 6 attributes, you don't need to create 64 classes
>> in
>> >> order to execute any combination you may need.
>> >
>> > Good point...
>> >
>> > On one hand it is readable and useful to have a concrete class for a
>> > concrete task.
>> > On the other hand it doesn't make sense to have too many combinations.
>> > But then who needs all of them? and the ones really needed can easily be
>> > created...
>> >
>> > It seems to me that the suggested change is affecting the simplicity of
>> > writing algs..?
>> >
>> > ... so, I like the suggestion Search(trav=false,warm=true), just that
>> I'd do
>> > it differently - Read(search,warm).
>> > (Read and not Search because of the above reader's consideration.)
>> > Putting the Read/Search away for now, you could modify PerfTask so that
>> > setParams() would set properties, where properties are delimited by ',',
>> and
>> > for each property, if it contains '=' it is interpreted as name=value,
>> > otherwise it is interpreted as name=true. With modification, an
>> algorithm
>> > having SearchTrav(1000) is modified to SearchTrav(ntrav=1000), all the
>> > params parsing moves to PerfTask, and each task class need only know the
>> > name of its properties, which should differ from those of its supers. So
>> > SearchTravTask's setParams would modify to something like this:
>> >
>> > {code}
>> >   public void setParams(String params) {
>> >     super.setParams(params);
>> >     traversalSize =
>> > (int)Float.parseFloat(getProperty("ntrav",DEFAULT_N_TRAV);
>> >   }
>> > {code}
>> >
>> > This would also allow to get rid of supportsParams() - unused params
>> would
>> > be silently ignored.
>> >
>> > What do you think?
>> >
>> >>
>> >> Shai
>> >>
>> >> On Tue, May 18, 2010 at 10:32 AM, Doron Cohen <cd...@gmail.com>
>> wrote:
>> >>>
>> >>> How would this affect for example current micro-standard.alg?
>> >>>
>> >>> In particular this part of it:
>> >>>
>> >>> {code}
>> >>>         ...
>> >>>         { "WarmNewRdr" Warm > : 50
>> >>>         { "SrchNewRdr" Search > : 500
>> >>>         { "SrchTrvNewRdr" SearchTrav(1000) > : 300
>> >>>         { "SrchTrvRetNewRdr" SearchTravRet(2000) > : 100
>> >>>         ...
>> >>> {code}
>> >>>
>> >>> Proposed change gets rid of these tasks, right?
>> >>>
>> >>> Doron
>> >>>
>> >>> On Tue, May 18, 2010 at 10:06 AM, Shai Erera <se...@gmail.com>
>> wrote:
>> >>>>
>> >>>> Hi
>> >>>>
>> >>>> I wanted to run a benchmark .alg which will take a Filter into
>> account.
>> >>>> However, ReadTask, which is the base for a variety of search related
>> tasks,
>> >>>> does not support a Filter. When I reviewed the class, to understand
>> how I
>> >>>> can easily add such Filter support, I discovered a whole set of
>> classes
>> >>>> which IMO are completely unnecessary. ReadTask defines some with*()
>> methods,
>> >>>> such as withSearch, withTraverse etc. and many classes override
>> ReadTask
>> >>>> just to return true/false in those methods. WarmTask for example,
>> returns
>> >>>> true in withWarm() and false otherwise, while SearchTask returns true
>> in
>> >>>> withSearch and false otherwise.
>> >>>>
>> >>>> This created a whole set of extensions that you either need to run in
>> >>>> sequence (e.g. Warm, SearchWithCollector) or create your own
>> extension just
>> >>>> to get the right recipe for the operations to perform.
>> >>>>
>> >>>> I suggest we do the following changes:
>> >>>> * Rename ReadTask to SearchTask -- that's because RT uses
>> IndexSearcher,
>> >>>> QueryMaker -- all that suggests it's about Searching and not Reading.
>> It's
>> >>>> only semantics, I know, but I think SearchTask is clearer than
>> ReadTask
>> >>>> * Get rid of all the with*() methods, and instead move to use
>> >>>> properties: search.with.warm, search.with.traverse,
>> search.with.collector
>> >>>> etc.
>> >>>> * Introduce protected createCollector, createFilter, createSort, for
>> >>>> custom extensions
>> >>>> * Create a completely new hierarchy for this task, throwing away
>> >>>> everything that can be handled through properties only (like
>> SearchTask,
>> >>>> WarmTask etc.)
>> >>>>
>> >>>> If we do this, then extensions of the new SearchTask will need to ask
>> >>>> themselves "do I want to search w/ a Collector/Filter/custom Sort?"
>> and not
>> >>>> "do I Warm to be executed?" The core operation behind this task is
>> >>>> IndexSearcher.search. The rest are just settings, or configuration,
>> as well
>> >>>> as some added ops like warm, and traverse. If it makes sense, I can
>> factor
>> >>>> warm() and traverse() into their own protected methods, for
>> extensions to
>> >>>> override as well. It might make sense for warm because custom warms
>> is
>> >>>> something I'm sure will be needed.
>> >>>>
>> >>>> This will also allow running algorithms with rounds - different
>> >>>> properties for different rounds.
>> >>>>
>> >>>> This approach does not prevent one from creating MySearchTask with
>> >>>> pre-defined and hard-coded settings. But for many others, the
>> question of
>> >>>> which task to execute will go away - you execute SearchTask for the
>> basic
>> >>>> search operations, or w/ the default Collector/Sort, and you control
>> it via
>> >>>> properties. To create your own *SearchTask extension which hard-codes
>> a
>> >>>> recipe, you'll need access to all the do<OP> members, so I'll make
>> them
>> >>>> protected. But that's IMO is a rare requirement, than say running a
>> search
>> >>>> with warm + traverse, and you shouldn't be forced to create a
>> ReadTask
>> >>>> extension for that.
>> >>>>
>> >>>> What do you think?
>> >>>>
>> >>>> Shai
>> >>>
>> >>
>> >
>> >
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: dev-help@lucene.apache.org
>>
>>
>

Re: ReadTask and its hierarchy needs some house cleaning

Posted by Shai Erera <se...@gmail.com>.

Doron, I like the idea of using params as properties, but I'm not sure I
want to make it generic for all tasks. Some tasks, like AddDoc, receive a
size as parameter. Moving from AddDoc(100) to AddDoc(size=100) seems
redundant to me, although it's not "the end of the world" if we'll do that.

In addition, I don't understand what did you mean by Read/Search differences
and sharing IndexReaders ... It seems like IndexReader is reused, if you
call getIndexReader(), same like getIndexWriter. Therefore I'm not sure the
name ReadTask is related to that at all.

About SearchTrav - this should go away, and trav become a property?

Mike - I like separating Warm from Search. I don't think it's related. I can
do Warm once, then many searches.

Regarding the warm() method itself - that's why I wanted to make it
extendable. I don't think the current warming is good.

Regarding Hits traversal - hits are per query. So It seems reasonable to me
to have SearchTask which is configured to either traverse hits or not. I
don't quite see how we can separate it from Search itself, unless we store
all TopDocs somewhere, for later processing. Also, decoupling the two seems
wrong to me - if you measure search perf and you want to include hits
traversal, then those two should be done like the app does it - one
following the other. Not "run all searches first" then "traverse the hits of
each". We'll run into other problems like matching hit traversal w/ a query,
and summing up the times ...

Instead, if SearchTask provides a handleHits() method which by default, if
configured, loads the doc, you can extend SearchTask and provide your own
handleHits() impl, which takes into account FS or a Higlighter. Or ... we
can introduce a HitsProcessor/Handler class which can have different impls
for Highlighting + FS + "whatever else you want"? We can even separate the
two (loading fields and highlighting) and invoke the right one if the
property is set.

Shai

On Tue, May 18, 2010 at 1:20 PM, Michael McCandless <
lucene@mikemccandless.com> wrote:

> I agree we should do some house cleaning here...
>
> Can't we make "warm", "search", "trav" separate tasks?
>
> In fact what is now done by "warm" (just calling .document on all
> non-deleted docs) is not usually how warming is done.  I would rather
> rename this to a "LoadAllDocsTask".  We could add other specific
> warming tasks -- say PopulateFieldCacheTask.  But... typical warming
> is just to run certain targeted searches on the newly opened reader.
> I don't think the Search task should have any notion of warming
> itself.
>
> I think "trav" should be renamed to something like "VisitHitsTask",
> and it can take options like "number of hits to visit", "load Document
> or not", "run highlighter or not", etc.  Really... ideally, these
> would be separate tasks as well, and the alg would let me, say,
> iterate over N hits, invoking the per-hit tasks that I'd like to do
> for each.
>
> Mike
>
> On Tue, May 18, 2010 at 5:43 AM, Doron Cohen <cd...@gmail.com> wrote:
> >> Yes, such algorithms will be affected, but not necessarily deleted. So
> if
> >> a WarmReader task is required, one can write it, but it doesn't need to
> >> extend SearchTask, or it can, but hard-code all the other properties to
> >> false. Though in most cases you can run SearchTask, w/ warm set to true
> and
> >> after warm has been done, queries will be executed.
> >
> > Say you want to keep allowing doing only warm and also allowing doing
> search
> > with warm. Since each opens a reader, forcing search with warm to be made
> by
> > two tasks would mean you either can only use the shared reader (in which
> > case the reader will only be opened once) or each of them opens its own
> > reader - and then you are opening two readers, which is a noise in the
> perf
> > run.
> >
> >> The other search lines look like they can be rewritten w/ rounds?
> >
> > Making the package weaker...
> >
> >> Anyway, this shows a perfect example for my argument: Search, SearchTrav
> >> and SearchTravRet only differ by their config options, yet an entire
> class
> >> had to be created.
> >
> >
> > Extending to modify behavior is pretty common... in regular Java code
> > anonymous classes would do it for you (no need to create that class in a
> > file) just that in bm tasks are created by reflection, so a concrete
> class
> > is needed.
> >
> >> If such sequence Search tasks is so crucial to be allowed, we can have
> >> SearchTask accept parameters, like Search(trav=false,warm=true) etc. in
> >> addition to the static .alg ones. I think personally it's an overkill,
> but
> >> could be a nice addition. It will definitely allow the above tasks to
> run in
> >> sequence, right?
> >
> > I actually like this, but slightly different, see below.
> >
> >>
> >> Point is, if you have 6 attributes, you don't need to create 64 classes
> in
> >> order to execute any combination you may need.
> >
> > Good point...
> >
> > On one hand it is readable and useful to have a concrete class for a
> > concrete task.
> > On the other hand it doesn't make sense to have too many combinations.
> > But then who needs all of them? and the ones really needed can easily be
> > created...
> >
> > It seems to me that the suggested change is affecting the simplicity of
> > writing algs..?
> >
> > ... so, I like the suggestion Search(trav=false,warm=true), just that I'd
> do
> > it differently - Read(search,warm).
> > (Read and not Search because of the above reader's consideration.)
> > Putting the Read/Search away for now, you could modify PerfTask so that
> > setParams() would set properties, where properties are delimited by ',',
> and
> > for each property, if it contains '=' it is interpreted as name=value,
> > otherwise it is interpreted as name=true. With modification, an algorithm
> > having SearchTrav(1000) is modified to SearchTrav(ntrav=1000), all the
> > params parsing moves to PerfTask, and each task class need only know the
> > name of its properties, which should differ from those of its supers. So
> > SearchTravTask's setParams would modify to something like this:
> >
> > {code}
> >   public void setParams(String params) {
> >     super.setParams(params);
> >     traversalSize =
> > (int)Float.parseFloat(getProperty("ntrav",DEFAULT_N_TRAV);
> >   }
> > {code}
> >
> > This would also allow to get rid of supportsParams() - unused params
> would
> > be silently ignored.
> >
> > What do you think?
> >
> >>
> >> Shai
> >>
> >> On Tue, May 18, 2010 at 10:32 AM, Doron Cohen <cd...@gmail.com>
> wrote:
> >>>
> >>> How would this affect for example current micro-standard.alg?
> >>>
> >>> In particular this part of it:
> >>>
> >>> {code}
> >>>         ...
> >>>         { "WarmNewRdr" Warm > : 50
> >>>         { "SrchNewRdr" Search > : 500
> >>>         { "SrchTrvNewRdr" SearchTrav(1000) > : 300
> >>>         { "SrchTrvRetNewRdr" SearchTravRet(2000) > : 100
> >>>         ...
> >>> {code}
> >>>
> >>> Proposed change gets rid of these tasks, right?
> >>>
> >>> Doron
> >>>
> >>> On Tue, May 18, 2010 at 10:06 AM, Shai Erera <se...@gmail.com> wrote:
> >>>>
> >>>> Hi
> >>>>
> >>>> I wanted to run a benchmark .alg which will take a Filter into
> account.
> >>>> However, ReadTask, which is the base for a variety of search related
> tasks,
> >>>> does not support a Filter. When I reviewed the class, to understand
> how I
> >>>> can easily add such Filter support, I discovered a whole set of
> classes
> >>>> which IMO are completely unnecessary. ReadTask defines some with*()
> methods,
> >>>> such as withSearch, withTraverse etc. and many classes override
> ReadTask
> >>>> just to return true/false in those methods. WarmTask for example,
> returns
> >>>> true in withWarm() and false otherwise, while SearchTask returns true
> in
> >>>> withSearch and false otherwise.
> >>>>
> >>>> This created a whole set of extensions that you either need to run in
> >>>> sequence (e.g. Warm, SearchWithCollector) or create your own extension
> just
> >>>> to get the right recipe for the operations to perform.
> >>>>
> >>>> I suggest we do the following changes:
> >>>> * Rename ReadTask to SearchTask -- that's because RT uses
> IndexSearcher,
> >>>> QueryMaker -- all that suggests it's about Searching and not Reading.
> It's
> >>>> only semantics, I know, but I think SearchTask is clearer than
> ReadTask
> >>>> * Get rid of all the with*() methods, and instead move to use
> >>>> properties: search.with.warm, search.with.traverse,
> search.with.collector
> >>>> etc.
> >>>> * Introduce protected createCollector, createFilter, createSort, for
> >>>> custom extensions
> >>>> * Create a completely new hierarchy for this task, throwing away
> >>>> everything that can be handled through properties only (like
> SearchTask,
> >>>> WarmTask etc.)
> >>>>
> >>>> If we do this, then extensions of the new SearchTask will need to ask
> >>>> themselves "do I want to search w/ a Collector/Filter/custom Sort?"
> and not
> >>>> "do I Warm to be executed?" The core operation behind this task is
> >>>> IndexSearcher.search. The rest are just settings, or configuration, as
> well
> >>>> as some added ops like warm, and traverse. If it makes sense, I can
> factor
> >>>> warm() and traverse() into their own protected methods, for extensions
> to
> >>>> override as well. It might make sense for warm because custom warms is
> >>>> something I'm sure will be needed.
> >>>>
> >>>> This will also allow running algorithms with rounds - different
> >>>> properties for different rounds.
> >>>>
> >>>> This approach does not prevent one from creating MySearchTask with
> >>>> pre-defined and hard-coded settings. But for many others, the question
> of
> >>>> which task to execute will go away - you execute SearchTask for the
> basic
> >>>> search operations, or w/ the default Collector/Sort, and you control
> it via
> >>>> properties. To create your own *SearchTask extension which hard-codes
> a
> >>>> recipe, you'll need access to all the do<OP> members, so I'll make
> them
> >>>> protected. But that's IMO is a rare requirement, than say running a
> search
> >>>> with warm + traverse, and you shouldn't be forced to create a ReadTask
> >>>> extension for that.
> >>>>
> >>>> What do you think?
> >>>>
> >>>> Shai
> >>>
> >>
> >
> >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: dev-help@lucene.apache.org
>
>

Re: ReadTask and its hierarchy needs some house cleaning

Posted by Michael McCandless <lu...@mikemccandless.com>.

I agree we should do some house cleaning here...

Can't we make "warm", "search", "trav" separate tasks?

In fact what is now done by "warm" (just calling .document on all
non-deleted docs) is not usually how warming is done.  I would rather
rename this to a "LoadAllDocsTask".  We could add other specific
warming tasks -- say PopulateFieldCacheTask.  But... typical warming
is just to run certain targeted searches on the newly opened reader.
I don't think the Search task should have any notion of warming
itself.

I think "trav" should be renamed to something like "VisitHitsTask",
and it can take options like "number of hits to visit", "load Document
or not", "run highlighter or not", etc.  Really... ideally, these
would be separate tasks as well, and the alg would let me, say,
iterate over N hits, invoking the per-hit tasks that I'd like to do
for each.

Mike

On Tue, May 18, 2010 at 5:43 AM, Doron Cohen <cd...@gmail.com> wrote:
>> Yes, such algorithms will be affected, but not necessarily deleted. So if
>> a WarmReader task is required, one can write it, but it doesn't need to
>> extend SearchTask, or it can, but hard-code all the other properties to
>> false. Though in most cases you can run SearchTask, w/ warm set to true and
>> after warm has been done, queries will be executed.
>
> Say you want to keep allowing doing only warm and also allowing doing search
> with warm. Since each opens a reader, forcing search with warm to be made by
> two tasks would mean you either can only use the shared reader (in which
> case the reader will only be opened once) or each of them opens its own
> reader - and then you are opening two readers, which is a noise in the perf
> run.
>
>> The other search lines look like they can be rewritten w/ rounds?
>
> Making the package weaker...
>
>> Anyway, this shows a perfect example for my argument: Search, SearchTrav
>> and SearchTravRet only differ by their config options, yet an entire class
>> had to be created.
>
>
> Extending to modify behavior is pretty common... in regular Java code
> anonymous classes would do it for you (no need to create that class in a
> file) just that in bm tasks are created by reflection, so a concrete class
> is needed.
>
>> If such sequence Search tasks is so crucial to be allowed, we can have
>> SearchTask accept parameters, like Search(trav=false,warm=true) etc. in
>> addition to the static .alg ones. I think personally it's an overkill, but
>> could be a nice addition. It will definitely allow the above tasks to run in
>> sequence, right?
>
> I actually like this, but slightly different, see below.
>
>>
>> Point is, if you have 6 attributes, you don't need to create 64 classes in
>> order to execute any combination you may need.
>
> Good point...
>
> On one hand it is readable and useful to have a concrete class for a
> concrete task.
> On the other hand it doesn't make sense to have too many combinations.
> But then who needs all of them? and the ones really needed can easily be
> created...
>
> It seems to me that the suggested change is affecting the simplicity of
> writing algs..?
>
> ... so, I like the suggestion Search(trav=false,warm=true), just that I'd do
> it differently - Read(search,warm).
> (Read and not Search because of the above reader's consideration.)
> Putting the Read/Search away for now, you could modify PerfTask so that
> setParams() would set properties, where properties are delimited by ',', and
> for each property, if it contains '=' it is interpreted as name=value,
> otherwise it is interpreted as name=true. With modification, an algorithm
> having SearchTrav(1000) is modified to SearchTrav(ntrav=1000), all the
> params parsing moves to PerfTask, and each task class need only know the
> name of its properties, which should differ from those of its supers. So
> SearchTravTask's setParams would modify to something like this:
>
> {code}
>   public void setParams(String params) {
>     super.setParams(params);
>     traversalSize =
> (int)Float.parseFloat(getProperty("ntrav",DEFAULT_N_TRAV);
>   }
> {code}
>
> This would also allow to get rid of supportsParams() - unused params would
> be silently ignored.
>
> What do you think?
>
>>
>> Shai
>>
>> On Tue, May 18, 2010 at 10:32 AM, Doron Cohen <cd...@gmail.com> wrote:
>>>
>>> How would this affect for example current micro-standard.alg?
>>>
>>> In particular this part of it:
>>>
>>> {code}
>>>         ...
>>>         { "WarmNewRdr" Warm > : 50
>>>         { "SrchNewRdr" Search > : 500
>>>         { "SrchTrvNewRdr" SearchTrav(1000) > : 300
>>>         { "SrchTrvRetNewRdr" SearchTravRet(2000) > : 100
>>>         ...
>>> {code}
>>>
>>> Proposed change gets rid of these tasks, right?
>>>
>>> Doron
>>>
>>> On Tue, May 18, 2010 at 10:06 AM, Shai Erera <se...@gmail.com> wrote:
>>>>
>>>> Hi
>>>>
>>>> I wanted to run a benchmark .alg which will take a Filter into account.
>>>> However, ReadTask, which is the base for a variety of search related tasks,
>>>> does not support a Filter. When I reviewed the class, to understand how I
>>>> can easily add such Filter support, I discovered a whole set of classes
>>>> which IMO are completely unnecessary. ReadTask defines some with*() methods,
>>>> such as withSearch, withTraverse etc. and many classes override ReadTask
>>>> just to return true/false in those methods. WarmTask for example, returns
>>>> true in withWarm() and false otherwise, while SearchTask returns true in
>>>> withSearch and false otherwise.
>>>>
>>>> This created a whole set of extensions that you either need to run in
>>>> sequence (e.g. Warm, SearchWithCollector) or create your own extension just
>>>> to get the right recipe for the operations to perform.
>>>>
>>>> I suggest we do the following changes:
>>>> * Rename ReadTask to SearchTask -- that's because RT uses IndexSearcher,
>>>> QueryMaker -- all that suggests it's about Searching and not Reading. It's
>>>> only semantics, I know, but I think SearchTask is clearer than ReadTask
>>>> * Get rid of all the with*() methods, and instead move to use
>>>> properties: search.with.warm, search.with.traverse, search.with.collector
>>>> etc.
>>>> * Introduce protected createCollector, createFilter, createSort, for
>>>> custom extensions
>>>> * Create a completely new hierarchy for this task, throwing away
>>>> everything that can be handled through properties only (like SearchTask,
>>>> WarmTask etc.)
>>>>
>>>> If we do this, then extensions of the new SearchTask will need to ask
>>>> themselves "do I want to search w/ a Collector/Filter/custom Sort?" and not
>>>> "do I Warm to be executed?" The core operation behind this task is
>>>> IndexSearcher.search. The rest are just settings, or configuration, as well
>>>> as some added ops like warm, and traverse. If it makes sense, I can factor
>>>> warm() and traverse() into their own protected methods, for extensions to
>>>> override as well. It might make sense for warm because custom warms is
>>>> something I'm sure will be needed.
>>>>
>>>> This will also allow running algorithms with rounds - different
>>>> properties for different rounds.
>>>>
>>>> This approach does not prevent one from creating MySearchTask with
>>>> pre-defined and hard-coded settings. But for many others, the question of
>>>> which task to execute will go away - you execute SearchTask for the basic
>>>> search operations, or w/ the default Collector/Sort, and you control it via
>>>> properties. To create your own *SearchTask extension which hard-codes a
>>>> recipe, you'll need access to all the do<OP> members, so I'll make them
>>>> protected. But that's IMO is a rare requirement, than say running a search
>>>> with warm + traverse, and you shouldn't be forced to create a ReadTask
>>>> extension for that.
>>>>
>>>> What do you think?
>>>>
>>>> Shai
>>>
>>
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

Re: ReadTask and its hierarchy needs some house cleaning

Posted by Doron Cohen <cd...@gmail.com>.

>
> Yes, such algorithms will be affected, but not necessarily deleted. So if a
> WarmReader task is required, one can write it, but it doesn't need to extend
> SearchTask, or it can, but hard-code all the other properties to false.
> Though in most cases you can run SearchTask, w/ warm set to true and after
> warm has been done, queries will be executed.
>

Say you want to keep allowing doing only warm and also allowing doing search
with warm. Since each opens a reader, forcing search with warm to be made by
two tasks would mean you either can only use the shared reader (in which
case the reader will only be opened once) or each of them opens its own
reader - and then you are opening two readers, which is a noise in the perf
run.

The other search lines look like they can be rewritten w/ rounds?
>

Making the package weaker...

Anyway, this shows a perfect example for my argument: Search, SearchTrav and
> SearchTravRet only differ by their config options, yet an entire class had
> to be created.
>

Extending to modify behavior is pretty common... in regular Java code
anonymous classes would do it for you (no need to create that class in a
file) just that in bm tasks are created by reflection, so a concrete class
is needed.

If such sequence Search tasks is so crucial to be allowed, we can have
> SearchTask accept parameters, like Search(trav=false,warm=true) etc. in
> addition to the static .alg ones. I think personally it's an overkill, but
> could be a nice addition. It will definitely allow the above tasks to run in
> sequence, right?
>

I actually like this, but slightly different, see below.


> Point is, if you have 6 attributes, you don't need to create 64 classes in
> order to execute any combination you may need.
>

Good point...

On one hand it is readable and useful to have a concrete class for a
concrete task.
On the other hand it doesn't make sense to have too many combinations.
But then who needs all of them? and the ones really needed can easily be
created...

It seems to me that the suggested change is affecting the simplicity of
writing algs..?

... so, I like the suggestion Search(trav=false,warm=true), just that I'd do
it differently - Read(search,warm).
(Read and not Search because of the above reader's consideration.)
Putting the Read/Search away for now, you could modify PerfTask so that
setParams() would set properties, where properties are delimited by ',', and
for each property, if it contains '=' it is interpreted as name=value,
otherwise it is interpreted as name=true. With modification, an algorithm
having SearchTrav(1000) is modified to SearchTrav(ntrav=1000), all the
params parsing moves to PerfTask, and each task class need only know the
name of its properties, which should differ from those of its supers. So
SearchTravTask's setParams would modify to something like this:

{code}
  public void setParams(String params) {
    super.setParams(params);
    traversalSize =
(int)Float.parseFloat(getProperty("ntrav",DEFAULT_N_TRAV);
  }
{code}

This would also allow to get rid of supportsParams() - unused params would
be silently ignored.

What do you think?


> Shai
>
>
> On Tue, May 18, 2010 at 10:32 AM, Doron Cohen <cd...@gmail.com> wrote:
>
>> How would this affect for example current micro-standard.alg?
>>
>> In particular this part of it:
>>
>> {code}
>>         ...
>>         { "WarmNewRdr" Warm > : 50
>>         { "SrchNewRdr" Search > : 500
>>         { "SrchTrvNewRdr" SearchTrav(1000) > : 300
>>         { "SrchTrvRetNewRdr" SearchTravRet(2000) > : 100
>>         ...
>> {code}
>>
>> Proposed change gets rid of these tasks, right?
>>
>> Doron
>>
>>
>> On Tue, May 18, 2010 at 10:06 AM, Shai Erera <se...@gmail.com> wrote:
>>
>>> Hi
>>>
>>> I wanted to run a benchmark .alg which will take a Filter into account.
>>> However, ReadTask, which is the base for a variety of search related tasks,
>>> does not support a Filter. When I reviewed the class, to understand how I
>>> can easily add such Filter support, I discovered a whole set of classes
>>> which IMO are completely unnecessary. ReadTask defines some with*() methods,
>>> such as withSearch, withTraverse etc. and many classes override ReadTask
>>> just to return true/false in those methods. WarmTask for example, returns
>>> true in withWarm() and false otherwise, while SearchTask returns true in
>>> withSearch and false otherwise.
>>>
>>> This created a whole set of extensions that you either need to run in
>>> sequence (e.g. Warm, SearchWithCollector) or create your own extension just
>>> to get the right recipe for the operations to perform.
>>>
>>> I suggest we do the following changes:
>>> * Rename ReadTask to SearchTask -- that's because RT uses IndexSearcher,
>>> QueryMaker -- all that suggests it's about Searching and not Reading. It's
>>> only semantics, I know, but I think SearchTask is clearer than ReadTask
>>> * Get rid of all the with*() methods, and instead move to use properties:
>>> search.with.warm, search.with.traverse, search.with.collector etc.
>>> * Introduce protected createCollector, createFilter, createSort, for
>>> custom extensions
>>> * Create a completely new hierarchy for this task, throwing away
>>> everything that can be handled through properties only (like SearchTask,
>>> WarmTask etc.)
>>>
>>> If we do this, then extensions of the new SearchTask will need to ask
>>> themselves "do I want to search w/ a Collector/Filter/custom Sort?" and not
>>> "do I Warm to be executed?" The core operation behind this task is
>>> IndexSearcher.search. The rest are just settings, or configuration, as well
>>> as some added ops like warm, and traverse. If it makes sense, I can factor
>>> warm() and traverse() into their own protected methods, for extensions to
>>> override as well. It might make sense for warm because custom warms is
>>> something I'm sure will be needed.
>>>
>>> This will also allow running algorithms with rounds - different
>>> properties for different rounds.
>>>
>>> This approach does not prevent one from creating MySearchTask with
>>> pre-defined and hard-coded settings. But for many others, the question of
>>> which task to execute will go away - you execute SearchTask for the basic
>>> search operations, or w/ the default Collector/Sort, and you control it via
>>> properties. To create your own *SearchTask extension which hard-codes a
>>> recipe, you'll need access to all the do<OP> members, so I'll make them
>>> protected. But that's IMO is a rare requirement, than say running a search
>>> with warm + traverse, and you shouldn't be forced to create a ReadTask
>>> extension for that.
>>>
>>> What do you think?
>>>
>>> Shai
>>>
>>
>>
>

Re: ReadTask and its hierarchy needs some house cleaning

Posted by Shai Erera <se...@gmail.com>.

Yes, such algorithms will be affected, but not necessarily deleted. So if a
WarmReader task is required, one can write it, but it doesn't need to extend
SearchTask, or it can, but hard-code all the other properties to false.
Though in most cases you can run SearchTask, w/ warm set to true and after
warm has been done, queries will be executed.

The other search lines look like they can be rewritten w/ rounds?

Anyway, this shows a perfect example for my argument: Search, SearchTrav and
SearchTravRet only differ by their config options, yet an entire class had
to be created.

If such sequence Search tasks is so crucial to be allowed, we can have
SearchTask accept parameters, like Search(trav=false,warm=true) etc. in
addition to the static .alg ones. I think personally it's an overkill, but
could be a nice addition. It will definitely allow the above tasks to run in
sequence, right?

Point is, if you have 6 attributes, you don't need to create 64 classes in
order to execute any combination you may need.

Shai

On Tue, May 18, 2010 at 10:32 AM, Doron Cohen <cd...@gmail.com> wrote:

> How would this affect for example current micro-standard.alg?
>
> In particular this part of it:
>
> {code}
>         ...
>         { "WarmNewRdr" Warm > : 50
>         { "SrchNewRdr" Search > : 500
>         { "SrchTrvNewRdr" SearchTrav(1000) > : 300
>         { "SrchTrvRetNewRdr" SearchTravRet(2000) > : 100
>         ...
> {code}
>
> Proposed change gets rid of these tasks, right?
>
> Doron
>
>
> On Tue, May 18, 2010 at 10:06 AM, Shai Erera <se...@gmail.com> wrote:
>
>> Hi
>>
>> I wanted to run a benchmark .alg which will take a Filter into account.
>> However, ReadTask, which is the base for a variety of search related tasks,
>> does not support a Filter. When I reviewed the class, to understand how I
>> can easily add such Filter support, I discovered a whole set of classes
>> which IMO are completely unnecessary. ReadTask defines some with*() methods,
>> such as withSearch, withTraverse etc. and many classes override ReadTask
>> just to return true/false in those methods. WarmTask for example, returns
>> true in withWarm() and false otherwise, while SearchTask returns true in
>> withSearch and false otherwise.
>>
>> This created a whole set of extensions that you either need to run in
>> sequence (e.g. Warm, SearchWithCollector) or create your own extension just
>> to get the right recipe for the operations to perform.
>>
>> I suggest we do the following changes:
>> * Rename ReadTask to SearchTask -- that's because RT uses IndexSearcher,
>> QueryMaker -- all that suggests it's about Searching and not Reading. It's
>> only semantics, I know, but I think SearchTask is clearer than ReadTask
>> * Get rid of all the with*() methods, and instead move to use properties:
>> search.with.warm, search.with.traverse, search.with.collector etc.
>> * Introduce protected createCollector, createFilter, createSort, for
>> custom extensions
>> * Create a completely new hierarchy for this task, throwing away
>> everything that can be handled through properties only (like SearchTask,
>> WarmTask etc.)
>>
>> If we do this, then extensions of the new SearchTask will need to ask
>> themselves "do I want to search w/ a Collector/Filter/custom Sort?" and not
>> "do I Warm to be executed?" The core operation behind this task is
>> IndexSearcher.search. The rest are just settings, or configuration, as well
>> as some added ops like warm, and traverse. If it makes sense, I can factor
>> warm() and traverse() into their own protected methods, for extensions to
>> override as well. It might make sense for warm because custom warms is
>> something I'm sure will be needed.
>>
>> This will also allow running algorithms with rounds - different properties
>> for different rounds.
>>
>> This approach does not prevent one from creating MySearchTask with
>> pre-defined and hard-coded settings. But for many others, the question of
>> which task to execute will go away - you execute SearchTask for the basic
>> search operations, or w/ the default Collector/Sort, and you control it via
>> properties. To create your own *SearchTask extension which hard-codes a
>> recipe, you'll need access to all the do<OP> members, so I'll make them
>> protected. But that's IMO is a rare requirement, than say running a search
>> with warm + traverse, and you shouldn't be forced to create a ReadTask
>> extension for that.
>>
>> What do you think?
>>
>> Shai
>>
>
>

Re: ReadTask and its hierarchy needs some house cleaning

Posted by Doron Cohen <cd...@gmail.com>.

How would this affect for example current micro-standard.alg?

In particular this part of it:

{code}
        ...
        { "WarmNewRdr" Warm > : 50
        { "SrchNewRdr" Search > : 500
        { "SrchTrvNewRdr" SearchTrav(1000) > : 300
        { "SrchTrvRetNewRdr" SearchTravRet(2000) > : 100
        ...
{code}

Proposed change gets rid of these tasks, right?

Doron

On Tue, May 18, 2010 at 10:06 AM, Shai Erera <se...@gmail.com> wrote:

> Hi
>
> I wanted to run a benchmark .alg which will take a Filter into account.
> However, ReadTask, which is the base for a variety of search related tasks,
> does not support a Filter. When I reviewed the class, to understand how I
> can easily add such Filter support, I discovered a whole set of classes
> which IMO are completely unnecessary. ReadTask defines some with*() methods,
> such as withSearch, withTraverse etc. and many classes override ReadTask
> just to return true/false in those methods. WarmTask for example, returns
> true in withWarm() and false otherwise, while SearchTask returns true in
> withSearch and false otherwise.
>
> This created a whole set of extensions that you either need to run in
> sequence (e.g. Warm, SearchWithCollector) or create your own extension just
> to get the right recipe for the operations to perform.
>
> I suggest we do the following changes:
> * Rename ReadTask to SearchTask -- that's because RT uses IndexSearcher,
> QueryMaker -- all that suggests it's about Searching and not Reading. It's
> only semantics, I know, but I think SearchTask is clearer than ReadTask
> * Get rid of all the with*() methods, and instead move to use properties:
> search.with.warm, search.with.traverse, search.with.collector etc.
> * Introduce protected createCollector, createFilter, createSort, for custom
> extensions
> * Create a completely new hierarchy for this task, throwing away everything
> that can be handled through properties only (like SearchTask, WarmTask etc.)
>
> If we do this, then extensions of the new SearchTask will need to ask
> themselves "do I want to search w/ a Collector/Filter/custom Sort?" and not
> "do I Warm to be executed?" The core operation behind this task is
> IndexSearcher.search. The rest are just settings, or configuration, as well
> as some added ops like warm, and traverse. If it makes sense, I can factor
> warm() and traverse() into their own protected methods, for extensions to
> override as well. It might make sense for warm because custom warms is
> something I'm sure will be needed.
>
> This will also allow running algorithms with rounds - different properties
> for different rounds.
>
> This approach does not prevent one from creating MySearchTask with
> pre-defined and hard-coded settings. But for many others, the question of
> which task to execute will go away - you execute SearchTask for the basic
> search operations, or w/ the default Collector/Sort, and you control it via
> properties. To create your own *SearchTask extension which hard-codes a
> recipe, you'll need access to all the do<OP> members, so I'll make them
> protected. But that's IMO is a rare requirement, than say running a search
> with warm + traverse, and you shouldn't be forced to create a ReadTask
> extension for that.
>
> What do you think?
>
> Shai
>