Posted to dev@lucene.apache.org by John Wang <jo...@gmail.com> on 2009/04/21 02:36:25 UTC

Re: [jira] Issue Comment Edited: (LUCENE-1536) if a filter can support random access API, we should use it

Maybe I am not understanding the patch, but isn't casting the result of
Filter.getDocIdSet to OpenBitSet kinda dangerous? And isn't assuming that a
Filter constructs a bitset something we want to move away from?
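
To make the concern concrete, here is a rough sketch of the alternative to an
unchecked cast: test for random access and fall back to iteration otherwise.
Every type and method name below (DocIdSet, RandomAccessDocIdSet, BitDocIdSet,
SortedArrayDocIdSet, countAccepted) is a simplified stand-in made up for
illustration. None of it is the real Lucene API or the code in the patch, and
java.util.BitSet only plays the role of OpenBitSet.

```java
import java.util.BitSet;
import java.util.PrimitiveIterator;
import java.util.stream.IntStream;

// Hypothetical stand-ins only; these are NOT the Lucene classes.
final class FilterCastSketch {

    /** Stand-in for a doc id set: iteration is the only guaranteed access. */
    interface DocIdSet {
        PrimitiveIterator.OfInt iterator(); // doc ids in ascending order
    }

    /** Assumed optional capability: constant-time lookup by doc id. */
    interface RandomAccessDocIdSet extends DocIdSet {
        boolean get(int docId);
    }

    /** Bitset-backed set, playing the role of OpenBitSet. */
    static final class BitDocIdSet implements RandomAccessDocIdSet {
        private final BitSet bits;
        BitDocIdSet(BitSet bits) { this.bits = bits; }
        public boolean get(int docId) { return bits.get(docId); }
        public PrimitiveIterator.OfInt iterator() { return bits.stream().iterator(); }
    }

    /** Iterator-only set over sorted doc ids; there is no bitset behind it. */
    static final class SortedArrayDocIdSet implements DocIdSet {
        private final int[] docs;
        SortedArrayDocIdSet(int... docs) { this.docs = docs; }
        public PrimitiveIterator.OfInt iterator() { return IntStream.of(docs).iterator(); }
    }

    /**
     * Counts how many of the ascending candidate doc ids the filter accepts.
     * Uses random access only when the set advertises it, instead of casting
     * blindly and failing with ClassCastException on iterator-only sets.
     */
    static int countAccepted(int[] candidates, DocIdSet filter) {
        int count = 0;
        if (filter instanceof RandomAccessDocIdSet) {
            RandomAccessDocIdSet ra = (RandomAccessDocIdSet) filter;
            for (int doc : candidates) {
                if (ra.get(doc)) count++;
            }
            return count;
        }
        // Fallback: advance the filter's iterator alongside the candidates.
        PrimitiveIterator.OfInt it = filter.iterator();
        boolean has = it.hasNext();
        int current = has ? it.nextInt() : -1;
        for (int doc : candidates) {
            while (has && current < doc) {
                has = it.hasNext();
                if (has) current = it.nextInt();
            }
            if (!has) break;          // filter exhausted, nothing more can match
            if (current == doc) count++;
        }
        return count;
    }
}
```

With a guard like this, a filter that is not bitset-backed still works; it
just takes the slower iterator path.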

-John


On Mon, Apr 20, 2009 at 4:27 PM, Jason Rutherglen (JIRA) <ji...@apache.org> wrote:

>
>    [https://issues.apache.org/jira/browse/LUCENE-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12700984#action_12700984]
>
> Jason Rutherglen edited comment on LUCENE-1536 at 4/20/09 4:26 PM:
> -------------------------------------------------------------------
>
> Perhaps we can go ahead with this patch given we're not sure how
> to do an optimized version of LUCENE-1518 yet. This patch
> entails passing the RandomAccessFilter to TermScorer, what's a
> good way to do this without rewriting too much of the Lucene API?
>
> * TermQuery.createWeight -> TermWeight.scorer instantiates the
> TermScorer which is where we need to pass in the filter? So we
> could somehow pass the filter in via multiple constructors? I
> didn't see a clean API way though.
>
> * Or we can add a new method to Scorer, something like
> getSequentialSubScorers? Which we then iterate over and if one
> is a TermScorer set the filter(s). This setting of the RAF would
> happen in IndexSearcher.doSearch.
>
>      was (Author: jasonrutherglen):
>    Perhaps we can go ahead with this patch given we're not sure how
> to do an optimized version of LUCENE-1345 yet. This patch
> entails passing the RandomAccessFilter to TermScorer, what's a
> good way to do this without rewriting too much of the Lucene API?
>
> * TermQuery.createWeight -> TermWeight.scorer instantiates the
> TermScorer which is where we need to pass in the filter? So we
> could somehow pass the filter in via multiple constructors? I
> didn't see a clean API way though.
>
> * Or we can add a new method to Scorer, something like
> getSequentialSubScorers? Which we then iterate over and if one
> is a TermScorer set the filter(s). This setting of the RAF would
> happen in IndexSearcher.doSearch.
>
> > if a filter can support random access API, we should use it
> > -----------------------------------------------------------
> >
> >                 Key: LUCENE-1536
> >                 URL: https://issues.apache.org/jira/browse/LUCENE-1536
> >             Project: Lucene - Java
> >          Issue Type: Improvement
> >          Components: Search
> >    Affects Versions: 2.4
> >            Reporter: Michael McCandless
> >            Assignee: Michael McCandless
> >            Priority: Minor
> >             Fix For: 2.9
> >
> >         Attachments: LUCENE-1536.patch
> >
> >
> > I ran some performance tests, comparing applying a filter via
> > random-access API instead of current trunk's iterator API.
> > This was inspired by LUCENE-1476, where we realized deletions should
> > really be implemented just like a filter, but then in testing found
> > that switching deletions to iterator was a very sizable performance
> > hit.
> > Some notes on the test:
> >   * Index is first 2M docs of Wikipedia.  Test machine is Mac OS X
> >     10.5.6, quad core Intel CPU, 6 GB RAM, java 1.6.0_07-b06-153.
> >   * I test across multiple queries.  1-X means an OR query, eg 1-4
> >     means 1 OR 2 OR 3 OR 4, whereas +1-4 is an AND query, ie 1 AND 2
> >     AND 3 AND 4.  "u s" means "united states" (phrase search).
> >   * I test with multiple filter densities (0, 1, 2, 5, 10, 25, 75, 90,
> >     95, 98, 99, 99.99999 (filter is non-null but all bits are set),
> >     100 (filter=null, control)).
> >   * Method high means I use random-access filter API in
> >     IndexSearcher's main loop.  Method low means I use random-access
> >     filter API down in SegmentTermDocs (just like deleted docs
> >     today).
> >   * Baseline (QPS) is current trunk, where filter is applied as iterator up
> >     "high" (ie in IndexSearcher's search loop).
>