Posted to java-user@lucene.apache.org by Sriram Sankar <sa...@gmail.com> on 2013/08/21 02:24:10 UTC
Re: Performance measurements

Just returned from a vacation and getting back to this.  I am trying to
understand why filters can be faster when the filter is used only once.
From what I've read, it looks like filters give performance gains when
cached and called repeatedly.  Also, since they build a bit set for the
entire list of docs, I'm wondering how they can be used in an early
termination use case - wouldn't the filters still go through all the docs?
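
To make the concern concrete, here is a toy sketch in plain Java. It is not
Lucene code: postings are modelled as sorted int arrays, and the class name,
method names and the "touched" work counter are all made up for illustration.
It assumes, as the paragraph above does, that the filter materializes a bit
set over all of its postings before the search starts, and compares that with
an intersection that advances through the filter postings only as far as the
early-terminated search needs.

```java
import java.util.Arrays;
import java.util.BitSet;

// Toy model, not Lucene: postings are sorted arrays of doc IDs.
final class EagerVsLazyFilter {

    // Hypothetical work counter: number of posting entries touched.
    static long touched = 0;

    // Eager filter: visit every posting of every filter term and set a bit.
    static BitSet buildBitSet(int[][] filterPostings, int maxDoc) {
        BitSet bits = new BitSet(maxDoc);
        for (int[] postings : filterPostings) {
            for (int doc : postings) {
                bits.set(doc);
                touched++;
            }
        }
        return bits;
    }

    // Walk the query postings in doc order, keep docs the bit set accepts,
    // and stop as soon as 'limit' hits are found (early termination).
    static int[] searchWithBitSet(int[] queryPostings, BitSet filter, int limit) {
        int[] hits = new int[limit];
        int n = 0;
        for (int doc : queryPostings) {
            if (n == limit) break;
            touched++;
            if (filter.get(doc)) hits[n++] = doc;
        }
        return Arrays.copyOf(hits, n);
    }

    // Lazy variant: advance each filter posting list only up to the current
    // candidate doc, so postings beyond the last hit are never touched.
    static int[] searchLazily(int[] queryPostings, int[][] filterPostings, int limit) {
        int[] pos = new int[filterPostings.length];
        int[] hits = new int[limit];
        int n = 0;
        for (int doc : queryPostings) {
            if (n == limit) break;
            touched++;
            boolean match = false;
            for (int i = 0; i < filterPostings.length; i++) {
                int[] p = filterPostings[i];
                while (pos[i] < p.length && p[pos[i]] < doc) {
                    pos[i]++;
                    touched++;
                }
                if (pos[i] < p.length && p[pos[i]] == doc) match = true;
            }
            if (match) hits[n++] = doc;
        }
        return Arrays.copyOf(hits, n);
    }

    public static void main(String[] args) {
        int[] query = {1, 3, 5, 7, 9, 11};
        int[][] filter = {{3, 4, 100, 200, 300}, {5, 6, 400, 500}};

        touched = 0;
        int[] eager = searchWithBitSet(query, buildBitSet(filter, 501), 2);
        System.out.println(Arrays.toString(eager) + " eager work=" + touched);

        touched = 0;
        int[] lazy = searchLazily(query, filter, 2);
        System.out.println(Arrays.toString(lazy) + " lazy work=" + touched);
    }
}
```

Both variants return the same two hits, but in this toy the eager one pays for
every filter posting before the first hit is collected. Whether the real
filter classes behave like the eager or the lazy variant is exactly the open
question here.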

With respect to my usage, I'm pretty sure I only called the
score(Collector) method and not the one without arguments (where the real
cost of scoring is incurred).  The ConstantScorer (and ConstantScoreQuery)
only seem to save time when score() without parameters is called.  Is it
possible that I'm calling score() without realizing it?
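
One way to check for accidental scoring is to wrap the scorer in something
that counts calls. The sketch below is a toy model in plain Java, not the
Lucene API: ToyScorer, ToyCollector, CountingScorer and drive() are
hypothetical names. drive() stands in for score(Collector), which only
iterates over matches, and score() stands in for the per-document scoring
call. The point is that the collector decides whether score() ever runs.

```java
// Toy model, not the Lucene API: every name here is hypothetical.
final class ScoreCallCounter {

    interface ToyScorer {
        int nextDoc();   // next matching doc ID, or -1 when exhausted
        float score();   // the potentially expensive per-document score
    }

    interface ToyCollector {
        void collect(int doc, ToyScorer scorer);
    }

    // Iterates a fixed sorted list of doc IDs and counts score() calls.
    static final class CountingScorer implements ToyScorer {
        private final int[] docs;
        private int pos = -1;
        int scoreCalls = 0;

        CountingScorer(int[] docs) {
            this.docs = docs;
        }

        public int nextDoc() {
            pos++;
            return pos < docs.length ? docs[pos] : -1;
        }

        public float score() {
            scoreCalls++;
            return 1.0f / (1 + docs[pos]); // stand-in for a real scoring formula
        }
    }

    // Analogue of score(Collector): drives iteration, never scores by itself.
    static int drive(ToyScorer scorer, ToyCollector collector) {
        int collected = 0;
        for (int doc = scorer.nextDoc(); doc != -1; doc = scorer.nextDoc()) {
            collector.collect(doc, scorer);
            collected++;
        }
        return collected;
    }

    public static void main(String[] args) {
        int[] docs = {2, 5, 9};

        // Collector that only records matches: score() is never invoked.
        CountingScorer matchOnly = new CountingScorer(docs);
        drive(matchOnly, (doc, s) -> { });

        // Collector that ranks by score: score() runs once per collected doc.
        CountingScorer ranked = new CountingScorer(docs);
        float[] best = {0f};
        drive(ranked, (doc, s) -> best[0] = Math.max(best[0], s.score()));

        // prints "0 vs 3"
        System.out.println(matchOnly.scoreCalls + " vs " + ranked.scoreCalls);
    }
}
```

If the collector in use ranks hits by score, the second case applies even
though the calling code never mentions score() directly.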

Thanks,

Sriram



On Thu, Jul 25, 2013 at 1:56 PM, Jack Krupansky <ja...@basetechnology.com> wrote:

> In addition, although I am a bit beyond my expertise here, I believe you
> should be able to take any query object, including one returned from a
> query parser, wrap it with a ConstantScoreQuery, and then search on the CSQ
> to avoid all the scoring overhead.
>
> For example, "*:*" is super fast even though it matches everything - no
> scoring.
>
> -- Jack Krupansky
>
> -----Original Message----- From: Arjen van der Meijden
> Sent: Thursday, July 25, 2013 3:06 PM
>
> To: java-user@lucene.apache.org
> Subject: Re: Performance measurements
>
> Hi Sriram,
>
> I don't see any obvious mistakes, although you don't need to create a
> FilteredQuery: There are plenty of search-methods on the IndexSearcher
> that accept both a query (your TermQuery) and a filter (your TermsFilter).
>
> The way I understand Filters (but I have no advanced in-depth knowledge
> of them) is that they are very similar to Queries.
> Queries are used for two tasks: matching an item and giving some measure
> of how "well" it matched (i.e. the score).
> Filters are used only for matching, but I doubt there is very much
> difference from a technical point of view between the two ways of
> matching items.
>
> I'll leave more detailed explanations to others, as I might make too
> many mistakes or just assume I know something I actually don't :)
>
> Best regards,
>
> Arjen
>
> On 25-7-2013 19:56 Sriram Sankar wrote:
>
>> Thanks everyone.  I'm trying this out:
>>
>>> So searching would become:
>>> - Create a Query with only your termA
>>> - Create a TermsFilter with all your termB's
>>> - execute your preferred search-method with both the query and the filter
>>>
>>
>> I don't get the same results as before - and am still debugging.  But
>> I'm including before and after code in case someone is able to see a
>> problem with what I'm doing.
>>
>> I'm also looking for docs on how filters work (or will read the code). But
>> at a high level, is the filter fully created when the Filter object is
>> created?  Or is it incrementally built during traversal (when next() and
>> advance() are called on the filters).  Reason for this question is related
>> to early termination.
>>
>> The two versions of code (query based and filter based) are shown below -
>> let me know if you see a problem with either.  Ignore any minor syntactic
>> errors that may have got introduced as I simplified my code for inclusion
>> here.
>>
>> Thanks,
>>
>> Sriram.
>>
>>
>> QUERY APPROACH:
>>
>> BooleanQuery orTerms = new BooleanQuery();
>> for (int i = 0; i < orCount; ++i) {
>>      TermQuery orArg = new TermQuery(new Term("conn",
>>       Integer.toString(connection[i])));
>>      BooleanClause cl = new BooleanClause(orArg,
>>          BooleanClause.Occur.SHOULD);
>>      orTerms.add(cl);
>> }
>> TermQuery tq = new TermQuery(new Term("name", name));
>> BooleanQuery query = new BooleanQuery();
>> query.add(new BooleanClause(tq, BooleanClause.Occur.MUST));
>> query.add(new BooleanClause(orTerms, BooleanClause.Occur.MUST));
>>
>> FILTER APPROACH:
>>
>> List<Term> terms = new ArrayList<Term>();
>> for (int i = 0; i < orCount; ++i) {
>>      terms.add(new Term("conn",
>>         Integer.toString(connection[i])));
>> }
>> TermsFilter conns = new TermsFilter(terms);
>> TermQuery tq = new TermQuery(new Term("name", name));
>> FilteredQuery query = new FilteredQuery(tq, conns);
>>
>>
>>
>> On Thu, Jul 25, 2013 at 12:14 AM, Arjen van der Meijden <
>> acmmailing@tweakers.net> wrote:
>>
>>> On 24-7-2013 21:58 Sriram Sankar wrote:
>>>
>>>> On Wed, Jul 24, 2013 at 10:24 AM, Jack Krupansky <jack@basetechnology.com> wrote:
>>>>
>>>>> Scoring has been a major focus of Lucene. Non-scored filters are also
>>>>> available, but the query parsers are focused (exclusively) on
>>>>> scored-search.
>>>>
>>>> When you say "filter" do you mean a step performed after retrieval? Or is
>>>> it yet another retrieval operation?
>>>>
>>>
>>> He is really referring to the Filters available as an addition to
>>> retrieval. The ones you supply with the search-method:
>>> http://lucene.apache.org/core/4_4_0/core/org/apache/lucene/search/IndexSearcher.html#search%28org.apache.lucene.search.Query,%20org.apache.lucene.search.Filter,%20int%29
>>>
>>> Unfortunately the documentation of Lucene is a bit fragmented, but
>>> basically they limit the scope of your search domain (i.e. reduce the
>>> available set of documents) during the processing of a query. So it
>>> basically becomes (query) AND (filters).
>>>
>>> There are several useful implementations available for the filters. But
>>> in your case you can just create a single TermsFilter (it's in the queries
>>> module/package), which is simply an OR-list like the one in your example
>>> (similar to a basic IN in SQL):
>>>
>>> http://lucene.apache.org/core/4_4_0/queries/org/apache/lucene/queries/TermsFilter.html
>>>
>>> So searching would become:
>>> - Create a Query with only your termA
>>> - Create a TermsFilter with all your termB's
>>> - execute your preferred search-method with both the query and the filter
>>>
>>> If you were interested in the scores of each result, this would not work
>>> too well, since all scores would be based only on the query that
>>> contains termA... But since you don't care about that, this should get
>>> you a big performance gain.
>>>
>>> Best regards,
>>>
>>> Arjen
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>
>>>
>>>
>>