Posted to dev@lucene.apache.org by "Adrien Grand (JIRA)" <ji...@apache.org> on 2015/06/09 23:10:02 UTC

[jira] [Commented] (LUCENE-6539) Add DocValuesNumbersQuery, like DocValuesTermsQuery but works only with long values

    [ https://issues.apache.org/jira/browse/LUCENE-6539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14579566#comment-14579566 ] 

Adrien Grand commented on LUCENE-6539:
--------------------------------------

This new query looks good to me. However, instead of continuing to add such queries to core, I think we should consider moving all our doc values queries to misc, since they have complicated trade-offs and are only useful in expert use cases?

{code}
+  private static Set<Long> toSet(Long[] array) {
+    Set<Long> numbers = new HashSet<>();
+    for(Long number : array) {
+      numbers.add(number);
+    }
+    return numbers;
+  }
{code}

FYI you don't need this helper and could just do: {{new HashSet<Long>(Arrays.asList(array))}}.
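
For illustration, here is a small standalone sketch comparing the helper from the patch with the one-liner. The class name {{ToSetSketch}} and both method names are made up for this example and are not part of the patch; the point is only that the two produce equal sets, duplicates included.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical demo class, not part of the LUCENE-6539 patch.
class ToSetSketch {

  // The loop-based helper, reproduced from the patch for comparison.
  static Set<Long> toSetWithLoop(Long[] array) {
    Set<Long> numbers = new HashSet<>();
    for (Long number : array) {
      numbers.add(number);
    }
    return numbers;
  }

  // The suggested replacement: wrap the array as a List and copy it into a HashSet.
  static Set<Long> toSetOneLiner(Long[] array) {
    return new HashSet<Long>(Arrays.asList(array));
  }

  public static void main(String[] args) {
    Long[] values = {3L, 1L, 3L, 7L};
    // Both approaches drop the duplicate 3 and yield equal sets.
    System.out.println(toSetWithLoop(values).equals(toSetOneLiner(values)));
  }
}
```

Both variants box the values and copy them once, so the one-liner should not change behavior; it only removes the extra method.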

bq. in certain cases (many terms/numbers and fewish matching hits) it should be faster than using TermsQuery

This comment confused me: I think these queries are generally more efficient when they match _many_ documents, i.e. in cases where an equivalent TermsQuery would not be used as the lead iterator in a conjunction anyway? I think the only case where such a query matching few documents would be useful is a prohibited clause, since prohibited clauses can never lead iteration and are only consumed in a random-access fashion?
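
To make the lead-iterator versus random-access distinction concrete, here is a rough plain-Java model. It does not use Lucene's API: {{PostFilterSketch}}, {{conjunction}} and {{excluding}} are hypothetical names, the {{long[]}} stands in for a single-valued numeric doc values column, and the {{int[]}} stands in for the doc IDs produced by whichever clause leads iteration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical model of the discussion above; not Lucene code.
class PostFilterSketch {

  // Required clause: the lead clause supplies candidate doc IDs, and the
  // doc-values clause is only asked "does this doc match?" for each candidate.
  static List<Integer> conjunction(int[] leadDocs, long[] docValues, Set<Long> accepted) {
    List<Integer> hits = new ArrayList<>();
    for (int doc : leadDocs) {
      // Random-access lookup of the doc's value, then a set membership test.
      if (accepted.contains(docValues[doc])) {
        hits.add(doc);
      }
    }
    return hits;
  }

  // Prohibited clause: same random-access check, but matching docs are dropped.
  static List<Integer> excluding(int[] leadDocs, long[] docValues, Set<Long> rejected) {
    List<Integer> hits = new ArrayList<>();
    for (int doc : leadDocs) {
      if (!rejected.contains(docValues[doc])) {
        hits.add(doc);
      }
    }
    return hits;
  }
}
```

In this model the cost of the doc-values clause depends on how many candidates the lead clause produces, not on how many documents the doc-values clause itself would match. That is why the number of its own matches matters little once another clause leads, and why a prohibited clause, which never leads, fits this access pattern.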

> Add DocValuesNumbersQuery, like DocValuesTermsQuery but works only with long values
> -----------------------------------------------------------------------------------
>
>                 Key: LUCENE-6539
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6539
>             Project: Lucene - Core
>          Issue Type: New Feature
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: Trunk, 5.3
>
>         Attachments: LUCENE-6539.patch
>
>
> This query accepts any document where any of the provided set of longs
> was indexed into the specified field as a numeric DV field
> (NumericDocValuesField or SortedNumericDocValuesField).  You can use
> it instead of DocValuesTermsQuery when you have field values that can
> be represented as longs.
> Like DocValuesTermsQuery, this is slowish in general, since it doesn't
> use an inverted data structure, but in certain cases (many
> terms/numbers and fewish matching hits) it should be faster than using
> TermsQuery because it's done as a "post filter" when other (faster)
> query clauses are MUST'd with it.
> In such cases it should also be faster than DocValuesTermsQuery since
> it skips having to resolve terms -> ords.


