Posted to dev@lucene.apache.org by "David Smiley (JIRA)" <ji...@apache.org> on 2014/08/06 05:46:12 UTC

[jira] [Updated] (SOLR-6318) QParser for TermsFilter

     [ https://issues.apache.org/jira/browse/SOLR-6318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Smiley updated SOLR-6318:
-------------------------------

    Attachment: SOLR-6318__terms_QParser.patch

Here it is, with test.
From the javadoc:

bq. Finds documents whose specified field has any of the specified values. It's like TermQParserPlugin but multi-valued, and supports a variety of internal algorithms.
Parameters:
* f: The field name (mandatory)
* separator: the separator delimiting the values in the query string. By default it's a " " which is special in that it splits on any consecutive whitespace.
* method: Any of termsFilter (default), booleanQuery, automaton, docValuesTermsFilter.
Note that if no values are specified then the query matches no documents.
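
To make the separator behavior concrete, here is a rough sketch of how the value string might be split. It is not the code from the patch: the class and method names (TermsSplitSketch, splitValues) are made up for illustration, and it only assumes what the javadoc says, namely that a " " separator splits on any run of whitespace and that other separators are taken literally with no trimming.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration only; not the implementation in the attached patch.
class TermsSplitSketch {

  /** Splits the raw query string into the individual values to match. */
  static List<String> splitValues(String raw, String separator) {
    List<String> values = new ArrayList<>();
    if (raw == null) {
      return values;
    }
    if (" ".equals(separator)) {
      // Special case: split on any consecutive whitespace, ignoring
      // leading and trailing whitespace.
      String trimmed = raw.trim();
      if (trimmed.isEmpty()) {
        return values; // no values: the query would match no documents
      }
      for (String v : trimmed.split("\\s+")) {
        values.add(v);
      }
    } else {
      if (raw.isEmpty()) {
        return values;
      }
      // Any other separator is treated literally; values are not trimmed.
      for (String v : raw.split(Pattern.quote(separator), -1)) {
        values.add(v);
      }
    }
    return values;
  }

  public static void main(String[] args) {
    System.out.println(splitValues("  code1   code2\tcode3 ", " ")); // [code1, code2, code3]
    System.out.println(splitValues("a, b,c", ","));
  }
}
```

With a "," separator the second value in the example keeps its leading space, which is why the space separator is the only one that effectively trims.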

It would be cool if somebody did some benchmarking that would allow us to choose between some of the algorithms based on heuristics... but this is fine for now.  For example, use method=X when the number of values is > some value, and use docValuesTermsFilter if docValues is enabled.  Note that DocValuesTermsFilter (trunk) is known as FieldCacheTermsFilter on 4x.  On 4x this feature doesn't support DocValues (just FieldCache), whereas on trunk it supports both depending on whether you indexed DocValues or not (I think).  That method is also limited to single-valued fields, but there's no explicit check.
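
A heuristic of the kind described above could look roughly like the following. This is purely a sketch: nothing like it is in the patch, the class name, method name and threshold are invented, the threshold is a placeholder that has not been benchmarked, and preferring booleanQuery for small value sets is an assumption rather than a measured result. The only parts taken from the discussion are the method names and the idea of picking docValuesTermsFilter when docValues is enabled on a single-valued field.

```java
// Hypothetical illustration only; thresholds and preferences are unbenchmarked guesses.
class TermsMethodHeuristicSketch {

  // Placeholder cut-off for "a small number of values"; would need benchmarking.
  static final int SMALL_SET_THRESHOLD = 16;

  /** Returns one of the "method" parameter values named in the javadoc. */
  static String chooseMethod(int numValues, boolean hasDocValues, boolean multiValued) {
    // docValuesTermsFilter is limited to single-valued fields, so check that explicitly.
    if (hasDocValues && !multiValued) {
      return "docValuesTermsFilter";
    }
    // Assumption: a plain BooleanQuery is acceptable for a handful of values.
    if (numValues <= SMALL_SET_THRESHOLD) {
      return "booleanQuery";
    }
    // Otherwise fall back to the parser's default method.
    return "termsFilter";
  }

  public static void main(String[] args) {
    System.out.println(chooseMethod(1000, false, false)); // termsFilter
  }
}
```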

I'll commit this in a couple days, pending input.

> QParser for TermsFilter
> -----------------------
>
>                 Key: SOLR-6318
>                 URL: https://issues.apache.org/jira/browse/SOLR-6318
>             Project: Solr
>          Issue Type: New Feature
>          Components: query parsers
>            Reporter: David Smiley
>            Assignee: David Smiley
>             Fix For: 4.10
>
>         Attachments: SOLR-6318__terms_QParser.patch
>
>
> Some applications require filtering documents by a large number of terms.  It's often related to security filtering.  Naively this is done this way:
> {noformat}
>     fq={!df=myfield q.op=OR}code1 code2 code3 code4 code5...
> {noformat}
> And this ends up being a BooleanQuery.  Users then wind up hitting BooleanQuery.maxClauseCount (sometimes in production, sadly) and they up it to a huge number to get the job done.
> Solr should offer a QParser based on TermsFilter.  I propose it be named "terms" (plural of term), and have a "separator" option defaulting to a space.  When it's a space, the values also get trimmed, which wouldn't otherwise happen.  The analysis logic should be the same as that for "term" QParser which is to call FieldType.readableToIndexed.


