Posted to solr-user@lucene.apache.org by Hendrik Haddorp <he...@gmx.net> on 2017/07/20 13:20:06 UTC

finds all documents without a value for field

Hi,

the Solr 6.6 ref guide states that the following "finds all documents 
without a value for field":
-field:[* TO *]

While this is true, I'm wondering why it is recommended to use a range 
query instead of simply:
-field:*

regards,
Hendrik

Re: finds all documents without a value for field

Posted by Hendrik Haddorp <he...@gmx.net>.
I forgot the link to the statement:
https://lucene.apache.org/solr/guide/6_6/the-standard-query-parser.html



Re: finds all documents without a value for field

Posted by Erick Erickson <er...@gmail.com>.
One other possibility is to create a second boolean field, "has_terms"
or something, and just add an fq clause like "&fq=has_terms:false".
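
A minimal sketch of that idea in Python, under some assumptions: the flag
is computed on the client before indexing, and the field names ("title",
"has_terms") and helper names are hypothetical. It only builds the
document and the request parameters; it does not talk to Solr.

```python
from urllib.parse import urlencode


def with_presence_flag(doc, field, flag_field="has_terms"):
    """Return a copy of doc with a boolean flag saying whether `field` has a value.

    Hypothetical helper: the flag must be kept in sync on every update.
    """
    flagged = dict(doc)
    # Treat a missing key, None, an empty string and an empty list as "no value".
    flagged[flag_field] = doc.get(field) not in (None, "", [])
    return flagged


def missing_value_params(flag_field="has_terms"):
    """Request parameters that select documents whose flag is false."""
    return {"q": "*:*", "fq": f"{flag_field}:false"}


if __name__ == "__main__":
    print(with_presence_flag({"id": "1", "title": "Solr"}, "title"))
    print(with_presence_flag({"id": "2"}, "title"))
    # URL-encoded form of the parameters, ready to append to a select URL.
    print(urlencode(missing_value_params()))
```

The trade-off is an extra indexed field in exchange for a cheap term
lookup at query time.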

On Thu, Jul 20, 2017 at 12:00 PM, Shawn Heisey <ap...@elyograg.org> wrote:
> On 7/20/2017 7:20 AM, Hendrik Haddorp wrote:
>> the Solr 6.6. ref guide states that to "finds all documents without a
>> value for field" you can use:
>> -field:[* TO *]
>>
>> While this is true I'm wondering why it is recommended to use a range
>> query instead of simply:
>> -field:*
>
> Performance.
>
> A wildcard is expanded to all possible term values for that field.  If
> the field has millions of possible terms, then the query object created
> at the Lucene level will quite literally have millions of terms in it.
> No matter how you approach a query with those characteristics, it's
> going to be slow, for both getting the terms list and executing the query.
>
> A full range query might be somewhat slow when there are many possible
> values, but it's a lot faster than a wildcard in those cases.
>
> If the field is only used by a handful of documents and has very few
> possible values, then it might be faster than a range query ... but this
> is not common, so the recommended way to do this is with a range query.
>
> Thanks,
> Shawn
>

Re: finds all documents without a value for field

Posted by Shawn Heisey <ap...@elyograg.org>.
On 7/20/2017 3:27 PM, Hendrik Haddorp wrote:
> If the range query is so much better shouldn't the Solr query parser
> create a range query for a token query that only contains the
> wildcard? For the *:* case it does already contain a special path. 

The *:* query is a special string.  Although it *looks* like it has a
wildcard for the field and a wildcard for the value, this is not how the
query parser treats that string.  It is a special "all documents" query
that is *highly* optimized to execute very quickly.

Although it would probably be possible to rewrite "field:*" queries as
a range query, there are certain situations in which the wildcard query
*is* the best option ... so if Solr were to rewrite it, the result might
in fact be *slower*.  Instead of having this optimization, Solr lets you
do whatever you want with the available syntax, even if it's not the
best option.

I can't think of any downside to the optimization for "*:*", which is
very likely why that string is treated specially.

Something to note: You cannot specify a wildcard for the fieldname.  So
"*:searchterm" queries do not work.

Thanks,
Shawn
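
If you want that rewrite anyway, it can be done on the client before the
query is sent. The following is a naive sketch, not how Solr's query
parser works: the function name is hypothetical, it splits on whitespace
only, and it does not understand quoted phrases, parentheses or
local params.

```python
import re

# A whole token such as "field:*", "-field:*" or "+field:*".
# The field name must start with a letter or underscore, so "*:*" never matches.
_BARE_WILDCARD = re.compile(r"^([-+]?)([A-Za-z_][\w.]*):\*$")


def rewrite_bare_wildcards(query):
    """Replace bare "field:*" clauses with the range form "field:[* TO *]"."""
    rewritten = []
    for token in query.split():
        match = _BARE_WILDCARD.match(token)
        if match:
            prefix, field = match.groups()
            token = f"{prefix}{field}:[* TO *]"
        rewritten.append(token)
    return " ".join(rewritten)


if __name__ == "__main__":
    for q in ("-field:*", "*:*", "title:foo* -price:*"):
        print(q, "=>", rewrite_bare_wildcards(q))
```

Prefix wildcards such as "title:foo*" and the "*:*" string are left
alone. As noted above, the range form is not always the faster one, so
this is worth measuring on your own index.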


Re: finds all documents without a value for field

Posted by Hendrik Haddorp <he...@gmx.net>.
If the range query is so much better, shouldn't the Solr query parser 
create a range query for a token query that only contains the wildcard? 
For the *:* case it already contains a special path.



Re: finds all documents without a value for field

Posted by Shawn Heisey <ap...@elyograg.org>.
On 7/20/2017 7:20 AM, Hendrik Haddorp wrote:
> the Solr 6.6. ref guide states that to "finds all documents without a
> value for field" you can use:
> -field:[* TO *]
>
> While this is true I'm wondering why it is recommended to use a range
> query instead of simply:
> -field:*

Performance.

A wildcard is expanded to all possible term values for that field.  If
the field has millions of possible terms, then the query object created
at the Lucene level will quite literally have millions of terms in it. 
No matter how you approach a query with those characteristics, it's
going to be slow, for both getting the terms list and executing the query.

A full range query might be somewhat slow when there are many possible
values, but it's a lot faster than a wildcard in those cases.

If the field is only used by a handful of documents and has very few
possible values, then a wildcard might be faster than a range query ...
but this is not common, so the recommended way to do this is with a
range query.

Thanks,
Shawn
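
For reference, here is a small Python sketch that builds both forms
discussed in this thread so they can be compared against a real index.
The helper names and the "rows" default are assumptions for
illustration; only the query strings themselves come from the thread.

```python
from urllib.parse import urlencode


def missing_value_filter(field, use_range=True):
    """Build a negative clause matching documents with no value for `field`.

    use_range=True gives the recommended "-field:[* TO *]" form,
    use_range=False gives the wildcard form "-field:*".
    """
    pattern = "[* TO *]" if use_range else "*"
    return f"-{field}:{pattern}"


def select_params(field, use_range=True, rows=10):
    """URL-encoded parameters for a select request using the clause as a filter."""
    return urlencode({
        "q": "*:*",
        "fq": missing_value_filter(field, use_range),
        "rows": rows,
    })


if __name__ == "__main__":
    print(missing_value_filter("field"))
    print(missing_value_filter("field", use_range=False))
    print(select_params("field"))
```

Running both variants against the same collection and comparing the
reported query times is the simplest way to see which one behaves
better for a given field.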