Posted to solr-user@lucene.apache.org by Karen Loughran <k....@qub.ac.uk> on 2008/01/14 15:30:28 UTC

field:(-null) returns records where field was not specified


Hi all,

We are indexing different types of documents, some with certain fields set and 
some without, some fields sometimes in both.

If a particular field is missing in a newly added record, I would have 
expected the query:

field_name:(-null)

not to return this particular record in the response, ie, I'm assuming the 
field is set to null.

But the response we see includes empty docs:

......
....
..
<doc>
 </doc>
<doc>
 </doc>
<doc>
 </doc>
etc, etc....
..
....

Can someone explain why field_name:(-null) returns the records where 
field_name is missing ?

We note that if we do the range operation we can get a response without the 
records with no field_name:

field_name:[* TO *]

Many thanks
Karen

Re: field:(-null) returns records where field was not specified

Posted by Karen Loughran <k....@qub.ac.uk>.
Thanks Chris, this is useful. We can use the query format you suggest.

Karen

On Tuesday 15 January 2008 01:13:14 Chris Hostetter wrote:
> Several things in this thread should be clarified (note: order of
> quotations munged for clarity)...
>
> : I had read this page.  But I'm not using the "NOT" operator,  I'm using
> : the "-" operator.  I'm assuming there is a subtle difference between them
> : in that NOT qualifies something else, hence needs 2 terms.  Isn't the "-"
> : operator supposed to be a complement to the "+" operator, ie. excludes
> : something rather than requiring it ?
>
> "The NOT operator" and "the - operator" are in fact the same thing ... the
> duplicate syntax comes from Lucene trying to appease people that
> want boolean style operator syntax (AND/OR/NOT) even though the query
> parser syntax is not a boolean syntax.
>
> : > Have you seen this page?
> : > http://lucene.apache.org/java/docs/queryparsersyntax.html
> : >
> : > From that page:
> : > Note: The NOT operator cannot be used with just one term. For example,
> : > the following search will return no results:
> : > NOT "jakarta apache"
>
> In Solr, the query parser can in fact support purely negative queries, by
> internally transforming the query, this is noted on the Solr query syntax
> wiki...
>
> http://wiki.apache.org/solr/SolrQuerySyntax
>
> : > > field_name:(-null)
>
> "null" is not a special keyword, if you look at the debugging output when
> doing that query you'll see that it is the same as:   -field_name:null
> ... which is a search for all docs that do not contain the string "null"
> in the field "field_name". Docs with no value for the field match as well.
>
> : The *:* (star colon star) means "all records". The trick is to use (*:*
> : AND -field:[* TO *]). It's silly, but there it is.
>
> as I mentioned, you can do pure negative queries now, so a simple search
> for -field_name:[* TO *] will find all docs that have no indexed values
> for that field at all.
>
> : A performance note: we switched from empty fields to fields with a
> : standard 'empty' value. This way we don't have to do a range check to
> : find records with empty fields.
>
> Your mileage may vary depending on how many docs you have with "no value"
> ... this also isn't practical when dealing with numeric, boolean, or date
> based fields.  (and depending on how much churn there is in your index,
> the filterCache can probably make the difference negligible on average
> anyway).
>
>
>
>
> -Hoss



RE: field:(-null) returns records where field was not specified

Posted by Chris Hostetter <ho...@fucit.org>.
Several things in this thread should be clarified (note: order of 
quotations munged for clarity)...

: I had read this page.  But I'm not using the "NOT" operator,  I'm using the
: "-" operator.  I'm assuming there is a subtle difference between them in
: that NOT qualifies something else, hence needs 2 terms.  Isn't the "-" 
: operator supposed to be a complement to the "+" operator, ie. excludes
: something rather than requiring it ?

"The NOT operator" and "the - operator" are in fact the same thing ... the 
duplicate syntax comes from Lucene trying to appease people that 
want boolean style operator syntax (AND/OR/NOT) even though the query 
parser syntax is not a boolean syntax.

: > Have you seen this page?
: > http://lucene.apache.org/java/docs/queryparsersyntax.html
: >
: > From that page:
: > Note: The NOT operator cannot be used with just one term. For example, 
: > the following search will return no results:
: > NOT "jakarta apache"

In Solr, the query parser can in fact support purely negative queries, by 
internally transforming the query, this is noted on the Solr query syntax 
wiki...

http://wiki.apache.org/solr/SolrQuerySyntax

: > > field_name:(-null)

"null" is not a special keyword, if you look at the debugging output when 
doing that query you'll see that it is the same as:   -field_name:null  
... which is a search for all docs that do not contain the string "null" 
in the field "field_name". Docs with no value for the field match as well.
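
To look at that debugging output yourself, something like the following could be used. It is only a sketch: the base URL http://localhost:8983/solr and the helper name debug_url are assumptions made for illustration, while debugQuery is the standard Solr request parameter that adds the parsed query to the response.

```python
from urllib.parse import urlencode

# Assumed base URL; change it to wherever your Solr instance is running.
SOLR_BASE = "http://localhost:8983/solr"

def debug_url(query, base=SOLR_BASE):
    """Build a /select URL that asks Solr to include its debugging output."""
    params = {"q": query, "debugQuery": "on"}
    return base + "/select?" + urlencode(params)

# Paste the printed URL into a browser and inspect the parsed query in the response.
print(debug_url("field_name:(-null)"))
```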

: The *:* (star colon star) means "all records". The trick is to use (*:* AND
: -field:[* TO *]). It's silly, but there it is.

as I mentioned, you can do pure negative queries now, so a simple search 
for -field_name:[* TO *] will find all docs that have no indexed values 
for that field at all.
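
The query forms discussed in this thread can be written down as small helpers. This is a sketch only: the function names are made up for illustration, and the explicit (*:* AND ...) form is the workaround quoted above for parsers that do not rewrite purely negative queries.

```python
def has_value(field):
    """Query matching docs with at least one indexed value in the field."""
    return f"{field}:[* TO *]"

def has_no_value(field, explicit_all_docs=False):
    """Query matching docs with no indexed value in the field.

    With explicit_all_docs=True the negative clause is combined with *:*,
    which is the form to use where purely negative queries are not supported.
    """
    negative = f"-{field}:[* TO *]"
    if explicit_all_docs:
        return f"(*:* AND {negative})"
    return negative

print(has_value("field_name"))
print(has_no_value("field_name"))
print(has_no_value("field_name", explicit_all_docs=True))
```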

: A performance note: we switched from empty fields to fields with a standard
: 'empty' value. This way we don't have to do a range check to find records
: with empty fields.

Your mileage may vary depending on how many docs you have with "no value" 
... this also isn't practical when dealing with numeric, boolean, or date 
based fields.  (and depending on how much churn there is in your index, 
the filterCache can probably make the difference negligible on average 
anyway).
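
For the filterCache to help, the "no value" restriction has to be sent as a filter query (the fq parameter) rather than as part of q. A rough sketch, assuming a Solr instance at the hypothetical URL http://localhost:8983/solr and a Solr version that accepts a purely negative fq; the helper name missing_field_url is invented for this example.

```python
from urllib.parse import urlencode

def missing_field_url(field, main_query="*:*", base="http://localhost:8983/solr"):
    """Build a /select URL that restricts results to docs lacking the field.

    The restriction is passed as fq so that Solr can cache the matching
    document set independently of the main query.
    """
    params = [("q", main_query), ("fq", f"-{field}:[* TO *]")]
    return base + "/select?" + urlencode(params)

print(missing_field_url("field_name"))
```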




-Hoss


RE: field:(-null) returns records where field was not specified

Posted by Lance Norskog <go...@gmail.com>.
The *:* (star colon star) means "all records". The trick is to use (*:* AND
-field:[* TO *]). It's silly, but there it is.

A performance note: we switched from empty fields to fields with a standard
'empty' value. This way we don't have to do a range check to find records
with empty fields.
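
A sketch of what that could look like on the indexing side. The sentinel string and both function names are hypothetical, and it assumes the fields involved are plain string fields (the approach does not carry over well to numeric, boolean, or date fields).

```python
# Hypothetical sentinel; pick a value that cannot occur in your real data.
EMPTY = "__EMPTY__"

def with_sentinels(doc, fields, sentinel=EMPTY):
    """Return a copy of doc where each missing or blank field gets the sentinel."""
    filled = dict(doc)
    for name in fields:
        if filled.get(name) in (None, "", []):
            filled[name] = sentinel
    return filled

def empty_query(field, sentinel=EMPTY):
    """Term query that finds the docs whose field was filled with the sentinel."""
    return f'{field}:"{sentinel}"'

doc = with_sentinels({"id": "1", "title": ""}, ["title", "author"])
print(doc)
print(empty_query("author"))
```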

Lance Norskog

-----Original Message-----
From: Karen Loughran [mailto:k.loughran@qub.ac.uk] 
Sent: Monday, January 14, 2008 7:51 AM
To: solr-user@lucene.apache.org
Cc: Erick Erickson
Subject: Re: field:(-null) returns records where field was not specified


Hi Erick, thanks for your reply,

I had read this page.  But I'm not using the "NOT" operator,  I'm using the
"-" operator.  I'm assuming there is a subtle difference between them in
that NOT qualifies something else, hence needs 2 terms.  Isn't the "-" 
operator supposed to be a complement to the "+" operator, ie. excludes
something rather than requiring it ?

thanks
Karen








Re: field:(-null) returns records where field was not specified

Posted by Karen Loughran <k....@qub.ac.uk>.
Hi Erick, thanks for your reply,

I had read this page.  But I'm not using the "NOT" operator,  I'm using 
the "-" operator.  I'm assuming there is a subtle difference between them in 
that NOT qualifies something else, hence needs 2 terms.  Isn't the "-" 
operator supposed to be a complement to the "+" operator, ie. excludes 
something rather than requiring it ?

thanks
Karen



On Monday 14 January 2008 15:14:05 Erick Erickson wrote:
> Have you seen this page?
> http://lucene.apache.org/java/docs/queryparsersyntax.html
>
> From that page:
> Note: The NOT operator cannot be used with just one term. For example, the
> following search will return no results:
> NOT "jakarta apache"
>
>
> Erick



Re: field:(-null) returns records where field was not specified

Posted by Erick Erickson <er...@gmail.com>.
Have you seen this page?
http://lucene.apache.org/java/docs/queryparsersyntax.html

From that page:
Note: The NOT operator cannot be used with just one term. For example, the
following search will return no results:
NOT "jakarta apache"


Erick

