Posted to java-user@lucene.apache.org by Tim Eck <te...@terracottatech.com> on 2012/02/16 21:58:46 UTC

query for documents WITHOUT a field?

My apologies if this answer is readily available someplace; I've searched
around and not found a definitive answer.

I'd like to run a query for documents that _do not_ contain particular
indexed fields, to implement something like a SQL-like query where a column
is null.

I understand I could possibly use a magic value to represent "null", but
the data I'm searching doesn't lend itself to reserving a value for null. I
also understand I could add an extra field to hold this boolean isNull
state, but would love a better solution :-)

TIA

Re: query for documents WITHOUT a field?

Posted by Jack Krupansky <ja...@basetechnology.com>.
Right: another level of BooleanQuery that is a SHOULD clause, with TWO
clauses: a MUST of MatchAllDocsQuery and a MUST_NOT of the TermRangeQuery
for "allergies" with null for both start and end.

Actually, there is a new filter that you can use to detect empty fields down
at that level. See
https://issues.apache.org/jira/browse/LUCENE-4386

I think it is:

new ConstantScoreQuery(new FieldValueFilter(fieldname, true))

The second argument is "negate"; it must be true so that the filter accepts
the documents that have no value in the field. Use a SHOULD of that rather
than a second level of BooleanQuery. Let us know if it actually works!

-- Jack Krupansky
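A toy sketch of why the filter form is attractive, under stated assumptions: plain java.util.BitSet values stand in for Lucene postings, and the class name, the "segment/field" cache key and the sample data are all hypothetical. It loosely follows Uwe's description further down this thread (the docs-with-field bitset is built on first access and then reused) and checks that negating that bitset selects the same documents as the nested MatchAllDocsQuery-minus-range formulation. It is not Lucene's implementation.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

// Toy model only: plain BitSets stand in for Lucene postings and filters.
final class NegatedFilterModel {
    // Hypothetical cache of "documents that have a value in the field", keyed by segment + field.
    private static final Map<String, BitSet> DOCS_WITH_FIELD = new HashMap<>();
    static int builds = 0; // how many times the expensive term walk actually ran

    static BitSet bits(int... docs) {
        BitSet b = new BitSet();
        for (int d : docs) b.set(d);
        return b;
    }

    // Built on first access only, then reused for the same segment and field.
    static BitSet docsWithField(String segment, String field, Map<String, BitSet> postings) {
        return DOCS_WITH_FIELD.computeIfAbsent(segment + "/" + field, k -> {
            builds++;
            BitSet has = new BitSet();
            for (BitSet docs : postings.values()) has.or(docs);
            return has;
        });
    }

    // Filter-style "IS NULL": negate a copy of the cached bitset within [0, maxDoc).
    static BitSet negatedFilter(String segment, String field, Map<String, BitSet> postings, int maxDoc) {
        BitSet missing = (BitSet) docsWithField(segment, field, postings).clone();
        missing.flip(0, maxDoc);
        return missing;
    }

    // Nested-BooleanQuery-style "IS NULL": all docs (MUST) minus docs with any term (MUST_NOT).
    static BitSet matchAllMinusRange(Map<String, BitSet> postings, int maxDoc) {
        BitSet result = new BitSet();
        result.set(0, maxDoc);
        for (BitSet docs : postings.values()) result.andNot(docs);
        return result;
    }

    public static void main(String[] args) {
        Map<String, BitSet> allergies = new HashMap<>();
        allergies.put("peanuts", bits(0));
        allergies.put("dust", bits(3));
        allergies.put("penicillin", bits(3));
        BitSet viaFilter = negatedFilter("seg0", "allergies", allergies, 4);
        BitSet viaBoolean = matchAllMinusRange(allergies, 4);
        System.out.println(viaFilter + " " + viaBoolean); // {1, 2} {1, 2}
        negatedFilter("seg0", "allergies", allergies, 4); // second call hits the cache
        System.out.println(builds); // 1
    }
}
```

In this model the nested query walks the terms on every evaluation while the filter path walks them once per segment; how closely the real FieldValueFilter matches this depends on the Lucene version.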

-----Original Message----- 
From: Vitaly Funstein
Sent: Thursday, October 25, 2012 8:55 PM
To: java-user@lucene.apache.org
Subject: Re: query for documents WITHOUT a field?

This is the QueryParser syntax, right? So an API equivalent for the not
null case would be something like this?

BooleanQuery q = new BooleanQuery();
q.add(new BooleanClause(new TermQuery(new Term("first_name", "Zed")),
Occur.SHOULD));
q.add(new BooleanClause(new TermRangeQuery("allergies", null, null, true,
true), Occur.SHOULD));

Whereas, for "IS NULL" the TermRangeQuery above would need to be wrapped in
another BooleanClause with Occur.MUST_NOT?



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: query for documents WITHOUT a field?

Posted by Vitaly Funstein <vf...@gmail.com>.
This is the QueryParser syntax, right? So an API equivalent for the not
null case would be something like this?

BooleanQuery q = new BooleanQuery();
q.add(new BooleanClause(new TermQuery(new Term("first_name", "Zed")),
Occur.SHOULD));
q.add(new BooleanClause(new TermRangeQuery("allergies", null, null, true,
true), Occur.SHOULD));

Whereas, for "IS NULL" the TermRangeQuery above would need to be wrapped in
another BooleanClause with Occur.MUST_NOT?
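Vitaly's NOT NULL version unions the two SHOULD clauses, but putting the range in a MUST_NOT right beside the SHOULD clause would turn the OR into an AND, and Lucene's BooleanQuery matches nothing when its only clauses are MUST_NOT. The toy evaluator below models those clause rules with plain BitSets over four hypothetical documents (it is not Lucene code) and shows why the IS NULL part needs its own nested query with a MUST MatchAllDocsQuery, as in Jack's reply.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

// Toy evaluator for BooleanQuery-style clauses over plain BitSets (not Lucene code).
final class BooleanClauseModel {
    enum Occur { MUST, SHOULD, MUST_NOT }

    static final class Clause {
        final BitSet docs;
        final Occur occur;
        Clause(BitSet docs, Occur occur) { this.docs = docs; this.occur = occur; }
    }

    static BitSet bits(int... docs) {
        BitSet b = new BitSet();
        for (int d : docs) b.set(d);
        return b;
    }

    static BitSet evaluate(List<Clause> clauses) {
        BitSet must = null;             // intersection of MUST clauses, if any
        BitSet should = new BitSet();   // union of SHOULD clauses
        BitSet mustNot = new BitSet();  // union of MUST_NOT clauses
        boolean hasShould = false;
        for (Clause c : clauses) {
            switch (c.occur) {
                case MUST:
                    if (must == null) must = (BitSet) c.docs.clone(); else must.and(c.docs);
                    break;
                case SHOULD:
                    should.or(c.docs);
                    hasShould = true;
                    break;
                case MUST_NOT:
                    mustNot.or(c.docs);
                    break;
            }
        }
        BitSet result;
        if (must != null) result = must;        // SHOULD clauses then only affect scoring
        else if (hasShould) result = should;    // no MUST: at least one SHOULD must match
        else return new BitSet();               // only MUST_NOT clauses: nothing matches
        result.andNot(mustNot);
        return result;
    }

    public static void main(String[] args) {
        BitSet all = bits(0, 1, 2, 3);     // MatchAllDocsQuery over 4 hypothetical docs
        BitSet zed = bits(0, 2);           // first_name:Zed
        BitSet hasAllergies = bits(0, 3);  // the TermRangeQuery on "allergies"

        // Vitaly's NOT NULL query: two SHOULD clauses give the union.
        System.out.println(evaluate(Arrays.asList(
                new Clause(zed, Occur.SHOULD), new Clause(hasAllergies, Occur.SHOULD)))); // {0, 2, 3}

        // MUST_NOT directly beside the SHOULD clause means "Zed AND allergies IS NULL".
        System.out.println(evaluate(Arrays.asList(
                new Clause(zed, Occur.SHOULD), new Clause(hasAllergies, Occur.MUST_NOT)))); // {2}

        // A sub-query with only MUST_NOT matches nothing; a MUST match-all clause fixes that.
        BitSet bareNot = evaluate(Arrays.asList(new Clause(hasAllergies, Occur.MUST_NOT)));
        BitSet isNull = evaluate(Arrays.asList(
                new Clause(all, Occur.MUST), new Clause(hasAllergies, Occur.MUST_NOT)));
        System.out.println(bareNot + " " + isNull); // {} {1, 2}

        // The nested IS NULL sub-query used as a SHOULD clause gives the OR.
        System.out.println(evaluate(Arrays.asList(
                new Clause(zed, Occur.SHOULD), new Clause(isNull, Occur.SHOULD)))); // {0, 1, 2}
    }
}
```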

On Thu, Oct 25, 2012 at 5:29 PM, Jack Krupansky <ja...@basetechnology.com> wrote:

> "OR allergies IS NULL" would be "OR (*:* -allergies:[* TO *])" in
> Lucene/Solr.
>

Re: query for documents WITHOUT a field?

Posted by Jack Krupansky <ja...@basetechnology.com>.
"OR allergies IS NULL" would be "OR (*:* -allergies:[* TO *])" in 
Lucene/Solr.

-- Jack Krupansky
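Jack's pattern can be captured by two tiny string helpers. They are hypothetical (not part of Lucene or Solr), do no escaping of field names, and assume a query parser that accepts the open-ended "[* TO *]" range; whether a particular parser and version does is worth checking.

```java
// Hypothetical helpers (not part of Lucene or Solr) that build query-parser
// strings following Jack's pattern. Field names are not escaped here.
final class NullSyntax {
    // "field IS NOT NULL": the field has some term (range open at both ends).
    static String isNotNull(String field) {
        return field + ":[* TO *]";
    }

    // "field IS NULL": start from all documents (*:*) and remove those that have a term.
    // A purely negative sub-query matches nothing on its own, hence the *:*.
    static String isNull(String field) {
        return "(*:* -" + isNotNull(field) + ")";
    }

    static String or(String left, String right) {
        return left + " OR " + right;
    }

    public static void main(String[] args) {
        // WHERE first_name = 'Zed' OR allergies IS NULL
        System.out.println(or("first_name:Zed", isNull("allergies")));
        // prints: first_name:Zed OR (*:* -allergies:[* TO *])
    }
}
```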

-----Original Message----- 
From: Vitaly Funstein
Sent: Thursday, October 25, 2012 8:25 PM
To: java-user@lucene.apache.org
Subject: Re: query for documents WITHOUT a field?

Sorry for resurrecting an old thread, but how would one go about writing a
Lucene query similar to this?

SELECT * FROM patient WHERE first_name = 'Zed' OR allergies IS NULL

An AND case would be easy since one would just use a simple TermQuery with
a FieldValueFilter added, but what about other boolean cases? Admittedly,
this is a contrived example, but the point here is that it seems that since
filters are always applied to results after they are returned, how would
one go about making the null-ness of a field part of the query logic?



Re: query for documents WITHOUT a field?

Posted by Vitaly Funstein <vf...@gmail.com>.
Sorry for resurrecting an old thread, but how would one go about writing a
Lucene query similar to this?

SELECT * FROM patient WHERE first_name = 'Zed' OR allergies IS NULL

An AND case would be easy since one would just use a simple TermQuery with
a FieldValueFilter added, but what about other boolean cases? Admittedly,
this is a contrived example, but the point here is that it seems that since
filters are always applied to results after they are returned, how would
one go about making the null-ness of a field part of the query logic?
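Vitaly's concern can be checked with plain sets. In the sketch below (java.util.BitSet over hypothetical patient documents 0 to 3, not Lucene code), a filter is modeled as intersecting the query's matches. That is enough for the AND case, but it can never add document 1, a patient who is not named Zed and has no allergies; the OR case needs the null test as a clause that contributes its own matches.

```java
import java.util.BitSet;

// Toy model (plain BitSets, not Lucene) of why a filter can express AND but not OR.
final class FilterVsClauseModel {
    static BitSet bits(int... docs) {
        BitSet b = new BitSet();
        for (int d : docs) b.set(d);
        return b;
    }

    // A filter only restricts the query's matches: result = query AND filter.
    static BitSet applyFilter(BitSet queryMatches, BitSet filter) {
        BitSet r = (BitSet) queryMatches.clone();
        r.and(filter);
        return r;
    }

    // OR needs the null-ness test as a clause that can add its own matches.
    static BitSet union(BitSet a, BitSet b) {
        BitSet r = (BitSet) a.clone();
        r.or(b);
        return r;
    }

    public static void main(String[] args) {
        BitSet zed = bits(0, 2);          // first_name = 'Zed'
        BitSet noAllergies = bits(1, 2);  // allergies IS NULL
        System.out.println(applyFilter(zed, noAllergies)); // {2}: the AND case
        System.out.println(union(zed, noAllergies));       // {0, 1, 2}: the OR case
    }
}
```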

On Thu, Feb 16, 2012 at 1:45 PM, Uwe Schindler <uw...@thetaphi.de> wrote:

> I already mentioned that pseudo NULL term, but the user asked for another
> solution...
> --
> Uwe Schindler
> H.-H.-Meier-Allee 63, 28213 Bremen
> http://www.thetaphi.de

Re: query for documents WITHOUT a field?

Posted by Uwe Schindler <uw...@thetaphi.de>.
I already mentioned that pseudo NULL term, but the user asked for another solution...
--
Uwe Schindler
H.-H.-Meier-Allee 63, 28213 Bremen
http://www.thetaphi.de



Jamie Johnson <je...@gmail.com> wrote:

Another possible solution: while indexing, insert a custom token that
cannot otherwise show up in the index, then filter based on that token.
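A minimal sketch of Jamie's index-time idea, assuming a single-field toy inverted index built from plain Java collections (not Lucene). The reserved token "__NULL__" and the class name are hypothetical; as Tim points out, the approach only works if real data can never produce that token.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy single-field inverted index (not Lucene) illustrating an index-time sentinel token.
final class SentinelTokenModel {
    // Hypothetical reserved token; the scheme only works if real data can never produce it.
    static final String NULL_TOKEN = "__NULL__";

    private final Map<String, BitSet> postings = new HashMap<>(); // term -> doc ids
    private int nextDoc = 0;

    // Indexes one document's values for the field; an empty list means "no value".
    int add(List<String> values) {
        int doc = nextDoc++;
        List<String> terms = values.isEmpty() ? Collections.singletonList(NULL_TOKEN) : values;
        for (String term : terms) {
            postings.computeIfAbsent(term, t -> new BitSet()).set(doc);
        }
        return doc;
    }

    BitSet termDocs(String term) {
        BitSet docs = postings.get(term);
        return docs == null ? new BitSet() : (BitSet) docs.clone();
    }

    // "IS NULL" becomes an ordinary single-term lookup.
    BitSet isNull() {
        return termDocs(NULL_TOKEN);
    }

    public static void main(String[] args) {
        SentinelTokenModel allergies = new SentinelTokenModel();
        allergies.add(Arrays.asList("peanuts"));             // doc 0
        allergies.add(Collections.<String>emptyList());      // doc 1
        allergies.add(Collections.<String>emptyList());      // doc 2
        allergies.add(Arrays.asList("penicillin", "dust"));  // doc 3
        System.out.println(allergies.isNull()); // {1, 2}
    }
}
```

The query side stays a plain term lookup, which Uwe describes as the easy option on 3.5 and the fast one on first access; the cost is the reserved value Tim wanted to avoid.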


On Thu, Feb 16, 2012 at 4:41 PM, Uwe Schindler <uw...@thetaphi.de> wrote:
> As the documentation states:
> Lucene is an inverted index that does not have per-document fields. It only
> knows terms pointing to documents. The query you are looking for is a query
> that returns all documents that have no term in the field. To execute this
> query, Lucene gets the term index, iterates all terms of the field, marks
> the documents containing them in a bitset, and negates that bitset. The
> filter/query I told you about uses the FieldCache to do this. Since 3.6
> (also in 3.5, but there it is buggy and the API is different) there is
> another FieldCache that returns exactly that bitset. The filter mentioned
> only uses that bitset from this new FieldCache. The FieldCache is populated
> on first access and stays alive as long as the underlying index segment is
> open (that is, as long as the IndexReader is open and those parts of the
> index are not refreshed). If you are also sorting on your fields or running
> other queries that use the FieldCache, there is no extra overhead;
> otherwise the bitset is populated on first access to the filter.
>
> Lucene 3.5 has no easy way to implement that filter; a "NULL" pseudo term is
> the only solution (and it is also much faster on first access than the
> filter in Lucene 3.6). Later accesses hitting the cache in 3.6 will be
> faster, of course.
>
> Another hacky way to achieve the same results (works with almost any
> Lucene version) is a BooleanQuery consisting of MatchAllDocsQuery() as a
> MUST clause and PrefixQuery(new Term(field, "")) as a MUST_NOT clause. But
> the PrefixQuery will do a full term index scan without caching :-). You may
> use CachingWrapperFilter with PrefixFilter instead.
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: uwe@thetaphi.de
>
>
>> -----Original Message-----
>> From: Tim Eck [mailto:timeck@gmail.com]
>> Sent: Thursday, February 16, 2012 10:14 PM
>> To: java-user@lucene.apache.org
>> Subject: RE: query for documents WITHOUT a field?
>>
>> Thanks for the fast response. I'll certainly have a look at the upcoming
>> 3.6.x release. What is the expected performance for using a negated
>> filter? In particular, does it defeat the index in any way and require a
>> full index scan? Is it different between regular fields and numeric
>> fields?
>>
>> For 3.5 and earlier though, is there any suggestion other than magic
>> values?
>>
>> -----Original Message-----
>> From: Uwe Schindler [mailto:uwe@thetaphi.de]
>> Sent: Thursday, February 16, 2012 1:07 PM
>> To: java-user@lucene.apache.org
>> Subject: RE: query for documents WITHOUT a field?
>>
>> Lucene 3.6 will have a FieldValueFilter that can be negated:
>>
>> Query q = new ConstantScoreQuery(new FieldValueFilter("field", true));
>>
>> (see http://goo.gl/wyjxn)
>>
>> Lucen 3.5 does not yet have it, you can download 3.6 snapshots from
> Jenkins:
>> http://goo.gl/Ka0gr
>>
>> -----
>> Uwe Schindler
>> H.-H.-Meier-Allee 63, D-28213 Bremen
>> http://www.thetaphi.de
>> eMail: uwe@thetaphi.de
>>
>>
>> > -----Original Message-----
>> > From: Tim Eck [mailto:teck@terracottatech.com]
>> > Sent: Thursday, February 16, 2012 9:59 PM
>> > To: java-user@lucene.apache.org
>> > Subject: query for documents WITHOUT a field?
>> >
>> > My apologies if this answer is readily available someplace, I've
>> > searched around and not found a definitive answer.
>> >
>> >
>> >
>> > I'd like to run a query for documents that _do not_ contain particular
>> indexed
>> > fields to implement something like a SQL-like query where a column is
>> null.
>> >
>> >
>> >
>> > I understand I could possibly use a magic value to represent "null",
>> > but
>> the data
>> > I'm searching doesn't led itself to reserving a value for null. I also
>> understand I
>> > could add an extra field to hold this boolean isNull state but would
>> > love
>> a better
>> > solution :-)
>> >
>> >
>> >
>> > TIA
>> >
>> >
>>
>>
>>
>>_____________________________________________

>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
>>
>>_____________________________________________

>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>
>_____________________________________________

> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>

_____________________________________________

To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: query for documents WITHOUT a field?

Posted by Jamie Johnson <je...@gmail.com>.
Another possible solution: while indexing, insert a custom token that cannot
otherwise show up in the index, and then filter on that token.
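The sentinel-token approach (the same idea as the "NULL" pseudo term Uwe mentions) can be sketched with a toy in-memory inverted index. This is not Lucene's API: `SentinelIndex`, `addDocument`, `isNull`, and the `"\u0000NULL\u0000"` token are hypothetical names chosen for illustration, and the sketch assumes the set of indexed fields is known at indexing time. In Lucene 3.x the equivalent would presumably be indexing the token as a NOT_ANALYZED field value and searching it with a TermQuery.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy inverted index (NOT Lucene's API): field -> term -> bitset of doc ids.
final class SentinelIndex {
    // Hypothetical sentinel token; assumed never to be produced by real field values.
    static final String NULL_TOKEN = "\u0000NULL\u0000";

    private final Map<String, Map<String, BitSet>> postings = new HashMap<>();
    private final List<String> indexedFields;
    private int maxDoc = 0;

    SentinelIndex(List<String> indexedFields) {
        this.indexedFields = new ArrayList<>(indexedFields);
    }

    /** Indexes one document; every known field it lacks gets NULL_TOKEN instead of a value. */
    int addDocument(Map<String, String> doc) {
        int docId = maxDoc++;
        for (String field : indexedFields) {
            String value = doc.get(field);
            String term = (value == null) ? NULL_TOKEN : value;
            postings.computeIfAbsent(field, f -> new HashMap<>())
                    .computeIfAbsent(term, t -> new BitSet())
                    .set(docId);
        }
        return docId;
    }

    /** "field IS NULL" is now a single posting-list lookup, like a TermQuery on the sentinel. */
    BitSet isNull(String field) {
        BitSet docs = postings.getOrDefault(field, Collections.emptyMap()).get(NULL_TOKEN);
        return docs == null ? new BitSet() : (BitSet) docs.clone();
    }
}
```

The trade-off: every document pays for one extra posting per missing field at index time, but the IS NULL query becomes one posting-list lookup, with no term-dictionary scan and no cache to warm.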


On Thu, Feb 16, 2012 at 4:41 PM, Uwe Schindler <uw...@thetaphi.de> wrote:
> As the documentation states:
> Lucene is an inverted index that does not have per-document fields. It only
> knows terms pointing to documents. The query you are looking for is one that
> returns all documents which have no term in the field. To execute it, Lucene
> has to get the term index, iterate all terms of the field, mark their
> documents in a bitset, and negate that bitset. The filter/query I told you
> about uses the FieldCache to do this. Since 3.6 (also in 3.5, but there it is
> buggy and the API is different) there is another FieldCache entry that
> returns exactly that bitset, and the filter mentioned only uses that bitset.
> The FieldCache is populated on first access and stays alive as long as the
> underlying index segment is open (that is, as long as the IndexReader is open
> and those parts of the index are not refreshed). If you are also sorting on
> your fields or running other queries that use the FieldCache, there is no
> extra overhead; otherwise the bitset is populated on first access to the
> filter.
>
> Lucene 3.5 has no easy way to implement that filter; a "NULL" pseudo term is
> the only solution there (and, even in Lucene 3.6, it is much faster on first
> access). Later accesses that hit the cache in 3.6 will be faster, of course.
>
> Another hacky way to achieve the same result (works with almost any Lucene
> version): a BooleanQuery consisting of MatchAllDocsQuery() as a MUST clause
> and new PrefixQuery(new Term(field, "")) as a MUST_NOT clause. But the
> PrefixQuery will do a full term index scan without caching :-). You may use
> a CachingWrapperFilter wrapping a PrefixFilter instead.
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: uwe@thetaphi.de
>
>
>> -----Original Message-----
>> From: Tim Eck [mailto:timeck@gmail.com]
>> Sent: Thursday, February 16, 2012 10:14 PM
>> To: java-user@lucene.apache.org
>> Subject: RE: query for documents WITHOUT a field?
>>
>> Thanks for the fast response. I'll certainly have a look at the upcoming
> 3.6.x
>> release. What is the expected performance for using a negated filter?
>> In particular does it defeat the index in any way and require a full index
> scan?
>> Is it different between regular fields and numeric fields?
>>
>> For 3.5 and earlier though, is there any suggestion other than magic
> values?
>>
>> -----Original Message-----
>> From: Uwe Schindler [mailto:uwe@thetaphi.de]
>> Sent: Thursday, February 16, 2012 1:07 PM
>> To: java-user@lucene.apache.org
>> Subject: RE: query for documents WITHOUT a field?
>>
>> Lucene 3.6 will have a FieldValueFilter that can be negated:
>>
>> Query q = new ConstantScoreQuery(new FieldValueFilter("field", true));
>>
>> (see http://goo.gl/wyjxn)
>>
>> Lucen 3.5 does not yet have it, you can download 3.6 snapshots from
> Jenkins:
>> http://goo.gl/Ka0gr
>>
>> -----
>> Uwe Schindler
>> H.-H.-Meier-Allee 63, D-28213 Bremen
>> http://www.thetaphi.de
>> eMail: uwe@thetaphi.de
>>
>>
>> > -----Original Message-----
>> > From: Tim Eck [mailto:teck@terracottatech.com]
>> > Sent: Thursday, February 16, 2012 9:59 PM
>> > To: java-user@lucene.apache.org
>> > Subject: query for documents WITHOUT a field?
>> >
>> > My apologies if this answer is readily available someplace, I've
>> > searched around and not found a definitive answer.
>> >
>> >
>> >
>> > I'd like to run a query for documents that _do not_ contain particular
>> indexed
>> > fields to implement something like a SQL-like query where a column is
>> null.
>> >
>> >
>> >
>> > I understand I could possibly use a magic value to represent "null",
>> > but
>> the data
>> > I'm searching doesn't led itself to reserving a value for null. I also
>> understand I
>> > could add an extra field to hold this boolean isNull state but would
>> > love
>> a better
>> > solution :-)
>> >
>> >
>> >
>> > TIA
>> >
>> >
>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


RE: query for documents WITHOUT a field?

Posted by Uwe Schindler <uw...@thetaphi.de>.
As the documentation states:
Lucene is an inverted index that does not have per-document fields. It only
knows terms pointing to documents. The query you are looking for is one that
returns all documents which have no term in the field. To execute it, Lucene
has to get the term index, iterate all terms of the field, mark their
documents in a bitset, and negate that bitset. The filter/query I told you
about uses the FieldCache to do this. Since 3.6 (also in 3.5, but there it is
buggy and the API is different) there is another FieldCache entry that
returns exactly that bitset, and the filter mentioned only uses that bitset.
The FieldCache is populated on first access and stays alive as long as the
underlying index segment is open (that is, as long as the IndexReader is open
and those parts of the index are not refreshed). If you are also sorting on
your fields or running other queries that use the FieldCache, there is no
extra overhead; otherwise the bitset is populated on first access to the
filter.
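To make that cost concrete, here is a toy model (plain JDK, not Lucene) of the "iterate all terms, mark documents, negate" computation, with the resulting bitset cached per segment roughly the way the FieldCache is described above. `ToySegment` and `DocsWithFieldCache` are hypothetical names, and the `WeakHashMap` keyed on the segment object is only an approximation of "kept alive as long as the reader is open".

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;
import java.util.WeakHashMap;

// Toy "segment" (NOT Lucene's IndexReader): field -> sorted term -> doc bitset.
final class ToySegment {
    final int maxDoc;
    final Map<String, TreeMap<String, BitSet>> fields = new HashMap<>();

    ToySegment(int maxDoc) {
        this.maxDoc = maxDoc;
    }

    ToySegment index(int docId, String field, String term) {
        fields.computeIfAbsent(field, f -> new TreeMap<>())
              .computeIfAbsent(term, t -> new BitSet())
              .set(docId);
        return this;
    }
}

// FieldCache-like sketch: the "docs with field" bitset is built lazily on first access and
// reused while the segment object stays reachable (weak keys approximate "reader still open").
final class DocsWithFieldCache {
    private final Map<ToySegment, Map<String, BitSet>> cache = new WeakHashMap<>();
    int termScans = 0; // number of full term-dictionary walks performed so far

    synchronized BitSet docsWithField(ToySegment seg, String field) {
        return cache.computeIfAbsent(seg, s -> new HashMap<>())
                    .computeIfAbsent(field, f -> scanAllTerms(seg, f));
    }

    /** Documents WITHOUT the field: a copy of the cached bitset, negated over [0, maxDoc). */
    BitSet docsMissingField(ToySegment seg, String field) {
        BitSet missing = (BitSet) docsWithField(seg, field).clone();
        missing.flip(0, seg.maxDoc);
        return missing;
    }

    // Iterate every term of the field and mark each document that has at least one of them.
    private BitSet scanAllTerms(ToySegment seg, String field) {
        termScans++;
        BitSet bits = new BitSet(seg.maxDoc);
        for (BitSet postings : seg.fields.getOrDefault(field, new TreeMap<>()).values()) {
            bits.or(postings);
        }
        return bits;
    }
}
```

The first call on a segment pays for a walk over the field's whole term dictionary; later calls reuse the cached bitset and only negate a copy of it. The toy ignores deleted documents entirely.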

Lucene 3.5 has no easy way to implement that filter; a "NULL" pseudo term is
the only solution there (and, even in Lucene 3.6, it is much faster on first
access). Later accesses that hit the cache in 3.6 will be faster, of course.

Another hacky way to achieve the same result (works with almost any Lucene
version): a BooleanQuery consisting of MatchAllDocsQuery() as a MUST clause
and new PrefixQuery(new Term(field, "")) as a MUST_NOT clause. But the
PrefixQuery will do a full term index scan without caching :-). You may use
a CachingWrapperFilter wrapping a PrefixFilter instead.
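The "hacky" BooleanQuery variant can likewise be modelled as a set difference. This toy (plain JDK; the `PrefixNegationDemo` class is hypothetical, and it holds one field's sorted term dictionary in a `TreeMap`) shows why the empty prefix is expensive: every term starts with "", so the prefix enumeration walks the entire dictionary on every evaluation. The match-all set supplies the universe to subtract from, since a Lucene BooleanQuery with only MUST_NOT clauses matches no documents.

```java
import java.util.BitSet;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Toy evaluation of: BooleanQuery { MUST MatchAllDocsQuery, MUST_NOT PrefixQuery(field:"") }
// over one field's sorted term dictionary. Not Lucene code; names are hypothetical.
final class PrefixNegationDemo {
    final int maxDoc;
    final NavigableMap<String, BitSet> termDict = new TreeMap<>(); // sorted terms of ONE field
    int termsVisited = 0;

    PrefixNegationDemo(int maxDoc) {
        this.maxDoc = maxDoc;
    }

    void add(int docId, String term) {
        termDict.computeIfAbsent(term, t -> new BitSet()).set(docId);
    }

    /** Union of postings of every term starting with prefix (what PrefixQuery/PrefixFilter match). */
    BitSet prefixMatches(String prefix) {
        BitSet hits = new BitSet(maxDoc);
        for (Map.Entry<String, BitSet> e : termDict.tailMap(prefix, true).entrySet()) {
            if (!e.getKey().startsWith(prefix)) {
                break; // terms are sorted, so we have left the prefix range
            }
            termsVisited++;
            hits.or(e.getValue());
        }
        return hits;
    }

    /** MUST match-all, MUST_NOT prefix "": every doc minus docs having any term in the field. */
    BitSet missingField() {
        BitSet all = new BitSet(maxDoc);
        all.set(0, maxDoc);            // MatchAllDocsQuery
        all.andNot(prefixMatches("")); // "" prefixes every term, so this walks the whole dictionary
        return all;
    }
}
```

Wrapping the prefix part in a CachingWrapperFilter, as suggested above, would amount to memoizing the result of `prefixMatches("")` per reader, so the dictionary walk happens once instead of on every query.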

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de


> -----Original Message-----
> From: Tim Eck [mailto:timeck@gmail.com]
> Sent: Thursday, February 16, 2012 10:14 PM
> To: java-user@lucene.apache.org
> Subject: RE: query for documents WITHOUT a field?
> 
> Thanks for the fast response. I'll certainly have a look at the upcoming
> 3.6.x release. What is the expected performance for using a negated filter?
> In particular does it defeat the index in any way and require a full index
> scan? Is it different between regular fields and numeric fields?
>
> For 3.5 and earlier though, is there any suggestion other than magic values?


RE: query for documents WITHOUT a field?

Posted by Tim Eck <ti...@gmail.com>.
Thanks for the fast response. I'll certainly have a look at the upcoming
3.6.x release. What is the expected performance for using a negated filter?
In particular does it defeat the index in any way and require a full index
scan? Is it different between regular fields and numeric fields?

For 3.5 and earlier though, is there any suggestion other than magic values?

-----Original Message-----
From: Uwe Schindler [mailto:uwe@thetaphi.de] 
Sent: Thursday, February 16, 2012 1:07 PM
To: java-user@lucene.apache.org
Subject: RE: query for documents WITHOUT a field?

Lucene 3.6 will have a FieldValueFilter that can be negated:

Query q = new ConstantScoreQuery(new FieldValueFilter("field", true));

(see http://goo.gl/wyjxn)

Lucene 3.5 does not have it yet; you can download 3.6 snapshots from Jenkins:
http://goo.gl/Ka0gr



RE: query for documents WITHOUT a field?

Posted by Uwe Schindler <uw...@thetaphi.de>.
Lucene 3.6 will have a FieldValueFilter that can be negated:

Query q = new ConstantScoreQuery(new FieldValueFilter("field", true));

(see http://goo.gl/wyjxn)

Lucene 3.5 does not have it yet; you can download 3.6 snapshots from Jenkins:
http://goo.gl/Ka0gr

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de


> -----Original Message-----
> From: Tim Eck [mailto:teck@terracottatech.com]
> Sent: Thursday, February 16, 2012 9:59 PM
> To: java-user@lucene.apache.org
> Subject: query for documents WITHOUT a field?
> 
> My apologies if this answer is readily available someplace, I've searched
> around and not found a definitive answer.
>
> I'd like to run a query for documents that _do not_ contain particular
> indexed fields to implement something like a SQL-like query where a column
> is null.
>
> I understand I could possibly use a magic value to represent "null", but the
> data I'm searching doesn't lend itself to reserving a value for null. I also
> understand I could add an extra field to hold this boolean isNull state but
> would love a better solution :-)
>
> TIA


