Posted to java-user@lucene.apache.org by thogau <th...@thogau.net> on 2008/03/11 15:16:54 UTC

Searching for null (empty) fields, how to use -field:[* TO *]

Hi,


I browsed the forum looking for a way to write a query that retrieves
documents that do not have any value for a given field (say MY_FIELD_NAME).

I read several posts advising this syntax: -MY_FIELD_NAME:[* TO *]
However, I cannot get it to work.

I have 2 documents: the first one has a value for the field MY_FIELD_NAME
(in Luke, I can see the value) and the second one has no value for it (in
Luke, I see <not available>).

I would expect the query MY_FIELD_NAME:[* TO *] to retrieve the document
that has a value for the field MY_FIELD_NAME, but it doesn't (although
MY_FIELD_NAME:[a* TO z*] does retrieve it).

Also, I would expect the query -MY_FIELD_NAME:[* TO *] to retrieve the
document that has no value for the field MY_FIELD_NAME, but it doesn't
either.

I guess I am missing something obvious, but I am stuck. Can anybody help me
understand what I am doing wrong?

-- 
View this message in context: http://www.nabble.com/Searching-for-null-%28empty%29-fields%2C-how-to-use--field%3A-*-TO-*--tp15976538p15976538.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.

Re: Searching for null (empty) fields, how to use -field:[* TO *]

Posted by German Kondolf <ge...@gmail.com>.
Yes, my index is a "full snapshot" created every "n" hours, with no
incremental updates, so I decided to write my own variant of
MatchAllDocsQuery that takes advantage of the index being read-only and
basically drops these checks.

Regards

Ger
german.kondolf@gmail.com


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Searching for null (empty) fields, how to use -field:[* TO *]

Posted by Yonik Seeley <yo...@apache.org>.
On Tue, Mar 11, 2008 at 10:41 AM, German Kondolf
<ge...@gmail.com> wrote:
> Is *:* parsed as a MatchAllDocsQuery?
>
>  I've had some performance issues in Lucene 2.2 because
>  MatchAllDocsQuery calls isDeleted() for every document. I haven't
>  tried it in 2.3.

That will still be the case in 2.3 (and it's a synchronized call... ouch).
That's one of the reasons why read-only IndexReaders would be a good idea.

-Yonik



Re: Searching for null (empty) fields, how to use -field:[* TO *]

Posted by German Kondolf <ge...@gmail.com>.
Is *:* parsed as a MatchAllDocsQuery?

I've had some performance issues in Lucene 2.2 because
MatchAllDocsQuery calls isDeleted() for every document. I haven't
tried it in 2.3.

On Tue, Mar 11, 2008 at 11:34 AM, Mark Miller <ma...@gmail.com> wrote:
> You cannot have a purely negative query like you can in Solr.
>
>  Try: *:* -MY_FIELD_NAME:[* TO *]


Re: Searching for null (empty) fields, how to use -field:[* TO *]

Posted by thogau <th...@thogau.net>.
Thanks for your suggestion, markrmiller. When I try this query, I get both
documents as hits: the one where the field has a value and also the one
where the field is not set.
Any idea why?


markrmiller wrote:
> 
> You cannot have a purely negative query like you can in Solr.
> 
> Try: *:* -MY_FIELD_NAME:[* TO *]
> 
> 



Re: Searching for null (empty) fields, how to use -field:[* TO *]

Posted by Mark Miller <ma...@gmail.com>.
You cannot have a purely negative query like you can in Solr.

Try: *:* -MY_FIELD_NAME:[* TO *]
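
To illustrate why the purely negative form comes back empty, here is a
minimal sketch in plain Java. It uses no Lucene classes: TinyIndex, hasField
and search are made-up names for this example. It assumes that a boolean
query starts from the documents selected by its positive clause and only
then subtracts the prohibited ones, so with no positive clause there is
nothing to subtract from.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model only: these are not Lucene classes.
class TinyIndex {
    // doc id -> names of the fields that have an indexed value
    private final Map<Integer, Set<String>> docs = new LinkedHashMap<>();

    void add(int docId, String... fieldsWithValue) {
        docs.put(docId, new HashSet<>(Arrays.asList(fieldsWithValue)));
    }

    // Every document: the role *:* plays in the query above.
    Set<Integer> matchAll() {
        return new TreeSet<>(docs.keySet());
    }

    // Documents with some value in the field: the intended role of FIELD:[* TO *].
    Set<Integer> hasField(String field) {
        Set<Integer> hits = new TreeSet<>();
        for (Map.Entry<Integer, Set<String>> e : docs.entrySet()) {
            if (e.getValue().contains(field)) {
                hits.add(e.getKey());
            }
        }
        return hits;
    }

    // Prohibited clauses only subtract from what the positive clause selected.
    static Set<Integer> search(Set<Integer> positive, Set<Integer> prohibited) {
        Set<Integer> hits = new TreeSet<>(positive);
        hits.removeAll(prohibited);
        return hits;
    }
}
```

With document 0 carrying MY_FIELD_NAME and document 1 not, search() with an
empty positive set returns nothing, while search(matchAll(), hasField(...))
leaves only document 1. This models the set logic only; whether
MY_FIELD_NAME:[* TO *] really selects the documents that have a value
depends on how the query parser in use handles that range.
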


Re: Searching for null (empty) fields, how to use -field:[* TO *]

Posted by German Kondolf <ge...@gmail.com>.
Hi,

I was looking for the same functionality. After a short search I
didn't find a solution; I assume one must exist, but I finally decided
to "fill" those empty fields with a representative null value,
"__null__". This is only possible if you know ALL the fields in
advance.

I'd like to know if there is another way than negating every possible
value (-FIELD:[* TO *]); that seems "heavier" than handling a
dedicated null value.
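
A minimal sketch of that index-time substitution, in plain Java with no
Lucene calls. The class and helper names (NullMarker, indexableValue) are
hypothetical, and treating an empty string the same as a missing value is
an assumption made for the example.

```java
// Sketch only: class and method names are made up for illustration.
final class NullMarker {
    // The placeholder mentioned above; it must never occur as a real value.
    static final String NULL_VALUE = "__null__";

    private NullMarker() {}

    // Value to index for a field: the real value, or the placeholder when absent.
    static String indexableValue(String rawValue) {
        if (rawValue == null || rawValue.isEmpty()) {
            return NULL_VALUE;
        }
        return rawValue;
    }
}
```

Every known field would go through indexableValue() before being added to
the document, and documents without a value could then be found with a
plain term query on the placeholder, provided the analyzer in use leaves
the placeholder intact.
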

Regards,

Germán Kondolf
german.kondolf@gmail.com



Re: Searching for null (empty) fields, how to use -field:[* TO *]

Posted by thogau <th...@thogau.net>.
Thanks Erick, I ended up following your second suggestion.
It was a bit tricky since I had to plug into a MapConverter, but it
works as expected.
Thanks to all.

--thogau





Re: Searching for null (empty) fields, how to use -field:[* TO *]

Posted by Frank Geary <fg...@acquiremedia.com>.
Since my index cannot be re-indexed easily, I had to go with Erick's first
suggestion.  I thought others might be interested in an example of the code
(I did this using Lucene 2.4.1):

// Imports needed by the snippets below (Lucene 2.4.1):
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.ConstantScoreRangeQuery;
import org.apache.lucene.search.MatchAllDocsQuery;

// This code worked best to delete documents with a null field value.
// theFieldName (a String) and indexWriter (an IndexWriter) are defined elsewhere.
BooleanQuery nullFieldsOnlyQuery = new BooleanQuery();

MatchAllDocsQuery matchAllDocsQuery = new MatchAllDocsQuery();

// ConstantScoreRangeQuery does not throw a BooleanQuery.TooManyClauses
// exception, whereas a regular RangeQuery does.
// Obviously the range may be different depending on the nature of the
// field involved.
ConstantScoreRangeQuery nonNullFieldsRangeQuery =
    new ConstantScoreRangeQuery( theFieldName,
                                 "0",    // lower bound: zero
                                 "z",    // upper bound
                                 true,   // include lower bound
                                 true ); // include upper bound

// Load up the BooleanQuery: all documents, minus those with a value in range.
nullFieldsOnlyQuery.add( matchAllDocsQuery, BooleanClause.Occur.MUST );
nullFieldsOnlyQuery.add( nonNullFieldsRangeQuery, BooleanClause.Occur.MUST_NOT );

// Delete the documents from the archive index
indexWriter.deleteDocuments( nullFieldsOnlyQuery );
// End of code that worked best to delete documents with a null field value.

// The following code did NOT always work to delete documents with a null
// field value. It did delete them in my Windows XP environment when I simply
// added a document with "" for the field value. However, it did not work in
// my CentOS environment. I imagine this failure had more to do with the way
// those documents with a null field value ended up erroneously being indexed
// on CentOS than with the difference in the OS itself.
Term[] fieldTerms = new Term[ 1 ];
fieldTerms[0] = new Term( theFieldName, "" );
indexWriter.deleteDocuments( fieldTerms );
// End of code that did NOT always work for me.

Frank Geary


Re: Searching for null (empty) fields, how to use -field:[* TO *]

Posted by Erick Erickson <er...@gmail.com>.
You could also think about making a filter, probably when you open
your searcher. You can use TermDocs/TermEnum to find all of the documents
that *do* have entries for your field, assemble those into a filter, then
invert that filter. Keep the filter around and use it whenever you need
to. Perhaps CachingWrapperFilter would help here (although I've never
used it).
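
The shape of that filter can be sketched in plain Java with
java.util.BitSet. This does not call TermDocs/TermEnum: the postings map
(term text to document ids, for one field) stands in for the term
enumeration, and the names MissingFieldBits, docsMissingField and maxDoc
are assumptions made for the example.

```java
import java.util.BitSet;
import java.util.Map;

// Sketch only: a Map stands in for the term enumeration of a single field.
final class MissingFieldBits {
    private MissingFieldBits() {}

    /**
     * @param postings term text -> ids of the documents containing that term in the field
     * @param maxDoc   one greater than the largest document id in the index
     * @return bits set for every document that has no term at all in the field
     */
    static BitSet docsMissingField(Map<String, int[]> postings, int maxDoc) {
        BitSet hasValue = new BitSet(maxDoc);
        // Walk every term of the field and mark each document that contains it.
        for (int[] docIds : postings.values()) {
            for (int docId : docIds) {
                hasValue.set(docId);
            }
        }
        // Invert: what is left are the documents without any value.
        BitSet missing = (BitSet) hasValue.clone();
        missing.flip(0, maxDoc);
        return missing;
    }
}
```

Building the bits once per searcher and reusing them matches the "keep the
filter around" advice. Note that a plain inversion like this also sets the
bits of any deleted document ids, so a real filter would presumably need to
mask those out.
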

Another possibility is to index a field only for those documents that
don't have any value for MY_FIELD_NAME. So when indexing a doc, you
have something like
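// "value" is the document's value for the field, or null if it has none;
// the Store/Index flags here are just one plausible choice.
if (value != null) {
   doc.add(new Field("MY_FIELD_NAME", value, Field.Store.YES, Field.Index.UN_TOKENIZED));
} else {
   doc.add(new Field("NO_MY_FIELD_NAME", "no", Field.Store.NO, Field.Index.UN_TOKENIZED));
}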

Now finding docs without your field really is just searching on
NO_MY_FIELD_NAME:no

Your index would be very slightly bigger in this instance....

FWIW
Erick
