Posted to java-user@lucene.apache.org by Rajnish kamboj <ra...@gmail.com> on 2017/07/15 05:15:46 UTC

How to fetch documents for which field is not defined

Hi
Does Lucene provide any API to fetch documents for which a field is not
defined?

Example
Document1 : field1=value1, field2=value2, field3=value3

Document2 : field1=value4, field2=value4

I want a query that returns documents for which field3 is not defined. In this
example, it should return Document2.

Regards
Rajnish

Re: How to fetch documents for which field is not defined

Posted by Ahmet Arslan <io...@yahoo.com.INVALID>.
How about Solr's exists function query? How does it work? Function queries are now part of Lucene (org.apache.lucene.queries.function), right?
Ahmet


On Sunday, July 16, 2017, 11:19:40 AM GMT+3, Trejkaz <tr...@trypticon.org> wrote:



Re: How to fetch documents for which field is not defined

Posted by Trejkaz <tr...@trypticon.org>.
On Sat, Jul 15, 2017 at 8:12 PM, Uwe Schindler <uw...@thetaphi.de> wrote:
> That is the "Solr" answer. But it is slow like hell.
>
> In Lucene there is a native query named FieldValueQuery already for this.
> It requires DocValues enabled for the field.
>
> IMHO, the best and fastest variant (also to Solr users) is to add a separate
> multivalued string field named 'fieldnames' where you index all field names
> that have a value. After that you can query on this using the field name.
> Elasticsearch is doing the field name approach for exists/not exists by default.

The catch is, you usually have to analyse a field to determine whether
it has a value. Apparently Elasticsearch's field existence query does
not do this, so it considers blank text to be a value, which is not
the same as what the user expected when they did the query.

We *were* using FieldValueQuery, but since moving to Lucene 6 we have
stopped using the uninverting reader, so that option doesn't cover all
fields, and fields like "content" aren't really practical to put in
DocValues...

The approach of adding a fieldnames field works, but it is fiddly at
indexing time, because now you have to use a TokenStream for all fields,
so that you can read one token from each field to test whether there
is one before you add the whole document. I guess it's at least easier
to understand how it works at query time.
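Below is a minimal plain-Java sketch of the index-time check described above. It assumes a toy tokenizer standing in for a real Lucene Analyzer; TinyAnalyzer and its stop-word list are hypothetical. A field name is recorded only if analysis yields at least one token, and the check stops at the first surviving token. As a result, blank or stop-word-only text does not count as a value. With real Lucene, the equivalent probe would be analyzer.tokenStream(field, text), followed by reset(), a single incrementToken(), end() and close().

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-in for a Lucene Analyzer (not Lucene's API): lower-cases,
// splits on runs of non-alphanumeric ASCII characters, drops a few stop words.
class TinyAnalyzer {
    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "the"));

    // "Read one token and stop": returns true as soon as the first token
    // survives analysis, without looking at the rest of the text.
    static boolean hasAnyToken(String text) {
        if (text == null) {
            return false;
        }
        for (String raw : text.split("[^\\p{Alnum}]+")) {
            String token = raw.toLowerCase(Locale.ROOT);
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                return true;
            }
        }
        return false;
    }

    // Names to put into a "fieldnames" field for one document: only fields
    // whose text produces at least one token after analysis.
    static Set<String> fieldNamesWithValues(Map<String, String> doc) {
        Set<String> names = new TreeSet<>();
        for (Map.Entry<String, String> field : doc.entrySet()) {
            if (hasAnyToken(field.getValue())) {
                names.add(field.getKey());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("field1", "value4");
        doc.put("field2", "The");  // only a stop word
        doc.put("field3", "   ");  // blank text: present, but yields no tokens
        System.out.println(fieldNamesWithValues(doc)); // [field1]
    }
}
```

A plain "is the string non-empty" test would have recorded field2 and field3 here too. That is the mismatch with user expectations that the message attributes to Elasticsearch's existence check.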

TX

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: How to fetch documents for which field is not defined

Posted by Uwe Schindler <uw...@thetaphi.de>.
That is the "Solr" answer. But it is slow like hell.

In Lucene there is already a native query for this, named FieldValueQuery. It requires DocValues to be enabled for the field.
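To make the doc-values idea concrete, here is a plain-Java model with hypothetical class names; it is not Lucene's API. Doc values behave like a column holding at most one value per document, keyed by document ID. So "field3 is missing" means every document ID whose lookup finds nothing. In Lucene itself, the query named above would typically go into a BooleanQuery as a MUST_NOT clause next to a MatchAllDocsQuery MUST clause. Only documents that were indexed with doc values for the field appear in the column.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of a doc-values column (not Lucene's API): at most one
// value per document, looked up by document ID.
class DocValuesColumn {
    private final Map<Integer, String> valueByDoc = new HashMap<>();

    void set(int docId, String value) {
        valueByDoc.put(docId, value);
    }

    // Per-document existence check: the question an "exists" query asks.
    boolean exists(int docId) {
        return valueByDoc.containsKey(docId);
    }
}

class MissingFieldByDocValues {
    // "Match all documents, minus those where the column has a value."
    static List<Integer> docsMissing(DocValuesColumn column, int maxDoc) {
        List<Integer> missing = new ArrayList<>();
        for (int doc = 0; doc < maxDoc; doc++) {
            if (!column.exists(doc)) {
                missing.add(doc);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Doc 0 plays Document1 (has field3), doc 1 plays Document2 (does not).
        DocValuesColumn field3 = new DocValuesColumn();
        field3.set(0, "value3");
        System.out.println(docsMissing(field3, 2)); // [1]
    }
}
```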

IMHO, the best and fastest variant (also for Solr users) is to add a separate multivalued string field named 'fieldnames' in which you index the names of all fields that have a value. After that you can query this field by field name. Elasticsearch uses this field-names approach for exists/not-exists queries by default.
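Here is a sketch of the fieldnames approach in plain Java, modelling the inverted index as an in-memory map. FieldNamesIndex is a hypothetical name; only the field name 'fieldnames' comes from this thread.

- At index time, each document contributes the names of its non-empty fields.
- At query time, "documents without field3" is all document IDs minus a single posting list.

In real Lucene this would roughly correspond to adding a StringField("fieldnames", name, Field.Store.NO) per populated field. The query would then combine MatchAllDocsQuery (MUST) with TermQuery(new Term("fieldnames", "field3")) (MUST_NOT). The emptiness test below is a plain isEmpty() check, which ignores the analysis caveat raised in Trejkaz's reply.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory model of the "fieldnames" approach (not Lucene's API).
class FieldNamesIndex {
    // Posting lists of the "fieldnames" field: field name -> doc IDs.
    private final Map<String, Set<Integer>> postings = new HashMap<>();
    private int maxDoc = 0;

    // Index time: record the name of every field that has a non-empty value.
    int add(Map<String, String> fields) {
        int docId = maxDoc++;
        for (Map.Entry<String, String> field : fields.entrySet()) {
            String value = field.getValue();
            if (value != null && !value.isEmpty()) {
                postings.computeIfAbsent(field.getKey(), k -> new HashSet<>()).add(docId);
            }
        }
        return docId;
    }

    // Query time, like "+*:* -fieldnames:field3": all docs minus ONE posting list.
    List<Integer> docsMissing(String fieldName) {
        Set<Integer> withField = postings.getOrDefault(fieldName, Collections.emptySet());
        List<Integer> missing = new ArrayList<>();
        for (int doc = 0; doc < maxDoc; doc++) {
            if (!withField.contains(doc)) {
                missing.add(doc);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        FieldNamesIndex index = new FieldNamesIndex();
        Map<String, String> doc1 = new LinkedHashMap<>();
        doc1.put("field1", "value1");
        doc1.put("field2", "value2");
        doc1.put("field3", "value3");
        Map<String, String> doc2 = new LinkedHashMap<>();
        doc2.put("field1", "value4");
        doc2.put("field2", "value4");
        index.add(doc1); // doc 0
        index.add(doc2); // doc 1
        System.out.println(index.docsMissing("field3")); // [1]
    }
}
```

The query-time cost is one posting-list lookup, however many distinct values field3 holds. This is the property that makes the approach fast.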

Uwe

On 15 July 2017 11:56:16 CEST, Ahmet Arslan <io...@yahoo.com.INVALID> wrote:
>Hi,
>Yes, here it is:  q=+*:* -field3:[* TO *]
>Ahmet

--
Uwe Schindler
Achterdiek 19, 28357 Bremen
https://www.thetaphi.de

Re: How to fetch documents for which field is not defined

Posted by Rajnish kamboj <ra...@gmail.com>.
Thanks.
Which Lucene version supports this, and how do such queries perform on a
large set of documents?



On Sat, 15 Jul 2017 at 3:38 PM, Ahmet Arslan <io...@yahoo.com.invalid>
wrote:

> Hi,
> As an alternative, function queries can also be used. The exists function may
> be more intuitive:
> q={!func}not(exists(field3))

Re: How to fetch documents for which field is not defined

Posted by Ahmet Arslan <io...@yahoo.com.INVALID>.
Hi,
As an alternative, function queries can also be used. The exists function may be more intuitive:
q={!func}not(exists(field3))
On Saturday, July 15, 2017, 1:01:04 PM GMT+3, Rajnish kamboj <ra...@gmail.com> wrote:


Ok, I will check.


Re: How to fetch documents for which field is not defined

Posted by Rajnish kamboj <ra...@gmail.com>.
Ok, I will check.

On Sat, 15 Jul 2017 at 3:26 PM, Ahmet Arslan <io...@yahoo.com> wrote:

> Hi,
>
> Yes, here it is:  q=+*:* -field3:[* TO *]
>
> Ahmet

Re: How to fetch documents for which field is not defined

Posted by Ahmet Arslan <io...@yahoo.com.INVALID>.
Hi,
Yes, here it is: q=+*:* -field3:[* TO *]
Ahmet
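In this query syntax, +*:* requires every document, and -field3:[* TO *] excludes any document that has at least one term in field3. In Lucene code, the same thing would be a BooleanQuery with a MatchAllDocsQuery MUST clause plus a MUST_NOT TermRangeQuery on field3 whose lower and upper bounds are both null. The sketch below models this on a hypothetical in-memory term dictionary; it is plain Java, not Lucene's API. It also suggests why an open-ended range is expensive: every distinct term of the field is visited and all of their postings are merged.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model (not Lucene's API) of "+*:* -field3:[* TO *]" over one
// field's inverted index: term -> IDs of documents containing that term.
class OpenRangeExclusion {
    static List<Integer> docsMissing(Map<String, List<Integer>> termPostings, int maxDoc) {
        // The open range [* TO *] matches every term, so every posting list
        // of the field is visited and merged: the cost grows with the number
        // of distinct terms and postings in the field.
        BitSet hasField = new BitSet(maxDoc);
        for (List<Integer> docs : termPostings.values()) {
            for (int doc : docs) {
                hasField.set(doc);
            }
        }
        // "+*:*" supplies all documents; the MUST_NOT clause removes the union.
        List<Integer> missing = new ArrayList<>();
        for (int doc = 0; doc < maxDoc; doc++) {
            if (!hasField.get(doc)) {
                missing.add(doc);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> field3 = new TreeMap<>();
        field3.put("value3", Arrays.asList(0)); // only Document1 (doc 0) has field3
        System.out.println(docsMissing(field3, 2)); // [1]
    }
}
```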
On Saturday, July 15, 2017, 8:16:00 AM GMT+3, Rajnish kamboj <ra...@gmail.com> wrote:

