Posted to java-user@lucene.apache.org by Hans Lund <ha...@gmail.com> on 2016/12/07 13:39:02 UTC

FieldValueQuery

Hi All

As far as I can see, FieldValueQuery ends up fetching Bits from
DocValues.

But I need similar functionality for fields without DocValues, such as
StringField and TextField, and was wondering whether someone has had the
same issue and found a good solution.

I'm also having trouble figuring out the purpose of the query from a usage
perspective, as it is a highly specialized query for questions like
"find docs that can be sorted on field foo".

For now I've worked around it by extending IndexWriter and, within the
addDocument method, creating a new BinaryDocValuesField with an empty
BytesRef for every IndexableField whose DocValuesType ==
DocValuesType.NONE.

It works, but it is not a pretty solution. Are there any alternatives?

/Hans Lund

Re: FieldValueQuery

Posted by Hans Lund <ha...@gmail.com>.

Of course! Almost too obvious ;-) Thanks a lot - I'll spend some time
wondering why that didn't occur to me as a solution.




On Thu, Dec 8, 2016 at 5:16 PM, Adrien Grand <jp...@gmail.com> wrote:

> On Thu, Dec 8, 2016 at 16:42, Hans Lund <ha...@gmail.com> wrote:
>
> > That would be a solution for sure - but it has the drawback of doubling
> > the indexed fields per document.
> >
>
> If you want to do it for all fields, you could use the name of the field as
> a value, for instance has_field:foo, has_field:bar, etc. This is how
> Elasticsearch implements its exists query. One interesting thing to note is
> that fields that appear in all documents will require very little storage
> since the encoding we use for postings lists is optimized when all docs
> have a given value.
>

Re: FieldValueQuery

Posted by Adrien Grand <jp...@gmail.com>.

On Thu, Dec 8, 2016 at 16:42, Hans Lund <ha...@gmail.com> wrote:

> That would be a solution for sure - but it has the drawback of doubling the
> indexed fields per document.
>

If you want to do it for all fields, you could use the name of the field as
a value, for instance has_field:foo, has_field:bar, etc. This is how
Elasticsearch implements its exists query. One interesting thing to note is
that fields that appear in all documents will require very little storage
since the encoding we use for postings lists is optimized when all docs
have a given value.
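
A minimal sketch of the single has_field idea described above, written as a
toy in-memory model rather than against the Lucene API. The class and method
names (HasFieldSketch, exists, markerTermCount) are hypothetical and exist
only for illustration. It assumes each document contributes the names of its
fields as terms of one marker field, so an "exists" check becomes a plain
term lookup.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy model of one "has_field" marker field; NOT the Lucene API.
final class HasFieldSketch {
    // Marker field name taken from the thread.
    static final String MARKER_FIELD = "has_field";

    // term (a field name) -> sorted IDs of the documents containing that field
    private final Map<String, SortedSet<Integer>> markerPostings = new TreeMap<>();

    // Record every field name of the document as a term of the marker field.
    void addDocument(int docId, Collection<String> fieldNames) {
        for (String name : fieldNames) {
            markerPostings.computeIfAbsent(name, k -> new TreeSet<>()).add(docId);
        }
    }

    // "Exists" check: the documents that contain the given field.
    SortedSet<Integer> exists(String fieldName) {
        SortedSet<Integer> docs = markerPostings.get(fieldName);
        return docs == null
                ? Collections.<Integer>emptySortedSet()
                : Collections.unmodifiableSortedSet(docs);
    }

    // Number of distinct terms in the marker field (one per field name seen).
    int markerTermCount() {
        return markerPostings.size();
    }

    public static void main(String[] args) {
        HasFieldSketch idx = new HasFieldSketch();
        idx.addDocument(0, List.of("foo", "bar")); // List.of needs Java 9+
        idx.addDocument(1, List.of("bar"));
        System.out.println(MARKER_FIELD + ":foo -> " + idx.exists("foo"));
        System.out.println(MARKER_FIELD + ":bar -> " + idx.exists("bar"));
        System.out.println("marker terms: " + idx.markerTermCount());
    }
}
```

In this model the index gains one extra field regardless of how many fields
a document has; only the number of terms in that field grows.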

Re: FieldValueQuery

Posted by Hans Lund <ha...@gmail.com>.

That would be a solution for sure - but it has the drawback of doubling the
indexed fields per document.
Looking at the field stats where this is needed, we have around 600 fields
per "document", most of them already having doc values. Adding 600 new fields
instead of 15 BinaryDocValuesFields also seems like a not very pretty
solution. Adding to the complexity, document creation is done in a pluggable
manner, so fields can be added by someone else's code ;-)

As of now I have just extended IndexWriter - analyzing the IndexableFields
during updateDocument and adding a BinaryDocValuesField where needed.
It works, but looping through the fields to collect the field names that
need a doc value ... hmm.

A better approach could be to extend Document, letting the iterator do the
inspection and emit the needed marker fields?
(It would make testing the StringField vs. BinaryDocValuesField strategy very
simple ;-))
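
A rough sketch of that iterator idea, as a plain-Java model rather than real
Lucene code. SimpleField and MarkerEmittingDocument are hypothetical stand-ins
(for IndexableField and an extended Document), and the has_field marker name
is borrowed from the replies in this thread. The assumption is that the
wrapper yields the original fields unchanged and then one marker per distinct
field name that lacks doc values.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Plain-Java model of a document whose iterator emits marker fields.
final class MarkerEmittingDocument implements Iterable<MarkerEmittingDocument.SimpleField> {

    // Hypothetical stand-in for an indexable field.
    static final class SimpleField {
        final String name;
        final String value;
        final boolean hasDocValues;

        SimpleField(String name, String value, boolean hasDocValues) {
            this.name = name;
            this.value = value;
            this.hasDocValues = hasDocValues;
        }
    }

    private final List<SimpleField> fields;

    MarkerEmittingDocument(List<SimpleField> fields) {
        this.fields = new ArrayList<>(fields);
    }

    // Yields the original fields, then one marker per distinct field name
    // that has no doc values. Multi-valued fields get a single marker.
    @Override
    public Iterator<SimpleField> iterator() {
        List<SimpleField> out = new ArrayList<>(fields);
        Set<String> needMarker = new LinkedHashSet<>();
        for (SimpleField f : fields) {
            if (!f.hasDocValues) {
                needMarker.add(f.name);
            }
        }
        for (String name : needMarker) {
            out.add(new SimpleField("has_field", name, false));
        }
        return out.iterator();
    }

    public static void main(String[] args) {
        MarkerEmittingDocument doc = new MarkerEmittingDocument(List.of( // Java 9+
                new SimpleField("title", "x", false),
                new SimpleField("price", "9", true)));
        for (SimpleField f : doc) {
            System.out.println(f.name + "=" + f.value);
        }
    }
}
```

Because the inspection happens inside the iterator, code that builds the
document elsewhere does not need to know about the markers, and swapping the
marker strategy only touches this one class.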

What are the drawbacks of having such a 'marker' doc value with no actual
value?

Hans Lund

On Thu, Dec 8, 2016 at 2:51 PM, Michael McCandless <
lucene@mikemccandless.com> wrote:

> Unlike for doc values fields, Lucene does not store this information
> (which documents have a given indexed field) efficiently and so there
> is no query for it.
>
> If this is important to you, you could add another field for each
> indexed field. E.g. if the document has field foo, you would also
> index has_field_foo, for example as a StringField that always carries
> the same token, such as "1". Then at search time you can run a
> TermQuery on has_field_foo:1.
>
> Mike McCandless
>
> http://blog.mikemccandless.com

Re: FieldValueQuery

Posted by Michael McCandless <lu...@mikemccandless.com>.

Unlike for doc values fields, Lucene does not store this information
(which documents have a given indexed field) efficiently and so there
is no query for it.

If this is important to you, you could add another field for each
indexed field. E.g. if the document has field foo, you would also
index has_field_foo, for example as a StringField that always carries
the same token, such as "1". Then at search time you can run a
TermQuery on has_field_foo:1.
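
A minimal sketch of this suggestion as a toy in-memory inverted index, not
the Lucene API. MarkerFieldSketch and its methods are hypothetical names; the
has_field_ prefix and the "1" token come from the paragraph above. It assumes
every field value is indexed as a single term, and that each field also gets
a companion marker field whose only term is "1".

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

// Toy in-memory model of an inverted index; NOT the Lucene API.
final class MarkerFieldSketch {
    // field name -> term -> sorted IDs of the documents containing that term
    private final Map<String, Map<String, SortedSet<Integer>>> postings = new HashMap<>();

    // Index each field as a single term, plus a "has_field_<name>" marker
    // field that always carries the token "1".
    void addDocument(int docId, Map<String, String> fields) {
        for (Map.Entry<String, String> f : fields.entrySet()) {
            index(f.getKey(), f.getValue(), docId);
            index("has_field_" + f.getKey(), "1", docId);
        }
    }

    private void index(String field, String term, int docId) {
        postings.computeIfAbsent(field, k -> new HashMap<>())
                .computeIfAbsent(term, k -> new TreeSet<>())
                .add(docId);
    }

    // Rough equivalent of a term query: documents holding `term` in `field`.
    SortedSet<Integer> termQuery(String field, String term) {
        Map<String, SortedSet<Integer>> terms = postings.get(field);
        if (terms == null || !terms.containsKey(term)) {
            return Collections.<Integer>emptySortedSet();
        }
        return Collections.unmodifiableSortedSet(terms.get(term));
    }

    public static void main(String[] args) {
        MarkerFieldSketch idx = new MarkerFieldSketch();
        idx.addDocument(0, Map.of("foo", "a")); // Map.of needs Java 9+
        idx.addDocument(1, Map.of("bar", "b"));
        System.out.println("has_field_foo:1 -> " + idx.termQuery("has_field_foo", "1"));
    }
}
```

Note that this variant creates one extra field per original field, which is
the doubling of fields discussed elsewhere in the thread.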

Mike McCandless

http://blog.mikemccandless.com

