Posted to dev@lucene.apache.org by Dominik Safaric <do...@gmail.com> on 2018/02/28 14:53:36 UTC

BinaryDocValues prefix bytes

Hi,

I have an index in which I store a binary doc value holding a serialized
8-byte value. The values are consumed by a custom Query implementation,
using LeafReader.getBinaryDocValues().

However, I found the following: each binary doc value returned by
BinaryDocValues.get(docID) has a sequence of two bytes prepended. The first
byte is always equal to 1 and the second is always equal to 8. Hence, the
length of the retrieved byte array is always 10, not 8 as stored.

Could someone please explain why these bytes are being prepended to the
array, where they are added, and how to get the original value?

Kind regards,
Dominik

Re: BinaryDocValues prefix bytes

Posted by Ryan Ernst <ry...@iernst.net>.
This is how Elasticsearch encodes binary values. The first value is a vint
containing the number of values for the field. In Lucene, binary doc values
have no concept of "multi-valued"; the data is opaque.
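
The layout described above can be unpacked with a few lines of plain Java.
This is only a sketch under assumptions: it supposes the bytes are a vint
value count followed, for each value, by a vint length and the raw bytes,
which matches the observed prefix (1 = one value, 8 = eight bytes) but is
not stated in full in this thread. The class and method names are
hypothetical, and reading the payload as a big-endian long is also an
assumption.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MultiValueBinaryDecoder {

    /** Read position over a byte range (bytes, offset, length), as a BytesRef would expose it. */
    private static final class Cursor {
        final byte[] bytes;
        final int end;
        int pos;

        Cursor(byte[] bytes, int offset, int length) {
            this.bytes = bytes;
            this.pos = offset;
            this.end = offset + length;
        }
    }

    /** Reads a vint: 7 data bits per byte, low bits first; a set high bit means more bytes follow. */
    private static int readVInt(Cursor in) {
        int value = 0;
        for (int shift = 0; shift < 35; shift += 7) {
            if (in.pos >= in.end) {
                throw new IllegalArgumentException("truncated vint");
            }
            byte b = in.bytes[in.pos++];
            value |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return value;
            }
        }
        throw new IllegalArgumentException("vint longer than 5 bytes");
    }

    /** Decodes [vint count][vint length, bytes]... into a list of copied values (assumed layout). */
    public static List<byte[]> decode(byte[] bytes, int offset, int length) {
        Cursor in = new Cursor(bytes, offset, length);
        int count = readVInt(in);
        if (count < 0) {
            throw new IllegalArgumentException("negative value count");
        }
        List<byte[]> values = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            int len = readVInt(in);
            if (len < 0 || len > in.end - in.pos) {
                throw new IllegalArgumentException("value length exceeds remaining bytes");
            }
            values.add(Arrays.copyOfRange(in.bytes, in.pos, in.pos + len));
            in.pos += len;
        }
        return values;
    }

    public static void main(String[] args) {
        // The 10 bytes from the question: prefix 1, prefix 8, then an 8-byte payload.
        byte[] raw = {1, 8, 0, 0, 0, 0, 0, 0, 0, 42};
        List<byte[]> values = decode(raw, 0, raw.length);
        // Assumption: the payload was serialized as a big-endian long.
        long original = ByteBuffer.wrap(values.get(0)).getLong();
        System.out.println(values.size() + " value(s), first = " + original);
    }
}
```

With a real BytesRef, the three arguments would be its bytes, offset and
length fields. Copying each value out also avoids holding on to a buffer
that the reader may reuse.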

On Wed, Feb 28, 2018 at 8:25 AM Dominik Safaric <do...@gmail.com>
wrote:

> No, I'm not. The values are stored through Elasticsearch into a
> binary doc value as a base64-encoded string.

Re: BinaryDocValues prefix bytes

Posted by Dominik Safaric <do...@gmail.com>.
No, I'm not. The values are stored through Elasticsearch into a binary
doc value as a base64-encoded string.
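
For illustration, here is a minimal sketch of how an 8-byte value might be
turned into such a base64 string on the client side. The class and method
names are hypothetical, and the big-endian long serialization is an
assumption; the thread does not say how the 8 bytes were produced.

```java
import java.nio.ByteBuffer;
import java.util.Base64;

public class Base64LongCodec {

    /** Serializes a long to 8 big-endian bytes and base64-encodes them (assumed format). */
    public static String encode(long value) {
        byte[] bytes = ByteBuffer.allocate(Long.BYTES).putLong(value).array();
        return Base64.getEncoder().encodeToString(bytes);
    }

    /** Reverses encode(): base64-decodes the string and reads the 8 bytes as a long. */
    public static long decode(String base64) {
        byte[] bytes = Base64.getDecoder().decode(base64);
        if (bytes.length != Long.BYTES) {
            throw new IllegalArgumentException("expected 8 bytes, got " + bytes.length);
        }
        return ByteBuffer.wrap(bytes).getLong();
    }

    public static void main(String[] args) {
        String encoded = encode(42L);
        System.out.println(encoded + " -> " + decode(encoded));
    }
}
```

The base64 form is only the transport representation; what ends up in the
doc value is the decoded bytes, plus whatever framing the indexing layer
adds.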

2018-02-28 16:00 GMT+01:00 David Smiley <da...@gmail.com>:

> This can't be; it must be a bug.  Perhaps you are saving away the BytesRef
> by reference across multiple invocations?  That won't work; you may have to
> clone/copy it.

Re: BinaryDocValues prefix bytes

Posted by David Smiley <da...@gmail.com>.
This can't be; it must be a bug.  Perhaps you are saving away the BytesRef
by reference across multiple invocations?  That won't work; you may have to
clone/copy it.
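
To illustrate the reference hazard, here is a small self-contained sketch.
Ref and ReusingSource are hypothetical stand-ins written for this example,
not Lucene classes: Ref mimics the (bytes, offset, length) view of a
BytesRef, and ReusingSource mimics a reader that hands back the same
instance for every document. With the real class, BytesRef.deepCopyOf does
the copying.

```java
import java.util.Arrays;

public class ReusedRefDemo {

    /** Hypothetical stand-in for BytesRef: a view into a buffer, not an owned copy. */
    public static final class Ref {
        public byte[] bytes;
        public int offset;
        public int length;
    }

    /** Hypothetical reader that reuses one Ref and one buffer for every document. */
    public static final class ReusingSource {
        private final byte[][] docs;
        private final Ref shared = new Ref();

        public ReusingSource(byte[][] docs) {
            this.docs = docs;
            int max = 0;
            for (byte[] doc : docs) {
                max = Math.max(max, doc.length);
            }
            shared.bytes = new byte[max];
        }

        public Ref get(int docID) {
            byte[] value = docs[docID];
            // Overwrites the shared buffer; any Ref handed out earlier now sees these bytes.
            System.arraycopy(value, 0, shared.bytes, 0, value.length);
            shared.offset = 0;
            shared.length = value.length;
            return shared;
        }
    }

    /** Copies the viewed range into a fresh array so later reads cannot change it. */
    public static byte[] copyOf(Ref ref) {
        return Arrays.copyOfRange(ref.bytes, ref.offset, ref.offset + ref.length);
    }

    public static void main(String[] args) {
        ReusingSource source = new ReusingSource(new byte[][] {{1, 2}, {3, 4}});
        Ref kept = source.get(0);       // held by reference
        byte[] copied = copyOf(kept);   // held by value
        source.get(1);                  // the next read overwrites the shared buffer
        System.out.println("kept reference: " + Arrays.toString(copyOf(kept)));
        System.out.println("copied value:   " + Arrays.toString(copied));
    }
}
```

After the second read, the kept reference shows the bytes of document 1,
while the copy still holds the bytes of document 0.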

-- 
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
http://www.solrenterprisesearchserver.com