You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@lucene.apache.org by Mikhail Khludnev <mk...@griddynamics.com> on 2014/01/07 19:09:35 UTC

Re: Iterating BinaryDocValues

Joel,

I tried to hack it straightforwardly, but found no free gain there. The
only attempt I can suggest is to try to reuse bytes in
https://github.com/apache/lucene-solr/blame/trunk/lucene/core/src/java/org/apache/lucene/codecs/lucene45/Lucene45DocValuesProducer.java#L401right
now it allocates bytes every time, which beside of GC can also impact
memory access locality. Could you try fix memory waste and repeat
performance test?

Have a good hack!


On Mon, Dec 23, 2013 at 9:51 PM, Joel Bernstein <jo...@gmail.com> wrote:

>
> Hi,
>
> I'm looking for a faster way to perform large scale docId -> bytesRef
> lookups for BinaryDocValues.
>
> I'm finding that I can't get the performance that I need from the random
> access seek in the BinaryDocValues interface.
>
> I'm wondering if sequentially scanning the docValues would be a faster
> approach. I have a BitSet of matching docs, so if I sequentially moved
> through the docValues I could test each one against that bitset.
>
> Wondering if that approach would be faster for bulk extracts and how
> tricky it would be to add an iterator to the BinaryDocValues interface?
>
> Thanks,
> Joel
>



-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

<http://www.griddynamics.com>
 <mk...@griddynamics.com>

Re: Iterating BinaryDocValues

Posted by Mikhail Khludnev <mk...@griddynamics.com>.
> I don't think we should add such a method. Doc values are commonly
> read from collectors, so why do we need a method that works on top of
> a DocIdSetIterator?
on collect(docnum) collectors can access docvalues via currently present
get(), however it's not the most efficient way to access them. Let me refer
to Shai's talk at Dublin. (if you need to calculate DV facets for few field
- my assumption), the best way to store bitset, and loop by DV fields and
for every field scan a column looping by that docset, rather than loop by
docs. Solr does it by the same manner, old UnInvertFields for sure, and
recent DVFacets too, I guess. So, such column scan method makes sense from
performance consideration.

> I'm also curious how specialized implementations
> could make this method faster than the default implementation?
it can own bytes[] and write into (without ThreadLocal). Let me provide you
a scratch later.

btw,
I repeated by micro benchmark with removed random() call from the loop.

..
reuse: true took:117944 ms
...
reuse: false took:131507 ms

it seems like gain from reusing bytes[] is 10% (I understand that it's
incorrect impl, let's think how to improve it). Let me attach samples
screenshot. It's mmap directory and 4.5 codec.

Joel, it answers to your consideration: seek is not hot at all.

I'm attaching test and impl diff if you want to reproduce the measurement.



On Fri, Jan 10, 2014 at 8:58 PM, Adrien Grand <jp...@gmail.com> wrote:

> I don't think we should add such a method. Doc values are commonly
> read from collectors, so why do we need a method that works on top of
> a DocIdSetIterator? I'm also curious how specialized implementations
> could make this method faster than the default implementation?
>
> --
> Adrien
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: dev-help@lucene.apache.org
>
>


-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

<http://www.griddynamics.com>
 <mk...@griddynamics.com>

Re: Iterating BinaryDocValues

Posted by Joel Bernstein <jo...@gmail.com>.
Bulk extracting full unsorted result sets from Solr. You give Solr a query
and it dumps the full result in a single call. The result set streaming is
in place, but throughput is not as good as I would like it.

Joel Bernstein
Search Engineer at Heliosearch


On Fri, Jan 10, 2014 at 3:24 PM, Robert Muir <rc...@gmail.com> wrote:

> what are you doing with the data?
>
>
> On Fri, Jan 10, 2014 at 3:23 PM, Joel Bernstein <jo...@gmail.com>wrote:
>
>> I'll provide a little more context. I'm working on bulk extracting
>> BinaryDocValues. My initial performance test was with in-memory
>> binaryDocValues, but I think the end game is actually disk-based
>> binaryDocValues.
>>
>> I was able to perform around 1 million docId->BytesRef lookups per-second
>> with in-memory BinaryDocValues. Since I need to get the values for multiple
>> fields for each document, this bogs down pretty quickly.
>>
>> I'm wondering if there is a way to increase this throughput. Since
>> filling a BytesRef is pretty fast, I was assuming it was the seek that was
>> taking the time, but I didn't verify this. The first thing that came to
>> mind is iterating the docValues in such a way that the next docValue could
>> be loaded without a seek. But I haven't dug into how the BinaryDocValues
>> are formatted so I'm not sure if this would help or not. Also there could
>> be something else besides the seek that is limiting the throughput.
>>
>>
>>
>>
>>
>>
>>
>>
>> Joel Bernstein
>> Search Engineer at Heliosearch
>>
>>
>> On Fri, Jan 10, 2014 at 2:54 PM, Robert Muir <rc...@gmail.com> wrote:
>>
>>> Yeah, i dont think its from newer docvalues-using code like yours shai.
>>>
>>> instead the problems i had doing this are historical, because e.g.
>>> fieldcache pointed to large arrays and consumers were lazy about it,
>>> knowing that there reference pointed to bytes that would remain valid
>>> across invocations.
>>>
>>> we just have to remove these assumptions. I don't apologize for not
>>> doing this, as you show, its some small % improvement (which we should go
>>> and get back!), but i went with safety first initially rather than bugs.
>>>
>>>
>>>
>>> On Fri, Jan 10, 2014 at 2:50 PM, Shai Erera <se...@gmail.com> wrote:
>>>
>>>> I agree with Robert. We should leave cloning BytesRefs to whoever needs
>>>> that, and not penalize everyone else who don't need it. I must say I didn't
>>>> know I can "own" those BytesRefs and I clone them whenever I need to. I
>>>> think I was bitten by one of the other APIs, so I assumed returned
>>>> BytesRefs are not "mine" across all the APIs.
>>>>
>>>> Shai
>>>>
>>>>
>>>> On Fri, Jan 10, 2014 at 9:46 PM, Robert Muir <rc...@gmail.com> wrote:
>>>>
>>>>> the problem is really simpler to solve actually.
>>>>>
>>>>> Look at the comments in the code, it tells you why it is this way:
>>>>>
>>>>>           // NOTE: we could have one buffer, but various consumers
>>>>> (e.g. FieldComparatorSource)
>>>>>           // assume "they" own the bytes after calling this!
>>>>>
>>>>> That is what we should fix. There is no need to make bulk APIs or even
>>>>> change the public api in any way (other than javadocs).
>>>>>
>>>>> We just move the clone'ing out of the codec, and require the consumer
>>>>> to do it, same as termsenum or other apis. The codec part is extremely
>>>>> simple here, its even the way i had it initially.
>>>>>
>>>>> But at the time (and even still now) this comes with some risk of
>>>>> bugs. So initially I removed the reuse and went with a more conservative
>>>>> approach to start with.
>>>>>
>>>>>
>>>>> On Fri, Jan 10, 2014 at 2:36 PM, Mikhail Khludnev <
>>>>> mkhludnev@griddynamics.com> wrote:
>>>>>
>>>>>> Adrian,
>>>>>>
>>>>>> Please find bulkGet() scratch. It's ugly copy-paste, just reuses
>>>>>> ByteRef that provides 10% gain.
>>>>>> ...
>>>>>> bulkGet took:101630 ms
>>>>>> ...
>>>>>> get took:114422 ms
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Fri, Jan 10, 2014 at 8:58 PM, Adrien Grand <jp...@gmail.com>wrote:
>>>>>>
>>>>>>> I don't think we should add such a method. Doc values are commonly
>>>>>>> read from collectors, so why do we need a method that works on top of
>>>>>>> a DocIdSetIterator? I'm also curious how specialized implementations
>>>>>>> could make this method faster than the default implementation?
>>>>>>>
>>>>>>> --
>>>>>>> Adrien
>>>>>>>
>>>>>>> ---------------------------------------------------------------------
>>>>>>> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>>>>>>> For additional commands, e-mail: dev-help@lucene.apache.org
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Sincerely yours
>>>>>> Mikhail Khludnev
>>>>>> Principal Engineer,
>>>>>> Grid Dynamics
>>>>>>
>>>>>> <http://www.griddynamics.com>
>>>>>>  <mk...@griddynamics.com>
>>>>>>
>>>>>>
>>>>>> ---------------------------------------------------------------------
>>>>>> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>>>>>> For additional commands, e-mail: dev-help@lucene.apache.org
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Iterating BinaryDocValues

Posted by Joel Bernstein <jo...@gmail.com>.
The text value for the field. It's a generic bulk extract tool so whatever
the user loads.

Joel Bernstein
Search Engineer at Heliosearch


On Fri, Jan 10, 2014 at 4:08 PM, Robert Muir <rc...@gmail.com> wrote:

> what are you storing in the binaryDV?
>
>
> On Fri, Jan 10, 2014 at 3:44 PM, Joel Bernstein <jo...@gmail.com>wrote:
>
>> For the test I ran, I just timed the number of docId->bytesRef lookups I
>> could do in a second.
>>
>> Joel Bernstein
>> Search Engineer at Heliosearch
>>
>>
>> On Fri, Jan 10, 2014 at 3:41 PM, Robert Muir <rc...@gmail.com> wrote:
>>
>>> Are you sure its not the wire serialization etc causing the bottleneck
>>> (e.g. converting to utf-8 string and back, network traffic, json encoding,
>>> etc etc?)
>>>
>>>
>>> On Fri, Jan 10, 2014 at 3:39 PM, Joel Bernstein <jo...@gmail.com>wrote:
>>>
>>>> Bulk extracting full unsorted result sets from Solr. You give Solr a
>>>> query and it dumps the full result in a single call. The result set
>>>> streaming is in place, but throughput is not as good as I would like it.
>>>>
>>>> Joel Bernstein
>>>> Search Engineer at Heliosearch
>>>>
>>>>
>>>> On Fri, Jan 10, 2014 at 3:24 PM, Robert Muir <rc...@gmail.com> wrote:
>>>>
>>>>> what are you doing with the data?
>>>>>
>>>>>
>>>>> On Fri, Jan 10, 2014 at 3:23 PM, Joel Bernstein <jo...@gmail.com>wrote:
>>>>>
>>>>>> I'll provide a little more context. I'm working on bulk extracting
>>>>>> BinaryDocValues. My initial performance test was with in-memory
>>>>>> binaryDocValues, but I think the end game is actually disk-based
>>>>>> binaryDocValues.
>>>>>>
>>>>>> I was able to perform around 1 million docId->BytesRef lookups
>>>>>> per-second with in-memory BinaryDocValues. Since I need to get the values
>>>>>> for multiple fields for each document, this bogs down pretty quickly.
>>>>>>
>>>>>> I'm wondering if there is a way to increase this throughput. Since
>>>>>> filling a BytesRef is pretty fast, I was assuming it was the seek that was
>>>>>> taking the time, but I didn't verify this. The first thing that came to
>>>>>> mind is iterating the docValues in such a way that the next docValue could
>>>>>> be loaded without a seek. But I haven't dug into how the BinaryDocValues
>>>>>> are formatted so I'm not sure if this would help or not. Also there could
>>>>>> be something else besides the seek that is limiting the throughput.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> Joel Bernstein
>>>>>> Search Engineer at Heliosearch
>>>>>>
>>>>>>
>>>>>> On Fri, Jan 10, 2014 at 2:54 PM, Robert Muir <rc...@gmail.com>wrote:
>>>>>>
>>>>>>> Yeah, i dont think its from newer docvalues-using code like yours
>>>>>>> shai.
>>>>>>>
>>>>>>> instead the problems i had doing this are historical, because e.g.
>>>>>>> fieldcache pointed to large arrays and consumers were lazy about it,
>>>>>>> knowing that there reference pointed to bytes that would remain valid
>>>>>>> across invocations.
>>>>>>>
>>>>>>> we just have to remove these assumptions. I don't apologize for not
>>>>>>> doing this, as you show, its some small % improvement (which we should go
>>>>>>> and get back!), but i went with safety first initially rather than bugs.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Jan 10, 2014 at 2:50 PM, Shai Erera <se...@gmail.com>wrote:
>>>>>>>
>>>>>>>> I agree with Robert. We should leave cloning BytesRefs to whoever
>>>>>>>> needs that, and not penalize everyone else who don't need it. I must say I
>>>>>>>> didn't know I can "own" those BytesRefs and I clone them whenever I need
>>>>>>>> to. I think I was bitten by one of the other APIs, so I assumed returned
>>>>>>>> BytesRefs are not "mine" across all the APIs.
>>>>>>>>
>>>>>>>> Shai
>>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, Jan 10, 2014 at 9:46 PM, Robert Muir <rc...@gmail.com>wrote:
>>>>>>>>
>>>>>>>>> the problem is really simpler to solve actually.
>>>>>>>>>
>>>>>>>>> Look at the comments in the code, it tells you why it is this way:
>>>>>>>>>
>>>>>>>>>           // NOTE: we could have one buffer, but various consumers
>>>>>>>>> (e.g. FieldComparatorSource)
>>>>>>>>>           // assume "they" own the bytes after calling this!
>>>>>>>>>
>>>>>>>>> That is what we should fix. There is no need to make bulk APIs or
>>>>>>>>> even change the public api in any way (other than javadocs).
>>>>>>>>>
>>>>>>>>> We just move the clone'ing out of the codec, and require the
>>>>>>>>> consumer to do it, same as termsenum or other apis. The codec part is
>>>>>>>>> extremely simple here, its even the way i had it initially.
>>>>>>>>>
>>>>>>>>> But at the time (and even still now) this comes with some risk of
>>>>>>>>> bugs. So initially I removed the reuse and went with a more conservative
>>>>>>>>> approach to start with.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Fri, Jan 10, 2014 at 2:36 PM, Mikhail Khludnev <
>>>>>>>>> mkhludnev@griddynamics.com> wrote:
>>>>>>>>>
>>>>>>>>>> Adrian,
>>>>>>>>>>
>>>>>>>>>> Please find bulkGet() scratch. It's ugly copy-paste, just reuses
>>>>>>>>>> ByteRef that provides 10% gain.
>>>>>>>>>> ...
>>>>>>>>>> bulkGet took:101630 ms
>>>>>>>>>> ...
>>>>>>>>>> get took:114422 ms
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Fri, Jan 10, 2014 at 8:58 PM, Adrien Grand <jp...@gmail.com>wrote:
>>>>>>>>>>
>>>>>>>>>>> I don't think we should add such a method. Doc values are
>>>>>>>>>>> commonly
>>>>>>>>>>> read from collectors, so why do we need a method that works on
>>>>>>>>>>> top of
>>>>>>>>>>> a DocIdSetIterator? I'm also curious how specialized
>>>>>>>>>>> implementations
>>>>>>>>>>> could make this method faster than the default implementation?
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Adrien
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> ---------------------------------------------------------------------
>>>>>>>>>>> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>>>>>>>>>>> For additional commands, e-mail: dev-help@lucene.apache.org
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Sincerely yours
>>>>>>>>>> Mikhail Khludnev
>>>>>>>>>> Principal Engineer,
>>>>>>>>>> Grid Dynamics
>>>>>>>>>>
>>>>>>>>>> <http://www.griddynamics.com>
>>>>>>>>>>  <mk...@griddynamics.com>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> ---------------------------------------------------------------------
>>>>>>>>>> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>>>>>>>>>> For additional commands, e-mail: dev-help@lucene.apache.org
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Iterating BinaryDocValues

Posted by Robert Muir <rc...@gmail.com>.
what are you storing in the binaryDV?


On Fri, Jan 10, 2014 at 3:44 PM, Joel Bernstein <jo...@gmail.com> wrote:

> For the test I ran, I just timed the number of docId->bytesRef lookups I
> could do in a second.
>
> Joel Bernstein
> Search Engineer at Heliosearch
>
>
> On Fri, Jan 10, 2014 at 3:41 PM, Robert Muir <rc...@gmail.com> wrote:
>
>> Are you sure its not the wire serialization etc causing the bottleneck
>> (e.g. converting to utf-8 string and back, network traffic, json encoding,
>> etc etc?)
>>
>>
>> On Fri, Jan 10, 2014 at 3:39 PM, Joel Bernstein <jo...@gmail.com>wrote:
>>
>>> Bulk extracting full unsorted result sets from Solr. You give Solr a
>>> query and it dumps the full result in a single call. The result set
>>> streaming is in place, but throughput is not as good as I would like it.
>>>
>>> Joel Bernstein
>>> Search Engineer at Heliosearch
>>>
>>>
>>> On Fri, Jan 10, 2014 at 3:24 PM, Robert Muir <rc...@gmail.com> wrote:
>>>
>>>> what are you doing with the data?
>>>>
>>>>
>>>> On Fri, Jan 10, 2014 at 3:23 PM, Joel Bernstein <jo...@gmail.com>wrote:
>>>>
>>>>> I'll provide a little more context. I'm working on bulk extracting
>>>>> BinaryDocValues. My initial performance test was with in-memory
>>>>> binaryDocValues, but I think the end game is actually disk-based
>>>>> binaryDocValues.
>>>>>
>>>>> I was able to perform around 1 million docId->BytesRef lookups
>>>>> per-second with in-memory BinaryDocValues. Since I need to get the values
>>>>> for multiple fields for each document, this bogs down pretty quickly.
>>>>>
>>>>> I'm wondering if there is a way to increase this throughput. Since
>>>>> filling a BytesRef is pretty fast, I was assuming it was the seek that was
>>>>> taking the time, but I didn't verify this. The first thing that came to
>>>>> mind is iterating the docValues in such a way that the next docValue could
>>>>> be loaded without a seek. But I haven't dug into how the BinaryDocValues
>>>>> are formatted so I'm not sure if this would help or not. Also there could
>>>>> be something else besides the seek that is limiting the throughput.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Joel Bernstein
>>>>> Search Engineer at Heliosearch
>>>>>
>>>>>
>>>>> On Fri, Jan 10, 2014 at 2:54 PM, Robert Muir <rc...@gmail.com> wrote:
>>>>>
>>>>>> Yeah, i dont think its from newer docvalues-using code like yours
>>>>>> shai.
>>>>>>
>>>>>> instead the problems i had doing this are historical, because e.g.
>>>>>> fieldcache pointed to large arrays and consumers were lazy about it,
>>>>>> knowing that there reference pointed to bytes that would remain valid
>>>>>> across invocations.
>>>>>>
>>>>>> we just have to remove these assumptions. I don't apologize for not
>>>>>> doing this, as you show, its some small % improvement (which we should go
>>>>>> and get back!), but i went with safety first initially rather than bugs.
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Fri, Jan 10, 2014 at 2:50 PM, Shai Erera <se...@gmail.com> wrote:
>>>>>>
>>>>>>> I agree with Robert. We should leave cloning BytesRefs to whoever
>>>>>>> needs that, and not penalize everyone else who don't need it. I must say I
>>>>>>> didn't know I can "own" those BytesRefs and I clone them whenever I need
>>>>>>> to. I think I was bitten by one of the other APIs, so I assumed returned
>>>>>>> BytesRefs are not "mine" across all the APIs.
>>>>>>>
>>>>>>> Shai
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Jan 10, 2014 at 9:46 PM, Robert Muir <rc...@gmail.com>wrote:
>>>>>>>
>>>>>>>> the problem is really simpler to solve actually.
>>>>>>>>
>>>>>>>> Look at the comments in the code, it tells you why it is this way:
>>>>>>>>
>>>>>>>>           // NOTE: we could have one buffer, but various consumers
>>>>>>>> (e.g. FieldComparatorSource)
>>>>>>>>           // assume "they" own the bytes after calling this!
>>>>>>>>
>>>>>>>> That is what we should fix. There is no need to make bulk APIs or
>>>>>>>> even change the public api in any way (other than javadocs).
>>>>>>>>
>>>>>>>> We just move the clone'ing out of the codec, and require the
>>>>>>>> consumer to do it, same as termsenum or other apis. The codec part is
>>>>>>>> extremely simple here, its even the way i had it initially.
>>>>>>>>
>>>>>>>> But at the time (and even still now) this comes with some risk of
>>>>>>>> bugs. So initially I removed the reuse and went with a more conservative
>>>>>>>> approach to start with.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, Jan 10, 2014 at 2:36 PM, Mikhail Khludnev <
>>>>>>>> mkhludnev@griddynamics.com> wrote:
>>>>>>>>
>>>>>>>>> Adrian,
>>>>>>>>>
>>>>>>>>> Please find bulkGet() scratch. It's ugly copy-paste, just reuses
>>>>>>>>> ByteRef that provides 10% gain.
>>>>>>>>> ...
>>>>>>>>> bulkGet took:101630 ms
>>>>>>>>> ...
>>>>>>>>> get took:114422 ms
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Fri, Jan 10, 2014 at 8:58 PM, Adrien Grand <jp...@gmail.com>wrote:
>>>>>>>>>
>>>>>>>>>> I don't think we should add such a method. Doc values are commonly
>>>>>>>>>> read from collectors, so why do we need a method that works on
>>>>>>>>>> top of
>>>>>>>>>> a DocIdSetIterator? I'm also curious how specialized
>>>>>>>>>> implementations
>>>>>>>>>> could make this method faster than the default implementation?
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Adrien
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> ---------------------------------------------------------------------
>>>>>>>>>> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>>>>>>>>>> For additional commands, e-mail: dev-help@lucene.apache.org
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Sincerely yours
>>>>>>>>> Mikhail Khludnev
>>>>>>>>> Principal Engineer,
>>>>>>>>> Grid Dynamics
>>>>>>>>>
>>>>>>>>> <http://www.griddynamics.com>
>>>>>>>>>  <mk...@griddynamics.com>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> ---------------------------------------------------------------------
>>>>>>>>> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>>>>>>>>> For additional commands, e-mail: dev-help@lucene.apache.org
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Iterating BinaryDocValues

Posted by Joel Bernstein <jo...@gmail.com>.
For the test I ran, I just timed the number of docId->bytesRef lookups I
could do in a second.

Joel Bernstein
Search Engineer at Heliosearch


On Fri, Jan 10, 2014 at 3:41 PM, Robert Muir <rc...@gmail.com> wrote:

> Are you sure its not the wire serialization etc causing the bottleneck
> (e.g. converting to utf-8 string and back, network traffic, json encoding,
> etc etc?)
>
>
> On Fri, Jan 10, 2014 at 3:39 PM, Joel Bernstein <jo...@gmail.com>wrote:
>
>> Bulk extracting full unsorted result sets from Solr. You give Solr a
>> query and it dumps the full result in a single call. The result set
>> streaming is in place, but throughput is not as good as I would like it.
>>
>> Joel Bernstein
>> Search Engineer at Heliosearch
>>
>>
>> On Fri, Jan 10, 2014 at 3:24 PM, Robert Muir <rc...@gmail.com> wrote:
>>
>>> what are you doing with the data?
>>>
>>>
>>> On Fri, Jan 10, 2014 at 3:23 PM, Joel Bernstein <jo...@gmail.com>wrote:
>>>
>>>> I'll provide a little more context. I'm working on bulk extracting
>>>> BinaryDocValues. My initial performance test was with in-memory
>>>> binaryDocValues, but I think the end game is actually disk-based
>>>> binaryDocValues.
>>>>
>>>> I was able to perform around 1 million docId->BytesRef lookups
>>>> per-second with in-memory BinaryDocValues. Since I need to get the values
>>>> for multiple fields for each document, this bogs down pretty quickly.
>>>>
>>>> I'm wondering if there is a way to increase this throughput. Since
>>>> filling a BytesRef is pretty fast, I was assuming it was the seek that was
>>>> taking the time, but I didn't verify this. The first thing that came to
>>>> mind is iterating the docValues in such a way that the next docValue could
>>>> be loaded without a seek. But I haven't dug into how the BinaryDocValues
>>>> are formatted so I'm not sure if this would help or not. Also there could
>>>> be something else besides the seek that is limiting the throughput.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> Joel Bernstein
>>>> Search Engineer at Heliosearch
>>>>
>>>>
>>>> On Fri, Jan 10, 2014 at 2:54 PM, Robert Muir <rc...@gmail.com> wrote:
>>>>
>>>>> Yeah, i dont think its from newer docvalues-using code like yours shai.
>>>>>
>>>>> instead the problems i had doing this are historical, because e.g.
>>>>> fieldcache pointed to large arrays and consumers were lazy about it,
>>>>> knowing that there reference pointed to bytes that would remain valid
>>>>> across invocations.
>>>>>
>>>>> we just have to remove these assumptions. I don't apologize for not
>>>>> doing this, as you show, its some small % improvement (which we should go
>>>>> and get back!), but i went with safety first initially rather than bugs.
>>>>>
>>>>>
>>>>>
>>>>> On Fri, Jan 10, 2014 at 2:50 PM, Shai Erera <se...@gmail.com> wrote:
>>>>>
>>>>>> I agree with Robert. We should leave cloning BytesRefs to whoever
>>>>>> needs that, and not penalize everyone else who don't need it. I must say I
>>>>>> didn't know I can "own" those BytesRefs and I clone them whenever I need
>>>>>> to. I think I was bitten by one of the other APIs, so I assumed returned
>>>>>> BytesRefs are not "mine" across all the APIs.
>>>>>>
>>>>>> Shai
>>>>>>
>>>>>>
>>>>>> On Fri, Jan 10, 2014 at 9:46 PM, Robert Muir <rc...@gmail.com>wrote:
>>>>>>
>>>>>>> the problem is really simpler to solve actually.
>>>>>>>
>>>>>>> Look at the comments in the code, it tells you why it is this way:
>>>>>>>
>>>>>>>           // NOTE: we could have one buffer, but various consumers
>>>>>>> (e.g. FieldComparatorSource)
>>>>>>>           // assume "they" own the bytes after calling this!
>>>>>>>
>>>>>>> That is what we should fix. There is no need to make bulk APIs or
>>>>>>> even change the public api in any way (other than javadocs).
>>>>>>>
>>>>>>> We just move the clone'ing out of the codec, and require the
>>>>>>> consumer to do it, same as termsenum or other apis. The codec part is
>>>>>>> extremely simple here, its even the way i had it initially.
>>>>>>>
>>>>>>> But at the time (and even still now) this comes with some risk of
>>>>>>> bugs. So initially I removed the reuse and went with a more conservative
>>>>>>> approach to start with.
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Jan 10, 2014 at 2:36 PM, Mikhail Khludnev <
>>>>>>> mkhludnev@griddynamics.com> wrote:
>>>>>>>
>>>>>>>> Adrian,
>>>>>>>>
>>>>>>>> Please find bulkGet() scratch. It's ugly copy-paste, just reuses
>>>>>>>> ByteRef that provides 10% gain.
>>>>>>>> ...
>>>>>>>> bulkGet took:101630 ms
>>>>>>>> ...
>>>>>>>> get took:114422 ms
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, Jan 10, 2014 at 8:58 PM, Adrien Grand <jp...@gmail.com>wrote:
>>>>>>>>
>>>>>>>>> I don't think we should add such a method. Doc values are commonly
>>>>>>>>> read from collectors, so why do we need a method that works on top
>>>>>>>>> of
>>>>>>>>> a DocIdSetIterator? I'm also curious how specialized
>>>>>>>>> implementations
>>>>>>>>> could make this method faster than the default implementation?
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Adrien
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> ---------------------------------------------------------------------
>>>>>>>>> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>>>>>>>>> For additional commands, e-mail: dev-help@lucene.apache.org
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Sincerely yours
>>>>>>>> Mikhail Khludnev
>>>>>>>> Principal Engineer,
>>>>>>>> Grid Dynamics
>>>>>>>>
>>>>>>>> <http://www.griddynamics.com>
>>>>>>>>  <mk...@griddynamics.com>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> ---------------------------------------------------------------------
>>>>>>>> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>>>>>>>> For additional commands, e-mail: dev-help@lucene.apache.org
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Iterating BinaryDocValues

Posted by Robert Muir <rc...@gmail.com>.
what are you doing with the data?


On Fri, Jan 10, 2014 at 3:23 PM, Joel Bernstein <jo...@gmail.com> wrote:

> I'll provide a little more context. I'm working on bulk extracting
> BinaryDocValues. My initial performance test was with in-memory
> binaryDocValues, but I think the end game is actually disk-based
> binaryDocValues.
>
> I was able to perform around 1 million docId->BytesRef lookups per-second
> with in-memory BinaryDocValues. Since I need to get the values for multiple
> fields for each document, this bogs down pretty quickly.
>
> I'm wondering if there is a way to increase this throughput. Since filling
> a BytesRef is pretty fast, I was assuming it was the seek that was taking
> the time, but I didn't verify this. The first thing that came to mind is
> iterating the docValues in such a way that the next docValue could be
> loaded without a seek. But I haven't dug into how the BinaryDocValues are
> formatted so I'm not sure if this would help or not. Also there could be
> something else besides the seek that is limiting the throughput.
>
>
>
>
>
>
>
>
> Joel Bernstein
> Search Engineer at Heliosearch
>
>
> On Fri, Jan 10, 2014 at 2:54 PM, Robert Muir <rc...@gmail.com> wrote:
>
>> Yeah, i dont think its from newer docvalues-using code like yours shai.
>>
>> instead the problems i had doing this are historical, because e.g.
>> fieldcache pointed to large arrays and consumers were lazy about it,
>> knowing that there reference pointed to bytes that would remain valid
>> across invocations.
>>
>> we just have to remove these assumptions. I don't apologize for not doing
>> this, as you show, its some small % improvement (which we should go and get
>> back!), but i went with safety first initially rather than bugs.
>>
>>
>>
>> On Fri, Jan 10, 2014 at 2:50 PM, Shai Erera <se...@gmail.com> wrote:
>>
>>> I agree with Robert. We should leave cloning BytesRefs to whoever needs
>>> that, and not penalize everyone else who don't need it. I must say I didn't
>>> know I can "own" those BytesRefs and I clone them whenever I need to. I
>>> think I was bitten by one of the other APIs, so I assumed returned
>>> BytesRefs are not "mine" across all the APIs.
>>>
>>> Shai
>>>
>>>
>>> On Fri, Jan 10, 2014 at 9:46 PM, Robert Muir <rc...@gmail.com> wrote:
>>>
>>>> the problem is really simpler to solve actually.
>>>>
>>>> Look at the comments in the code, it tells you why it is this way:
>>>>
>>>>           // NOTE: we could have one buffer, but various consumers
>>>> (e.g. FieldComparatorSource)
>>>>           // assume "they" own the bytes after calling this!
>>>>
>>>> That is what we should fix. There is no need to make bulk APIs or even
>>>> change the public api in any way (other than javadocs).
>>>>
>>>> We just move the clone'ing out of the codec, and require the consumer
>>>> to do it, same as termsenum or other apis. The codec part is extremely
>>>> simple here, its even the way i had it initially.
>>>>
>>>> But at the time (and even still now) this comes with some risk of bugs.
>>>> So initially I removed the reuse and went with a more conservative approach
>>>> to start with.
>>>>
>>>>
>>>> On Fri, Jan 10, 2014 at 2:36 PM, Mikhail Khludnev <
>>>> mkhludnev@griddynamics.com> wrote:
>>>>
>>>>> Adrian,
>>>>>
>>>>> Please find bulkGet() scratch. It's ugly copy-paste, just reuses
>>>>> ByteRef that provides 10% gain.
>>>>> ...
>>>>> bulkGet took:101630 ms
>>>>> ...
>>>>> get took:114422 ms
>>>>>
>>>>>
>>>>>
>>>>> On Fri, Jan 10, 2014 at 8:58 PM, Adrien Grand <jp...@gmail.com>wrote:
>>>>>
>>>>>> I don't think we should add such a method. Doc values are commonly
>>>>>> read from collectors, so why do we need a method that works on top of
>>>>>> a DocIdSetIterator? I'm also curious how specialized implementations
>>>>>> could make this method faster than the default implementation?
>>>>>>
>>>>>> --
>>>>>> Adrien
>>>>>>
>>>>>> ---------------------------------------------------------------------
>>>>>> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>>>>>> For additional commands, e-mail: dev-help@lucene.apache.org
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Sincerely yours
>>>>> Mikhail Khludnev
>>>>> Principal Engineer,
>>>>> Grid Dynamics
>>>>>
>>>>> <http://www.griddynamics.com>
>>>>>  <mk...@griddynamics.com>
>>>>>
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>>>>> For additional commands, e-mail: dev-help@lucene.apache.org
>>>>>
>>>>
>>>>
>>>
>>
>

Re: Iterating BinaryDocValues

Posted by Joel Bernstein <jo...@gmail.com>.
I'll provide a little more context. I'm working on bulk extracting
BinaryDocValues. My initial performance test was with in-memory
binaryDocValues, but I think the end game is actually disk-based
binaryDocValues.

I was able to perform around 1 million docId->BytesRef lookups per-second
with in-memory BinaryDocValues. Since I need to get the values for multiple
fields for each document, this bogs down pretty quickly.

I'm wondering if there is a way to increase this throughput. Since filling
a BytesRef is pretty fast, I was assuming it was the seek that was taking
the time, but I didn't verify this. The first thing that came to mind is
iterating the docValues in such a way that the next docValue could be
loaded without a seek. But I haven't dug into how the BinaryDocValues are
formatted so I'm not sure if this would help or not. Also there could be
something else besides the seek that is limiting the throughput.








Joel Bernstein
Search Engineer at Heliosearch


On Fri, Jan 10, 2014 at 2:54 PM, Robert Muir <rc...@gmail.com> wrote:

> Yeah, i dont think its from newer docvalues-using code like yours shai.
>
> instead the problems i had doing this are historical, because e.g.
> fieldcache pointed to large arrays and consumers were lazy about it,
> knowing that there reference pointed to bytes that would remain valid
> across invocations.
>
> we just have to remove these assumptions. I don't apologize for not doing
> this, as you show, its some small % improvement (which we should go and get
> back!), but i went with safety first initially rather than bugs.
>
>
>
> On Fri, Jan 10, 2014 at 2:50 PM, Shai Erera <se...@gmail.com> wrote:
>
>> I agree with Robert. We should leave cloning BytesRefs to whoever needs
>> that, and not penalize everyone else who don't need it. I must say I didn't
>> know I can "own" those BytesRefs and I clone them whenever I need to. I
>> think I was bitten by one of the other APIs, so I assumed returned
>> BytesRefs are not "mine" across all the APIs.
>>
>> Shai
>>
>>
>> On Fri, Jan 10, 2014 at 9:46 PM, Robert Muir <rc...@gmail.com> wrote:
>>
>>> the problem is really simpler to solve actually.
>>>
>>> Look at the comments in the code, it tells you why it is this way:
>>>
>>>           // NOTE: we could have one buffer, but various consumers (e.g.
>>> FieldComparatorSource)
>>>           // assume "they" own the bytes after calling this!
>>>
>>> That is what we should fix. There is no need to make bulk APIs or even
>>> change the public api in any way (other than javadocs).
>>>
>>> We just move the clone'ing out of the codec, and require the consumer to
>>> do it, same as termsenum or other apis. The codec part is extremely simple
>>> here, its even the way i had it initially.
>>>
>>> But at the time (and even still now) this comes with some risk of bugs.
>>> So initially I removed the reuse and went with a more conservative approach
>>> to start with.
>>>
>>>
>>> On Fri, Jan 10, 2014 at 2:36 PM, Mikhail Khludnev <
>>> mkhludnev@griddynamics.com> wrote:
>>>
>>>> Adrian,
>>>>
>>>> Please find bulkGet() scratch. It's ugly copy-paste, just reuses
>>>> ByteRef that provides 10% gain.
>>>> ...
>>>> bulkGet took:101630 ms
>>>> ...
>>>> get took:114422 ms
>>>>
>>>>
>>>>
>>>> On Fri, Jan 10, 2014 at 8:58 PM, Adrien Grand <jp...@gmail.com>wrote:
>>>>
>>>>> I don't think we should add such a method. Doc values are commonly
>>>>> read from collectors, so why do we need a method that works on top of
>>>>> a DocIdSetIterator? I'm also curious how specialized implementations
>>>>> could make this method faster than the default implementation?
>>>>>
>>>>> --
>>>>> Adrien
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>>>>> For additional commands, e-mail: dev-help@lucene.apache.org
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Sincerely yours
>>>> Mikhail Khludnev
>>>> Principal Engineer,
>>>> Grid Dynamics
>>>>
>>>> <http://www.griddynamics.com>
>>>>  <mk...@griddynamics.com>
>>>>
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>>>> For additional commands, e-mail: dev-help@lucene.apache.org
>>>>
>>>
>>>
>>
>

Re: Iterating BinaryDocValues

Posted by Robert Muir <rc...@gmail.com>.
I made an issue: https://issues.apache.org/jira/browse/LUCENE-5393


On Fri, Jan 10, 2014 at 2:54 PM, Robert Muir <rc...@gmail.com> wrote:

> Yeah, i dont think its from newer docvalues-using code like yours shai.
>
> instead the problems i had doing this are historical, because e.g.
> fieldcache pointed to large arrays and consumers were lazy about it,
> knowing that there reference pointed to bytes that would remain valid
> across invocations.
>
> we just have to remove these assumptions. I don't apologize for not doing
> this, as you show, its some small % improvement (which we should go and get
> back!), but i went with safety first initially rather than bugs.
>
>
>
> On Fri, Jan 10, 2014 at 2:50 PM, Shai Erera <se...@gmail.com> wrote:
>
>> I agree with Robert. We should leave cloning BytesRefs to whoever needs
>> that, and not penalize everyone else who don't need it. I must say I didn't
>> know I can "own" those BytesRefs and I clone them whenever I need to. I
>> think I was bitten by one of the other APIs, so I assumed returned
>> BytesRefs are not "mine" across all the APIs.
>>
>> Shai
>>
>>
>> On Fri, Jan 10, 2014 at 9:46 PM, Robert Muir <rc...@gmail.com> wrote:
>>
>>> the problem is really simpler to solve actually.
>>>
>>> Look at the comments in the code, it tells you why it is this way:
>>>
>>>           // NOTE: we could have one buffer, but various consumers (e.g.
>>> FieldComparatorSource)
>>>           // assume "they" own the bytes after calling this!
>>>
>>> That is what we should fix. There is no need to make bulk APIs or even
>>> change the public api in any way (other than javadocs).
>>>
>>> We just move the clone'ing out of the codec, and require the consumer to
>>> do it, same as termsenum or other apis. The codec part is extremely simple
>>> here, its even the way i had it initially.
>>>
>>> But at the time (and even still now) this comes with some risk of bugs.
>>> So initially I removed the reuse and went with a more conservative approach
>>> to start with.
>>>
>>>
>>> On Fri, Jan 10, 2014 at 2:36 PM, Mikhail Khludnev <
>>> mkhludnev@griddynamics.com> wrote:
>>>
>>>> Adrian,
>>>>
>>>> Please find bulkGet() scratch. It's ugly copy-paste, just reuses
>>>> ByteRef that provides 10% gain.
>>>> ...
>>>> bulkGet took:101630 ms
>>>> ...
>>>> get took:114422 ms
>>>>
>>>>
>>>>
>>>> On Fri, Jan 10, 2014 at 8:58 PM, Adrien Grand <jp...@gmail.com>wrote:
>>>>
>>>>> I don't think we should add such a method. Doc values are commonly
>>>>> read from collectors, so why do we need a method that works on top of
>>>>> a DocIdSetIterator? I'm also curious how specialized implementations
>>>>> could make this method faster than the default implementation?
>>>>>
>>>>> --
>>>>> Adrien
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>>>>> For additional commands, e-mail: dev-help@lucene.apache.org
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Sincerely yours
>>>> Mikhail Khludnev
>>>> Principal Engineer,
>>>> Grid Dynamics
>>>>
>>>> <http://www.griddynamics.com>
>>>>  <mk...@griddynamics.com>
>>>>
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>>>> For additional commands, e-mail: dev-help@lucene.apache.org
>>>>
>>>
>>>
>>
>

Re: Iterating BinaryDocValues

Posted by Robert Muir <rc...@gmail.com>.
Yeah, i dont think its from newer docvalues-using code like yours shai.

instead the problems i had doing this are historical, because e.g.
fieldcache pointed to large arrays and consumers were lazy about it,
knowing that there reference pointed to bytes that would remain valid
across invocations.

we just have to remove these assumptions. I don't apologize for not doing
this, as you show, its some small % improvement (which we should go and get
back!), but i went with safety first initially rather than bugs.


On Fri, Jan 10, 2014 at 2:50 PM, Shai Erera <se...@gmail.com> wrote:

> I agree with Robert. We should leave cloning BytesRefs to whoever needs
> that, and not penalize everyone else who don't need it. I must say I didn't
> know I can "own" those BytesRefs and I clone them whenever I need to. I
> think I was bitten by one of the other APIs, so I assumed returned
> BytesRefs are not "mine" across all the APIs.
>
> Shai
>
>
> On Fri, Jan 10, 2014 at 9:46 PM, Robert Muir <rc...@gmail.com> wrote:
>
>> the problem is really simpler to solve actually.
>>
>> Look at the comments in the code, it tells you why it is this way:
>>
>>           // NOTE: we could have one buffer, but various consumers (e.g.
>> FieldComparatorSource)
>>           // assume "they" own the bytes after calling this!
>>
>> That is what we should fix. There is no need to make bulk APIs or even
>> change the public api in any way (other than javadocs).
>>
>> We just move the clone'ing out of the codec, and require the consumer to
>> do it, same as termsenum or other apis. The codec part is extremely simple
>> here, its even the way i had it initially.
>>
>> But at the time (and even still now) this comes with some risk of bugs.
>> So initially I removed the reuse and went with a more conservative approach
>> to start with.
>>
>>
>> On Fri, Jan 10, 2014 at 2:36 PM, Mikhail Khludnev <
>> mkhludnev@griddynamics.com> wrote:
>>
>>> Adrian,
>>>
>>> Please find bulkGet() scratch. It's ugly copy-paste, just reuses ByteRef
>>> that provides 10% gain.
>>> ...
>>> bulkGet took:101630 ms
>>> ...
>>> get took:114422 ms
>>>
>>>
>>>
>>> On Fri, Jan 10, 2014 at 8:58 PM, Adrien Grand <jp...@gmail.com> wrote:
>>>
>>>> I don't think we should add such a method. Doc values are commonly
>>>> read from collectors, so why do we need a method that works on top of
>>>> a DocIdSetIterator? I'm also curious how specialized implementations
>>>> could make this method faster than the default implementation?
>>>>
>>>> --
>>>> Adrien
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>>>> For additional commands, e-mail: dev-help@lucene.apache.org
>>>>
>>>>
>>>
>>>
>>> --
>>> Sincerely yours
>>> Mikhail Khludnev
>>> Principal Engineer,
>>> Grid Dynamics
>>>
>>> <http://www.griddynamics.com>
>>>  <mk...@griddynamics.com>
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: dev-help@lucene.apache.org
>>>
>>
>>
>

Re: Iterating BinaryDocValues

Posted by Shai Erera <se...@gmail.com>.
I agree with Robert. We should leave cloning BytesRefs to whoever needs
that, and not penalize everyone else who don't need it. I must say I didn't
know I can "own" those BytesRefs and I clone them whenever I need to. I
think I was bitten by one of the other APIs, so I assumed returned
BytesRefs are not "mine" across all the APIs.

Shai


On Fri, Jan 10, 2014 at 9:46 PM, Robert Muir <rc...@gmail.com> wrote:

> the problem is really simpler to solve actually.
>
> Look at the comments in the code, it tells you why it is this way:
>
>           // NOTE: we could have one buffer, but various consumers (e.g.
> FieldComparatorSource)
>           // assume "they" own the bytes after calling this!
>
> That is what we should fix. There is no need to make bulk APIs or even
> change the public api in any way (other than javadocs).
>
> We just move the clone'ing out of the codec, and require the consumer to
> do it, same as termsenum or other apis. The codec part is extremely simple
> here, its even the way i had it initially.
>
> But at the time (and even still now) this comes with some risk of bugs. So
> initially I removed the reuse and went with a more conservative approach to
> start with.
>
>
> On Fri, Jan 10, 2014 at 2:36 PM, Mikhail Khludnev <
> mkhludnev@griddynamics.com> wrote:
>
>> Adrian,
>>
>> Please find bulkGet() scratch. It's ugly copy-paste, just reuses ByteRef
>> that provides 10% gain.
>> ...
>> bulkGet took:101630 ms
>> ...
>> get took:114422 ms
>>
>>
>>
>> On Fri, Jan 10, 2014 at 8:58 PM, Adrien Grand <jp...@gmail.com> wrote:
>>
>>> I don't think we should add such a method. Doc values are commonly
>>> read from collectors, so why do we need a method that works on top of
>>> a DocIdSetIterator? I'm also curious how specialized implementations
>>> could make this method faster than the default implementation?
>>>
>>> --
>>> Adrien
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: dev-help@lucene.apache.org
>>>
>>>
>>
>>
>> --
>> Sincerely yours
>> Mikhail Khludnev
>> Principal Engineer,
>> Grid Dynamics
>>
>> <http://www.griddynamics.com>
>>  <mk...@griddynamics.com>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: dev-help@lucene.apache.org
>>
>
>

Re: Iterating BinaryDocValues

Posted by Robert Muir <rc...@gmail.com>.
the problem is really simpler to solve actually.

Look at the comments in the code, it tells you why it is this way:

          // NOTE: we could have one buffer, but various consumers (e.g.
FieldComparatorSource)
          // assume "they" own the bytes after calling this!

That is what we should fix. There is no need to make bulk APIs or even
change the public api in any way (other than javadocs).

We just move the clone'ing out of the codec, and require the consumer to do
it, same as termsenum or other apis. The codec part is extremely simple
here, its even the way i had it initially.

But at the time (and even still now) this comes with some risk of bugs. So
initially I removed the reuse and went with a more conservative approach to
start with.


On Fri, Jan 10, 2014 at 2:36 PM, Mikhail Khludnev <
mkhludnev@griddynamics.com> wrote:

> Adrian,
>
> Please find bulkGet() scratch. It's ugly copy-paste, just reuses ByteRef
> that provides 10% gain.
> ...
> bulkGet took:101630 ms
> ...
> get took:114422 ms
>
>
>
> On Fri, Jan 10, 2014 at 8:58 PM, Adrien Grand <jp...@gmail.com> wrote:
>
>> I don't think we should add such a method. Doc values are commonly
>> read from collectors, so why do we need a method that works on top of
>> a DocIdSetIterator? I'm also curious how specialized implementations
>> could make this method faster than the default implementation?
>>
>> --
>> Adrien
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: dev-help@lucene.apache.org
>>
>>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
> Principal Engineer,
> Grid Dynamics
>
> <http://www.griddynamics.com>
>  <mk...@griddynamics.com>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: dev-help@lucene.apache.org
>

Re: Iterating BinaryDocValues

Posted by Mikhail Khludnev <mk...@griddynamics.com>.
Adrian,

Please find bulkGet() scratch. It's ugly copy-paste, just reuses ByteRef
that provides 10% gain.
...
bulkGet took:101630 ms
...
get took:114422 ms



On Fri, Jan 10, 2014 at 8:58 PM, Adrien Grand <jp...@gmail.com> wrote:

> I don't think we should add such a method. Doc values are commonly
> read from collectors, so why do we need a method that works on top of
> a DocIdSetIterator? I'm also curious how specialized implementations
> could make this method faster than the default implementation?
>
> --
> Adrien
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: dev-help@lucene.apache.org
>
>


-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

<http://www.griddynamics.com>
 <mk...@griddynamics.com>

Re: Iterating BinaryDocValues

Posted by Adrien Grand <jp...@gmail.com>.
I don't think we should add such a method. Doc values are commonly
read from collectors, so why do we need a method that works on top of
a DocIdSetIterator? I'm also curious how specialized implementations
could make this method faster than the default implementation?

-- 
Adrien

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Re: Iterating BinaryDocValues

Posted by Mikhail Khludnev <mk...@griddynamics.com>.
Adrian,

Thanks for your answer! You highlighted the point which I completely
missed! Again, I see that noob questions are quite useful.

On Fri, Jan 10, 2014 at 10:09 AM, Adrien Grand <jp...@gmail.com> wrote:

> Hi Mikhail,
>
> These new byte[] are done on purpose because the contract on the
> BytesRef class is that it is only a reference to bytes stored
> somewhere, not a buffer. So it is always legal to change offset,
> length and the bytes reference, but the content of the byte[] array
> may only be changed if you own the BytesRef object.

This aspect of usage isn't obvious to me at all. At least it's worth to
clarify this rule in javadoc!


> So for example, it
> is legal for a BytesRefIterator to use a private BytesRef as a mutable
> buffer and to expose it in calls to next() but it is illegal to modify
> the content of the byte[] in BinaryDocValues.get since the bytes are
> provided by consumers of the API.
>
> If we would like to be able to remove these byte[] creations, I think
> we would either need:
>  1. to use a byte buffer instead of a BytesRef object,
>  2. or to make the BytesRef objects private to the BinaryDocValues
> instances and have a "BytesRef get(int docId)" method instead of "void
> get(int docId, BytesRef ref)"
>
> Although option 1 would avoid these byte[] creations, it would make
> the paged-bytes-based implementations a bit slower (eg.
> MemoryDocValuesFormat exposes binary doc values through paged bytes)
> because they would need to copy bytes instead of just changing
> pointers. With option 2, the BinaryDocValues object wouldn't be
> thread-safe by nature anymore, but I don't think this is an issue as
> we already store them in thread-locals.
>

Agree 1 is not fancy, but I feel uncomfortable relying on ThreadLocals. I
started from what Joel suggested - from column reading API, but lately I
rejected it as unnecessary. It might make sense to look on it again.

public abstract class BinaryDocValues {
  .....
  /**
   * the purpose is an efficient sequential memory access
   * @throws IOException if anything sad happens
   * */
  public BytesRefIterator getColumn(final DocIdSetIterator docIds) throws
IOException{
    return new BytesRefIterator (){

      private BytesRef ref = new BytesRef();

      @Override
      public BytesRef next() throws IOException {
        int doc = docIds.nextDoc();
        if(doc==DocIdSetIterator.NO_MORE_DOCS){
          return null;
        } else {
          get(doc, ref);
        }
        return ref;
      }
    };
  }
 }

It's a concrete impl which backed on current method. We don't need to amend
all currently existing implementations. Certain impls can provide an
efficient Iterator.

Anyway, whether you like getColumn() or not, if you raise an issue I can
work on patch.


>
> On Thu, Jan 9, 2014 at 5:27 PM, Mikhail Khludnev
> <mk...@griddynamics.com> wrote:
> > Don't you think it's worth to raise a jira regarding those 'new bytes[]'
> ?
> > I'm able to provide a patch if you wish.
> >
> >
> > On Wed, Jan 8, 2014 at 2:02 PM, Mikhail Khludnev
> > <mk...@griddynamics.com> wrote:
> >>
> >> FWIW,
> >>
> >> Micro benchmark shows 4% gain on reusing incoming ByteRef.bytes in short
> >> binary docvalues Test2BBinaryDocValues.testVariableBinary() with mmap
> >> directory.
> >> I wonder why it doesn't reads into incoming bytes
> >>
> https://github.com/apache/lucene-solr/blame/trunk/lucene/core/src/java/org/apache/lucene/codecs/lucene45/Lucene45DocValuesProducer.java#L401
> >>
> >>
> >>
> >> On Wed, Jan 8, 2014 at 12:53 AM, Michael McCandless
> >> <lu...@mikemccandless.com> wrote:
> >>>
> >>> Going sequentially should help, if the pages are not hot (in the OS's
> IO
> >>> cache).
> >>>
> >>> You can also use a different DVFormat, e.g. Direct, but this holds all
> >>> bytes in RAM.
> >>>
> >>> Mike McCandless
> >>>
> >>> http://blog.mikemccandless.com
> >>>
> >>>
> >>> On Tue, Jan 7, 2014 at 1:09 PM, Mikhail Khludnev
> >>> <mk...@griddynamics.com> wrote:
> >>> > Joel,
> >>> >
> >>> > I tried to hack it straightforwardly, but found no free gain there.
> The
> >>> > only
> >>> > attempt I can suggest is to try to reuse bytes in
> >>> >
> >>> >
> https://github.com/apache/lucene-solr/blame/trunk/lucene/core/src/java/org/apache/lucene/codecs/lucene45/Lucene45DocValuesProducer.java#L401
> >>> > right now it allocates bytes every time, which beside of GC can also
> >>> > impact
> >>> > memory access locality. Could you try fix memory waste and repeat
> >>> > performance test?
> >>> >
> >>> > Have a good hack!
> >>> >
> >>> >
> >>> > On Mon, Dec 23, 2013 at 9:51 PM, Joel Bernstein <jo...@gmail.com>
> >>> > wrote:
> >>> >>
> >>> >>
> >>> >> Hi,
> >>> >>
> >>> >> I'm looking for a faster way to perform large scale docId ->
> bytesRef
> >>> >> lookups for BinaryDocValues.
> >>> >>
> >>> >> I'm finding that I can't get the performance that I need from the
> >>> >> random
> >>> >> access seek in the BinaryDocValues interface.
> >>> >>
> >>> >> I'm wondering if sequentially scanning the docValues would be a
> faster
> >>> >> approach. I have a BitSet of matching docs, so if I sequentially
> moved
> >>> >> through the docValues I could test each one against that bitset.
> >>> >>
> >>> >> Wondering if that approach would be faster for bulk extracts and how
> >>> >> tricky it would be to add an iterator to the BinaryDocValues
> >>> >> interface?
> >>> >>
> >>> >> Thanks,
> >>> >> Joel
> >>> >
> >>> >
> >>> >
> >>> >
> >>> > --
> >>> > Sincerely yours
> >>> > Mikhail Khludnev
> >>> > Principal Engineer,
> >>> > Grid Dynamics
> >>> >
> >>> >
> >>>
> >>> ---------------------------------------------------------------------
> >>> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> >>> For additional commands, e-mail: dev-help@lucene.apache.org
> >>>
> >>
> >>
> >>
> >> --
> >> Sincerely yours
> >> Mikhail Khludnev
> >> Principal Engineer,
> >> Grid Dynamics
> >>
> >>
> >
> >
> >
> > --
> > Sincerely yours
> > Mikhail Khludnev
> > Principal Engineer,
> > Grid Dynamics
> >
> >
>
>
>
> --
> Adrien
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: dev-help@lucene.apache.org
>
>


-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

<http://www.griddynamics.com>
 <mk...@griddynamics.com>

Re: Iterating BinaryDocValues

Posted by Adrien Grand <jp...@gmail.com>.
Hi Mikhail,

These new byte[] are done on purpose because the contract on the
BytesRef class is that it is only a reference to bytes stored
somewhere, not a buffer. So it is always legal to change offset,
length and the bytes reference, but the content of the byte[] array
may only be changed if you own the BytesRef object. So for example, it
is legal for a BytesRefIterator to use a private BytesRef as a mutable
buffer and to expose it in calls to next() but it is illegal to modify
the content of the byte[] in BinaryDocValues.get since the bytes are
provided by consumers of the API.

If we would like to be able to remove these byte[] creations, I think
we would either need:
 1. to use a byte buffer instead of a BytesRef object,
 2. or to make the BytesRef objects private to the BinaryDocValues
instances and have a "BytesRef get(int docId)" method instead of "void
get(int docId, BytesRef ref)"

Although option 1 would avoid these byte[] creations, it would make
the paged-bytes-based implementations a bit slower (eg.
MemoryDocValuesFormat exposes binary doc values through paged bytes)
because they would need to copy bytes instead of just changing
pointers. With option 2, the BinaryDocValues object wouldn't be
thread-safe by nature anymore, but I don't think this is an issue as
we already store them in thread-locals.


On Thu, Jan 9, 2014 at 5:27 PM, Mikhail Khludnev
<mk...@griddynamics.com> wrote:
> Don't you think it's worth to raise a jira regarding those 'new bytes[]' ?
> I'm able to provide a patch if you wish.
>
>
> On Wed, Jan 8, 2014 at 2:02 PM, Mikhail Khludnev
> <mk...@griddynamics.com> wrote:
>>
>> FWIW,
>>
>> Micro benchmark shows 4% gain on reusing incoming ByteRef.bytes in short
>> binary docvalues Test2BBinaryDocValues.testVariableBinary() with mmap
>> directory.
>> I wonder why it doesn't reads into incoming bytes
>> https://github.com/apache/lucene-solr/blame/trunk/lucene/core/src/java/org/apache/lucene/codecs/lucene45/Lucene45DocValuesProducer.java#L401
>>
>>
>>
>> On Wed, Jan 8, 2014 at 12:53 AM, Michael McCandless
>> <lu...@mikemccandless.com> wrote:
>>>
>>> Going sequentially should help, if the pages are not hot (in the OS's IO
>>> cache).
>>>
>>> You can also use a different DVFormat, e.g. Direct, but this holds all
>>> bytes in RAM.
>>>
>>> Mike McCandless
>>>
>>> http://blog.mikemccandless.com
>>>
>>>
>>> On Tue, Jan 7, 2014 at 1:09 PM, Mikhail Khludnev
>>> <mk...@griddynamics.com> wrote:
>>> > Joel,
>>> >
>>> > I tried to hack it straightforwardly, but found no free gain there. The
>>> > only
>>> > attempt I can suggest is to try to reuse bytes in
>>> >
>>> > https://github.com/apache/lucene-solr/blame/trunk/lucene/core/src/java/org/apache/lucene/codecs/lucene45/Lucene45DocValuesProducer.java#L401
>>> > right now it allocates bytes every time, which beside of GC can also
>>> > impact
>>> > memory access locality. Could you try fix memory waste and repeat
>>> > performance test?
>>> >
>>> > Have a good hack!
>>> >
>>> >
>>> > On Mon, Dec 23, 2013 at 9:51 PM, Joel Bernstein <jo...@gmail.com>
>>> > wrote:
>>> >>
>>> >>
>>> >> Hi,
>>> >>
>>> >> I'm looking for a faster way to perform large scale docId -> bytesRef
>>> >> lookups for BinaryDocValues.
>>> >>
>>> >> I'm finding that I can't get the performance that I need from the
>>> >> random
>>> >> access seek in the BinaryDocValues interface.
>>> >>
>>> >> I'm wondering if sequentially scanning the docValues would be a faster
>>> >> approach. I have a BitSet of matching docs, so if I sequentially moved
>>> >> through the docValues I could test each one against that bitset.
>>> >>
>>> >> Wondering if that approach would be faster for bulk extracts and how
>>> >> tricky it would be to add an iterator to the BinaryDocValues
>>> >> interface?
>>> >>
>>> >> Thanks,
>>> >> Joel
>>> >
>>> >
>>> >
>>> >
>>> > --
>>> > Sincerely yours
>>> > Mikhail Khludnev
>>> > Principal Engineer,
>>> > Grid Dynamics
>>> >
>>> >
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: dev-help@lucene.apache.org
>>>
>>
>>
>>
>> --
>> Sincerely yours
>> Mikhail Khludnev
>> Principal Engineer,
>> Grid Dynamics
>>
>>
>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
> Principal Engineer,
> Grid Dynamics
>
>



-- 
Adrien

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Re: Iterating BinaryDocValues

Posted by Mikhail Khludnev <mk...@griddynamics.com>.
Don't you think it's worth to raise a jira regarding those 'new bytes[]' ?
I'm able to provide a patch if you wish.


On Wed, Jan 8, 2014 at 2:02 PM, Mikhail Khludnev <mkhludnev@griddynamics.com
> wrote:

> FWIW,
>
> Micro benchmark shows 4% gain on reusing incoming ByteRef.bytes in short
> binary docvalues Test2BBinaryDocValues.testVariableBinary() with mmap
> directory.
> I wonder why it doesn't reads into incoming bytes
> https://github.com/apache/lucene-solr/blame/trunk/lucene/core/src/java/org/apache/lucene/codecs/lucene45/Lucene45DocValuesProducer.java#L401
>
>
>
> On Wed, Jan 8, 2014 at 12:53 AM, Michael McCandless <
> lucene@mikemccandless.com> wrote:
>
>> Going sequentially should help, if the pages are not hot (in the OS's IO
>> cache).
>>
>> You can also use a different DVFormat, e.g. Direct, but this holds all
>> bytes in RAM.
>>
>> Mike McCandless
>>
>> http://blog.mikemccandless.com
>>
>>
>> On Tue, Jan 7, 2014 at 1:09 PM, Mikhail Khludnev
>> <mk...@griddynamics.com> wrote:
>> > Joel,
>> >
>> > I tried to hack it straightforwardly, but found no free gain there. The
>> only
>> > attempt I can suggest is to try to reuse bytes in
>> >
>> https://github.com/apache/lucene-solr/blame/trunk/lucene/core/src/java/org/apache/lucene/codecs/lucene45/Lucene45DocValuesProducer.java#L401
>> > right now it allocates bytes every time, which beside of GC can also
>> impact
>> > memory access locality. Could you try fix memory waste and repeat
>> > performance test?
>> >
>> > Have a good hack!
>> >
>> >
>> > On Mon, Dec 23, 2013 at 9:51 PM, Joel Bernstein <jo...@gmail.com>
>> wrote:
>> >>
>> >>
>> >> Hi,
>> >>
>> >> I'm looking for a faster way to perform large scale docId -> bytesRef
>> >> lookups for BinaryDocValues.
>> >>
>> >> I'm finding that I can't get the performance that I need from the
>> random
>> >> access seek in the BinaryDocValues interface.
>> >>
>> >> I'm wondering if sequentially scanning the docValues would be a faster
>> >> approach. I have a BitSet of matching docs, so if I sequentially moved
>> >> through the docValues I could test each one against that bitset.
>> >>
>> >> Wondering if that approach would be faster for bulk extracts and how
>> >> tricky it would be to add an iterator to the BinaryDocValues interface?
>> >>
>> >> Thanks,
>> >> Joel
>> >
>> >
>> >
>> >
>> > --
>> > Sincerely yours
>> > Mikhail Khludnev
>> > Principal Engineer,
>> > Grid Dynamics
>> >
>> >
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: dev-help@lucene.apache.org
>>
>>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
> Principal Engineer,
> Grid Dynamics
>
> <http://www.griddynamics.com>
>  <mk...@griddynamics.com>
>



-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

<http://www.griddynamics.com>
 <mk...@griddynamics.com>

Re: Iterating BinaryDocValues

Posted by Mikhail Khludnev <mk...@griddynamics.com>.
FWIW,

Micro benchmark shows 4% gain on reusing incoming ByteRef.bytes in short
binary docvalues Test2BBinaryDocValues.testVariableBinary() with mmap
directory.
I wonder why it doesn't reads into incoming bytes
https://github.com/apache/lucene-solr/blame/trunk/lucene/core/src/java/org/apache/lucene/codecs/lucene45/Lucene45DocValuesProducer.java#L401



On Wed, Jan 8, 2014 at 12:53 AM, Michael McCandless <
lucene@mikemccandless.com> wrote:

> Going sequentially should help, if the pages are not hot (in the OS's IO
> cache).
>
> You can also use a different DVFormat, e.g. Direct, but this holds all
> bytes in RAM.
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
>
> On Tue, Jan 7, 2014 at 1:09 PM, Mikhail Khludnev
> <mk...@griddynamics.com> wrote:
> > Joel,
> >
> > I tried to hack it straightforwardly, but found no free gain there. The
> only
> > attempt I can suggest is to try to reuse bytes in
> >
> https://github.com/apache/lucene-solr/blame/trunk/lucene/core/src/java/org/apache/lucene/codecs/lucene45/Lucene45DocValuesProducer.java#L401
> > right now it allocates bytes every time, which beside of GC can also
> impact
> > memory access locality. Could you try fix memory waste and repeat
> > performance test?
> >
> > Have a good hack!
> >
> >
> > On Mon, Dec 23, 2013 at 9:51 PM, Joel Bernstein <jo...@gmail.com>
> wrote:
> >>
> >>
> >> Hi,
> >>
> >> I'm looking for a faster way to perform large scale docId -> bytesRef
> >> lookups for BinaryDocValues.
> >>
> >> I'm finding that I can't get the performance that I need from the random
> >> access seek in the BinaryDocValues interface.
> >>
> >> I'm wondering if sequentially scanning the docValues would be a faster
> >> approach. I have a BitSet of matching docs, so if I sequentially moved
> >> through the docValues I could test each one against that bitset.
> >>
> >> Wondering if that approach would be faster for bulk extracts and how
> >> tricky it would be to add an iterator to the BinaryDocValues interface?
> >>
> >> Thanks,
> >> Joel
> >
> >
> >
> >
> > --
> > Sincerely yours
> > Mikhail Khludnev
> > Principal Engineer,
> > Grid Dynamics
> >
> >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: dev-help@lucene.apache.org
>
>


-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

<http://www.griddynamics.com>
 <mk...@griddynamics.com>

Re: Iterating BinaryDocValues

Posted by Michael McCandless <lu...@mikemccandless.com>.
Going sequentially should help, if the pages are not hot (in the OS's IO cache).

You can also use a different DVFormat, e.g. Direct, but this holds all
bytes in RAM.

Mike McCandless

http://blog.mikemccandless.com


On Tue, Jan 7, 2014 at 1:09 PM, Mikhail Khludnev
<mk...@griddynamics.com> wrote:
> Joel,
>
> I tried to hack it straightforwardly, but found no free gain there. The only
> attempt I can suggest is to try to reuse bytes in
> https://github.com/apache/lucene-solr/blame/trunk/lucene/core/src/java/org/apache/lucene/codecs/lucene45/Lucene45DocValuesProducer.java#L401
> right now it allocates bytes every time, which beside of GC can also impact
> memory access locality. Could you try fix memory waste and repeat
> performance test?
>
> Have a good hack!
>
>
> On Mon, Dec 23, 2013 at 9:51 PM, Joel Bernstein <jo...@gmail.com> wrote:
>>
>>
>> Hi,
>>
>> I'm looking for a faster way to perform large scale docId -> bytesRef
>> lookups for BinaryDocValues.
>>
>> I'm finding that I can't get the performance that I need from the random
>> access seek in the BinaryDocValues interface.
>>
>> I'm wondering if sequentially scanning the docValues would be a faster
>> approach. I have a BitSet of matching docs, so if I sequentially moved
>> through the docValues I could test each one against that bitset.
>>
>> Wondering if that approach would be faster for bulk extracts and how
>> tricky it would be to add an iterator to the BinaryDocValues interface?
>>
>> Thanks,
>> Joel
>
>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
> Principal Engineer,
> Grid Dynamics
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org