Posted to java-user@lucene.apache.org by Toke Eskildsen <te...@statsbiblioteket.dk> on 2016/12/11 12:04:58 UTC

Re: Why Two Levels of Indirection in BytesRefHash class ?

Adrien Grand <jp...@gmail.com> wrote:
> That would work if you are only interested in using BytesRefHash as a hash
> set for byte[]. However these incremental ids are useful if you want to
> associate data with each byte[]: you can create parallel arrays and use the
> ids returned by the BytesRefHash as indices in these arrays.

That could be solved by prefixing each stored BytesRef with its counter value (a 4-byte int), then adding a fixed +4 delta to the offset to reach the BytesRef itself. The space requirements are the same as now, but there is one less level of indirection, which should mean fewer CPU cache misses.

However, this removes the nice property of being able to iterate the DocValues in the structure in insertion order, so it would be quite a change to the current code.
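
To make the prefixing idea concrete, here is a minimal sketch in Java. It is not Lucene code: the class IdPrefixedPool, its method names, the fixed-capacity ByteBuffer and the fixed 2-byte length field are all assumptions made for illustration (the real pool is block-based and encodes the length in 1-2 bytes). It only shows the record layout, where a hash slot would hold the record offset directly and the id is read from the record instead of being the index into bytesStarts.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Hypothetical record layout: [4-byte id][2-byte length][bytes]. Not Lucene code. */
final class IdPrefixedPool {
    /** The fixed +4 delta from the record offset to the length-prefixed bytes. */
    static final int ID_BYTES = Integer.BYTES;

    // Assumption: one fixed-size buffer; add() throws BufferOverflowException when it is full.
    private final ByteBuffer pool;
    private int nextId = 0;

    IdPrefixedPool(int capacity) {
        pool = ByteBuffer.allocate(capacity);
    }

    /** Appends a record and returns its offset, which a hash slot would store directly. */
    int add(byte[] value) {
        int offset = pool.position();
        pool.putInt(nextId++);               // counter value stored in front of the bytes
        pool.putShort((short) value.length); // assumption: values are shorter than 65536 bytes
        pool.put(value);
        return offset;
    }

    /** Reads the insertion-order id stored at the start of the record. */
    int idAt(int offset) {
        return pool.getInt(offset);
    }

    /** Skips the id prefix, then reads the length-prefixed bytes. */
    byte[] bytesAt(int offset) {
        int length = pool.getShort(offset + ID_BYTES) & 0xFFFF;
        byte[] out = new byte[length];
        int dataStart = offset + ID_BYTES + Short.BYTES;
        for (int i = 0; i < length; i++) {
            out[i] = pool.get(dataStart + i);
        }
        return out;
    }

    public static void main(String[] args) {
        IdPrefixedPool p = new IdPrefixedPool(64);
        int offset = p.add("lucene".getBytes(StandardCharsets.UTF_8));
        System.out.println(p.idAt(offset) + " " + new String(p.bytesAt(offset), StandardCharsets.UTF_8));
    }
}
```

The value returned by idAt(offset) can still be used as the index into parallel arrays, as Adrien describes.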

One optimization, while we are on the subject, is to exploit the indirection. As the bytesStarts are monotonically increasing offsets into the ByteBlockPool, there is no need to store the length of each BytesRef. It can be calculated as bytesStarts[id+1] - bytesStarts[id], with the current end of the pool standing in for the entry after the last one. This saves 1-2 bytes per entry and preserves memory locality, so it should perform the same as now (this needs to be tested, of course).
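
A minimal sketch of deriving lengths from consecutive offsets, again in Java and again not Lucene code: the class OffsetOnlyStore and its methods are hypothetical, and it assumes a single contiguous byte array, so block boundaries in a real ByteBlockPool would need separate handling. The starts array keeps one extra slot holding the end of the last entry.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical append-only store without stored lengths. Not Lucene code. */
final class OffsetOnlyStore {
    // Assumption: one contiguous array instead of a block-based pool.
    private byte[] pool = new byte[16];
    // starts[id] is the offset of entry id; starts[count] is the end of the last entry.
    private int[] starts = new int[8];
    private int count = 0;

    /** Appends the bytes and returns the insertion-order id. */
    int add(byte[] value) {
        int start = starts[count];
        int end = start + value.length;
        if (end > pool.length) {
            pool = Arrays.copyOf(pool, Math.max(end, pool.length * 2));
        }
        if (count + 2 > starts.length) {
            starts = Arrays.copyOf(starts, starts.length * 2);
        }
        System.arraycopy(value, 0, pool, start, value.length);
        starts[count + 1] = end; // doubles as the start of the next entry
        return count++;
    }

    /** The length is the distance to the next start; it is never stored in the pool. */
    int length(int id) {
        return starts[id + 1] - starts[id];
    }

    /** Copies out the bytes of entry id. */
    byte[] get(int id) {
        return Arrays.copyOfRange(pool, starts[id], starts[id + 1]);
    }

    public static void main(String[] args) {
        OffsetOnlyStore store = new OffsetOnlyStore();
        int id = store.add("lucene".getBytes(StandardCharsets.UTF_8));
        System.out.println(id + " " + store.length(id));
    }
}
```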

- Toke Eskildsen