Posted to issues@lucene.apache.org by "Michael McCandless (Jira)" <ji...@apache.org> on 2021/10/28 13:19:00 UTC

[jira] [Commented] (LUCENE-9673) The level of IntBlockPool slice is always 1

    [ https://issues.apache.org/jira/browse/LUCENE-9673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17435401#comment-17435401 ] 

Michael McCandless commented on LUCENE-9673:
--------------------------------------------

OK phew catching up on this issue again [~mashudong]! Sorry for the crazy long delay.

It turns out nothing in Lucene's {{core}} uses any of this complex growable {{int[]}} logic – only {{MemoryIndex}} does (today anyways).  {{core}}'s {{int[]}} allocation needs are simpler: just allocating 1, 2 or 3 ints per new term encountered during indexing (depending on whether docs, freqs, prox are enabled). For {{byte[]}} storage, we do still use/need the growing slices to account for longer and shorter vInt encoded postings lists.

I will open a follow-on issue to promote this out of {{core}} into {{MemoryIndex}}.

For this issue let's just fix this sneaky {{IntBlockPool}} performance bug!

Oh and I also found this long-standing {{TODO}}:
{noformat}
   // TODO: figure out why this is 2*streamCount here. streamCount should be enough?{noformat}
And indeed it is over-allocating – we are wasting half of the {{int[]}} RAM we are allocating!  I fixed that, tests pass.  So this will be a little RAM efficiency improvement for {{IndexWriter}}.

Separately, I wonder if we could run a static "locally dead code detector" from gradle that would crawl the source graph dependencies, excluding tests?  I.e. this code was not technically dead, since unit tests were indeed exercising it, and another Lucene module was also using it, but nothing in Lucene's {{core}} was in fact using it.  I wish such code were automatically removed from our repository, or proposed to be moved out to the module that really needs it :)  Sort of a source code garbage collector ...
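
To make the "locally dead" idea a bit more concrete, here is a minimal sketch of the reachability check it implies. Everything in it is hypothetical: the class names ({{Writer}}, {{Pool}}, {{GrowableSlices}}), the reference map, and the notion of "roots" (module entry points) are assumptions for illustration, not an existing gradle task or Lucene API. Tests and other modules are simply left out of the graph, so code that only they reference shows up as unreachable.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class LocallyDeadSketch {
  /**
   * mainRefs maps each non-test class of one module to the classes it references.
   * Test classes and classes from other modules are left out of the map on purpose.
   * roots are the module's entry points (hypothetical; a real tool would have to define them).
   * Returns the classes of the module that are not reachable from any root.
   */
  static Set<String> locallyDead(Map<String, Set<String>> mainRefs, Set<String> roots) {
    Set<String> reached = new HashSet<>();
    Deque<String> queue = new ArrayDeque<>();
    for (String root : roots) {
      if (mainRefs.containsKey(root) && reached.add(root)) {
        queue.add(root);
      }
    }
    while (!queue.isEmpty()) {
      String current = queue.poll();
      for (String ref : mainRefs.get(current)) {
        // Only follow edges that stay inside the module's non-test code.
        if (mainRefs.containsKey(ref) && reached.add(ref)) {
          queue.add(ref);
        }
      }
    }
    Set<String> dead = new TreeSet<>(mainRefs.keySet());
    dead.removeAll(reached);
    return dead;
  }

  public static void main(String[] args) {
    // Hypothetical module: Writer uses Pool; GrowableSlices is only used by tests
    // and by another module, neither of which appears in this map.
    Map<String, Set<String>> coreRefs = new HashMap<>();
    coreRefs.put("Writer", new HashSet<>(Arrays.asList("Pool")));
    coreRefs.put("Pool", new HashSet<>());
    coreRefs.put("GrowableSlices", new HashSet<>(Arrays.asList("Pool")));
    Set<String> roots = new HashSet<>(Arrays.asList("Writer"));
    System.out.println(locallyDead(coreRefs, roots)); // prints [GrowableSlices]
  }
}
```

The hard part in practice would be choosing the roots: a library's public API is an entry point even when nothing inside the module calls it, so a real detector would probably only propose candidates rather than remove anything automatically.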

> The level of IntBlockPool slice is always 1 
> --------------------------------------------
>
>                 Key: LUCENE-9673
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9673
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/other
>            Reporter: mashudong
>            Priority: Minor
>         Attachments: LUCENE-9673.patch
>
>
> First slice is allocated by IntBlockPool.newSlice(), and its level is 1,
>  
> {code:java}
> private int newSlice(final int size) {
>   if (intUpto > INT_BLOCK_SIZE - size) {
>     nextBuffer();
>     assert assertSliceBuffer(buffer);
>   }
>
>   final int upto = intUpto;
>   intUpto += size;
>   buffer[intUpto - 1] = 1;
>   return upto;
> }{code}
>  
>  
> If one slice is not enough, IntBlockPool.allocSlice() is called to allocate more slices.
> As the following code shows, level is 1, and newLevel is NEXT_LEVEL_ARRAY[0], which is also 1.
>  
> The result is that the level of an IntBlockPool slice is always 1: the first slice is 2 ints long, and all subsequent slices are 4 ints long.
>  
> {code:java}
> private static final int[] NEXT_LEVEL_ARRAY = {1, 2, 3, 4, 5, 6, 7, 8, 9, 9};
>
> private int allocSlice(final int[] slice, final int sliceOffset) {
>   final int level = slice[sliceOffset];
>   final int newLevel = NEXT_LEVEL_ARRAY[level - 1];
>   final int newSize = LEVEL_SIZE_ARRAY[newLevel];
>   // Maybe allocate another block
>   if (intUpto > INT_BLOCK_SIZE - newSize) {
>     nextBuffer();
>     assert assertSliceBuffer(buffer);
>   }
>
>   final int newUpto = intUpto;
>   final int offset = newUpto + intOffset;
>   intUpto += newSize;
>   // Write forwarding address at end of last slice:
>   slice[sliceOffset] = offset;
>
>   // Write new level:
>   buffer[intUpto - 1] = newLevel;
>
>   return newUpto;
> }
> {code}
>  
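
The stuck level described above can be reproduced with a small stand-alone simulation. This is only a sketch, not the Lucene code or the attached patch: {{NEXT_LEVEL_ARRAY}} is copied from the snippet, but the contents of {{LEVEL_SIZE_ARRAY}} are an assumption (chosen to match the 2-then-4 sizes reported here), and the {{storeLevelPlusOne}} variant is just one hypothetical way to make the levels advance.

```java
import java.util.ArrayList;
import java.util.List;

public class SliceLevelSketch {
  // Copied from the snippet quoted in the issue.
  static final int[] NEXT_LEVEL_ARRAY = {1, 2, 3, 4, 5, 6, 7, 8, 9, 9};
  // Assumption: not shown in the issue; chosen to match the reported slice sizes.
  static final int[] LEVEL_SIZE_ARRAY = {2, 4, 8, 16, 32, 64, 128, 256, 512, 1024};

  /**
   * Returns the sizes (in ints) of the first `count` slices of one chain.
   * With storeLevelPlusOne == false the marker written at the end of each new
   * slice is newLevel itself, as in the quoted allocSlice. With true, the marker
   * is newLevel + 1 (a hypothetical variant, not necessarily the actual fix).
   */
  static List<Integer> sliceSizes(int count, boolean storeLevelPlusOne) {
    List<Integer> sizes = new ArrayList<>();
    sizes.add(LEVEL_SIZE_ARRAY[0]); // first slice, as allocated via newSlice
    int marker = 1;                 // newSlice writes 1 at the end of the first slice
    for (int i = 1; i < count; i++) {
      int newLevel = NEXT_LEVEL_ARRAY[marker - 1]; // same lookup as allocSlice
      sizes.add(LEVEL_SIZE_ARRAY[newLevel]);
      marker = storeLevelPlusOne ? newLevel + 1 : newLevel;
    }
    return sizes;
  }

  public static void main(String[] args) {
    System.out.println(sliceSizes(5, false)); // prints [2, 4, 4, 4, 4]
    System.out.println(sliceSizes(5, true));  // prints [2, 4, 8, 16, 32]
  }
}
```

In the first run the marker read back is always 1, so {{NEXT_LEVEL_ARRAY[0]}} is looked up every time and the slices never grow past 4 ints. In the second run the marker moves up by one each time, so the slice sizes grow geometrically until they reach the last entry of the assumed size table.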


