Posted to solr-user@lucene.apache.org by Shawn Heisey <so...@elyograg.org> on 2013/06/19 18:38:55 UTC

Apparent odd interaction between autoCommit values and indexing ram buffer

I've run into something a little odd that's been happening for a while.

The apparent symptoms: Two index segments are created every time an 
autoCommit (hard, not soft) happens during a DIH full-import.

Here's the directory listing from the first few minutes of importing, 
and a related INFOSTREAM:

http://apaste.info/22ue
https://dl.dropboxusercontent.com/u/97770508/INFOSTREAM-s1build.txt

The INFOSTREAM file has cruft from before, so if you search for "3g8" in 
the file, you'll be at the beginning of the relevant section.

I brought this up without resolution on the dev list last December. 
After some discussion in #solr-dev yesterday and some poking around with 
branch_4x, I think I might have figured out (at a high level) what's 
going on.

My 'ramBufferSizeMB' value is 48, and my autoCommit maxDocs is 25000.
My documents are probably 1-2 KB each, with some a little larger than
that.
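
As a rough sanity check on those numbers, the sketch below estimates how many documents of 1-2 KB would fit in a 48 MB buffer. It assumes, purely for illustration, that a document occupies the same space in the indexing buffer as its raw size, which is unlikely to hold exactly in practice. The function name docs_per_buffer is made up for this example.

```python
# Back-of-the-envelope estimate only. ASSUMPTION: buffer usage per document
# equals the raw document size, which real indexing will not match exactly.
RAM_BUFFER_MB = 48   # ramBufferSizeMB from the message
MAX_DOCS = 25000     # autoCommit maxDocs from the message


def docs_per_buffer(buffer_mb, doc_kb):
    """Number of documents of doc_kb kilobytes that fit in buffer_mb megabytes."""
    return (buffer_mb * 1024) // doc_kb


for doc_kb in (1, 2):
    estimate = docs_per_buffer(RAM_BUFFER_MB, doc_kb)
    print(f"{doc_kb} KB docs: about {estimate} per buffer (maxDocs is {MAX_DOCS})")
```

Under that assumption the buffer would hold somewhere between about 24,500 and 49,000 documents, which is the same order of magnitude as the maxDocs setting.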

Looking at the numDocs for each segment, here's what I think is happening:

The autoCommit kicks in after the first 25000 docs (25002 to be 
precise), but the ram buffer isn't emptied. The next 3339 documents get 
indexed, at which point the ram buffer fills up, so it flushes another 
segment.  Then it does another 21674 docs to approximately reach 25000 
for autoCommit, which forces another segment flush, but without emptying 
the buffer.  Lather, rinse, repeat.

Each pair of numDocs values after the initial 25002 does add up to 
approximately 25000.
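
The pairing can be checked with a few lines of Python. This is only a sketch using the three numDocs values quoted above. The helper group_by_commit and its 1% tolerance are assumptions made for illustration, not anything taken from Solr or Lucene.

```python
def group_by_commit(segment_docs, max_docs=25000, tolerance=0.01):
    """Group consecutive segment sizes into runs that add up to roughly max_docs.

    A run is closed once its total comes within `tolerance` of max_docs
    (or exceeds it). The tolerance is an arbitrary choice for this sketch.
    """
    groups, current = [], []
    for num_docs in segment_docs:
        current.append(num_docs)
        if sum(current) >= max_docs * (1 - tolerance):
            groups.append(current)
            current = []
    if current:  # leftover segments that have not yet reached a commit
        groups.append(current)
    return groups


# numDocs values mentioned in the message: the first segment, then one pair.
segments = [25002, 3339, 21674]
for group in group_by_commit(segments):
    print(group, "->", sum(group))
```

For these values the first segment stands alone at 25002 and the following pair sums to 25013.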

If I'm right about what's happening here, then here's the big question: 
Should the ram buffer be emptied when autoCommit triggers?  I think that 
it should, but can it be done without drastically affecting performance?
I haven't looked at the code, and I expect that it'll take me forever
to understand it well enough to figure out if I'm right or wrong.

Re: Apparent odd interaction between autoCommit values and indexing ram buffer

Posted by Shawn Heisey <so...@elyograg.org>.
On 6/19/2013 10:38 AM, Shawn Heisey wrote:
> Looking at the numDocs for each segment, here's what I think is happening:
>
> The autoCommit kicks in after the first 25000 docs (25002 to be
> precise), but the ram buffer isn't emptied. The next 3339 documents get
> indexed, at which point the ram buffer fills up, so it flushes another
> segment.  Then it does another 21674 docs to approximately reach 25000
> for autoCommit, which forces another segment flush, but without emptying
> the buffer.  Lather, rinse, repeat.

I seem to be wrong about it being strictly related to ramBufferSizeMB. 
Today I bumped the buffer up to 256MB, restarted Solr, and started 
another full-import.

If I were completely right about the buffer interaction, this should
have resulted in a few roughly equal-sized segments being created
before a small one.  It didn't change anything: there are still two
segments per autoCommit, one with around 3000 docs and another that
brings the total to about 25000.
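
To put a number on that expectation, here is a small sketch of what the buffer theory would predict for a 256MB buffer. It assumes the 48MB buffer filled after 25002 + 3339 documents, as the theory supposed, and that the fill point scales linearly with buffer size. Both are assumptions for illustration, and scaled_fill_point is a made-up helper.

```python
MAX_DOCS = 25000  # autoCommit maxDocs from the original message

# ASSUMPTION (the original theory): the 48MB buffer filled after the first
# autoCommit segment plus the small segment that followed it.
docs_to_fill_48mb = 25002 + 3339


def scaled_fill_point(docs_at_old, old_mb, new_mb):
    """Scale the fill point linearly with buffer size (a simplification)."""
    return docs_at_old * new_mb // old_mb


fill_256mb = scaled_fill_point(docs_to_fill_48mb, 48, 256)
commits_before_flush = fill_256mb // MAX_DOCS

print("docs to fill 48MB (assumed):", docs_to_fill_48mb)
print("docs to fill 256MB (scaled):", fill_256mb)
print("autoCommits before a buffer-driven flush:", commits_before_flush)
```

Under those assumptions, roughly six autoCommit-sized segments would appear before any buffer-driven small segment, which is not what the import actually produced.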

There's still something weird going on, but now I know that I don't 
completely understand it.  I hope someone can shed some light.

Thanks,
Shawn