Posted to solr-user@lucene.apache.org by Shawn Heisey <so...@elyograg.org> on 2013/06/19 18:38:55 UTC
Apparent odd interaction between autoCommit values and indexing ram buffer
I've run into something a little odd that's been happening for a while.
The apparent symptoms: Two index segments are created every time an
autoCommit (hard, not soft) happens during a DIH full-import.
Here's the directory listing from the first few minutes of importing,
and a related INFOSTREAM:
http://apaste.info/22ue
https://dl.dropboxusercontent.com/u/97770508/INFOSTREAM-s1build.txt
The INFOSTREAM file has cruft from before, so if you search for "3g8" in
the file, you'll be at the beginning of the relevant section.
I brought this up without resolution on the dev list last December.
After some discussion in #solr-dev yesterday and some poking around with
branch_4x, I think I might have figured out (at a high level) what's
going on.
My 'ramBufferSizeMB' value is 48, and my autoCommit maxDocs is 25000.
My documents probably tend to be 1-2 KB each, with some running a little
larger than that.
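As a rough sanity check on why these two settings might interact, the sketch below estimates how much raw document data arrives between autoCommits. It uses only the figures above (48 MB buffer, 25000 maxDocs, documents of roughly 1-2 KB). The function name is made up for illustration, and raw document size is only a loose stand-in for what the indexing buffer actually holds in memory, so treat the result as an order-of-magnitude guess.

```python
# Figures taken from the post; everything else here is an illustrative assumption.
RAM_BUFFER_MB = 48           # ramBufferSizeMB
AUTOCOMMIT_MAX_DOCS = 25000  # autoCommit maxDocs


def raw_mb_per_commit(doc_kb, max_docs=AUTOCOMMIT_MAX_DOCS):
    """Raw document data (in MB) indexed between two autoCommits.

    This is NOT the in-memory size of the indexing buffer, only the
    size of the source documents under the stated per-document guess.
    """
    return max_docs * doc_kb / 1024


for doc_kb in (1, 2):
    raw_mb = raw_mb_per_commit(doc_kb)
    side = "above" if raw_mb > RAM_BUFFER_MB else "below"
    print(f"{doc_kb} KB docs: {raw_mb:.1f} MB per autoCommit, "
          f"{side} the {RAM_BUFFER_MB} MB buffer")
```

With these guesses the two thresholds land in the same neighborhood, which is why an interaction between them seemed plausible.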
Looking at the numDocs for each segment, here's what I think is happening:
The autoCommit kicks in after the first 25000 docs (25002 to be
precise), but the ram buffer isn't emptied. The next 3339 documents get
indexed, at which point the ram buffer fills up, so it flushes another
segment. Then it does another 21674 docs to approximately reach 25000
for autoCommit, which forces another segment flush, but without emptying
the buffer. Lather, rinse, repeat.
Each pair of numDocs values after the initial 25002 does add up to
approximately 25000.
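The arithmetic for the one pair of segment sizes quoted above can be checked quickly. This sketch uses only the numbers in this message; the variable names are just labels for illustration.

```python
# numDocs values quoted in the message above.
AUTOCOMMIT_MAX_DOCS = 25000
small_segment = 3339    # the smaller segment of the pair
large_segment = 21674   # the larger segment, written at the next autoCommit

pair_total = small_segment + large_segment
overshoot = pair_total - AUTOCOMMIT_MAX_DOCS

print(f"pair total: {pair_total} docs")
print(f"difference from maxDocs: {overshoot} docs")
```

The pair totals 25013 docs, 13 more than the configured maxDocs, so the two segments together correspond to one autoCommit interval.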
If I'm right about what's happening here, then here's the big question:
Should the ram buffer be emptied when autoCommit triggers? I think that
it should, but can it be done without drastically affecting performance?
I haven't looked at the code ... I expect that it'll take me forever
to understand it well enough to figure out if I'm right or wrong.
Re: Apparent odd interaction between autoCommit values and indexing ram buffer
Posted by Shawn Heisey <so...@elyograg.org>.
On 6/19/2013 10:38 AM, Shawn Heisey wrote:
> Looking at the numDocs for each segment, here's what I think is happening:
>
> The autoCommit kicks in after the first 25000 docs (25002 to be
> precise), but the ram buffer isn't emptied. The next 3339 documents get
> indexed, at which point the ram buffer fills up, so it flushes another
> segment. Then it does another 21674 docs to approximately reach 25000
> for autoCommit, which forces another segment flush, but without emptying
> the buffer. lather, rinse, repeat.
I seem to be wrong about it being strictly related to ramBufferSizeMB.
Today I bumped the buffer up to 256MB, restarted Solr, and started
another full-import.
If I were completely right about the buffer interaction, this should
have resulted in a few roughly equal-sized segments being created
before creating a small one. It didn't change anything: it's still two
segments per autocommit, one of which is around 3000 docs and the other
adds to that to make about 25000.
There's still something weird going on, but now I know that I don't
completely understand it. I hope someone can shed some light.
Thanks,
Shawn