Posted to solr-user@lucene.apache.org by Shawn Heisey <so...@elyograg.org> on 2014/07/30 15:27:49 UTC

Index size increase after upgrade to 4.9?

Yesterday I upgraded my dev server to Solr 4.9, and also upgraded a
third-party plugin to a new version that's compatible with Solr 4.9.

After the index was rebuilt, each shard was 28GB ... but before the
upgrade, each shard was only 20GB.  The number of documents per shard
(16.4 million) actually went *down* a little bit, and the config/schema
hasn't changed.
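
For scale, the numbers above work out roughly as follows. This is only a back-of-the-envelope sketch: it assumes "GB" means GiB and that the document count is 16.4 million both before and after, which is not exactly true since the count dropped slightly. The function name is made up for illustration.

```python
GIB = 1024 ** 3  # assumption: the shard sizes quoted above are in GiB


def growth_and_bytes_per_doc(old_gb, new_gb, docs):
    """Return (percent growth, old bytes per doc, new bytes per doc)."""
    growth_pct = (new_gb - old_gb) / old_gb * 100.0
    return growth_pct, old_gb * GIB / docs, new_gb * GIB / docs


# Figures from the message: 20GB before, 28GB after, ~16.4 million docs.
growth, old_bpd, new_bpd = growth_and_bytes_per_doc(20, 28, 16_400_000)
print(f"growth: {growth:.0f}%  before: {old_bpd:.0f} B/doc  after: {new_bpd:.0f} B/doc")
```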

Could this be explained by the new Solr version?  I've also asked the
third-party plugin company about this problem.

Thanks,
Shawn


Re: Index size increase after upgrade to 4.9?

Posted by Shawn Heisey <so...@elyograg.org>.
On 7/30/2014 10:00 AM, Shawn Heisey wrote:
> It may turn out that this is actually a bug in merging, where old
> segments are not getting deleted.  I noticed in the optimized index that
> there is a single large segment of about 20GB and a bunch of other
> segments that are all older than the single large segment.  I'm manually
> optimizing that index again to see what happens.  I'll probably need to do
> the rebuild again with infoStream enabled.

The second optimize did not delete those old segments.  I also did an
optimize on another shard, and saw the same problem there.  A full
rebuild will take close to twelve hours, or possibly longer once I
enable infoStream.  I will open an issue, and if the problem persists,
attach the infoStream.
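
For anyone who wants to check their own index directory for leftover segments, something like the following might help. It is a rough sketch, not a Lucene tool: it assumes the usual Lucene 4.x file naming, where per-segment files start with an underscore and a base-36 segment name (for example _a.fdt or _a_Lucene41_0.doc), and it skips everything else (segments_N, segments.gen, write.lock). The function names are hypothetical.

```python
import os
import re
from collections import defaultdict

# Assumed naming: "_" + base-36 segment name, then "." or "_" and a suffix.
SEGMENT_FILE = re.compile(r"^(_[0-9a-z]+)[._]")


def segment_sizes(index_dir):
    """Map each segment name to the total bytes of its files on disk."""
    sizes = defaultdict(int)
    for name in os.listdir(index_dir):
        match = SEGMENT_FILE.match(name)
        if match is None:
            continue  # not a per-segment file (segments_N, write.lock, ...)
        sizes[match.group(1)] += os.path.getsize(os.path.join(index_dir, name))
    return dict(sizes)


def report(index_dir):
    """Print segments from largest to smallest."""
    ordered = sorted(segment_sizes(index_dir).items(), key=lambda kv: -kv[1])
    for segment, size in ordered:
        print(f"{segment:>8}  {size / 1024 ** 3:8.2f} GiB")
```

Calling report() on a shard's data/index directory after an optimize should show one segment if the merge cleaned up after itself; several entries would match the symptom described here.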

This problem likely does not affect the actual size of the index loaded
into Lucene, just the amount of disk space taken, though I cannot
confirm that statement.

Has anyone else noticed an increase in the size of on-disk indexes after
an upgrade to 4.9, or the presence of older segments after an optimize
(forceMerge)?  My rebuilds use DIH, in case that matters.

One possible trigger might be that I am indexing into an index directory
originally built by a previous Solr version.  Before I start the
dataimport, I do delete all docs and issue a commit, but I am not
deleting the entire index directory.
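
The delete-all-plus-commit step can be expressed as a single POST to the core's update handler. The sketch below only builds the request and does not send it; the core URL is a placeholder, the helper name is made up, and this is not the actual code used for these rebuilds.

```python
import urllib.request


def delete_all_request(core_url):
    """Build (but do not send) a delete-all-documents request with a commit."""
    body = b"<delete><query>*:*</query></delete>"
    return urllib.request.Request(
        core_url.rstrip("/") + "/update?commit=true",
        data=body,
        headers={"Content-Type": "text/xml"},
        method="POST",
    )


# Hypothetical core URL; sending is left to the caller:
# urllib.request.urlopen(delete_all_request("http://localhost:8983/solr/core1"))
```

Note that this removes the documents but leaves the index directory and its existing files in place, which is the situation described above.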

Thanks,
Shawn


Re: Index size increase after upgrade to 4.9?

Posted by Shawn Heisey <so...@elyograg.org>.
On 7/30/2014 9:16 AM, Shawn Heisey wrote:
> On 7/30/2014 9:10 AM, Erick Erickson wrote:
>> I assume you've optimized? Or otherwise ensured that there aren't
>> any deleted docs....
> It's all straight indexing with DIH from MySQL, so there really are no
> deleted docs, but about an hour after the rebuild finished, one of the
> shards did get optimized by my SolrJ code.  The size is still 28GB.

It may turn out that this is actually a bug in merging, where old
segments are not getting deleted.  I noticed in the optimized index that
there is a single large segment of about 20GB and a bunch of other
segments that are all older than the single large segment.  I'm manually
optimizing that index again to see what happens.  I'll probably need to do
the rebuild again with infoStream enabled.

Thanks,
Shawn


Re: Index size increase after upgrade to 4.9?

Posted by Shawn Heisey <so...@elyograg.org>.
On 7/30/2014 9:10 AM, Erick Erickson wrote:
> I assume you've optimized? Or otherwise ensured that there aren't
> any deleted docs....

It's all straight indexing with DIH from MySQL, so there really are no
deleted docs, but about an hour after the rebuild finished, one of the
shards did get optimized by my SolrJ code.  The size is still 28GB.
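
The optimize here is issued from SolrJ, but the same operation can be triggered over plain HTTP, which may be handy for reproducing this. A minimal sketch that only builds the request; the core URL is a placeholder and the helper name is hypothetical, not the SolrJ code mentioned above.

```python
import urllib.parse
import urllib.request


def optimize_request(core_url, max_segments=1):
    """Build (but do not send) an optimize (forceMerge) request for one core."""
    params = urllib.parse.urlencode(
        {"optimize": "true", "maxSegments": max_segments, "waitSearcher": "true"}
    )
    return urllib.request.Request(
        core_url.rstrip("/") + "/update?" + params, method="GET"
    )


# Hypothetical core URL; sending is left to the caller:
# urllib.request.urlopen(optimize_request("http://localhost:8983/solr/core1"))
```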

Thanks,
Shawn


Re: Index size increase after upgrade to 4.9?

Posted by Erick Erickson <er...@gmail.com>.
I assume you've optimized? Or otherwise ensured that there aren't
any deleted docs....

Best,
Erick


On Wed, Jul 30, 2014 at 6:27 AM, Shawn Heisey <so...@elyograg.org> wrote:

> Yesterday I upgraded my dev server to Solr 4.9, and also upgraded a
> third-party plugin to a new version that's compatible with Solr 4.9.
>
> After the index was rebuilt, each shard was 28GB ... but before the
> upgrade, each shard was only 20GB.  The number of documents per shard
> (16.4 million) actually went *down* a little bit, and the config/schema
> hasn't changed.
>
> Could this be explained by the new Solr version?  I've also asked the
> third-party plugin company about this problem.
>
> Thanks,
> Shawn
>
>