Posted to solr-user@lucene.apache.org by Robert Petersen <ro...@buy.com> on 2011/03/16 22:10:30 UTC

i don't get why my index didn't grow more...

OK, I have a 30 GB index where there are lots of sparsely populated int
fields and then one title field and one catchall field with title and
everything else we want as keywords.  I figure the catchall field is
the biggest field in our documents, which as I mentioned are otherwise
composed of a variety of int fields and a title.

 

So my puzzlement is that my biggest field is already copied into a double
metaphone field, and now I have added another copyField to also copy the
catchall field into a newly created soundex field, as an experiment to
compare the effectiveness of the two.  I expected the index to grow by
at least 25% to 30%, but it barely grew at all.  Can someone explain
this to me?  Thanks!  :)

 


Re: i don't get why my index didn't grow more...

Posted by Erick Erickson <er...@gmail.com>.
This page: http://lucene.apache.org/java/3_0_2/fileformats.html#file-names,
when combined with what Yonik said, may help you figure it out...

And if you're still stumped, please post the <fieldType> and <field>
definitions you used....

Best
Erick

On Wed, Mar 16, 2011 at 5:10 PM, Robert Petersen <ro...@buy.com> wrote:
> OK, I have a 30 GB index where there are lots of sparsely populated int
> fields and then one title field and one catchall field with title and
> everything else we want as keywords.  I figure the catchall field is
> the biggest field in our documents, which as I mentioned are otherwise
> composed of a variety of int fields and a title.
>
>
>
> So my puzzlement is that my biggest field is already copied into a double
> metaphone field, and now I have added another copyField to also copy the
> catchall field into a newly created soundex field, as an experiment to
> compare the effectiveness of the two.  I expected the index to grow by
> at least 25% to 30%, but it barely grew at all.  Can someone explain
> this to me?  Thanks!  :)
>
>
>
>

Re: i don't get why my index didn't grow more...

Posted by Yonik Seeley <yo...@lucidimagination.com>.
Without even looking at the different segment files, things look odd:
You say that you optimize every day, yet I see segments up to 4 days old.
Also look at all the segments_??? files... each represents a commit
point of the index.
So it looks like you have 16 snapshots (or commit points) of the index.
Do you have a deletion policy configured to do this for some reason?

Anyway, this is why when you changed how you index, you didn't see
much of a size increase (comparatively).
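
As a rough way to check how many commit points a directory is holding,
here is a minimal Python sketch.  It assumes only that each commit point
shows up as a segments_N file whose suffix is the commit generation in
base 36; the function name is made up for illustration and is not part
of Solr or Lucene.

```python
import re
from pathlib import Path

# Matches commit-point files such as segments_1 or segments_a9.
# segments.gen does not match, because it has no underscore suffix.
COMMIT_FILE = re.compile(r"^segments_[0-9a-z]+$")


def list_commit_points(index_dir):
    """Return the segments_N file names in index_dir, oldest generation first."""
    names = [
        p.name
        for p in Path(index_dir).iterdir()
        if p.is_file() and COMMIT_FILE.match(p.name)
    ]
    # The part after the underscore is the generation, written in base 36.
    return sorted(names, key=lambda n: int(n.split("_", 1)[1], 36))
```

Calling len(list_commit_points(path)) on the index directory gives the
number of commit points still on disk; more than one or two after an
optimize suggests older commits are being retained.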

-Yonik
http://lucidimagination.com



On Wed, Mar 16, 2011 at 7:46 PM, Robert Petersen <ro...@buy.com> wrote:
> Thanks for the reply, Yonik.  Here are the results of ls -l on the master server index folder.  Also, please note we have hundreds of those small sparsely populated fields, and I run optimize once a day at midnight.  We index 24/7 off a queue at a clip of about 200K docs per hour, so the index has had hundreds of commits since last night at midnight.

[...]

Re: i don't get why my index didn't grow more...

Posted by Yonik Seeley <yo...@lucidimagination.com>.
On Wed, Mar 16, 2011 at 5:10 PM, Robert Petersen <ro...@buy.com> wrote:
> OK, I have a 30 GB index where there are lots of sparsely populated int
> fields and then one title field and one catchall field with title and
> everything else we want as keywords.  I figure the catchall field is
> the biggest field in our documents, which as I mentioned are otherwise
> composed of a variety of int fields and a title.
>
>
>
> So my puzzlement is that my biggest field is already copied into a double
> metaphone field, and now I have added another copyField to also copy the
> catchall field into a newly created soundex field, as an experiment to
> compare the effectiveness of the two.  I expected the index to grow by
> at least 25% to 30%, but it barely grew at all.  Can someone explain
> this to me?  Thanks!  :)

I assume you reindexed everything?

Anyway, the size of indexed fields generally grows sub-linearly (as
opposed to stored fields, whose size grows exactly linearly).
But if it really barely grew at all, this could point to other parts
of the index taking up much more space than you realize.
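
To illustrate the sub-linear point, here is a toy Python sketch.  It is
a deliberately simplified model, not how Lucene measures anything: it
tracks raw stored bytes against the number of distinct whitespace-
separated terms, and ignores postings, norms and compression entirely.
The function name and the sample data are made up for illustration.

```python
def growth(docs):
    """Return (doc_count, stored_bytes, distinct_terms) after each document."""
    vocabulary = set()
    stored_bytes = 0
    rows = []
    for count, text in enumerate(docs, start=1):
        # Stored data grows by the full size of every document.
        stored_bytes += len(text.encode("utf-8"))
        # The term dictionary only grows when a new term appears.
        vocabulary.update(text.lower().split())
        rows.append((count, stored_bytes, len(vocabulary)))
    return rows


for row in growth(["red shoe", "red hat", "blue shoe", "blue hat"]):
    print(row)
# (1, 8, 2)
# (2, 15, 3)
# (3, 24, 4)
# (4, 32, 4)
```

The byte count keeps climbing with every document, while the distinct
term count flattens out once the vocabulary has been seen.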

If you could do an "ls -l" of your index directory, we might be able
to see what parts of the index are using up the most space.
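
If the listing is long, totalling the file sizes per extension can make
it easier to read.  The following is a minimal Python sketch that uses
only the standard library; the function name is hypothetical.  In the
Lucene 3.x file format, .fdt holds stored field data while .frq and .prx
hold postings, so the totals give a rough split between stored and
indexed data.

```python
from collections import defaultdict
from pathlib import Path


def size_by_extension(index_dir):
    """Return (extension, total_bytes) pairs for index_dir, largest first."""
    totals = defaultdict(int)
    for path in Path(index_dir).iterdir():
        if path.is_file():
            # Files without an extension (e.g. segments_N) are keyed by name.
            key = path.suffix or path.name
            totals[key] += path.stat().st_size
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)
```

Printing the result for the index directory shows at a glance which
file types dominate the 30 GB.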

-Yonik
http://lucidimagination.com