You are viewing a plain text version of this content. The canonical link for it is here.
Posted to java-user@lucene.apache.org by Peter Keegan <pe...@gmail.com> on 2010/02/16 02:57:25 UTC
Re: Can you use reduced sized test indexes to predict performance gains for a larger index?

Same experience here as Tom. Disk I/O becomes bottleneck with large indexes
(or multiple shards per server) with less memory. Frequent updates to
indexes can make the I/O bottleneck worse.

Peter

On Mon, Feb 15, 2010 at 2:17 PM, Tom Burton-West <tb...@gmail.com>wrote:

>
> Hi Chris,
>
> In our experience with large indexes (about 200-300GB) , we found most of
> our bottlenecks involved disk I/O.  We found that if our experimental
> indexes were too small, that much  of the index could fit in cache, and so
> our test results  were not applicable to our larger indexes.  On the other
> hand, once we started building our test indexes so they were significantly
> larger than the amount of memory available for OS disk caching, we could
> see
> results that extrapolated out to the large index.
>
> Tom Burton-West
> www.hathitrust.org
>
>
> ryguasu wrote:
> >
> > I'd like to try some experiments to see if I can improve search
> > performance by changing analysis (e.g. adding/removing word bigrams or
> > commongrams), or by changing how I map my source records into Lucene
> > documents. The problem is that my index currently is about 1TB in size
> > and takes about 2-3 weeks to build, so if I have to rebuild the entire
> > index in order to test each potential improvement, then I'm going to
> > be waiting around a lot.
> >
> > One option is to test potential performance improvements by building
> > indexes not for the full dataset, but rather for, say, a 1% sample of
> > the full dataset. (That is, I'll just index 1% of the source records.)
> > I would build one small control index, and then n small test indexes,
> > one for each intervention I wish to try. The hope would be that, if an
> > indexing intervention significantly improves performance for the small
> > indexes, then it would also significantly improve performance of the
> > full dataset. (Similarly, you'd hope that if an intervention *didn't*
> > significantly improve performance on the small indexes, then it would
> > *not* significantly improve performance of the full dataset.) This
> > would allow me to quickly accept and reject interventions (as least
> > provisionally), and only try applying the most obviously promising
> > ones to the full dataset.
> >
> > Any thoughts on how naive this is? Does it sound more like a way to
> > save time, or like a way to waste time misleading myself?
> >
> > Cheers,
> > Chris
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
> >
> >
> >
>
> --
> View this message in context:
> http://old.nabble.com/Can-you-use-reduced-sized-test-indexes-to-predict-performance-gains--for-a-larger-index--tp27571524p27598602.html
> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>