Posted to solr-user@lucene.apache.org by Tim Sell <tr...@gmail.com> on 2009/07/08 13:25:40 UTC

All in one index, or multiple indexes?

Hi,
I am wondering if it is common to have just one very large index, or
multiple smaller indexes specialized for different content types.

We currently have multiple smaller indexes, although one of them is
much larger than the others. We are considering merging them, to allow
the convenience of searching across multiple types at once and getting
them back in one list. The largest of the current indexes holds a
couple of types that belong together; it has just one text field, which
is usually quite short and is similar to product names (words like
"The" matter). Another index I would merge with this one has multiple
text fields (also quite short).

We of course would still like to be able to get specific types. Is
filtering on just one type a big performance hit compared to just
querying it from its own index? Bear in mind all these indexes run on
the same machine (we replicate them all to three machines and do load
balancing).
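
For illustration, restricting a query on a merged index to one type
would normally be done with Solr's fq (filter query) parameter. The
sketch below only builds the request URL; the field name "type", the
host, and the helper name are assumptions, not part of our actual setup.

```python
from urllib.parse import urlencode


def build_select_url(base_url, query, doc_type=None, rows=10):
    """Build a Solr /select URL, optionally restricted to one content type.

    The "type" field is a hypothetical schema field that stores the
    content type of each document in the merged index.
    """
    params = [("q", query), ("rows", rows), ("wt", "json")]
    if doc_type is not None:
        # fq narrows the result set without changing relevance scores;
        # Solr can cache the matching document set in its filterCache.
        params.append(("fq", f"type:{doc_type}"))
    return f"{base_url}/select?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical host and type value, for illustration only.
    print(build_select_url("http://localhost:8983/solr", "the who", "artist"))
```

Since repeated filters on the same type can be served from the filter
cache, the cost of such a filter may be small, but that would need to be
measured on real data.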

There are a number of considerations. From an application standpoint,
when querying across all types we may split the results out into the
separate types anyway once we have the list back. If we always do
this, is it silly to have them in one index, rather than query
multiple indexes at once? Are multiple HTTP requests less significant
than the time to post-split the results?
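
To make the post-split step concrete, here is a minimal sketch of
grouping a combined result list by type on the application side. It
assumes each returned document is a dict carrying a hypothetical "type"
field; the function name is illustrative.

```python
from collections import defaultdict


def split_by_type(docs, type_field="type"):
    """Group result documents by content type, preserving rank order.

    A single pass over the list, so the cost grows linearly with the
    number of documents returned (usually one page of results).
    """
    groups = defaultdict(list)
    for doc in docs:
        # Documents without the (assumed) type field go under "unknown".
        groups[doc.get(type_field, "unknown")].append(doc)
    return dict(groups)


if __name__ == "__main__":
    results = [
        {"id": "1", "type": "product", "name": "The Who"},
        {"id": "2", "type": "article", "title": "Who are The Who?"},
        {"id": "3", "type": "product", "name": "The The"},
    ]
    for doc_type, docs in split_by_type(results).items():
        print(doc_type, [d["id"] for d in docs])
```

For a page-sized result list this grouping is a single in-memory pass,
so it is likely cheap next to an extra HTTP round trip, though that is
an expectation rather than a measurement.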

In some ways it is easier to maintain a single index, although it has
felt easier to optimize the results for the type of content if they
are in separate indexes. My main concern with putting it all in one
index is that we'll make it harder to work with. We will definitely
want to do filtering on types sometimes, and if we go with a mashed-up
index I'd prefer not to maintain separate specialized indexes as well.

Any thoughts?

~Tim.

Re: All in one index, or multiple indexes?

Posted by Jim Adams <ja...@gmail.com>.
It will depend on how much total volume you have.  If you are discussing
millions and millions of records, I'd say use multicore and shards.
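
As a rough sketch of what that looks like from the client side, Solr's
distributed search takes a shards parameter listing the cores to query,
and the core that receives the request merges the results. The core
names and host below are assumptions for illustration; the code only
builds the URL.

```python
from urllib.parse import urlencode


def build_sharded_url(entry_core_url, query, shard_addresses, rows=10):
    """Build a distributed-search URL that fans out to several cores.

    shard_addresses are host:port/path entries without a scheme, which
    is the form the shards parameter expects.
    """
    params = [
        ("q", query),
        ("rows", rows),
        ("shards", ",".join(shard_addresses)),
    ]
    return f"{entry_core_url}/select?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical cores "products" and "articles" on one machine.
    print(build_sharded_url(
        "http://localhost:8983/solr/products",
        "the",
        ["localhost:8983/solr/products", "localhost:8983/solr/articles"],
    ))
```

Note that distributed search expects the unique key of each document to
be unique across all shards, so ids would have to be distinct between
the content types.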

Re: All in one index, or multiple indexes?

Posted by Noble Paul നോബിള്‍ नोब्ळ् <no...@corp.aol.com>.
Keep in mind that every time a commit is done, all the caches are
thrown away. If updates for each of these indexes happen at different
times, then with a single merged index the caches get invalidated on
every one of those commits. So in that case smaller indexes help.
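
A toy model of the point above, assuming (as a simplification) that
every commit to an index discards all of that index's caches. The
commit sequence and index names are made up for illustration.

```python
def count_cache_flushes(commits, layout):
    """Count how often each index loses its caches.

    commits: content types in the order their updates are committed.
    layout:  maps each content type to the index that stores it.
    """
    flushes = {}
    for content_type in commits:
        index = layout[content_type]
        # Simplifying assumption: one commit throws away all caches
        # of the index it is applied to.
        flushes[index] = flushes.get(index, 0) + 1
    return flushes


if __name__ == "__main__":
    commits = ["products", "articles", "products", "products", "articles"]
    merged = {"products": "all", "articles": "all"}
    separate = {"products": "products", "articles": "articles"}
    print(count_cache_flushes(commits, merged))
    print(count_cache_flushes(commits, separate))
```

With the merged layout, searches for one type lose their warm caches
whenever any other type is updated; with separate indexes each type is
only affected by its own commits.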

-- 
-----------------------------------------------------
Noble Paul | Principal Engineer| AOL | http://aol.com