Posted to solr-user@lucene.apache.org by karthik c <ka...@gmail.com> on 2009/03/19 12:14:08 UTC

large number of cores

Hi guys,

We need to index data of a large number of types. I was wondering whether it
is better to create a separate core for each type or to add everything to one
core with a "type" field?

Here are some more details:
The database: Currently we have around 200 types of data. The data for each
type is stored in a separate MySQL table. Each type has its own set of
fields, though they all share a name field and a globally unique id field.
The volume of data under each type varies from around 30 records to around
1.5 million records.

The queries: We will need to support the following kinds of queries:
  1. search by name within a type
  2. perform faceted filtering on all fields within a type
  3. search by name across all types

We have currently created separate cores for each type. We also wrote a
small tool to create cores for each type and trigger a full-import for each
of them. I am not sure if this is the right approach though. Also, the number
of types may increase by quite a bit in the future.

My concerns with having such a large number of cores are:
1. Does Solr support such a large number of cores?
2. Will searching across all cores be fast/effective with such a large
number of cores?
3. We ran into an issue where there were too many open file handles and had
to increase the open-file limit in the OS.
4. Triggering the full-import for a lot of cores at once results in some
cores not being indexed fully. Manually re-triggering the import for these
cores seems to fix the problem though.
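
One way to avoid starting every import at once (item 4 above) is to run the
imports one core at a time and wait for each to finish. The following is only
a minimal sketch, not the tool described in this post. It assumes each core
exposes the DataImportHandler at /dataimport, that the status command reports
"busy" while an import is running, and that the caller supplies `fetch`, a
hypothetical callable that takes a URL and returns the parsed JSON response.
The base URL and core names are placeholders.

```python
import time
from urllib.parse import urlencode


def dataimport_url(base_url, core, command):
    """Build a DataImportHandler URL for one core (handler path is assumed)."""
    query = urlencode({"command": command, "wt": "json"})
    return f"{base_url}/{core}/dataimport?{query}"


def import_sequentially(cores, fetch, base_url="http://localhost:8983/solr",
                        poll_seconds=5.0, sleep=time.sleep):
    """Trigger full-import on one core at a time, waiting for each to finish.

    `fetch` is a caller-supplied callable (hypothetical here) that takes a
    URL and returns the parsed JSON response as a dict.
    """
    finished = []
    for core in cores:
        fetch(dataimport_url(base_url, core, "full-import"))
        # Poll the status command until the handler stops reporting "busy".
        while fetch(dataimport_url(base_url, core, "status")).get("status") == "busy":
            sleep(poll_seconds)
        finished.append(core)
    return finished
```

Passing the HTTP client in as `fetch` keeps the sketch independent of any
particular library and makes it easy to try against a stub before pointing it
at a real server.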

My concerns about using a single core are:
1. The schema will now contain fields for all types, so most fields will be
empty in most documents.
2. Will searching within a type be slower compared to having the type
in a separate core?
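
For the single-core option, each row would be turned into a document that
carries its type and only the fields it actually has. A minimal sketch,
assuming rows arrive as dicts keyed by column name; the function name, the
"type" field name, and the sample columns are hypothetical and not part of
any Solr API:

```python
def to_solr_doc(type_name, row):
    """Turn one database row (a dict) into a document for a shared index."""
    doc = {"type": type_name}
    for field, value in row.items():
        # Skip empty values so the document simply omits fields it lacks.
        if value is None or value == "":
            continue
        doc[field] = value
    return doc


# Rows from two different tables end up with different field sets.
book = to_solr_doc("book", {"id": "b1", "name": "Dune", "isbn": None})
city = to_solr_doc("city", {"id": "c1", "name": "Pune", "population": 3100000})
```

Here `book` carries no isbn field at all, so the empty column is never sent
to the index.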

Thanks,
karthik c
http://cantspellathing.blogspot.com

Re: large number of cores

Posted by karthik c <ka...@gmail.com>.

Thanks Otis. Will try out using a single index.

karthik c
http://cantspellathing.blogspot.com


On Thu, Mar 19, 2009 at 11:24 PM, Otis Gospodnetic <
otis_gospodnetic@yahoo.com> wrote:

>
> You can really go either way.  Empty fields are OK.  Having lots of cores
> seems harder to maintain.  Searching against a small core will be faster
> than searching against a single core/index with all data, but you can use
> 'fq' to make things really fast.  The numbers you quote are not really big.
>  If you need to search by name across types, I would go with a single index.
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

Re: large number of cores

Posted by Otis Gospodnetic <ot...@yahoo.com>.

You can really go either way.  Empty fields are OK.  Having lots of cores seems harder to maintain.  Searching against a small core will be faster than searching against a single core/index with all data, but you can use 'fq' to make things really fast.  The numbers you quote are not really big.  If you need to search by name across types, I would go with a single index.
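
To illustrate the 'fq' suggestion, here is a rough sketch of the request
parameters for the three kinds of queries against a single index. It assumes
the shared fields are called "name" and "type" (hypothetical names) and does
no escaping of special query characters. The parameters q, fq, facet and
facet.field are standard Solr query parameters; the helper itself is
illustrative only.

```python
from urllib.parse import urlencode


def search_params(name_query, type_name=None, facet_fields=()):
    """Build a Solr query string for a name search, optionally within a type."""
    params = [("q", f"name:{name_query}")]
    if type_name is not None:
        # The filter query restricts results to one type without affecting scoring.
        params.append(("fq", f"type:{type_name}"))
    if facet_fields:
        params.append(("facet", "true"))
        params.extend(("facet.field", field) for field in facet_fields)
    return urlencode(params)


within_type = search_params("widget", "book")
faceted = search_params("widget", "book", facet_fields=["author"])
across_types = search_params("widget")
```

Leaving out `type_name` drops the filter, which gives the search by name
across all types.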

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch


