Posted to solr-user@lucene.apache.org by "Manepalli, Kalyan" <KA...@orbitz.com> on 2009/03/25 22:46:40 UTC

large index vs multicore

Hi All,
In my project, I have one primary core containing all the basic information for a product.
Now I need to add additional information that will be searched and displayed in conjunction with the product results.
My question is: from a design and query-speed point of view, should I add a new core to handle the additional data, or should I add the data to the existing core?

The data set is not very large: around 150,000 - 200,000 documents.

Any insights into this would be helpful.

Thanks,
Kalyan Manepalli


RE: large index vs multicore

Posted by "Manepalli, Kalyan" <KA...@orbitz.com>.
The manual merging suggested by Otis would happen inside Solr. We use a last-component to run the sub-query and then merge the results. Since it's a new sub-query, its relevancy and sorting should be independent of the main query.

Thanks,
Kalyan Manepalli
-----Original Message-----
From: Nicolas Pastorino [mailto:nfrp@ez.no]
Sent: Thursday, May 07, 2009 10:21 AM
To: solr-user@lucene.apache.org
Subject: Re: large index vs multicore

Hi, and sorry for slightly hijacking the thread,

On Mar 26, 2009, at 2:54 , Otis Gospodnetic wrote:

>
> Hi,
>
> Without knowing the details, I'd say keep it in the same index if
> the additional information shares some/enough fields with the main
> product data and separately if it's sufficiently distinct (this
> also means 2 queries and manual merging/joining).

Where would this manual merging/joining occur? On the client side or
inside Solr, before returning the results?
I was wondering what would become of relevancy, sorting, etc.
--
Nicolas





Re: large index vs multicore

Posted by Nicolas Pastorino <nf...@ez.no>.
Hi, and sorry for slightly hijacking the thread,

On Mar 26, 2009, at 2:54 , Otis Gospodnetic wrote:

>
> Hi,
>
> Without knowing the details, I'd say keep it in the same index if  
> the additional information shares some/enough fields with the main  
> product data and separately if it's sufficiently distinct (this  
> also means 2 queries and manual merging/joining).

Where would this manual merging/joining occur? On the client side or
inside Solr, before returning the results?
I was wondering what would become of relevancy, sorting, etc.
-- 
Nicolas


-- 
Nicolas Pastorino
Consultant - Trainer - System Developer
Phone :  +33 (0)4.78.37.01.34
eZ Systems ( Western Europe )  |  http://ez.no





Re: large index vs multicore

Posted by Otis Gospodnetic <ot...@yahoo.com>.
Hi,

Without knowing the details, I'd say keep it in the same index if the additional information shares some/enough fields with the main product data and separately if it's sufficiently distinct (this also means 2 queries and manual merging/joining).


Otis --
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
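
For readers wondering what the "2 queries and manual merging/joining" might look like on the client side, here is a minimal sketch in Python. It assumes the two result sets have already been fetched and parsed into dictionaries, and that each additional document carries a `parent_id` field pointing at its product's `id`. Both field names and the sample data are hypothetical, not something stated in this thread.

```python
from collections import defaultdict


def merge_children_into_parents(parents, children, key="parent_id"):
    """Attach each child document to its parent, preserving parent order.

    `key` is the (assumed) field on a child that holds the parent's id.
    """
    children_by_parent = defaultdict(list)
    for child in children:
        children_by_parent[child[key]].append(child)
    # The parent order (the main query's ranking) is left untouched;
    # children keep the order in which the second query returned them.
    return [
        {**parent, "children": children_by_parent.get(parent["id"], [])}
        for parent in parents
    ]


# Hypothetical, already-parsed results of the two queries.
products = [{"id": "p2", "name": "Hotel B"}, {"id": "p1", "name": "Hotel A"}]
extras = [
    {"id": "e1", "parent_id": "p1", "note": "pool"},
    {"id": "e2", "parent_id": "p2", "note": "spa"},
    {"id": "e3", "parent_id": "p1", "note": "gym"},
]
merged = merge_children_into_parents(products, extras)
```

Because the merge only groups the second result set under the first, each query keeps its own relevancy and sort order. The cost is the second round trip and the join logic you have to maintain yourself.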





RE: large index vs multicore

Posted by "Manepalli, Kalyan" <KA...@orbitz.com>.
Thanks for the reply.
Yes, in most of the use cases the data would come from both indices.
It's like a parent-child relationship. The use case requires that the child data be displayed along with the parent product information.


Thanks,
Kalyan Manepalli
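
As a rough illustration of the parent-child lookup described above, the sketch below builds the query string for the child documents from the ids of the parents returned by the main query. The `parent_id` field name is an assumption made for the example. The output uses standard Lucene query syntax (a fielded group of quoted terms joined by OR).

```python
def child_filter_for(parent_ids, field="parent_id"):
    """Build a Lucene-syntax clause matching children of the given parents.

    Returns None when there are no parents, so the caller can skip the
    second query altogether.
    """
    if not parent_ids:
        return None

    def quote(value):
        # Escape backslashes and double quotes inside a quoted term.
        return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'

    return f"{field}:({' OR '.join(quote(pid) for pid in parent_ids)})"


# Hypothetical ids taken from the first page of product results.
clause = child_filter_for(["p1", "p2"])
```

Only the ids on the current page of results need to go into the clause, which keeps the second query small even when the main query matches many products.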

-----Original Message-----
From: Ryan McKinley [mailto:ryantxu@gmail.com] 
Sent: Wednesday, March 25, 2009 8:54 PM
To: solr-user@lucene.apache.org
Subject: Re: large index vs multicore


>
> My question is: from a design and query-speed point of view, should I
> add a new core to handle the additional data, or should I add the data
> to the existing core?

Do you ever need to get results from both sets of data in the same
query?  If so, putting them in the same index will be faster.  If
every query is always limited to results within one set or the other
(and the doc count is not huge), then the choice of single core vs.
multicore is more about what you are more comfortable managing than
it is about query speeds.

Advantages of multicore:
  - the distinct data is in different indexes, so you can maintain them
    independently
    (perhaps one data set never changes and the other changes often)

Advantages of single core (with multiple data sets):
  - everything is in one place
  - you replicate / load balance a single index rather than multiple.


ryan

Re: large index vs multicore

Posted by Ryan McKinley <ry...@gmail.com>.
>
> My question is: from a design and query-speed point of view, should I
> add a new core to handle the additional data, or should I add the data
> to the existing core?

Do you ever need to get results from both sets of data in the same
query?  If so, putting them in the same index will be faster.  If
every query is always limited to results within one set or the other
(and the doc count is not huge), then the choice of single core vs.
multicore is more about what you are more comfortable managing than
it is about query speeds.

Advantages of multicore:
  - the distinct data is in different indexes, so you can maintain them
    independently
    (perhaps one data set never changes and the other changes often)

Advantages of single core (with multiple data sets):
  - everything is in one place
  - you replicate / load balance a single index rather than multiple.


ryan
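
One common way to keep multiple data sets in a single core is to tag every document with a type field and filter on it when a query should see only one set. The sketch below builds such request URLs with Solr's standard `q`, `fq`, `rows` and `wt` parameters. The base URL and the `doc_type` field are assumptions made for the example, not details from this thread.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumed location of the select handler; adjust for your own setup.
SOLR_SELECT = "http://localhost:8983/solr/select"


def build_select_url(user_query, doc_type=None, rows=10):
    """Build a select URL, optionally restricted to one data set.

    `doc_type` is a hypothetical discriminator field stored on every
    document (for example "product" or "extra").
    """
    params = [("q", user_query), ("rows", str(rows)), ("wt", "json")]
    if doc_type is not None:
        # A filter query narrows the result set without affecting scoring.
        params.append(("fq", f"doc_type:{doc_type}"))
    return SOLR_SELECT + "?" + urlencode(params)


both_sets = build_select_url("beach hotel")
products_only = build_select_url("beach hotel", doc_type="product")

# Parse the URLs back to check which parameters were sent.
both_params = parse_qs(urlsplit(both_sets).query)
product_params = parse_qs(urlsplit(products_only).query)
```

Leaving the filter off searches both data sets in one request, which is the case where a single index has the clearest advantage.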