Posted to solr-user@lucene.apache.org by Licinio Fernández Maurelo <li...@gmail.com> on 2009/08/19 11:02:01 UTC

Adding cores dynamically

Hi there,

currently we want to add cores dynamically when the active one reaches
some capacity. Can anyone give me some hints on how to achieve this
functionality? (Just wondering if you have used shell scripting or
written a 100% Java-based solution.)

Thx


-- 
Lici

Re: Adding cores dynamically

Posted by Licinio Fernández Maurelo <li...@gmail.com>.
These are the reasons why we are thinking of splitting an index via multi-core:

First of all, we have a news index whose size is about 9G. As we will
keep aggregating news forever and let users do free-text search on our
system, we think it will be easier for the IT crowd to manage
fixed-size (read-only) indexes, giving flexibility to the platform
(I'm wondering how much performance we will lose if the read-only
indexes live on NFS).

Secondly, we plan to store date ranges per core; then, when a
federated search is made, we filter the cores to query on (we plan to
install multiple Solr servers as the info grows).
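
For illustration, the filtered federated query could use Solr's
distributed search shards parameter (the host names, core names and
date scheme below are made up): if a search only covers July and
August 2009, we would send it to one core and list only the matching
cores as shards, e.g. (wrapped for readability)

  http://host1:8983/solr/news-2009-07/select?q=some+query
      &shards=host1:8983/solr/news-2009-07,host2:8983/solr/news-2009-08

Cores whose date range falls outside the query would simply be left
out of the shards list.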

2009/8/26 Chris Hostetter <ho...@fucit.org>:
>
> : 1) We found the indexing speed starts dipping once the index grows to a
> : certain size - in our case around 50G. We don't optimize, but we have
> : to maintain a consistent indexing speed. The only way we could do that
> : was to keep creating new cores (on the same box, though we do use
>
> Hmmm... it seems like ConcurrentMergeScheduler should make it possible to
> maintain semi-constant indexing speed by doing merges in background
> threads ... the only other issue would be making sure that an individual
> segment never got too big ... but that seems like it should be manageable
> with the config options
>
> (i'm just hypothesizing, i don't normally worry about indexes of this
> size, and when i do i'm not incrementally adding to them as time goes on
> ... i guess what i'm asking is if you guys ever looked into these ideas
> and dismissed them for some reason)
>
> : 2) Be able to drop the whole core for pruning purposes. We didn't want
>
> that makes a lot of sense ... removing older cores is one of the only
> reasons i could think of for this model to really make a lot of sense for
> performance reasons.
>
> : > One problem is the IT logistics of handling the file set. At 200 million
> : > records you have at least 20G of data in one Lucene index. It takes hours to
> : > optimize this, and 10s of minutes to copy the optimized index around to
> : > query servers.
>
> i get that full optimizes become ridiculous at that point, but you could
> still do partial optimizes ... and isn't the total disk space with this
> strategy still the same?  Aren't you still ultimately copying the same
> amount of data around?
>
>
>
> -Hoss
>
>



-- 
Lici

Re: Adding cores dynamically

Posted by Chris Hostetter <ho...@fucit.org>.
: 1) We found the indexing speed starts dipping once the index grows to a
: certain size - in our case around 50G. We don't optimize, but we have
: to maintain a consistent indexing speed. The only way we could do that
: was to keep creating new cores (on the same box, though we do use

Hmmm... it seems like ConcurrentMergeScheduler should make it possible to 
maintain semi-constant indexing speed by doing merges in background 
threads ... the only other issue would be making sure that an individual 
segment never got too big ... but that seems like it should be manageable
with the config options 
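
for example, something along these lines in the <indexDefaults> section
of solrconfig.xml (an untested sketch -- the exact element syntax varies
a bit between Solr versions, and the 10 million doc cap is just a
made-up number):

  <mergeScheduler>org.apache.lucene.index.ConcurrentMergeScheduler</mergeScheduler>
  <mergeFactor>10</mergeFactor>
  <!-- cap how many docs any single merged segment can hold -->
  <maxMergeDocs>10000000</maxMergeDocs>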

(i'm just hypothesizing, i don't normally worry about indexes of this
size, and when i do i'm not incrementally adding to them as time goes on
... i guess what i'm asking is if you guys ever looked into these ideas
and dismissed them for some reason)

: 2) Be able to drop the whole core for pruning purposes. We didn't want

that makes a lot of sense ... removing older cores is one of the only
reasons i could think of for this model to really make a lot of sense for
performance reasons.

: > One problem is the IT logistics of handling the file set. At 200 million
: > records you have at least 20G of data in one Lucene index. It takes hours to
: > optimize this, and 10s of minutes to copy the optimized index around to
: > query servers.

i get that full optimizes become ridiculous at that point, but you could 
still do partial optimizes ... and isn't the total disk space with this 
strategy still the same?  Aren't you still ultimately copying the same 
amount of data around?



-Hoss


Re: Adding cores dynamically

Posted by vivek sar <vi...@gmail.com>.
There were two main reasons we went with multi-core solution,

1) We found the indexing speed starts dipping once the index grows to a
certain size - in our case around 50G. We don't optimize, but we have
to maintain a consistent indexing speed. The only way we could do that
was to keep creating new cores (on the same box, though we do use
multiple boxes to scale horizontally as well) once it reaches its
capacity. The old core is not written to again once it reaches its
capacity.

2) Be able to drop the whole core for pruning purposes. We didn't want
to delete records from the index, so the best solution was to simply
delete the complete core directory (we do maintain the time period for
each core), which is much faster and easier to maintain.
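
A minimal sketch of what dropping a core can look like (the host, core
name and path are made-up examples, and the recursive delete assumes
commons-io is on the classpath): first a CoreAdmin UNLOAD call so Solr
stops serving the core, then a recursive directory delete.

  import java.io.File;
  import java.net.URL;
  import org.apache.commons.io.FileUtils;

  public class DropCore {
      public static void main(String[] args) throws Exception {
          // 1) tell Solr to stop serving the old core (CoreAdmin UNLOAD)
          new URL("http://localhost:8983/solr/admin/cores"
                  + "?action=UNLOAD&core=core-2009-07")
                  .openStream().close();

          // 2) remove the core's directory on disk to reclaim the space
          FileUtils.deleteDirectory(new File("/data/solr/cores/core-2009-07"));
      }
  }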

So far things have been working fine. I'm not sure if there is any
inherent problem with this architecture given the above limitations
and requirements.

-vivek

On Tue, Aug 25, 2009 at 10:57 AM, Lance Norskog<go...@gmail.com> wrote:
> One problem is the IT logistics of handling the file set. At 200 million
> records you have at least 20G of data in one Lucene index. It takes hours to
> optimize this, and tens of minutes to copy the optimized index around to
> query servers.
> Another problem is that indexing speed drops off after the index reaches a
> certain size. When making multiple indexes, you want to stop indexing
> into each one before it reaches that size.
> Lance
>
> On Tue, Aug 25, 2009 at 10:44 AM, Chris Hostetter
> <ho...@fucit.org>wrote:
>
>>
>> :   We're doing a similar thing with multi-core - when a core reaches
>> : capacity (in our case 200 million records) we start a new core. We are
>> : doing this via a web service call (the Create web service),
>>
>> this whole thread perplexes me ... while i can understand not wanting to
>> let an index grow without bound because of hardware limitations, i don't
>> understand what value you are gaining by creating a new core on the same
>> box -- you're using the same physical resources to search the same number
>> of documents, making multiple cores for this actually seems like it would
>> take up *more* resources to search the same amount of content, because the
>> individual cores will be isolated and the term dictionaries can't be
>> shared (not to mention you have to do a multi-shard query to get results
>> from all the cores)
>>
>> are you doing something special with the old cores vs the new ones? (ie:
>> create the new cores on new machines, shut down cores after a certain
>> amount of time has expired, etc...)
>>
>>
>> : > Hi there,
>> : >
>> : > currently we want to add cores dynamically when the active one reaches
>> : > some capacity. Can anyone give me some hints on how to achieve this
>> : > functionality? (Just wondering if you have used shell scripting or
>> : > written a 100% Java-based solution.)
>> : >
>> : > Thx
>> : >
>> : >
>> : > --
>> : > Lici
>> : >
>> :
>>
>>
>>
>> -Hoss
>>
>>
>
>
> --
> Lance Norskog
> goksron@gmail.com
>

Re: Adding cores dynamically

Posted by Lance Norskog <go...@gmail.com>.
One problem is the IT logistics of handling the file set. At 200 million
records you have at least 20G of data in one Lucene index. It takes hours to
optimize this, and tens of minutes to copy the optimized index around to
query servers.
Another problem is that indexing speed drops off after the index reaches a
certain size. When making multiple indexes, you want to stop indexing
into each one before it reaches that size.
Lance

On Tue, Aug 25, 2009 at 10:44 AM, Chris Hostetter
<ho...@fucit.org>wrote:

>
> :   We're doing a similar thing with multi-core - when a core reaches
> : capacity (in our case 200 million records) we start a new core. We are
> : doing this via a web service call (the Create web service),
>
> this whole thread perplexes me ... while i can understand not wanting to
> let an index grow without bound because of hardware limitations, i don't
> understand what value you are gaining by creating a new core on the same
> box -- you're using the same physical resources to search the same number
> of documents, making multiple cores for this actually seems like it would
> take up *more* resources to search the same amount of content, because the
> individual cores will be isolated and the term dictionaries can't be
> shared (not to mention you have to do a multi-shard query to get results
> from all the cores)
>
> are you doing something special with the old cores vs the new ones? (ie:
> create the new cores on new machines, shut down cores after a certain
> amount of time has expired, etc...)
>
>
> : > Hi there,
> : >
> : > currently we want to add cores dynamically when the active one reaches
> : > some capacity. Can anyone give me some hints on how to achieve this
> : > functionality? (Just wondering if you have used shell scripting or
> : > written a 100% Java-based solution.)
> : >
> : > Thx
> : >
> : >
> : > --
> : > Lici
> : >
> :
>
>
>
> -Hoss
>
>


-- 
Lance Norskog
goksron@gmail.com

Re: Adding cores dynamically

Posted by Chris Hostetter <ho...@fucit.org>.
:   We're doing a similar thing with multi-core - when a core reaches
: capacity (in our case 200 million records) we start a new core. We are
: doing this via a web service call (the Create web service),

this whole thread perplexes me ... while i can understand not wanting to 
let an index grow without bound because of hardware limitations, i don't
understand what value you are gaining by creating a new core on the same 
box -- you're using the same physical resources to search the same number 
of documents, making multiple cores for this actually seems like it would
take up *more* resources to search the same amount of content, because the 
individual cores will be isolated and the term dictionaries can't be 
shared (not to mention you have to do a multi-shard query to get results 
from all the cores)

are you doing something special with the old cores vs the new ones? (ie: 
create the new cores on new machines, shut down cores after a certain
amount of time has expired, etc...)


: > Hi there,
: >
: > currently we want to add cores dynamically when the active one reaches
: > some capacity. Can anyone give me some hints on how to achieve this
: > functionality? (Just wondering if you have used shell scripting or
: > written a 100% Java-based solution.)
: >
: > Thx
: >
: >
: > --
: > Lici
: >
: 



-Hoss


Re: Adding cores dynamically

Posted by vivek sar <vi...@gmail.com>.
Lici,

  We're doing a similar thing with multi-core - when a core reaches
capacity (in our case 200 million records) we start a new core. We are
doing this via a web service call (the Create web service):

  http://wiki.apache.org/solr/CoreAdmin

This is all done in Java code - before writing we check the number of
records in the core - if it has reached its capacity we create a new
core and then index there.
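
A rough SolrJ sketch of that check-and-create step (the core names,
URL and capacity constant below are illustrative, not our real values):

  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.SolrServer;
  import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
  import org.apache.solr.client.solrj.request.CoreAdminRequest;

  public class CoreRoller {
      static final long CAPACITY = 200L * 1000 * 1000; // 200 million docs

      // returns the name of the core to index into, creating a new
      // core once the current one has reached its capacity
      static String activeCore(String current) throws Exception {
          SolrServer core = new CommonsHttpSolrServer(
                  "http://localhost:8983/solr/" + current);
          SolrQuery q = new SolrQuery("*:*");
          q.setRows(0); // we only need the count, not the documents
          long numDocs = core.query(q).getResults().getNumFound();
          if (numDocs < CAPACITY) {
              return current;
          }
          // issues /solr/admin/cores?action=CREATE&name=...&instanceDir=...
          String next = "core-" + System.currentTimeMillis();
          SolrServer admin = new CommonsHttpSolrServer(
                  "http://localhost:8983/solr");
          CoreAdminRequest.createCore(next, next, admin);
          return next;
      }
  }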

-vivek



2009/8/19 Licinio Fernández Maurelo <li...@gmail.com>:
> Hi there,
>
> currently we want to add cores dynamically when the active one reaches
> some capacity. Can anyone give me some hints on how to achieve this
> functionality? (Just wondering if you have used shell scripting or
> written a 100% Java-based solution.)
>
> Thx
>
>
> --
> Lici
>