Posted to solr-dev@lucene.apache.org by Chris Hostetter <ho...@fucit.org> on 2007/12/11 07:47:01 UTC

purpose of MultiCore "default" ?

Forgive me if i'm off base with some stuff here ... i'm still trying to 
wrap my head around some of the new multicore stuff.

Ryan's comments in SOLR-428 have made me realize that the "default" core 
means more than i thought.  I had misunderstood it to be a way of 
specifying what the "legacy" singleton core should be ... but based on 
SOLR-428 I'm now getting the sense that the default core identifies what 
core to use if no core is specified in the URL, so if this is your 
multicore.xml...

   <multicore adminPath="/admin/multicore" persistent="true">
     <core name="core0" instanceDir="core0" default="true"/>
     <core name="core1" instanceDir="core1" />
   </multicore>

...then these two URLs are equivalent, correct?

    http://localhost:8983/solr/@core0/select?q=*:*
    http://localhost:8983/solr/select?q=*:*
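
A rough sketch of the resolution rule being asked about, not Solr's actual dispatch code. The helper name resolve_core and its return shape are made up for illustration; the only behavior assumed is the one described above: a leading "@name" path segment selects a core, and anything else falls through to the default.

```python
def resolve_core(path, default_core):
    """Return (core_name, handler_path) for a request path below /solr.

    Hypothetical helper: a leading "@name" segment picks the core
    explicitly; any other path is served by the default core.
    """
    trimmed = path.lstrip("/")
    first, _, rest = trimmed.partition("/")
    if first.startswith("@"):
        # Explicit core: strip the "@" and keep the remainder as the handler.
        return first[1:], "/" + rest
    # No core in the path: the default core handles the whole path.
    return default_core, "/" + trimmed


# With core0 as the default, both URL forms land on the same core and handler.
explicit = resolve_core("/@core0/select", default_core="core0")
implicit = resolve_core("/select", default_core="core0")
```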

If i may ask: what is the motivation for this?  isn't it fair to assume 
that if people want to use multiple cores they can include the core name 
in every URL?

The one use case i can think of is that based on the "SETASDEFAULT" option 
of the MultiCoreHandler i suspect people want to do stuff like this...


   1. start up server with a single "core0" as default
   2. use default URLs all day long...
        GET http://localhost:8983/solr/select?q=bar
        POST http://localhost:8983/solr/update ...
        GET http://localhost:8983/solr/select?q=foo
   3. decide you want to change the schema or something,
      load a new "core1"
   4. rebuild your index using "core1" urls...
        POST http://localhost:8983/solr/@core1/update ...
   5. once you're happy with "core1", set it as the default,
      and unload core0...
        GET http://localhost:8983/solr/admin/multicore?action=SETASDEFAULT&core=core1
        GET http://localhost:8983/solr/admin/multicore?action=UNLOAD&core=core0
   6. keep using core1 just like you used to use core0 with
      default urls...
        GET http://localhost:8983/solr/select?q=bar
        POST http://localhost:8983/solr/update ...
        GET http://localhost:8983/solr/select?q=foo
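
The same sequence as a toy in-memory model, just to make the state changes explicit. ToyMultiCore and its methods are invented for this sketch; refusing to unload the current default is an assumption of the sketch, not something the MultiCoreHandler is known to do.

```python
class ToyMultiCore:
    """Toy stand-in for a multicore container with one default core."""

    def __init__(self, default):
        self.cores = {default}
        self.default = default

    def load(self, name):
        self.cores.add(name)

    def set_as_default(self, name):
        if name not in self.cores:
            raise KeyError(name)
        self.default = name

    def unload(self, name):
        # Assumption: the default core must be switched before unloading.
        if name == self.default:
            raise ValueError("switch the default before unloading " + name)
        self.cores.remove(name)

    def core_for(self, explicit=None):
        """Core that serves a request: explicit "@name" or the default."""
        name = self.default if explicit is None else explicit
        if name not in self.cores:
            raise KeyError(name)
        return name


mc = ToyMultiCore("core0")      # step 1
before = mc.core_for()          # step 2: default URLs hit core0
mc.load("core1")                # step 3
rebuild = mc.core_for("core1")  # step 4: explicit URLs hit core1
mc.set_as_default("core1")      # step 5
mc.unload("core0")
after = mc.core_for()           # step 6: default URLs now hit core1
```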

...this seems like a really cool use case of multicores, but it also seems 
like it is incompatible with the primary goal of multicores: having lots 
of different indexes; after all: there's only one default, so you can only 
use this trick with one of your indexes.

It seems like if this is the only "perk" of having a "default" core, it 
would make more sense to require a core name in every url (when multicore 
support is turned on) and replace the SETASDEFAULT operation with a 
RENAME operation that changes the name of a core (unloading any previous 
core that was using that name) ... or maybe even support multiple names 
per core, with some ADDNAME, REMOVENAME, and MOVENAME options...

  1 /admin/multicore?action=ADDNAME&coreDir=cores/dir0&name=yak
  2 /@yak/select?q=*:*
  3 /admin/multicore?action=ADDNAME&coreDir=cores/dir1&name=foo
  4 /@foo/select?q=*:*
  5 /admin/multicore?action=ADDNAME&coreDir=cores/dir1&name=bar
  6 /@bar/select?q=*:*
      (#4 and #6 are now equivalent)
  7 /admin/multicore?action=REMOVENAME&coreDir=cores/dir1&name=foo
      (now #4 no longer works)
  8 /admin/multicore?action=MOVENAME&coreDir=cores/dir0&name=bar
      (now #2 and #6 are equivalent)
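
A minimal sketch of the name bookkeeping those eight steps imply, assuming names map to a coreDir and nothing else. NameRegistry is hypothetical, and rejecting an ADDNAME for a name already in use is a choice made for this sketch; the proposal above does not say what should happen in that case.

```python
class NameRegistry:
    """Maps core names to the coreDir they currently point at."""

    def __init__(self):
        self._dir_by_name = {}

    def add_name(self, core_dir, name):
        # Assumption: ADDNAME refuses to steal a name that is in use.
        if name in self._dir_by_name:
            raise ValueError("name already in use: " + name)
        self._dir_by_name[name] = core_dir

    def remove_name(self, core_dir, name):
        if self._dir_by_name.get(name) != core_dir:
            raise KeyError(name)
        del self._dir_by_name[name]

    def move_name(self, core_dir, name):
        # MOVENAME repoints an existing name at a different coreDir.
        if name not in self._dir_by_name:
            raise KeyError(name)
        self._dir_by_name[name] = core_dir

    def resolve(self, name):
        """coreDir for a name, or None if the name is not registered."""
        return self._dir_by_name.get(name)


reg = NameRegistry()
reg.add_name("cores/dir0", "yak")     # 1
reg.add_name("cores/dir1", "foo")     # 3
reg.add_name("cores/dir1", "bar")     # 5
same_4_6 = reg.resolve("foo") == reg.resolve("bar")
reg.remove_name("cores/dir1", "foo")  # 7
foo_gone = reg.resolve("foo") is None
reg.move_name("cores/dir0", "bar")    # 8
same_2_6 = reg.resolve("yak") == reg.resolve("bar")
```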

thoughts?

-Hoss


Re: purpose of MultiCore "default" ?

Posted by John Reuning <jr...@lulu.com>.
I have embedded solr wrapper code that does exactly this, with some
minor modifications to MultiCore.  The one missing piece is that for the
duration of step #3 below, adds and deletes to mainCore are queued for
tempCore and applied when the reindex operation is complete.

-jrr

On Wed, 2007-12-19 at 18:01 -0500, Ryan McKinley wrote:

> 1. public queries "mainCore"
> 2. LOAD("tempCore") with same configs as "mainCore"
> 3. send all <add> commands to "tempCore"
> 5. SWAP("mainCore","tempCore") -- a synchronized name swap
> 6. UNLOAD("tempCore")



Re: purpose of MultiCore "default" ?

Posted by Chris Hostetter <ho...@fucit.org>.
: That leaves one purpose of MultiCore "default" -- there may be a better
: solution for it:  the "default" core is used to set the web-app wide variables
: configured in solrconfig.xml:
:  requestDispatcher/requestParsers/@multipartUploadLimitInKB
:  requestDispatcher/requestParsers/@enableRemoteStreaming
:  requestDispatcher/@handleSelect
:  abortOnConfigurationError

these seem like things that should live in a file that only exists once 
per Solr instance.  right now that's multicore.xml (but i'm starting to 
think it should have a more generic name).

semantics can be: if you want to override the hardcoded defaults for these 
options, and you have a multicore.xml, you *must* put the settings there 
-- otherwise you *may* specify them in solrconfig.xml.
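
Those semantics as a small sketch. effective_settings is a made-up function, the settings are passed as plain dicts, and silently ignoring solrconfig.xml values when a multicore.xml exists (rather than raising an error) is an assumption of this sketch.

```python
def effective_settings(hardcoded, multicore_xml=None, solrconfig_xml=None):
    """Resolve the webapp-wide options.

    multicore_xml / solrconfig_xml are dicts of overrides, or None when
    the corresponding file does not exist.
    """
    settings = dict(hardcoded)
    if multicore_xml is not None:
        # A multicore.xml exists: overrides *must* come from it, so any
        # values in solrconfig.xml are not consulted (assumed: ignored).
        settings.update(multicore_xml)
    elif solrconfig_xml is not None:
        # Single-core setup: solrconfig.xml *may* override the defaults.
        settings.update(solrconfig_xml)
    return settings


# Placeholder default, not Solr's real hardcoded value.
defaults = {"handleSelect": False}
multi = effective_settings(defaults, multicore_xml={},
                           solrconfig_xml={"handleSelect": True})
single = effective_settings(defaults, solrconfig_xml={"handleSelect": True})
```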





-Hoss


Re: purpose of MultiCore "default" ?

Posted by Ryan McKinley <ry...@gmail.com>.
Henrib wrote:
> 
> Indeed, I reworded (plagiarized) your original proposal; sorry it took me a
> full thread loop to grasp it & realize that...
> 
> About "comfort", it feels like having one name and multiple aliases per core
> would be "easier" to work with than using a path-based identification; since
> the path is dependent on the deployment host (path can even be dependent on
> an environment variable), using a logical name would preserve more
> genericity (replication for instance).
> 
> On that premise, there are a few restrictions that {sh,c}ould apply:
> 0 - Name and aliases reside in a common identifier space; one identifier
> uniquely determines a core (can't have the identifier 'core' used as a name
> to point to coreA and as an alias to point to coreB)
> 1 - One core has one unique immutable name
> 2 - One core may have many aliases
> 3 - There are only 2 admin commands related to aliases:
>    3.1 - alias(core, alias): adds an alias to a core, overriding any
> existing alias but fails to override a core name.
>    3.2 - unalias(str); if str is a core name identifier, all its aliases get
> deleted, if str is an alias identifier only that alias gets deleted.
> 4 - Core addressing through URLs/API can use either name or alias (although
> using alias is best practice for common -aka non-admin- operations)
> 
> Would this still fit the bill? Any obvious (or not so) show-stopper?
> I'll try to post a prototype later to see how it goes.
> 

implementation wise, things get a little dicey when we allow multiple 
names for a core.

Going back to the problem default/alias is trying to solve:  I want to 
"re-index" a core and have clients continue to use it transparently.  As 
mentioned, RENAME does not work because you lose access to a core that 
needs to be unloaded, but we could have a command for SWAP (or something 
like that).  This way when you want to reload a core you:

1. public queries "mainCore"
2. LOAD("tempCore") with same configs as "mainCore"
3. send all <add> commands to "tempCore"
5. SWAP("mainCore","tempCore") -- a synchronized name swap
6. UNLOAD("tempCore")

this would avoid having to manage/serialize multiple names.
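
A sketch of what a synchronized name swap could look like, assuming the registry is a single name-to-core map guarded by one lock. CoreRegistry is hypothetical and the cores are stand-in strings.

```python
import threading


class CoreRegistry:
    """Name -> core map where SWAP exchanges two names atomically."""

    def __init__(self):
        self._lock = threading.Lock()
        self._cores = {}

    def load(self, name, core):
        with self._lock:
            self._cores[name] = core

    def get(self, name):
        with self._lock:
            return self._cores[name]

    def swap(self, a, b):
        # Both lookups happen before either assignment, so a missing name
        # raises KeyError and leaves the registry unchanged. Readers never
        # see a state where only one of the two names has moved.
        with self._lock:
            self._cores[a], self._cores[b] = self._cores[b], self._cores[a]

    def unload(self, name):
        with self._lock:
            return self._cores.pop(name)


reg = CoreRegistry()
reg.load("mainCore", "old index")   # 1
reg.load("tempCore", "new index")   # 2, then 3: fill it with <add>s
reg.swap("mainCore", "tempCore")    # 5
retired = reg.unload("tempCore")    # 6: the old index is still reachable
```

Unlike a plain RENAME, the old core keeps a name ("tempCore") after the swap, so it can still be unloaded.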

That leaves one purpose of MultiCore "default" -- there may be a better 
solution for it:  the "default" core is used to set the web-app wide 
variables configured in solrconfig.xml:
  requestDispatcher/requestParsers/@multipartUploadLimitInKB
  requestDispatcher/requestParsers/@enableRemoteStreaming
  requestDispatcher/@handleSelect
  abortOnConfigurationError

ryan

Re: purpose of MultiCore "default" ?

Posted by Henrib <hb...@gmail.com>.

Indeed, I reworded (plagiarized) your original proposal; sorry it took me a
full thread loop to grasp it & realize that...

About "comfort", it feels like having one name and multiple aliases per core
would be "easier" to work with than using a path-based identification; since
the path is dependent on the deployment host (path can even be dependent on
an environment variable), using a logical name would preserve more
genericity (replication for instance).

On that premise, there are a few restrictions that {sh,c}ould apply:
0 - Name and aliases reside in a common identifier space; one identifier
uniquely determines a core (can't have the identifier 'core' used as a name
to point to coreA and as an alias to point to coreB)
1 - One core has one unique immutable name
2 - One core may have many aliases
3 - There are only 2 admin commands related to aliases:
   3.1 - alias(core, alias): adds an alias to a core, overriding any
existing alias but fails to override a core name.
   3.2 - unalias(str); if str is a core name identifier, all its aliases get
deleted, if str is an alias identifier only that alias gets deleted.
4 - Core addressing through URLs/API can use either name or alias (although
using alias is best practice for common -aka non-admin- operations)
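
Not the prototype mentioned below, only a quick sketch of rules 0 to 4 to check that they hang together. AliasRegistry, add_core and resolve are names invented here; alias and unalias follow 3.1 and 3.2.

```python
class AliasRegistry:
    """Immutable core names plus re-pointable aliases, one shared namespace."""

    def __init__(self):
        self._names = set()
        self._core_by_alias = {}

    def add_core(self, name):
        # Rule 0: an identifier is either a name or an alias, never both.
        if name in self._names or name in self._core_by_alias:
            raise ValueError("identifier already in use: " + name)
        self._names.add(name)

    def alias(self, core, alias):
        if core not in self._names:
            raise KeyError(core)
        # Rule 3.1: overrides an existing alias, never a core name.
        if alias in self._names:
            raise ValueError("cannot override core name: " + alias)
        self._core_by_alias[alias] = core

    def unalias(self, ident):
        # Rule 3.2: a core name drops all its aliases, an alias drops itself.
        if ident in self._names:
            stale = [a for a, c in self._core_by_alias.items() if c == ident]
            for a in stale:
                del self._core_by_alias[a]
        else:
            del self._core_by_alias[ident]

    def resolve(self, ident):
        # Rule 4: addressing works through either a name or an alias.
        if ident in self._names:
            return ident
        return self._core_by_alias[ident]


reg = AliasRegistry()
reg.add_core("coreA")
reg.add_core("coreB")
reg.alias("coreA", "live")
first = reg.resolve("live")
reg.alias("coreB", "live")    # re-pointing an alias is allowed
second = reg.resolve("live")
```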

Would this still fit the bill? Any obvious (or not so) show-stopper?
I'll try to post a prototype later to see how it goes.

-- 
View this message in context: http://www.nabble.com/purpose-of-MultiCore-%22default%22---tp14268755p14413886.html
Sent from the Solr - Dev mailing list archive at Nabble.com.


Re: purpose of MultiCore "default" ?

Posted by Chris Hostetter <ho...@fucit.org>.
: May be a bit late and not strictly about the Multicore "default" discussion
: but trying to aggregate the ideas;
: What about an "alias/unalias" feature that would allow managing multiple
: aliases (at least one) for each core?

this is pretty much exactly what i was suggesting in the first message in 
this thread, with the suggested ADDNAME, REMOVENAME and MOVENAME options 
... where the only thing that uniquely identifies a core is the solr home 
dir of its config files ... cores could have many names/aliases which 
could be changed at any moment to point at other cores.

if people don't like the idea of the config dir path being the only thing 
that uniquely ids a core, then we could easily say every core has one and 
only one "name" that can't be changed, but every core can have many 
aliases; that's fine too ... my worry is that we'd wind up wanting to 
use names and aliases pretty interchangeably when doing queries and 
updates, but then people might get confused when they want to "move" a 
name from one core to another and it's not allowed but it is okay to move 
an alias.




-Hoss


Re: purpose of MultiCore "default" ?

Posted by Henrib <hb...@gmail.com>.

May be a bit late and not strictly about the Multicore "default" discussion
but trying to aggregate the ideas;
What about an "alias/unalias" feature that would allow managing multiple
aliases (at least one) for each core?
In the 'multiple index versions' scenario where someone would like to
reindex the whole collection because some structural changes are needed in the
index (analysis chain update for instance), this would allow clean swapping.
Say 'articles,3' is my current index & collection with an alias of
'articles' so every client accesses it through
'http://host:port/solr/articles/'. I can create a new core named
'articles,4'  with my new index definition (index4), reindex the whole
collection and then override the alias for 'articles' when that core is
ready.
This does not preclude using the multicore feature to other ends (multiple
un-related indexes in the same webapp - that could still benefit from the
reindexing scenario btw).
Not sure if we really need multiple aliases capability; one usage is to
allow development, staging/QA & production profiles using the same URLs that
would be re-aliased on need so one can develop high level regression tests
(ensuring some typical queries do get the proper results; dev aliases the core
as stage for QA to be performed, QA aliases that core as production when
checked).
Is this something we could try or are there obvious issues with it?



Re: purpose of MultiCore "default" ?

Posted by Chris Hostetter <ho...@fucit.org>.
: > ...this seems like a really cool use case of multicores, but it also seems
: > like it is incompatible with the primary goal of multicores: having lots of
: > different indexes; after all: there's only one default, so you can only use
: > this trick with one of your indexes.

: I have seen the primary goal of multicore as allowing you to modify / reload a
: core at runtime.  The static singletons prohibited any changes after startup.

riiiiight.  okay, let's call the "second" goal of multicore support: 
"support more than one (disparate) index in the same servlet context" 
since people seem to want that.   the point is having a default core makes 
those two use cases disjoint (and people using the second use case can't 
also do the "rebuild on the backend" type approach of the primary use case) 
... hence my suggestion that there be no default core, every core have a 
name, and it be possible to move names around.


: I think we should keep LOAD,UNLOAD,RELOAD and then add "RENAME"

What do you suggest the semantics of RENAME are?  If i 
LOAD(mainCore), then LOAD(onDeckCore) and then RENAME(onDeckCore => 
mainCore) how do i access the old mainCore to UNLOAD it?

(this is why i suggested ADDNAME, REMOVENAME, and MOVENAME ... the idea 
being that each core can have more than one name ... so you can still 
refer to a core even after you give its "main" name to someone else.)

I was suggesting that those operations all be applied to "coreDir" ... the 
one thing about each core that can't be changed, so there's never any doubt 
about what core you're dealing with, but they could just as easily be applied to 
existing names of cores (except that you have to define what happens if 
you remove/rename the "last" name a core has)
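
To make the RENAME question concrete, here is a toy version where each core has exactly one name. The rename function and the dict-of-strings model are purely illustrative; they show one possible reading of RENAME, not a proposed implementation.

```python
def rename(cores, old_name, new_name):
    """Toy RENAME: the core at old_name takes over new_name.

    Returns the core that previously held new_name (or None). After the
    call that core has no name left in the registry.
    """
    if old_name not in cores:
        raise KeyError(old_name)
    displaced = cores.pop(new_name, None)
    cores[new_name] = cores.pop(old_name)
    return displaced


cores = {"mainCore": "old index", "onDeckCore": "new index"}
displaced = rename(cores, "onDeckCore", "mainCore")
# "old index" is now only reachable through the return value, not by name,
# so there is nothing to pass to UNLOAD.
orphaned = displaced not in cores.values()
```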

: I am fine with this.  The only problem I have is that solrj clients would need
: to be configured with the solr url AND the default core name rather than just
: the solr url.

why?  if every core must have a name, and every url has a core name at the 
"root" of the path, then wouldn't you just configure the client with the 
"url prefix" of "http://host:port/context/core/"





-Hoss


Re: purpose of MultiCore "default" ?

Posted by Ryan McKinley <ry...@gmail.com>.
Chris Hostetter wrote:
> 
> Forgive me if i'm off base with some stuff here ... i'm still trying to 
> wrap my head around some of the new multicore stuff.
> 

No forgiveness here!  The more comments/questions/clarification, the 
better for everyone.  Especially for something as substantial as this.


> 
>   1. start up server with a single "core0" as default
>   2. use default URLs all day long...
>        GET http://localhost:8983/solr/select?q=bar
>        POST http://localhost:8983/solr/update ...
>        GET http://localhost:8983/solr/select?q=foo
>   3. decide you want to change the schema or something,
>      load a new "core1"
>   4. rebuild your index using "core1" urls...
>        POST http://localhost:8983/solr/@core1/update ...
>   5. once you're happy with "core1", set it as the default,
>      and unload core0...
>        GET http://localhost:8983/solr/admin/multicore?action=SETASDEFAULT&core=core1
>        GET http://localhost:8983/solr/admin/multicore?action=UNLOAD&core=core0
>   6. keep using core1 just like you used to use core0 with
>      default urls...
>        GET http://localhost:8983/solr/select?q=bar
>        POST http://localhost:8983/solr/update ...
>        GET http://localhost:8983/solr/select?q=foo
>

Yes, this is exactly what I had in mind.  I want to periodically rebuild 
an index without downtime and without mucking with a load balancer.


> ...this seems like a really cool use case of multicores, but it also 
> seems like it is incompatible with the primary goal of multicores: 
> having lots of different indexes; after all: there's only one default, so 
> you can only use this trick with one of your indexes.
> 

I have seen the primary goal of multicore as allowing you to modify / 
reload a core at runtime.  The static singletons prohibited any changes 
after startup.



> It seems like if this is the only "perk" of having a "default" core, it 
> would make more sense to require a core name in every url (when 
> multicore support is turned on) and replace the SETASDEFAULT operation 
> with a RENAME operation that changes the name of a core (unloading any 
> previous core that was using that name) ... or maybe even support 
> multiple names per core, with some ADDNAME, REMOVENAME, and MOVENAME 
> options...
> 
>  1 /admin/multicore?action=ADDNAME&coreDir=cores/dir0&name=yak
>  2 /@yak/select?q=*:*
>  3 /admin/multicore?action=ADDNAME&coreDir=cores/dir1&name=foo
>  4 /@foo/select?q=*:*
>  5 /admin/multicore?action=ADDNAME&coreDir=cores/dir1&name=bar
>  6 /@bar/select?q=*:*
>      (#4 and #6 are now equivalent)
>  7 /admin/multicore?action=REMOVENAME&coreDir=cores/dir1&name=foo
>      (now #4 no longer works)
>  8 /admin/multicore?action=MOVENAME&coreDir=cores/dir0&name=bar
>      (now #2 and #6 are equivalent)
> 
> thoughts?
> 

I think we should keep LOAD,UNLOAD,RELOAD and then add "RENAME"

I am fine with this.  The only problem I have is that solrj clients 
would need to be configured with the solr url AND the default core name 
rather than just the solr url.

This would make the multi-core URL rule:
http://host/context/corename/handler?params

The only naming constraint is that "corename" cannot contain '/' 
(handler can contain '/')
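
A sketch of how that rule splits a URL, assuming the webapp context is "/solr". split_core_url is a hypothetical helper; it only relies on the first path segment after the context being the core name and the rest being the handler.

```python
from urllib.parse import urlsplit


def split_core_url(url, context="/solr"):
    """Split http://host/context/corename/handler?params into its parts.

    Returns (corename, handler, query). The core name is the first path
    segment after the context, so it cannot contain '/'; the handler is
    everything after it and may contain '/'.
    """
    parts = urlsplit(url)
    prefix = context + "/"
    if not parts.path.startswith(prefix):
        raise ValueError("URL is not under " + context)
    corename, sep, handler = parts.path[len(prefix):].partition("/")
    if not corename or not sep:
        raise ValueError("URL must name both a core and a handler")
    return corename, "/" + handler, parts.query


simple = split_core_url("http://host/solr/core0/select?q=*:*")
nested = split_core_url("http://host/solr/core1/admin/ping")
```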

ryan