Posted to solr-user@lucene.apache.org by Blargy <zm...@hotmail.com> on 2010/03/28 20:54:22 UTC

Multicore process

I was hoping someone could explain to me how your Solr multicore process
currently operates.

This is what I am thinking about and I was hoping I could get some
ideas/suggestions. 

I have a master/slave setup where the master will be doing all the indexing
via DIH. I'll be doing a full-import every day or two, with delta-imports
being run throughout the day. I want to have an offline core
that will be responsible for the full-import; when it finishes, it
will be swapped with the live core. While the full-import may take a few
hours on the offline core, I'll have delta-imports running on the live core.
All slaves will be replicating from the master's live core. Any comments on
this logic?

Ok, now to the implementation. I've been playing around with the core admin
all day today but I'm still unsure of the best way to accomplish the above
process. I'm guessing first I need to create a new core. Then I'll have to
issue a DIH full-import against this new core. Then I'll run a swap command
against the offline and live cores, which should switch the cores. This sounds
about right, but then I'll have a core named live which will not actually be
live anymore, right? Is there any way around this?
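
The steps described above (full-import into the offline core, wait for it to finish, then swap) map onto plain HTTP requests. The sketch below only builds the URLs and sends nothing. It assumes Solr at localhost:8983, cores named items-live and items-offline as in later messages of this thread, DIH registered at /dataimport in each core, and the CoreAdmin handler at /admin/cores. Treat it as an illustration under those assumptions, not a tested deployment script.

```python
from urllib.parse import urlencode

# Assumed base URL; host and port follow the examples used in this thread.
SOLR_BASE = "http://localhost:8983/solr"


def full_import_url(core, base=SOLR_BASE):
    """URL asking the DataImportHandler of `core` to start a full-import."""
    # Assumes DIH is registered at /dataimport in that core's solrconfig.xml.
    return f"{base}/{core}/dataimport?" + urlencode({"command": "full-import"})


def import_status_url(core, base=SOLR_BASE):
    """URL reporting whether the import on `core` is still running."""
    return f"{base}/{core}/dataimport?" + urlencode({"command": "status"})


def swap_url(core, other, base=SOLR_BASE, admin_path="/admin/cores"):
    """CoreAdmin URL that swaps the names of two registered cores."""
    params = urlencode({"action": "SWAP", "core": core, "other": other})
    return f"{base}{admin_path}?{params}"


if __name__ == "__main__":
    # Order of operations: import offline, poll status, then swap.
    print(full_import_url("items-offline"))
    print(import_status_url("items-offline"))
    print(swap_url("items-live", "items-offline"))
```

The swap request would only be sent once the status request shows that the import is no longer running.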

When setting up the new core what should I use for my instanceDir and
dataDir? At first I had something like this

home/items/data/live/index
home/items/data/offline/index

but I don't think this is right. Should I have something like this?

home/items/data/index
home/items-offline/data/index

When creating a new core from an existing core do the index files get
copied? 

Can someone please explain to me this whole process. Thanks!



-- 
View this message in context: http://n3.nabble.com/Multicore-process-tp681929p681929.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Multicore process

Posted by Blargy <zm...@hotmail.com>.

Mark Miller-3 wrote:
> 
> Hmmm...but isn't your slave on a different machine? Every install is
> going to need a solr.xml, no way around that..
> 

Of course it's on another machine. I was just hoping to have only 1 version
of solr.xml checked into our source control and to choose which
configuration to use by passing some sort of Java property on the command
line. Like I said, it's no real problem... I'm just getting picky now ;) I'll
just have to make sure that during the deploy the correct configuration
gets copied to home/solr.xml
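
One way to handle that deploy step is a small script that copies the right variant into place. This is a minimal sketch: the file names solr-master.xml and solr-slave.xml and the directory layout are hypothetical, chosen only for illustration.

```python
import shutil
from pathlib import Path


def install_solr_xml(role, config_dir, solr_home):
    """Copy the role-specific solr.xml variant into the Solr home directory.

    The source names solr-master.xml and solr-slave.xml are hypothetical.
    """
    if role not in ("master", "slave"):
        raise ValueError(f"unknown role: {role!r}")
    source = Path(config_dir) / f"solr-{role}.xml"
    target = Path(solr_home) / "solr.xml"
    shutil.copyfile(source, target)  # overwrites any existing solr.xml
    return target
```

Both variants stay in source control and the deploy passes the role, for example install_solr_xml("slave", "conf", "home").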

Thanks again!


-- 
View this message in context: http://n3.nabble.com/Multicore-process-tp681929p682225.html

Re: Multicore process

Posted by Mark Miller <ma...@gmail.com>.
On 03/28/2010 05:43 PM, Blargy wrote:
> Thanks, that makes perfect sense for solrconfig.xml; however, I don't see that
> sort of functionality for solr.xml.
>
> I'm guessing I'll need to manage 2 different versions of solr.xml
>
> Version 1 master
> <solr persistent="true" sharedLib="lib">
>    <cores adminPath="/admin/cores" shareSchema="true">
>      <core name="items-live" instanceDir="items" default="true"
> dataDir="data/core0"/>
>      <core name="items-offline" instanceDir="items" default="true"
> dataDir="data/core1"/>
>    </cores>
> </solr>
>
> Version 2 slave
> <solr persistent="false" sharedLib="lib">
>    <cores adminPath="/admin/cores" shareSchema="true">
>      <core name="items" instanceDir="items" default="true"/>
>    </cores>
> </solr>
>
> And my app will always be pointing to http://slave-host:8983/solr/items
>
> This isn't the biggest deal, but if there is a better/alternative way I would
> love to know.
>    

Hmmm...but isn't your slave on a different machine? Every install is 
going to need a solr.xml, no way around that (other than removing the 
solr.xml and doing all multicore stuff programmatically :) ).

> Mark, I see you work for LucidImagination. Does the Lucid Solr distribution
> happen to come with the SOLR-236 patch (Field Collapsing)? I know it has some
> extras thrown in there, but I'm not quite sure of their exact nature. I'm
> already using the LucidKStemmer ;)
>    

No, no Field Collapsing in the Lucid Dist - it will make it into Solr 
eventually though.

-- 
- Mark

http://www.lucidimagination.com




Re: Multicore process

Posted by Blargy <zm...@hotmail.com>.
Thanks, that makes perfect sense for solrconfig.xml; however, I don't see that
sort of functionality for solr.xml.

I'm guessing I'll need to manage 2 different versions of solr.xml

Version 1 master
<solr persistent="true" sharedLib="lib">
  <cores adminPath="/admin/cores" shareSchema="true">
    <core name="items-live" instanceDir="items" default="true"
          dataDir="data/core0"/>
    <core name="items-offline" instanceDir="items" default="true"
          dataDir="data/core1"/>
  </cores>
</solr>

Version 2 slave
<solr persistent="false" sharedLib="lib">
  <cores adminPath="/admin/cores" shareSchema="true">
    <core name="items" instanceDir="items" default="true"/>
  </cores>
</solr>

And my app will always be pointing to http://slave-host:8983/solr/items

This isn't the biggest deal, but if there is a better/alternative way I would
love to know.

Mark, I see you work for LucidImagination. Does the Lucid Solr distribution
happen to come with the SOLR-236 patch (Field Collapsing)? I know it has some
extras thrown in there, but I'm not quite sure of their exact nature. I'm
already using the LucidKStemmer ;)
-- 
View this message in context: http://n3.nabble.com/Multicore-process-tp681929p682205.html

Re: Multicore process

Posted by Mark Miller <ma...@gmail.com>.
On 03/28/2010 05:14 PM, Blargy wrote:
> Nice. Almost there...
>
> So it appears then that I will need two different solr.xml configurations.
> One for the master defining core0 and core1 and one for the slave with the
> default configuration. Is there any way to specify master/slave-specific
> settings in solr.xml, or will I have to have 2 different versions?
>
> Not as big of a deal but in the future when I have more than 1 type of
> document (currently "items") how would I configure solrconfig.xml for
> replication? For example I have this as of now:
>
> <str name="masterUrl">
>   http://localhost:8983/solr/items-live/replication
> </str>
>
> Which is fine... but what happens when I have another object say "users"
>
> <str name="masterUrl">
>   http://localhost:8983/solr/users-live/replication
> </str>
>
> I guess when it comes down to that I will have to have 2 different versions
> of solrconfig.xml too?
>
> ps. I can't thank you enough for your time
>    
Right - two different solrconfig.xml files, or use XInclude to factor out 
the common parts into a third file, so that the two keep only their 
unique pieces.

http://wiki.apache.org/solr/SolrConfigXml?highlight=%28xinclude%29#XInclude

-- 
- Mark

http://www.lucidimagination.com




Re: Multicore process

Posted by Blargy <zm...@hotmail.com>.
Nice. Almost there...

So it appears then that I will need two different solr.xml configurations.
One for the master defining core0 and core1 and one for the slave with the
default configuration. Is there any way to specify master/slave-specific
settings in solr.xml, or will I have to have 2 different versions?

Not as big of a deal but in the future when I have more than 1 type of
document (currently "items") how would I configure solrconfig.xml for
replication? For example I have this as of now:

<str name="masterUrl">
 http://localhost:8983/solr/items-live/replication
</str>

Which is fine... but what happens when I have another object say "users"

<str name="masterUrl">
 http://localhost:8983/solr/users-live/replication
</str>

I guess when it comes down to that I will have to have 2 different versions
of solrconfig.xml too?

ps. I can't thank you enough for your time
-- 
View this message in context: http://n3.nabble.com/Multicore-process-tp681929p682176.html

Re: Multicore process

Posted by Mark Miller <ma...@gmail.com>.
On 03/28/2010 04:49 PM, Blargy wrote:
> I just thought about this...
>
> I'm guessing my slaves should always be replicating from the "live" master
> core: (http://localhost:8983/solr/items-live/replication).
>
> So my master solr will have a directory structure like this:
>
> home/items/data/core0/index
> home/items/data/core1/index
>
> and at any point the "live" core could be physically located at core0 or
> core1
>
> Whereas my slave solr will have a directory structure like this:
> home/items/data/index
>
> Is this close?
>
>
>
>    

Yes, exactly.

-- 
- Mark

http://www.lucidimagination.com




Re: Multicore process

Posted by Blargy <zm...@hotmail.com>.
I just thought about this...

I'm guessing my slaves should always be replicating from the "live" master
core: (http://localhost:8983/solr/items-live/replication). 
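
Since the slaves only ever follow the live core, the masterUrl can be derived from the document type alone. The helper below is a small sketch that assumes the "<type>-live" core naming used in this thread. The default host of localhost mirrors the URL above; on a real slave it would presumably be the master's hostname.

```python
def master_url(doc_type, host="localhost", port=8983):
    """Replication URL of the live core for one document type."""
    # Assumes live cores are named "<type>-live", as in this thread.
    return f"http://{host}:{port}/solr/{doc_type}-live/replication"


def master_urls(doc_types, host="localhost", port=8983):
    """Map each document type to the replication URL of its live core."""
    return {doc_type: master_url(doc_type, host, port) for doc_type in doc_types}


if __name__ == "__main__":
    for doc_type, url in master_urls(["items", "users"]).items():
        print(doc_type, url)
```

The offline core never appears in any of these URLs, so it is never replicated.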

So my master solr will have a directory structure like this:

home/items/data/core0/index
home/items/data/core1/index

and at any point the "live" core could be physically located at core0 or
core1

Whereas my slave solr will have a directory structure like this:
home/items/data/index

Is this close?



-- 
View this message in context: http://n3.nabble.com/Multicore-process-tp681929p682149.html

Re: Multicore process

Posted by Blargy <zm...@hotmail.com>.
Ok great... it's starting to make sense. Now I'm just a little confused about
replication.

I previously had my slave configuration as follows:

  <requestHandler name="/replication" class="solr.ReplicationHandler" >
    <lst name="${replication.master}">
      <str name="replicateAfter">commit</str>
      <str name="replicateAfter">startup</str>
      <str name="confFiles">schema.xml,stopwords.txt</str>
    </lst>
    <lst name="${replication.slave}">
      <str name="masterUrl">http://${replication.host}:8983/solr/${solr.core.instanceDir}replication</str>
      <str name="pollInterval">${replication.interval}</str>
    </lst>
  </requestHandler>

But I'm assuming I'll need to change this now? I really only want my "live"
data to be replicated, so how can I configure this? There is no real need for
the slaves to replicate the "offline" data.

FYI my dir structure looks like this:

home/items/data/core0/index
home/items/data/core1/index

-- 
View this message in context: http://n3.nabble.com/Multicore-process-tp681929p682141.html

Re: Multicore process

Posted by Mark Miller <ma...@gmail.com>.
Right - I'd just have the data dir be generic (like core0, core1 as you 
have in example 2) and then the names will be live and offline and flip 
back and forth between the core0, core1 dirs.


On 03/28/2010 04:06 PM, Blargy wrote:
> Mark, first off thanks for the response. I'm glad someone is around today ;)
>
> So this is what I have so far:
>
> <solr persistent="true" sharedLib="lib">
>    <cores adminPath="/admin/cores" shareSchema="true">
>      <core name="items-live" instanceDir="items" default="true"
> dataDir="data/live"/>
>      <core name="items-offline" instanceDir="items" default="true"
> dataDir="data/offline"/>
>    </cores>
> </solr>
>
> So my directory structure is:
>
> home/items/data/live/index
> home/items/data/offline/index
>
> So after playing around I see that swap literally just swaps the dataDir in
> solr.xml. I have persistent = true so it saves which core is pointing to
> which dataDir. So where I think I am a little confused is the naming
> convention I used above. In this type of setup there is no such thing as a
> live or offline dataDir as at any point they can be one or the other... the
> core name is what really matters. So I'm guessing this naming convention
> makes a little more sense
>
> <solr persistent="true" sharedLib="lib">
>    <cores adminPath="/admin/cores" shareSchema="true">
>      <core name="items-live" instanceDir="items" default="true"
> dataDir="data/core0"/>
>      <core name="items-offline" instanceDir="items" default="true"
> dataDir="data/core1"/>
>    </cores>
> </solr>
>
> Since the actual dataDir name really doesn't mean anything. Is this the
> correct reasoning?
>    


-- 
- Mark

http://www.lucidimagination.com




Re: Multicore process

Posted by Blargy <zm...@hotmail.com>.
Mark, first off thanks for the response. I'm glad someone is around today ;)

So this is what I have so far:

<solr persistent="true" sharedLib="lib">
  <cores adminPath="/admin/cores" shareSchema="true">
    <core name="items-live" instanceDir="items" default="true"
          dataDir="data/live"/>
    <core name="items-offline" instanceDir="items" default="true"
          dataDir="data/offline"/>
  </cores>
</solr>

So my directory structure is:

home/items/data/live/index
home/items/data/offline/index

So after playing around I see that swap literally just swaps the dataDir in
solr.xml. I have persistent = true, so it saves which core is pointing to
which dataDir. Where I think I am a little confused is the naming
convention I used above. In this type of setup there is no such thing as a
live or offline dataDir, as at any point they can be one or the other... the
core name is what really matters. So I'm guessing this naming convention
makes a little more sense

<solr persistent="true" sharedLib="lib">
  <cores adminPath="/admin/cores" shareSchema="true">
    <core name="items-live" instanceDir="items" default="true"
          dataDir="data/core0"/>
    <core name="items-offline" instanceDir="items" default="true"
          dataDir="data/core1"/>
  </cores>
</solr>

Since the actual dataDir name really doesn't mean anything. Is this the
correct reasoning?
-- 
View this message in context: http://n3.nabble.com/Multicore-process-tp681929p682088.html

Re: Multicore process

Posted by Mark Miller <ma...@gmail.com>.
On 03/28/2010 02:54 PM, Blargy wrote:
>
>  I was hoping someone could explain to me how your Solr multicore
>  process currently operates.
>
>  This is what I am thinking about and I was hoping I could get some
>  ideas/suggestions.
>
>  I have a master/slave setup where the master will be doing all the
>  indexing via DIH. I'll be doing a full-import every day or two with
>  delta-imports being run throughout the day. I want to be able to
>  have an offline core that will be responsible for the
>  full-importing and when finished it will be swapped with the live
>  core. While the full-import may take a few hours on the offline core
>  I'll have delta-imports running on the live core. All slaves will be
>  replicating from the master live core. Any comments on this logic?

What's the purpose of the full import if you will also be doing delta 
imports? Won't the live core end up the same as the offline core that 
got the full import? I'm sure you have a reason, just not following...

>
>  Ok, now to the implementation. I've been playing around with the core
>  admin all day today but I'm still unsure on the best way to accomplish
>  the above process. I'm guessing first I need to create a new core.
>  Then I'll have to issue a DIH full-import against this new core. Then
>  I'll run a swap command against offline and live cores which should
>  switch the cores. This sounds about right but then I'll have a core
>  named live which will not actually be live anymore right? Is there
>  any way around this?

Hmm...this is not really true. The core that is accessed by hitting 
/live will always be the live core (though the underlying SolrCore 
object will change) if that is the access path you use for live traffic 
- see below.

>
>  When setting up the new core what should I use for my instanceDir
>  and dataDir? At first I had something like this
>
>  home/items/data/live/index
>  home/items/data/offline/index
>
>  but I don't think this is right. Should I have something like this?
>
>  home/items/data/index
>  home/items-offline/data/index

Yes - like this - the index dir under the data dir. But you should only 
create the data dir yourself - the core will create the index dir when it 
does not see one. You will have issues if you make an empty index dir: 
seeing the dir, the core won't create it, and so the index will never get 
created inside the dir.
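
A setup script can follow that advice by creating only the data dir and flagging an empty, pre-made index dir. This is a minimal sketch of that check; the function name and the choice to report rather than delete are illustrative assumptions, not part of Solr.

```python
from pathlib import Path


def prepare_data_dir(data_dir):
    """Create the data directory and report whether a core can start on it.

    Returns False when an empty index/ directory already exists, which is
    the situation described above as problematic. Nothing is deleted.
    """
    data = Path(data_dir)
    data.mkdir(parents=True, exist_ok=True)  # create data/ only, never data/index
    index = data / "index"
    if index.is_dir() and not any(index.iterdir()):
        return False
    return True
```

A caller would run something like prepare_data_dir("home/items/data/core0") before creating the core and stop with a message when it returns False.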

>
>  When creating a new core from an existing core do the index files
>  get copied?

I'm not sure what you mean here? I'm guessing the swap command as you 
reference above?

Swap will simply change what path references which core. So to start, 
localhost:8983/solr/live will hit one core, and 
localhost:8983/solr/offline will hit another core. You will direct all 
traffic to /live. Once you do the swap(live,offline), the live URL will 
actually hit the other core, and the offline URL will hit the previously 
live core. So there is no move or copy of files - it simply swaps which 
name accesses which core. Same thing if you are using solrj - it just 
changes which access name brings back a given underlying core.
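
To make the "no move or copy" point concrete, here is a toy model of a name-to-core mapping. It is only an illustration of the idea and not Solr's actual implementation; the class and method names are made up, and each core is represented by nothing more than its data dir string.

```python
class CoreRegistry:
    """Toy model of a name -> core mapping (not Solr's real code)."""

    def __init__(self):
        self._cores = {}

    def register(self, name, data_dir):
        """Make `name` resolve to the core stored in `data_dir`."""
        self._cores[name] = data_dir

    def lookup(self, name):
        """Return the data dir that requests to /solr/<name> would hit."""
        return self._cores[name]

    def swap(self, name, other):
        """Exchange which core each name points at; no files are touched."""
        self._cores[name], self._cores[other] = (
            self._cores[other],
            self._cores[name],
        )


if __name__ == "__main__":
    registry = CoreRegistry()
    registry.register("live", "data/core0")
    registry.register("offline", "data/core1")
    registry.swap("live", "offline")
    print(registry.lookup("live"), registry.lookup("offline"))
```

After the swap, clients that keep requesting "live" reach the freshly imported index without changing their URL.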

>
>  Can someone please explain to me this whole process. Thanks!
>
>
>


-- 
- Mark

http://www.lucidimagination.com




Re: Multicore process

Posted by Mark Miller <ma...@gmail.com>.
On 03/28/2010 02:58 PM, Blargy wrote:
>
>  Also, how do I share the same schema and config files?

In solr.xml you can specify the schema and config files (schema.xml and 
solrconfig.xml) for each core - just specify the same ones for each core. 
If you are creating cores dynamically, you can still do this. You 
probably want to use the shareSchema option.
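
For the dynamic case, the CoreAdmin CREATE action accepts config and schema parameters, so every new core can name the same files. The sketch below only builds such a URL. The host, port, admin path and core names are assumptions taken from examples elsewhere in this thread.

```python
from urllib.parse import urlencode


def create_core_url(name, instance_dir, data_dir,
                    config="solrconfig.xml", schema="schema.xml",
                    base="http://localhost:8983/solr",
                    admin_path="/admin/cores"):
    """CoreAdmin URL that creates a core using the given config and schema."""
    params = urlencode({
        "action": "CREATE",
        "name": name,
        "instanceDir": instance_dir,
        "config": config,    # same file name for every core
        "schema": schema,    # same file name for every core
        "dataDir": data_dir,
    })
    return f"{base}{admin_path}?{params}"


if __name__ == "__main__":
    print(create_core_url("items-offline", "items", "data/core1"))
```

Because both cores use the same instanceDir, they resolve those file names to the same conf directory.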

http://wiki.apache.org/solr/CoreAdmin

-- 
- Mark

http://www.lucidimagination.com




Re: Multicore process

Posted by Blargy <zm...@hotmail.com>.
Also, how do I share the same schema and config files?
-- 
View this message in context: http://n3.nabble.com/Multicore-process-tp681929p681936.html