Posted to solr-dev@lucene.apache.org by Mark Bennett <mb...@ideaeng.com> on 2009/06/01 19:33:50 UTC

Complete discussion of Solr "cores" ? (beyond wiki?)

I've been trying to create some small command line tools to exercise
different parts of the Solr system.

In doing so I've been trying to load minimal cores and have had some limited
success.  For example, I've had some luck with CoreContainer and
CoreDescriptor, getting a single core instance, etc.

However, even after re-reading the wiki entries, I'm still not clear what is
meant by "core".  The info I find talks about administering multiple cores,
which is nice, but doesn't really explain concisely what the core "is"?
Presumably the base set of "stuff" that Solr needs to run.

But, for example, is an instance directory with a valid solrconfig.xml and
schema.xml a valid singular core?  Or do you then need to register a core
with an arbitrary name?  For example, getCoreNames() gives an empty array,
even after it's loaded a valid instance dir / config / schema.

And does a core REQUIRE a process to come up on a port, even for an instant,
or can you do some quick tests from the command line with a "static" core
and not need to bind to a port?

And can search operations be done "statically", without the use of TCP/IP
port, like Lucene can?  Do cores support this?

And when moving from a single to multi core, how much extra configuration is
needed?  On the one extreme, do multiple cores require a completely separate
instance dir, data dir and solrconfig?  Or on the other end of the spectrum,
maybe it's just a second process with a different "core" name, maybe on a
secondary port?  I realize these must sound very "newb" to you hard core
Solr guys.

Then there are the deprecated methods for instantiating cores.  How "bad" is
it to use those?  And why were they changed?

These are the types of things I'm a bit confused on.  Is there an article or
presentation that goes over some of this?

Thanks all,
Mark

--
Mark Bennett / New Idea Engineering, Inc. / mbennett@ideaeng.com

Re: Complete discussion of Solr "cores" ? (beyond wiki?)

Posted by Chris Hostetter <ho...@fucit.org>.
: And to be clear, is an Embedded Core an object that does not need to bring
: up a TCP/IP port?  It can be used with just static or instance calls,
: without bringing up a listener?

correct -- when you "Embed" solr into an app, you are instantiating the
solr internals in your app, and dealing with it programmatically via Java
calls.  When you run the solr application, it's a war that runs in an
application server, and the application server listens on a TCP/IP port
and "proxies" HTTP requests to those same internal APIs.

If you use SolrJ, your app can transparently deal with "Solr" without
knowing whether Solr is embedded or remote.
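
A rough sketch of the distinction described above, in plain JDK Java so it
runs without Solr on the classpath. Everything named Toy* is a hypothetical
stand-in invented for illustration, not a Solr or SolrJ class: ToyCore plays
the internal APIs, the JDK's built-in HttpServer plays the application
server, and ToyClient plays the role SolrJ plays for the calling app. Real
Solr differs in every detail; only the shape is the point.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.net.URL;
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Scanner;

// Hypothetical stand-in for the "internal APIs"; NOT a Solr class.
final class ToyCore {
    String search(String q) {
        return "results for [" + q + "]";
    }
}

// What the application codes against (the role SolrJ plays in the thread).
interface ToyClient {
    String query(String q) throws IOException;
}

// Embedded: a plain Java call into the core; no listener, no port.
final class EmbeddedToyClient implements ToyClient {
    private final ToyCore core;
    EmbeddedToyClient(ToyCore core) { this.core = core; }
    @Override public String query(String q) { return core.search(q); }
}

// Remote: sends an HTTP request to whatever is listening at baseUrl.
final class HttpToyClient implements ToyClient {
    private final String baseUrl;
    HttpToyClient(String baseUrl) { this.baseUrl = baseUrl; }
    @Override public String query(String q) throws IOException {
        URL url = new URL(baseUrl + "/select?q=" + URLEncoder.encode(q, "UTF-8"));
        try (InputStream in = url.openStream();
             Scanner body = new Scanner(in, "UTF-8").useDelimiter("\\A")) {
            return body.hasNext() ? body.next() : "";
        }
    }
}

final class EmbeddedVsRemoteDemo {
    // Plays the application server: listens on a port and forwards each
    // request to the same ToyCore the embedded client calls directly.
    static HttpServer serve(ToyCore core) throws IOException {
        // Port 0 asks the OS for any free port on the loopback interface.
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/select", exchange -> {
            String raw = exchange.getRequestURI().getRawQuery(); // e.g. "q=solr"
            String q = raw == null
                ? ""
                : URLDecoder.decode(raw.substring(raw.indexOf('=') + 1), "UTF-8");
            byte[] body = core.search(q).getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        return server;
    }

    // Runs the same query through both clients; returns {embedded, remote}.
    static String[] run(String q) {
        ToyCore core = new ToyCore();
        try {
            HttpServer server = serve(core);
            try {
                ToyClient embedded = new EmbeddedToyClient(core);
                ToyClient remote = new HttpToyClient(
                    "http://127.0.0.1:" + server.getAddress().getPort());
                return new String[] { embedded.query(q), remote.query(q) };
            } finally {
                server.stop(0);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String[] answers = run("solr");
        System.out.println("embedded: " + answers[0]);
        System.out.println("remote:   " + answers[1]);
    }
}
```

The embedded client never opens a socket; the remote one gets the same
answer only because the listener forwards to the same object.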



-Hoss


Re: Complete discussion of Solr "cores" ? (beyond wiki?)

Posted by Mark Bennett <mb...@ideaeng.com>.
Great info, thanks!

--
Mark Bennett / New Idea Engineering, Inc. / mbennett@ideaeng.com
Direct: 408-733-0387 / Main: 866-IDEA-ENG / Cell: 408-829-6513


On Tue, Jun 30, 2009 at 11:06 AM, Chris Hostetter
<ho...@fucit.org> wrote:

>
> : Date: Mon, 1 Jun 2009 10:33:50 -0700
> : From: Mark Bennett
> : Subject: Complete discussion of Solr "cores" ? (beyond wiki?)
>
> Mark: catching up on my mail, i don't see much discussion around your
> various questions.
>
> : However, even after re-reading the wiki entries, I'm still not clear
> : what is meant by "core".  The info I find talks about administering
> : multiple cores,
>
> I've attempted to clarify that a bit by adding an entry to the terminology
> page...
>
> http://wiki.apache.org/solr/SolrTerminology
>

Ah, good info!  I've also added a quick blurb to the top of:
http://wiki.apache.org/solr/CoreAdmin
Links frequently point there, but the page had no intro.
And added a link from there to:
http://wiki.apache.org/solr/MultipleIndexes
which also has some info.


>
> >> Solr Core: Also referred to as just a "Core". This is a running instance
> >> of a Solr index along with all of its configuration (SolrConfigXml,
> >> SchemaXml, etc...). A single Solr application can contain 0 or more
> >> cores which are run largely in isolation but can communicate with each
> >> other if necessary via the CoreContainer. From a historical
> >> perspective: Solr initially only supported one index, and the SolrCore
> >> class was a singleton for coordinating the low level functionality at
> >> the "core" of Solr. When support was added for creating and managing
> >> multiple Cores on the fly, the class was refactored to no longer be a
> >> Singleton, but the name stuck.
>
> : But, for example, is an instance directory with a valid solrconfig.xml
> : and schema.xml a valid singular core?  Or do you then need to register
> : a core
>
> An instanceDir is necessary to create a SolrCore, but a SolrCore is a
> running java object that manages access to an index (via request handlers,
> and field types etc...) based on those configs.
>
> : with an arbitrary name?  For example, getCoreNames() gives an empty
> : array, even after it's loaded a valid instance dir / config / schema.
>
> ...hmmm. can you give us a more concrete example of how you are seeing
> this.  (code, configs, etc...).  That may be the expected situation when
> running in the legacy "single core" mode (ie: no solr.xml file) but i'm
> not 100% certain.


Let me dig that up.


> : And does a core REQUIRE a process to come up on a port, even for an
> : instant, or can you do some quick tests from the command line with a
> : "static" core and not need to bind to a port?
>
> : And can search operations be done "statically", without the use of TCP/IP
> : port, like Lucene can?  Do cores support this?
>
> a Solr Core is really a java run time concept .. typically Solr Cores
> exist in the Solr app -- which is a webapp, living in a servlet container,
> running on a port -- but any java program can use a CoreContainer to bring
> up a SolrCore if it wants to (this would be known as Embedded Solr)


Somebody (Otis?) had said those were deprecated, though I don't recall why.

And to be clear, is an Embedded Core an object that does not need to bring
up a TCP/IP port?  It can be used with just static or instance calls,
without bringing up a listener?


> : And when moving from a single to multi core, how much extra
> : configuration is needed?  On the one extreme, do multiple cores require
> : a completely separate instance dir, data dir and solrconfig?  Or on the
> : other end of the spectrum,
>
> again it depends on whether you are talking about the Solr application, or
> about embedding Solr in another application.  In the Solr app you need a
> solr.xml file containing info about the multiple cores you want to run, or
> at least indicating that you want to run multiple cores and then you can
> create them at run time using the CoreAdmin commands.
>
> if you embed solr then you can declare cores programmatically in java.
>
> as for whether you need separate instanceDir, data dir, and solrconfig ...
> you definitely need separate datadirs, but you can get away with reusing
> the same config files if you actually want the SolrCores to all have
> identical configs -- if you want them to be different, they need to be
> different (obviously)
>

Thanks Hoss!

>
>
> -Hoss
>
>

Re: Complete discussion of Solr "cores" ? (beyond wiki?)

Posted by Chris Hostetter <ho...@fucit.org>.
: I trimmed down the code and tested with a fresh install.
: 
: Started with -Dsolr.solr.home set to apache-solr-nightly/example/solr
: 
: It starts up, but then container.getCores() and container.getCoreNames()
: give back zero element collections.

I would start a new thread on solr-user asking about using solr in an 
embedded context and what you might be doing wrong.   i'm pretty sure the 
SolrJ docs/unit tests have some good examples of doing something like 
this.




-Hoss


Re: Complete discussion of Solr "cores" ? (beyond wiki?)

Posted by Mark Bennett <mb...@ideaeng.com>.
Hi Hoss,

I trimmed down the code and tested with a fresh install.

Started with -Dsolr.solr.home set to apache-solr-nightly/example/solr

It starts up, but then container.getCores() and container.getCoreNames()
give back zero element collections.

When run:
... lots of init ....
Testing Cores ...
Have 0 cores.
Have 0 names.

// InitCoresMin.java
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.index.Payload;
import org.apache.solr.analysis.TokenFilterFactory;
import org.apache.solr.analysis.TokenizerChain;
import org.apache.solr.analysis.TokenizerFactory;
import org.apache.solr.schema.FieldType;
import org.apache.solr.schema.SchemaField;
import org.apache.solr.common.util.XML;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.*;
import java.math.BigInteger;
import java.io.File;

public class InitCoresMin
{
    public static void main( String [] argv )
    {
        // From SolrContainer
        String instanceDir =
            org.apache.solr.core.SolrResourceLoader.locateInstanceDir();
        System.out.println( "instanceDir='" + instanceDir + "'" );

        try {
            org.apache.solr.core.SolrConfig cfg =
                new org.apache.solr.core.SolrConfig();
            System.out.println( "Using config '" + cfg.getName() + "'" );

            org.apache.solr.core.CoreContainer container =
                new org.apache.solr.core.CoreContainer(
                    new org.apache.solr.core.SolrResourceLoader(instanceDir)
                );

            System.out.println();
            System.out.println( "Testing Cores ..." );

            Collection cores = container.getCores();
            System.out.println( "Have " + cores.size() + " cores." );

            Collection names = container.getCoreNames();
            System.out.println( "Have " + names.size() + " names." );
        }
        catch( Exception e )
        {
            System.err.println( "Exception: " + e );
            e.printStackTrace( System.err );
        }
    } // end of main
} // end of class




Re: Complete discussion of Solr "cores" ? (beyond wiki?)

Posted by Chris Hostetter <ho...@fucit.org>.
: Date: Mon, 1 Jun 2009 10:33:50 -0700
: From: Mark Bennett
: Subject: Complete discussion of Solr "cores" ? (beyond wiki?)

Mark: catching up on my mail, i don't see much discussion around your
various questions.

: However, even after re-reading the wiki entries, I'm still not clear what is
: meant by "core".  The info I find talks about administering multiple cores,

I've attempted to clarify that a bit by adding an entry to the terminology 
page...

http://wiki.apache.org/solr/SolrTerminology

>> Solr Core: Also referred to as just a "Core". This is a running instance
>> of a Solr index along with all of its configuration (SolrConfigXml,
>> SchemaXml, etc...). A single Solr application can contain 0 or more 
>> cores which are run largely in isolation but can communicate with each 
>> other if necessary via the CoreContainer. From a historical 
>> perspective: Solr initially only supported one index, and the SolrCore 
>> class was a singleton for coordinating the low level functionality at 
>> the "core" of Solr. When support was added for creating and managing 
>> multiple Cores on the fly, the class was refactored to no longer be a 
>> Singleton, but the name stuck. 

: But, for example, is an instance directory with a valid solrconfig.xml and
: schema.xml a valid singular core?  Or do you then need to register a core

An instanceDir is necessary to create a SolrCore, but a SolrCore is a
running java object that manages access to an index (via request handlers, 
and field types etc...) based on those configs.

: with an arbitrary name?  For example, getCoreNames() gives an empty array,
: even after it's loaded a valid instance dir / config / schema.

...hmmm. can you give us a more concrete example of how you are seeing 
this.  (code, configs, etc...).  That may be the expected situation when 
running in the legacy "single core" mode (ie: no solr.xml file) but i'm 
not 100% certain.

: And does a core REQUIRE a process to come up on a port, even for an instant,
: or can you do some quick tests from the command line with a "static" core
: and not need to bind to a port?

: And can search operations be done "statically", without the use of TCP/IP
: port, like Lucene can?  Do cores support this?

a Solr Core is really a java run time concept .. typically Solr Cores
exist in the Solr app -- which is a webapp, living in a servlet container,
running on a port -- but any java program can use a CoreContainer to bring
up a SolrCore if it wants to (this would be known as Embedded Solr)

: And when moving from a single to multi core, how much extra configuration is
: needed?  On the one extreme, do multiple cores require a completely separate
: instance dir, data dir and solrconfig?  Or on the other end of the spectrum,

again it depends on whether you are talking about the Solr application, or
about embedding Solr in another application.  In the Solr app you need a
solr.xml file containing info about the multiple cores you want to run, or
at least indicating that you want to run multiple cores and then you can
create them at run time using the CoreAdmin commands.

if you embed solr then you can declare cores programmatically in java.

as for whether you need separate instanceDir, data dir, and solrconfig ...
you definitely need separate datadirs, but you can get away with reusing
the same config files if you actually want the SolrCores to all have
identical configs -- if you want them to be different, they need to be
different (obviously)
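
As an illustration of the solr.xml mentioned above, here is a sketch that
assumes the multicore format described on the CoreAdmin wiki page for the
Solr 1.3/1.4 line. The core names and directories are placeholders, not
anything from this thread.

```xml
<!-- solr.xml in the Solr home directory; core names and paths are made up -->
<solr persistent="true">
  <cores adminPath="/admin/cores">
    <core name="core0" instanceDir="core0" />
    <core name="core1" instanceDir="core1" />
  </cores>
</solr>
```

With adminPath set, further cores can be added at run time with a CoreAdmin
request such as
http://localhost:8983/solr/admin/cores?action=CREATE&name=core2&instanceDir=core2
(host, port and names are again placeholders). Whether two cores can share
one instanceDir while keeping separate data directories depends on how
dataDir is configured in your version, so check that against the wiki before
relying on it.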



-Hoss