
Posted to solr-user@lucene.apache.org by Rob Veliz <ro...@mavenbridge.com> on 2013/11/07 07:38:09 UTC

Multi-core support for indexing multiple servers

Trying to find specific information to support the following scenario:

- I have one site, running on one server, with marketing content, a blog,
etc. that I want to index.
- I have another site, running Magento on a different server, with
ecommerce content (products).
- The two servers live in completely different environments.
- I would like to create a single search index spanning both sites and
make that index searchable from both sites.

I think I can/should use the multi-core approach and spin up a new server
to host Solr, but can anyone verify this is the best/most appropriate
approach?  Are there any other details I need to consider?  Can anyone
provide a step-by-step for making this happen, to validate my own technical
plan?  Any help appreciated... I was initially thinking I needed SolrCloud,
but that seems like overkill for my primary use case.

Re: Multi-core support for indexing multiple servers

Posted by Liu Bo <di...@gmail.com>.

As far as I know, Magento's DB schema is designed for extensible property
storage, and the relationships between its DB tables are fairly complex.

A product has attribute sets and properties, which are stored in different
tables. A configurable product may have different attribute values for each
of its simple sub-products.

Handling relationships like this in DIH won't be easy, especially when you
want to group the attributes of a configurable product into one document.

But if you only need to search on name and description and not on other
attributes, you can try writing a DIH config against the
catalog_product_flat_x tables; Magento may have several of them.

We used to use Lucene core to provide search on Magento products. What we
did was use the SOAP service provided by Magento to get products, and then
convert them to Lucene documents. Indexes were updated daily. This hides
lots of Magento implementation details, but it's fairly slow.
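A minimal sketch of that conversion step, targeting a Solr document (a plain dict that can be sent as JSON) rather than a raw Lucene Document, and assuming the product data has already been fetched into Python dicts. All field names here (`sku`, `name`, `children`, `attributes`, `color`, `size`) and the `magento-` id prefix are hypothetical; the idea shown is grouping the attribute values of a configurable product's simple sub-products into multi-valued fields on one document.

```python
def product_to_solr_doc(product):
    """Flatten one configurable product and its simple sub-products into a single doc."""
    doc = {
        "id": "magento-%s" % product["sku"],  # hypothetical id scheme
        "document_type": "magento",
        "name": product["name"],
        "description": product.get("description", ""),
    }
    # Gather each attribute's values across all sub-products, de-duplicated and
    # sorted, so every attribute becomes one multi-valued field on the parent doc.
    grouped = {}
    for child in product.get("children", []):
        for attr, value in child.get("attributes", {}).items():
            grouped.setdefault(attr, set()).add(value)
    for attr, values in grouped.items():
        doc[attr] = sorted(values)
    return doc


shirt = {
    "sku": "TSHIRT-1",
    "name": "Basic T-shirt",
    "children": [
        {"attributes": {"color": "red", "size": "M"}},
        {"attributes": {"color": "blue", "size": "M"}},
        {"attributes": {"color": "red", "size": "L"}},
    ],
}
doc = product_to_solr_doc(shirt)
```

Fields such as `color` and `size` would need `multiValued="true"` in schema.xml for a document like this to be accepted.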




On 12 November 2013 22:41, Robert Veliz <ro...@mavenbridge.com> wrote:

> I have two sources/servers, one of which is Magento. Since Magento has a
> more or less out-of-the-box integration with Solr, my thought was to run
> the Solr server from the Magento instance and then use DIH to get/merge
> content from the other source/server. Does that seem feasible/appropriate?
> I spec'd it out and it seems to make sense...
>
> R



-- 
All the best

Liu Bo

Re: Multi-core support for indexing multiple servers

Posted by Robert Veliz <ro...@mavenbridge.com>.

I have two sources/servers, one of which is Magento. Since Magento has a more or less out-of-the-box integration with Solr, my thought was to run the Solr server from the Magento instance and then use DIH to get/merge content from the other source/server. Does that seem feasible/appropriate?  I spec'd it out and it seems to make sense...

R

> On Nov 11, 2013, at 11:25 PM, Liu Bo <di...@gmail.com> wrote:
> 
> Like Erick said, merging data from different data sources with DIH could be
> very difficult. SolrJ is much easier to use, but you may need a separate
> application to handle the indexing process if you don't want to extend Solr
> much.
> 
> I eventually ended up with a customized request handler which uses SolrWriter
> from the DIH package to index data, so that I can fully control the indexing
> process. Much like with SolrJ, you write code to convert your data into
> SolrInputDocuments, then post them to SolrWriter, and SolrWriter handles the
> rest.

Re: Multi-core support for indexing multiple servers

Posted by Liu Bo <di...@gmail.com>.

Like Erick said, merging data from different data sources with DIH could be
very difficult. SolrJ is much easier to use, but you may need a separate
application to handle the indexing process if you don't want to extend Solr
much.

I eventually ended up with a customized request handler which uses SolrWriter
from the DIH package to index data, so that I can fully control the indexing
process. Much like with SolrJ, you write code to convert your data into
SolrInputDocuments, then post them to SolrWriter, and SolrWriter handles the
rest.
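Liu Bo's handler runs inside Solr and hands documents to DIH's SolrWriter; that Java code is not shown in the thread. As a rough client-side analogue (a swapped-in technique, not his handler), the sketch below splits already-converted documents into batches and builds one JSON update request per batch. It assumes a Solr version whose `/update` handler accepts a JSON array of documents (4.x and later do); the server URL, core name and batch size are placeholders.

```python
import json
import urllib.request


def batches(docs, size):
    """Yield successive lists of at most `size` documents."""
    for start in range(0, len(docs), size):
        yield docs[start:start + size]


def build_update_request(base_url, core, docs):
    """Build (but do not send) a JSON update request for one batch of documents."""
    url = "%s/%s/update" % (base_url.rstrip("/"), core)
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )


def push_all(base_url, core, docs, batch_size=500):
    """Send every batch; only call this when a Solr server is reachable."""
    for batch in batches(docs, batch_size):
        with urllib.request.urlopen(build_update_request(base_url, core, batch)) as resp:
            resp.read()
```

Documents only become searchable after a commit, for example via the server's autoCommit settings or an explicit commit request once the last batch is sent.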


On 8 November 2013 21:46, Erick Erickson <er...@gmail.com> wrote:

> Yep, you can define multiple data sources for use with DIH.
>
> Combining data from those multiple sources into a single
> index can be a bit tricky with DIH; personally I tend to prefer
> SolrJ, but that's mostly personal preference, especially if
> I want to get some parallelism going.
>
> But whatever works
>
> Erick
>




Re: Multi-core support for indexing multiple servers

Posted by Erick Erickson <er...@gmail.com>.

Yep, you can define multiple data sources for use with DIH.

Combining data from those multiple sources into a single
index can be a bit tricky with DIH; personally I tend to prefer
SolrJ, but that's mostly personal preference, especially if
I want to get some parallelism going.
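To illustrate the kind of parallelism Erick mentions, here is a sketch in Python rather than SolrJ. The two fetcher functions are hypothetical stand-ins for the real data sources (a database query and an HTTP API); each source is read on its own thread, and the records are tagged with their source and combined into one list ready to be pushed to Solr.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_marketing_pages():
    # Hypothetical stand-in for a query against the marketing site's database.
    return [{"id": "page-1", "title": "About us"}]


def fetch_products():
    # Hypothetical stand-in for a call to the ecommerce site's API.
    return [{"id": "sku-42", "title": "Blue mug"}]


def fetch_all(sources):
    """Run every fetcher concurrently; return all records tagged with their source name."""
    with ThreadPoolExecutor(max_workers=max(1, len(sources))) as pool:
        futures = {name: pool.submit(fn) for name, fn in sources.items()}
        combined = []
        for name, future in futures.items():
            for record in future.result():  # re-raises any exception from the fetcher
                combined.append(dict(record, document_type=name))
    return combined


records = fetch_all({"marketing": fetch_marketing_pages, "magento": fetch_products})
```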

But whatever works

Erick


On Thu, Nov 7, 2013 at 11:17 PM, manju16832003 <ma...@gmail.com> wrote:

> Erick,
> Just a question :-) wouldn't it be easy to use DIH to pull data from
> multiple data sources?
>
> I use DIH to do that comfortably. I have three data sources:
>  - MySQL
>  - a URLDataSource that returns XML from a .NET application
>  - a URLDataSource that connects to an API and returns XML
>
> Here is the data source part of my data-config.xml:
> <dataSource type="JdbcDataSource" name="solr" driver="com.mysql.jdbc.Driver"
>             url="jdbc:mysql://localhost/employeeDB" batchSize="-1"
>             user="root" password="root"/>
> <dataSource name="CRMServer" type="URLDataSource" encoding="UTF-8"
>             connectionTimeout="5000" readTimeout="10000"/>
> <dataSource name="ImageServer" type="URLDataSource" encoding="UTF-8"
>             connectionTimeout="5000" readTimeout="10000"/>
>
> Of course, I do the same in the application: to construct my results, I
> connect to MySQL and those two data sources.
>
> Basically we have two points of indexing:
>  - DIH, for one-time indexing
>  - the application, whenever there is a transaction on the details that we
>    are storing in Solr.

Re: Multi-core support for indexing multiple servers

Posted by manju16832003 <ma...@gmail.com>.

Erick,
Just a question :-) wouldn't it be easy to use DIH to pull data from
multiple data sources?

I use DIH to do that comfortably. I have three data sources:
 - MySQL
 - a URLDataSource that returns XML from a .NET application
 - a URLDataSource that connects to an API and returns XML

Here is the data source part of my data-config.xml:
<dataSource type="JdbcDataSource" name="solr" driver="com.mysql.jdbc.Driver"
            url="jdbc:mysql://localhost/employeeDB" batchSize="-1"
            user="root" password="root"/>
<dataSource name="CRMServer" type="URLDataSource" encoding="UTF-8"
            connectionTimeout="5000" readTimeout="10000"/>
<dataSource name="ImageServer" type="URLDataSource" encoding="UTF-8"
            connectionTimeout="5000" readTimeout="10000"/>

Of course, I do the same in the application: to construct my results, I
connect to MySQL and those two data sources.

Basically we have two points of indexing:
 - DIH, for one-time indexing
 - the application, whenever there is a transaction on the details that we
   are storing in Solr.
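The two indexing paths can be sketched as two HTTP requests. This assumes DIH is registered at the usual `/dataimport` path, where `command=full-import` starts an import and `clean` controls whether the index is wiped first; the per-transaction path sends one document to `/update` with the `commitWithin` parameter (milliseconds), so the change becomes visible without an explicit commit. The host, the core name `mycore` and the 10-second `commitWithin` value are assumptions.

```python
import json
import urllib.parse
import urllib.request

SOLR = "http://localhost:8983/solr/mycore"  # assumed host and core name


def full_import_url(clean=True):
    """URL that triggers a one-off DIH full import (assumes the /dataimport path)."""
    params = {"command": "full-import", "clean": "true" if clean else "false"}
    return SOLR + "/dataimport?" + urllib.parse.urlencode(params)


def transaction_update(doc, commit_within_ms=10000):
    """Build (but do not send) a request re-indexing one record after a transaction."""
    url = SOLR + "/update?" + urllib.parse.urlencode({"commitWithin": commit_within_ms})
    return urllib.request.Request(
        url,
        data=json.dumps([doc]).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```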





--
View this message in context: http://lucene.472066.n3.nabble.com/Multi-core-support-for-indexing-multiple-servers-tp4099729p4099933.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Multi-core support for indexing multiple servers

Posted by Erick Erickson <er...@gmail.com>.

Rob:

What I think you're missing is that you are responsible
for pulling the data from your separate sources and
pushing it to Solr via update commands. You can
do this with SolrJ, PHP, or any other language that has
a Solr client. You simply address your requests (both
update and query) to the right core on your central server,
e.g. http://myserver:8983/solr/core/update
      http://myserver:8983/solr/othercore/update

etc.
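The addressing scheme above, with the core name as a path segment, fits in a small helper. `myserver`, `core` and `othercore` are the placeholder names from Erick's example; `/select` is Solr's standard search handler, and `wt=json` requests a JSON response. The helper itself is hypothetical.

```python
import urllib.parse

BASE = "http://myserver:8983/solr"


def core_url(core, handler, **params):
    """Build a URL for a request handler (e.g. 'update' or 'select') on a named core."""
    url = "%s/%s/%s" % (BASE, core, handler)
    if params:
        url += "?" + urllib.parse.urlencode(params)
    return url


# Both web sites would send updates to, and run searches against, the same core.
update_url = core_url("core", "update")
search_url = core_url("core", "select", q="foo", wt="json")
```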

As far as using DIH is concerned, you have to supply
credentials, including the connection URL in the database
case.

But I think you're right: you're not using multiple cores.
You'll probably have to write something (I use SolrJ) that
can talk to your two data sources, then combine the
information into Solr documents and push them to your
Solr server.
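One way to "combine the information into Solr documents" is to map each source's own column names onto shared schema fields, tag every document with its origin, and prefix the id so records from the two sites can never collide on the uniqueKey. A sketch under those assumptions; every column and field name in `FIELD_MAPS` is hypothetical.

```python
# Hypothetical mappings from each source's column names to shared schema fields.
FIELD_MAPS = {
    "marketing": {"page_id": "source_id", "page_title": "title", "body_text": "content"},
    "magento": {"entity_id": "source_id", "name": "title", "description": "content"},
}


def to_solr_doc(source, record):
    """Rename a source record's fields to the shared schema and tag its origin."""
    mapping = FIELD_MAPS[source]
    # Keep only mapped columns; anything else in the record is ignored.
    doc = {mapping[key]: value for key, value in record.items() if key in mapping}
    doc["document_type"] = source
    # Prefix the source's own id so the two sites cannot clash on the uniqueKey field.
    doc["id"] = "%s-%s" % (source, doc.pop("source_id"))
    return doc
```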

From there, querying is usually fronted by an application,
and the indexes are entirely self-contained on the Solr
server, so no "reaching out" is necessary.

Best,
Erick


On Thu, Nov 7, 2013 at 3:50 AM, manju16832003 <ma...@gmail.com> wrote:

> Hi Rob,
> The multi-core approach is different. You could have two cores, let's say:
> marketing-core [has its own schema.xml and data-config.xml]
> magento-core [has its own schema.xml and data-config.xml]
>
> Each core has its own schema.xml and data-config.xml.
> If you go with the multi-core approach, I guess you won't be able to achieve
> what you described or what you need. You can query across two cores, but
> that is expensive and tedious.
>
> What you described, with a document type, is just a single core (a single
> index) in which you differentiate each document by its type,
>
> let's say document_type=marketing OR document_type=magento.
>
> I think you could go with a single index (single core) with document_type
> as the differentiator.
>
> Also note that if the two databases have fields in common, you don't
> need to define those fields twice; you can use the same field for both
> databases.
>
> Let's say you have a field 'title' in both the marketing database and the
> Magento database. You can define one 'title' field in schema.xml; there is
> no need to define two title fields. Also look carefully at each field's
> default value in schema.xml.
> Let's say you have some fields in the marketing database that do not exist
> in the Magento DB. When you're done indexing, fields that have no values
> will not show up in the results. If that's what you want, you don't need
> to define default="". If you want the field to appear regardless of whether
> it has data, you have to specify default="".
> Ex:
> <field name="year" type="int" indexed="true" stored="true"
>        multiValued="false" default=""/>
>
> To index the two databases together, you can try the DataImportHandler,
> which can query multiple data sources. A good thing about the
> DataImportHandler is that your data source can be a database (MySQL,
> MS-SQL, etc.), a URLDataSource, etc.
>
> Hope that is helpful

Re: Multi-core support for indexing multiple servers

Posted by manju16832003 <ma...@gmail.com>.

Hi Rob,
The multi-core approach is different. You could have two cores, let's say:
marketing-core [has its own schema.xml and data-config.xml]
magento-core [has its own schema.xml and data-config.xml]

Each core has its own schema.xml and data-config.xml.
If you go with the multi-core approach, I guess you won't be able to achieve
what you described or what you need. You can query across two cores, but
that is expensive and tedious.

What you described, with a document type, is just a single core (a single
index) in which you differentiate each document by its type,

let's say document_type=marketing OR document_type=magento.

I think you could go with a single index (single core) with document_type
as the differentiator.
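With a single core and a `document_type` field, each site can search everything or narrow to one kind of document with a filter query (`fq`). A sketch that builds such query strings; the values `marketing` and `magento` come from the example above, while the helper itself is hypothetical.

```python
import urllib.parse


def search_params(text, document_type=None, rows=10):
    """Query string for /select; optionally restricted to one document_type."""
    params = [("q", text), ("rows", str(rows)), ("wt", "json")]
    if document_type is not None:
        params.append(("fq", "document_type:%s" % document_type))
    return urllib.parse.urlencode(params)
```

Leaving out `fq` gives both sites the identical result list for a query such as "foo"; adding it lets, say, the shop show only products.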

Also note that if the two databases have fields in common, you don't
need to define those fields twice; you can use the same field for both
databases.

Let's say you have a field 'title' in both the marketing database and the
Magento database. You can define one 'title' field in schema.xml; there is
no need to define two title fields. Also look carefully at each field's
default value in schema.xml.
Let's say you have some fields in the marketing database that do not exist
in the Magento DB. When you're done indexing, fields that have no values
will not show up in the results. If that's what you want, you don't need
to define default="". If you want the field to appear regardless of whether
it has data, you have to specify default="".
Ex:
<field name="year" type="int" indexed="true" stored="true"
       multiValued="false" default=""/>

To index the two databases together, you can try the DataImportHandler,
which can query multiple data sources. A good thing about the
DataImportHandler is that your data source can be a database (MySQL,
MS-SQL, etc.), a URLDataSource, etc.

Hope that is helpful

--
View this message in context: http://lucene.472066.n3.nabble.com/Multi-core-support-for-indexing-multiple-servers-tp4099729p4099746.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Multi-core support for indexing multiple servers

Posted by Rob Veliz <ro...@mavenbridge.com>.

I've been reading about Solarium--definitely useful.  Could you elaborate
here:

If you are planning a single master index, that's not multicore.  Having
more than one document type in a single index is possible; they just
have to overlap on at least one field: whatever field is the uniqueKey
for the index.

What I'm trying to do is index marketing pages from one server AND index
product pages from a different ecommerce server and then combine those
results into a single index, so when I search for "foo" from either site, I
get the exact same results for "foo".  If that's not multi-core, what's the
right approach to accomplish this?


On Wed, Nov 6, 2013 at 11:29 PM, Shawn Heisey <so...@elyograg.org> wrote:

> Solr uses HTTP calls.  It is REST-like, though there has been some
> recent work to make parts of it use true REST, and that paradigm
> might later be extended to the entire interface.
>
> There are a number of Solr API packages for PHP that give you an
> object-oriented interface to Solr, so you don't have to learn Solr's HTTP
> interface; you write PHP code to access Solr.  These are two of them
> that I have heard about.  I've not actually used them, as I have little
> personal experience with writing PHP:
>
> http://pecl.php.net/package/solr
> http://www.solarium-project.org/
>
> If you are planning a single master index, that's not multicore.  Having
> more than one document type in a single index is possible; they just
> have to overlap on at least one field: whatever field is the uniqueKey
> for the index.
>
> Thanks,
> Shawn
>
>


-- 
*Rob Veliz*, Founder | *Mavenbridge* | robert@mavenbridge.com | M: +1 (206)
909 - 3490

Follow us at: http://twitter.com/mavenbridge

Re: Multi-core support for indexing multiple servers

Posted by Shawn Heisey <so...@elyograg.org>.

On 11/7/2013 12:07 AM, Rob Veliz wrote:
> Great feedback, thanks.  So the multi-core structure I have then is a
> single Solr server setup, essentially hosted by one domain owner (but to
> be used by both).  My question is: how does that Solr server connect to the
> two Web applications to create the one master index (to be used when
> searching on either Web app)?  It feels like I just reference the Solr server
> from within the Web app search templates (e.g. PHP files).  That is logical
> in terms of pulling the data into the Web apps, but it's still not clear to
> me how the data from those two Web apps actually gets into the Solr server if
> the Solr server doesn't live on the same server as the Web app(s).  Any
> thoughts?

Solr uses HTTP calls.  It is REST-like, though there has been some
recent work to make parts of it use true REST, and that paradigm
might later be extended to the entire interface.
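Because Solr speaks plain HTTP, any language can query it without a dedicated client library. A standard-library sketch that sends a query and reads the standard JSON response structure (`response.numFound` and `response.docs`, returned when `wt=json` is set). The parsing is exercised against a hand-written sample response rather than a live server, and the `select_url` passed to `search` is a placeholder.

```python
import json
import urllib.parse
import urllib.request


def search(select_url, text):
    """Send a query to a /select URL and return (numFound, docs)."""
    query = urllib.parse.urlencode({"q": text, "wt": "json"})
    with urllib.request.urlopen(select_url + "?" + query) as resp:
        return parse_response(resp.read())


def parse_response(raw):
    """Extract the hit count and the documents from a Solr JSON response body."""
    data = json.loads(raw)
    return data["response"]["numFound"], data["response"]["docs"]


# Hand-written example of the shape Solr returns for wt=json (not real server output).
sample = b'{"responseHeader": {"status": 0}, "response": {"numFound": 1, "start": 0, "docs": [{"id": "magento-42"}]}}'
```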

There are a number of Solr API packages for PHP that give you an
object-oriented interface to Solr, so you don't have to learn Solr's HTTP
interface; you write PHP code to access Solr.  These are two of them
that I have heard about.  I've not actually used them, as I have little
personal experience with writing PHP:

http://pecl.php.net/package/solr
http://www.solarium-project.org/

If you are planning a single master index, that's not multicore.  Having
more than one document type in a single index is possible; they just
have to overlap on at least one field: whatever field is the uniqueKey
for the index.

Thanks,
Shawn

Re: Multi-core support for indexing multiple servers

Posted by Rob Veliz <ro...@mavenbridge.com>.

Great feedback, thanks.  So the multi-core structure I have then is a
single Solr server setup, essentially hosted by one domain owner (but to
be used by both).  My question is: how does that Solr server connect to the
two Web applications to create the one master index (to be used when
searching on either Web app)?  It feels like I just reference the Solr server
from within the Web app search templates (e.g. PHP files).  That is logical
in terms of pulling the data into the Web apps, but it's still not clear to
me how the data from those two Web apps actually gets into the Solr server if
the Solr server doesn't live on the same server as the Web app(s).  Any
thoughts?


On Wed, Nov 6, 2013 at 10:57 PM, Shawn Heisey <so...@elyograg.org> wrote:

> SolrCloud makes for *easy* redundancy.  There is a three-server minimum
> if you want it to be fault-tolerant for both Solr and ZooKeeper.  The
> third server would only run ZooKeeper and could be an extremely
> inexpensive machine.  The other two servers would run both Solr and
> ZooKeeper.  Redundancy without SolrCloud is possible; it's just not as
> automated, and it can be done with two servers.
>
> It is highly recommended that redundant servers not be separated
> geographically.  This is especially important with SolrCloud, as
> ZooKeeper redundancy requires that a majority of the servers be
> operational.  That can be extremely difficult to guarantee in a
> multi-datacenter model, if one assumes that an entire datacenter can
> disappear from the network.
>
> If you don't care about redundancy, then you'd just run a single server,
> and SolrCloud wouldn't provide much benefit.
>
> Multiple cores is a good way to go: the two indexes would be logically
> separate, but you'd be able to use either one.  With SolrCloud, they would
> be multiple collections.
>
> Thanks,
> Shawn
>
>



Re: Multi-core support for indexing multiple servers

Posted by Shawn Heisey <so...@elyograg.org>.

On 11/6/2013 11:38 PM, Rob Veliz wrote:
> Trying to find specific information to support the following scenario:
> 
> - I have one site, running on one server, with marketing content, a blog,
> etc. that I want to index.
> - I have another site, running Magento on a different server, with
> ecommerce content (products).
> - The two servers live in completely different environments.
> - I would like to create a single search index spanning both sites and
> make that index searchable from both sites.
> 
> I think I can/should use the multi-core approach and spin up a new server
> to host Solr, but can anyone verify this is the best/most appropriate
> approach?  Are there any other details I need to consider?  Can anyone
> provide a step-by-step for making this happen, to validate my own technical
> plan?  Any help appreciated... I was initially thinking I needed SolrCloud,
> but that seems like overkill for my primary use case.

SolrCloud makes for *easy* redundancy.  There is a three-server minimum
if you want it to be fault-tolerant for both Solr and ZooKeeper.  The
third server would only run ZooKeeper and could be an extremely
inexpensive machine.  The other two servers would run both Solr and
ZooKeeper.  Redundancy without SolrCloud is possible; it's just not as
automated, and it can be done with two servers.

It is highly recommended that redundant servers not be separated
geographically.  This is especially important with SolrCloud, as
ZooKeeper redundancy requires that a majority of the servers be
operational.  That can be extremely difficult to guarantee in a
multi-datacenter model, if one assumes that an entire datacenter can
disappear from the network.
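The majority requirement is simple arithmetic: an ensemble of n ZooKeeper servers needs floor(n/2) + 1 of them running. The sketch below computes that, which shows why two servers tolerate no failures while three tolerate one, and why an ensemble split across two datacenters cannot survive losing the larger site. The two-datacenter layout at the end is a hypothetical example.

```python
def quorum(n):
    """Smallest majority of an ensemble of n servers."""
    return n // 2 + 1


def tolerated_failures(n):
    """How many servers can be lost while a majority is still running."""
    return n - quorum(n)


def survives_losing_site(site_sizes, lost_site):
    """True if the servers outside `lost_site` still form a majority of the ensemble."""
    total = sum(site_sizes.values())
    remaining = total - site_sizes[lost_site]
    return remaining >= quorum(total)


# Hypothetical split: a three-server ensemble spread over two datacenters.
split = {"dc1": 2, "dc2": 1}
```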

If you don't care about redundancy, then you'd just run a single server,
and SolrCloud wouldn't provide much benefit.

Multiple cores is a good way to go: the two indexes would be logically
separate, but you'd be able to use either one.  With SolrCloud, they would
be multiple collections.
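On a non-cloud server, extra cores can be created at runtime through the CoreAdmin API (`/admin/cores?action=CREATE`). In Solr 4.x the `instanceDir` is expected to already contain that core's `conf/` directory with its schema.xml and solrconfig.xml. The server URL and the core names (taken from the marketing-core/magento-core example earlier in the thread) are assumptions; in SolrCloud the Collections API plays the equivalent role.

```python
import urllib.parse

BASE = "http://localhost:8983/solr"  # assumed server


def create_core_url(name, instance_dir=None):
    """CoreAdmin CREATE URL; instanceDir defaults to the core name."""
    params = {"action": "CREATE", "name": name, "instanceDir": instance_dir or name}
    return BASE + "/admin/cores?" + urllib.parse.urlencode(params)


urls = [create_core_url(n) for n in ("marketing-core", "magento-core")]
```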

Thanks,
Shawn