Posted to solr-user@lucene.apache.org by Mi...@materna.de on 2013/05/10 17:14:58 UTC

Sharing index data between two Solr instances

Hello everyone!

I've been googling this topic but still couldn't find a definitive answer to my question.

We have a setup of two machines, both running Solr 4.2 within Tomcat. We are considering sharing the index data between the two webapps. One machine will be configured to update the index periodically; the other will access it read-only. Using native locking on a network-mounted NTFS share, is it possible for the reader to detect when new index data has been imported, or do we need to signal it from the updating webapp and issue a commit in order to open a new reader with the updated content?

Thanks in advance!

Milen Tilev
Master of Science
Software Developer
Business Unit Information
________________________________________________

MATERNA GmbH
Information & Communications

Voßkuhle 37
44141 Dortmund
Germany

Phone: +49 231 5599-8257
Fax: +49 231 5599-98257
E-Mail: milen.tilev@materna.de<ma...@materna.de>

www.materna.de<http://www.materna.de/> | Newsletter<http://www.materna.de/newsletter> | Twitter<http://twitter.com/MATERNA_GmbH> | XING<http://www.xing.com/companies/MATERNAGMBH> | Facebook<http://www.facebook.com/maternagmbh>
________________________________________________

Registered office of MATERNA GmbH: Voßkuhle 37, 44141 Dortmund
Managing Directors: Dr. Winfried Materna, Helmut an de Meulen, Ralph Hartwig
Commercial register: Amtsgericht Dortmund (Dortmund Local Court), HRB 5839


Re: Sharing index data between two Solr instances

Posted by Mi...@materna.de.
Hello Peter,

Thanks for your answer!

Indeed, we are considering replication as an alternative design. However, the idea of having the updated index available almost instantly after a successful import appeals to us. The decision is still pending, though.

Thanks and kind regards!

Milen


-----Original Message-----
From: Peter Sturge [mailto:peter.sturge@gmail.com]
Sent: Friday, 10 May 2013 17:42
To: solr-user@lucene.apache.org
Subject: Re: Sharing index data between two Solr instances

Hello Milen,

We do something very similar to this, except we use separate processes on the same machine for the writer and reader. We do this so we can tune caches etc. to optimize for each, and still use the same index files. On MP machines, this works very well.
If you've got 2 separate machines, I would have thought replication would be the way to go, as it performs the necessary synchronization for you.
If you do share the same index files between 2 instances, you need to be aware of locking/contention issues (which it sounds like you are aware of), and if they're on separate machines, you'll likely need a very fast shared disk channel (FC SAN or similar) to keep performance up. In our experience, Solr works best with fast locally attached storage (e.g. SSD or 15k SAS drives) rather than a SAN, and definitely not iSCSI or NAS. In order for the read-only instance to pick up the changes made by the writing instance, it will need to do an empty commit (i.e. no docs to commit, just auto-warming caches, readers etc.).
For us, as our writer is constantly writing, we do a timed refresh on the read-only instance, but for separate machines you could use an RPC mechanism between the two instances. Again, though, replication already does all this. Have you considered using replication?

Thanks,
Peter




Re: Sharing index data between two Solr instances

Posted by Peter Sturge <pe...@gmail.com>.
Hello Milen,

We do something very similar to this, except we use separate processes on
the same machine for the writer and reader. We do this so we can tune
caches etc. to optimize for each, and still use the same index files. On MP
machines, this works very well.
If you've got 2 separate machines, I would have thought replication would
be the way to go, as it performs the necessary synchronization for you.
If you do share the same index files between 2 instances, you need to be
aware of locking/contention issues (which it sounds like you are aware of),
and if they're on separate machines, you'll likely need a very fast
shared disk channel (FC SAN or similar) to keep performance up. In our
experience, Solr works best with fast locally attached storage (e.g. SSD or
15k SAS drives) rather than a SAN, and definitely not iSCSI or NAS. In order
for the read-only instance to pick up the changes made by the writing
instance, it will need to do an empty commit (i.e. no docs to commit, just
auto-warming caches, readers etc.).
For us, as our writer is constantly writing, we do a timed refresh on the
read-only instance, but for separate machines you could use an RPC
mechanism between the two instances. Again, though, replication already
does all this. Have you considered using replication?
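
A minimal Python sketch of the timed-refresh idea, using only the standard library. It assumes Lucene's convention of naming each commit point segments_N, with the generation N written in base 36, so the read-only side can poll the shared index directory and send an empty commit only when the generation changes. The function names, the reader URL and the core name are hypothetical, and the empty commit is assumed to be a plain GET on the reader's /update?commit=true.

```python
import os
import re
import time
import urllib.request
from typing import Callable, Optional

# Lucene names each commit point "segments_<generation>", with the generation
# written in base 36 (so "segments_a" is generation 10). "segments.gen" and
# the other index files do not match this pattern.
_SEGMENTS_RE = re.compile(r"^segments_([0-9a-z]+)$")


def current_generation(index_dir: str) -> Optional[int]:
    """Return the highest commit generation in index_dir, or None if there is none."""
    gens = [int(m.group(1), 36)
            for m in map(_SEGMENTS_RE.match, os.listdir(index_dir)) if m]
    return max(gens) if gens else None


def commit_url(base_url: str, core: str) -> str:
    """URL of an empty commit against one core's update handler (hypothetical layout)."""
    return f"{base_url.rstrip('/')}/{core}/update?commit=true"


def send_empty_commit(url: str, timeout: float = 30.0) -> int:
    """Issue the commit as a GET; urlopen raises HTTPError on 4xx/5xx replies."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.status


def refresh_loop(index_dir: str, url: str, interval: float,
                 send: Callable[[str], int] = send_empty_commit,
                 sleep: Callable[[float], None] = time.sleep,
                 max_rounds: Optional[int] = None) -> int:
    """Poll every `interval` seconds; commit on the reader when the generation moves.

    Runs forever unless max_rounds is given; returns the number of commits sent.
    """
    last = current_generation(index_dir)
    sent = 0
    rounds = 0
    while max_rounds is None or rounds < max_rounds:
        sleep(interval)
        rounds += 1
        gen = current_generation(index_dir)
        if gen is not None and gen != last:
            send(url)  # the reader's new searcher should pick up the new commit point
            sent += 1
            last = gen
    return sent
```

For example, refresh_loop("/mnt/shared/index", commit_url("http://reader:8080/solr", "collection1"), 30.0) would check every 30 seconds; the path, host and core are placeholders. Polling the directory avoids sending commits when nothing has changed; a fixed timer without the check would be simpler but reopens searchers needlessly.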

Thanks,
Peter




Re: Sharing index data between two Solr instances

Posted by Mi...@materna.de.
Jason,

Again, many thanks for your response!

Best regards,

Milen

________________________________________
From: Jason Hellman [jhellman@innoventsolutions.com]
Sent: Friday, 10 May 2013 20:11
To: solr-user@lucene.apache.org
Subject: Re: Sharing index data between two Solr instances

Milen,

It is possible to share the configuration among multiple cores; I have seen this. However, I haven't seen multiple separate instances share the same solr.xml core configuration (with, for that matter, possibly separate locking policies). It might work.

Honestly, I don't like it. Your config is not likely to change often, and keeping the copies in sync should be relatively trivial for your data ingestion delegate.

But all of this is what replication does for you. Of course, as you note, there is latency, so you may wish to consider SolrCloud instead, or a near-real-time (NRT) configuration without SolrCloud. You have a lot of options! But replication's master/slave behavior is rock solid and does nearly everything you seek.

Jason

On May 10, 2013, at 8:40 AM, Milen.Tilev@materna.de wrote:

> Hello Jason,
>
> Thanks for your quick response! Using Solr replication instead is also still under consideration at this point, so we will weigh its pros and cons, too.
>
> Fortunately, we are not using AutoCommit in our project, as we need to control the creation of new segments, so I will propose to my colleagues that we issue a manual commit on the read-only node after each successful index update.
>
> Just one more question: in this case, would it be possible to use the same solrhome/conf directory (shared schema and solrconfig) and the same solr.xml file in both webapps? I guess we would then have to signal the read-only side each time solr.xml changes (the updating machine may add cores depending on the imported data).
>
> Thanks again and best regards!
>
> Milen
>
>
> -----Original Message-----
> From: Jason Hellman [mailto:jhellman@innoventsolutions.com]
> Sent: Friday, 10 May 2013 17:30
> To: solr-user@lucene.apache.org
> Subject: Re: Sharing index data between two Solr instances
>
> Milen,
>
> At some point you'll need to call a commit to search your data, either via an AutoCommit policy or deterministically. There are various schools of thought on which way to go, but something needs to do this.
>
> If you go the AutoCommit route, be sure to pay attention to the openSearcher value. The default of false in the example solrconfig.xml will not cause a new IndexSearcher to be opened on the new data. There is a strong use case for this, but if you're not aware of it you might be caught by surprise.
>
> Once the commit fires, your search process will automatically see the new data, with no interruption to its queue of queries.
>
> You may also want to consider a master/slave relationship via replication for higher availability. It is trivial to set up and works like a charm.
>
> Jason


Re: Sharing index data between two Solr instances

Posted by Jason Hellman <jh...@innoventsolutions.com>.
Milen,

It is possible to share the configuration among multiple cores; I have seen this. However, I haven't seen multiple separate instances share the same solr.xml core configuration (with, for that matter, possibly separate locking policies). It might work.

Honestly, I don't like it. Your config is not likely to change often, and keeping the copies in sync should be relatively trivial for your data ingestion delegate.
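
One way the ingestion side could keep two conf directories in sync, sketched in Python with the standard library: copy every file whose SHA-1 differs or that is missing on the other side. The directory layout and the function names are assumptions. After copying, the reader would still need to reload the affected cores before the new files take effect.

```python
import hashlib
import os
import shutil
from typing import List


def _digest(path: str) -> str:
    """SHA-1 of a file's contents, read in 64 KiB chunks."""
    h = hashlib.sha1()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def sync_conf(src_dir: str, dst_dir: str) -> List[str]:
    """Copy files from src_dir to dst_dir that are missing or differ there.

    Walks subdirectories too and returns the copied relative paths, sorted.
    Files that exist only in dst_dir are left alone.
    """
    copied = []
    for root, _dirs, files in os.walk(src_dir):
        for name in files:
            src = os.path.join(root, name)
            rel = os.path.relpath(src, src_dir)
            dst = os.path.join(dst_dir, rel)
            if not os.path.exists(dst) or _digest(src) != _digest(dst):
                os.makedirs(os.path.dirname(dst), exist_ok=True)
                shutil.copy2(src, dst)  # also preserves modification times
                copied.append(rel)
    return sorted(copied)
```

Comparing contents rather than timestamps keeps the copy idempotent: running it again right away copies nothing, so it is cheap to call after every import.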

But all of this is what replication does for you. Of course, as you note, there is latency, so you may wish to consider SolrCloud instead, or a near-real-time (NRT) configuration without SolrCloud. You have a lot of options! But replication's master/slave behavior is rock solid and does nearly everything you seek.
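
To shorten the replication latency, the writer could ask the slave to pull immediately after committing instead of waiting for the next poll. A minimal Python sketch, assuming the master's ReplicationHandler is configured with replicateAfter=commit and each slave already has a masterUrl; the replication handler's command=fetchindex is what requests the pull. Host names, the core name and the helper names are hypothetical.

```python
import urllib.parse
import urllib.request
from typing import Callable, Iterable, List


def http_get(url: str, timeout: float = 60.0) -> int:
    """Plain GET; urlopen raises HTTPError if Solr answers with 4xx/5xx."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.status


def replication_url(base_url: str, core: str, command: str) -> str:
    """URL for one command of a core's /replication handler."""
    query = urllib.parse.urlencode({"command": command})
    return f"{base_url.rstrip('/')}/{core}/replication?{query}"


def push_to_slaves(master: str, slaves: Iterable[str], core: str,
                   get: Callable[[str], int] = http_get) -> List[str]:
    """Commit on the master, then ask each slave to fetch the new index now.

    Returns the URLs in the order they were requested.
    """
    urls = [f"{master.rstrip('/')}/{core}/update?commit=true"]
    urls += [replication_url(s, core, "fetchindex") for s in slaves]
    for url in urls:
        get(url)  # an exception stops here: no point pulling if the commit failed
    return urls
```

The slaves' regular pollInterval still acts as a fallback if one of these requests is lost.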

Jason



Re: Sharing index data between two Solr instances

Posted by Mi...@materna.de.
Hello Jason,

Thanks for your quick response! Using Solr replication instead is also still under consideration at this point, so we will weigh its pros and cons, too.

Fortunately, we are not using AutoCommit in our project, as we need to control the creation of new segments, so I will propose to my colleagues that we issue a manual commit on the read-only node after each successful index update.
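
The manual-commit plan could look roughly like this on the updating side: run the import, and only if it succeeds, commit on the writer and then send an empty commit to the read-only node. This is a hedged Python sketch; run_import stands in for whatever performs the actual indexing, and the URLs, core name and function names are hypothetical.

```python
import urllib.request
from typing import Callable, List


def commit_url(base_url: str, core: str) -> str:
    """URL of an (empty) commit against one core's update handler."""
    return f"{base_url.rstrip('/')}/{core}/update?commit=true"


def http_get(url: str, timeout: float = 60.0) -> int:
    """Plain GET; urlopen raises HTTPError if Solr answers with 4xx/5xx."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.status


def import_then_signal(run_import: Callable[[], None], writer: str, reader: str,
                       core: str, get: Callable[[str], int] = http_get) -> List[str]:
    """Run the import; only if it succeeds, commit on the writer, then on the reader.

    Any exception from run_import (or from a commit) propagates, so the
    read-only node is never asked to reopen after a failed update.
    """
    run_import()
    sent = []
    # Writer first: the reader should only reopen once the new commit exists.
    for base in (writer, reader):
        url = commit_url(base, core)
        get(url)
        sent.append(url)
    return sent
```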

Just one more question: in this case, would it be possible to use the same solrhome/conf directory (shared schema and solrconfig) and the same solr.xml file in both webapps? I guess we would then have to signal the read-only side each time solr.xml changes (the updating machine may add cores depending on the imported data).
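
If the updating machine may add cores, the read-only side could be told about them through the CoreAdmin API instead of rereading solr.xml. The Python sketch below only plans the requests: action=CREATE for cores the reader does not have yet and, optionally, action=RELOAD for existing ones after a configuration change. How the two core lists are obtained is left open, the instance directory layout is an assumption, and in a shared-index setup the CREATE call would probably also need a dataDir parameter pointing at the shared index.

```python
import urllib.parse
from typing import Iterable, List


def core_admin_url(base_url: str, **params: str) -> str:
    """CoreAdmin request URL; parameter values are URL-encoded in the given order."""
    return f"{base_url.rstrip('/')}/admin/cores?{urllib.parse.urlencode(params)}"


def plan_core_updates(writer_cores: Iterable[str], reader_cores: Iterable[str],
                      base_url: str, instance_root: str,
                      reload_existing: bool = False) -> List[str]:
    """Return CoreAdmin URLs that would bring the reader's core list up to date.

    Cores the writer has but the reader lacks get action=CREATE (instanceDir is
    assumed to be <instance_root>/<core name>); with reload_existing=True, cores
    both sides know get action=RELOAD so a changed config is re-read.
    Cores present only on the reader are left untouched.
    """
    writer, reader = set(writer_cores), set(reader_cores)
    urls = []
    for name in sorted(writer - reader):
        instance_dir = f"{instance_root.rstrip('/')}/{name}"
        urls.append(core_admin_url(base_url, action="CREATE", name=name,
                                   instanceDir=instance_dir))
    if reload_existing:
        for name in sorted(writer & reader):
            urls.append(core_admin_url(base_url, action="RELOAD", core=name))
    return urls
```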

Thanks again and best regards!

Milen




Re: Sharing index data between two Solr instances

Posted by Jason Hellman <jh...@innoventsolutions.com>.
Milen,

At some point you'll need to call a commit to search your data, either via an AutoCommit policy or deterministically. There are various schools of thought on which way to go, but something needs to do this.

If you go the AutoCommit route, be sure to pay attention to the openSearcher value. The default of false in the example solrconfig.xml will not cause a new IndexSearcher to be opened on the new data. There is a strong use case for this, but if you're not aware of it you might be caught by surprise.
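
For the deterministic route, the same choices are available as request parameters on an explicit commit. A small Python sketch that builds the three variants relevant here: a hard commit that opens a searcher, a hard commit with openSearcher=false, and a Solr 4 soft commit (the basis of NRT setups). The base URL, core name and mode labels are placeholders. For the AutoCommit route, the corresponding switch is the openSearcher element inside <autoCommit> in solrconfig.xml.

```python
import urllib.parse
from typing import Dict

# Request parameters for an explicit commit sent to a core's /update handler.
COMMIT_MODES: Dict[str, Dict[str, str]] = {
    # Hard commit: flushes to stable storage and opens a new searcher.
    "hard": {"commit": "true"},
    # Hard commit without a new searcher: durable, but queries keep seeing
    # the old view until a later commit opens one.
    "hard_no_searcher": {"commit": "true", "openSearcher": "false"},
    # Soft commit: makes changes visible without the cost of a hard commit.
    "soft": {"softCommit": "true"},
}


def commit_request_url(base_url: str, core: str, mode: str = "hard") -> str:
    """Build the commit URL for one of the modes above (KeyError if unknown)."""
    params = COMMIT_MODES[mode]
    return f"{base_url.rstrip('/')}/{core}/update?{urllib.parse.urlencode(params)}"
```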

Once the commit fires, your search process will automatically see the new data, with no interruption to its queue of queries.

You may also want to consider a master/slave relationship via replication for higher availability. It is trivial to set up and works like a charm.

Jason


