Posted to solr-user@lucene.apache.org by Luis Cappa Banda <lu...@gmail.com> on 2012/11/02 14:05:33 UTC

Re: SolrCloud Tomcat configuration: problems and doubts.

Hello, Mark!

How are you? Thanks a lot for helping me. You were right about the jetty.host
parameter. My final test solr.xml looks like this:

  <cores adminPath="/admin/cores" defaultCoreName="items_en"
         host="localhost" hostPort="9080" hostContext="items_en">
    <core name="items_en" instanceDir="items_en" />
  </cores>


I've noticed that the 'hostContext' parameter was also required, so I included
it. After those corrections the Cloud graph tree looks right, and executing
queries doesn't return a 503 error. Phew! However, I saw in the Cloud
graph tree that a "collection1" also appears, pointing to
http://localhost:8983/solr. I will keep testing in case I missed something,
but it looks like it is creating another collection with default parameters
(collection name, port) on its own.

While using Apache Tomcat I was forced to include in catalina.sh (or
setenv.sh) the following environment parameters, as I told you before:

JAVA_OPTS="-DzkHost=127.0.0.1:9000 -Dcollection.configName=items_en"
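
A minimal sketch of how the ZooKeeper option could be set from Tomcat's
bin/setenv.sh, assuming a POSIX shell. The helper name join_zk_hosts is made
up for illustration; it only joins host:port pairs with commas, which is the
list format discussed in this thread for -DzkHost. -Dcollection.configName is
left out on purpose, since whether it is needed depends on how the
configuration was bootstrapped.

```shell
#!/bin/sh
# Sketch of a Tomcat bin/setenv.sh fragment (helper name is hypothetical).

# join_zk_hosts: join all arguments with commas, e.g. "a:1 b:2" -> "a:1,b:2".
join_zk_hosts() {
    joined=""
    for host in "$@"; do
        if [ -z "$joined" ]; then
            joined="$host"
        else
            joined="$joined,$host"
        fi
    done
    printf '%s' "$joined"
}

# Single ZooKeeper node, as in the test setup described in this thread.
ZK_HOSTS=$(join_zk_hosts 127.0.0.1:9000)

# Append rather than overwrite, so options set elsewhere survive.
JAVA_OPTS="${JAVA_OPTS:+$JAVA_OPTS }-DzkHost=$ZK_HOSTS"
export JAVA_OPTS
```

With a three-node ensemble the call would become
join_zk_hosts host1:port1 host2:port2 host3:port3.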


Just three more questions:

*1.* That's a problem for me, because I would like to deploy more than one
Solr server in each Tomcat instance, each with a different configuration (I
mean, different configName parameters). Setting that JAVA_OPTS forces me to
deploy only Solr servers with that one configuration in that Tomcat server.
In a production environment I would like to deploy at least four Solr servers
in a single Tomcat instance, one for each kind of document that I will index
and query. Do you know any way to set the configName per Solr server
instance? Is it possible to configure it inside the solr.xml file? It would
also make sense to deploy a multi-core configuration in each Solr server,
each core with its own configName stored in ZooKeeper, but again, that kind
of on-the-fly JAVA_OPTS configuration makes it impossible, :-(

*2.* The other question is about indexing. What is the best way to do plain
indexing (I mean, without DIH or similar) in SolrCloud? Maybe configuring an
LBHttpSolrServer that decides by itself which Solr server instance is best
for each indexing process?

*3.* The following question may sound strange, but I would like to help the
Apache Solr project by contributing code (bug fixes, new features, etc.).
How can I contribute to the community?

Thanks a lot.

Best Regards,


Luis Cappa.


2012/10/31 Mark Miller <ma...@gmail.com>

> A big difference if you are using tomcat is that you still need to
> specify jetty.port - unless you change the name of that sys prop in
> solr.xml.
>
> Some more below:
>
> On Wed, Oct 31, 2012 at 2:09 PM, Luis Cappa Banda <lu...@gmail.com>
> wrote:
> > Hello!
> >
> > How are you? I followed the SolrCloud wiki tutorial and noticed that
> > everything worked perfectly with Jetty and a very basic configuration. My
> > first impression was that SolrCloud is amazing, and I'm interested in
> > deploying a more complex, near-production SolrCloud architecture for
> > testing purposes. I'm using Tomcat as the application server, so I've
> > started testing with it.
> >
> > I've installed the ZooKeeper service on a single machine and started it
> > with the following configuration:
> >
> > *1.)*
> >
> > ~zookeperhome/conf/zoo.cfg
> >
> > tickTime=2000
> > initLimit=10
> > syncLimit=5
> > dataDir=~zookeperhome/data/
> > clientPort=9000
> >
> > *2.)* I am testing with a single-core Solr server called 'items_en'. The
> > configuration is as follows:
> >
> > Indexes conf/data tree: /mnt/data-store/solr/
> >     solr.xml
> >     zoo.cfg
> >     items_en/
> >         conf/
> >             schema.xml
> >             solrconfig.xml
> >             etc.
> >
> > So we have a simple configuration where the conf files and the index data
> > files are in the same path.
> >
> > *3.)* OK, so we have the Solr server configured, but I have to save the
> > configuration into ZooKeeper. I do it as follows:
> >
> > ./bin/zkcli.sh -cmd upconfig -zkhost 127.0.0.1:9000 \
> >     -confdir /mnt/data-store/solr/items_en/conf \
> >     -collection items_en -confname items_en
> >
> > And it seems to work perfectly, because if I use the ZooKeeper client and
> > execute the 'ls' command, the files appear:
> >
> > ./bin/zkCli.sh -server localhost:9000
> >
> > [zk: localhost:9000(CONNECTED) 1] ls /configs/items_en
> > [admin-extra.menu-top.html, currency.xml, protwords.txt,
> > mapping-FoldToASCII.txt, solrconfig.xml, lang, spellings.txt,
> > mapping-ISOLatin1Accent.txt, admin-extra.html, xslt, scripts.conf,
> > synonyms.txt, update-script.js, velocity, elevate.xml, zoo.cfg,
> > admin-extra.menu-bottom.html, stopwords_en.txt, schema.xml]
> >
> > *4.)* I would like all the Solr servers deployed in that Tomcat instance
> > to point to the ZooKeeper service on port 9000, so I included the following
> > JAVA_OPTS, hoping they would make that possible:
> >
> > JAVA_OPTS="-DzkHost=127.0.0.1:9000 -Dcollection.configName=items_en -DnumShards=2"
> >
> > *Question 1: supposing that the JAVA_OPTS are OK, do you think there is a
> > more flexible, less hard-coded way to tell each Solr server instance which
> > ZooKeeper service to use?*
>
> Your zkHost should actually be a comma sep list of the zk hosts. Yes,
> we hope to improve this in the future as zookeeper becomes more
> flexible.
>
> > *Question 2: can you increase numShards later, even after indexing?
> > Example: imagine that you have millions of documents and you want to
> > expand from two to four shards and also increase the number of Solr
> > servers.*
>
> You can't change the number of shards yet - there is an open jira
> issue for this and ongoing work. It's been called shard splitting.
>
> > *Question 3: again supposing that the JAVA_OPTS are OK (or close to OK),
> > is it necessary to always include -DnumShards on every Tomcat server?
> > Couldn't this confuse the ZooKeeper instance?*
>
> It depends on how you start your instances. The first one is the only
> one that matters - it only makes sense to specify for each instance if
> you plan on starting them all at the same time and are not sure which
> the first to register in zk will be.
>
> > *Question 4: imagine that we have three ZooKeeper instances to manage
> > config files in a production environment. Should the -DzkHost parameter
> > look like this? -DzkHost=host1:port1,host2:port2,host3:port3*
>
> Yes.
>
> > *5.)* I started *Tomcat (port 8080)* with a single Solr server and
> > everything seems to be OK: there is a single core named 'items_en' and
> > the Cloud button is active. The graph is a simple tree with shard1 and
> > shard2, and the current instance is connected to shard1. *However, if I
> > execute any query I just receive a 503 error code: "no servers hosting".*
>
> Not sure why offhand - if you are not passing jetty.port (or something
> else if you have renamed it - like tomcat.port), that will be a
> problem.
>
> > *6.)* I started another Solr server in a *second Tomcat instance (port
> > 9080)*. Its Solr home is in the following path:
> >
> > Indexes conf/data tree: /mnt/data-store/solr2/
> >     solr.xml
> >     zoo.cfg
> >     items_en/
> >         conf/
> >             schema.xml
> >             solrconfig.xml
> >             etc.
> >
> > Notice that I have a second Solr home for this second Solr server. Again,
> > when deploying it in Tomcat the Cloud button is active, but when I look at
> > the graph, another empty tree with shard1 and shard2 appears, where shard1
> > is the Solr server instance from Tomcat 9080. I expected this second Solr
> > server instance to become shard2, but it doesn't. The most interesting
> > thing is that I was watching both the Tomcat1 and Tomcat2 logs in parallel
> > and they output some *"INFO: Updating live nodes"* traces, so I thought
> > everything was all right, but it isn't, :-(
> >
> > *Question 5: ahem... what am I doing wrong? Can anyone help me? I just
> > want to follow the same example from the SolrCloud wiki, where there are
> > two application server instances, each one with a Solr server deployed in
> > it, and the two Solr servers become shard1 and shard2.*
>
> I'm guessing it's the jetty.port issue until you tell me otherwise.
>
> > Thank you very much for your help. In the end I promise to write a
> > detailed (and for dummies, like me) step-by-step tutorial on how to
> > configure and deploy SolrCloud in Tomcat, which I hope will help others.
> >
> >
> > Regards,
> >
> >
> >
> > Luis Cappa.
>
>
>
> --
> - Mark
>

Re: SolrCloud Tomcat configuration: problems and doubts.

Posted by Luis Cappa Banda <lu...@gmail.com>.
Forwarding to the solr-user mailing list. We forgot to reply to it, :-/

2012/11/5 Luis Cappa Banda <lu...@gmail.com>

> Hello, Mark!
>
> I've been testing more and more and things are going better. I have tested
> what you told me about "-Dbootstrap_conf=true" and it works fine, but the
> problem is that if I include that parameter in every Tomcat instance, then
> when I deploy all the Solr servers each one loads all the SolrCore
> configurations into ZooKeeper again.
>
> There should be something like a Tomcat master server, the only one with
> the following parameters that define the basic SolrCloud configuration:
>
> JAVA_OPTS="-DzkHost=127.0.0.1:9000 -DnumShards=2 -Dbootstrap_conf=true"
>
> Then the other Tomcat servers should have only:
>
> JAVA_OPTS="-DzkHost=127.0.0.1:9000"
>
>
> However, I think that is not the best way to proceed. It is 2012, it's
> the end of the world - God (well, one of them) is angry and attacks my
> production environment. Imagine that all servers go down and a Monit
> service restarts them in random order. Maybe an ordinary Tomcat server
> finishes its startup faster than the designated Tomcat master server, so
> those SolrCloud configuration parameters won't be loaded first. That's a
> problem.
>
> One possibility is to write a simple script, executed on every Tomcat
> launch, that does something like this:
>
> "I'm the first Tomcat and I'm launching! I'll write a
> solrcloud.config.lock file in a well-known path (or maybe into ZooKeeper)
> to tell the other Tomcats that I'll start loading the SolrCloud
> configuration files into ZooKeeper. I am the Tomcat master server, so I'll
> load JAVA_OPTS="-DzkHost=127.0.0.1:9000 -DnumShards=2
> -Dbootstrap_conf=true"."
>
> "I'm a second Tomcat and I'm launching! First I check whether a
> solrcloud.config.lock file exists. If it exists, I simply load
> JAVA_OPTS="-DzkHost=127.0.0.1:9000"."
>
>
> And so on.
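
The lock-file idea above could be sketched roughly as follows, assuming a
POSIX shell and that every Tomcat on the machine sees the same lock path. The
path /tmp/solrcloud.config.lock and the function name choose_java_opts are
placeholders for illustration. A directory is used instead of a file because
mkdir either creates it or fails in one step, so two Tomcats starting at the
same moment cannot both believe they are first.

```shell
#!/bin/sh
# Sketch of the "first Tomcat wins" startup check; paths are hypothetical.
LOCK_DIR="${LOCK_DIR:-/tmp/solrcloud.config.lock}"
ZK_HOST="${ZK_HOST:-127.0.0.1:9000}"

# choose_java_opts: print the JAVA_OPTS this Tomcat should start with.
# mkdir succeeds for exactly one caller per LOCK_DIR; everyone else fails.
choose_java_opts() {
    if mkdir "$LOCK_DIR" 2>/dev/null; then
        # First one here: bootstrap the SolrCloud configuration.
        printf '%s\n' "-DzkHost=$ZK_HOST -DnumShards=2 -Dbootstrap_conf=true"
    else
        # The lock already exists: join as a plain node.
        printf '%s\n' "-DzkHost=$ZK_HOST"
    fi
}
```

In setenv.sh this would be used as JAVA_OPTS="$(choose_java_opts)". The
sketch does not cover Tomcats on different machines, and a stale lock left
over from a previous run has to be removed before a full restart.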
>
>
>
> I don't like this solution too much because it's not elegant and it's very
> ad hoc, but it works. What do you think about it? I only started with
> SolrCloud four or five days ago and maybe I'm missing something that could
> solve this problem.
>
> Thank you very much, Mark.
>
> Regards,
>
>     Luis Cappa.
>
>
>
> 2012/11/3 Mark Miller <ma...@gmail.com>
>
>> On Fri, Nov 2, 2012 at 9:05 AM, Luis Cappa Banda <lu...@gmail.com>
>> wrote:
>> > Hello, Mark!
>> >
>> > How are you? Thanks a lot for helping me. You were right about the
>> > jetty.host parameter. My final test solr.xml looks like this:
>> >
>> >   <cores adminPath="/admin/cores" defaultCoreName="items_en"
>> >          host="localhost" hostPort="9080" hostContext="items_en">
>> >     <core name="items_en" instanceDir="items_en" />
>> >   </cores>
>> >
>> >
>> > I've noticed that the 'hostContext' parameter was also required, so I
>> > included it.
>>
>> It should default to /solr if you don't set it - it is there in case
>> you deploy to a different context though.
>>
>> > After those corrections the Cloud graph tree looks right, and executing
>> > queries doesn't return a 503 error. Phew! However, I saw in the Cloud
>> > graph tree that a "collection1" also appears, pointing to
>> > http://localhost:8983/solr. I will keep testing in case I missed
>> > something, but it looks like it is creating another collection with
>> > default parameters (collection name, port) on its own.
>>
>> It should only create what it finds in solr.xml - let me know what you
>> find.
>>
>> >
>> > While using Apache Tomcat I was forced to include in catalina.sh (or
>> > setenv.sh) the following environment parameters, as I told you before:
>> >
>> > JAVA_OPTS="-DzkHost=127.0.0.1:9000 -Dcollection.configName=items_en"
>>
>> You should only need -DzkHost= - see below.
>>
>> >
>> >
>> > Just three more questions:
>> >
>> > 1. That's a problem for me, because I would like to deploy more than
>> > one Solr server in each Tomcat instance, each with a different
>> > configuration (I mean, different configName parameters). Setting that
>> > JAVA_OPTS forces me to deploy only Solr servers with that one
>> > configuration in that Tomcat server. In a production environment I would
>> > like to deploy at least four Solr servers in a single Tomcat instance,
>> > one for each kind of document that I will index and query. Do you know
>> > any way to set the configName per Solr server instance? Is it possible
>> > to configure it inside the solr.xml file? It would also make sense to
>> > deploy a multi-core configuration in each Solr server, each core with
>> > its own configName stored in ZooKeeper, but again, that kind of
>> > on-the-fly JAVA_OPTS configuration makes it impossible, :-(
>>
>> That config name sys prop is not being used here - it's only used when
>> you use -Dbootstrap_confdir=<path>, and then only the first time you
>> start up.
>>
>> Collections are linked to configuration sets in ZooKeeper. If you use
>> -Dbootstrap_conf=true, a special rule is used that auto links
>> collections and config sets with the same name as the collection.
>> Otherwise, you can use the ZkCLI cmd line tool to link any collection
>> to any config in zookeeper.
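
As a rough sketch of the ZkCLI route, assuming the config sets were already
uploaded with upconfig and that zkcli.sh accepts a linkconfig command with the
same -zkhost, -collection and -confname options used earlier in this thread.
The second collection name (items_es) and the helper linkconfig_cmd are
hypothetical. The script only prints the commands, so nothing changes in
ZooKeeper until they are run by hand.

```shell
#!/bin/sh
# Sketch: link collections to config sets already stored in ZooKeeper.
# The zkcli.sh location and the items_es collection are assumptions.
ZKCLI="${ZKCLI:-./bin/zkcli.sh}"
ZK_HOST="${ZK_HOST:-127.0.0.1:9000}"

# linkconfig_cmd COLLECTION CONFNAME: print the ZkCLI call that links them.
linkconfig_cmd() {
    printf '%s -cmd linkconfig -zkhost %s -collection %s -confname %s\n' \
        "$ZKCLI" "$ZK_HOST" "$1" "$2"
}

# Print one command per collection for review before running anything.
linkconfig_cmd items_en items_en
linkconfig_cmd items_es items_es   # hypothetical second collection
```

Because each collection gets its own link, no JVM-wide config name property
is involved, which is the point of the per-collection setup asked about above.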
>>
>>
>>
>> >
>> > 2. The other question is about indexing. What is the best way to do
>> > plain indexing (I mean, without DIH or similar) in SolrCloud? Maybe
>> > configuring an LBHttpSolrServer that decides by itself which Solr server
>> > instance is best for each indexing process?
>>
>> CloudSolrServer is probably your best bet. It does load balancing and
>> knows the cluster state from zookeeper.
>>
>> >
>> > 3. The following question may sound strange, but I would like to help
>> > the Apache Solr project by contributing code (bug fixes, new features,
>> > etc.). How can I contribute to the community?
>>
>> Create JIRA issues in our issue tracking system, participate on the
>> mailing list, update our wiki, etc :)
>>
>> >
>> > Thanks a lot.
>> >
>> > Best Regards,
>> >
>> >
>> > Luis Cappa.
>>
>>
>>
>> --
>> - Mark
>>
>
>


-- 

- Luis Cappa