Posted to solr-user@lucene.apache.org by roySolr <ro...@gmail.com> on 2012/11/21 10:56:46 UTC

From Solr3.1 to SolrCloud

Hello,

We are currently using Solr 3.1 for search on our website. We want to use
the nice features of Solr 4, especially real-time search. Our current
configuration looks like this:

Master
Slave1
Slave2
Slave3

We have 1 master and 3 slaves, and the data is replicated every night. In
the future we want to update every ~5 seconds. I was looking at SolrCloud
and have a few questions:

- We aren't using shards because our index only contains 1 million simple
docs. We only need multiple servers because of the amount of traffic. The
SolrCloud examples I see all use multiple shards. Is numShards=1 possible? Is
one big index faster than multiple shards? Do I need 1 collection with
multiple nodes?

- Should I run a single ZooKeeper instance (without Solr) on a separate
server?

- Is the DIH (DataImportHandler) still there in Solr 4?

Any help is welcome!

Thanks
Roy

--
View this message in context: http://lucene.472066.n3.nabble.com/From-Solr3-1-to-SolrCloud-tp4021536.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: From Solr3.1 to SolrCloud

Posted by Mark Miller <ma...@gmail.com>.
On Mon, Nov 26, 2012 at 9:40 AM, roySolr <ro...@gmail.com> wrote:
> Mark: I'm using a separate ZooKeeper instance. I don't use the embedded ZK in
> Solr.

It doesn't matter either way. The clear command deletes whole directories.

-- 
- Mark

Re: From Solr3.1 to SolrCloud

Posted by roySolr <ro...@gmail.com>.
Mark: I'm using a separate ZooKeeper instance. I don't use the embedded ZK in
Solr. I can't find the location where the configs are stored, but I can log in
to ZooKeeper and see the configs. The delete command works, but I can't delete
the whole config directory at once, only file by file.

Erick: the nodes aren't live anymore and aren't visible in "live_nodes", but
they are still in the cloud graph. Why is this, and how can I remove them from
there? I was testing with 10 nodes and now only with 4, and I still see the 6
nodes that aren't there anymore.

--
View this message in context: http://lucene.472066.n3.nabble.com/From-Solr3-1-to-SolrCloud-tp4021536p4022358.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: From Solr3.1 to SolrCloud

Posted by Mark Miller <ma...@gmail.com>.
The command-line utility has a clear command. If you use the out-of-the-box setup, it's something like:

example/cloud-scripts/zkcli.sh -cmd clear /path/to/clear

http://wiki.apache.org/solr/SolrCloud#Command_Line_Util
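A minimal sketch of the same command aimed at a whole config set, assuming ZooKeeper at localhost:2181 and a config set named "myconf" (both hypothetical). It relies on Solr keeping uploaded config sets under the /configs/<name> znode, which is also where they show up when you log in to ZooKeeper directly.

```shell
#!/bin/sh
# Sketch: remove an entire config set from ZooKeeper with Solr's zkcli "clear".
# Assumed values (not from the thread): ZooKeeper on localhost:2181 and the
# zkcli.sh location of the out-of-the-box example layout.
ZKCLI=${ZKCLI:-example/cloud-scripts/zkcli.sh}
ZK_HOST=${ZK_HOST:-localhost:2181}

clear_config() {
  # $1: config set name, e.g. "myconf" (hypothetical)
  # "clear" removes the znode and everything below it in one call.
  "$ZKCLI" -zkhost "$ZK_HOST" -cmd clear "/configs/$1"
}
```

Usage: clear_config myconf. ZKCLI and ZK_HOST can be overridden from the environment.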

- Mark

On Nov 26, 2012, at 3:44 AM, roySolr <ro...@gmail.com> wrote:

> OK, that's important for the traffic.
> 
> Some questions about ZooKeeper. I have done some tests and have the
> following questions:
> 
> - How can I delete configs from ZooKeeper?
> - I see some nodes in the clusterstate that are already gone. Why is this
> not up to date? The same goes for the graph.
> 
> Thanks again!


Re: From Solr3.1 to SolrCloud

Posted by Erick Erickson <er...@gmail.com>.
What is the _status_ of the "nodes that are already gone"? What test are you
running when you see this? It could just be that you're seeing nodes that
are unresponsive but that ZK still knows about.
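One way to check this is to compare the two znodes directly. A minimal sketch, assuming ZooKeeper 3.4's bin/zkCli.sh (which runs a single command given after -server and then exits) and a server at localhost:2181; the variable names are hypothetical. /live_nodes holds one ephemeral entry per Solr node with an open ZooKeeper session, while /clusterstate.json is the stored cluster layout, so a node listed in the second but missing from the first is known to ZK but not currently live.

```shell
#!/bin/sh
# Sketch: compare the nodes ZooKeeper considers live with those recorded in the
# cluster state. Assumes ZooKeeper 3.4's zkCli.sh and a server on localhost:2181.
ZOOKEEPER_CLI=${ZOOKEEPER_CLI:-bin/zkCli.sh}
ZK_HOST=${ZK_HOST:-localhost:2181}

show_live_nodes() {
  # Ephemeral znodes: they disappear when a Solr node's ZK session ends.
  "$ZOOKEEPER_CLI" -server "$ZK_HOST" ls /live_nodes
}

show_cluster_state() {
  # Persistent JSON layout of collections, shards and replicas (Solr 4.x).
  "$ZOOKEEPER_CLI" -server "$ZK_HOST" get /clusterstate.json
}
```

Run both and compare the node names: anything present only in the cluster state is a node ZK remembers but that is not connected.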

Best
Erick


On Mon, Nov 26, 2012 at 3:44 AM, roySolr <ro...@gmail.com> wrote:

> OK, that's important for the traffic.
>
> Some questions about ZooKeeper. I have done some tests and have the
> following questions:
>
> - How can I delete configs from ZooKeeper?
> - I see some nodes in the clusterstate that are already gone. Why is this
> not up to date? The same goes for the graph.
>
> Thanks again!

Re: From Solr3.1 to SolrCloud

Posted by roySolr <ro...@gmail.com>.
OK, that's important for the traffic.

Some questions about ZooKeeper. I have done some tests and have the
following questions:

- How can I delete configs from ZooKeeper?
- I see some nodes in the clusterstate that are already gone. Why is this
not up to date? The same goes for the graph.

Thanks again!

--
View this message in context: http://lucene.472066.n3.nabble.com/From-Solr3-1-to-SolrCloud-tp4021536p4022311.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: From Solr3.1 to SolrCloud

Posted by Tomás Fernández Löbbe <to...@gmail.com>.
I think that's correct. Queries to the existing nodes will still work with
no ZK.


On Fri, Nov 23, 2012 at 7:16 AM, roySolr <ro...@gmail.com> wrote:

> Thanks Tomás for the information so far.
>
> You said:
> You can effectively run with only one zk instance, the problem with this is
> that if that instance dies, then your whole cluster will go down.
>
> When the cluster goes down, can I still send queries to the Solr instances?
> We have a load balancer that chooses a Solr instance round-robin. Can Solr
> still handle queries when no ZooKeeper is up? Will only updates be a
> problem?

Re: From Solr3.1 to SolrCloud

Posted by roySolr <ro...@gmail.com>.
Thanks Tomás for the information so far.

You said: 
You can effectively run with only one zk instance, the problem with this is
that if that instance dies, then your whole cluster will go down.

When the cluster goes down, can I still send queries to the Solr instances?
We have a load balancer that chooses a Solr instance round-robin. Can Solr
still handle queries when no ZooKeeper is up? Will only updates be a
problem?

--
View this message in context: http://lucene.472066.n3.nabble.com/From-Solr3-1-to-SolrCloud-tp4021536p4021991.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: From Solr3.1 to SolrCloud

Posted by Tomás Fernández Löbbe <to...@gmail.com>.
>
> - I changed my synonyms.txt on a Solr node. How can I get ZooKeeper and the
> other Solr nodes in sync without a restart?
>

Well, you can upload the whole collection configuration again with zkcli
(the script included in the "cloud-scripts" directory). See
http://wiki.apache.org/solr/SolrCloud#Getting_your_Configuration_Files_into_ZooKeeper
Another option, if you only want to upload one file, is to write something
that communicates with ZK through any of its APIs. I did this before Solr's
zkcli was committed, and it is quite simple. Then you can reload the
collection, which is like reloading all the cores for the collection on the
different nodes. See
http://wiki.apache.org/solr/SolrCloud#Managing_collections_via_the_Collections_API
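The upload-then-reload sequence can be sketched as follows, assuming ZooKeeper at localhost:2181, Solr at localhost:8983, a config set named "myconf" and a collection named "collection1" (all hypothetical values). It re-uploads the whole conf directory rather than a single file.

```shell
#!/bin/sh
# Sketch: push an edited conf dir (e.g. with a new synonyms.txt) to ZooKeeper,
# then reload the collection on every node via the Collections API.
# All host names, paths and names below are assumed, not taken from the thread.
ZKCLI=${ZKCLI:-example/cloud-scripts/zkcli.sh}
CURL=${CURL:-curl}
ZK_HOST=${ZK_HOST:-localhost:2181}
SOLR_URL=${SOLR_URL:-http://localhost:8983/solr}

upload_config() {
  # $1: local conf directory, $2: config set name in ZooKeeper
  "$ZKCLI" -zkhost "$ZK_HOST" -cmd upconfig -confdir "$1" -confname "$2"
}

reload_collection() {
  # $1: collection name; reloads the collection's cores on all nodes
  "$CURL" "$SOLR_URL/admin/collections?action=RELOAD&name=$1"
}
```

Usage: upload_config example/solr/collection1/conf myconf && reload_collection collection1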

>
> - I read some more about ZooKeeper ensembles. When I run with 4
> Solr nodes (replicas), I need 3 ZooKeepers in the ensemble (50% live). When
> ZooKeeper and Solr are separated, it will take 7 servers to go live. In
> the past we only needed 4 servers. Are there other options, since the
> costs will grow? 3 ZooKeeper servers sounds like overkill.
>

The number of Solr instances has nothing to do with the number of ZK
instances that you need to run. You can effectively run with only one ZK
instance; the problem with this is that if that instance dies, then your
whole cluster will go down. So you can increase the number of ZK instances.
When you create your ZooKeeper ensemble, you declare its size (the
number of ZK instances it will contain). When you run that ensemble,
ZooKeeper requires that N/2+1 of the servers (integer division, i.e. a
strict majority) are connected. This means that if you want your ZK
ensemble to survive one instance dying, you'll need at least 3 ZK instances
(if you have 2 and one dies, you still need 2 to work, so it won't).
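The N/2+1 rule can be checked with a few lines of shell arithmetic (integer division, as in POSIX $(( )) expansion). This is only a sketch of the majority calculation, not a ZooKeeper API.

```shell
#!/bin/sh
# Sketch of the quorum arithmetic: an ensemble of N servers needs
# floor(N/2)+1 of them connected to keep working.
quorum_size() {
  echo $(( $1 / 2 + 1 ))
}

tolerated_failures() {
  # How many servers can die while a quorum is still available.
  echo $(( $1 - ($1 / 2 + 1) ))
}

for n in 1 2 3 4 5; do
  echo "ensemble=$n quorum=$(quorum_size "$n") tolerated_failures=$(tolerated_failures "$n")"
done
```

For 2 servers the quorum is still 2, so no failure is tolerated, and 4 servers tolerate only one failure, the same as 3; that is why ensembles are usually sized 3 or 5.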

There have been some discussions on the list about this recently, but if
the number of physical servers is too much for you, you could run an
instance of Solr and an instance of ZK on the same physical machine.
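A minimal sketch of what declaring the ensemble size looks like: every ZooKeeper server gets the same zoo.cfg listing all members, plus a myid file in its data directory, following the layout in the ZooKeeper Getting Started guide. The host names zk1..zk3, the timing values and the paths are assumptions; with Solr and ZK co-located, the hosts would simply be the Solr machines.

```shell
#!/bin/sh
# Sketch: write a zoo.cfg for a 3-server ensemble. Host names zk1..zk3 are
# hypothetical placeholders for the real machines.
write_ensemble_cfg() {
  # $1: output file, $2: ZooKeeper data directory
  cat > "$1" <<EOF
tickTime=2000
dataDir=$2
clientPort=2181
initLimit=5
syncLimit=2
server.1=zk1:2888:3888
server.2=zk2:2888:3888
server.3=zk3:2888:3888
EOF
}

write_myid() {
  # Each server's dataDir needs a "myid" file holding its N from server.N.
  # $1: data directory, $2: server id
  mkdir -p "$1" && echo "$2" > "$1/myid"
}
```

In server.N=host:2888:3888 the first port is used by followers to connect to the leader and the second for leader election; clients such as Solr connect on clientPort.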

Tomás

Re: From Solr3.1 to SolrCloud

Posted by roySolr <ro...@gmail.com>.
I'm running a separate ZooKeeper instance right now. It works great; the nodes
are visible in the admin UI.

Two more questions:

- I changed my synonyms.txt on a Solr node. How can I get ZooKeeper and the
other Solr nodes in sync without a restart?

- I read some more about ZooKeeper ensembles. When I run with 4
Solr nodes (replicas), I need 3 ZooKeepers in the ensemble (50% live). When
ZooKeeper and Solr are separated, it will take 7 servers to go live. In
the past we only needed 4 servers. Are there other options, since the
costs will grow? 3 ZooKeeper servers sounds like overkill.

Thanks

--
View this message in context: http://lucene.472066.n3.nabble.com/From-Solr3-1-to-SolrCloud-tp4021536p4021849.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: From Solr3.1 to SolrCloud

Posted by Tomás Fernández Löbbe <to...@gmail.com>.
>
> I will use numShards=1. Are there some instructions on how to install only
> ZooKeeper on a separate server? Or do I have to install Solr 4 on that
> server?
>

You don't need to install Solr on that server. See
http://zookeeper.apache.org/doc/trunk/zookeeperStarted.html
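A minimal sketch of a standalone (single-server) ZooKeeper setup on a machine without Solr, following that guide. ZK_HOME and the data directory are assumptions about where ZooKeeper was unpacked and where it should keep its data.

```shell
#!/bin/sh
# Sketch: standalone ZooKeeper, no Solr on this machine.
# ZK_HOME is an assumed install location.
ZK_HOME=${ZK_HOME:-/opt/zookeeper}

write_standalone_cfg() {
  # $1: ZooKeeper data directory
  mkdir -p "$ZK_HOME/conf"
  cat > "$ZK_HOME/conf/zoo.cfg" <<EOF
tickTime=2000
dataDir=$1
clientPort=2181
EOF
}

start_zookeeper() {
  # zkServer.sh reads conf/zoo.cfg by default.
  "$ZK_HOME/bin/zkServer.sh" start
}
```

Usage: write_standalone_cfg /var/lib/zookeeper && start_zookeeper. The Solr nodes then point at this host on port 2181.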


>
> How do I make the connection between the Solr instances and the ZK
> instance (server)?
>

With -DzkHost=host:port, as described on the SolrCloud wiki page, but
now you have to set it on all the Solr instances, and none of them should
use "-DzkRun".
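A minimal sketch of starting a node from the Solr 4 example/ directory against an external ZooKeeper. The connection string is hypothetical: for a single ZooKeeper server it is just one host:port, and for an ensemble it is the comma-separated list of all members.

```shell
#!/bin/sh
# Sketch: start a Solr 4 example node that connects to an external ZooKeeper.
# No -DzkRun, so this node does not start an embedded ZooKeeper.
# ZK_HOST is a hypothetical three-server connection string.
ZK_HOST=${ZK_HOST:-zk1:2181,zk2:2181,zk3:2181}
JAVA=${JAVA:-java}

start_solr_node() {
  "$JAVA" -DzkHost="$ZK_HOST" -jar start.jar
}
```

Run start_solr_node on every Solr machine with the same ZK_HOST value.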

Tomás


> Thanks so far,
>
> Roy

Re: From Solr3.1 to SolrCloud

Posted by roySolr <ro...@gmail.com>.
Thanks Tomás,

I will use numShards=1. Are there some instructions on how to install only
ZooKeeper on a separate server? Or do I have to install Solr 4 on that
server?

How do I make the connection between the Solr instances and the ZK
instance (server)?

Thanks so far,

Roy

--
View this message in context: http://lucene.472066.n3.nabble.com/From-Solr3-1-to-SolrCloud-tp4021536p4021583.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: From Solr3.1 to SolrCloud

Posted by Tomás Fernández Löbbe <to...@gmail.com>.
>
> - We aren't using shards because our index only contains 1 million simple
> docs. We only need multiple servers because of the amount of traffic. The
> SolrCloud examples I see all use multiple shards. Is numShards=1 possible? Is
> one big index faster than multiple shards? Do I need 1 collection with
> multiple nodes?
>
Yes, you can use SolrCloud and specify numShards=1. With 1M docs, I
would use one shard too: the overhead of the distribution may be bigger
than the time it takes to process a query on a single node with an index
this size (I do encourage you to test and see, because it usually depends
on more factors than just the index size, but I think 1 shard will be the
best).
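A minimal sketch of bootstrapping such a single-shard collection against a separate ZooKeeper, adapted from the Solr 4 example setup with -DzkRun replaced by -DzkHost. ZK_HOST and CONF_NAME are assumed values, and the paths assume you start from the example/ directory.

```shell
#!/bin/sh
# Sketch: first node of a single-shard SolrCloud collection.
# ZK_HOST and CONF_NAME are assumptions; run from the Solr 4 example/ directory.
JAVA=${JAVA:-java}
ZK_HOST=${ZK_HOST:-localhost:2181}
CONF_NAME=${CONF_NAME:-myconf}

start_first_node() {
  # Uploads ./solr/collection1/conf to ZooKeeper as $CONF_NAME and registers
  # the collection with a single shard.
  "$JAVA" -Dbootstrap_confdir=./solr/collection1/conf \
    -Dcollection.configName="$CONF_NAME" \
    -DzkHost="$ZK_HOST" -DnumShards=1 -jar start.jar
}
```

Nodes started afterwards with only -DzkHost should join as replicas of that one shard, which is what spreads the query traffic.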


> - Should I run a single ZooKeeper instance (without Solr) on a separate
> server?
>
Separate, and even better if you use a 3-server ZK ensemble; otherwise
ZooKeeper becomes a single point of failure.

>
> - Is the DIH still there in Solr 4?
>
Yes, you'll see it on all nodes, and you can run it from any of them. That
said, you may see some improvement if you execute the DIH on the leader
node (which may not always be the same node). I don't think
dataimport.properties gets distributed, though; you may have to figure that
out.
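A minimal sketch of triggering an import on one node and polling it, assuming the handler is registered at /dataimport in solrconfig.xml and that CORE_URL (hypothetical) points at the node you want to run it on, ideally the current leader.

```shell
#!/bin/sh
# Sketch: start a DIH full import on one node and check its progress.
# CORE_URL is an assumed node/collection URL; /dataimport is the assumed
# handler path from solrconfig.xml.
CURL=${CURL:-curl}
CORE_URL=${CORE_URL:-http://localhost:8983/solr/collection1}

dih_full_import() {
  "$CURL" "$CORE_URL/dataimport?command=full-import"
}

dih_status() {
  # Reports whether an import is running and how many documents it processed.
  "$CURL" "$CORE_URL/dataimport?command=status"
}
```

Usage: dih_full_import, then call dih_status until the import reports it is idle.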


Tomás

>
> Any help is welcome!
>
> Thanks
> Roy