Posted to solr-user@lucene.apache.org by Furkan KAMACI <fu...@gmail.com> on 2013/04/15 16:05:25 UTC

Usage of CloudSolrServer?

I am reading the LucidWorks Solr guide. Its SolrCloud section says:

*Read Side Fault Tolerance*
With earlier versions of Solr, you had to set up your own load balancer.
Now each individual node load balances requests across the replicas in a
cluster. You still need a load balancer on the 'outside' that talks to the
cluster, or you need a smart client. (Solr provides a smart Java Solrj
client called CloudSolrServer.)

My system is as follows: I crawl data with Nutch and send it to SolrCloud.
Users will search against Solr.

What is CloudSolrServer? Should I use it for load balancing, or is it
something different?

Re: Usage of CloudSolrServer?

Posted by Furkan KAMACI <fu...@gmail.com>.

CloudSolrServer uses LBHttpSolrServer by default. CloudSolrServer connects
to ZooKeeper and passes the live nodes to LBHttpSolrServer, which then
connects to the nodes in round-robin order.

By the way, do you mean "leader" instead of "master"?
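
To illustrate the round-robin selection mentioned above, here is a minimal
sketch in plain Java. It is not SolrJ code and does not talk to ZooKeeper:
the class name RoundRobinSketch, the method pick() and the node URLs are
all hypothetical, and the list of live nodes is simply passed in by hand.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for round-robin node selection; not SolrJ's code.
final class RoundRobinSketch {
    private final List<String> liveNodes;
    private final AtomicInteger next = new AtomicInteger();

    RoundRobinSketch(List<String> liveNodes) {
        if (liveNodes.isEmpty()) {
            throw new IllegalArgumentException("need at least one live node");
        }
        this.liveNodes = List.copyOf(liveNodes);
    }

    // Returns the next node URL, wrapping around at the end of the list.
    String pick() {
        int i = Math.floorMod(next.getAndIncrement(), liveNodes.size());
        return liveNodes.get(i);
    }

    public static void main(String[] args) {
        // Assumed node URLs, for illustration only.
        RoundRobinSketch lb = new RoundRobinSketch(List.of(
                "http://host1:8983/solr", "http://host2:8983/solr"));
        for (int n = 0; n < 4; n++) {
            System.out.println(lb.pick());
        }
    }
}
```

A real client would also have to refresh the node list when ZooKeeper
reports a change and skip nodes that stop responding; the sketch leaves
both out.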

2013/7/12 sathish_ix <sk...@inautix.co.in>

> Hi,
>
> I am using CloudSolrServer to connect to SolrCloud, and I am indexing
> documents through the SolrJ API with a CloudSolrServer object. Indexing
> is triggered on the master node of a collection, but when I ask for the
> status of the load, the response comes from a replica, where the status
> is null. How can I find out which instance CloudSolrServer is connecting
> to?

Re: Usage of CloudSolrServer?

Posted by sathish_ix <sk...@inautix.co.in>.

Hi,

I am using CloudSolrServer to connect to SolrCloud, and I am indexing
documents through the SolrJ API with a CloudSolrServer object. Indexing is
triggered on the master node of a collection, but when I ask for the status
of the load, the response comes from a replica, where the status is null.
How can I find out which instance CloudSolrServer is connecting to?





--
View this message in context: http://lucene.472066.n3.nabble.com/Usage-of-CloudSolrServer-tp4056052p4077471.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Usage of CloudSolrServer?

Posted by Upayavira <uv...@odoko.co.uk>.

I cannot say that I have researched it, but I have always taken it to be
random.

Upayavira

On Tue, Apr 16, 2013, at 12:23 PM, Furkan KAMACI wrote:
> Thanks for your detailed explanation. However, you said:
>
> "It will then choose one of those hosts/cores for each shard, and send a
> request to them as a distributed search request." Is there any document
> that explains distributed search? What are the criteria for the choice?

Re: Usage of CloudSolrServer?

Posted by Furkan KAMACI <fu...@gmail.com>.

Thanks for your detailed explanation. However, you said:

"It will then choose one of those hosts/cores for each shard, and send a
request to them as a distributed search request." Is there any document
that explains distributed search? What are the criteria for the choice?


2013/4/16 Upayavira <uv...@odoko.co.uk>

> If you are accessing Solr from Java code, you will likely use the SolrJ
> client to do so. If your users are hitting Solr directly, you should
> think about whether this is wise - as well as providing them with direct
> search access, you are also providing them with the ability to delete
> your entire index with a single command.
>
> SolrJ isn't really a load balancer as such. When SolrJ is used to make a
> request against a collection, it will ask ZooKeeper for the names of the
> shards that make up that collection, and for the hosts/cores that make
> up the set of replicas for those shards.
>
> It will then choose one of those hosts/cores for each shard, and send a
> request to them as a distributed search request.
>
> This has the advantage over traditional load balancing that if you bring
> up a new node, that node will register itself with ZooKeeper, and thus
> your SolrJ client(s) will know about it, without any intervention.
>
> Upayavira

Re: Usage of CloudSolrServer?

Posted by Upayavira <uv...@odoko.co.uk>.

If you are accessing Solr from Java code, you will likely use the SolrJ
client to do so. If your users are hitting Solr directly, you should
think about whether this is wise - as well as providing them with direct
search access, you are also providing them with the ability to delete
your entire index with a single command.

SolrJ isn't really a load balancer as such. When SolrJ is used to make a
request against a collection, it will ask ZooKeeper for the names of the
shards that make up that collection, and for the hosts/cores that make
up the set of replicas for those shards.

It will then choose one of those hosts/cores for each shard, and send a
request to them as a distributed search request.

This has the advantage over traditional load balancing that if you bring
up a new node, that node will register itself with ZooKeeper, and thus
your SolrJ client(s) will know about it, without any intervention.
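
As a rough illustration of the "choose one host/core per shard" step, here
is a minimal sketch in plain Java. It is not SolrJ code: the class
ShardReplicaPicker, the method pickOnePerShard and the host names are all
hypothetical, the shard-to-replica map is filled in by hand instead of
being read from ZooKeeper, and picking at random is only an assumption
about how the choice is made. The comma-separated result follows the
format of Solr's "shards" request parameter.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical sketch; the real selection logic in SolrJ may differ.
final class ShardReplicaPicker {
    // Chooses one replica for every shard (at random, as an assumption)
    // and joins the choices with commas.
    static String pickOnePerShard(Map<String, List<String>> replicasByShard,
                                  Random rnd) {
        List<String> chosen = new ArrayList<>();
        for (List<String> replicas : replicasByShard.values()) {
            chosen.add(replicas.get(rnd.nextInt(replicas.size())));
        }
        return String.join(",", chosen);
    }

    public static void main(String[] args) {
        // Assumed cluster layout: two shards with two replicas each.
        Map<String, List<String>> cluster = new LinkedHashMap<>();
        cluster.put("shard1", List.of("host1:8983/solr/collection1",
                                      "host2:8983/solr/collection1"));
        cluster.put("shard2", List.of("host3:8983/solr/collection1",
                                      "host4:8983/solr/collection1"));
        System.out.println("shards=" + pickOnePerShard(cluster, new Random()));
    }
}
```

Because the map is looked up again for each request, a replica that
registers later would become a candidate without any change on the client
side, which is the advantage described above.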

Upayavira

On Tue, Apr 16, 2013, at 08:36 AM, Furkan KAMACI wrote:
> Hi Shawn,
>
> Sorry, but what kind of load balancing is that? Does it check whether
> some leaders are using a lot of CPU or RAM, for example? I think a
> problem could occur in this scenario: if some leaders receive more
> documents than others (I don't know how it is decided which shard a
> document goes to), won't there be a bottleneck on those leaders?

Re: Usage of CloudSolrServer?

Posted by Furkan KAMACI <fu...@gmail.com>.

Hi Shawn,

Sorry, but what kind of load balancing is that? Does it check whether some
leaders are using a lot of CPU or RAM, for example? I think a problem could
occur in this scenario: if some leaders receive more documents than others
(I don't know how it is decided which shard a document goes to), won't
there be a bottleneck on those leaders?


2013/4/15 Shawn Heisey <so...@elyograg.org>

> It appears that the Solr integration in Nutch currently does not use
> CloudSolrServer.  There is an issue to add it.  The mutual dependency on
> HttpClient is holding it up - Nutch uses HttpClient 3, SolrJ 4.x uses
> HttpClient 4.
>
> https://issues.apache.org/jira/browse/NUTCH-1377
>
> Until that is fixed, a load balancer would be required for full redundancy
> for updates with SolrCloud.  You don't have to use a load balancer for it
> to work, but if the Solr server that Nutch is using goes down, then
> indexing will stop unless you reconfigure Nutch or bring the Solr server
> back up.
>
> Thanks,
> Shawn
>
>

Re: Usage of CloudSolrServer?

Posted by Shawn Heisey <so...@elyograg.org>.

On 4/15/2013 8:05 AM, Furkan KAMACI wrote:
> My system is as follows: I crawl data with Nutch and send it to SolrCloud.
> Users will search against Solr.
>
> What is CloudSolrServer? Should I use it for load balancing, or is it
> something different?

It appears that the Solr integration in Nutch currently does not use 
CloudSolrServer.  There is an issue to add it.  The mutual dependency on 
HttpClient is holding it up - Nutch uses HttpClient 3, SolrJ 4.x uses 
HttpClient 4.

https://issues.apache.org/jira/browse/NUTCH-1377

Until that is fixed, a load balancer would be required for full 
redundancy for updates with SolrCloud.  You don't have to use a load 
balancer for it to work, but if the Solr server that Nutch is using goes 
down, then indexing will stop unless you reconfigure Nutch or bring the 
Solr server back up.
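
To show what a client-side fallback could look like in principle, here is
a minimal sketch in plain Java. It is not Nutch or SolrJ code: the class
FailoverSketch, the method sendWithFailover and the URLs are hypothetical,
and the actual HTTP call is replaced by a function passed in by the caller.

```java
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch of trying several servers in turn.
final class FailoverSketch {
    // Tries each server in order and returns the first successful result.
    static <T> T sendWithFailover(List<String> servers,
                                  Function<String, T> send) {
        RuntimeException last = null;
        for (String url : servers) {
            try {
                return send.apply(url);
            } catch (RuntimeException e) {
                last = e; // this server failed; try the next one
            }
        }
        throw new IllegalStateException("no server accepted the request", last);
    }

    public static void main(String[] args) {
        // Assumed URLs; the first one is treated as being down.
        List<String> servers = List.of("http://host1:8983/solr",
                                       "http://host2:8983/solr");
        String result = sendWithFailover(servers, url -> {
            if (url.contains("host1")) {
                throw new RuntimeException("connection refused");
            }
            return "indexed via " + url;
        });
        System.out.println(result);
    }
}
```

With a single configured server there is nothing to fall back to, which is
the situation described above.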

Thanks,
Shawn