Posted to solr-user@lucene.apache.org by Boban Acimovic <ba...@it-agenten.com> on 2019/02/11 11:18:30 UTC

Load balance writes

I am wondering whether I would get performance benefits if I distributed writes to Solr nodes by sending each document directly to the master of the collection where the document belongs. My idea is that this would save some load between the cluster nodes and improve performance. What is the best way to do writes? Thank you in advance.

Re: Load balance writes

Posted by Boban Acimovic <ba...@it-agenten.com>.

OK, thank you guys :)

Regards,
Boban

Re: Load balance writes

Posted by Jason Gerlowski <ge...@gmail.com>.

> On the other hand, the CloudSolrClient ignores errors from Solr, which makes it unacceptable for production use.

Did you mean "ConcurrentUpdateSolrClient"?  I don't think
CloudSolrClient does this, though I've been surprised before and
it's possible I just missed something.  Just wondering.

Jason

On Mon, Feb 11, 2019 at 2:14 PM Walter Underwood <wu...@wunderwood.org> wrote:
>
> The update router would also need to look for failures indexing at each leader,
> then re-read the cluster state to see if the leader had changed. Also re-send any
> failed updates, and so on.
>
> wunder
> Walter Underwood
> wunder@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
> > On Feb 11, 2019, at 11:07 AM, lstusr 5u93n4 <ls...@gmail.com> wrote:
> >
> > Hi Boban,
> >
> > First of all: I agree with Walter here. Because the bottleneck is during
> > indexing on the leader, a basic round robin load balancer will perform just
> > as well as a custom solution. With far less headache. A custom solution
> > will be far more work than it's worth.
> >
> > But, should you really want to write this yourself, you can get all of the
> > information you need from zookeeper, from the path:
> >
> > <zkroot>/collections/<collection_name>/state.json
> >
> > There, for each shard you'll see:
> >  - the "range" parameter that tells  you which subset of documents this
> > shard is responsible for (see
> > https://lucene.apache.org/solr/guide/7_6/shards-and-indexing-data-in-solrcloud.html#document-routing
> > for details on routing)
> >  - the list of all replicas. On each replica it will tell you:
> >      - the host name (base_url)
> >      - if it is the leader (has the property leader: true)
> >
> > So your go-based solution would be to watch the state.json file from
> > zookeeper, and build up a function that, given the proper routing structure
> > for your document (the hash of the id by default, I think) will return the
> > hostname of the replica that's the leader.
> >
> > Kyle
> >
> > On Mon, 11 Feb 2019 at 13:30, Boban Acimovic <ba...@it-agenten.com> wrote:
> >
> >> Like I said before, nginx is not a load balancer or at least not a clever
> >> load balancer. It does not talk to ZK. Please give me advanced solutions.
> >>
> >>
> >>
> >>
> >>> On 11. Feb 2019, at 18:32, Walter Underwood <wu...@wunderwood.org>
> >> wrote:
> >>>
> >>> I haven’t used Kubernetes, but a web search for “helm nginx” seems to
> >> give some useful pages.
> >>>
> >>> wunder
> >>> Walter Underwood
> >>> wunder@wunderwood.org
> >>> http://observer.wunderwood.org/  (my blog)
> >>>
> >>>> On Feb 11, 2019, at 9:13 AM, Davis, Daniel (NIH/NLM) [C] <
> >> daniel.davis@nih.gov> wrote:
> >>>>
> >>>> I think that the container orchestration framework takes care of that
> >> for you, but I am not an expert.  In Kubernetes, NGINX is often the Ingress
> >> controller, and as long as the services are running within the Kubernetes
> >> cluster, it can also serve as a load balancer, AFAICT.   In Kubernetes, a
> >> "Load Balancer" appears to be a concept for accessing services outside the
> >> cluster.
> >>>>
> >>>> I presume you are using Kubernetes because of your reference to helm,
> >> but for what it's worth, here's an official haproxy image -
> >> https://hub.docker.com/_/haproxy
> >>
>

Re: Load balance writes

Posted by Walter Underwood <wu...@wunderwood.org>.

The update router would also need to look for failures indexing at each leader,
then re-read the cluster state to see if the leader had changed. Also re-send any
failed updates, and so on.

wunder
Walter Underwood
wunder@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Feb 11, 2019, at 11:07 AM, lstusr 5u93n4 <ls...@gmail.com> wrote:
> 
> Hi Boban,
> 
> First of all: I agree with Walter here. Because the bottleneck is during
> indexing on the leader, a basic round robin load balancer will perform just
> as well as a custom solution. With far less headache. A custom solution
> will be far more work than it's worth.
> 
> But, should you really want to write this yourself, you can get all of the
> information you need from zookeeper, from the path:
> 
> <zkroot>/collections/<collection_name>/state.json
> 
> There, for each shard you'll see:
>  - the "range" parameter that tells  you which subset of documents this
> shard is responsible for (see
> https://lucene.apache.org/solr/guide/7_6/shards-and-indexing-data-in-solrcloud.html#document-routing
> for details on routing)
>  - the list of all replicas. On each replica it will tell you:
>      - the host name (base_url)
>      - if it is the leader (has the property leader: true)
> 
> So your go-based solution would be to watch the state.json file from
> zookeeper, and build up a function that, given the proper routing structure
> for your document (the hash of the id by default, I think) will return the
> hostname of the replica that's the leader.
> 
> Kyle
> 
> On Mon, 11 Feb 2019 at 13:30, Boban Acimovic <ba...@it-agenten.com> wrote:
> 
>> Like I said before, nginx is not a load balancer or at least not a clever
>> load balancer. It does not talk to ZK. Please give me advanced solutions.
>> 
>> 
>> 
>> 
>>> On 11. Feb 2019, at 18:32, Walter Underwood <wu...@wunderwood.org>
>> wrote:
>>> 
>>> I haven’t used Kubernetes, but a web search for “helm nginx” seems to
>> give some useful pages.
>>> 
>>> wunder
>>> Walter Underwood
>>> wunder@wunderwood.org
>>> http://observer.wunderwood.org/  (my blog)
>>> 
>>>> On Feb 11, 2019, at 9:13 AM, Davis, Daniel (NIH/NLM) [C] <
>> daniel.davis@nih.gov> wrote:
>>>> 
>>>> I think that the container orchestration framework takes care of that
>> for you, but I am not an expert.  In Kubernetes, NGINX is often the Ingress
>> controller, and as long as the services are running within the Kubernetes
>> cluster, it can also serve as a load balancer, AFAICT.   In Kubernetes, a
>> "Load Balancer" appears to be a concept for accessing services outside the
>> cluster.
>>>> 
>>>> I presume you are using Kubernetes because of your reference to helm,
>> but for what it's worth, here's an official haproxy image -
>> https://hub.docker.com/_/haproxy
>>

Re: Load balance writes

Posted by lstusr 5u93n4 <ls...@gmail.com>.

Hi Boban,

First of all: I agree with Walter here. Because the bottleneck is during
indexing on the leader, a basic round robin load balancer will perform just
as well as a custom solution. With far less headache. A custom solution
will be far more work than it's worth.
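For a home-grown Go client, basic round robin over the nodes can be as small as the sketch below. The roundRobin type and the node URLs are made up for illustration; nothing here talks to ZooKeeper or checks node health.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// roundRobin hands out node URLs in turn and is safe for concurrent use.
type roundRobin struct {
	nodes []string
	next  uint32
}

// pick returns the next node, or "" when no nodes are configured.
func (r *roundRobin) pick() string {
	if len(r.nodes) == 0 {
		return ""
	}
	n := atomic.AddUint32(&r.next, 1) - 1
	return r.nodes[int(n%uint32(len(r.nodes)))]
}

func main() {
	// Hypothetical node URLs; substitute the real Solr hosts.
	rr := &roundRobin{nodes: []string{
		"http://solr-0:8983/solr",
		"http://solr-1:8983/solr",
		"http://solr-2:8983/solr",
	}}
	for i := 0; i < 4; i++ {
		fmt.Println(rr.pick())
	}
}
```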

But, should you really want to write this yourself, you can get all of the
information you need from zookeeper, from the path:

<zkroot>/collections/<collection_name>/state.json

There, for each shard you'll see:
  - the "range" parameter that tells  you which subset of documents this
shard is responsible for (see
https://lucene.apache.org/solr/guide/7_6/shards-and-indexing-data-in-solrcloud.html#document-routing
for details on routing)
  - the list of all replicas. On each replica it will tell you:
      - the host name (base_url)
      - if it is the leader (has the property leader: true)
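As a rough Go sketch, the fields listed above could be pulled out of state.json like this. The struct layout and the sample document are assumptions based on what a Solr 7.x state.json typically looks like (the leader flag is usually the string "true"), so check them against the real node in ZooKeeper. Fetching the bytes from ZooKeeper is left out.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Only the fields discussed above are mapped; state.json carries more.
type replica struct {
	BaseURL string      `json:"base_url"`
	Leader  interface{} `json:"leader"` // absent on non-leaders
}

type shard struct {
	Range    string             `json:"range"`
	Replicas map[string]replica `json:"replicas"`
}

type collection struct {
	Shards map[string]shard `json:"shards"`
}

// Hypothetical sample, trimmed to the relevant fields.
const sampleState = `{"products": {"shards": {
  "shard1": {"range": "80000000-ffffffff", "replicas": {
    "core_node1": {"base_url": "http://solr-0:8983/solr"},
    "core_node2": {"base_url": "http://solr-1:8983/solr", "leader": "true"}}},
  "shard2": {"range": "0-7fffffff", "replicas": {
    "core_node3": {"base_url": "http://solr-2:8983/solr", "leader": "true"}}}}}}`

// leaders maps each shard name to the base_url of its leader replica.
func leaders(stateJSON []byte, name string) (map[string]string, error) {
	var state map[string]collection
	if err := json.Unmarshal(stateJSON, &state); err != nil {
		return nil, err
	}
	coll, ok := state[name]
	if !ok {
		return nil, fmt.Errorf("collection %q not found in state.json", name)
	}
	out := make(map[string]string)
	for shardName, s := range coll.Shards {
		for _, r := range s.Replicas {
			// Accept both "true" (string) and true (bool).
			if fmt.Sprint(r.Leader) == "true" {
				out[shardName] = r.BaseURL
			}
		}
	}
	return out, nil
}

func main() {
	m, err := leaders([]byte(sampleState), "products")
	if err != nil {
		panic(err)
	}
	for shardName, url := range m {
		fmt.Println(shardName, "->", url)
	}
}
```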

So your Go-based solution would be to watch the state.json file from
zookeeper, and build up a function that, given the proper routing structure
for your document (the hash of the id by default, I think) will return the
hostname of the replica that's the leader.
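A sketch of such a function in Go, under some loud assumptions: the collection uses the default compositeId router, the id has no "!" prefix, and the hash is MurmurHash3 (x86, 32-bit, seed 0) over the UTF-8 bytes of the id, compared as a signed 32-bit value against each shard's range. Verify all of that against the routing documentation and a real cluster before relying on it. The names hashRange, parseRange and shardFor are made up for this example.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math/bits"
	"strconv"
	"strings"
)

// hashRange is one shard's slice of the signed 32-bit hash space.
type hashRange struct {
	shard    string
	min, max int32
}

// parseRange converts a state.json range such as "80000000-ffffffff".
func parseRange(shard, r string) (hashRange, error) {
	parts := strings.SplitN(r, "-", 2)
	if len(parts) != 2 {
		return hashRange{}, fmt.Errorf("bad range %q", r)
	}
	lo, err := strconv.ParseUint(parts[0], 16, 32)
	if err != nil {
		return hashRange{}, err
	}
	hi, err := strconv.ParseUint(parts[1], 16, 32)
	if err != nil {
		return hashRange{}, err
	}
	// The hex values are two's complement, so reinterpret them as int32.
	return hashRange{shard, int32(uint32(lo)), int32(uint32(hi))}, nil
}

// murmur3 computes MurmurHash3 x86 32-bit over data.
func murmur3(data []byte, seed uint32) uint32 {
	const c1, c2 = 0xcc9e2d51, 0x1b873593
	h, n := seed, len(data)
	for i := 0; i+4 <= n; i += 4 {
		k := binary.LittleEndian.Uint32(data[i:])
		k *= c1
		k = bits.RotateLeft32(k, 15)
		k *= c2
		h ^= k
		h = bits.RotateLeft32(h, 13)
		h = h*5 + 0xe6546b64
	}
	var k uint32
	tail := data[n&^3:]
	switch len(tail) {
	case 3:
		k ^= uint32(tail[2]) << 16
		fallthrough
	case 2:
		k ^= uint32(tail[1]) << 8
		fallthrough
	case 1:
		k ^= uint32(tail[0])
		k *= c1
		k = bits.RotateLeft32(k, 15)
		k *= c2
		h ^= k
	}
	h ^= uint32(n)
	h ^= h >> 16
	h *= 0x85ebca6b
	h ^= h >> 13
	h *= 0xc2b2ae35
	h ^= h >> 16
	return h
}

// shardFor returns the shard whose range contains the hash of id.
func shardFor(id string, ranges []hashRange) (string, bool) {
	h := int32(murmur3([]byte(id), 0))
	for _, r := range ranges {
		if r.min <= h && h <= r.max {
			return r.shard, true
		}
	}
	return "", false
}

func main() {
	r1, _ := parseRange("shard1", "80000000-ffffffff")
	r2, _ := parseRange("shard2", "0-7fffffff")
	for _, id := range []string{"hello", "doc-42"} {
		s, _ := shardFor(id, []hashRange{r1, r2})
		fmt.Printf("%s -> %s\n", id, s)
	}
}
```

Combining the shard name with the leader's base_url from state.json gives the host to send the document to.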

Kyle

On Mon, 11 Feb 2019 at 13:30, Boban Acimovic <ba...@it-agenten.com> wrote:

> Like I said before, nginx is not a load balancer or at least not a clever
> load balancer. It does not talk to ZK. Please give me advanced solutions.
>
>
>
>
> > On 11. Feb 2019, at 18:32, Walter Underwood <wu...@wunderwood.org>
> wrote:
> >
> > I haven’t used Kubernetes, but a web search for “helm nginx” seems to
> give some useful pages.
> >
> > wunder
> > Walter Underwood
> > wunder@wunderwood.org
> > http://observer.wunderwood.org/  (my blog)
> >
> >> On Feb 11, 2019, at 9:13 AM, Davis, Daniel (NIH/NLM) [C] <
> daniel.davis@nih.gov> wrote:
> >>
> >> I think that the container orchestration framework takes care of that
> for you, but I am not an expert.  In Kubernetes, NGINX is often the Ingress
> controller, and as long as the services are running within the Kubernetes
> cluster, it can also serve as a load balancer, AFAICT.   In Kubernetes, a
> "Load Balancer" appears to be a concept for accessing services outside the
> cluster.
> >>
> >> I presume you are using Kubernetes because of your reference to helm,
> but for what it's worth, here's an official haproxy image -
> https://hub.docker.com/_/haproxy
>

Re: Load balance writes

Posted by Walter Underwood <wu...@wunderwood.org>.

For the fourth time, ignore the shard leaders until you have measurements that prove the complexity is worth it.

We can index a million documents per minute by sending batched updates to a dumb load balancer.
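A sketch of what "batched updates" could look like from a Go client. It assumes the JSON update handler at /solr/<collection>/update, which accepts an array of documents. The load balancer URL, the collection name and the batch size are invented for the example, and the network call only happens if you wire postBatch in yourself.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// Doc is one Solr document as field name -> value.
type Doc map[string]interface{}

// batches splits docs into slices of at most size documents.
func batches(docs []Doc, size int) [][]Doc {
	if size < 1 {
		size = 1
	}
	var out [][]Doc
	for start := 0; start < len(docs); start += size {
		end := start + size
		if end > len(docs) {
			end = len(docs)
		}
		out = append(out, docs[start:end])
	}
	return out
}

// postBatch sends one batch as a JSON array and fails on any non-200
// response, so that errors from Solr are not silently dropped.
func postBatch(client *http.Client, updateURL string, batch []Doc) error {
	body, err := json.Marshal(batch)
	if err != nil {
		return err
	}
	resp, err := client.Post(updateURL, "application/json", bytes.NewReader(body))
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("update failed: %s", resp.Status)
	}
	return nil
}

func main() {
	var docs []Doc
	for i := 0; i < 2500; i++ {
		docs = append(docs, Doc{"id": fmt.Sprintf("doc-%d", i)})
	}
	// Hypothetical endpoint behind the load balancer:
	//   http://solr-lb:8983/solr/products/update
	for i, b := range batches(docs, 1000) {
		fmt.Printf("batch %d: %d docs\n", i, len(b))
	}
}
```

Running several goroutines that each call postBatch would cover the "multiple threads" part of the advice.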

wunder
Walter Underwood
wunder@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Feb 11, 2019, at 10:29 AM, Boban Acimovic <ba...@it-agenten.com> wrote:
> 
> Like I said before, nginx is not a load balancer or at least not a clever load balancer. It does not talk to ZK. Please give me advanced solutions.
> 
> 
> 
> 
>> On 11. Feb 2019, at 18:32, Walter Underwood <wu...@wunderwood.org> wrote:
>> 
>> I haven’t used Kubernetes, but a web search for “helm nginx” seems to give some useful pages.
>> 
>> wunder
>> Walter Underwood
>> wunder@wunderwood.org
>> http://observer.wunderwood.org/  (my blog)
>> 
>>> On Feb 11, 2019, at 9:13 AM, Davis, Daniel (NIH/NLM) [C] <da...@nih.gov> wrote:
>>> 
>>> I think that the container orchestration framework takes care of that for you, but I am not an expert.  In Kubernetes, NGINX is often the Ingress controller, and as long as the services are running within the Kubernetes cluster, it can also serve as a load balancer, AFAICT.   In Kubernetes, a "Load Balancer" appears to be a concept for accessing services outside the cluster.
>>> 
>>> I presume you are using Kubernetes because of your reference to helm, but for what it's worth, here's an official haproxy image - https://hub.docker.com/_/haproxy

Re: Load balance writes

Posted by Boban Acimovic <ba...@it-agenten.com>.

Like I said before, nginx is not a load balancer or at least not a clever load balancer. It does not talk to ZK. Please give me advanced solutions.




> On 11. Feb 2019, at 18:32, Walter Underwood <wu...@wunderwood.org> wrote:
> 
> I haven’t used Kubernetes, but a web search for “helm nginx” seems to give some useful pages.
> 
> wunder
> Walter Underwood
> wunder@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
> 
>> On Feb 11, 2019, at 9:13 AM, Davis, Daniel (NIH/NLM) [C] <da...@nih.gov> wrote:
>> 
>> I think that the container orchestration framework takes care of that for you, but I am not an expert.  In Kubernetes, NGINX is often the Ingress controller, and as long as the services are running within the Kubernetes cluster, it can also serve as a load balancer, AFAICT.   In Kubernetes, a "Load Balancer" appears to be a concept for accessing services outside the cluster.
>> 
>> I presume you are using Kubernetes because of your reference to helm, but for what it's worth, here's an official haproxy image - https://hub.docker.com/_/haproxy

Re: Load balance writes

Posted by Walter Underwood <wu...@wunderwood.org>.

I haven’t used Kubernetes, but a web search for “helm nginx” seems to give some useful pages.

wunder
Walter Underwood
wunder@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Feb 11, 2019, at 9:13 AM, Davis, Daniel (NIH/NLM) [C] <da...@nih.gov> wrote:
> 
> I think that the container orchestration framework takes care of that for you, but I am not an expert.  In Kubernetes, NGINX is often the Ingress controller, and as long as the services are running within the Kubernetes cluster, it can also serve as a load balancer, AFAICT.   In Kubernetes, a "Load Balancer" appears to be a concept for accessing services outside the cluster.
> 
> I presume you are using Kubernetes because of your reference to helm, but for what it's worth, here's an official haproxy image - https://hub.docker.com/_/haproxy
> 
>> -----Original Message-----
>> From: Boban Acimovic <ba...@it-agenten.com>
>> Sent: Monday, February 11, 2019 11:58 AM
>> To: solr-user@lucene.apache.org
>> Subject: Re: Load balance writes
>> 
>> Can you mention one dockerized load balancer? Or even better one with
>> Helm chart?
>> 
>> 
>> Like I said, I send all updates at the moment just to one out of 12 nodes.
>> 
>> 
>> 
>> 
>>> On 11. Feb 2019, at 17:52, Walter Underwood
>> <wu...@wunderwood.org> wrote:
>>> 
>>> Why would you want to write a load balancer when there are so many that
>> are free and very fast?
>>> 
>>> For update traffic, there is very little benefit in sending updates directly to
>> the shard leader. Forwarding an update to the leader is fast. Indexing is slow.
>> So the bottleneck is always at the leader.
>>> 
>>> Before you build anything, measure. Collect a large update and send that
>> directly to the leader. Then do the same to a non-leader shard. Compare the
>> speed. If you are batching and indexing with multiple threads, I doubt you’ll
>> see a meaningful difference. I commonly see 10% difference in identical load
>> benchmarks, so the speedup has to be much larger than that to be real.
>>> 
>>> wunder
>>> Walter Underwood
>>> wunder@wunderwood.org
>>> http://observer.wunderwood.org/  (my blog)

Re: Load balance writes

Posted by Boban Acimovic <ba...@it-agenten.com>.

But like I said in the previous message, nginx is not aware of the status of Solr nodes. I can easily write a Go load balancer, but not one that takes the shards into account. The only problem I have here is how to figure out which shard master is responsible for a document I want to insert into the index. How does Solr sharding work? Which values are used to determine the shard?




> On 11. Feb 2019, at 18:13, Davis, Daniel (NIH/NLM) [C] <da...@nih.gov> wrote:
> 
> I think that the container orchestration framework takes care of that for you, but I am not an expert.  In Kubernetes, NGINX is often the Ingress controller, and as long as the services are running within the Kubernetes cluster, it can also serve as a load balancer, AFAICT.   In Kubernetes, a "Load Balancer" appears to be a concept for accessing services outside the cluster.
> 
> I presume you are using Kubernetes because of your reference to helm, but for what it's worth, here's an official haproxy image - https://hub.docker.com/_/haproxy
> 
>> -----Original Message-----
>> From: Boban Acimovic <ba...@it-agenten.com>
>> Sent: Monday, February 11, 2019 11:58 AM
>> To: solr-user@lucene.apache.org
>> Subject: Re: Load balance writes
>> 
>> Can you mention one dockerized load balancer? Or even better one with
>> Helm chart?
>> 
>> 
>> Like I said, I send all updates at the moment just to one out of 12 nodes.
>> 
>> 
>> 
>> 
>>> On 11. Feb 2019, at 17:52, Walter Underwood
>> <wu...@wunderwood.org> wrote:
>>> 
>>> Why would you want to write a load balancer when there are so many that
>> are free and very fast?
>>> 
>>> For update traffic, there is very little benefit in sending updates directly to
>> the shard leader. Forwarding an update to the leader is fast. Indexing is slow.
>> So the bottleneck is always at the leader.
>>> 
>>> Before you build anything, measure. Collect a large update and send that
>> directly to the leader. Then do the same to a non-leader shard. Compare the
>> speed. If you are batching and indexing with multiple threads, I doubt you’ll
>> see a meaningful difference. I commonly see 10% difference in identical load
>> benchmarks, so the speedup has to be much larger than that to be real.
>>> 
>>> wunder
>>> Walter Underwood
>>> wunder@wunderwood.org
>>> http://observer.wunderwood.org/  (my blog)

RE: Load balance writes

Posted by "Davis, Daniel (NIH/NLM) [C]" <da...@nih.gov>.

I think that the container orchestration framework takes care of that for you, but I am not an expert.  In Kubernetes, NGINX is often the Ingress controller, and as long as the services are running within the Kubernetes cluster, it can also serve as a load balancer, AFAICT.   In Kubernetes, a "Load Balancer" appears to be a concept for accessing services outside the cluster.

I presume you are using Kubernetes because of your reference to helm, but for what it's worth, here's an official haproxy image - https://hub.docker.com/_/haproxy

> -----Original Message-----
> From: Boban Acimovic <ba...@it-agenten.com>
> Sent: Monday, February 11, 2019 11:58 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Load balance writes
> 
> Can you mention one dockerized load balancer? Or even better one with
> Helm chart?
> 
> 
> Like I said, I send all updates at the moment just to one out of 12 nodes.
> 
> 
> 
> 
> > On 11. Feb 2019, at 17:52, Walter Underwood
> <wu...@wunderwood.org> wrote:
> >
> > Why would you want to write a load balancer when there are so many that
> are free and very fast?
> >
> > For update traffic, there is very little benefit in sending updates directly to
> the shard leader. Forwarding an update to the leader is fast. Indexing is slow.
> So the bottleneck is always at the leader.
> >
> > Before you build anything, measure. Collect a large update and send that
> directly to the leader. Then do the same to a non-leader shard. Compare the
> speed. If you are batching and indexing with multiple threads, I doubt you’ll
> see a meaningful difference. I commonly see 10% difference in identical load
> benchmarks, so the speedup has to be much larger than that to be real.
> >
> > wunder
> > Walter Underwood
> > wunder@wunderwood.org
> > http://observer.wunderwood.org/  (my blog)

Re: Load balance writes

Posted by Boban Acimovic <ba...@it-agenten.com>.

This is naive load balancing because it is not aware of ZK.




> On 11. Feb 2019, at 18:05, Walter Underwood <wu...@wunderwood.org> wrote:
> 
> nginx
> 
> http://nginx.org/en/docs/http/load_balancing.html
> https://hub.docker.com/_/nginx
> 
> We run in Amazon AWS, so we use their Application Load Balancer (ALB). We do use nginx for other things.
> 
> wunder
> Walter Underwood
> wunder@wunderwood.org
> http://observer.wunderwood.org/  (my blog)

Re: Load balance writes

Posted by Walter Underwood <wu...@wunderwood.org>.

nginx

http://nginx.org/en/docs/http/load_balancing.html
https://hub.docker.com/_/nginx

We run in Amazon AWS, so we use their Application Load Balancer (ALB). We do use nginx for other things.

wunder
Walter Underwood
wunder@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Feb 11, 2019, at 8:57 AM, Boban Acimovic <ba...@it-agenten.com> wrote:
> 
> Can you mention one dockerized load balancer? Or even better one with Helm chart?
> 
> 
> Like I said, I send all updates at the moment just to one out of 12 nodes.
> 
>> On 11. Feb 2019, at 17:52, Walter Underwood <wu...@wunderwood.org> wrote:
>> 
>> Why would you want to write a load balancer when there are so many that are free and very fast?
>> 
>> For update traffic, there is very little benefit in sending updates directly to the shard leader. Forwarding an update to the leader is fast. Indexing is slow. So the bottleneck is always at the leader.
>> 
>> Before you build anything, measure. Collect a large update and send that directly to the leader. Then do the same to a non-leader shard. Compare the speed. If you are batching and indexing with multiple threads, I doubt you’ll see a meaningful difference. I commonly see 10% difference in identical load benchmarks, so the speedup has to be much larger than that to be real.
>> 
>> wunder
>> Walter Underwood
>> wunder@wunderwood.org
>> http://observer.wunderwood.org/  (my blog)

Re: Load balance writes

Posted by Boban Acimovic <ba...@it-agenten.com>.

Can you mention one dockerized load balancer? Or even better, one with a Helm chart?


Like I said, I send all updates at the moment just to one out of 12 nodes.




> On 11. Feb 2019, at 17:52, Walter Underwood <wu...@wunderwood.org> wrote:
> 
> Why would you want to write a load balancer when there are so many that are free and very fast?
> 
> For update traffic, there is very little benefit in sending updates directly to the shard leader. Forwarding an update to the leader is fast. Indexing is slow. So the bottleneck is always at the leader.
> 
> Before you build anything, measure. Collect a large update and send that directly to the leader. Then do the same to a non-leader shard. Compare the speed. If you are batching and indexing with multiple threads, I doubt you’ll see a meaningful difference. I commonly see 10% difference in identical load benchmarks, so the speedup has to be much larger than that to be real.
> 
> wunder
> Walter Underwood
> wunder@wunderwood.org
> http://observer.wunderwood.org/  (my blog)

Re: Load balance writes

Posted by Walter Underwood <wu...@wunderwood.org>.

Why would you want to write a load balancer when there are so many that are free and very fast?

For update traffic, there is very little benefit in sending updates directly to the shard leader. Forwarding an update to the leader is fast. Indexing is slow. So the bottleneck is always at the leader.

Before you build anything, measure. Collect a large update and send that directly to the leader. Then do the same to a non-leader shard. Compare the speed. If you are batching and indexing with multiple threads, I doubt you’ll see a meaningful difference. I commonly see 10% difference in identical load benchmarks, so the speedup has to be much larger than that to be real.

wunder
Walter Underwood
wunder@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Feb 11, 2019, at 8:38 AM, Boban Acimovic <ba...@it-agenten.com> wrote:
> 
> I would actually like to write a load balancer itself, but I want it to be able to send the data as efficiently as possible. I know how to read ZK data, but I don’t know how can I figure out which shard is responsible upon data that I have in a document that I want to index.
> 
> 
> 
> 
>> On 11. Feb 2019, at 17:23, Walter Underwood <wu...@wunderwood.org> wrote:
>> 
>> We send all updates to the load balancer, so they’ll end up on the wrong shard, not on the leader, etc. Indexing speed is still limited by the CPU available on each leader. I don’t think that sending the update to the right leader makes any improvement in throughput.
>> 
>> On the other hand, the CloudSolrClient ignores errors from Solr, which makes it unacceptable for production use.
>> 
>> I would stay with your current indexing client and worry about something else.
>> 
>> wunder
>> Walter Underwood
>> wunder@wunderwood.org
>> http://observer.wunderwood.org/  (my blog)

Re: Load balance writes

Posted by Boban Acimovic <ba...@it-agenten.com>.

I would actually like to write the load balancer myself, but I want it to be able to send the data as efficiently as possible. I know how to read ZK data, but I don't know how to figure out which shard is responsible, based on the data I have in a document that I want to index.




> On 11. Feb 2019, at 17:23, Walter Underwood <wu...@wunderwood.org> wrote:
> 
> We send all updates to the load balancer, so they’ll end up on the wrong shard, not on the leader, etc. Indexing speed is still limited by the CPU available on each leader. I don’t think that sending the update to the right leader makes any improvement in throughput.
> 
> On the other hand, the CloudSolrClient ignores errors from Solr, which makes it unacceptable for production use.
> 
> I would stay with your current indexing client and worry about something else.
> 
> wunder
> Walter Underwood
> wunder@wunderwood.org
> http://observer.wunderwood.org/  (my blog)

Re: Load balance writes

Posted by Walter Underwood <wu...@wunderwood.org>.

We send all updates to the load balancer, so they’ll end up on the wrong shard, not on the leader, etc. Indexing speed is still limited by the CPU available on each leader. I don’t think that sending the update to the right leader makes any improvement in throughput.

On the other hand, the CloudSolrClient ignores errors from Solr, which makes it unacceptable for production use.

I would stay with your current indexing client and worry about something else.

wunder
Walter Underwood
wunder@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Feb 11, 2019, at 8:15 AM, Emir Arnautović <em...@sematext.com> wrote:
> 
> Hi Boban,
> Not sure if there is Solrj port to Go, but you can take that as model to build your ZK aware client that groups and sends updates to shard leaders. I see that there are couple of Solr Go clients, so you might first check if some already supports it or if it makes sense that you contribute that part to one of your choice.
> 
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
> 
> 
> 
>> On 11 Feb 2019, at 16:09, Boban Acimovic <ba...@it-agenten.com> wrote:
>> 
>> Thank you Emir for quick reply. I use home brewed Go client and write just to one of 12 available nodes. I believe I should find out this smart way to handle this :)
>> 
>> 
>> 
>> 
>>> On 11. Feb 2019, at 15:21, Emir Arnautović <em...@sematext.com> wrote:
>>> 
>>> Hi Boban,
>>> If you use SolrCloud  Solrj client and initialise it with ZK, it should be aware of masters and send documents in a smart way.
>>> 
>>> HTH,
>>> Emir
>>> --
>>> Monitoring - Log Management - Alerting - Anomaly Detection
>>> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>> 
>

Re: Load balance writes

Posted by Boban Acimovic <ba...@it-agenten.com>.

Thank you again Emir. I can make my code ZK aware, that is no problem, but I can't make it shard leader aware.  Can you point me to a document on how Solr shards are created?  I already use ZK to get stuff, but I don't understand how to distinguish between shards using the information I can get from a document that has to be indexed.

At the moment I send everything to one node, but I am pretty sure it would help to send data to the collection nodes. However, it would be even better if I could send data directly to the shard leader. If you can't describe this easily, I will check the SolrJ implementation.

Regards,
Boban

> On 11. Feb 2019, at 17:15, Emir Arnautović <em...@sematext.com> wrote:
> 
> Hi Boban,
> Not sure if there is Solrj port to Go, but you can take that as model to build your ZK aware client that groups and sends updates to shard leaders. I see that there are couple of Solr Go clients, so you might first check if some already supports it or if it makes sense that you contribute that part to one of your choice.
> 
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/

Re: Load balance writes

Posted by Emir Arnautović <em...@sematext.com>.

Hi Boban,
Not sure if there is a SolrJ port to Go, but you can take it as a model to build your ZK-aware client that groups and sends updates to shard leaders. I see that there are a couple of Solr Go clients, so you might first check if one already supports it, or whether it makes sense to contribute that part to one of your choice.

Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 11 Feb 2019, at 16:09, Boban Acimovic <ba...@it-agenten.com> wrote:
> 
> Thank you Emir for quick reply. I use home brewed Go client and write just to one of 12 available nodes. I believe I should find out this smart way to handle this :)
> 
> 
> 
> 
>> On 11. Feb 2019, at 15:21, Emir Arnautović <em...@sematext.com> wrote:
>> 
>> Hi Boban,
>> If you use SolrCloud  Solrj client and initialise it with ZK, it should be aware of masters and send documents in a smart way.
>> 
>> HTH,
>> Emir
>> --
>> Monitoring - Log Management - Alerting - Anomaly Detection
>> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>

Re: Load balance writes

Posted by Boban Acimovic <ba...@it-agenten.com>.

Thank you Emir for the quick reply. I use a home-brewed Go client and write to just one of the 12 available nodes. I believe I should find out this smart way to handle this :)

> On 11. Feb 2019, at 15:21, Emir Arnautović <em...@sematext.com> wrote:
> 
> Hi Boban,
> If you use SolrCloud  Solrj client and initialise it with ZK, it should be aware of masters and send documents in a smart way.
> 
> HTH,
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/

Re: Load balance writes

Posted by Emir Arnautović <em...@sematext.com>.

Hi Boban,
If you use the SolrCloud SolrJ client and initialise it with ZK, it should be aware of masters and send documents in a smart way.

HTH,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 11 Feb 2019, at 12:18, Boban Acimovic <ba...@it-agenten.com> wrote:
> 
> I am wondering would I get performance benefits if I distribute writes to Solr nodes by sending documents exactly to the master of collection where the document belongs? My idea is that this would save some load between the cluster nodes and improve performances. How to do writes in the best way? Thank you in advance.