Posted to dev@lucene.apache.org by Varun Thacker <va...@gmail.com> on 2016/06/16 17:14:39 UTC

NoHttpResponseException error between leader and replica

When running a bulk indexing process, we occasionally see a
NoHttpResponseException when the leader is forwarding docs to the
replica. I think this is a known issue and can be reproduced pretty easily.

What makes me want to dig deeper is that a single such
NoHttpResponseException causes the leader to put the replica into recovery. The
replica can never catch up because the indexing throughput is quite high.
This can add hours of recovery time for the replica, depending on how many
documents are being indexed.

So as far as I can tell, we have two options here:
1. Implement a thread which removes stale connections. This has been
discussed on https://issues.apache.org/jira/browse/SOLR-4509 in the past.
2. The above solution is not the right way forward. The main problem here
is that replicas can't catch up because Solr doesn't implement backpressure
yet, and implementing that would be the correct solution here.
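
To illustrate what option 2 means in general terms, here is a minimal Java
sketch of backpressure. It assumes nothing about Solr's internals:
BoundedForwarder and its methods are hypothetical names. The idea is simply
to cap the number of in-flight forwarded batches, so a sender is told to slow
down when the replica is saturated instead of queueing more work.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration only; this is not Solr code.
final class BoundedForwarder {
    private final Semaphore permits;

    BoundedForwarder(int maxInFlight) {
        this.permits = new Semaphore(maxInFlight);
    }

    /**
     * Runs the send action if an in-flight slot frees up within the timeout.
     * Returns false without sending when the downstream is saturated; that
     * return value is the backpressure signal the caller should react to.
     */
    boolean trySend(Runnable send, long timeout, TimeUnit unit) {
        boolean acquired;
        try {
            acquired = permits.tryAcquire(timeout, unit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            return false;
        }
        if (!acquired) {
            return false; // push back on the caller
        }
        try {
            send.run();
            return true;
        } finally {
            permits.release(); // free the slot even if send throws
        }
    }

    int availableSlots() {
        return permits.availablePermits();
    }
}
```

A caller that gets false back would pause or retry later rather than keep
sending, which is what lets a slow replica catch up.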

Does anyone have an opinion on how we should go forward with this issue?



-- 


Regards,
Varun Thacker

Re: NoHttpResponseException error between leader and replica

Posted by Mark Miller <ma...@gmail.com>.

Correction: you can't do it well enough *without* built-in support.

On Fri, Jun 17, 2016 at 9:53 AM Mark Miller <ma...@gmail.com> wrote:

> No, you can't do it well enough with built-in support. You can follow that
> ticket and see that is how I started. If it's a big enough issue, we should
> backport that to 6.x and deal with the back-compat breaks.
-- 
- Mark
about.me/markrmiller

Re: NoHttpResponseException error between leader and replica

Posted by Mark Miller <ma...@gmail.com>.

No, you can't do it well enough with built-in support. You can follow that
ticket and see that is how I started. If it's a big enough issue, we should
backport that to 6.x and deal with the back-compat breaks.

On Fri, Jun 17, 2016 at 8:11 AM Varun Thacker <va...@gmail.com>
wrote:

> Hi Mark,
>
> So for the 6.x line, do you think we should add a background thread which
> closes idle and expired connections? Or do you have any other
> recommendations?
-- 
- Mark
about.me/markrmiller

Re: NoHttpResponseException error between leader and replica

Posted by Varun Thacker <va...@gmail.com>.

Hi Mark,

So for the 6.x line, do you think we should add a background thread which
closes idle and expired connections? Or do you have any other
recommendations?
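
For reference, a background thread of this kind could look roughly like the
sketch below. EvictablePool is a hypothetical stand-in interface so the
example is self-contained; its two methods mirror closeExpiredConnections()
and closeIdleConnections(long, TimeUnit) on Apache HttpClient 4.x connection
managers. The class name, period, and idle threshold are illustrative
assumptions, not recommendations.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a pooled HTTP connection manager.
interface EvictablePool {
    void closeExpiredConnections();
    void closeIdleConnections(long idleTime, TimeUnit unit);
}

// Periodically sweeps the pool on a single daemon thread.
final class IdleConnectionEvictor implements AutoCloseable {
    private final EvictablePool pool;
    private final long maxIdleMs;
    private final ScheduledExecutorService scheduler;

    IdleConnectionEvictor(EvictablePool pool, long periodMs, long maxIdleMs) {
        this.pool = pool;
        this.maxIdleMs = maxIdleMs;
        this.scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "idle-connection-evictor");
            t.setDaemon(true); // do not keep the JVM alive for housekeeping
            return t;
        });
        this.scheduler.scheduleWithFixedDelay(() -> {
            try {
                sweepOnce();
            } catch (RuntimeException e) {
                // Swallow so one failed sweep does not cancel future sweeps.
            }
        }, periodMs, periodMs, TimeUnit.MILLISECONDS);
    }

    // One sweep: drop connections past their keep-alive, then long-idle ones.
    void sweepOnce() {
        pool.closeExpiredConnections();
        pool.closeIdleConnections(maxIdleMs, TimeUnit.MILLISECONDS);
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```

Note that a sweep like this only narrows the window in which a stale
connection can be picked up; a connection can still be closed by the other
side between two sweeps.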

On Fri, Jun 17, 2016 at 5:25 PM, Mark Miller <ma...@gmail.com> wrote:

> Ah, forgot to mention, it's only on 7.x.
>
> Mark



-- 


Regards,
Varun Thacker

Re: NoHttpResponseException error between leader and replica

Posted by Mark Miller <ma...@gmail.com>.

Ah, forgot to mention, it's only on 7.x.

Mark
On Fri, Jun 17, 2016 at 5:06 AM Varun Thacker <va...@gmail.com>
wrote:

> bq. It's now part of HttpClient.
>
> Were you referring to line 230 of HttpClientUtil on master?
> cm.setValidateAfterInactivity(Integer.getInteger(VALIDATE_AFTER_INACTIVITY,
> VALIDATE_AFTER_INACTIVITY_DEFAULT));
-- 
- Mark
about.me/markrmiller

Re: NoHttpResponseException error between leader and replica

Posted by Varun Thacker <va...@gmail.com>.

bq. It's now part of HttpClient.

Were you referring to line 230 of HttpClientUtil on master?
cm.setValidateAfterInactivity(Integer.getInteger(VALIDATE_AFTER_INACTIVITY,
VALIDATE_AFTER_INACTIVITY_DEFAULT));
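
For anyone unfamiliar with the pattern in that line: Integer.getInteger reads
a JVM system property and falls back to a default, and the result is handed
to setValidateAfterInactivity, which (in HttpClient 4.4 and later) makes the
pool re-check a connection that has been idle longer than the given number of
milliseconds before reusing it. Below is a minimal sketch of just the
property lookup. The property name and default value are placeholders I made
up for the example, not necessarily what HttpClientUtil uses.

```java
// Hypothetical names; the real constants live in Solr's HttpClientUtil.
final class InactivitySetting {
    // Assumed property name, for illustration only.
    static final String PROP = "example.validateAfterInactivity";
    // Assumed default in milliseconds, for illustration only.
    static final int DEFAULT_MS = 3000;

    // Resolves the setting the same way the quoted line does.
    static int resolve() {
        // Integer.getInteger looks up a system property and parses it as an
        // integer; it returns the default when the property is unset or is
        // not a valid integer.
        return Integer.getInteger(PROP, DEFAULT_MS);
    }
}
```

With this pattern the value can be overridden at startup, for example with
-Dexample.validateAfterInactivity=500 on the JVM command line.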

On Fri, Jun 17, 2016 at 12:13 PM, Varun Thacker <va...@gmail.com>
wrote:

> Hi Mark,
>
> We were running Solr 5.4.1 on a 4-node setup with a 2-shard, 2-replica
> collection.
> The test data is roughly 30M large documents. The indexing process runs via
> map-reduce, with 80 parallel reducers each sending batches of 500
> documents to Solr at a time.
>
> In this setup almost all runs hit the NoHttpResponseException between
> leader and replica once.
>
> "It's now part of HttpClient." - Sorry, I didn't quite follow. What's part
> of HttpClient?



-- 


Regards,
Varun Thacker

Re: NoHttpResponseException error between leader and replica

Posted by Varun Thacker <va...@gmail.com>.

Hi Mark,

We were running Solr 5.4.1 on a 4-node setup with a 2-shard, 2-replica
collection.
The test data is roughly 30M large documents. The indexing process runs via
map-reduce, with 80 parallel reducers each sending batches of 500
documents to Solr at a time.

In this setup almost all runs hit the NoHttpResponseException between
leader and replica once.

"It's now part of HttpClient." - Sorry, I didn't quite follow. What's part
of HttpClient?



On Fri, Jun 17, 2016 at 6:51 AM, Mark Miller <ma...@gmail.com> wrote:

> I'm sorry, you say it's easy to reproduce, but can you explain roughly
> what you are doing to reproduce it?
>
> Mark



-- 


Regards,
Varun Thacker

Re: NoHttpResponseException error between leader and replica

Posted by Mark Miller <ma...@gmail.com>.

I'm sorry, you say it's easy to reproduce, but can you explain roughly what
you are doing to reproduce it?

Mark
On Thu, Jun 16, 2016 at 9:20 PM Mark Miller <ma...@gmail.com> wrote:

> That's already how things work. It's now part of HttpClient. There are
> some settings you can mess with. Is it easy to reproduce?
>
> Mark
-- 
- Mark
about.me/markrmiller

Re: NoHttpResponseException error between leader and replica

Posted by Mark Miller <ma...@gmail.com>.

That's already how things work. It's now part of HttpClient. There are some
settings you can mess with. Is it easy to reproduce?

Mark
On Thu, Jun 16, 2016 at 1:15 PM Varun Thacker <va...@gmail.com>
wrote:

> When running a bulk indexing process, we occasionally see a
> NoHttpResponseException when the leader is forwarding docs to the
> replica. I think this is a known issue and can be reproduced pretty easily.
>
> What makes me want to dig deeper is that a single such
> NoHttpResponseException causes the leader to put the replica into recovery. The
> replica can never catch up because the indexing throughput is quite high.
> This can add hours of recovery time for the replica, depending on how many
> documents are being indexed.
>
> So as far as I can tell, we have two options here:
> 1. Implement a thread which removes stale connections. This has been
> discussed on https://issues.apache.org/jira/browse/SOLR-4509 in the past.
> 2. The above solution is not the right way forward. The main problem here
> is that replicas can't catch up because Solr doesn't implement backpressure
> yet, and implementing that would be the correct solution here.
>
> Does anyone have an opinion on how we should go forward with this issue?
>
>
>
> --
>
>
> Regards,
> Varun Thacker
>
-- 
- Mark
about.me/markrmiller