Posted to dev@lucene.apache.org by Gus Heck <gu...@gmail.com> on 2019/05/05 17:35:26 UTC

Socket Timeouts

I'm working with a client that's trying to process a lot of data (billions
of docs) via a streaming expression, and the initial query is (not
surprisingly) taking a long time. Various types of timeouts have been
cropping up, and I've found myself thinking I had solved some only to
discover that the settings in solr.xml are far less wide-reaching than I
initially thought. The present 5% scale cluster seems to hit one particular
timeout about 50% of the time, which has made it particularly confusing.
I'm guessing it depends on something like how busy the virtualization in
Amazon is: just barely making it when it gets more resources and timing
out if anything is starved.

As I look around the code base I'm finding a LOT of places where timeouts
on SolrClients and CloudSolrClients are just arbitrarily set to one-off
constant values. The one bugging me right now is

public abstract class SolrClientBuilder<B extends SolrClientBuilder<B>> {

  protected HttpClient httpClient;
  protected ResponseParser responseParser;
  protected Integer connectionTimeoutMillis = 15000;
  protected Integer socketTimeoutMillis = 120000;

Which I am unable to change because of this code in SolrStream:

  /**
  * Opens the stream to a single Solr instance.
  **/
  public void open() throws IOException {
    if(cache == null) {
      client = new HttpSolrClient.Builder(baseUrl).build();
    } else {
      client = cache.getHttpSolrClient(baseUrl);
    }

I need to make this particular case configurable, so that I can get results
from a very long running query, but I sense that there is a much wider
problem in that we don't seem to have any organized plan for how socket
timeouts are set/managed in the code.
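
One possible shape for making this case configurable, as a minimal sketch
only: resolve the two timeouts from properties and fall back to the
SolrClientBuilder defaults quoted above. The class name StreamTimeouts and
the property names solr.stream.connTimeout and solr.stream.socketTimeout
are hypothetical, invented for illustration; nothing here is existing
SolrJ API.

```java
import java.util.Properties;

/** Hypothetical helper for illustration; not part of SolrJ. */
final class StreamTimeouts {
    // Defaults mirror the SolrClientBuilder fields quoted above.
    static final int DEFAULT_CONNECTION_TIMEOUT_MILLIS = 15000;
    static final int DEFAULT_SOCKET_TIMEOUT_MILLIS = 120000;

    // Property names are assumptions made up for this sketch.
    static final String CONNECTION_TIMEOUT_PROP = "solr.stream.connTimeout";
    static final String SOCKET_TIMEOUT_PROP = "solr.stream.socketTimeout";

    private StreamTimeouts() {}

    /** Returns the configured value, or the default if unset, negative, or malformed. */
    static int resolve(Properties props, String key, int defaultMillis) {
        String raw = props.getProperty(key);
        if (raw == null || raw.trim().isEmpty()) {
            return defaultMillis;
        }
        try {
            int value = Integer.parseInt(raw.trim());
            return value >= 0 ? value : defaultMillis;
        } catch (NumberFormatException e) {
            return defaultMillis;
        }
    }

    static int connectionTimeoutMillis(Properties props) {
        return resolve(props, CONNECTION_TIMEOUT_PROP, DEFAULT_CONNECTION_TIMEOUT_MILLIS);
    }

    static int socketTimeoutMillis(Properties props) {
        return resolve(props, SOCKET_TIMEOUT_PROP, DEFAULT_SOCKET_TIMEOUT_MILLIS);
    }
}
```

Calling StreamTimeouts.socketTimeoutMillis(System.getProperties()) would
then pick up a -Dsolr.stream.socketTimeout=... JVM flag. The resolved
values could be handed to the client builder in SolrStream.open(), assuming
the SolrJ version in use exposes timeout setters on the builder.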

What thoughts have people had on this front?

-Gus

-- 
http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)

Re: Socket Timeouts

Posted by Gus Heck <gu...@gmail.com>.
@Martin, yeah I expected that to work initially, but it has no effect on
the streaming expression code path.

@Kevin, thx for the link :)




On Sun, May 5, 2019, 2:49 PM Martin Gainty <mg...@hotmail.com> wrote:

> both connectionTimeout and socketTimeout are declared in
> $SOLR_HOME/solr.xml
>
>  <shardHandlerFactory name="shardHandlerFactory"
>     class="HttpShardHandlerFactory">
>     <int name="socketTimeout">${socketTimeout:600000}</int>
>     <int name="connTimeout">${connTimeout:60000}</int>
>   </shardHandlerFactory>
>
> </solr>

Re: Socket Timeouts

Posted by Martin Gainty <mg...@hotmail.com>.
both connectionTimeout and socketTimeout are declared in $SOLR_HOME/solr.xml

 <shardHandlerFactory name="shardHandlerFactory"
    class="HttpShardHandlerFactory">
    <int name="socketTimeout">${socketTimeout:600000}</int>
    <int name="connTimeout">${connTimeout:60000}</int>
  </shardHandlerFactory>

</solr>
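
For readers unfamiliar with the ${name:default} notation in the snippet
above: the value comes from a property called name if one is set, and
otherwise from the default after the colon. The sketch below imitates that
behaviour in plain Java under that assumption. PlaceholderResolver is a
hypothetical class written for this illustration and is not Solr's own
implementation.

```java
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical illustration of ${name:default} substitution; not Solr code. */
final class PlaceholderResolver {
    // Matches ${name} or ${name:default}; group 1 is the name, group 2 the optional default.
    private static final Pattern PLACEHOLDER =
        Pattern.compile("\\$\\{([^:}]+)(?::([^}]*))?\\}");

    private PlaceholderResolver() {}

    static String resolve(String text, Properties props) {
        Matcher m = PLACEHOLDER.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String name = m.group(1);
            String fallback = m.group(2); // null when no ":default" part is present
            String value = props.getProperty(name, fallback);
            if (value == null) {
                throw new IllegalArgumentException("No value or default for " + name);
            }
            // quoteReplacement keeps '$' and '\' in the value from being treated specially.
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

With no socketTimeout property set, resolving "${socketTimeout:600000}"
yields the default after the colon; setting the property (for example via
-DsocketTimeout=... when passing System.getProperties()) overrides it.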



Re: Socket Timeouts

Posted by Erick Erickson <er...@gmail.com>.
Gus:

I’d _really_ like this all to be in one place, so fire away. We’ve spent far too much time working with clients and wondering “where the hell are the timeouts for XYZ?”

IMO, they should _all_ be configurable in a single place, preferably solr.xml. If you’re ambitious, a few comments in solr.xml about what the various parameters actually control would be super-welcome.

Best
Erick@CheeringYouOnWhileNotDoingTheWork…


> On May 5, 2019, at 11:39 AM, Kevin Risden <kr...@apache.org> wrote:
> 
> You might be interested in:
> 
> https://issues.apache.org/jira/browse/SOLR-13389
> 
> 
> Kevin Risden
> 
> 




Re: Socket Timeouts

Posted by Kevin Risden <kr...@apache.org>.
You might be interested in:

https://issues.apache.org/jira/browse/SOLR-13389


Kevin Risden

