Posted to hdfs-dev@hadoop.apache.org by "Mark Mc Keown (Jira)" <ji...@apache.org> on 2022/10/26 15:11:00 UTC

[jira] [Created] (HDFS-16825) hadoop-azure flush timing out and triggering retry

Mark Mc Keown created HDFS-16825:
------------------------------------

             Summary: hadoop-azure flush timing out and triggering retry
                 Key: HDFS-16825
                 URL: https://issues.apache.org/jira/browse/HDFS-16825
             Project: Hadoop HDFS
          Issue Type: Bug
            Reporter: Mark Mc Keown


From AbfsHttpOperation, the code that creates an HTTP connection to Azure is:

{code}
public AbfsHttpOperation(final URL url, final String method, final List<AbfsHttpHeader> requestHeaders)
      throws IOException {
    this.isTraceEnabled = LOG.isTraceEnabled();
    this.url = url;
    this.method = method;
    this.clientRequestId = UUID.randomUUID().toString();

    this.connection = openConnection();
    if (this.connection instanceof HttpsURLConnection) {
      HttpsURLConnection secureConn = (HttpsURLConnection) this.connection;
      SSLSocketFactory sslSocketFactory = SSLSocketFactoryEx.getDefaultFactory();
      if (sslSocketFactory != null) {
        secureConn.setSSLSocketFactory(sslSocketFactory);
      }
    }

    this.connection.setConnectTimeout(CONNECT_TIMEOUT);
    this.connection.setReadTimeout(READ_TIMEOUT);

    this.connection.setRequestMethod(method);

    for (AbfsHttpHeader header : requestHeaders) {
      this.connection.setRequestProperty(header.getName(), header.getValue());
    }

    this.connection.setRequestProperty(HttpHeaderConfigurations.X_MS_CLIENT_REQUEST_ID, clientRequestId);
  }
{code}

The READ_TIMEOUT is hard-coded to 30 seconds. When a file is uploaded to Azure and then closed, a flush operation is triggered. Azure sometimes takes longer than 30 seconds to respond, and this triggers a retry within the hadoop-azure library.

(This can cause issues with Databricks Autoloader, which monitors EventGrid for triggers to ingest data: multiple flush/close events can confuse it. That is an Autoloader bug, since retries can legitimately happen.)

Can the READ_TIMEOUT be increased or made configurable?
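
As a rough sketch of what a configurable read timeout might look like (the property name "fs.azure.read.request.timeout" and the helper class below are illustrative assumptions, not an existing hadoop-azure setting or API), the timeout could be read from the Hadoop Configuration before the connection is set up instead of coming from a compile-time constant:

{code}
// Illustrative sketch only: "fs.azure.read.request.timeout" is an assumed
// property name for this example, not an existing hadoop-azure configuration key.
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

import org.apache.hadoop.conf.Configuration;

public class ConfigurableReadTimeoutSketch {

  // Defaults matching the current hard-coded 30-second values, in milliseconds.
  private static final int DEFAULT_CONNECT_TIMEOUT = 30 * 1000;
  private static final int DEFAULT_READ_TIMEOUT = 30 * 1000;

  /**
   * Opens an HTTP connection whose read timeout comes from configuration
   * rather than a constant, so a slow flush response from Azure does not
   * immediately time out and trigger a retry.
   */
  public static HttpURLConnection openConnection(URL url, Configuration conf)
      throws IOException {
    int readTimeout = conf.getInt("fs.azure.read.request.timeout", DEFAULT_READ_TIMEOUT);

    HttpURLConnection connection = (HttpURLConnection) url.openConnection();
    connection.setConnectTimeout(DEFAULT_CONNECT_TIMEOUT);
    connection.setReadTimeout(readTimeout);
    return connection;
  }
}
{code}

With something along these lines, a deployment that sees slow flush responses could raise the read timeout above 30 seconds in its configuration without patching the library.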





--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-dev-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-help@hadoop.apache.org