Posted to common-commits@hadoop.apache.org by Apache Wiki <wi...@apache.org> on 2015/01/22 15:08:50 UTC

[Hadoop Wiki] Update of "SocketTimeout" by SteveLoughran

The "SocketTimeout" page has been changed by SteveLoughran:
https://wiki.apache.org/hadoop/SocketTimeout?action=diff&rev1=4&rev2=5

Comment:
Object stores can trigger SocketTimeoutException when throttling PUT/DELETE operations. 

   * The remote machine crashing. This cannot easily be distinguished from a network partition.
   * A change in the firewall settings of one of the machines preventing communication.
   * The settings are wrong and the client is trying to talk to the wrong machine, one that is not on the network. That could be an error in Hadoop configuration files, or an entry in the DNS tables or the /etc/hosts file.
+  * If you are using an object store client, such as the Amazon S3 or OpenStack Swift clients, socket timeouts may be caused by remote throttling of client requests: your program is making too many PUT/DELETE requests and is being deliberately blocked by the far end. This is most likely to happen when creating many small files or performing bulk deletes (e.g. deleting a directory with many child entries).
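
When throttling is the suspected cause, a common mitigation is to retry the failing call with exponential backoff so the client eases off the store. The following is a minimal sketch, not part of the wiki page or of any Hadoop API: `retry_with_backoff`, its parameters, and the choice of `socket.timeout` as the exception to retry on are all assumptions made for illustration. A real object store client may raise a different exception type.

```python
import random
import socket
import time


def retry_with_backoff(operation, max_attempts=5, base_delay=0.5,
                       max_delay=30.0, sleep=time.sleep):
    """Call operation(); on a socket timeout, wait and then retry.

    Hypothetical helper: the delay doubles after each failed attempt
    (capped at max_delay), with random jitter added so that many clients
    do not all retry at the same instant.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except socket.timeout:
            if attempt == max_attempts:
                # Out of attempts: let the caller see the timeout.
                raise
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(delay + random.uniform(0, delay / 2))
```

Here `operation` would be a zero-argument callable wrapping a single PUT or DELETE. Backoff only helps if the timeouts really come from throttling; if the address is wrong or a firewall is in the way, every retry will fail in the same way.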
  
- Comparing this exception to the ConnectionRefused error, the latter indicates there is a server at the far end, but no program running on it can receive inbound connections on the chosen port. A Socket Timeout usually means that there is a something there, but it or the network are not working right
+ Comparing this exception to the ConnectionRefused error, the latter indicates there is a server at the far end, but no program running on it can receive inbound connections on the chosen port. A Socket Timeout usually means that there is something there, but either it or the network is not working properly.
  
  == Identifying and Fixing Socket Timeouts ==
  The root cause of a Socket Timeout is a connectivity failure between the machines, so try the usual process
@@ -25, +26 @@

   1. Can you telnet to the target host and port?
   1. Can you telnet to the target host and port from any other machine?
   1. On the target machine, can you telnet to the port using localhost as the hostname? If this works but external network connections time out, it's usually a firewall issue.
+  1. If it is a remote object store: is the address correct? Does it only happen on bulk operations? If the latter, it's probably due to throttling at the far end.
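
Where telnet is not installed, the same connectivity checks can be scripted. The sketch below is only an illustration: `probe` is a hypothetical helper name, and the five-second default timeout is an arbitrary assumption. It attempts a TCP connection and reports whether the port is open, the connection was refused, or the attempt timed out.

```python
import socket


def probe(host, port, timeout=5.0):
    """Attempt a TCP connection to host:port and classify the outcome."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # A machine answered, but nothing is listening on that port.
        return "refused"
    except socket.timeout:
        # No answer in time: host down, firewall, or network problem.
        return "timeout"
    except OSError as e:
        # Anything else, e.g. the hostname failing to resolve.
        return "error: %s" % e
```

Running `probe` from the client, from another machine, and on the target itself with `localhost` as the host corresponds to the first three checks in the list above.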
  
  Remember: These are [[YourNetworkYourProblem|your network configuration problems]]. Only you can fix them.