Posted to issues@commons.apache.org by "Jeff Cliff (JIRA)" <ji...@apache.org> on 2008/12/15 18:33:45 UTC

[jira] Commented: (NET-160) FTP Client API Hangs - Attached Scenario.

    [ https://issues.apache.org/jira/browse/NET-160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12656669#action_12656669 ] 

Jeff Cliff commented on NET-160:
--------------------------------

There may be multiple points where hanging can occur, but it seems like a lot of them share a common characteristic, described somewhat by Guy Rouillier in this thread.

For some of these hangs it looks like, from within copyStream(), the InputStream source is blocking and not being refilled. The fact that it blocks at java.net.SocketInputStream.socketRead0(Native Method) is just a symptom: this level doesn't know that the InputStream is not available for reading. It's easy to simulate, too: just unplug your network cable during a read. A connect() will rightly detect that there's nothing there to connect to, but a read() seems to block indefinitely about 95% of the time, even when setSoTimeout() and the other set*Timeout() methods are used.
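As a baseline for the behaviour described above, here is a minimal sketch of what setSoTimeout() is expected to do on a plain java.net.Socket when the peer goes silent. It does not reproduce the unplugged-cable case or use FTPClient at all; the class name SilentPeerDemo and the loopback server are assumptions made purely for illustration. If a read on the real connection never fails this way, one thing worth checking is whether the timeout was actually applied to the socket being read.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical demo class, not part of Commons Net.
final class SilentPeerDemo {

    /**
     * Connects to a loopback server that accepts the connection but never
     * writes anything, then attempts a read with SO_TIMEOUT set.
     * Returns true if the read gave up with SocketTimeoutException.
     */
    static boolean readTimesOut(int timeoutMillis) throws IOException {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket(server.getInetAddress(), server.getLocalPort());
             Socket silentPeer = server.accept()) {
            client.setSoTimeout(timeoutMillis); // applies to reads on this socket only
            InputStream in = client.getInputStream();
            try {
                in.read(); // the peer never sends, so this can only end by timeout
                return false;
            } catch (SocketTimeoutException e) {
                return true;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("read timed out: " + readTimesOut(200));
    }
}
```

Note that SO_TIMEOUT is a per-socket setting, so in an FTP session the control connection and each data connection would need it separately.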

As far as copyStream() is concerned, you can actually detect when it blocks, but there's a catch: sometimes it blocks, waits, and then after a short period (milliseconds) source is refilled. One way around this in realistic situations is to have another timeout***, this one inside Util itself or, probably better, to wrap Util in an FTPClient-specific Util that carries this extra bit of baggage for these situations*. It seems to work**, but only for some aspects of this problem****. It'll at least give you a CopyStreamException, which for critical applications is a lot better than waiting forever on a nonexistent input stream because of a noncritical network error.
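To make the "extra timeout inside the copy loop" idea concrete, here is a rough sketch. It is not the Util class attached to this issue and not Commons Net code: the class TimedCopy, the method copyWithStallTimeout and the stallMillis parameter are all hypothetical, and it throws a plain IOException where the real thing would presumably throw CopyStreamException. Each read runs on a helper thread and the copy loop waits for it with a deadline, so a short stall that refills in time is tolerated and a long one becomes an exception.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, not the Util class from Commons Net.
final class TimedCopy {

    /** Copies source to dest; fails if a single read stalls longer than stallMillis. */
    static long copyWithStallTimeout(InputStream source, OutputStream dest, long stallMillis)
            throws IOException {
        ExecutorService reader = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "timed-copy-reader");
            t.setDaemon(true); // a stuck read must not keep the JVM alive
            return t;
        });
        byte[] buffer = new byte[4096];
        long total = 0;
        try {
            while (true) {
                // Run the blocking read elsewhere so this thread can enforce a deadline.
                Future<Integer> pending = reader.submit(() -> source.read(buffer));
                int n;
                try {
                    n = pending.get(stallMillis, TimeUnit.MILLISECONDS);
                } catch (TimeoutException e) {
                    pending.cancel(true);
                    throw new IOException("read stalled for more than " + stallMillis
                            + " ms after " + total + " bytes");
                } catch (ExecutionException e) {
                    throw new IOException("read failed", e.getCause());
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IOException("interrupted while copying", e);
                }
                if (n < 0) {
                    return total; // end of stream
                }
                dest.write(buffer, 0, n);
                total += n;
            }
        } finally {
            reader.shutdownNow();
        }
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream dest = new ByteArrayOutputStream();
        long copied = copyWithStallTimeout(new ByteArrayInputStream(new byte[10000]), dest, 1000);
        System.out.println("copied " + copied + " bytes");
    }
}
```

One caveat with this approach: interrupting a thread does not unblock a read that is stuck inside a socket, so after the exception the caller would still need to close the underlying socket to release the helper thread.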

Also notable for Sebb, who suggested using PASV mode:
Thread [main] (Suspended)
SocketInputStream.socketRead0(FileDescriptor, byte[], int, int, int) line: not available [native method]
SocketInputStream.read(byte[], int, int) line: 129
StreamDecoder.readBytes() line: 264 [local variables unavailable]
StreamDecoder.implRead(char[], int, int) line: 306
StreamDecoder.read(char[], int, int) line: 158
InputStreamReader.read(char[], int, int) line: 167
BufferedReader.fill() line: 136 [local variables unavailable]
BufferedReader.readLine(boolean) line: 299
BufferedReader.readLine() line: 362 [local variables unavailable]
FTPClient(FTP).__getReply() line: 294
FTPClient(FTP).sendCommand(String, String) line: 490
FTPClient(FTP).sendCommand(int, String) line: 534
FTPClient(FTP).sendCommand(int) line: 583
FTPClient(FTP).pasv() line: 882
FTPClient._openDataConnection_(int, String) line: 497
FTPClient.retrieveFile(String, OutputStream) line: 1276
FTPStation.getZile(String, String, int, String, String, int) line: 160

I.e., these hangs can occur *going into* PASV mode. Same root cause, I suspect: __getReply() is called and the reply never reaches it, so it waits. Forever. Instead of at least throwing an Exception.

To rephrase (after a weekend to think about it): what seems to be happening, all over the FTPClient code, is that any read made on a stream before it is filled ends up blocking indefinitely. And this can happen. For example, if FTPClient.__getReply() is called before the server actually replies, it blocks indefinitely, even if the server does reply a millisecond or so after the blocking begins.
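One way to test this hypothesis in isolation, outside FTPClient, is sketched below. A loopback server delays its reply until the client is already blocked in readLine(), which is roughly what __getReply() does per the stack trace above. Everything here (the class LateReplyDemo, the method readReply, the delays) is made up for the experiment and is not Commons Net code. With a plain socket the late reply is expected to wake the reader, and a reply that comes later than SO_TIMEOUT is expected to produce a SocketTimeoutException; if the real client behaves differently, that would point at something more specific than "read before the stream is filled".

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical experiment, not part of Commons Net.
final class LateReplyDemo {

    /**
     * Starts a loopback server that waits delayMillis before sending one
     * reply line, then reads that line on the client with the given SO_TIMEOUT.
     */
    static String readReply(long delayMillis, int soTimeoutMillis) throws IOException {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            Thread responder = new Thread(() -> {
                try (Socket peer = server.accept()) {
                    Thread.sleep(delayMillis); // reply only after the client is already blocked
                    OutputStream out = peer.getOutputStream();
                    out.write("200 Command okay\r\n".getBytes(StandardCharsets.US_ASCII));
                    out.flush();
                } catch (IOException | InterruptedException e) {
                    // The demo server just stops; the client then sees EOF or a timeout.
                }
            });
            responder.setDaemon(true);
            responder.start();

            try (Socket client = new Socket(server.getInetAddress(), server.getLocalPort())) {
                client.setSoTimeout(soTimeoutMillis);
                BufferedReader reader = new BufferedReader(
                        new InputStreamReader(client.getInputStream(), StandardCharsets.US_ASCII));
                return reader.readLine(); // throws SocketTimeoutException if nothing arrives in time
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readReply(100, 2000));
    }
}
```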

* I'm a fairly new Java programmer, so maybe this is not optimal.
** I'll attach my updated Util class to this thread.
*** Someone pointed out that this is basically making setSoTimeout work. Not sure what to do about this.
**** Albeit the most common one: the actual downloading of files. Uploading/connecting/disconnecting/other are not really impacted.

> FTP Client API Hangs - Attached Scenario.
> -----------------------------------------
>
>                 Key: NET-160
>                 URL: https://issues.apache.org/jira/browse/NET-160
>             Project: Commons Net
>          Issue Type: Bug
>    Affects Versions: 1.4
>         Environment: Suse Enterprise Linux 9.0, Intel
>            Reporter: Lvenkataraman
>
> FTP client hangs with the following error.   Here is the truss output. The client hangs indefinitely and no work is being done other than this error when you truss the process.  I use Java 1.5 to run this.  We wrote a long-running ftp daemon and this happens after 20-23 hours of continuous running.
> Here is the code to pull files from the FTP server. This one lists files that are received after this particular "incal" time (currently it lists but the intention is to implement an incremental download).  We initially thought doing a periodic noop would keep the client alive by sending dummy commands to the server. But we found this problem still occurs.
> public void ListIncrementalFiles (String directory, Calendar incal)
> 	{
> 		try 
> 		{
> 			FTPFile[] files = listFiles(directory);
> 			
> 			int total_files = 0;
> 					
> 			System.out.println("FileName\tFileSize\t\tDate Time Stamp");
> 			for (int i = 0; i < files.length; i++)
> 			{
> 												
> 					FTPFile file = files[i];
> 					if ((i % 1000) == 0)
> 					{
> 						System.out.println("noop occurred at this instance---> " + i + "\n");
> 						noop();
> 					}
> 							
> 					if (file.isFile())
> 					{
> 						Calendar c = file.getTimestamp();
> 												
> 						if (c.after(incal) || c.equals(incal))
> 						{
> 								
> 								System.out.println(file.getName() + "\t" + file.getSize() + "\t\t" + c.getTime());
> 								total_files++;
> 						}
> 					 }
>     				
> 			}
> 			System.out.println("Total Files in the listing ----> " + total_files);
> 		} 
> 		catch (Exception e)
> 		{
> 			System.out.println("FTP Getfiles exception occurred");
> 			e.printStackTrace();
> 		}
>     
> 	}
> ---------------------------------------------------------------------------------------------------------------------------------Truss output begins--------------------------------------------------------
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /6:	lwp_cond_wait(0x00035F08, 0x00035EF0, 0xB477FBE0, 0) Err#62 ETIME
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /9:	lwp_park(0x00000000, 0)		(sleeping...)
> /3:	lwp_cond_wait(0x000B77C0, 0x000B77A8, 0x00000000, 0) (sleeping...)
> /8:	lwp_cond_wait(0x0014A908, 0x0014A8F0, 0x00000000, 0) (sleeping...)
> /10:	lwp_cond_wait(0x000366E8, 0x000366D0, 0x00000000, 0) (sleeping...)
> /1:	pollsys(0x00000000, 0, 0xFFBFEB60, 0x00000000) (sleeping...)
> /11:	lwp_cond_wait(0x00036778, 0x00036760, 0x00000000, 0) (sleeping...)
> /4:	lwp_cond_wait(0x000B77C0, 0x000B77A8, 0x00000000, 0) (sleeping...)
> /12:	lwp_cond_wait(0x00036778, 0x00036760, 0x00000000, 0) (sleeping...)
> /2:	lwp_cond_wait(0x000B77C0, 0x000B77A8, 0x00000000, 0) (sleeping...)
> /7:	lwp_cond_wait(0x00149F98, 0x00149F80, 0x00000000, 0) (sleeping...)
> /5:	lwp_cond_wait(0x000B77C0, 0x000B77A8, 0x00000000, 0) (sleeping...)
> /13:	lwp_cond_wait(0x00034958, 0x00034940, 0x00000000, 0) (sleeping...)
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /6:	lwp_cond_wait(0x00035F08, 0x00035EF0, 0xB477FBE0, 0) Err#62 ETIME
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /6:	lwp_cond_wait(0x00035F08, 0x00035EF0, 0xB477FBE0, 0) Err#62 ETIME
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /6:	lwp_cond_wait(0x00035F08, 0x00035EF0, 0xB477FBE0, 0) Err#62 ETIME
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /6:	lwp_cond_wait(0x00035F08, 0x00035EF0, 0xB477FBE0, 0) Err#62 ETIME
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /6:	lwp_cond_wait(0x00035F08, 0x00035EF0, 0xB477FBE0, 0) Err#62 ETIME
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /6:	lwp_cond_wait(0x00035F08, 0x00035EF0, 0xB477FBE0, 0) Err#62 ETIME
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> ------------------------------------------------------------------------------------------Truss output ends-------------------------------------------------------------------------------------------
