Posted to dev@mina.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2021/10/27 09:24:00 UTC

[jira] [Work logged] (SSHD-1217) Slow performance listing huge number of files on Apache SSHD server

     [ https://issues.apache.org/jira/browse/SSHD-1217?focusedWorklogId=670583&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-670583 ]

ASF GitHub Bot logged work on SSHD-1217:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 27/Oct/21 09:23
            Start Date: 27/Oct/21 09:23
    Worklog Time Spent: 10m 
      Work Description: tomaswolf opened a new pull request #206:
URL: https://github.com/apache/mina-sshd/pull/206


   An Apache MINA sshd SFTP server configured to use an SftpFileSystem
   pointing to yet another SFTP server would serve directory listings
   only very slowly.
   
   This was caused by the SFTP server implementation itself using the Java
   FileSystem abstractions to list directories. Using java.nio.file.Files,
   it ended up doing essentially the following when receiving an
   SSH_FXP_READDIR command:
   
   ```
   try (DirectoryStream<Path> list = Files.newDirectoryStream(dir)) {
     Map<Path, BasicFileAttributes> toSend = new HashMap<>();
     for (Path p : list) {
       // One more attribute read, i.e. one more remote call upstream, per entry
       toSend.put(p, Files.readAttributes(p, BasicFileAttributes.class));
     }
     // Stand-in for building and sending the SSH_FXP_NAME reply
     replyToClient(SSH_FXP_NAME, toSend);
   }
   ```
   (Not literally. This omits a lot of special handling for empty
   directories, not sending too large messages back to the client, and
   so on. But ultimately it boiled down to the above.)
   
   That is fine when the server-side file system from which files are
   served is local on the server. But when that file system itself is
   a remote file system, `newDirectoryStream` is a remote call sending
   SSH_FXP_OPENDIR and SSH_FXP_READDIR to the upstream server, and
   additionally each of these `readAttributes()` calls is yet another
   remote call sending SSH_FXP_LSTAT. This slows down delivering the
   directory listing to the client tremendously: in the reported case with
   2000 files, about 48 s via the front-end server versus 2 s directly.
   
   The most annoying part of all this is that the SSH_FXP_READDIR to the
   upstream SFTP server in `newDirectoryStream` _already returned all the
   attributes_, but this is lost because the Java NIO File abstractions
   have no real support for getting a directory listing including
   attributes in one fell swoop. (No, using a FileVisitor and
   Files.walkFileTree with depth 1 wouldn't help.)
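   
   To make the depth-1 remark concrete, here is a minimal JDK-only sketch of
   that approach (the class and method names are illustrative, not SSHD API).
   `Files.walkFileTree` with `maxDepth` 1 does hand each entry's
   `BasicFileAttributes` to the visitor, but, as we understand the JDK's tree
   walker, it only gets them for free when the provider cached them during
   iteration (the JDK's default file system does). For any other provider,
   such as an SftpFileSystem, it reads each entry's attributes separately,
   which is again one remote call per file.
   
   ```java
   import java.io.IOException;
   import java.nio.file.FileVisitOption;
   import java.nio.file.FileVisitResult;
   import java.nio.file.Files;
   import java.nio.file.Path;
   import java.nio.file.SimpleFileVisitor;
   import java.nio.file.attribute.BasicFileAttributes;
   import java.util.EnumSet;
   import java.util.Map;
   import java.util.TreeMap;
   
   public class DepthOneListing {
   
       // Collects name -> size for the direct children of dir via walkFileTree(maxDepth = 1).
       static Map<String, Long> listSizes(Path dir) throws IOException {
           Map<String, Long> sizes = new TreeMap<>();
           Files.walkFileTree(dir, EnumSet.noneOf(FileVisitOption.class), 1,
                   new SimpleFileVisitor<Path>() {
                       @Override
                       public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                           // At maxDepth every entry (directories included) is reported here,
                           // together with whatever attributes the walker obtained for it.
                           sizes.put(file.getFileName().toString(), attrs.size());
                           return FileVisitResult.CONTINUE;
                       }
                   });
           return sizes;
       }
   
       public static void main(String[] args) throws IOException {
           Path dir = Files.createTempDirectory("depth1");
           Files.write(dir.resolve("a.txt"), new byte[3]);
           Files.write(dir.resolve("b.txt"), new byte[0]);
           System.out.println(listSizes(dir)); // {a.txt=3, b.txt=0}
       }
   }
   ```
   
   The API shape is fine; the cost depends entirely on whether the provider
   can supply the attributes without another round trip.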
   
   So detect this case in the server's SftpSubsystem, and bypass the Java
   FileSystem abstraction if the file system is itself an SftpFileSystem.
   Simply issue SSH_FXP_READDIR requests and forward the whole reply, which
   includes file names and attributes, directly to the client. For reading
   a directory containing 2000 files, this eliminates 10040 SSH_FXP_LSTAT
   calls; only the directory itself is stat'ed.
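   
   The difference in request counts can be illustrated with a toy model. The
   sketch below is not SSHD code: `UpstreamServer`, its batch size of 100 and
   the size-only "attributes" are hypothetical, and SSH_FXP_OPENDIR and
   SSH_FXP_CLOSE are left out because both approaches pay for them equally.
   It issues exactly one LSTAT per entry, so it shows the shape of the saving
   rather than reproducing the exact count above.
   
   ```java
   import java.util.ArrayList;
   import java.util.HashMap;
   import java.util.LinkedHashMap;
   import java.util.List;
   import java.util.Map;
   
   public class ReadDirRoundTrips {
   
       // Hypothetical in-memory upstream server; every method call models one round trip.
       static final class UpstreamServer {
           private final List<Map.Entry<String, Long>> entries = new ArrayList<>();
           private final Map<String, Long> byName = new HashMap<>();
           private final int batchSize;
           int roundTrips;
   
           UpstreamServer(int fileCount, int batchSize) {
               for (int i = 0; i < fileCount; i++) {
                   entries.add(Map.entry("file" + i, (long) i));
                   byName.put("file" + i, (long) i);
               }
               this.batchSize = batchSize;
           }
   
           // Like SSH_FXP_READDIR: one batch of names WITH attributes; empty batch = EOF.
           List<Map.Entry<String, Long>> readDir(int offset) {
               roundTrips++;
               int from = Math.min(offset, entries.size());
               int to = Math.min(offset + batchSize, entries.size());
               return entries.subList(from, to);
           }
   
           // Like SSH_FXP_LSTAT: attributes of a single entry.
           long lstat(String name) {
               roundTrips++;
               return byName.get(name);
           }
       }
   
       // Issues READDIR requests until EOF and collects every (name, attributes) pair.
       static List<Map.Entry<String, Long>> readAll(UpstreamServer server) {
           List<Map.Entry<String, Long>> all = new ArrayList<>();
           while (true) {
               List<Map.Entry<String, Long>> batch = server.readDir(all.size());
               if (batch.isEmpty()) {
                   return all;
               }
               all.addAll(batch);
           }
       }
   
       // Old behavior: only names survive the NIO abstraction, so each entry is stat'ed again.
       static Map<String, Long> viaNioAbstraction(UpstreamServer server) {
           Map<String, Long> result = new LinkedHashMap<>();
           for (Map.Entry<String, Long> e : readAll(server)) {
               result.put(e.getKey(), server.lstat(e.getKey())); // READDIR attributes discarded
           }
           return result;
       }
   
       // New behavior: forward the names and attributes READDIR already returned.
       static Map<String, Long> viaDirectForward(UpstreamServer server) {
           Map<String, Long> result = new LinkedHashMap<>();
           for (Map.Entry<String, Long> e : readAll(server)) {
               result.put(e.getKey(), e.getValue());
           }
           return result;
       }
   
       public static void main(String[] args) {
           UpstreamServer a = new UpstreamServer(2000, 100);
           UpstreamServer b = new UpstreamServer(2000, 100);
           boolean same = viaNioAbstraction(a).equals(viaDirectForward(b));
           System.out.println(same + " " + a.roundTrips + " " + b.roundTrips); // true 2021 21
       }
   }
   ```
   
   Both strategies produce the same listing; the NIO-style path simply pays
   one extra round trip per file on top of the 21 READDIR requests.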
   
   A corollary to this is that clients in general should avoid listing
   directories on an SftpFileSystem via java.nio.file.Files _if they also
   need the file attributes_. Instead, do:
   
   ```
   SftpFileSystem fs = ...;
   try (SftpClient client = fs.getClient();
        SftpClient.CloseableHandle dir = client.openDir(remDirPath)) {
     for (SftpClient.DirEntry entry : client.listDir(dir)) {
       // Name and attributes come from the same SSH_FXP_NAME reply;
       // the listing may include "." and ".." entries.
       Path path = fs.getPath(remDirPath, entry.getFilename());
       SftpClient.Attributes attrs = entry.getAttributes();
       // Do whatever you need with path and attrs
     }
   }
   ```
   
   Implementation note: another idea is to cache the attributes read in
   SSH_FXP_READDIR in `newDirectoryStream` on the Path objects returned,
   and make `readAttributes` return these cached attributes while the
   stream is not closed yet. Technically, this is doable; it'd be a hack
   similar to what Java's own UnixFileSystem does (and FileVisitor takes
   advantage of it). However, it would break some edge cases, like a
   client writing to one of the files during the directory stream
   iteration, and then reading the attributes again and expecting the
   new attributes but still getting the cached ones.
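   
   The edge case that rules this out can be reproduced with a local file.
   This minimal sketch uses a hypothetical `ListedEntry` wrapper that
   remembers the size observed at listing time, standing in for the
   cached-attributes hack; it is not how SSHD or the JDK implement anything.
   
   ```java
   import java.io.IOException;
   import java.nio.file.Files;
   import java.nio.file.Path;
   
   public class StaleAttributeDemo {
   
       // Hypothetical wrapper: remembers the size seen while the directory was listed.
       static final class ListedEntry {
           final Path path;
           private final long sizeAtListing;
   
           ListedEntry(Path path) throws IOException {
               this.path = path;
               this.sizeAtListing = Files.size(path);
           }
   
           // What a caching readAttributes() would answer: the value from listing time.
           long cachedSize() {
               return sizeAtListing;
           }
   
           // What a non-caching readAttributes() answers: a fresh read.
           long freshSize() throws IOException {
               return Files.size(path);
           }
       }
   
       public static void main(String[] args) throws IOException {
           Path file = Files.createTempFile("entry", ".txt"); // starts empty
           ListedEntry entry = new ListedEntry(file);          // "listed" while empty
           Files.write(file, new byte[42]);                    // written during iteration
           System.out.println(entry.cachedSize() + " vs " + entry.freshSize()); // 0 vs 42
           Files.delete(file);
       }
   }
   ```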
   
   Side note: the SFTP protocol has no command to get _only_ a listing
   of names. SSH_FXP_READDIR _always_ returns both names and attributes.
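   
   For reference, a minimal decoder for the reply to SSH_FXP_READDIR, an
   SSH_FXP_NAME packet, shows why: every name record carries an ATTRS
   structure. This sketch assumes SFTP protocol version 3
   (draft-ietf-secsh-filexfer-02) and UTF-8 filenames; later protocol
   versions change the record layout, and the class and method names are
   illustrative only.
   
   ```java
   import java.nio.ByteBuffer;
   import java.nio.charset.StandardCharsets;
   import java.util.LinkedHashMap;
   import java.util.Map;
   
   public class FxpNameDecoder {
       static final int SSH_FXP_NAME = 104;
       // ATTRS flag bits of SFTP version 3.
       static final int ATTR_SIZE = 0x00000001;
       static final int ATTR_UIDGID = 0x00000002;
       static final int ATTR_PERMISSIONS = 0x00000004;
       static final int ATTR_ACMODTIME = 0x00000008;
       static final int ATTR_EXTENDED = 0x80000000;
   
       static String readString(ByteBuffer buf) {
           byte[] bytes = new byte[buf.getInt()];
           buf.get(bytes);
           return new String(bytes, StandardCharsets.UTF_8);
       }
   
       static void putString(ByteBuffer buf, String s) {
           byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
           buf.putInt(bytes.length).put(bytes);
       }
   
       // Decodes an SSH_FXP_NAME body (from the type byte on, length prefix stripped)
       // into filename -> size, using -1 when the server omitted the size.
       static Map<String, Long> decode(ByteBuffer buf) {
           if ((buf.get() & 0xFF) != SSH_FXP_NAME) {
               throw new IllegalArgumentException("not SSH_FXP_NAME");
           }
           buf.getInt(); // request id
           int count = buf.getInt();
           Map<String, Long> result = new LinkedHashMap<>();
           for (int i = 0; i < count; i++) {
               String filename = readString(buf);
               readString(buf); // longname, an "ls -l" style line
               int flags = buf.getInt();
               long size = (flags & ATTR_SIZE) != 0 ? buf.getLong() : -1;
               if ((flags & ATTR_UIDGID) != 0) { buf.getInt(); buf.getInt(); }
               if ((flags & ATTR_PERMISSIONS) != 0) { buf.getInt(); }
               if ((flags & ATTR_ACMODTIME) != 0) { buf.getInt(); buf.getInt(); }
               if ((flags & ATTR_EXTENDED) != 0) {
                   int extCount = buf.getInt();
                   for (int j = 0; j < extCount; j++) { readString(buf); readString(buf); }
               }
               result.put(filename, size);
           }
           return result;
       }
   
       public static void main(String[] args) {
           ByteBuffer buf = ByteBuffer.allocate(256); // big-endian, as SFTP requires
           buf.put((byte) SSH_FXP_NAME).putInt(7).putInt(2);
           putString(buf, "a.txt");
           putString(buf, "-rw-r--r-- 1 u g 3 Oct 27 09:23 a.txt");
           buf.putInt(ATTR_SIZE | ATTR_PERMISSIONS).putLong(3).putInt(0644);
           putString(buf, "b.txt");
           putString(buf, "-rw-r--r-- 1 u g 0 Oct 27 09:23 b.txt");
           buf.putInt(ATTR_SIZE).putLong(0);
           buf.flip();
           System.out.println(decode(buf)); // {a.txt=3, b.txt=0}
       }
   }
   ```
   
   There is no flag to suppress the ATTRS part, which is why a names-only
   listing is not available at the protocol level.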


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@mina.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Issue Time Tracking
-------------------

            Worklog Id:     (was: 670583)
    Remaining Estimate: 0h
            Time Spent: 10m

> Slow performance listing huge number of files on Apache SSHD server
> -------------------------------------------------------------------
>
>                 Key: SSHD-1217
>                 URL: https://issues.apache.org/jira/browse/SSHD-1217
>             Project: MINA SSHD
>          Issue Type: Improvement
>    Affects Versions: 2.6.0
>            Reporter: Roberto Deandrea
>            Priority: Minor
>         Attachments: trace.ssh-frontend-sftplist.finest.log.zip
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Hi Thomas,
> I noticed slow performance when listing files on the front-end Apache SSHD server in the same scenario as https://issues.apache.org/jira/browse/SSHD-1215
> The front-end Apache SSHD server is configured with a FileSystem built upon SftpFileSystemProvider to proxy files to an Apache SSHD back-end server.
>  
> In the /inbox folder of the Apache SSHD back-end server I have 2000 files.
> The client's sftp ls command takes 2 secs against the back-end Apache SSHD server, but about 48 secs against the front-end Apache SSHD server.
> With a larger number of files in the /inbox folder, the times get worse.
>  
> I have attached to this Jira full traces of the sftp list commands sent to the front-end Apache SSHD server.[^trace.frontend.sshd.log.zip]
> I looked through the traces on the front-end server, and it seems to me that for every file in the folder the SFTP client on the front-end server sends an SSH_MSG_CHANNEL_DATA message, generating TCP traffic that slows down the list command.
> Obviously this does not happen when an SFTP client connects directly to the back-end Apache SSHD server.
> Can you take a look at the traces from the front-end Apache SSHD server?
> Do you think it's possible to change something to improve the performance of listing files in this situation?
>  
> Thanks in advance
>  
> Kind Regards
> Roberto



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
For additional commands, e-mail: dev-help@mina.apache.org