Posted to issues@impala.apache.org by "Joe McDonnell (JIRA)" <ji...@apache.org> on 2018/11/07 17:50:00 UTC

[jira] [Created] (IMPALA-7827) Investigate increasing disk utilization by overlapping file open with reads

Joe McDonnell created IMPALA-7827:
-------------------------------------

             Summary: Investigate increasing disk utilization by overlapping file open with reads
                 Key: IMPALA-7827
                 URL: https://issues.apache.org/jira/browse/IMPALA-7827
             Project: IMPALA
          Issue Type: Improvement
          Components: Backend
    Affects Versions: Impala 3.2.0
            Reporter: Joe McDonnell


Disk IO threads are responsible for both the HDFS file open and the reads for ScanRanges. Most HDFS file opens are served from the file handle cache. On a cache miss, however, the Disk IO thread is tied up waiting on a round trip to the NameNode. Depending on the number of Disk IO threads and the speed of the NameNode, all of the Disk IO threads could be blocked waiting on HDFS file open calls, even if there are ScanRanges that have file handles available in the cache. In particular, for spinning disks, there is a single Disk IO thread per disk. If this thread gets tied up in an open call, the disk will go idle.

It might make sense for the open call to be serviced by a separate thread pool. The ScanRange would go through a separate state transition that opens the file handle. The Disk IO thread can process ScanRanges that already have an open file handle (cached or otherwise) while the open call is in progress.
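
As a rough illustration of the idea above, here is a minimal Python sketch (not Impala code). All names in it (ScanRange, run_scan, open_fn, read_fn) are hypothetical, a single Python thread stands in for the Disk IO thread, and open_fn stands in for the slow HDFS open. A ScanRange whose handle is None models a file handle cache miss; it is handed to a separate open pool and only reaches the disk thread's queue once the open has completed.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor


class ScanRange:
    """Hypothetical stand-in for a scan range; handle=None models a cache miss."""

    def __init__(self, path, handle=None):
        self.path = path
        self.handle = handle


def run_scan(ranges, open_fn, read_fn, num_open_threads=2):
    """Read every range on one 'disk' thread while opens run on a separate pool.

    Error handling is omitted: if open_fn raises, the disk thread would wait
    forever, so a real implementation needs a failure path.
    """
    ready = queue.Queue()  # ranges that already hold an open handle
    results = []           # (path, data) in the order the disk thread read them

    def open_then_enqueue(scan_range):
        # Slow step (stands in for the NameNode round trip), run off the disk thread.
        scan_range.handle = open_fn(scan_range.path)
        # State transition: the range becomes readable only after the open finishes.
        ready.put(scan_range)

    def disk_thread():
        for _ in range(len(ranges)):
            scan_range = ready.get()  # blocks only when nothing is readable yet
            results.append((scan_range.path, read_fn(scan_range.handle)))

    with ThreadPoolExecutor(max_workers=num_open_threads) as open_pool:
        reader = threading.Thread(target=disk_thread)
        reader.start()
        for scan_range in ranges:
            if scan_range.handle is None:
                open_pool.submit(open_then_enqueue, scan_range)
            else:
                ready.put(scan_range)
        reader.join()
    return results
```

In this sketch the disk thread never calls open_fn itself, so a slow open delays only the range that needs it. Ranges with a cached handle keep the disk busy in the meantime.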

This is complicated by the fact that file handles can't be simultaneously used by multiple threads. In order to do the state transition properly, it needs to be clear whether a new file handle is necessary. Keeping a file handle cache at the RequestContext level and using preads (see IMPALA-6403) might make this clear, since a pread takes an explicit offset and does not depend on the handle's current file position.
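
A minimal sketch of what a per-request handle cache with positional reads could look like, in Python and using local POSIX files as a stand-in for HDFS. The class name RequestContextHandleCache and its methods are hypothetical, and this is not how Impala implements it. The point it illustrates is that, with pread-style reads, whether a new handle is needed reduces to a cache lookup: one handle per file can be shared by several reader threads because no thread moves a shared file position.

```python
import os
import threading


class RequestContextHandleCache:
    """Hypothetical per-request cache: at most one open descriptor per path."""

    def __init__(self):
        self._lock = threading.Lock()
        self._fds = {}
        self.opens = 0  # number of real open() calls, for illustration

    def get(self, path):
        # Opening under the lock keeps the sketch simple but serializes opens;
        # a real design would likely open outside the lock.
        with self._lock:
            fd = self._fds.get(path)
            if fd is None:
                fd = os.open(path, os.O_RDONLY)
                self._fds[path] = fd
                self.opens += 1
            return fd

    def read(self, path, offset, length):
        # os.pread reads at an explicit offset and leaves the descriptor's
        # file position untouched, so threads can share the descriptor.
        # It may return fewer than `length` bytes (for example at end of file).
        return os.pread(self.get(path), length, offset)

    def close(self):
        with self._lock:
            for fd in self._fds.values():
                os.close(fd)
            self._fds.clear()
```

Note that os.pread is only available on Unix-like platforms.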



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)