Posted to issues@impala.apache.org by "Joe McDonnell (JIRA)" <ji...@apache.org> on 2018/01/05 23:00:05 UTC

[jira] [Resolved] (IMPALA-6364) Lock contention in FileHandleCache results in >2x slowdown for remote HDFS reads

     [ https://issues.apache.org/jira/browse/IMPALA-6364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joe McDonnell resolved IMPALA-6364.
-----------------------------------
       Resolution: Fixed
    Fix Version/s: Impala 2.12.0

commit d1a0510bfe0a168256d37904aca3a30994306454
Author: Joe McDonnell <jo...@cloudera.com>
Date:   Wed Jan 3 19:02:19 2018 -0800

    IMPALA-6364: Bypass file handle cache for ineligible files
    
    Currently, all HdfsFileHandles are owned and constructed
    by the file handle cache. When the file handle cache
    is disabled or the file handle is not eligible for
    caching, the HdfsFileHandle is stored exclusively in
    ScanRange::exclusive_hdfs_fh_, but the HdfsFileHandle still
    comes from the file handle cache. It is created via a call to
    DiskIoMgr::GetCachedHdfsFileHandle() with 'require_new_handle'
    set to true and destroyed via
    DiskIoMgr::ReleaseCachedHdfsFileHandle() with 'destroy_handle'
    set to true.
    
    Recent testing has revealed that the lock on the file handle
    cache is a bottleneck for workloads with many small remote
    files. There is no benefit to storing these exclusive file
    handles in the file handle cache, as they do not participate
    in the caching.
    
    This change introduces DiskIoMgr::GetExclusiveHdfsFileHandle()
    and DiskIoMgr::ReleaseExclusiveHdfsFileHandle(). These are
    equivalent to the Get/ReleaseCachedHdfsFileHandle() calls, except
    they bypass the file handle cache and create/destroy the
    file handle directly. ScanRange::Open() and ScanRange::Close(),
    which populate and free ScanRange::exclusive_hdfs_fh_, now use
    these new calls rather than going through the file handle cache.
    This avoids the cache lock entirely, removing the bottleneck.
    
    To draw a distinction between the two codepaths, HdfsFileHandle
    is now an abstract class with two subclasses:
     - CachedHdfsFileHandles cover all handles that live in the file
       handle cache. Get/ReleaseCachedHdfsFileHandle() use this subclass.
     - ExclusiveHdfsFileHandles cover all cases where a file handle
       does not come from the cache. The new
       Get/ReleaseExclusiveHdfsFileHandle() use this subclass.
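The class split described above can be sketched roughly as follows. This is a minimal illustration, not the actual Impala source: the class and function names mirror the commit message, but the bodies, the HdfsFile stand-in, and the use of plain new/delete are invented for the sketch.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the underlying hdfsFile handle.
struct HdfsFile { std::string path; };

// Abstract base: both cached and exclusive handles expose the raw file.
class HdfsFileHandle {
 public:
  virtual ~HdfsFileHandle() = default;
  HdfsFile* file() { return &file_; }
 protected:
  explicit HdfsFileHandle(const std::string& path) : file_{path} {}
  HdfsFile file_;
};

// Handles owned by the file handle cache; the cache manages their lifetime.
class CachedHdfsFileHandle : public HdfsFileHandle {
 public:
  explicit CachedHdfsFileHandle(const std::string& path)
      : HdfsFileHandle(path) {}
};

// Handles created and destroyed directly, never touching the cache or its lock.
class ExclusiveHdfsFileHandle : public HdfsFileHandle {
 public:
  explicit ExclusiveHdfsFileHandle(const std::string& path)
      : HdfsFileHandle(path) {}
};

// Sketches of the new DiskIoMgr entry points: no cache lookup, no lock taken.
ExclusiveHdfsFileHandle* GetExclusiveHdfsFileHandle(const std::string& path) {
  return new ExclusiveHdfsFileHandle(path);
}

void ReleaseExclusiveHdfsFileHandle(ExclusiveHdfsFileHandle* fh) {
  delete fh;
}
```

The key point of the split is that ScanRange's exclusive path can construct and destroy a handle directly, while cached handles keep going through the (locked) cache.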
    
    Separately, testing revealed that increasing the number of
    partitions in the file handle cache also fixes the contention
    problem. This change makes the number of partitions configurable
    via the startup parameter num_file_handle_cache_partitions,
    allowing future bottlenecks to be mitigated without a patch.
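The partitioned locking scheme this flag controls can be sketched as follows. This is a hedged illustration only: the real FileHandleCache stores hdfs file handles with eviction and per-handle state, whereas this toy maps paths to integers; only the idea of hashing a path to a partition with its own mutex is taken from the source.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <string>
#include <unordered_map>
#include <vector>

// A toy hash-partitioned cache: each partition has its own lock, so threads
// operating on different files usually contend only within one partition.
class PartitionedCache {
 public:
  // num_partitions would come from a startup flag such as
  // num_file_handle_cache_partitions.
  explicit PartitionedCache(size_t num_partitions)
      : partitions_(num_partitions) {}

  void Put(const std::string& path, int handle) {
    Partition& p = PartitionFor(path);
    std::lock_guard<std::mutex> l(p.lock);  // only this partition is locked
    p.entries[path] = handle;
  }

  bool Get(const std::string& path, int* handle) {
    Partition& p = PartitionFor(path);
    std::lock_guard<std::mutex> l(p.lock);
    auto it = p.entries.find(path);
    if (it == p.entries.end()) return false;
    *handle = it->second;
    return true;
  }

 private:
  struct Partition {
    std::mutex lock;
    std::unordered_map<std::string, int> entries;
  };

  // Hash the path to pick a partition; more partitions means fewer
  // threads colliding on the same mutex.
  Partition& PartitionFor(const std::string& path) {
    return partitions_[std::hash<std::string>{}(path) % partitions_.size()];
  }

  std::vector<Partition> partitions_;
};
```

With a fixed 16 partitions, many IO threads opening small remote files frequently hash to the same partition and serialize on its lock; raising the count (e.g. to 256) spreads them out.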
    
    Change-Id: I4ab52b0884a909a4faeb6692f32d45878ea2838f
    Reviewed-on: http://gerrit.cloudera.org:8080/8945
    Reviewed-by: Joe McDonnell <jo...@cloudera.com>
    Tested-by: Impala Public Jenkins


> Lock contention in FileHandleCache results in >2x slowdown for remote HDFS reads
> --------------------------------------------------------------------------------
>
>                 Key: IMPALA-6364
>                 URL: https://issues.apache.org/jira/browse/IMPALA-6364
>             Project: IMPALA
>          Issue Type: Bug
>    Affects Versions: Impala 2.10.0, Impala 2.11.0
>            Reporter: Mostafa Mokhtar
>            Assignee: Joe McDonnell
>            Priority: Blocker
>             Fix For: Impala 2.12.0
>
>         Attachments: d2402_cdh5.12_profile.txt, d2402_cdh5.13_profile.txt, remote_hdfs_scan_pstack.txt
>
>
> IMPALA-4623 introduced a locking scheme for the file handle cache with 16 partitions. This results in lock contention between IO threads, which limits system throughput.
> Most IO threads end up in one of these stacks:
> {code}
> #0  0x0000000002085d47 in base::internal::SpinLockDelay(int volatile*, int, int) ()
> #1  0x0000000002085c29 in base::SpinLock::SlowLock() ()
> #2  0x00000000010fa76d in impala::io::FileHandleCache<16ul>::GetFileHandle(hdfs_internal* const&, std::string*, long, bool, bool*) ()
> #3  0x00000000010f6e22 in impala::io::DiskIoMgr::GetCachedHdfsFileHandle(hdfs_internal* const&, std::string*, long, impala::io::RequestContext*, bool) ()
> #4  0x00000000010fd514 in impala::io::ScanRange::Open(bool) ()
> #5  0x00000000010f691f in impala::io::DiskIoMgr::ReadRange(impala::io::DiskIoMgr::DiskQueue*, impala::io::RequestContext*, impala::io::ScanRange*) ()
> #6  0x00000000010f6dc4 in impala::io::DiskIoMgr::WorkLoop(impala::io::DiskIoMgr::DiskQueue*) ()
> #7  0x0000000000d13333 in impala::Thread::SuperviseThread(std::string const&, std::string const&, boost::function<void ()>, impala::Promise<long>*) ()
> #8  0x0000000000d13a74 in boost::detail::thread_data<boost::_bi::bind_t<void, void (*)(std::string const&, std::string const&, boost::function<void ()>, impala::Promise<long>*), boost::_bi::list4<boost::_bi::value<std::string>, boost::_bi::value<std::string>, boost::_bi::value<boost::function<void ()> >, boost::_bi::value<impala::Promise<long>*> > > >::run() ()
> #9  0x000000000128ea3a in thread_proxy ()
> #10 0x00007f49f2bbadc5 in start_thread () from /lib64/libpthread.so.0
> #11 0x00007f49f28e976d in clone () from /lib64/libc.so.6
> {code}
> {code}
> #0  0x0000000002085d47 in base::internal::SpinLockDelay(int volatile*, int, int) ()
> #1  0x0000000002085c29 in base::SpinLock::SlowLock() ()
> #2  0x00000000010f9929 in impala::io::FileHandleCache<16ul>::ReleaseFileHandle(std::string*, impala::io::HdfsFileHandle*, bool) ()
> #3  0x00000000010fe69e in impala::io::ScanRange::Close() ()
> #4  0x00000000010f6565 in impala::io::DiskIoMgr::HandleReadFinished(impala::io::DiskIoMgr::DiskQueue*, impala::io::RequestContext*, std::unique_ptr<impala::io::BufferDescriptor, std::default_delete<impala::io::BufferDescriptor> >) ()
> #5  0x00000000010f695b in impala::io::DiskIoMgr::ReadRange(impala::io::DiskIoMgr::DiskQueue*, impala::io::RequestContext*, impala::io::ScanRange*) ()
> #6  0x00000000010f6dc4 in impala::io::DiskIoMgr::WorkLoop(impala::io::DiskIoMgr::DiskQueue*) ()
> #7  0x0000000000d13333 in impala::Thread::SuperviseThread(std::string const&, std::string const&, boost::function<void ()>, impala::Promise<long>*) ()
> #8  0x0000000000d13a74 in boost::detail::thread_data<boost::_bi::bind_t<void, void (*)(std::string const&, std::string const&, boost::function<void ()>, impala::Promise<long>*), boost::_bi::list4<boost::_bi::value<std::string>, boost::_bi::value<std::string>, boost::_bi::value<boost::function<void ()> >, boost::_bi::value<impala::Promise<long>*> > > >::run() ()
> #9  0x000000000128ea3a in thread_proxy ()
> #10 0x00007f49f2bbadc5 in start_thread () from /lib64/libpthread.so.0
> #11 0x00007f49f28e976d in clone () from /lib64/libc.so.6
> {code}
> Increasing the number of partitions to 256 made the contention go away. A simple fix would be to make the number of partitions a startup flag and set it to 256.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)