Posted to issues@impala.apache.org by "Todd Lipcon (JIRA)" <ji...@apache.org> on 2019/05/22 04:16:00 UTC

[jira] [Created] (IMPALA-8569) Periodically scrub deleted files from the file handle cache

Todd Lipcon created IMPALA-8569:
-----------------------------------

             Summary: Periodically scrub deleted files from the file handle cache
                 Key: IMPALA-8569
                 URL: https://issues.apache.org/jira/browse/IMPALA-8569
             Project: IMPALA
          Issue Type: Improvement
          Components: Backend
            Reporter: Todd Lipcon


Currently, if you query a file and then later delete it (e.g. by dropping the partition or table), the file stays in the impalad's file handle cache. Because the cached handle keeps the file open, its space can't be reclaimed on disk until the impalad restarts or churns through its cache enough to drop the handle.

Typically this isn't a big deal in practice, since most files don't get deleted shortly after being read, and the FH cache should cycle through after 6 hours by default. Additionally, fixing it would be a bit of a pain, since we'd need to add HDFS and libhdfs hooks to get HDFS to tell us if the underlying short-circuit FD is unlinked, which probably also means adding JNI code to let Java call fstat() in order to check st_nlink. Given that, I'm not sure it's worth fixing, or whether we should just consider a shorter default expiry on the FH cache.
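
For illustration, here is a minimal Python sketch of the st_nlink check in isolation, assuming a POSIX filesystem where an open file can be unlinked. The cache is a plain dict of path to descriptor, and the names is_unlinked, scrub_deleted and demo are hypothetical. None of this is Impala's file handle cache or a libhdfs API; it only shows that fstat() on a still-open descriptor reports st_nlink == 0 once the file has been deleted, which is the signal a periodic scrub could act on.

```python
import os
import tempfile
from typing import Dict, List, Tuple


def is_unlinked(fd: int) -> bool:
    """True if the file behind this open descriptor has no directory entries left."""
    return os.fstat(fd).st_nlink == 0


def scrub_deleted(cache: Dict[str, int]) -> List[str]:
    """Close and evict cached descriptors whose files were deleted.

    `cache` is a hypothetical stand-in for a file handle cache
    (path -> open descriptor). Returns the evicted paths.
    """
    evicted = [path for path, fd in cache.items() if is_unlinked(fd)]
    for path in evicted:
        # Closing the last open descriptor lets the filesystem reclaim the space.
        os.close(cache.pop(path))
    return evicted


def demo() -> Tuple[List[str], List[str]]:
    """Open two files, delete one, scrub, and report (evicted, remaining) names."""
    with tempfile.TemporaryDirectory() as tmp:
        cache: Dict[str, int] = {}
        for name in ("kept.dat", "dropped.dat"):
            path = os.path.join(tmp, name)
            with open(path, "wb") as f:
                f.write(b"x" * 1024)
            cache[path] = os.open(path, os.O_RDONLY)

        # Simulate the partition/table drop: the descriptor stays valid.
        os.unlink(os.path.join(tmp, "dropped.dat"))

        evicted = scrub_deleted(cache)
        remaining = list(cache)
        for fd in cache.values():
            os.close(fd)
        return ([os.path.basename(p) for p in evicted],
                [os.path.basename(p) for p in remaining])


if __name__ == "__main__":
    print(demo())
```

Note that st_nlink only drops to zero when every hard link to the file is gone, so a check like this would miss a deleted file that still has another link elsewhere.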



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)