Posted to issues-all@impala.apache.org by "Dinesh Garg (JIRA)" <ji...@apache.org> on 2019/06/27 05:46:00 UTC

[jira] [Updated] (IMPALA-8663) FileMetadataLoader should skip listing files in hidden and tmp directories

     [ https://issues.apache.org/jira/browse/IMPALA-8663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dinesh Garg updated IMPALA-8663:
--------------------------------
    Labels: impala-acid  (was: )

> FileMetadataLoader should skip listing files in hidden and tmp directories
> --------------------------------------------------------------------------
>
>                 Key: IMPALA-8663
>                 URL: https://issues.apache.org/jira/browse/IMPALA-8663
>             Project: IMPALA
>          Issue Type: Bug
>            Reporter: Vihang Karajgaonkar
>            Assignee: Vihang Karajgaonkar
>            Priority: Major
>              Labels: impala-acid
>
> Currently, the file metadata loader recursively lists the table and partition directories to get the FileStatus entries. For each FileStatus, hidden files are ignored by {{FileSystemUtil.isValidDataFile()}}. However, that check is not sufficient. For instance, if Hive is inserting data into a table when the refresh is called, a staging directory may be present within the table directory. This staging directory is a hidden directory named {{.hive-staging_*}}. It can contain files whose own names are not hidden (that is, they do not start with a . or _). Such files should be treated as temporary files and not as valid data files.
>  
> Another instance where this happens is transactional tables, which have a {{.manifest}} file located in a {{_tmp}} directory within the table directory. This file should also be skipped and not considered a valid data file.
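
The check described above could be sketched roughly as follows. This is only an illustration of the idea, not the actual Impala change: the class {{HiddenDirFilter}} and its methods are hypothetical names, paths are handled as plain strings instead of Hadoop Path objects, and the only rule assumed is the one from the description (a name starting with . or _ is hidden or temporary).

```java
/** Hypothetical helper, for illustration only; not part of Impala. */
final class HiddenDirFilter {
  private HiddenDirFilter() {}

  /** Returns true if a single path component starts with "." or "_". */
  static boolean isHiddenName(String name) {
    return name.startsWith(".") || name.startsWith("_");
  }

  /**
   * Returns true if filePath, which must be located under tableDir, should be
   * skipped because the file itself or any directory between tableDir and the
   * file is hidden or temporary.
   */
  static boolean shouldSkip(String tableDir, String filePath) {
    String base = tableDir.endsWith("/") ? tableDir : tableDir + "/";
    if (!filePath.startsWith(base)) {
      throw new IllegalArgumentException(filePath + " is not under " + tableDir);
    }
    // Only the part below the table directory is inspected, so a hidden
    // component in the table location itself does not hide the whole table.
    String relative = filePath.substring(base.length());
    for (String component : relative.split("/")) {
      if (!component.isEmpty() && isHiddenName(component)) {
        return true;
      }
    }
    return false;
  }
}
```

With this sketch, a file such as {{000000_0}} inside a {{.hive-staging_*}} directory or anything inside {{_tmp}} is skipped even though the file name alone would pass a per-file hidden check, while a file in a regular partition directory is kept.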



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
