Posted to issues@drill.apache.org by "Uwe L. Korn (JIRA)" <ji...@apache.org> on 2016/10/28 08:23:58 UTC

[jira] [Created] (DRILL-4978) Parquet metadata cache on S3 is always renewed

Uwe L. Korn created DRILL-4978:
----------------------------------

             Summary: Parquet metadata cache on S3 is always renewed
                 Key: DRILL-4978
                 URL: https://issues.apache.org/jira/browse/DRILL-4978
             Project: Apache Drill
          Issue Type: Bug
          Components: Storage - Parquet
    Affects Versions: 1.8.0
         Environment: Hadoop s3a storage
            Reporter: Uwe L. Korn


As directory modification times are not tracked by S3 (see https://hadoop.apache.org/docs/r3.0.0-alpha1/hadoop-aws/tools/hadoop-aws/index.html#Warning_2:_Because_Object_stores_dont_track_modification_times_of_directories ), the Parquet metadata cache is always renewed during query planning.

This could be addressed in either of two ways:
 * for the case of s3a, check the modification times of all Parquet files in this directory
 * deactivate the metadata cache for s3a
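
The first option could look roughly like the sketch below. It is not Drill's implementation: FileStatus, cache_is_stale and the millisecond timestamps are hypothetical stand-ins for whatever the file system listing returns, and the set of cached paths is assumed to be readable from the metadata cache file. The idea is to ignore the directory's own modification time and compare each Parquet file against the time the cache was written.

```python
from dataclasses import dataclass
from typing import Iterable, Set


@dataclass(frozen=True)
class FileStatus:
    """Hypothetical listing entry: a path and its modification time in ms."""
    path: str
    modification_time_ms: int


def cache_is_stale(cache_mtime_ms: int,
                   files: Iterable[FileStatus],
                   cached_paths: Set[str]) -> bool:
    """Decide staleness from per-file mtimes rather than the directory mtime."""
    parquet = [f for f in files if f.path.endswith(".parquet")]
    # A file added or removed since the cache was written invalidates it.
    if {f.path for f in parquet} != cached_paths:
        return True
    # Any file modified after the cache was written invalidates it.
    return any(f.modification_time_ms > cache_mtime_ms for f in parquet)


# Example: two files, both older than a cache written at t=300.
listing = [FileStatus("t/a.parquet", 100), FileStatus("t/b.parquet", 200)]
known = {"t/a.parquet", "t/b.parquet"}
fresh = not cache_is_stale(300, listing, known)
```

The trade-off is that every planning step now needs a listing of the directory, which is still cheaper than re-reading all Parquet footers to rebuild the cache.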



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)