Posted to issues@drill.apache.org by "benj (JIRA)" <ji...@apache.org> on 2019/04/26 09:45:00 UTC

[jira] [Created] (DRILL-7219) Ignore hidden file problems

benj created DRILL-7219:
---------------------------

             Summary: Ignore hidden file problems
                 Key: DRILL-7219
                 URL: https://issues.apache.org/jira/browse/DRILL-7219
             Project: Apache Drill
          Issue Type: Bug
          Components: Storage - JSON, Storage - Parquet, Storage - Text & CSV
    Affects Versions: 1.15.0
            Reporter: benj


Drill seems to apply different hidden-file filtering rules depending on the file format.
 * *Parquet*: hidden files (names starting with ".") are filtered out +whether+ we query the directory or its files with *
{code:java}
/* DirPqt
   |--sub1.pqt
   |--sub2.pqt
   |--.sub3.pqt
*/
SELECT count(*) FROM (SELECT DISTINCT filename FROM ....`DirPqt`);
=> 2
SELECT count(*) FROM (SELECT DISTINCT filename FROM ....`DirPqt/*`);
=> 2
/* It's possible to request the hidden file */
SELECT count(*) FROM (SELECT DISTINCT filename FROM ....`DirPqt/.*`);
=> 1
/* But I don't know how to request visible and hidden files in a single query (except with a UNION) */
{code}

 * *CSV, JSON*: whether hidden files (names starting with ".") are filtered out +depends+ on whether the query targets the directory or its files
{code:java}
/* DirCSVH
   |--sub1.csvh
   |--sub2.csvh
   |--.sub3.csvh
*/
SELECT count(*) FROM (SELECT DISTINCT filename FROM ....`DirCSVH`);
=> 2
SELECT count(*) FROM (SELECT DISTINCT filename FROM ....`DirCSVH/*`);
=> 3
/* As with Parquet, it's possible to request the hidden file */
SELECT count(*) FROM (SELECT DISTINCT filename FROM ....`DirCSVH/.*`);
=> 1
/* It's also possible to request only visible files */
SELECT count(*) FROM (SELECT DISTINCT filename FROM ....`DirCSVH/[^.]*`);
=> 2
/* But I don't know how to request visible and hidden files in a single query (except with a UNION) */
{code}
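The UNION workaround mentioned in the comment above could look like the sketch below. It only combines the two patterns already shown for the CSV case and keeps the elided workspace prefix (....) as written. It has not been run here, so no result is given.
{code:java}
/* Workaround sketch: merge the visible-only and hidden-only patterns */
SELECT count(*) FROM (
  SELECT DISTINCT filename FROM ....`DirCSVH/[^.]*`
  UNION
  SELECT DISTINCT filename FROM ....`DirCSVH/.*`
);
{code}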

Some existing issues already deal with hidden files, for example DRILL-2424.
But I did not find any description of this filtering in the documentation. I found that hidden files start with "." or "_", but maybe there are other cases?

It's a little strange that the filtering rules differ depending on the file type.
 It's also impractical that there is no simple way to say whether or not we want hidden files. For example, with a query like:
{code:java}
SELECT * FROM ....`MyDir/[.]?*`;
{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)