Posted to issues@drill.apache.org by "Aman Sinha (JIRA)" <ji...@apache.org> on 2014/12/20 21:16:14 UTC

[jira] [Created] (DRILL-1906) Parquet reader error when reading a subdirectory

Aman Sinha created DRILL-1906:
---------------------------------

             Summary: Parquet reader error when reading a subdirectory
                 Key: DRILL-1906
                 URL: https://issues.apache.org/jira/browse/DRILL-1906
             Project: Apache Drill
          Issue Type: Bug
            Reporter: Aman Sinha


I am not sure whether this is a regression, but on the current master branch Drill is unable to read a directory when there are Parquet files in both the parent directory and a subdirectory. It tries to read a footer for the subdirectory itself instead of recursing into it. JSON works fine.

For example, here is my directory structure:

{code}
 ls -lR /tmp/foo1
-rw-r--r--  1 asinha  wheel  132 Dec 20 11:10 0_0_0.parquet
drwxr-xr-x  3 asinha  wheel  102 Dec 20 09:54 foo2

/tmp/foo1/foo2:
-rw-r--r--  1 asinha  wheel  132 Dec 16 16:14 0_0_0.parquet
{code}
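For the layout above, the expected behavior is that Drill descends into foo2 and reads footers only for the two 0_0_0.parquet files, never for the foo2 directory. Below is a minimal Java sketch of that expectation. It assumes plain java.nio.file.Files.walk rather than the Hadoop FileSystem path that Drill actually uses. The class RecursiveParquetListing and its listParquetFiles method are hypothetical, not part of Drill or parquet-hadoop, and the generated files are empty placeholders, not real Parquet data.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class RecursiveParquetListing {

    // Hypothetical helper (not Drill code): walk the whole tree under root and
    // return only regular files ending in ".parquet". Directories such as foo2
    // are descended into but never returned, so a footer reader fed from this
    // list would only ever see real files.
    public static List<Path> listParquetFiles(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().endsWith(".parquet"))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Recreate the reported layout in a temporary directory.
        // Assumption: empty placeholder files stand in for real Parquet files.
        Path root = Files.createTempDirectory("foo1");
        Path sub = Files.createDirectory(root.resolve("foo2"));
        Files.createFile(root.resolve("0_0_0.parquet"));
        Files.createFile(sub.resolve("0_0_0.parquet"));

        // Print each file found, relative to the temporary root.
        for (Path p : listParquetFiles(root)) {
            System.out.println(root.relativize(p));
        }
    }
}
```

Filtering on Files::isRegularFile is the step that keeps directory entries out of the list; the stack trace below shows a directory status (foo2, isDirectory=true) reaching ParquetFileReader.readAllFootersInParallel instead.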

Here is the failure and the stack trace:
{code}
0: jdbc:drill:zk=local> select * from foo1;
Query failed: Query failed: Unexpected exception during fragment initialization: Internal error: Error while applying rule DrillTableRule, args [rel#660:EnumerableTableAccessRel.ENUMERABLE.ANY([]).[](table=[dfs, tmp, foo1])]

<skip>
Caused by: java.io.IOException: Could not read footer: java.io.IOException: Could not read footer for file DeprecatedRawLocalFileStatus{path=file:/tmp/foo1/foo2; isDirectory=true; modification_time=1419098040000; access_time=0; owner=; group=; permission=rwxrwxrwx; isSymlink=false}
        at parquet.hadoop.ParquetFileReader.readAllFootersInParallel(ParquetFileReader.java:195) ~[parquet-hadoop-1.5.1-drill-r4.jar:0.8.0-SNAPSHOT]
        at parquet.hadoop.ParquetFileReader.readAllFootersInParallel(ParquetFileReader.java:208) ~[parquet-hadoop-1.5.1-drill-r4.jar:0.8.0-SNAPSHOT]
        at parquet.hadoop.ParquetFileReader.readFooters(ParquetFileReader.java:224) ~[parquet-hadoop-1.5.1-drill-r4.jar:0.8.0-SNAPSHOT]
        at org.apache.drill.exec.store.parquet.ParquetGroupScan.readFooter(ParquetGroupScan.java:208) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
{code}
