Posted to issues@drill.apache.org by "Steven Phillips (JIRA)" <ji...@apache.org> on 2015/04/10 20:50:12 UTC

[jira] [Updated] (DRILL-1906) Parquet reader error when reading a subdirectory

     [ https://issues.apache.org/jira/browse/DRILL-1906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steven Phillips updated DRILL-1906:
-----------------------------------
    Fix Version/s:     (was: 0.9.0)
                   1.0.0

> Parquet reader error when reading a subdirectory
> ------------------------------------------------
>
>                 Key: DRILL-1906
>                 URL: https://issues.apache.org/jira/browse/DRILL-1906
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - Parquet
>            Reporter: Aman Sinha
>            Assignee: Steven Phillips
>             Fix For: 1.0.0
>
>
> I am not sure whether this is a regression, but on the current master branch Drill cannot read a directory when both that directory and one of its subdirectories contain Parquet files. It tries to read a Parquet footer from the subdirectory itself instead of recursing into it. The same layout works fine with JSON files.
> For example, here's my directory structure: 
> {code}
>  ls -lR /tmp/foo1
> -rw-r--r--  1 asinha  wheel  132 Dec 20 11:10 0_0_0.parquet
> drwxr-xr-x  3 asinha  wheel  102 Dec 20 09:54 foo2
> /tmp/foo1/foo2:
> -rw-r--r--  1 asinha  wheel  132 Dec 16 16:14 0_0_0.parquet
> {code}
> Here's the failure and stack trace: 
> {code}
> 0: jdbc:drill:zk=local> select * from foo1;
> Query failed: Query failed: Unexpected exception during fragment initialization: Internal error: Error while applying rule DrillTableRule, args [rel#660:EnumerableTableAccessRel.ENUMERABLE.ANY([]).[](table=[dfs, tmp, foo1])]
> <skip>
> Caused by: java.io.IOException: Could not read footer: java.io.IOException: Could not read footer for file DeprecatedRawLocalFileStatus{path=file:/tmp/foo1/foo2; isDirectory=true; modification_time=1419098040000; access_time=0; owner=; group=; permission=rwxrwxrwx; isSymlink=false}
>         at parquet.hadoop.ParquetFileReader.readAllFootersInParallel(ParquetFileReader.java:195) ~[parquet-hadoop-1.5.1-drill-r4.jar:0.8.0-SNAPSHOT]
>         at parquet.hadoop.ParquetFileReader.readAllFootersInParallel(ParquetFileReader.java:208) ~[parquet-hadoop-1.5.1-drill-r4.jar:0.8.0-SNAPSHOT]
>         at parquet.hadoop.ParquetFileReader.readFooters(ParquetFileReader.java:224) ~[parquet-hadoop-1.5.1-drill-r4.jar:0.8.0-SNAPSHOT]
>         at org.apache.drill.exec.store.parquet.ParquetGroupScan.readFooter(ParquetGroupScan.java:208) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
> {code}
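
The behaviour the report expects (recurse into foo2 rather than treat it as a Parquet file) can be sketched as below. This is only an illustration using the standard java.nio.file API on a local filesystem. It is not Drill's actual fix, and the class name ParquetFileLister and the method listParquetFiles are hypothetical names introduced here. The idea is to expand the directory tree into regular files first, so that a directory entry is never handed to a footer reader.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of Drill or Parquet.
class ParquetFileLister {

    // Walk the tree under root and return only regular files whose name ends
    // in ".parquet". Directories such as foo2 are descended into, never returned.
    static List<Path> listParquetFiles(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                .filter(Files::isRegularFile)
                .filter(p -> p.getFileName().toString().endsWith(".parquet"))
                .sorted()
                .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Recreate the layout from the report in a temporary directory.
        Path root = Files.createTempDirectory("foo1");
        Path sub = Files.createDirectory(root.resolve("foo2"));
        Files.createFile(root.resolve("0_0_0.parquet"));
        Files.createFile(sub.resolve("0_0_0.parquet"));

        // Print each discovered file relative to the root directory.
        for (Path p : listParquetFiles(root)) {
            System.out.println(root.relativize(p));
        }
    }
}
```

With a list built this way, a footer reader would only ever receive file paths, which avoids the "Could not read footer for file ... isDirectory=true" failure shown in the stack trace. A real implementation inside Drill would presumably go through the Hadoop FileSystem abstraction rather than java.nio.file.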



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)