Posted to issues@spark.apache.org by "Michael Armbrust (JIRA)" <ji...@apache.org> on 2014/08/12 19:21:12 UTC

[jira] [Resolved] (SPARK-2700) Hidden files (such as .impala_insert_staging) should be filtered out by sqlContext.parquetFile

     [ https://issues.apache.org/jira/browse/SPARK-2700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Armbrust resolved SPARK-2700.
-------------------------------------

    Resolution: Fixed

> Hidden files (such as .impala_insert_staging) should be filtered out by sqlContext.parquetFile
> ----------------------------------------------------------------------------------------------
>
>                 Key: SPARK-2700
>                 URL: https://issues.apache.org/jira/browse/SPARK-2700
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 1.0.1
>            Reporter: Teng Qiu
>             Fix For: 1.1.0
>
>
> When a table is created in Impala, a hidden folder named .impala_insert_staging is created in the table's folder.
> If we then load such a table with the Spark SQL API sqlContext.parquetFile, the hidden folder causes trouble: Spark tries to read Parquet metadata from it, and you will see this exception:
> {code:borderStyle=solid}
> Caused by: java.io.IOException: Could not read footer for file FileStatus{path=hdfs://xxx:8020/user/hive/warehouse/parquet_strings/.impala_insert_staging; isDirectory=true; modification_time=1406333729252; access_time=0; owner=hdfs; group=hdfs; permission=rwxr-xr-x; isSymlink=false}
> ...
> ...
> Caused by: org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException): Path is not a file: /user/hive/warehouse/parquet_strings/.impala_insert_staging
> {code}
> The Impala side does not consider this their problem: https://issues.cloudera.org/browse/IMPALA-837 (IMPALA-837 Delete .impala_insert_staging directory after INSERT).
> So maybe we should filter out such hidden folders/files when reading Parquet tables.
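
The filtering proposed in the description can be sketched as follows. This is only an illustration, not the change that was committed for this issue: it assumes "hidden" means that the last path component starts with a dot, the helper names is_hidden and visible_paths are hypothetical, and the data file name part-00000.parquet is made up for the example.

```python
import posixpath
from urllib.parse import urlparse


def is_hidden(path: str) -> bool:
    """Return True if the last path component starts with a dot.

    Assumption: only a leading "." marks an entry as hidden.
    """
    name = posixpath.basename(urlparse(path).path.rstrip("/"))
    return name.startswith(".")


def visible_paths(paths):
    """Keep only the entries whose last path component is not hidden."""
    return [p for p in paths if not is_hidden(p)]


# Hypothetical directory listing: the staging folder from the stack trace
# plus one made-up data file.
listing = [
    "hdfs://xxx:8020/user/hive/warehouse/parquet_strings/.impala_insert_staging",
    "hdfs://xxx:8020/user/hive/warehouse/parquet_strings/part-00000.parquet",
]

for path in visible_paths(listing):
    print(path)
```

Applying such a filter to the directory listing before any Parquet footer is read would keep the .impala_insert_staging directory from ever being opened as a file.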


