Posted to issues@impala.apache.org by "Quanlong Huang (Jira)" <ji...@apache.org> on 2022/08/04 21:50:00 UTC

[jira] [Resolved] (IMPALA-11469) Ignore _spark_metadata folder in table location

     [ https://issues.apache.org/jira/browse/IMPALA-11469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Quanlong Huang resolved IMPALA-11469.
-------------------------------------
    Fix Version/s: Impala 4.2.0
       Resolution: Fixed

> Ignore _spark_metadata folder in table location
> -----------------------------------------------
>
>                 Key: IMPALA-11469
>                 URL: https://issues.apache.org/jira/browse/IMPALA-11469
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Backend
>            Reporter: Matthias Wies
>            Assignee: Quanlong Huang
>            Priority: Major
>             Fix For: Impala 4.2.0
>
>
> When Spark streaming is used to write Parquet files to an external table, a _spark_metadata folder is created within the table's directory. Hive is capable of dealing with this directory, but Impala trips on it.
> As a result, REFRESH TABLE does not work, because Impala sees a directory containing data it cannot cope with. A SELECT also fails, as it trips on the _spark_metadata folder.
> The issue was found in CDP 7.1.7 SP1, but I suspect it affects all versions.
> Regards Matthias
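
The behaviour requested above can be illustrated with a minimal sketch. This is not Impala's actual implementation. It assumes a Hive-style convention in which any path component starting with "_" or "." is treated as hidden, and the names is_hidden and visible_data_files are hypothetical helpers introduced only for this illustration.

```python
from pathlib import PurePosixPath

# Assumption: components starting with "_" or "." are hidden, in the spirit
# of the Hive/Hadoop convention. This is an illustrative rule, not the exact
# rule applied by the Impala fix.
HIDDEN_PREFIXES = ("_", ".")


def is_hidden(relative_path: str) -> bool:
    """Return True if any component of the path (relative to the table
    location) starts with a hidden prefix, e.g. _spark_metadata/0."""
    parts = PurePosixPath(relative_path).parts
    return any(part.startswith(HIDDEN_PREFIXES) for part in parts)


def visible_data_files(relative_paths):
    """Keep only the paths a file listing should treat as table data."""
    return [p for p in relative_paths if not is_hidden(p)]


if __name__ == "__main__":
    listing = [
        "part-00000.parquet",
        "_spark_metadata/0",
        "_spark_metadata/0.compact",
        "year=2022/part-00001.parquet",
    ]
    # The _spark_metadata entries are dropped; the Parquet files remain.
    print(visible_data_files(listing))
```

With a filter of this kind applied while listing the table location, the Spark bookkeeping files would never be handed to the Parquet scanner, which is the effect the issue asks for.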



--
This message was sent by Atlassian Jira
(v8.20.10#820010)