Posted to issues-all@impala.apache.org by "ASF subversion and git services (Jira)" <ji...@apache.org> on 2022/11/04 23:00:00 UTC

[jira] [Commented] (IMPALA-11469) Ignore _spark_metadata folder in table location

    [ https://issues.apache.org/jira/browse/IMPALA-11469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17629222#comment-17629222 ] 

ASF subversion and git services commented on IMPALA-11469:
----------------------------------------------------------

Commit 645df57af8db40bff19407b7006b469bc3298737 in impala's branch refs/heads/master from stiga-huang
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=645df57af ]

IMPALA-11699: Fix NPE in FE tests thrown from static code of FileSystemUtil

FileSystemUtil has static code that reads configuration from
BackendConfig.INSTANCE, which can be null in some FE tests. In commit
c1610a163 of IMPALA-11469, we fixed the issue by modifying the failing FE
tests to extend FrontendTestBase, which makes sure
BackendConfig.INSTANCE is initialized. However, AcidUtilsTest and
TestCaseLoaderTest have the same issue. They were missed because the
issue depends on the test order: if a test that initializes
BackendConfig.INSTANCE runs first, the tests that follow don't hit the
issue.

To keep new tests from hitting this issue again, this patch initializes
BackendConfig.INSTANCE lazily in the static code of FileSystemUtil.
It also adds a warning noting that this should only happen in tests. Note
that in impalad and catalogd, BackendConfig.INSTANCE is initialized in
the constructors of JniFrontend and JniCatalog.
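
The lazy-initialization pattern described above can be sketched roughly as
follows. This is only an illustration under stated assumptions, not the
actual Impala code: DemoConfig, DemoFsUtil, the recursiveListing flag and
its default value are hypothetical stand-ins for BackendConfig,
FileSystemUtil and whatever settings the real static code reads.

```java
import java.util.logging.Logger;

// Hypothetical stand-in for a process-wide config singleton
// (NOT Impala's real BackendConfig).
final class DemoConfig {
  // Stays null until create() is called, e.g. by a daemon's startup code.
  static DemoConfig INSTANCE;

  final boolean recursiveListing;

  private DemoConfig(boolean recursiveListing) {
    this.recursiveListing = recursiveListing;
  }

  static void create(boolean recursiveListing) {
    INSTANCE = new DemoConfig(recursiveListing);
  }
}

// Hypothetical stand-in for a utility class whose static code reads the
// config (NOT Impala's real FileSystemUtil).
final class DemoFsUtil {
  private static final Logger LOG = Logger.getLogger("DemoFsUtil");

  static final boolean RECURSIVE_LISTING;

  static {
    // Without this guard, the dereference below would throw a
    // NullPointerException whenever this class is loaded before the
    // config has been initialized, which depends on test order.
    if (DemoConfig.INSTANCE == null) {
      LOG.warning("DemoConfig not initialized; creating a default instance. "
          + "This should only happen in tests.");
      DemoConfig.create(true);  // assumed default, for illustration only
    }
    RECURSIVE_LISTING = DemoConfig.INSTANCE.recursiveListing;
  }

  private DemoFsUtil() {}

  static boolean isRecursiveListingEnabled() {
    return RECURSIVE_LISTING;
  }
}
```

With a guard like this, the first call to
DemoFsUtil.isRecursiveListingEnabled() works whether or not something else
created the config beforehand, so a test no longer has to extend a
particular base class just to make the static initializer safe.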

The changes in c1610a163 are redundant after this patch, so they are
reverted.

Tests:
 - Ran FE tests one by one so that no test depends on state left by a
   previous one. Only AcidUtilsTest and TestCaseLoaderTest were found to
   have the issue. Verified this patch fixes the issue in these two tests.

Change-Id: I5c056791406cd4535a7e43889dbb73d153b06f0a
Reviewed-on: http://gerrit.cloudera.org:8080/19195
Reviewed-by: Daniel Becker <da...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
Reviewed-by: Michael Smith <mi...@cloudera.com>
Reviewed-by: Joe McDonnell <jo...@cloudera.com>


> Ignore _spark_metadata folder in table location
> -----------------------------------------------
>
>                 Key: IMPALA-11469
>                 URL: https://issues.apache.org/jira/browse/IMPALA-11469
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Backend
>            Reporter: Matthias Wies
>            Assignee: Quanlong Huang
>            Priority: Major
>             Fix For: Impala 4.2.0
>
>
> When Spark Streaming is used to write Parquet files out to an external table, a folder _spark_metadata is created within the directory of the table. Hive is capable of dealing with this directory, but Impala trips on it.
> So REFRESH TABLE won't work, as it sees a directory with data Impala cannot cope with. A SELECT will also not work, as it trips on the _spark_metadata folder.
> Issue was found in CDP 7.1.7 SP1, but I suspect it is in all versions.
> Regards Matthias



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
