Posted to issues@drill.apache.org by "Vova Vysotskyi (Jira)" <ji...@apache.org> on 2020/02/07 10:13:00 UTC
[jira] [Commented] (DRILL-7565) ANALYZE TABLE ... REFRESH METADATA does not work for empty Parquet files
[ https://issues.apache.org/jira/browse/DRILL-7565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17032274#comment-17032274 ]
Vova Vysotskyi commented on DRILL-7565:
---------------------------------------
The initial idea for fixing this issue was to enhance Metastore aggregation to handle empty input the same way `count(*)` without a `group by` clause is handled, but that approach cannot be used because we would lose information about file location, row group indexes, etc. Forcing the use of `ConvertMetadataAggregateToDirectScanRule` also wouldn't help in the long term, since it cannot be applied to the other formats that will be supported by the Metastore.
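To illustrate the difference described above, here is a minimal sketch using plain Java streams rather than Drill's Metastore code. It assumes a hypothetical `RowGroupRow` type and hypothetical helper methods. None of these names exist in Drill. A scalar aggregate over empty input still produces one value, whereas a grouped aggregate produces no groups at all, so there is no key left to carry the file location or row group index.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class EmptyAggregationSketch {

  // Hypothetical stand-in for one row-group-level metadata row; not a Drill class.
  static final class RowGroupRow {
    final String location;
    final int rowGroupIndex;
    final long rowCount;

    RowGroupRow(String location, int rowGroupIndex, long rowCount) {
      this.location = location;
      this.rowGroupIndex = rowGroupIndex;
      this.rowCount = rowCount;
    }
  }

  // Scalar aggregate, comparable to count(*) without GROUP BY:
  // always yields exactly one value, even for empty input.
  static long totalRowCount(List<RowGroupRow> rows) {
    return rows.stream().mapToLong((RowGroupRow r) -> r.rowCount).sum();
  }

  // Grouped aggregate keyed by file location:
  // empty input yields an empty map, so no location survives.
  static Map<String, Long> rowCountPerLocation(List<RowGroupRow> rows) {
    return rows.stream().collect(Collectors.groupingBy(
        (RowGroupRow r) -> r.location,
        Collectors.summingLong((RowGroupRow r) -> r.rowCount)));
  }

  public static void main(String[] args) {
    List<RowGroupRow> empty = List.of();
    System.out.println(totalRowCount(empty));       // prints 0
    System.out.println(rowCountPerLocation(empty)); // prints {}
  }
}
```

In this sketch, falling back to the scalar form for empty input would return a single total with no `location` or `rowGroupIndex` attached, which mirrors the loss of information mentioned in the comment.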
> ANALYZE TABLE ... REFRESH METADATA does not work for empty Parquet files
> ------------------------------------------------------------------------
>
> Key: DRILL-7565
> URL: https://issues.apache.org/jira/browse/DRILL-7565
> Project: Apache Drill
> Issue Type: Bug
> Affects Versions: 1.17.0
> Reporter: Bohdan Kazydub
> Assignee: Vova Vysotskyi
> Priority: Major
> Fix For: 1.18.0
>
>
> The following query does not create metadata for an empty Parquet table:
> {code:java}
> @Test
> public void testAnalyzeEmptyParquetTable() throws Exception {
>   dirTestWatcher.copyResourceToRoot(Paths.get("parquet", "empty"));
>   String tableName = "parquet/empty/simple/empty_simple.parquet";
>   try {
>     client.alterSession(ExecConstants.METASTORE_ENABLED, true);
>     testBuilder()
>         .sqlQuery("ANALYZE TABLE dfs.`%s` REFRESH METADATA", tableName)
>         .unOrdered()
>         .baselineColumns("ok", "summary")
>         .baselineValues(true, String.format("Collected / refreshed metadata for table [dfs.default.%s]", tableName))
>         .go();
>   } finally {
>     run("analyze table dfs.`%s` drop metadata if exists", tableName);
>     client.resetSession(ExecConstants.METASTORE_ENABLED);
>   }
> }
> {code}
> but yields
> {code:java}
> java.lang.AssertionError: Different number of records returned
> Expected :1
> Actual :0
>   at org.apache.drill.test.DrillTestWrapper.compareResults(DrillTestWrapper.java:862)
>   at org.apache.drill.test.DrillTestWrapper.compareUnorderedResults(DrillTestWrapper.java:567)
>   at org.apache.drill.test.DrillTestWrapper.run(DrillTestWrapper.java:171)
>   at org.apache.drill.test.TestBuilder.go(TestBuilder.java:145)
>   at org.apache.drill.exec.store.parquet.TestEmptyParquet.testSelectWithDisabledMetastore(TestEmptyParquet.java:430)
>   at java.lang.Thread.run(Thread.java:748)
> {code}
> When the expected result set is changed to empty ({{TestBuilder#expectsEmptyResultSet()}}), the {{SHOW TABLES}} command run after {{ANALYZE TABLE ...}} does not show any table.