Posted to issues@drill.apache.org by "Vova Vysotskyi (Jira)" <ji...@apache.org> on 2020/02/07 10:14:00 UTC

[jira] [Comment Edited] (DRILL-7565) ANALYZE TABLE ... REFRESH METADATA does not work for empty Parquet files

    [ https://issues.apache.org/jira/browse/DRILL-7565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17032274#comment-17032274 ] 

Vova Vysotskyi edited comment on DRILL-7565 at 2/7/20 10:13 AM:
----------------------------------------------------------------

The initial idea for fixing this issue was to make Metastore aggregation handle empty input the same way {{count(*)}} without a {{group by}} clause does (returning a single row even when there is no input). That approach cannot be used, because we would lose information about the file location, row group indexes, etc. Forcing the use of {{ConvertMetadataAggregateToDirectScanRule}} also wouldn't help in the long term, since the rule cannot be applied to the other formats that Metastore will support.
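
A minimal, hypothetical sketch of the empty-input difference described above, in plain Java rather than Drill code: {{EmptyInputAggregation}}, {{Row}}, {{globalCount}} and {{countPerFile}} are illustrative names, not Drill operators or Metastore classes. An aggregate with no grouping keys always emits one row, while an aggregate keyed by file location emits one row per location it has seen, so an empty input produces no rows at all. Treating Metastore aggregation as a grouped aggregate keyed by file location is an assumption inferred from the comment, not taken from Drill's source.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: these classes are not Drill's operators or Metastore API.
public class EmptyInputAggregation {

  /** A hypothetical input row, carrying only the file it was read from. */
  static final class Row {
    final String fileLocation;

    Row(String fileLocation) {
      this.fileLocation = fileLocation;
    }
  }

  /** Aggregate without grouping keys (like COUNT(*) with no GROUP BY): always exactly one output row. */
  static List<Long> globalCount(List<Row> input) {
    List<Long> out = new ArrayList<>();
    out.add((long) input.size()); // emitted even when the input is empty
    return out;
  }

  /** Aggregate grouped by file location: one output row per location seen in the input. */
  static Map<String, Long> countPerFile(List<Row> input) {
    Map<String, Long> out = new LinkedHashMap<>();
    for (Row row : input) {
      out.merge(row.fileLocation, 1L, Long::sum);
    }
    return out; // empty input: no locations seen, so no output rows
  }

  public static void main(String[] args) {
    List<Row> empty = new ArrayList<>();
    System.out.println("global rows:  " + globalCount(empty).size());  // prints 1
    System.out.println("grouped rows: " + countPerFile(empty).size()); // prints 0

    List<Row> twoFiles = Arrays.asList(
        new Row("a.parquet"), new Row("a.parquet"), new Row("b.parquet"));
    System.out.println("grouped rows: " + countPerFile(twoFiles).size()); // prints 2
  }
}
```

In this sketch, making the grouped aggregate emit a row for empty input would require inventing a value for the grouping key, which is the information loss the comment refers to.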

> ANALYZE TABLE ... REFRESH METADATA does not work for empty Parquet files
> ------------------------------------------------------------------------
>
>                 Key: DRILL-7565
>                 URL: https://issues.apache.org/jira/browse/DRILL-7565
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.17.0
>            Reporter: Bohdan Kazydub
>            Assignee: Vova Vysotskyi
>            Priority: Major
>             Fix For: 1.18.0
>
>
> The query in the following test does not create metadata for an empty Parquet table:
> {code:java}
>   @Test
>   public void testAnalyzeEmptyParquetTable() throws Exception {
>     dirTestWatcher.copyResourceToRoot(Paths.get("parquet", "empty"));
>     String tableName = "parquet/empty/simple/empty_simple.parquet";
>     try {
>       client.alterSession(ExecConstants.METASTORE_ENABLED, true);
>       testBuilder()
>           .sqlQuery("ANALYZE TABLE dfs.`%s` REFRESH METADATA", tableName)
>           .unOrdered()
>           .baselineColumns("ok", "summary")
>           .baselineValues(true, String.format("Collected / refreshed metadata for table [dfs.default.%s]", tableName))
>           .go();
>     } finally {
>       run("analyze table dfs.`%s` drop metadata if exists", tableName);
>       client.resetSession(ExecConstants.METASTORE_ENABLED);
>     }
>   }
> {code}
> but the test fails with:
> {code:java}
> java.lang.AssertionError: Different number of records returned 
> Expected :1
> Actual   :0
> 	at org.apache.drill.test.DrillTestWrapper.compareResults(DrillTestWrapper.java:862)
> 	at org.apache.drill.test.DrillTestWrapper.compareUnorderedResults(DrillTestWrapper.java:567)
> 	at org.apache.drill.test.DrillTestWrapper.run(DrillTestWrapper.java:171)
> 	at org.apache.drill.test.TestBuilder.go(TestBuilder.java:145)
> 	at org.apache.drill.exec.store.parquet.TestEmptyParquet.testSelectWithDisabledMetastore(TestEmptyParquet.java:430)
> 	at java.lang.Thread.run(Thread.java:748)
> {code}
> When the expected result set is changed to empty ({{TestBuilder#expectsEmptyResultSet()}}), a {{SHOW TABLES}} command run after {{ANALYZE TABLE ...}} shows no tables.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)