Posted to dev@parquet.apache.org by "Willi Raschkowski (Jira)" <ji...@apache.org> on 2022/02/12 15:01:00 UTC
[jira] [Created] (PARQUET-2120) parquet-cli dictionary fails on pages without dictionary encoding
Willi Raschkowski created PARQUET-2120:
------------------------------------------
Summary: parquet-cli dictionary fails on pages without dictionary encoding
Key: PARQUET-2120
URL: https://issues.apache.org/jira/browse/PARQUET-2120
Project: Parquet
Issue Type: Bug
Components: parquet-cli
Affects Versions: 1.12.2
Reporter: Willi Raschkowski
parquet-cli's {{dictionary}} command fails with an NPE if a page does not have dictionary encoding:
{code}
$ parquet dictionary --column col a-b-c.snappy.parquet
Unknown error
java.lang.NullPointerException: Cannot invoke "org.apache.parquet.column.page.DictionaryPage.getEncoding()" because "page" is null
    at org.apache.parquet.cli.commands.ShowDictionaryCommand.run(ShowDictionaryCommand.java:78)
    at org.apache.parquet.cli.Main.run(Main.java:155)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
    at org.apache.parquet.cli.Main.main(Main.java:185)
$ parquet meta a-b-c.snappy.parquet
...
Row group 0:  count: 1  46.00 B records  start: 4  total: 46 B
--------------------------------------------------------------------------------
     type      encodings  count  avg size  nulls  min / max
col  BINARY    S _        1      46.00 B   0      "a" / "a"
Row group 1:  count: 200  0.34 B records  start: 50  total: 69 B
--------------------------------------------------------------------------------
     type      encodings  count  avg size  nulls  min / max
col  BINARY    S _ R      200    0.34 B    0      "b" / "c"
{code}
(Note the missing {{R}}, i.e. dictionary encoding, for {{col}} in row group 0.)
The problem is that [this line|https://github.com/apache/parquet-mr/blob/300200eb72b9f16df36d9a68cf762683234aeb08/parquet-cli/src/main/java/org/apache/parquet/cli/commands/ShowDictionaryCommand.java#L76] assumes {{readDictionaryPage}} always returns a page. It does not handle the case where no dictionary page exists and {{readDictionaryPage}} returns {{null}}, so the subsequent {{page.getEncoding()}} call throws.
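One possible shape for a fix is a null check before the page is used. The sketch below is only an illustration of that guard and is not the actual patch: {{FakeDictionaryPage}}, the map-backed {{readDictionaryPage}}, the {{describe}} helper, and the "PLAIN" encoding value are all hypothetical stand-ins so the example runs without parquet-mr on the classpath.

```java
import java.util.HashMap;
import java.util.Map;

public class DictionaryGuardSketch {

    /** Hypothetical stand-in for org.apache.parquet.column.page.DictionaryPage. */
    public static final class FakeDictionaryPage {
        private final String encoding;
        private final int size;

        public FakeDictionaryPage(String encoding, int size) {
            this.encoding = encoding;
            this.size = size;
        }

        public String getEncoding() { return encoding; }
        public int getDictionarySize() { return size; }
    }

    /**
     * Hypothetical stand-in for the real reader: returns null when the
     * row group has no dictionary page for the column.
     */
    public static FakeDictionaryPage readDictionaryPage(
            Map<Integer, FakeDictionaryPage> store, int rowGroup) {
        return store.get(rowGroup);
    }

    /** Guarded access: check for null before touching the page. */
    public static String describe(int rowGroup, FakeDictionaryPage page) {
        if (page == null) {
            return "Row group " + rowGroup + ": no dictionary page";
        }
        return "Row group " + rowGroup + ": dictionary encoding "
                + page.getEncoding() + ", " + page.getDictionarySize() + " entries";
    }

    public static void main(String[] args) {
        // Mirrors the report: row group 0 has no dictionary, row group 1 does.
        // The encoding name and entry count are illustrative values.
        Map<Integer, FakeDictionaryPage> store = new HashMap<>();
        store.put(1, new FakeDictionaryPage("PLAIN", 2));

        for (int rowGroup = 0; rowGroup < 2; rowGroup++) {
            System.out.println(describe(rowGroup, readDictionaryPage(store, rowGroup)));
        }
    }
}
```

Whether the command should print a message for such row groups, skip them silently, or exit with an error is a design choice left to the actual fix.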
--
This message was sent by Atlassian Jira
(v8.20.1#820001)