Posted to dev@parquet.apache.org by "Willi Raschkowski (Jira)" <ji...@apache.org> on 2022/02/12 15:01:00 UTC

[jira] [Created] (PARQUET-2120) parquet-cli dictionary fails on pages without dictionary encoding

Willi Raschkowski created PARQUET-2120:
------------------------------------------

             Summary: parquet-cli dictionary fails on pages without dictionary encoding
                 Key: PARQUET-2120
                 URL: https://issues.apache.org/jira/browse/PARQUET-2120
             Project: Parquet
          Issue Type: Bug
          Components: parquet-cli
    Affects Versions: 1.12.2
            Reporter: Willi Raschkowski


parquet-cli's {{dictionary}} command fails with an NPE if a page does not have dictionary encoding:

{code}
$ parquet dictionary --column col a-b-c.snappy.parquet                
Unknown error
java.lang.NullPointerException: Cannot invoke "org.apache.parquet.column.page.DictionaryPage.getEncoding()" because "page" is null
	at org.apache.parquet.cli.commands.ShowDictionaryCommand.run(ShowDictionaryCommand.java:78)
	at org.apache.parquet.cli.Main.run(Main.java:155)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
	at org.apache.parquet.cli.Main.main(Main.java:185)

$ parquet meta a-b-c.snappy.parquet      
...
Row group 0:  count: 1  46.00 B records  start: 4  total: 46 B
--------------------------------------------------------------------------------
     type      encodings count     avg size   nulls   min / max
col  BINARY    S   _     1         46.00 B    0       "a" / "a"

Row group 1:  count: 200  0.34 B records  start: 50  total: 69 B
--------------------------------------------------------------------------------
     type      encodings count     avg size   nulls   min / max
col  BINARY    S _ R     200       0.34 B     0       "b" / "c"
{code}
(Note the missing {{R}}, i.e. dictionary encoding, in row group 0.)

The problem is that [this line|https://github.com/apache/parquet-mr/blob/300200eb72b9f16df36d9a68cf762683234aeb08/parquet-cli/src/main/java/org/apache/parquet/cli/commands/ShowDictionaryCommand.java#L76] assumes {{readDictionaryPage}} always returns a page. It does not handle the case where {{readDictionaryPage}} returns {{null}}, so the subsequent {{page.getEncoding()}} call throws.
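
One possible shape for a fix is a null check before the page is used. The sketch below is not the actual {{ShowDictionaryCommand}} code and does not use the Parquet API: {{FakeDictionaryPage}}, {{FakeRowGroup}}, and {{describe}} are hypothetical stand-ins that only mimic a reader whose {{readDictionaryPage}} returns {{null}} when a column has no dictionary page. The encoding string and the output wording are illustrative as well.

```java
import java.util.HashMap;
import java.util.Map;

class DictionaryGuardSketch {

    /** Hypothetical stand-in for a dictionary page; not the Parquet class. */
    static final class FakeDictionaryPage {
        private final String encoding;

        FakeDictionaryPage(String encoding) {
            this.encoding = encoding;
        }

        String getEncoding() {
            return encoding;
        }
    }

    /** Hypothetical stand-in for a row group's dictionary reader. */
    static final class FakeRowGroup {
        private final Map<String, FakeDictionaryPage> pages = new HashMap<>();

        FakeRowGroup with(String column, FakeDictionaryPage page) {
            pages.put(column, page);
            return this;
        }

        /** Returns null when the column has no dictionary page in this row group. */
        FakeDictionaryPage readDictionaryPage(String column) {
            return pages.get(column);
        }
    }

    /** Describes the dictionary of one column, guarding against a missing page. */
    static String describe(FakeRowGroup rowGroup, int index, String column) {
        FakeDictionaryPage page = rowGroup.readDictionaryPage(column);
        if (page == null) {
            // Without this check, page.getEncoding() below would throw an NPE.
            return "Row group " + index + ": no dictionary for \"" + column + "\"";
        }
        return "Row group " + index + ": dictionary for \"" + column
            + "\" (" + page.getEncoding() + ")";
    }

    public static void main(String[] args) {
        // Row group 0 has no dictionary page for "col"; row group 1 has one.
        FakeRowGroup rg0 = new FakeRowGroup();
        FakeRowGroup rg1 = new FakeRowGroup()
            .with("col", new FakeDictionaryPage("PLAIN")); // illustrative value

        System.out.println(describe(rg0, 0, "col"));
        System.out.println(describe(rg1, 1, "col"));
    }
}
```

Reporting the missing dictionary and continuing with the next row group, rather than aborting, would keep the command usable on files like the one above, where only some row groups are dictionary-encoded.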


