Posted to dev@parquet.apache.org by "Ryan Blue (JIRA)" <ji...@apache.org> on 2016/12/12 18:18:58 UTC

[jira] [Commented] (PARQUET-796) Delta Encoding is not used when dictionary enabled

    [ https://issues.apache.org/jira/browse/PARQUET-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15742672#comment-15742672 ] 

Ryan Blue commented on PARQUET-796:
-----------------------------------

Dictionary encoding usually produces better results than delta encoding. However, the decision to fall back from dictionary encoding is based on what plain encoding would produce, not on what delta encoding would produce, so it is biased toward dictionary encoding. Do you think the data would be smaller with delta encoding than with dictionary encoding?
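
To make the bias concrete, here is a rough sketch of the comparison described above. It is not the parquet-mr implementation: the class name, the method names, and the size formulas are simplified assumptions chosen for illustration (8 bytes per INT64 value for plain, bit-packed indices for dictionary, a single bit width for all deltas). It only shows how a check that compares dictionary size against plain size can keep the dictionary even when delta encoding would be much smaller.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch, NOT parquet-mr code: a simplified size model for INT64 columns.
public class DictionaryFallbackSketch {

    // Plain encoding: 8 bytes per value.
    static long plainSize(long[] values) {
        return 8L * values.length;
    }

    // Assumed dictionary cost: 8 bytes per distinct value plus bit-packed indices.
    static long dictionarySize(long[] values) {
        Set<Long> distinct = new HashSet<>();
        for (long v : values) {
            distinct.add(v);
        }
        int n = distinct.size();
        // Bits needed to address n dictionary entries (at least 1).
        int bitsPerIndex = Math.max(1, 64 - Long.numberOfLeadingZeros(n - 1));
        long indexBytes = ((long) values.length * bitsPerIndex + 7) / 8;
        return 8L * n + indexBytes;
    }

    // Assumed delta cost: first value, minimum delta, then deltas packed at one bit width.
    // Assumes consecutive differences do not overflow a long.
    static long deltaSize(long[] values) {
        if (values.length == 0) {
            return 0;
        }
        if (values.length == 1) {
            return 8;
        }
        long min = Long.MAX_VALUE;
        long max = Long.MIN_VALUE;
        for (int i = 1; i < values.length; i++) {
            long d = values[i] - values[i - 1];
            min = Math.min(min, d);
            max = Math.max(max, d);
        }
        // 0 bits when every delta is identical.
        int bits = 64 - Long.numberOfLeadingZeros(max - min);
        return 16 + ((long) (values.length - 1) * bits + 7) / 8;
    }

    // The biased check: dictionary is kept whenever it beats plain; delta is never consulted.
    static boolean keepsDictionary(long[] values) {
        return dictionarySize(values) < plainSize(values);
    }

    public static void main(String[] args) {
        // Slowly increasing values where each value appears twice: 0, 0, 1, 1, 2, 2, ...
        long[] values = new long[1000];
        for (int i = 0; i < values.length; i++) {
            values[i] = i / 2;
        }
        System.out.println("plain bytes:      " + plainSize(values));
        System.out.println("dictionary bytes: " + dictionarySize(values));
        System.out.println("delta bytes:      " + deltaSize(values));
        System.out.println("dictionary kept:  " + keepsDictionary(values));
    }
}
```

Under this model, a slowly increasing column with many distinct values passes the dictionary check because the dictionary is smaller than plain, even though the delta estimate is far smaller than both. Whether real data behaves this way would need to be measured with the actual writers.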

> Delta Encoding is not used when dictionary enabled
> --------------------------------------------------
>
>                 Key: PARQUET-796
>                 URL: https://issues.apache.org/jira/browse/PARQUET-796
>             Project: Parquet
>          Issue Type: Bug
>          Components: parquet-mr
>    Affects Versions: 1.9.0
>            Reporter: Jakub Liska
>            Priority: Critical
>             Fix For: 1.9.1
>
>
> The current code does not allow using both Delta Encoding and Dictionary Encoding. If I instantiate ParquetWriter like this:
> {code}
> val writer = new ParquetWriter[Group](outFile, new GroupWriteSupport, codec, blockSize, pageSize, dictPageSize, enableDictionary = true, true, ParquetProperties.WriterVersion.PARQUET_2_0, configuration)
> {code}
> then this piece of code:
> https://github.com/apache/parquet-mr/blob/master/parquet-column/src/main/java/org/apache/parquet/column/values/factory/DefaultValuesWriterFactory.java#L78-L86
> causes DictionaryValuesWriter to be used instead of the inferred DeltaLongEncodingWriter.
> The original issue is here: https://github.com/apache/parquet-mr/pull/154#issuecomment-266489768
