You are viewing a plain text version of this content. The canonical link for it is here.

Posted to issues@spark.apache.org by "Xiao Li (JIRA)" <ji...@apache.org> on 2018/10/05 23:44:00 UTC

[jira] [Resolved] (SPARK-25635) Support selective direct encoding in native ORC write

     [ https://issues.apache.org/jira/browse/SPARK-25635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiao Li resolved SPARK-25635.
-----------------------------
       Resolution: Fixed
    Fix Version/s: 3.0.0

> Support selective direct encoding in native ORC write
> -----------------------------------------------------
>
>                 Key: SPARK-25635
>                 URL: https://issues.apache.org/jira/browse/SPARK-25635
>             Project: Spark
>          Issue Type: New Feature
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Dongjoon Hyun
>            Assignee: Dongjoon Hyun
>            Priority: Major
>             Fix For: 3.0.0
>
>
> Before ORC 1.5.3, `orc.dictionary.key.threshold` and `hive.exec.orc.dictionary.key.size.threshold` is applied for all columns. This is a big huddle to enable dictionary encoding.
> From ORC 1.5.3, `orc.column.encoding.direct` is added to enforce direct encoding selectively in a column-wise manner. This issue aims to add that feature by upgrading ORC from 1.5.2 to 1.5.3.
> The followings are the patches in ORC 1.5.3 and this feature is the only one related to Spark directly.
> {code}
> ORC-406: ORC: Char(n) and Varchar(n) writers truncate to n bytes & corrupts multi-byte data (gopalv)
> ORC-403: [C++] Add checks to avoid invalid offsets in InputStream
> ORC-405. Remove calcite as a dependency from the benchmarks.
> ORC-375: Fix libhdfs on gcc7 by adding #include <functional> two places.
> ORC-383: Parallel builds fails with ConcurrentModificationException
> ORC-382: Apache rat exclusions + add rat check to travis
> ORC-401: Fix incorrect quoting in specification.
> ORC-385. Change RecordReader to extend Closeable.
> ORC-384: [C++] fix memory leak when loading non-ORC files
> ORC-391: [c++] parseType does not accept underscore in the field name
> ORC-397. Allow selective disabling of dictionary encoding. Original patch was by Mithun Radhakrishnan.
> ORC-389: Add ability to not decode Acid metadata columns
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org