Posted to dev@parquet.apache.org by "Ruslan Dautkhanov (JIRA)" <ji...@apache.org> on 2017/04/26 06:54:04 UTC
[jira] [Created] (PARQUET-966) Store `dictionary entries` of parquet columns that will be used for joins
Ruslan Dautkhanov created PARQUET-966:
-----------------------------------------
Summary: Store `dictionary entries` of parquet columns that will be used for joins
Key: PARQUET-966
URL: https://issues.apache.org/jira/browse/PARQUET-966
Project: Parquet
Issue Type: Improvement
Components: parquet-format
Affects Versions: format-2.3.1, 1.8.0
Reporter: Ruslan Dautkhanov
It would be great if Parquet would store `dictionary entries` for columns marked to be used for joins.
When a column is used for a join (it could be a [surrogate key|https://en.wikipedia.org/wiki/Surrogate_key] or a [natural key|https://en.wikipedia.org/wiki/Natural_key]), the actual value of the column is not that important; only whether two values are equal matters.
So we could join directly on `dictionary entries` instead of the values
and save CPU cycles, since there would be no need to decompress the values, etc.
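To illustrate the idea, here is a minimal sketch in plain Python. It does not use any Parquet API; the helper names (`build_dictionary`, `encode`, `hash_join_on_codes`) and the sample data are hypothetical. It also assumes that both join columns are encoded against the same dictionary, which this proposal does not spell out: codes from two independently built dictionaries would not be comparable.

```python
from collections import defaultdict


def build_dictionary(*columns):
    """Assign a small integer code to each distinct value across all columns.

    Assumption: the join columns share one dictionary, so equal values
    always map to equal codes.
    """
    dictionary = {}
    for column in columns:
        for value in column:
            dictionary.setdefault(value, len(dictionary))
    return dictionary


def encode(column, dictionary):
    """Replace each value with its dictionary code."""
    return [dictionary[value] for value in column]


def hash_join_on_codes(left_codes, right_codes):
    """Return (left_row, right_row) index pairs whose codes match.

    Only the integer codes are compared; the original values are never read.
    """
    rows_by_code = defaultdict(list)
    for row, code in enumerate(right_codes):
        rows_by_code[code].append(row)
    return [
        (left_row, right_row)
        for left_row, code in enumerate(left_codes)
        for right_row in rows_by_code.get(code, [])
    ]


# Hypothetical sample data: two columns that would be joined on a name key.
orders_customer = ["alice", "bob", "alice", "carol"]
customers_name = ["bob", "alice", "dave"]

shared = build_dictionary(orders_customer, customers_name)
left_codes = encode(orders_customer, shared)
right_codes = encode(customers_name, shared)
pairs = hash_join_on_codes(left_codes, right_codes)
```

The join itself touches only the integer codes. In a real implementation the codes would already be present in the dictionary-encoded pages, so the encoding step shown here would not be paid at join time.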
Inspired by [Oracle In-memory columnar storage improvements in 12.2|https://blogs.oracle.com/In-Memory/entry/what_s_new_in_12]
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)