Posted to dev@parquet.apache.org by "Ruslan Dautkhanov (JIRA)" <ji...@apache.org> on 2017/04/26 06:54:04 UTC

[jira] [Created] (PARQUET-966) Store `dictionary entries` of parquet columns that will be used for joins

Ruslan Dautkhanov created PARQUET-966:
-----------------------------------------

             Summary: Store `dictionary entries` of parquet columns that will be used for joins
                 Key: PARQUET-966
                 URL: https://issues.apache.org/jira/browse/PARQUET-966
             Project: Parquet
          Issue Type: Improvement
          Components: parquet-format
    Affects Versions: format-2.3.1, 1.8.0
            Reporter: Ruslan Dautkhanov


It would be great if Parquet would store `dictionary entries` for columns marked to be used for joins. 

When a column is used for a join (it could be a [surrogate key|https://en.wikipedia.org/wiki/Surrogate_key] or a [natural key|https://en.wikipedia.org/wiki/Natural_key]), the actual value of the join column is not that important; what matters is whether two values are equal.

So we could join directly on `dictionary entries` instead of on the values
and save CPU cycles (no need to decompress the values, etc.).
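
To illustrate the idea, here is a minimal sketch in plain Python. It is not the Parquet API and not a proposed implementation. It assumes that both join columns are encoded against the same shared dictionary, so that equal values always get equal integer codes; without that assumption the codes of two files are not comparable. The names `dictionary_encode` and `join_on_codes` and the sample data are hypothetical.

```python
from collections import defaultdict


def dictionary_encode(values, dictionary=None):
    """Encode values as integer codes, extending `dictionary` (value -> code)."""
    if dictionary is None:
        dictionary = {}
    codes = []
    for value in values:
        # A value seen for the first time gets the next free code.
        codes.append(dictionary.setdefault(value, len(dictionary)))
    return codes, dictionary


def join_on_codes(left_codes, right_codes):
    """Hash join on integer codes; returns (left_row, right_row) index pairs."""
    index = defaultdict(list)
    for row, code in enumerate(right_codes):
        index[code].append(row)
    return [
        (left_row, right_row)
        for left_row, code in enumerate(left_codes)
        for right_row in index.get(code, [])
    ]


# Hypothetical join-key columns of two tables.
orders_key = ["cust-b", "cust-a", "cust-b", "cust-c"]
customers_key = ["cust-a", "cust-b", "cust-d"]

# Assumption: both columns share ONE dictionary.
left_codes, shared = dictionary_encode(orders_key)
right_codes, shared = dictionary_encode(customers_key, shared)

# The join itself only compares small integers, never the original strings.
pairs = join_on_codes(left_codes, right_codes)
```

Here the join produces the same row pairs as a join on the original string values, but the comparison and hashing are done on integer codes only.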

Inspired by [Oracle In-memory columnar storage improvements in 12.2|https://blogs.oracle.com/In-Memory/entry/what_s_new_in_12]



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)