Posted to dev@phoenix.apache.org by "Mujtaba Chohan (JIRA)" <ji...@apache.org> on 2017/03/07 22:29:38 UTC

[jira] [Commented] (PHOENIX-3582) No significant space saving with immutable encoded column with large number of dense columns

    [ https://issues.apache.org/jira/browse/PHOENIX-3582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15900305#comment-15900305 ] 

Mujtaba Chohan commented on PHOENIX-3582:
-----------------------------------------

I missed your comment earlier, [~ankit@apache.org]. As far as I remember, test #1 was in the ballpark of 1GB standard vs. 4GB encoded. Test #2 was 2.5GB standard vs. 2GB encoded, with 5K dense columns * 20K rows. Applying Snappy reduced the size of the encoded table significantly and made the size disparity less obvious, but the disparity still remains, because compression also reduces the size of the non-encoded table, although to a lesser degree.
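
To make the comparison explicit, the ballpark figures above can be turned into ratios. This is only a rough sketch in Python: the sizes are the approximate, from-memory numbers quoted in this comment, and the names `sizes_gb` and `encoded_vs_standard` are made up for illustration.

```python
# Approximate on-disk sizes (in GB) as recalled in the comment; ballpark only.
sizes_gb = {
    "test1": {"standard": 1.0, "encoded": 4.0},
    "test2": {"standard": 2.5, "encoded": 2.0},
}


def encoded_vs_standard(standard_gb, encoded_gb):
    """Return (ratio, percent_change) of the encoded table relative to standard."""
    ratio = encoded_gb / standard_gb
    percent_change = (encoded_gb - standard_gb) / standard_gb * 100.0
    return ratio, percent_change


if __name__ == "__main__":
    for name, s in sizes_gb.items():
        ratio, pct = encoded_vs_standard(s["standard"], s["encoded"])
        print(f"{name}: encoded/standard = {ratio:.2f}x ({pct:+.0f}%)")
```

With these numbers, test #1 comes out at about 4x larger when encoded, and test #2 at about 20% smaller, which is close to the ~25% quoted in the issue description given that the sizes here are rounded.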

> No significant space saving with immutable encoded column with large number of dense columns
> --------------------------------------------------------------------------------------------
>
>                 Key: PHOENIX-3582
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-3582
>             Project: Phoenix
>          Issue Type: Sub-task
>            Reporter: Mujtaba Chohan
>            Assignee: Samarth Jain
>
> Tested with 2 schemas, both with 5K varchar columns. In test #1 the columns were named column_1 ... column_5000, whereas in test #2 the column names were 10-byte random alphanumeric strings. Each column is filled with 15 random bytes, and all columns have values.
> For test #1, immutable encoded columns use ~4X *more* space than non-encoded columns. Fast Diff encoding really shines when column names are highly compressible (column_1 ... column_5000).
> For test #2, the worst case where column names are not compressible since they are random 10-byte alphanumeric strings, immutable encoded columns use 25% less space.  
> Data generation class is attached to https://issues.apache.org/jira/browse/PHOENIX-3560. 
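
The two naming schemes described above can be sketched as follows, together with the raw (unencoded, uncompressed) byte volume they imply. This is an illustrative Python sketch and not the data generation class attached to PHOENIX-3560: the function names are hypothetical, the 20K row count is taken from the comment on this issue and assumed to apply to both tests, and HBase per-cell overhead (row key, family, timestamp) is ignored.

```python
import random
import string

NUM_COLUMNS = 5_000   # "5K varchar columns" from the description
NUM_ROWS = 20_000     # assumption: row count from the comment, used for both tests
VALUE_BYTES = 15      # each column is filled with 15 random bytes


def sequential_names(n):
    """Test #1 style names: column_1 ... column_n (highly compressible)."""
    return [f"column_{i}" for i in range(1, n + 1)]


def random_names(n, length=10, seed=0):
    """Test #2 style names: n distinct random alphanumeric strings."""
    rng = random.Random(seed)
    alphabet = string.ascii_letters + string.digits
    names = set()
    while len(names) < n:
        names.add("".join(rng.choice(alphabet) for _ in range(length)))
    return sorted(names)


def raw_bytes(names, rows=NUM_ROWS, value_bytes=VALUE_BYTES):
    """Return (column-name bytes, value bytes) if every cell stores its full name."""
    name_bytes = sum(len(name) for name in names) * rows
    value_total = len(names) * rows * value_bytes
    return name_bytes, value_total


if __name__ == "__main__":
    for label, names in (
        ("test #1", sequential_names(NUM_COLUMNS)),
        ("test #2", random_names(NUM_COLUMNS)),
    ):
        name_bytes, value_total = raw_bytes(names)
        print(f"{label}: names={name_bytes:,} B, values={value_total:,} B")
```

Both schemes carry roughly 1 GB of raw column-name bytes and 1.5 GB of raw value bytes at this scale, so the difference between the tests lies in how well the names compress rather than in how much name data there is.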


