Posted to dev@phoenix.apache.org by "James Taylor (JIRA)" <ji...@apache.org> on 2017/01/04 01:47:58 UTC

[jira] [Commented] (PHOENIX-3560) Aggregate query performance is worse with encoded columns for schema with large number of columns

    [ https://issues.apache.org/jira/browse/PHOENIX-3560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15796855#comment-15796855 ] 

James Taylor commented on PHOENIX-3560:
---------------------------------------

This is a tricky one: the scan will be forced to load the KeyValue that holds all of the row's data instead of just the single empty key value. You could potentially fake it out by adding another column family (as the first one) with a dummy column.
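
To put a rough number on how much data that is, here is a back-of-the-envelope sketch in Python using the figures from the issue description (5,000 VARCHAR columns of 200 chars each). It assumes one byte per character and ignores any per-cell overhead; the constant and function names are illustrative only and are not part of Phoenix.

```python
# Back-of-the-envelope estimate, not a measurement.
NUM_COLUMNS = 5000      # c1 ... c5000, from the issue description
CHARS_PER_COLUMN = 200  # from the issue description
BYTES_PER_CHAR = 1      # assumption: single-byte characters


def row_payload_bytes(num_columns=NUM_COLUMNS,
                      chars=CHARS_PER_COLUMN,
                      bytes_per_char=BYTES_PER_CHAR):
    """Approximate bytes of column data in one row, ignoring per-cell overhead."""
    return num_columns * chars * bytes_per_char


payload = row_payload_bytes()
print(f"{payload} bytes per row (~{payload / 1e6:.1f} MB)")
```

If all of a row's column values end up in the one KeyValue the scan has to load, a count(*) would read roughly this much per row, whereas the empty key value carries no column data at all.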

> Aggregate query performance is worse with encoded columns for schema with large number of columns
> -------------------------------------------------------------------------------------------------
>
>                 Key: PHOENIX-3560
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-3560
>             Project: Phoenix
>          Issue Type: Sub-task
>            Reporter: Mujtaba Chohan
>            Assignee: Samarth Jain
>             Fix For: 4.10.0
>
>
> Schema with 5K columns
> {noformat}
> create table (k1 integer, k2 integer, c1 varchar ... c5000 varchar CONSTRAINT PK PRIMARY KEY (K1, K2)) 
> VERSIONS=1, MULTI_TENANT=true, IMMUTABLE_ROWS=true
> {noformat}
> In this test, there are no null columns and each column contains 200 chars, i.e. about 1MB of data per row.
> COUNT(*) aggregation is about 5X slower with encoded columns than on a table with non-encoded columns using the same schema.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)