You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@phoenix.apache.org by "Gerald Sangudi (JIRA)" <ji...@apache.org> on 2018/09/13 22:36:00 UTC
[jira] [Commented] (PHOENIX-4757) composite key salt_buckets

    [ https://issues.apache.org/jira/browse/PHOENIX-4757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16614129#comment-16614129 ] 

Gerald Sangudi commented on PHOENIX-4757:
-----------------------------------------

We have a similar requirement. From our discussion document:

*Background*

We make extensive use of multi-column rowkeys and [salting|https://phoenix.apache.org/salted.html] in our different apache phoenix deployments. We frequently perform group-by aggregations on these data along a specific dimension that would benefit from predictably partitioning the data along that dimension. 

 

*Proposal:*

We propose to add table metadata to allow schema designers to constrain salting to a subset of the rowkey, rather than the full rowkey as it is today. This will introduce a mechanism to partition data on a per-table basis along a single dimension without application changes or much change to the phoenix runtime logic. We expect this will result in substantially faster group-by’s along the salted dimension and negligible penalties elsewhere. This feature has also been proposed in PHOENIX-4757 where it was pointed out that partitioning and sorting data along different dimensions is a common pattern in other datastores as well.

Theoretically, it could cause hotspotting when querying along the salted dimension without the leading rowkey - that would be an anti-pattern.

 

*Usage Example*

*Current:*

*Schema:*

CREATE TABLE relationship (

id_1 BIGINT NOT NULL,

id_2 BIGINT NOT NULL,

other_key BIGINT NOT NULL,

val SMALLINT,

CONSTRAINT pk PRIMARY KEY (id_1, id_2, other_key)

)

SALT_BUCKETS=60;

 

*Query:*

Select id_2, sum(val)

From relationship

Where id_1 in (2,3)

Group by id_2

 

*Explain:*

0: jdbc:phoenix:> EXPLAIN Select id_2, sum(val) From relationship Where id_1 in (2,3) Group by id_2 ;

+-----------------------------------------------------------------------------------------+--------+

|                                          PLAN    | EST_BY |

+-----------------------------------------------------------------------------------------+--------+

| CLIENT 60-CHUNK PARALLEL 60-WAY SKIP SCAN ON 120 KEYS OVER RELATIONSHIP [0,2] - [59,3]  | null |

|     SERVER AGGREGATE INTO DISTINCT ROWS BY [ID_2]                                       | null |

| CLIENT MERGE SORT                                                                       | null |

+-----------------------------------------------------------------------------------------+--------+

3 rows selected (0.048 seconds)

 

In this case, although the group by is performed on both the client and regionserver, almost all of the actual grouping happens on the client because the id_2’s are randomly distributed across the regionservers. As a result, a lot of unnecessary data is serialized to the client and grouped serially there. This can become quite material with large resultsets.

 

*Proposed:*

*Schema:*

CREATE TABLE relationship (

id_1 BIGINT NOT NULL,

id_2 BIGINT NOT NULL,

other_key BIGINT NOT NULL,

val SMALLINT,

CONSTRAINT pk PRIMARY KEY (id_1, id_2, other_key),

SALT_BUCKETS=60,

SALT_COLUMN = id_2

);

 

*Query (unchanged):*

Select id_2, sum(val)

From relationship

Where id_1 in (2,3)

Group by id_2

 

*Explain (unchanged)*

 

Under the proposal, the data are merely partitioned so that all rows containing the same id_2 are on the same regionserver, the above query will perform almost all of the grouping in parallel on the regionservers. No special hint or changes to the query plan would be required to benefit. Tables would need to be re-salted to take advantage of the new functionality.

 

*Technical changes proposed to phoenix:*
 * Create a new piece of table-level metadata: SALT_COLUMN. SALT_COLUMN will instruct the salting logic to generate a salt-byte based only on the specified column. If unspecified, it will behave as it does today and default to salting the entire rowkey. This metadata may be specified only when the table is created and may not be modified. The specified column must be part of the rowkey.  
 * Modify all callers of [getSaltingByte|https://github.com/apache/phoenix/blob/master/phoenix-core/src/main/java/org/apache/phoenix/schema/SaltingUtil.java#L77](byte[] value, int offset, int length, int bucketNum) to consistently leverage the new metadata.
 * Tests
 * Docs





*Design points:*

*One salt column vs multiple salt columns:* Based on the existing signature for getSaltingByte, it seems simpler to only support a single SALT_COLUMN rather than multiple arbitrary SALT_COLUMNS. Known use-cases are completely supported by a single column.

*Syntax:*  PHOENIX-4757 suggests an alternate, less verbose syntax for defining the salt bucket. The SALT_COLUMN syntax is suggested for clarity and consistency with other Phoenix table options.

 

*Future Enhancements (not in scope)*

Different aspects of the query execution runtime could take advantage of new metadata and implied knowledge that the data are partitioned in a predictable manner. For example:
 * It could be that client side grouping is completely unnecessary in cases where the SALT_COLUMN is part of the group-by expression.
 * A query that contains a literal equality predicate for the SALT_COLUMN can be isolated to a single regionserver, rather than broadcast to all regionservers.
 * A client-side merge-sort-join based on the SALT_COLUMN could optimize organization of merges. 
 * Similarly, a server-side hash join could distribute only ‘necessary’ portions of the hash table to each regionserver.

If additional advantages of these types come for free, then that’s great but can be follow on enhancements from the initial commit.

> composite key salt_buckets
> --------------------------
>
>                 Key: PHOENIX-4757
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-4757
>             Project: Phoenix
>          Issue Type: Improvement
>    Affects Versions: 4.11.0
>            Reporter: cmd
>            Priority: Major
>             Fix For: 4.11.0
>
>
> CREATE TABLE IF NOT EXISTS user_events (
>  user_id VARCHAR NOT NULL,
>  event_type VARCHAR NOT NULL,
>  event_time VARCHAR NOT NULL
>  event_msg VARCHAR NOT NULL
>  event_status VARCHAR NOT NULL
>  event_opt VARCHAR NOT NULL
>  CONSTRAINT my_pk PRIMARY KEY (user_id,event_type,event_time)) SALT_BUCKETS=128;
> and my query is:
>  1.select event_type,count(0) from us_population where user_id='xxxx' group by event_type
>  2.select count(0) from us_population where user_id='xxxx' and event_type='0101'
>  3.select * from us_population where user_id='xxxx' and event_type='0101' and event_time>'20180101' and event_time<'20180201' order by event_time limit 50,100
> Concurrency query ratio:
>  1:80%
>  2:10%
>  3:10% 
>  user_events data :50billion
>  It can be a field/some fileds of the primary key salted by hash
>  grammar with "SALT_BUCKETS(user_id)=4" or "SALT_BUCKETS(user_id,event_type)=4"
> ref:
>  [https://www.safaribooksonline.com/library/view/greenplum-architecture/9781940540337/xhtml/chapter03.xhtml]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)