
Posted to dev@phoenix.apache.org by "James Taylor (JIRA)" <ji...@apache.org> on 2017/06/01 22:39:04 UTC

[jira] [Updated] (PHOENIX-3905) Allow dynamic filtered join queries in UPSERT SELECT to be distributed across cluster

     [ https://issues.apache.org/jira/browse/PHOENIX-3905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

James Taylor updated PHOENIX-3905:
----------------------------------
    Description: 
Joins on the leading part of the primary key end up executing as batches of point queries (as opposed to a broadcast hash join), and thus could be distributed across the cluster to improve performance when used in an UPSERT SELECT. The explain plans for these queries indicate that a dynamic filter will be performed, like this:
{code}
DYNAMIC SERVER FILTER BY (DML.PK1, DML.PK2, DML.PK3)
IN ((COM.PK1, COM.PK2, COM.PK3))
{code}
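
To make the "batches of point queries" idea concrete, here is a minimal Python sketch. It is a toy model, not Phoenix's actual implementation. It assumes hypothetical region split points on the composite key (PK1, PK2, PK3) and a hypothetical helper plan_point_lookups(). That helper deduplicates the join key values coming from COM, assigns each one to the DML region that owns it, and chunks each region's keys into fixed-size batches of point lookups. Because each batch touches exactly one region, it could in principle be executed by that region's server rather than routed through the client.

{code:python}
from bisect import bisect_right
from collections import defaultdict

# Hypothetical split points for the DML table's composite key (PK1, PK2, PK3).
# Region i owns keys k with SPLITS[i-1] <= k < SPLITS[i], giving three regions:
# [start, ("b",)), [("b",), ("m",)), [("m",), end).
SPLITS = [("b",), ("m",)]


def region_for(key, splits=SPLITS):
    """Return the index of the region whose key range contains `key`."""
    return bisect_right(splits, key)


def plan_point_lookups(join_keys, batch_size=2, splits=SPLITS):
    """Toy model of a dynamic filter: turn (COM.PK1, COM.PK2, COM.PK3) values
    into per-region batches of point lookups on (DML.PK1, DML.PK2, DML.PK3).

    Returns a list of (region_index, [key, ...]) tuples in key order.
    """
    by_region = defaultdict(list)
    for key in sorted(set(join_keys)):  # dedupe, then visit keys in sorted order
        by_region[region_for(key, splits)].append(key)

    batches = []
    for region in sorted(by_region):
        keys = by_region[region]
        for i in range(0, len(keys), batch_size):
            batches.append((region, keys[i:i + batch_size]))
    return batches


if __name__ == "__main__":
    com_keys = [("a", 1, 1), ("c", 2, 2), ("c", 2, 3), ("x", 9, 9), ("a", 1, 1)]
    for region, batch in plan_point_lookups(com_keys):
        print(f"region {region}: point lookups {batch}")
{code}

The only point of the model is that every batch maps to a single region, which is what would make pushing the work out to the region servers possible.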

Currently, for these types of UPSERT SELECT queries, the selected data flows back to the client and then back out to the appropriate region server. The work is still parallelized, but only on a single client rather than across multiple region servers in the cluster. The benefit would depend on how many region servers are involved in fetching the data for the SELECT part of the query.
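
A back-of-the-envelope way to see why the benefit depends on the number of region servers is the deliberately simplified cost model below. It is not a measurement. It assumes, hypothetically, that UPSERT SELECT throughput is bounded by the number of rows the busiest worker must handle. Today that worker is the single client, which sees every selected row. If the work were distributed, each region server would handle only the rows it selected locally. The function name estimate_speedup and the per-server row counts are illustrative only.

{code:python}
def estimate_speedup(rows_per_server):
    """Rough upper bound on the speedup from distributing an UPSERT SELECT.

    rows_per_server maps a (hypothetical) region server name to the number of
    rows that server selects. Client-side execution funnels all rows through
    one client; server-side execution is limited by the busiest region server.
    """
    if not rows_per_server:
        raise ValueError("need at least one region server")
    total_rows = sum(rows_per_server.values())  # rows the client handles today
    busiest = max(rows_per_server.values())     # rows the busiest server would handle
    if busiest == 0:
        return 1.0  # nothing selected, so no speedup either way
    return total_rows / busiest


if __name__ == "__main__":
    print(estimate_speedup({"rs1": 1000}))                         # 1.0
    print(estimate_speedup({"rs1": 400, "rs2": 300, "rs3": 300}))  # 2.5
{code}

The model ignores network cost, including the extra hop each row currently makes from the client back out to the server that owns the target row. Treat it as an illustration of the trend rather than a prediction.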

  was:
Joins on the leading part of the primary key end up executing as batches of point queries (as opposed to a broadcast hash join), and thus could be distributed across the cluster to improve performance when used in an UPSERT SELECT. The explain plans for these queries indicate that a dynamic filter will be performed, like this:
{code}
DYNAMIC SERVER FILTER BY (DML.PK1, DML.PK2, DML.PK3)
IN ((COM.PK1, COM.PK2, COM.PK3))
{code}



> Allow dynamic filtered join queries in UPSERT SELECT to be distributed across cluster
> -------------------------------------------------------------------------------------
>
>                 Key: PHOENIX-3905
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-3905
>             Project: Phoenix
>          Issue Type: Improvement
>            Reporter: James Taylor
>



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)