
Posted to issues@trafodion.apache.org by "David Wayne Birdsall (JIRA)" <ji...@apache.org> on 2017/08/01 16:00:02 UTC

[jira] [Commented] (TRAFODION-2700) Query that selects only a single salt value gets parallel plan

    [ https://issues.apache.org/jira/browse/TRAFODION-2700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16109154#comment-16109154 ] 

David Wayne Birdsall commented on TRAFODION-2700:
-------------------------------------------------

The example cited is a unique access; I hope we get a serial plan for that?

I suppose a more interesting example might be one where the primary key has two or more columns, but we salt on only one of them? (More generally, the salt columns would be a proper subset of the key columns, with a WHERE clause of equality predicates on that proper subset.)

> Query that selects only a single salt value gets parallel plan
> --------------------------------------------------------------
>
>                 Key: TRAFODION-2700
>                 URL: https://issues.apache.org/jira/browse/TRAFODION-2700
>             Project: Apache Trafodion
>          Issue Type: Bug
>          Components: sql-cmp
>    Affects Versions: 2.1-incubating
>         Environment: any
>            Reporter: Hans Zeller
>            Assignee: Hans Zeller
>
> For some queries we saw parallel plans where the parallelism didn't really help, because the WHERE predicate selected only a single salt value. The overhead isn't huge, but it can add up when executing many such queries.
> Example:
> create table ts(a integer not null primary key, b char(2000)) salt using 4 partitions;
> explain  select count(*) from ts <<+ cardinality 1e7>> where a =1;
> The problem, I think, is in method SimpleFileScanOptimizer::scmComputeCostVectorsForHbase(), file core/sql/optimizer/ScmCostMethod.cpp. This computes separate degrees of parallelism for the region server side and the client side and scales the costs incurred on each side separately.
> However, if there are more ESPs (clients) than regions, some ESPs have nothing to do, limiting the parallelism. On the other hand, if there are more regions than ESPs, each ESP reads regions sequentially, so that limits the DoP on the region server side.
> Therefore, my suggested fix is to use the minimum of those two DoPs to compute the cost.
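
The suggested fix can be illustrated with a minimal sketch. This is not the code in ScmCostMethod.cpp; the function names effectiveDoP and scaledCost, and the idea of dividing a total cost by the DoP, are assumptions made only for illustration.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper (not taken from ScmCostMethod.cpp): the usable
// parallelism is bounded by whichever side has the smaller DoP. Extra ESPs
// beyond the number of regions sit idle, and extra regions beyond the
// number of ESPs are read sequentially.
double effectiveDoP(double regionServerDoP, double clientDoP) {
    assert(regionServerDoP >= 1.0 && clientDoP >= 1.0);
    return std::min(regionServerDoP, clientDoP);
}

// Hypothetical cost scaling, assumed here to be a simple division of the
// total cost by the degree of parallelism.
double scaledCost(double totalCost, double dop) {
    return totalCost / dop;
}
```

Under these assumptions, a query that touches one region while 4 ESPs are available gets effectiveDoP(1.0, 4.0) == 1.0, so its cost is not scaled down at all and a parallel plan shows no cost advantage.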


