Posted to issues@phoenix.apache.org by "chenglei (JIRA)" <ji...@apache.org> on 2019/03/04 14:45:00 UTC
[jira] [Issue Comment Deleted] (PHOENIX-5148) Improve OrderPreservingTracker to optimize OrderBy/GroupBy for ClientScanPlan and ClientAggregatePlan

     [ https://issues.apache.org/jira/browse/PHOENIX-5148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

chenglei updated PHOENIX-5148:
------------------------------
    Comment: was deleted

(was: some cases in existing IT tests could be optimized by this patch.)

> Improve OrderPreservingTracker to optimize OrderBy/GroupBy for ClientScanPlan and ClientAggregatePlan
> -----------------------------------------------------------------------------------------------------
>
>                 Key: PHOENIX-5148
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-5148
>             Project: Phoenix
>          Issue Type: Improvement
>    Affects Versions: 4.14.1
>            Reporter: chenglei
>            Assignee: chenglei
>            Priority: Major
>             Fix For: 4.15.0
>
>         Attachments: PHOENIX-5148-4.x-HBase-1.4.patch
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Given a table
> {code:java}
> create table test (
>     pk1 varchar not null,
>     pk2 varchar not null,
>     pk3 varchar not null,
>     v1 varchar,
>     v2 varchar,
>     CONSTRAINT TEST_PK PRIMARY KEY (
>         pk1,
>         pk2,
>         pk3))
> {code}
> Consider the following three cases:
> *1. OrderBy of ClientScanPlan*
> for sql:
> {code:java}
> select v1 from (select v1,v2,pk3 from test t where pk1 = '6' order by t.v2,t.pk3,t.v1 limit 10) a order by v2,pk3
> {code}
> The outer {{OrderBy}} "order by v2,pk3" should be compiled out because it matches the leading columns of the inner query's {{OrderBy}} "order by t.v2,t.pk3,t.v1", but currently it is not compiled out.
> *2. GroupBy of ClientAggregatePlan*
> for sql:
> {code:java}
> select v1 from (select v1,pk2,pk1 from test t where pk1 = '6' order by t.pk2,t.v1,t.pk1 limit 10) a group by pk2,v1
> {code}
> The outer {{GroupBy}} "group by pk2,v1" should be order-preserving because it matches the leading columns of the inner query's {{OrderBy}} "order by t.pk2,t.v1,t.pk1", but currently {{isOrderPreserving()}} of the outer {{GroupBy}} returns false.
> *3. OrderBy of SortMergeJoinPlan (from PHOENIX-4618)*
> for sql:
> {code:java}
> SELECT * FROM T1 JOIN T2 ON T1.a = T2.a and T1.b = T2.b{code}
> The result of the sort-merge-join is sorted on (T1.a, T1.b) and (T2.a, T2.b) at the same time.
>  Thus, both 1)
> {code:java}
> SELECT * FROM T1 JOIN T2 ON T1.a = T2.a and T1.b = T2.b ORDER BY T1.a, T1.b{code}
> and 2)
> {code:java}
> SELECT * FROM T1 JOIN T2 ON T1.a = T2.a and T1.b = T2.b ORDER BY T2.a, T2.b{code}
> should avoid doing an extra order-by after the sort-merge-join operation.
>  
> All three cases above share the same root cause: the {{OrderPreservingTracker}} relies solely on row keys to infer alignment between the target {{Expression}} and the sortedness of the source.
> In the following {{QueryCompiler.compileSingleQuery}}, the inner query in the cases above has an {{OrderBy}}, so the local variable {{isInRowKeyOrder}} at line 534 is false:
> {code:java}
> 518   protected QueryPlan compileSingleQuery(StatementContext context, SelectStatement select, List<Object> binds, boolean asSubquery, boolean allowPageFilter) throws SQLException {
> 519       SelectStatement innerSelect = select.getInnerSelectStatement();
> 520       if (innerSelect == null) {
> 521           return compileSingleFlatQuery(context, select, binds, asSubquery, allowPageFilter, null, null, true);
> 522       }
> 523
> 524       QueryPlan innerPlan = compileSubquery(innerSelect, false);
> 525       TupleProjector tupleProjector = new TupleProjector(innerPlan.getProjector());
> 526       innerPlan = new TupleProjectionPlan(innerPlan, tupleProjector, null);
> 527
> 528       // Replace the original resolver and table with those having compiled type info.
> 529       TableRef tableRef = context.getResolver().getTables().get(0);
> 530       ColumnResolver resolver = FromCompiler.getResolverForCompiledDerivedTable(statement.getConnection(), tableRef, innerPlan.getProjector());
> 531       context.setResolver(resolver);
> 532       tableRef = resolver.getTables().get(0);
> 533       context.setCurrentTable(tableRef);
> 534       boolean isInRowKeyOrder = innerPlan.getGroupBy() == GroupBy.EMPTY_GROUP_BY && innerPlan.getOrderBy() == OrderBy.EMPTY_ORDER_BY;
> 535
> 536       return compileSingleFlatQuery(context, select, binds, asSubquery, allowPageFilter, innerPlan, tupleProjector, isInRowKeyOrder);
> 537   }
> {code}
> Because the {{OrderPreservingTracker}} relies solely on row keys, the {{OrderPreservingTracker.isOrderPreserving()}} check at line 170 of {{OrderByCompiler.compile}} below is skipped whenever {{isInRowKeyOrder}} is false:
> {code:java}
> 169   // If we're ordering by the order returned by the scan, we don't need an order by
> 170   if (isInRowKeyOrder && tracker.isOrderPreserving()) {
> 171       if (tracker.isReverse()) {
> 172           // Don't use reverse scan if:
> 173           // 1) we're using a skip scan, as our skip scan doesn't support this yet.
> 174           // 2) we have the FORWARD_SCAN hint set to choose to keep loading of column
> 175           //    families on demand versus doing a reverse scan
> 176           // REV_ROW_KEY_ORDER_BY scan would not take effect for a projected table, so don't return it for such table types.
> 177           if (context.getConnection().getQueryServices().getProps().getBoolean(QueryServices.USE_REVERSE_SCAN_ATTRIB, QueryServicesOptions.DEFAULT_USE_REVERSE_SCAN)
> 178                   && !context.getScanRanges().useSkipScanFilter()
> 179                   && context.getCurrentTable().getTable().getType() != PTableType.PROJECTED
> 180                   && context.getCurrentTable().getTable().getType() != PTableType.SUBQUERY
> 181                   && !statement.getHint().hasHint(Hint.FORWARD_SCAN)) {
> 182               return OrderBy.REV_ROW_KEY_ORDER_BY;
> 183           }
> 184       } else {
> 185           return OrderBy.FWD_ROW_KEY_ORDER_BY;
> 186       }
> 187   }
> {code}
>  
> To fix this problem, we can:
> - refactor the {{OrderPreservingTracker}} so that it infers alignment between the target {{Expression}} and the source sortedness from {{OrderByExpression}} rather than from row keys, which lets us remove the variable {{isInRowKeyOrder}}.
> - add a new method {{getActualOutputOrderBys}} to {{QueryPlan}} that returns the output OrderBys of the query result, e.g. for the third ({{SortMergeJoinPlan}}) example above, {{getActualOutputOrderBys}} returns (T1.a, T1.b) and (T2.a, T2.b).
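
The idea behind the two points above can be sketched in isolation from Phoenix. The following is only an illustration under stated assumptions: columns are modelled as plain strings, and the class and method names ({{OutputOrderSketch}}, {{canSkipOrderBy}}, {{isGroupByOrderPreserving}}) are hypothetical and are not part of Phoenix or of the attached patch. It also assumes that a GroupBy only needs its columns to cover the leading columns of an ordering, in any order, which may be looser or stricter than what the real tracker checks.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;

// Hypothetical sketch: none of these names are Phoenix APIs.
final class OutputOrderSketch {

    // True when 'required' is a leading prefix of 'actual'.
    static boolean isPrefix(List<String> required, List<String> actual) {
        return required.size() <= actual.size()
                && actual.subList(0, required.size()).equals(required);
    }

    // An outer ORDER BY can be compiled out if it is a leading prefix of
    // any ordering that the inner plan's output already satisfies.
    static boolean canSkipOrderBy(List<String> outerOrderBy,
                                  List<List<String>> actualOutputOrderBys) {
        for (List<String> ordering : actualOutputOrderBys) {
            if (isPrefix(outerOrderBy, ordering)) {
                return true;
            }
        }
        return false;
    }

    // Assumption: a GROUP BY is order-preserving when its columns, taken as
    // a set, equal the leading columns of some output ordering.
    static boolean isGroupByOrderPreserving(List<String> groupBy,
                                            List<List<String>> actualOutputOrderBys) {
        for (List<String> ordering : actualOutputOrderBys) {
            if (groupBy.size() <= ordering.size()
                    && new HashSet<>(ordering.subList(0, groupBy.size()))
                            .equals(new HashSet<>(groupBy))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Case 1: inner "order by v2,pk3,v1", outer "order by v2,pk3".
        List<List<String>> case1 = Arrays.asList(Arrays.asList("v2", "pk3", "v1"));
        System.out.println(canSkipOrderBy(Arrays.asList("v2", "pk3"), case1)); // true

        // Case 2: inner "order by pk2,v1,pk1", outer "group by pk2,v1".
        List<List<String>> case2 = Arrays.asList(Arrays.asList("pk2", "v1", "pk1"));
        System.out.println(isGroupByOrderPreserving(Arrays.asList("pk2", "v1"), case2)); // true

        // Case 3: the sort-merge join output satisfies two orderings at once.
        List<List<String>> case3 = Arrays.asList(
                Arrays.asList("T1.a", "T1.b"),
                Arrays.asList("T2.a", "T2.b"));
        System.out.println(canSkipOrderBy(Arrays.asList("T2.a", "T2.b"), case3)); // true
    }
}
```

Returning a list of orderings rather than a single one is what lets the third case work: either side of the join condition can serve as the ordering the outer query is matched against.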
> Besides the three cases above, an implicit semantic optimization could also be implemented:
> for the following sql, even though the literal {{OrderBy}} of the inner query is "order by t.pk1,t.pk2", {{getActualOutputOrderBys}} of the inner {{QueryPlan}} lets us infer that the actual {{OrderBy}} of the inner query is "order by t.pk1,t.pk2,t.pk3", so the outer OrderBy "order by pk1,pk2,pk3" could be compiled out:
> {code}
> select pk1 from (select pk3,pk2,pk1 from test t where v1 = '6' order by t.pk1,t.pk2 limit 10) a where pk3 > '8' order by pk1,pk2,pk3
> {code}
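
The inference in this last example can be illustrated with a small sketch. It assumes, as a simplification, that when the literal inner ORDER BY is a leading prefix of the row key the scan returns rows in full row-key order, and that every row-key column is projected by the inner query (as it is in the sql above). The names {{ActualOrderBySketch}} and {{actualOutputOrderBy}} are hypothetical and are not Phoenix APIs.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: none of these names are Phoenix APIs.
final class ActualOrderBySketch {

    // True when 'required' is a leading prefix of 'actual'.
    static boolean isPrefix(List<String> required, List<String> actual) {
        return required.size() <= actual.size()
                && actual.subList(0, required.size()).equals(required);
    }

    // Assumption: a literal ORDER BY that is a leading prefix of the row key
    // needs no explicit sort, so the rows come back in full row-key order.
    // Otherwise only the literal ORDER BY is guaranteed.
    static List<String> actualOutputOrderBy(List<String> literalOrderBy,
                                            List<String> rowKey) {
        return isPrefix(literalOrderBy, rowKey) ? rowKey : literalOrderBy;
    }

    public static void main(String[] args) {
        List<String> rowKey = Arrays.asList("pk1", "pk2", "pk3");
        List<String> literalInner = Arrays.asList("pk1", "pk2");
        List<String> outer = Arrays.asList("pk1", "pk2", "pk3");

        // Matching against the literal inner ORDER BY alone is not enough.
        System.out.println(isPrefix(outer, literalInner)); // false

        // Matching against the inferred actual ordering succeeds.
        List<String> actual = actualOutputOrderBy(literalInner, rowKey);
        System.out.println(isPrefix(outer, actual)); // true
    }
}
```

The outer filter "pk3 > '8'" only drops rows and does not reorder them, so it does not affect this reasoning.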



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)