You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@kylin.apache.org by "Shaofeng SHI (JIRA)" <ji...@apache.org> on 2019/01/07 02:36:00 UTC
[jira] [Comment Edited] (KYLIN-3722) Error Limit Push Down in Join
Related Query
[ https://issues.apache.org/jira/browse/KYLIN-3722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16735394#comment-16735394 ]
Shaofeng SHI edited comment on KYLIN-3722 at 1/7/19 2:35 AM:
-------------------------------------------------------------
# Add a switch "{{kylin.storage.limit-push-down-enabled=true"}} which allows turn off limit push down (default is enabled);
# Turn off limit push down automatically when multi OLAP context detected in one query (such as sub query);
# Add a test sql in integration test;
was (Author: shaofengshi):
# add a switch {{kylin.storage.limit-push-down-enable=true}} which could turn off limit push down
# turn off limit push down when multi olap context related
# add a new sql in IT
> Error Limit Push Down in Join Related Query
> -------------------------------------------
>
> Key: KYLIN-3722
> URL: https://issues.apache.org/jira/browse/KYLIN-3722
> Project: Kylin
> Issue Type: Bug
> Components: Query Engine
> Affects Versions: all
> Reporter: XiaoXiang Yu
> Assignee: XiaoXiang Yu
> Priority: Major
> Labels: LimitPushDown
> Fix For: v2.6.0
>
> Attachments: 0001-storageLimitLevel-set-to-NO_LIMIT-when-the-query-has.patch, image-2018-12-16-17-06-16-341.png, image-2018-12-16-17-24-21-017.png, image-2018-12-16-17-38-13-454.png, image-2018-12-16-22-40-36-857.png, image-2018-12-16-22-44-32-875.png, image-2018-12-16-22-45-29-841.png, image-2018-12-16-22-46-43-168.png, image-2018-12-16-22-48-25-622.png, image-2018-12-16-22-54-25-718.png, image-2018-12-16-22-55-23-827.png, image-2018-12-16-22-56-27-258.png, image-2018-12-16-22-56-50-607.png, image-2018-12-16-22-58-04-486.png
>
> Original Estimate: 24h
> Remaining Estimate: 24h
>
> +*Kylin limit-pushdown is sometimes cause data reduction.*+
> For example:
> {quote}select uid, sum(active_minutes) as am
> from useraction
> where item_id in (
> select distinct item_id
> from iteminfo
> where item_type in ('Video')
> ) and act_type != 'share'
> group by uid
> limit 10
> {quote}
> +*In hive, we got correct result(Five row).*+
> {quote}hive>
> > select uid, sum(active_minutes) as am
> > from useraction
> > where item_id in (
> > select distinct item_id
> > from iteminfo
> > where item_type in ('Video')
> > ) and act_type != 'share'
> > group by uid
> > limit 10;
> Query ID = root_20181216170145_d5667a81-46d0-4899-a4bb-7c580155049e
> Total jobs = 1
> Launching Job 1 out of 1
> Status: Running (Executing on YARN cluster with App id application_1539833412107_0414)
> --------------------------------------------------------------------------------
> VERTICES STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED
> --------------------------------------------------------------------------------
> Map 1 .......... SUCCEEDED 1 1 0 0 0 0
> Map 3 .......... SUCCEEDED 1 1 0 0 0 0
> Reducer 2 ...... SUCCEEDED 1 1 0 0 0 0
> Reducer 4 ...... SUCCEEDED 1 1 0 0 0 0
> --------------------------------------------------------------------------------
> VERTICES: 04/04 [==========================>>] 100% ELAPSED TIME: 7.67 s
> --------------------------------------------------------------------------------
> OK
> 1 14565.470000000008
> 2 64744.89000000003
> 3 64939.01999999984
> 5 36563.76999999997
> 6 36641.64999999999
> Time taken: 11.02 seconds, Fetched: 5 row(s)
> {quote}
> +*In Kylin, same query got error result(only THREE row). But when you set limit to 50000(original value). It is OK.*+
> !image-2018-12-16-17-06-16-341.png!
>
>
> We can find following things in log:
> {quote}
> KYLIN [ DEBUG ] 12-16 17:04:28.299 org.apache.kylin.storage.gtrecord.GTCubeStorageQueryBase.enableStorageLimitIfPossible(GTCubeStorageQueryBase.java:433) from Query 78808744-8324-3ad4-58ac-93ad7cd8a708-81
> > storageLimitLevel set to LIMIT_ON_RETURN_SIZE because groupD is not clustered at head, groupsD: \{0} with cuboid columns: \{1}
> KYLIN [ INFO ] 12-16 17:04:28.299 org.apache.kylin.storage.StorageContext.applyLimitPushDown(StorageContext.java:167) from Query 78808744-8324-3ad4-58ac-93ad7cd8a708-81
> > Enabling limit push down: 10 at level: LIMIT_ON_RETURN_SIZE
> {quote}
>
> {quote}KYLIN [ INFO ] 12-16 17:04:28.405 org.apache.kylin.rest.service.QueryService.logQuery(QueryService.java:352) from Query 78808744-8324-3ad4-58ac-93ad7cd8a708-81
> >
> ==========================[QUERY]===============================
> Query Id: 78808744-8324-3ad4-58ac-93ad7cd8a708
> SQL: select uid, sum(active_minutes) as am
> from useraction
> where item_id in (
> select distinct item_id
> from iteminfo
> where item_type in ('Video')
> ) and act_type != 'share'
> group by uid
> User: ADMIN
> Success: true
> Duration: 0.202
> Project: PearVideo
> Realization Names: [CUBE[name=PearVideoCube1], CUBE[name=PearVideoCube1]]
> Cuboid Ids: [14]
> Total scan count: 120
> Total scan bytes: 6442
> Result row count: 3
> Accept Partial: true
> Is Partial Result: false
> Hit Exception Cache: false
> Storage cache used: false
> Is Query Push-Down: false
> Is Prepare: false
> Trace URL: null
> Message: null
> ==========================[QUERY]===============================
> {quote}
>
> h3. Execution Plan
> {quote}EXECUTION PLAN BEFORE REWRITE
> OLAPToEnumerableConverter
> OLAPLimitRel(ctx=[], fetch=[1])
> OLAPAggregateRel(group=[\{0}], AM=[SUM($1)], ctx=[])
> OLAPProjectRel(UID=[$0], ACTIVE_MINUTES=[$4], ctx=[])
> OLAPFilterRel(condition=[<>($1, 'share')], ctx=[])
> OLAPJoinRel(condition=[=($2, $9)], joinType=[inner], ctx=[])
> OLAPTableScan(table=[[DEMO_USER_ACT, USERACTION]], ctx=[], fields=[[0, 1, 2, 3, 4, 5, 6, 7, 8]])
> OLAPAggregateRel(group=[\{0}], ctx=[])
> OLAPProjectRel(ITEM_ID=[$0], ctx=[])
> OLAPFilterRel(condition=[=($1, 'Video')], ctx=[])
> OLAPTableScan(table=[[DEMO_USER_ACT, ITEMINFO]], ctx=[], fields=[[0, 1]])
> EXECUTION PLAN AFTER OLAPCONTEXT IS SET
> OLAPToEnumerableConverter
> OLAPLimitRel(ctx=[0@null], fetch=[1])
> OLAPAggregateRel(group=[\{0}], AM=[SUM($1)], ctx=[0@null])
> OLAPProjectRel(UID=[$0], ACTIVE_MINUTES=[$4], ctx=[0@null])
> OLAPFilterRel(condition=[<>($1, 'share')], ctx=[0@null])
> OLAPJoinRel(condition=[=($2, $9)], joinType=[inner], ctx=[0@null])
> OLAPTableScan(table=[[DEMO_USER_ACT, USERACTION]], ctx=[0@null], fields=[[0, 1, 2, 3, 4, 5, 6, 7, 8]])
> OLAPAggregateRel(group=[\{0}], ctx=[1@null])
> OLAPProjectRel(ITEM_ID=[$0], ctx=[1@null])
> OLAPFilterRel(condition=[=($1, 'Video')], ctx=[1@null])
> OLAPTableScan(table=[[DEMO_USER_ACT, ITEMINFO]], ctx=[1@null], fields=[[0, 1]])
> {quote}
>
> This bug was reported by Meituan's Dev [~kangkaisen].
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)