You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@kylin.apache.org by 张磊 <12...@qq.com> on 2016/10/27 09:56:28 UTC

回复： Exceed scan threshold at 10000001

Do you mean when i query, i should add where clause, 
but in some case, the number of records > threshold, how can i do?
For example, order by all groups, the number of the  all groups >  threshold




------------------ 原始邮件 ------------------
发件人: "Alberto Ramón";<a....@gmail.com>;
发送时间: 2016年10月27日(星期四) 下午5:47
收件人: "dev"<de...@kylin.apache.org>; 

主题: Re: Exceed scan threshold at 10000001



 ERROR: Scan row count exceeded threshold

MailList
<http://mail-archives.apache.org/mod_mbox/kylin-user/201608.mbox/%3CCALjEW7M_YYi7Xs55OqPdxS6pzNvD0%2BamN2AX3hetnF0%3D9uFnow%40mail.gmail.com%3E>
Kilin
1787 <https://issues.apache.org/jira/browse/KYLIN-1787>v1.5.3

*Scan row count exceeded threshold: 1000000, please add filter condition to
narrow down backend scan range, like where clause*


BR, Alb

2016-10-27 11:40 GMT+02:00 张磊 <12...@qq.com>:

> Hi
>
>
> When i query a sql, I do not know why should scan hbase? How can i do?
> Thanks!
>
>
> Table: lineorder  12,000,000 row records
> Dimensions: LO_CUSTKEY,LO_PARTKEY
> Measures: count(1), sum(LO_REVENUE)
>
>
> Query SQL: select count(1),sum(LO_REVENUE) from lineorder group by
> LO_CUSTKEY,LO_PARTKEY order by LO_CUSTKEY,LO_PARTKEY limit 50
>
>
> I build a cude with two Dimensions and two Measures(count and sum), the
> size of the Htable is 98 MB, when i execute a query in insight, it shows
> Error in coprocessor; and i check the hbase log, i find blow messages
>
>
> 2016-10-27 02:06:13,470 INFO  [B.defaultRpcServer.handler=4,queue=1,port=16020]
> gridtable.GTScanRequest: pre aggregation is not beneficial, skip it
> 2016-10-27 02:06:13,470 INFO  [B.defaultRpcServer.handler=4,queue=1,port=16020]
> endpoint.CubeVisitService: Scanned 1 rows from HBase.
>
>
> 2016-10-27 02:24:20,884 INFO  [B.defaultRpcServer.handler=6,queue=0,port=16020]
> endpoint.CubeVisitService: Scanned 9999001 rows from HBase.
> 2016-10-27 02:24:20,889 INFO  [B.defaultRpcServer.handler=6,queue=0,port=16020]
> endpoint.CubeVisitService: The cube visit did not finish normally because
> scan num exceeds threshold
> org.apache.kylin.gridtable.GTScanExceedThresholdException: Exceed scan
> threshold at 10000001
>         at org.apache.kylin.storage.hbase.cube.v2.coprocessor.
> endpoint.CubeVisitService$2.hasNext(CubeVisitService.java:267)
>         at org.apache.kylin.storage.hbase.cube.v2.HBaseReadonlyStore$1$1.
> hasNext(HBaseReadonlyStore.java:111)
>         at org.apache.kylin.storage.hbase.cube.v2.coprocessor.
> endpoint.CubeVisitService.visitCube(CubeVisitService.java:299)
>         at org.apache.kylin.storage.hbase.cube.v2.coprocessor.
> endpoint.generated.CubeVisitProtos$CubeVisitService.callMethod(
> CubeVisitProtos.java:3952)
>         at org.apache.hadoop.hbase.regionserver.HRegion.
> execService(HRegion.java:7815)
>         at org.apache.hadoop.hbase.regionserver.RSRpcServices.
> execServiceOnRegion(RSRpcServices.java:1986)
>         at org.apache.hadoop.hbase.regionserver.RSRpcServices.
> execService(RSRpcServices.java:1968)
>         at org.apache.hadoop.hbase.protobuf.generated.
> ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:33652)
>         at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2178)
>         at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:112)
>         at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(
> RpcExecutor.java:133)
>         at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.
> java:108)
>         at java.lang.Thread.run(Thread.java:745)

回复： Exceed scan threshold at 10000001

Posted by 张磊 <12...@qq.com>.

Hi 


there is something i can not see, can you send again?




Query:
select sum(m1) from table group by dim1,dim2 limit 10001
select sum(m1) from table group by dim1,dim2 limit 10000


cude:
dimensions only contain dim1 and dim2


count(distinct dim1) 200000, count(distinct dim 2) 20000


The query result is 
limit 10000  --> scan 1000 rows

limit 10001  ---> scan millions of rows




I do not konw why?  data isn't balanced? How can i avoid it?


Who can help me? thanks!








------------------ 原始邮件 ------------------
发件人: "a.ramonportoles";<a....@gmail.com>;
发送时间: 2016年10月28日(星期五) 晚上6:51
收件人: "dev"<de...@kylin.apache.org>; 

主题: Re: Exceed scan threshold at 10000001



Q1: If i query select count(1) from table group by letter,number limit 2,it  should scan the first two rows(letter number Agg groups)?


A1: Kylin build Hbase Key with Dimensions:


Kylin is "smart" when compose Hbase Row Key:
Is not the same Group by / filter by Dim1 that Dim3   :)

Dim1: Range scan--> you read that you need --> fast

Dim3: full scan --> you read more rows that you need --> slow


how to solve it?  (I think:) you can build several cubes / uses different aggregation groups  on same project
 




Q2: when i query select count(1) from table group by letter limit 2,it should scan the two rows(letter Agg group)


A2: Yes,, if you define count(1) as measure and letter as Dim, you will have a pre-calculated results



Also: check the cardinaliy of your data, Isn't normal:

limit 10000  --> scan 1000 rows

limit 10001  ---> scan millions of rows

If this is true your data isn't balanced, I don't know any solution for this


Alb















2016-10-28 10:01 GMT+02:00 张磊 <12...@qq.com>:
kylin do put pre-calculate results in hbase, if cude desc is below:
 Dimensions:letter number
 Measures:count
 in hbase the result is
 count letter number
 1         A        1
 1         A        2
 1         B        1
 1         B        2
 1         B        3
 1         B        4
 count  letter
 2          A
 4          B
 If i query select count(1) from table group by letter,number limit 2,it should scan the first two rows(letter number Agg groups)?
 when i query select count(1) from table group by letter limit 2,it should scan the two rows(letter Agg group)
 Do i say right?
 
 
 ------------------ 原始邮件 ------------------
 发件人: "a.ramonportoles";<a....@gmail.com>;
 发送时间: 2016年10月28日(星期五) 下午3:43
 收件人: "dev"<de...@kylin.apache.org>;
 
 主题: Re: Exceed scan threshold at 10000001
 
 
 
 hummmm
 but are you using "group by LO_CUSTKEY,LO_PARTKEY"
 
 And limit apply to final result no to scan rows
 
 Example:
 table with two columns Letter / Number
 A:1
 A:2
 B:1
 B:2
 B:3
 B:4
 
 select count (1), Letter from TB group by Letter limit 1
    Result: 2:A
    Scans 2 rows
 
 select count (1), Letter from TB group by Letter limit 2
    Result: 2:A
                4:B
    Scans 2 +4 rows
 
 
 Alb
 
 
 
 2016-10-28 8:33 GMT+02:00 张磊 <12...@qq.com>:
 
 > Query1:select count(1),sum(LO_REVENUE) from lineorder group by
 > LO_CUSTKEY,LO_PARTKEY
 > LIMIT 10000
 >
 >
 > I find it scan 10000 rows from HBase
 >
 >
 > Query2: select count(1),sum(LO_REVENUE) from lineorder group by
 > LO_CUSTKEY,LO_PARTKEY
 > LIMIT 10001
 >
 >
 > I find it scan 10000001 rows from Hbase
 >
 >
 > I do not know why?  Should not  scan 10001 row?
 >
 >
 > The two query i scan the same HTable KYLIN_78ROC49NQY
 > Kylin log:Endpoint RPC returned from HTable KYLIN_78ROC49NQY
 >
 >
 >
 >
 > ------------------ 原始邮件 ------------------
 > 发件人: "ShaoFeng Shi";<sh...@apache.org>;
 > 发送时间: 2016年10月28日(星期五) 中午11:20
 > 收件人: "dev"<de...@kylin.apache.org>;
 >
 > 主题: Re: Exceed scan threshold at 10000001
 >
 >
 >
 > Alberto, thanks for your explaination, you got the points and is already an
 > Kylin expert I believe.
 >
 > In order to protect HBase and Kylin from crashing by bad queries (which
 > scan too many rows), Kylin add this mechnisam to interrupt when reach some
 > threshold. Usually in an OLAP scenario, the result wouldn't be too large.
 > This is also a reminder for user to rethink the design; If you really want
 > to get the threshold be enlarged, you can allocate more memory to Kylin and
 > set "kylin.query.mem.budget" to bigger value.
 >
 > 2016-10-27 18:39 GMT+08:00 Alberto Ramón <a....@gmail.com>:
 >
 > > NOTE: I'm not a expert on Kylin  ;)
 > >
 > > Where is mandatory? No
 > > Where is recommended? yes
 > > Where bypass the threshold? No, I think this limit is hardcoded ¿?
 > >
 > > The real question must be: why this limit exists ?: (opinion)
 > > - The target of Kylin is Real / Near RT, limit rows --> limit response
 > time
 > > - If Your are using JDBC, this is not a good option by performance
 > > - Protect the HBase Coprocesor
 > > - Perhaps you need a new Dim, to precalculate This Aggregate or filter by
 > > this new Dim
 > >
 > > For Extra-Large queries, you can also check:
 > >  -kylin.query.mem.budget= 3GB
 > >  -hbase.server.scanner.max.result.size = 100MB  (limit from HBase, you
 > can
 > > disable with -1)
 > >
 > > Good Luck, Alb
 > >
 > > 2016-10-27 11:56 GMT+02:00 张磊 <12...@qq.com>:
 > >
 > > > Do you mean when i query, i should add where clause,
 > > > but in some case, the number of records > threshold, how can i do?
 > > > For example, order by all groups, the number of the  all groups >
 > > > threshold
 > > >
 > > >
 > > >
 > > >
 > > > ------------------ 原始邮件 ------------------
 > > > 发件人: "Alberto Ramón";<a....@gmail.com>;
 > > > 发送时间: 2016年10月27日(星期四) 下午5:47
 > > > 收件人: "dev"<de...@kylin.apache.org>;
 > > >
 > > > 主题: Re: Exceed scan threshold at 10000001
 > > >
 > > >
 > > >
 > > >  ERROR: Scan row count exceeded threshold
 > > >
 > > > MailList
 > > > <http://mail-archives.apache.org/mod_mbox/kylin-user/
 > > > 201608.mbox/%3CCALjEW7M_YYi7Xs55OqPdxS6pzNvD0%
 > 2BamN2AX3hetnF0%3D9uFnow%
 > > > 40mail.gmail.com%3E>
 > > > Kilin
 > > > 1787 <https://issues.apache.org/jira/browse/KYLIN-1787>v1.5.3
 > > >
 > > > *Scan row count exceeded threshold: 1000000, please add filter
 > condition
 > > to
 > > > narrow down backend scan range, like where clause*
 > > >
 > > >
 > > > BR, Alb
 > > >
 > > > 2016-10-27 11:40 GMT+02:00 张磊 <12...@qq.com>:
 > > >
 > > > > Hi
 > > > >
 > > > >
 > > > > When i query a sql, I do not know why should scan hbase? How can i
 > do?
 > > > > Thanks!
 > > > >
 > > > >
 > > > > Table: lineorder  12,000,000 row records
 > > > > Dimensions: LO_CUSTKEY,LO_PARTKEY
 > > > > Measures: count(1), sum(LO_REVENUE)
 > > > >
 > > > >
 > > > > Query SQL: select count(1),sum(LO_REVENUE) from lineorder group by
 > > > > LO_CUSTKEY,LO_PARTKEY order by LO_CUSTKEY,LO_PARTKEY limit 50
 > > > >
 > > > >
 > > > > I build a cude with two Dimensions and two Measures(count and sum),
 > the
 > > > > size of the Htable is 98 MB, when i execute a query in insight, it
 > > shows
 > > > > Error in coprocessor; and i check the hbase log, i find blow messages
 > > > >
 > > > >
 > > > > 2016-10-27 02:06:13,470 INFO  [B.defaultRpcServer.handler=4,
 > > > queue=1,port=16020]
 > > > > gridtable.GTScanRequest: pre aggregation is not beneficial, skip it
 > > > > 2016-10-27 02:06:13,470 INFO  [B.defaultRpcServer.handler=4,
 > > > queue=1,port=16020]
 > > > > endpoint.CubeVisitService: Scanned 1 rows from HBase.
 > > > >
 > > > >
 > > > > 2016-10-27 02:24:20,884 INFO  [B.defaultRpcServer.handler=6,
 > > > queue=0,port=16020]
 > > > > endpoint.CubeVisitService: Scanned 9999001 rows from HBase.
 > > > > 2016-10-27 02:24:20,889 INFO  [B.defaultRpcServer.handler=6,
 > > > queue=0,port=16020]
 > > > > endpoint.CubeVisitService: The cube visit did not finish normally
 > > because
 > > > > scan num exceeds threshold
 > > > > org.apache.kylin.gridtable.GTScanExceedThresholdException: Exceed
 > scan
 > > > > threshold at 10000001
 > > > >         at org.apache.kylin.storage.hbase.cube.v2.coprocessor.
 > > > > endpoint.CubeVisitService$2.hasNext(CubeVisitService.java:267)
 > > > >         at org.apache.kylin.storage.hbase.cube.v2.
 > > > HBaseReadonlyStore$1$1.
 > > > > hasNext(HBaseReadonlyStore.java:111)
 > > > >         at org.apache.kylin.storage.hbase.cube.v2.coprocessor.
 > > > > endpoint.CubeVisitService.visitCube(CubeVisitService.java:299)
 > > > >         at org.apache.kylin.storage.hbase.cube.v2.coprocessor.
 > > > > endpoint.generated.CubeVisitProtos$CubeVisitService.callMethod(
 > > > > CubeVisitProtos.java:3952)
 > > > >         at org.apache.hadoop.hbase.regionserver.HRegion.
 > > > > execService(HRegion.java:7815)
 > > > >         at org.apache.hadoop.hbase.regionserver.RSRpcServices.
 > > > > execServiceOnRegion(RSRpcServices.java:1986)
 > > > >         at org.apache.hadoop.hbase.regionserver.RSRpcServices.
 > > > > execService(RSRpcServices.java:1968)
 > > > >         at org.apache.hadoop.hbase.protobuf.generated.
 > > > > ClientProtos$ClientService$2.callBlockingMethod(
 > > ClientProtos.java:33652)
 > > > >         at org.apache.hadoop.hbase.ipc.
 > RpcServer.call(RpcServer.java:
 > > > 2178)
 > > > >         at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.
 > > > java:112)
 > > > >         at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(
 > > > > RpcExecutor.java:133)
 > > > >         at org.apache.hadoop.hbase.ipc.
 > RpcExecutor$1.run(RpcExecutor.
 > > > > java:108)
 > > > >         at java.lang.Thread.run(Thread.java:745)
 > > >
 > >
 >
 >
 >
 > --
 > Best regards,
 >
 > Shaofeng Shi 史少锋
 >

Re: Exceed scan threshold at 10000001

Posted by ShaoFeng Shi <sh...@apache.org>.

Hi Lei,

When you create the cube, in the "Advanced settings" step, the rowkey is
composed by the dimension columns; The sequence can be adjusted by drag &
drop; Usually we suggest to move the columns that used as filter (in
"where" condition") be ahead of other columns, so they can be used to
narrow down the scan range in HBase; You can look through this presentation
(may need VPN):
http://www.slideshare.net/YangLi43/design-cube-in-apache-kylin

2016-11-01 18:43 GMT+08:00 Alberto Ramón <a....@gmail.com>:

> Sorry, The pictures can't arrive you OK
>
> See this:
> http://www.slideshare.net/HBaseCon/apache-kylins-
> performance-boost-from-apache-hbase#9
>
> 2016-11-01 7:52 GMT+01:00 张磊 <12...@qq.com>:
>
> > You say Kylin is "smart" when compose Hbase Row Key, there is something i
> > can not see. Could you send again How Hbase Row Key compose?
> >
> >
> >
> >
> > ------------------ 原始邮件 ------------------
> > 发件人: "a.ramonportoles";<a....@gmail.com>;
> > 发送时间: 2016年10月28日(星期五) 晚上6:51
> > 收件人: "dev"<de...@kylin.apache.org>;
> >
> > 主题: Re: Exceed scan threshold at 10000001
> >
> >
> >
> > Q1: If i query select count(1) from table group by letter,number limit
> > 2,it  should scan the first two rows(letter number Agg groups)?
> >
> >
> > A1: Kylin build Hbase Key with Dimensions:
> >
> >
> > Kylin is "smart" when compose Hbase Row Key:
> > Is not the same Group by / filter by Dim1 that Dim3   :)
> >
> > Dim1: Range scan--> you read that you need --> fast
> >
> > Dim3: full scan --> you read more rows that you need --> slow
> >
> >
> > how to solve it?  (I think:) you can build several cubes / uses different
> > aggregation groups  on same project
> >
> >
> >
> >
> >
> > Q2: when i query select count(1) from table group by letter limit 2,it
> > should scan the two rows(letter Agg group)
> >
> >
> > A2: Yes,, if you define count(1) as measure and letter as Dim, you will
> > have a pre-calculated results
> >
> >
> >
> > Also: check the cardinaliy of your data, Isn't normal:
> >
> > limit 10000  --> scan 1000 rows
> >
> > limit 10001  ---> scan millions of rows
> >
> > If this is true your data isn't balanced, I don't know any solution for
> > this
> >
> >
> > Alb
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> > 2016-10-28 10:01 GMT+02:00 张磊 <12...@qq.com>:
> > kylin do put pre-calculate results in hbase, if cude desc is below:
> >  Dimensions:letter number
> >  Measures:count
> >  in hbase the result is
> >  count letter number
> >  1         A        1
> >  1         A        2
> >  1         B        1
> >  1         B        2
> >  1         B        3
> >  1         B        4
> >  count  letter
> >  2          A
> >  4          B
> >  If i query select count(1) from table group by letter,number limit 2,it
> > should scan the first two rows(letter number Agg groups)?
> >  when i query select count(1) from table group by letter limit 2,it
> should
> > scan the two rows(letter Agg group)
> >  Do i say right?
> >
> >
> >  ------------------ 原始邮件 ------------------
> >  发件人: "a.ramonportoles";<a....@gmail.com>;
> >  发送时间: 2016年10月28日(星期五) 下午3:43
> >  收件人: "dev"<de...@kylin.apache.org>;
> >
> >  主题: Re: Exceed scan threshold at 10000001
> >
> >
> >
> >  hummmm
> >  but are you using "group by LO_CUSTKEY,LO_PARTKEY"
> >
> >  And limit apply to final result no to scan rows
> >
> >  Example:
> >  table with two columns Letter / Number
> >  A:1
> >  A:2
> >  B:1
> >  B:2
> >  B:3
> >  B:4
> >
> >  select count (1), Letter from TB group by Letter limit 1
> >     Result: 2:A
> >     Scans 2 rows
> >
> >  select count (1), Letter from TB group by Letter limit 2
> >     Result: 2:A
> >                 4:B
> >     Scans 2 +4 rows
> >
> >
> >  Alb
> >
> >
> >
> >  2016-10-28 8:33 GMT+02:00 张磊 <12...@qq.com>:
> >
> >  > Query1:select count(1),sum(LO_REVENUE) from lineorder group by
> >  > LO_CUSTKEY,LO_PARTKEY
> >  > LIMIT 10000
> >  >
> >  >
> >  > I find it scan 10000 rows from HBase
> >  >
> >  >
> >  > Query2: select count(1),sum(LO_REVENUE) from lineorder group by
> >  > LO_CUSTKEY,LO_PARTKEY
> >  > LIMIT 10001
> >  >
> >  >
> >  > I find it scan 10000001 rows from Hbase
> >  >
> >  >
> >  > I do not know why?  Should not  scan 10001 row?
> >  >
> >  >
> >  > The two query i scan the same HTable KYLIN_78ROC49NQY
> >  > Kylin log:Endpoint RPC returned from HTable KYLIN_78ROC49NQY
> >  >
> >  >
> >  >
> >  >
> >  > ------------------ 原始邮件 ------------------
> >  > 发件人: "ShaoFeng Shi";<sh...@apache.org>;
> >  > 发送时间: 2016年10月28日(星期五) 中午11:20
> >  > 收件人: "dev"<de...@kylin.apache.org>;
> >  >
> >  > 主题: Re: Exceed scan threshold at 10000001
> >  >
> >  >
> >  >
> >  > Alberto, thanks for your explaination, you got the points and is
> > already an
> >  > Kylin expert I believe.
> >  >
> >  > In order to protect HBase and Kylin from crashing by bad queries
> (which
> >  > scan too many rows), Kylin add this mechnisam to interrupt when reach
> > some
> >  > threshold. Usually in an OLAP scenario, the result wouldn't be too
> > large.
> >  > This is also a reminder for user to rethink the design; If you really
> > want
> >  > to get the threshold be enlarged, you can allocate more memory to
> Kylin
> > and
> >  > set "kylin.query.mem.budget" to bigger value.
> >  >
> >  > 2016-10-27 18:39 GMT+08:00 Alberto Ramón <a....@gmail.com>:
> >  >
> >  > > NOTE: I'm not a expert on Kylin  ;)
> >  > >
> >  > > Where is mandatory? No
> >  > > Where is recommended? yes
> >  > > Where bypass the threshold? No, I think this limit is hardcoded ¿?
> >  > >
> >  > > The real question must be: why this limit exists ?: (opinion)
> >  > > - The target of Kylin is Real / Near RT, limit rows --> limit
> response
> >  > time
> >  > > - If Your are using JDBC, this is not a good option by performance
> >  > > - Protect the HBase Coprocesor
> >  > > - Perhaps you need a new Dim, to precalculate This Aggregate or
> > filter by
> >  > > this new Dim
> >  > >
> >  > > For Extra-Large queries, you can also check:
> >  > >  -kylin.query.mem.budget= 3GB
> >  > >  -hbase.server.scanner.max.result.size = 100MB  (limit from HBase,
> > you
> >  > can
> >  > > disable with -1)
> >  > >
> >  > > Good Luck, Alb
> >  > >
> >  > > 2016-10-27 11:56 GMT+02:00 张磊 <12...@qq.com>:
> >  > >
> >  > > > Do you mean when i query, i should add where clause,
> >  > > > but in some case, the number of records > threshold, how can i do?
> >  > > > For example, order by all groups, the number of the  all groups >
> >  > > > threshold
> >  > > >
> >  > > >
> >  > > >
> >  > > >
> >  > > > ------------------ 原始邮件 ------------------
> >  > > > 发件人: "Alberto Ramón";<a....@gmail.com>;
> >  > > > 发送时间: 2016年10月27日(星期四) 下午5:47
> >  > > > 收件人: "dev"<de...@kylin.apache.org>;
> >  > > >
> >  > > > 主题: Re: Exceed scan threshold at 10000001
> >  > > >
> >  > > >
> >  > > >
> >  > > >  ERROR: Scan row count exceeded threshold
> >  > > >
> >  > > > MailList
> >  > > > <http://mail-archives.apache.org/mod_mbox/kylin-user/
> >  > > > 201608.mbox/%3CCALjEW7M_YYi7Xs55OqPdxS6pzNvD0%
> >  > 2BamN2AX3hetnF0%3D9uFnow%
> >  > > > 40mail.gmail.com%3E>
> >  > > > Kilin
> >  > > > 1787 <https://issues.apache.org/jira/browse/KYLIN-1787>v1.5.3
> >  > > >
> >  > > > *Scan row count exceeded threshold: 1000000, please add filter
> >  > condition
> >  > > to
> >  > > > narrow down backend scan range, like where clause*
> >  > > >
> >  > > >
> >  > > > BR, Alb
> >  > > >
> >  > > > 2016-10-27 11:40 GMT+02:00 张磊 <12...@qq.com>:
> >  > > >
> >  > > > > Hi
> >  > > > >
> >  > > > >
> >  > > > > When i query a sql, I do not know why should scan hbase? How
> can i
> >  > do?
> >  > > > > Thanks!
> >  > > > >
> >  > > > >
> >  > > > > Table: lineorder  12,000,000 row records
> >  > > > > Dimensions: LO_CUSTKEY,LO_PARTKEY
> >  > > > > Measures: count(1), sum(LO_REVENUE)
> >  > > > >
> >  > > > >
> >  > > > > Query SQL: select count(1),sum(LO_REVENUE) from lineorder group
> by
> >  > > > > LO_CUSTKEY,LO_PARTKEY order by LO_CUSTKEY,LO_PARTKEY limit 50
> >  > > > >
> >  > > > >
> >  > > > > I build a cude with two Dimensions and two Measures(count and
> > sum),
> >  > the
> >  > > > > size of the Htable is 98 MB, when i execute a query in insight,
> it
> >  > > shows
> >  > > > > Error in coprocessor; and i check the hbase log, i find blow
> > messages
> >  > > > >
> >  > > > >
> >  > > > > 2016-10-27 02:06:13,470 INFO  [B.defaultRpcServer.handler=4,
> >  > > > queue=1,port=16020]
> >  > > > > gridtable.GTScanRequest: pre aggregation is not beneficial, skip
> > it
> >  > > > > 2016-10-27 02:06:13,470 INFO  [B.defaultRpcServer.handler=4,
> >  > > > queue=1,port=16020]
> >  > > > > endpoint.CubeVisitService: Scanned 1 rows from HBase.
> >  > > > >
> >  > > > >
> >  > > > > 2016-10-27 02:24:20,884 INFO  [B.defaultRpcServer.handler=6,
> >  > > > queue=0,port=16020]
> >  > > > > endpoint.CubeVisitService: Scanned 9999001 rows from HBase.
> >  > > > > 2016-10-27 02:24:20,889 INFO  [B.defaultRpcServer.handler=6,
> >  > > > queue=0,port=16020]
> >  > > > > endpoint.CubeVisitService: The cube visit did not finish
> normally
> >  > > because
> >  > > > > scan num exceeds threshold
> >  > > > > org.apache.kylin.gridtable.GTScanExceedThresholdException:
> Exceed
> >  > scan
> >  > > > > threshold at 10000001
> >  > > > >         at org.apache.kylin.storage.hbase.cube.v2.coprocessor.
> >  > > > > endpoint.CubeVisitService$2.hasNext(CubeVisitService.java:267)
> >  > > > >         at org.apache.kylin.storage.hbase.cube.v2.
> >  > > > HBaseReadonlyStore$1$1.
> >  > > > > hasNext(HBaseReadonlyStore.java:111)
> >  > > > >         at org.apache.kylin.storage.hbase.cube.v2.coprocessor.
> >  > > > > endpoint.CubeVisitService.visitCube(CubeVisitService.java:299)
> >  > > > >         at org.apache.kylin.storage.hbase.cube.v2.coprocessor.
> >  > > > > endpoint.generated.CubeVisitProtos$CubeVisitService.callMethod(
> >  > > > > CubeVisitProtos.java:3952)
> >  > > > >         at org.apache.hadoop.hbase.regionserver.HRegion.
> >  > > > > execService(HRegion.java:7815)
> >  > > > >         at org.apache.hadoop.hbase.regionserver.RSRpcServices.
> >  > > > > execServiceOnRegion(RSRpcServices.java:1986)
> >  > > > >         at org.apache.hadoop.hbase.regionserver.RSRpcServices.
> >  > > > > execService(RSRpcServices.java:1968)
> >  > > > >         at org.apache.hadoop.hbase.protobuf.generated.
> >  > > > > ClientProtos$ClientService$2.callBlockingMethod(
> >  > > ClientProtos.java:33652)
> >  > > > >         at org.apache.hadoop.hbase.ipc.
> >  > RpcServer.call(RpcServer.java:
> >  > > > 2178)
> >  > > > >         at org.apache.hadoop.hbase.ipc.
> CallRunner.run(CallRunner.
> >  > > > java:112)
> >  > > > >         at org.apache.hadoop.hbase.ipc.
> RpcExecutor.consumerLoop(
> >  > > > > RpcExecutor.java:133)
> >  > > > >         at org.apache.hadoop.hbase.ipc.
> >  > RpcExecutor$1.run(RpcExecutor.
> >  > > > > java:108)
> >  > > > >         at java.lang.Thread.run(Thread.java:745)
> >  > > >
> >  > >
> >  >
> >  >
> >  >
> >  > --
> >  > Best regards,
> >  >
> >  > Shaofeng Shi 史少锋
> >  >
> >
>



-- 
Best regards,

Shaofeng Shi 史少锋

Re: Exceed scan threshold at 10000001

Posted by Alberto Ramón <a....@gmail.com>.

Sorry, The pictures can't arrive you OK

See this:
http://www.slideshare.net/HBaseCon/apache-kylins-performance-boost-from-apache-hbase#9

2016-11-01 7:52 GMT+01:00 张磊 <12...@qq.com>:

> You say Kylin is "smart" when compose Hbase Row Key, there is something i
> can not see. Could you send again How Hbase Row Key compose?
>
>
>
>
> ------------------ 原始邮件 ------------------
> 发件人: "a.ramonportoles";<a....@gmail.com>;
> 发送时间: 2016年10月28日(星期五) 晚上6:51
> 收件人: "dev"<de...@kylin.apache.org>;
>
> 主题: Re: Exceed scan threshold at 10000001
>
>
>
> Q1: If i query select count(1) from table group by letter,number limit
> 2,it  should scan the first two rows(letter number Agg groups)?
>
>
> A1: Kylin build Hbase Key with Dimensions:
>
>
> Kylin is "smart" when compose Hbase Row Key:
> Is not the same Group by / filter by Dim1 that Dim3   :)
>
> Dim1: Range scan--> you read that you need --> fast
>
> Dim3: full scan --> you read more rows that you need --> slow
>
>
> how to solve it?  (I think:) you can build several cubes / uses different
> aggregation groups  on same project
>
>
>
>
>
> Q2: when i query select count(1) from table group by letter limit 2,it
> should scan the two rows(letter Agg group)
>
>
> A2: Yes,, if you define count(1) as measure and letter as Dim, you will
> have a pre-calculated results
>
>
>
> Also: check the cardinaliy of your data, Isn't normal:
>
> limit 10000  --> scan 1000 rows
>
> limit 10001  ---> scan millions of rows
>
> If this is true your data isn't balanced, I don't know any solution for
> this
>
>
> Alb
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
> 2016-10-28 10:01 GMT+02:00 张磊 <12...@qq.com>:
> kylin do put pre-calculate results in hbase, if cude desc is below:
>  Dimensions:letter number
>  Measures:count
>  in hbase the result is
>  count letter number
>  1         A        1
>  1         A        2
>  1         B        1
>  1         B        2
>  1         B        3
>  1         B        4
>  count  letter
>  2          A
>  4          B
>  If i query select count(1) from table group by letter,number limit 2,it
> should scan the first two rows(letter number Agg groups)?
>  when i query select count(1) from table group by letter limit 2,it should
> scan the two rows(letter Agg group)
>  Do i say right?
>
>
>  ------------------ 原始邮件 ------------------
>  发件人: "a.ramonportoles";<a....@gmail.com>;
>  发送时间: 2016年10月28日(星期五) 下午3:43
>  收件人: "dev"<de...@kylin.apache.org>;
>
>  主题: Re: Exceed scan threshold at 10000001
>
>
>
>  hummmm
>  but are you using "group by LO_CUSTKEY,LO_PARTKEY"
>
>  And limit apply to final result no to scan rows
>
>  Example:
>  table with two columns Letter / Number
>  A:1
>  A:2
>  B:1
>  B:2
>  B:3
>  B:4
>
>  select count (1), Letter from TB group by Letter limit 1
>     Result: 2:A
>     Scans 2 rows
>
>  select count (1), Letter from TB group by Letter limit 2
>     Result: 2:A
>                 4:B
>     Scans 2 +4 rows
>
>
>  Alb
>
>
>
>  2016-10-28 8:33 GMT+02:00 张磊 <12...@qq.com>:
>
>  > Query1:select count(1),sum(LO_REVENUE) from lineorder group by
>  > LO_CUSTKEY,LO_PARTKEY
>  > LIMIT 10000
>  >
>  >
>  > I find it scan 10000 rows from HBase
>  >
>  >
>  > Query2: select count(1),sum(LO_REVENUE) from lineorder group by
>  > LO_CUSTKEY,LO_PARTKEY
>  > LIMIT 10001
>  >
>  >
>  > I find it scan 10000001 rows from Hbase
>  >
>  >
>  > I do not know why?  Should not  scan 10001 row?
>  >
>  >
>  > The two query i scan the same HTable KYLIN_78ROC49NQY
>  > Kylin log:Endpoint RPC returned from HTable KYLIN_78ROC49NQY
>  >
>  >
>  >
>  >
>  > ------------------ 原始邮件 ------------------
>  > 发件人: "ShaoFeng Shi";<sh...@apache.org>;
>  > 发送时间: 2016年10月28日(星期五) 中午11:20
>  > 收件人: "dev"<de...@kylin.apache.org>;
>  >
>  > 主题: Re: Exceed scan threshold at 10000001
>  >
>  >
>  >
>  > Alberto, thanks for your explaination, you got the points and is
> already an
>  > Kylin expert I believe.
>  >
>  > In order to protect HBase and Kylin from crashing by bad queries (which
>  > scan too many rows), Kylin add this mechnisam to interrupt when reach
> some
>  > threshold. Usually in an OLAP scenario, the result wouldn't be too
> large.
>  > This is also a reminder for user to rethink the design; If you really
> want
>  > to get the threshold be enlarged, you can allocate more memory to Kylin
> and
>  > set "kylin.query.mem.budget" to bigger value.
>  >
>  > 2016-10-27 18:39 GMT+08:00 Alberto Ramón <a....@gmail.com>:
>  >
>  > > NOTE: I'm not a expert on Kylin  ;)
>  > >
>  > > Where is mandatory? No
>  > > Where is recommended? yes
>  > > Where bypass the threshold? No, I think this limit is hardcoded ¿?
>  > >
>  > > The real question must be: why this limit exists ?: (opinion)
>  > > - The target of Kylin is Real / Near RT, limit rows --> limit response
>  > time
>  > > - If Your are using JDBC, this is not a good option by performance
>  > > - Protect the HBase Coprocesor
>  > > - Perhaps you need a new Dim, to precalculate This Aggregate or
> filter by
>  > > this new Dim
>  > >
>  > > For Extra-Large queries, you can also check:
>  > >  -kylin.query.mem.budget= 3GB
>  > >  -hbase.server.scanner.max.result.size = 100MB  (limit from HBase,
> you
>  > can
>  > > disable with -1)
>  > >
>  > > Good Luck, Alb
>  > >
>  > > 2016-10-27 11:56 GMT+02:00 张磊 <12...@qq.com>:
>  > >
>  > > > Do you mean when i query, i should add where clause,
>  > > > but in some case, the number of records > threshold, how can i do?
>  > > > For example, order by all groups, the number of the  all groups >
>  > > > threshold
>  > > >
>  > > >
>  > > >
>  > > >
>  > > > ------------------ 原始邮件 ------------------
>  > > > 发件人: "Alberto Ramón";<a....@gmail.com>;
>  > > > 发送时间: 2016年10月27日(星期四) 下午5:47
>  > > > 收件人: "dev"<de...@kylin.apache.org>;
>  > > >
>  > > > 主题: Re: Exceed scan threshold at 10000001
>  > > >
>  > > >
>  > > >
>  > > >  ERROR: Scan row count exceeded threshold
>  > > >
>  > > > MailList
>  > > > <http://mail-archives.apache.org/mod_mbox/kylin-user/
>  > > > 201608.mbox/%3CCALjEW7M_YYi7Xs55OqPdxS6pzNvD0%
>  > 2BamN2AX3hetnF0%3D9uFnow%
>  > > > 40mail.gmail.com%3E>
>  > > > Kilin
>  > > > 1787 <https://issues.apache.org/jira/browse/KYLIN-1787>v1.5.3
>  > > >
>  > > > *Scan row count exceeded threshold: 1000000, please add filter
>  > condition
>  > > to
>  > > > narrow down backend scan range, like where clause*
>  > > >
>  > > >
>  > > > BR, Alb
>  > > >
>  > > > 2016-10-27 11:40 GMT+02:00 张磊 <12...@qq.com>:
>  > > >
>  > > > > Hi
>  > > > >
>  > > > >
>  > > > > When i query a sql, I do not know why should scan hbase? How can i
>  > do?
>  > > > > Thanks!
>  > > > >
>  > > > >
>  > > > > Table: lineorder  12,000,000 row records
>  > > > > Dimensions: LO_CUSTKEY,LO_PARTKEY
>  > > > > Measures: count(1), sum(LO_REVENUE)
>  > > > >
>  > > > >
>  > > > > Query SQL: select count(1),sum(LO_REVENUE) from lineorder group by
>  > > > > LO_CUSTKEY,LO_PARTKEY order by LO_CUSTKEY,LO_PARTKEY limit 50
>  > > > >
>  > > > >
>  > > > > I build a cude with two Dimensions and two Measures(count and
> sum),
>  > the
>  > > > > size of the Htable is 98 MB, when i execute a query in insight, it
>  > > shows
>  > > > > Error in coprocessor; and i check the hbase log, i find blow
> messages
>  > > > >
>  > > > >
>  > > > > 2016-10-27 02:06:13,470 INFO  [B.defaultRpcServer.handler=4,
>  > > > queue=1,port=16020]
>  > > > > gridtable.GTScanRequest: pre aggregation is not beneficial, skip
> it
>  > > > > 2016-10-27 02:06:13,470 INFO  [B.defaultRpcServer.handler=4,
>  > > > queue=1,port=16020]
>  > > > > endpoint.CubeVisitService: Scanned 1 rows from HBase.
>  > > > >
>  > > > >
>  > > > > 2016-10-27 02:24:20,884 INFO  [B.defaultRpcServer.handler=6,
>  > > > queue=0,port=16020]
>  > > > > endpoint.CubeVisitService: Scanned 9999001 rows from HBase.
>  > > > > 2016-10-27 02:24:20,889 INFO  [B.defaultRpcServer.handler=6,
>  > > > queue=0,port=16020]
>  > > > > endpoint.CubeVisitService: The cube visit did not finish normally
>  > > because
>  > > > > scan num exceeds threshold
>  > > > > org.apache.kylin.gridtable.GTScanExceedThresholdException: Exceed
>  > scan
>  > > > > threshold at 10000001
>  > > > >         at org.apache.kylin.storage.hbase.cube.v2.coprocessor.
>  > > > > endpoint.CubeVisitService$2.hasNext(CubeVisitService.java:267)
>  > > > >         at org.apache.kylin.storage.hbase.cube.v2.
>  > > > HBaseReadonlyStore$1$1.
>  > > > > hasNext(HBaseReadonlyStore.java:111)
>  > > > >         at org.apache.kylin.storage.hbase.cube.v2.coprocessor.
>  > > > > endpoint.CubeVisitService.visitCube(CubeVisitService.java:299)
>  > > > >         at org.apache.kylin.storage.hbase.cube.v2.coprocessor.
>  > > > > endpoint.generated.CubeVisitProtos$CubeVisitService.callMethod(
>  > > > > CubeVisitProtos.java:3952)
>  > > > >         at org.apache.hadoop.hbase.regionserver.HRegion.
>  > > > > execService(HRegion.java:7815)
>  > > > >         at org.apache.hadoop.hbase.regionserver.RSRpcServices.
>  > > > > execServiceOnRegion(RSRpcServices.java:1986)
>  > > > >         at org.apache.hadoop.hbase.regionserver.RSRpcServices.
>  > > > > execService(RSRpcServices.java:1968)
>  > > > >         at org.apache.hadoop.hbase.protobuf.generated.
>  > > > > ClientProtos$ClientService$2.callBlockingMethod(
>  > > ClientProtos.java:33652)
>  > > > >         at org.apache.hadoop.hbase.ipc.
>  > RpcServer.call(RpcServer.java:
>  > > > 2178)
>  > > > >         at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.
>  > > > java:112)
>  > > > >         at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(
>  > > > > RpcExecutor.java:133)
>  > > > >         at org.apache.hadoop.hbase.ipc.
>  > RpcExecutor$1.run(RpcExecutor.
>  > > > > java:108)
>  > > > >         at java.lang.Thread.run(Thread.java:745)
>  > > >
>  > >
>  >
>  >
>  >
>  > --
>  > Best regards,
>  >
>  > Shaofeng Shi 史少锋
>  >
>

回复： Exceed scan threshold at 10000001

Posted by 张磊 <12...@qq.com>.

You say Kylin is "smart" when compose Hbase Row Key, there is something i can not see. Could you send again How Hbase Row Key compose?




------------------ 原始邮件 ------------------
发件人: "a.ramonportoles";<a....@gmail.com>;
发送时间: 2016年10月28日(星期五) 晚上6:51
收件人: "dev"<de...@kylin.apache.org>; 

主题: Re: Exceed scan threshold at 10000001



Q1: If i query select count(1) from table group by letter,number limit 2,it  should scan the first two rows(letter number Agg groups)?


A1: Kylin build Hbase Key with Dimensions:


Kylin is "smart" when compose Hbase Row Key:
Is not the same Group by / filter by Dim1 that Dim3   :)

Dim1: Range scan--> you read that you need --> fast

Dim3: full scan --> you read more rows that you need --> slow


how to solve it?  (I think:) you can build several cubes / uses different aggregation groups  on same project
 




Q2: when i query select count(1) from table group by letter limit 2,it should scan the two rows(letter Agg group)


A2: Yes,, if you define count(1) as measure and letter as Dim, you will have a pre-calculated results



Also: check the cardinaliy of your data, Isn't normal:

limit 10000  --> scan 1000 rows

limit 10001  ---> scan millions of rows

If this is true your data isn't balanced, I don't know any solution for this


Alb















2016-10-28 10:01 GMT+02:00 张磊 <12...@qq.com>:
kylin do put pre-calculate results in hbase, if cude desc is below:
 Dimensions:letter number
 Measures:count
 in hbase the result is
 count letter number
 1         A        1
 1         A        2
 1         B        1
 1         B        2
 1         B        3
 1         B        4
 count  letter
 2          A
 4          B
 If i query select count(1) from table group by letter,number limit 2,it should scan the first two rows(letter number Agg groups)?
 when i query select count(1) from table group by letter limit 2,it should scan the two rows(letter Agg group)
 Do i say right?
 
 
 ------------------ 原始邮件 ------------------
 发件人: "a.ramonportoles";<a....@gmail.com>;
 发送时间: 2016年10月28日(星期五) 下午3:43
 收件人: "dev"<de...@kylin.apache.org>;
 
 主题: Re: Exceed scan threshold at 10000001
 
 
 
 hummmm
 but are you using "group by LO_CUSTKEY,LO_PARTKEY"
 
 And limit apply to final result no to scan rows
 
 Example:
 table with two columns Letter / Number
 A:1
 A:2
 B:1
 B:2
 B:3
 B:4
 
 select count (1), Letter from TB group by Letter limit 1
    Result: 2:A
    Scans 2 rows
 
 select count (1), Letter from TB group by Letter limit 2
    Result: 2:A
                4:B
    Scans 2 +4 rows
 
 
 Alb
 
 
 
 2016-10-28 8:33 GMT+02:00 张磊 <12...@qq.com>:
 
 > Query1:select count(1),sum(LO_REVENUE) from lineorder group by
 > LO_CUSTKEY,LO_PARTKEY
 > LIMIT 10000
 >
 >
 > I find it scan 10000 rows from HBase
 >
 >
 > Query2: select count(1),sum(LO_REVENUE) from lineorder group by
 > LO_CUSTKEY,LO_PARTKEY
 > LIMIT 10001
 >
 >
 > I find it scan 10000001 rows from Hbase
 >
 >
 > I do not know why?  Should not  scan 10001 row?
 >
 >
 > The two query i scan the same HTable KYLIN_78ROC49NQY
 > Kylin log:Endpoint RPC returned from HTable KYLIN_78ROC49NQY
 >
 >
 >
 >
 > ------------------ 原始邮件 ------------------
 > 发件人: "ShaoFeng Shi";<sh...@apache.org>;
 > 发送时间: 2016年10月28日(星期五) 中午11:20
 > 收件人: "dev"<de...@kylin.apache.org>;
 >
 > 主题: Re: Exceed scan threshold at 10000001
 >
 >
 >
 > Alberto, thanks for your explaination, you got the points and is already an
 > Kylin expert I believe.
 >
 > In order to protect HBase and Kylin from crashing by bad queries (which
 > scan too many rows), Kylin add this mechnisam to interrupt when reach some
 > threshold. Usually in an OLAP scenario, the result wouldn't be too large.
 > This is also a reminder for user to rethink the design; If you really want
 > to get the threshold be enlarged, you can allocate more memory to Kylin and
 > set "kylin.query.mem.budget" to bigger value.
 >
 > 2016-10-27 18:39 GMT+08:00 Alberto Ramón <a....@gmail.com>:
 >
 > > NOTE: I'm not a expert on Kylin  ;)
 > >
 > > Where is mandatory? No
 > > Where is recommended? yes
 > > Where bypass the threshold? No, I think this limit is hardcoded ¿?
 > >
 > > The real question must be: why this limit exists ?: (opinion)
 > > - The target of Kylin is Real / Near RT, limit rows --> limit response
 > time
 > > - If Your are using JDBC, this is not a good option by performance
 > > - Protect the HBase Coprocesor
 > > - Perhaps you need a new Dim, to precalculate This Aggregate or filter by
 > > this new Dim
 > >
 > > For Extra-Large queries, you can also check:
 > >  -kylin.query.mem.budget= 3GB
 > >  -hbase.server.scanner.max.result.size = 100MB  (limit from HBase, you
 > can
 > > disable with -1)
 > >
 > > Good Luck, Alb
 > >
 > > 2016-10-27 11:56 GMT+02:00 张磊 <12...@qq.com>:
 > >
 > > > Do you mean when i query, i should add where clause,
 > > > but in some case, the number of records > threshold, how can i do?
 > > > For example, order by all groups, the number of the  all groups >
 > > > threshold
 > > >
 > > >
 > > >
 > > >
 > > > ------------------ 原始邮件 ------------------
 > > > 发件人: "Alberto Ramón";<a....@gmail.com>;
 > > > 发送时间: 2016年10月27日(星期四) 下午5:47
 > > > 收件人: "dev"<de...@kylin.apache.org>;
 > > >
 > > > 主题: Re: Exceed scan threshold at 10000001
 > > >
 > > >
 > > >
 > > >  ERROR: Scan row count exceeded threshold
 > > >
 > > > MailList
 > > > <http://mail-archives.apache.org/mod_mbox/kylin-user/
 > > > 201608.mbox/%3CCALjEW7M_YYi7Xs55OqPdxS6pzNvD0%
 > 2BamN2AX3hetnF0%3D9uFnow%
 > > > 40mail.gmail.com%3E>
 > > > Kilin
 > > > 1787 <https://issues.apache.org/jira/browse/KYLIN-1787>v1.5.3
 > > >
 > > > *Scan row count exceeded threshold: 1000000, please add filter
 > condition
 > > to
 > > > narrow down backend scan range, like where clause*
 > > >
 > > >
 > > > BR, Alb
 > > >
 > > > 2016-10-27 11:40 GMT+02:00 张磊 <12...@qq.com>:
 > > >
 > > > > Hi
 > > > >
 > > > >
 > > > > When i query a sql, I do not know why should scan hbase? How can i
 > do?
 > > > > Thanks!
 > > > >
 > > > >
 > > > > Table: lineorder  12,000,000 row records
 > > > > Dimensions: LO_CUSTKEY,LO_PARTKEY
 > > > > Measures: count(1), sum(LO_REVENUE)
 > > > >
 > > > >
 > > > > Query SQL: select count(1),sum(LO_REVENUE) from lineorder group by
 > > > > LO_CUSTKEY,LO_PARTKEY order by LO_CUSTKEY,LO_PARTKEY limit 50
 > > > >
 > > > >
 > > > > I build a cude with two Dimensions and two Measures(count and sum),
 > the
 > > > > size of the Htable is 98 MB, when i execute a query in insight, it
 > > shows
 > > > > Error in coprocessor; and i check the hbase log, i find blow messages
 > > > >
 > > > >
 > > > > 2016-10-27 02:06:13,470 INFO  [B.defaultRpcServer.handler=4,
 > > > queue=1,port=16020]
 > > > > gridtable.GTScanRequest: pre aggregation is not beneficial, skip it
 > > > > 2016-10-27 02:06:13,470 INFO  [B.defaultRpcServer.handler=4,
 > > > queue=1,port=16020]
 > > > > endpoint.CubeVisitService: Scanned 1 rows from HBase.
 > > > >
 > > > >
 > > > > 2016-10-27 02:24:20,884 INFO  [B.defaultRpcServer.handler=6,
 > > > queue=0,port=16020]
 > > > > endpoint.CubeVisitService: Scanned 9999001 rows from HBase.
 > > > > 2016-10-27 02:24:20,889 INFO  [B.defaultRpcServer.handler=6,
 > > > queue=0,port=16020]
 > > > > endpoint.CubeVisitService: The cube visit did not finish normally
 > > because
 > > > > scan num exceeds threshold
 > > > > org.apache.kylin.gridtable.GTScanExceedThresholdException: Exceed
 > scan
 > > > > threshold at 10000001
 > > > >         at org.apache.kylin.storage.hbase.cube.v2.coprocessor.
 > > > > endpoint.CubeVisitService$2.hasNext(CubeVisitService.java:267)
 > > > >         at org.apache.kylin.storage.hbase.cube.v2.
 > > > HBaseReadonlyStore$1$1.
 > > > > hasNext(HBaseReadonlyStore.java:111)
 > > > >         at org.apache.kylin.storage.hbase.cube.v2.coprocessor.
 > > > > endpoint.CubeVisitService.visitCube(CubeVisitService.java:299)
 > > > >         at org.apache.kylin.storage.hbase.cube.v2.coprocessor.
 > > > > endpoint.generated.CubeVisitProtos$CubeVisitService.callMethod(
 > > > > CubeVisitProtos.java:3952)
 > > > >         at org.apache.hadoop.hbase.regionserver.HRegion.
 > > > > execService(HRegion.java:7815)
 > > > >         at org.apache.hadoop.hbase.regionserver.RSRpcServices.
 > > > > execServiceOnRegion(RSRpcServices.java:1986)
 > > > >         at org.apache.hadoop.hbase.regionserver.RSRpcServices.
 > > > > execService(RSRpcServices.java:1968)
 > > > >         at org.apache.hadoop.hbase.protobuf.generated.
 > > > > ClientProtos$ClientService$2.callBlockingMethod(
 > > ClientProtos.java:33652)
 > > > >         at org.apache.hadoop.hbase.ipc.
 > RpcServer.call(RpcServer.java:
 > > > 2178)
 > > > >         at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.
 > > > java:112)
 > > > >         at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(
 > > > > RpcExecutor.java:133)
 > > > >         at org.apache.hadoop.hbase.ipc.
 > RpcExecutor$1.run(RpcExecutor.
 > > > > java:108)
 > > > >         at java.lang.Thread.run(Thread.java:745)
 > > >
 > >
 >
 >
 >
 > --
 > Best regards,
 >
 > Shaofeng Shi 史少锋
 >

Re: Exceed scan threshold at 10000001

Posted by Alberto Ramón <a....@gmail.com>.

*Q1*: If i query select count(1) from table group by letter,number limit
2,it should scan the first two rows(letter number Agg groups)?

*A1*: Kylin build Hbase Key with Dimensions:

Kylin is "smart" when compose Hbase Row Key:[image: Imágenes integradas 1]
Is not the same Group by / filter by Dim1 that Dim3   :)
Dim1: Range scan--> you read that you need --> fast
Dim3: full scan --> you read more rows that you need --> slow

how to solve it?  (I think:) you can build several cubes / uses different
aggregation groups on same project



*Q2:* when i query select count(1) from table group by letter limit 2,it
should scan the two rows(letter Agg group)

*A2*: Yes,, if you define count(1) as measure and letter as Dim, you will
have a pre-calculated results


*Also*: check the cardinaliy of your data, Isn't normal:
limit 10000  --> scan 1000 rows
limit 10001  ---> scan millions of rows
If this is true your data isn't balanced, I don't know any solution for this

Alb




2016-10-28 10:01 GMT+02:00 张磊 <12...@qq.com>:

> kylin do put pre-calculate results in hbase, if cude desc is below:
> Dimensions:letter number
> Measures:count
> in hbase the result is
> count letter number
> 1         A        1
> 1         A        2
> 1         B        1
> 1         B        2
> 1         B        3
> 1         B        4
> count  letter
> 2          A
> 4          B
> If i query select count(1) from table group by letter,number limit 2,it
> should scan the first two rows(letter number Agg groups)?
> when i query select count(1) from table group by letter limit 2,it should
> scan the two rows(letter Agg group)
> Do i say right?
>
>
> ------------------ 原始邮件 ------------------
> 发件人: "a.ramonportoles";<a....@gmail.com>;
> 发送时间: 2016年10月28日(星期五) 下午3:43
> 收件人: "dev"<de...@kylin.apache.org>;
>
> 主题: Re: Exceed scan threshold at 10000001
>
>
>
> hummmm
> but are you using "group by LO_CUSTKEY,LO_PARTKEY"
>
> And limit apply to final result no to scan rows
>
> Example:
> table with two columns Letter / Number
> A:1
> A:2
> B:1
> B:2
> B:3
> B:4
>
> select count (1), Letter from TB group by Letter limit 1
>    Result: 2:A
>    Scans 2 rows
>
> select count (1), Letter from TB group by Letter limit 2
>    Result: 2:A
>                4:B
>    Scans 2 +4 rows
>
>
> Alb
>
>
>
> 2016-10-28 8:33 GMT+02:00 张磊 <12...@qq.com>:
>
> > Query1:select count(1),sum(LO_REVENUE) from lineorder group by
> > LO_CUSTKEY,LO_PARTKEY
> > LIMIT 10000
> >
> >
> > I find it scan 10000 rows from HBase
> >
> >
> > Query2: select count(1),sum(LO_REVENUE) from lineorder group by
> > LO_CUSTKEY,LO_PARTKEY
> > LIMIT 10001
> >
> >
> > I find it scan 10000001 rows from Hbase
> >
> >
> > I do not know why?  Should not  scan 10001 row?
> >
> >
> > The two query i scan the same HTable KYLIN_78ROC49NQY
> > Kylin log:Endpoint RPC returned from HTable KYLIN_78ROC49NQY
> >
> >
> >
> >
> > ------------------ 原始邮件 ------------------
> > 发件人: "ShaoFeng Shi";<sh...@apache.org>;
> > 发送时间: 2016年10月28日(星期五) 中午11:20
> > 收件人: "dev"<de...@kylin.apache.org>;
> >
> > 主题: Re: Exceed scan threshold at 10000001
> >
> >
> >
> > Alberto, thanks for your explaination, you got the points and is already
> an
> > Kylin expert I believe.
> >
> > In order to protect HBase and Kylin from crashing by bad queries (which
> > scan too many rows), Kylin add this mechnisam to interrupt when reach
> some
> > threshold. Usually in an OLAP scenario, the result wouldn't be too large.
> > This is also a reminder for user to rethink the design; If you really
> want
> > to get the threshold be enlarged, you can allocate more memory to Kylin
> and
> > set "kylin.query.mem.budget" to bigger value.
> >
> > 2016-10-27 18:39 GMT+08:00 Alberto Ramón <a....@gmail.com>:
> >
> > > NOTE: I'm not a expert on Kylin  ;)
> > >
> > > Where is mandatory? No
> > > Where is recommended? yes
> > > Where bypass the threshold? No, I think this limit is hardcoded ¿?
> > >
> > > The real question must be: why this limit exists ?: (opinion)
> > > - The target of Kylin is Real / Near RT, limit rows --> limit response
> > time
> > > - If Your are using JDBC, this is not a good option by performance
> > > - Protect the HBase Coprocesor
> > > - Perhaps you need a new Dim, to precalculate This Aggregate or filter
> by
> > > this new Dim
> > >
> > > For Extra-Large queries, you can also check:
> > >  -kylin.query.mem.budget= 3GB
> > >  -hbase.server.scanner.max.result.size = 100MB  (limit from HBase, you
> > can
> > > disable with -1)
> > >
> > > Good Luck, Alb
> > >
> > > 2016-10-27 11:56 GMT+02:00 张磊 <12...@qq.com>:
> > >
> > > > Do you mean when i query, i should add where clause,
> > > > but in some case, the number of records > threshold, how can i do?
> > > > For example, order by all groups, the number of the  all groups >
> > > > threshold
> > > >
> > > >
> > > >
> > > >
> > > > ------------------ 原始邮件 ------------------
> > > > 发件人: "Alberto Ramón";<a....@gmail.com>;
> > > > 发送时间: 2016年10月27日(星期四) 下午5:47
> > > > 收件人: "dev"<de...@kylin.apache.org>;
> > > >
> > > > 主题: Re: Exceed scan threshold at 10000001
> > > >
> > > >
> > > >
> > > >  ERROR: Scan row count exceeded threshold
> > > >
> > > > MailList
> > > > <http://mail-archives.apache.org/mod_mbox/kylin-user/
> > > > 201608.mbox/%3CCALjEW7M_YYi7Xs55OqPdxS6pzNvD0%
> > 2BamN2AX3hetnF0%3D9uFnow%
> > > > 40mail.gmail.com%3E>
> > > > Kilin
> > > > 1787 <https://issues.apache.org/jira/browse/KYLIN-1787>v1.5.3
> > > >
> > > > *Scan row count exceeded threshold: 1000000, please add filter
> > condition
> > > to
> > > > narrow down backend scan range, like where clause*
> > > >
> > > >
> > > > BR, Alb
> > > >
> > > > 2016-10-27 11:40 GMT+02:00 张磊 <12...@qq.com>:
> > > >
> > > > > Hi
> > > > >
> > > > >
> > > > > When i query a sql, I do not know why should scan hbase? How can i
> > do?
> > > > > Thanks!
> > > > >
> > > > >
> > > > > Table: lineorder  12,000,000 row records
> > > > > Dimensions: LO_CUSTKEY,LO_PARTKEY
> > > > > Measures: count(1), sum(LO_REVENUE)
> > > > >
> > > > >
> > > > > Query SQL: select count(1),sum(LO_REVENUE) from lineorder group by
> > > > > LO_CUSTKEY,LO_PARTKEY order by LO_CUSTKEY,LO_PARTKEY limit 50
> > > > >
> > > > >
> > > > > I build a cude with two Dimensions and two Measures(count and sum),
> > the
> > > > > size of the Htable is 98 MB, when i execute a query in insight, it
> > > shows
> > > > > Error in coprocessor; and i check the hbase log, i find blow
> messages
> > > > >
> > > > >
> > > > > 2016-10-27 02:06:13,470 INFO  [B.defaultRpcServer.handler=4,
> > > > queue=1,port=16020]
> > > > > gridtable.GTScanRequest: pre aggregation is not beneficial, skip it
> > > > > 2016-10-27 02:06:13,470 INFO  [B.defaultRpcServer.handler=4,
> > > > queue=1,port=16020]
> > > > > endpoint.CubeVisitService: Scanned 1 rows from HBase.
> > > > >
> > > > >
> > > > > 2016-10-27 02:24:20,884 INFO  [B.defaultRpcServer.handler=6,
> > > > queue=0,port=16020]
> > > > > endpoint.CubeVisitService: Scanned 9999001 rows from HBase.
> > > > > 2016-10-27 02:24:20,889 INFO  [B.defaultRpcServer.handler=6,
> > > > queue=0,port=16020]
> > > > > endpoint.CubeVisitService: The cube visit did not finish normally
> > > because
> > > > > scan num exceeds threshold
> > > > > org.apache.kylin.gridtable.GTScanExceedThresholdException: Exceed
> > scan
> > > > > threshold at 10000001
> > > > >         at org.apache.kylin.storage.hbase.cube.v2.coprocessor.
> > > > > endpoint.CubeVisitService$2.hasNext(CubeVisitService.java:267)
> > > > >         at org.apache.kylin.storage.hbase.cube.v2.
> > > > HBaseReadonlyStore$1$1.
> > > > > hasNext(HBaseReadonlyStore.java:111)
> > > > >         at org.apache.kylin.storage.hbase.cube.v2.coprocessor.
> > > > > endpoint.CubeVisitService.visitCube(CubeVisitService.java:299)
> > > > >         at org.apache.kylin.storage.hbase.cube.v2.coprocessor.
> > > > > endpoint.generated.CubeVisitProtos$CubeVisitService.callMethod(
> > > > > CubeVisitProtos.java:3952)
> > > > >         at org.apache.hadoop.hbase.regionserver.HRegion.
> > > > > execService(HRegion.java:7815)
> > > > >         at org.apache.hadoop.hbase.regionserver.RSRpcServices.
> > > > > execServiceOnRegion(RSRpcServices.java:1986)
> > > > >         at org.apache.hadoop.hbase.regionserver.RSRpcServices.
> > > > > execService(RSRpcServices.java:1968)
> > > > >         at org.apache.hadoop.hbase.protobuf.generated.
> > > > > ClientProtos$ClientService$2.callBlockingMethod(
> > > ClientProtos.java:33652)
> > > > >         at org.apache.hadoop.hbase.ipc.
> > RpcServer.call(RpcServer.java:
> > > > 2178)
> > > > >         at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.
> > > > java:112)
> > > > >         at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(
> > > > > RpcExecutor.java:133)
> > > > >         at org.apache.hadoop.hbase.ipc.
> > RpcExecutor$1.run(RpcExecutor.
> > > > > java:108)
> > > > >         at java.lang.Thread.run(Thread.java:745)
> > > >
> > >
> >
> >
> >
> > --
> > Best regards,
> >
> > Shaofeng Shi 史少锋
> >
>

回复： Exceed scan threshold at 10000001

Posted by 张磊 <12...@qq.com>.

kylin do put pre-calculate results in hbase, if cude desc is below:
Dimensions:letter number
Measures:count
in hbase the result is 
count letter number
1         A        1
1         A        2
1         B        1
1         B        2
1         B        3
1         B        4
count  letter
2          A
4          B
If i query select count(1) from table group by letter,number limit 2,it should scan the first two rows(letter number Agg groups)?
when i query select count(1) from table group by letter limit 2,it should scan the two rows(letter Agg group) 
Do i say right?


------------------ 原始邮件 ------------------
发件人: "a.ramonportoles";<a....@gmail.com>;
发送时间: 2016年10月28日(星期五) 下午3:43
收件人: "dev"<de...@kylin.apache.org>; 

主题: Re: Exceed scan threshold at 10000001



hummmm
but are you using "group by LO_CUSTKEY,LO_PARTKEY"

And limit apply to final result no to scan rows

Example:
table with two columns Letter / Number
A:1
A:2
B:1
B:2
B:3
B:4

select count (1), Letter from TB group by Letter limit 1
   Result: 2:A
   Scans 2 rows

select count (1), Letter from TB group by Letter limit 2
   Result: 2:A
               4:B
   Scans 2 +4 rows


Alb



2016-10-28 8:33 GMT+02:00 张磊 <12...@qq.com>:

> Query1:select count(1),sum(LO_REVENUE) from lineorder group by
> LO_CUSTKEY,LO_PARTKEY
> LIMIT 10000
>
>
> I find it scan 10000 rows from HBase
>
>
> Query2: select count(1),sum(LO_REVENUE) from lineorder group by
> LO_CUSTKEY,LO_PARTKEY
> LIMIT 10001
>
>
> I find it scan 10000001 rows from Hbase
>
>
> I do not know why?  Should not  scan 10001 row?
>
>
> The two query i scan the same HTable KYLIN_78ROC49NQY
> Kylin log:Endpoint RPC returned from HTable KYLIN_78ROC49NQY
>
>
>
>
> ------------------ 原始邮件 ------------------
> 发件人: "ShaoFeng Shi";<sh...@apache.org>;
> 发送时间: 2016年10月28日(星期五) 中午11:20
> 收件人: "dev"<de...@kylin.apache.org>;
>
> 主题: Re: Exceed scan threshold at 10000001
>
>
>
> Alberto, thanks for your explaination, you got the points and is already an
> Kylin expert I believe.
>
> In order to protect HBase and Kylin from crashing by bad queries (which
> scan too many rows), Kylin add this mechnisam to interrupt when reach some
> threshold. Usually in an OLAP scenario, the result wouldn't be too large.
> This is also a reminder for user to rethink the design; If you really want
> to get the threshold be enlarged, you can allocate more memory to Kylin and
> set "kylin.query.mem.budget" to bigger value.
>
> 2016-10-27 18:39 GMT+08:00 Alberto Ramón <a....@gmail.com>:
>
> > NOTE: I'm not a expert on Kylin  ;)
> >
> > Where is mandatory? No
> > Where is recommended? yes
> > Where bypass the threshold? No, I think this limit is hardcoded ¿?
> >
> > The real question must be: why this limit exists ?: (opinion)
> > - The target of Kylin is Real / Near RT, limit rows --> limit response
> time
> > - If Your are using JDBC, this is not a good option by performance
> > - Protect the HBase Coprocesor
> > - Perhaps you need a new Dim, to precalculate This Aggregate or filter by
> > this new Dim
> >
> > For Extra-Large queries, you can also check:
> >  -kylin.query.mem.budget= 3GB
> >  -hbase.server.scanner.max.result.size = 100MB  (limit from HBase, you
> can
> > disable with -1)
> >
> > Good Luck, Alb
> >
> > 2016-10-27 11:56 GMT+02:00 张磊 <12...@qq.com>:
> >
> > > Do you mean when i query, i should add where clause,
> > > but in some case, the number of records > threshold, how can i do?
> > > For example, order by all groups, the number of the  all groups >
> > > threshold
> > >
> > >
> > >
> > >
> > > ------------------ 原始邮件 ------------------
> > > 发件人: "Alberto Ramón";<a....@gmail.com>;
> > > 发送时间: 2016年10月27日(星期四) 下午5:47
> > > 收件人: "dev"<de...@kylin.apache.org>;
> > >
> > > 主题: Re: Exceed scan threshold at 10000001
> > >
> > >
> > >
> > >  ERROR: Scan row count exceeded threshold
> > >
> > > MailList
> > > <http://mail-archives.apache.org/mod_mbox/kylin-user/
> > > 201608.mbox/%3CCALjEW7M_YYi7Xs55OqPdxS6pzNvD0%
> 2BamN2AX3hetnF0%3D9uFnow%
> > > 40mail.gmail.com%3E>
> > > Kilin
> > > 1787 <https://issues.apache.org/jira/browse/KYLIN-1787>v1.5.3
> > >
> > > *Scan row count exceeded threshold: 1000000, please add filter
> condition
> > to
> > > narrow down backend scan range, like where clause*
> > >
> > >
> > > BR, Alb
> > >
> > > 2016-10-27 11:40 GMT+02:00 张磊 <12...@qq.com>:
> > >
> > > > Hi
> > > >
> > > >
> > > > When i query a sql, I do not know why should scan hbase? How can i
> do?
> > > > Thanks!
> > > >
> > > >
> > > > Table: lineorder  12,000,000 row records
> > > > Dimensions: LO_CUSTKEY,LO_PARTKEY
> > > > Measures: count(1), sum(LO_REVENUE)
> > > >
> > > >
> > > > Query SQL: select count(1),sum(LO_REVENUE) from lineorder group by
> > > > LO_CUSTKEY,LO_PARTKEY order by LO_CUSTKEY,LO_PARTKEY limit 50
> > > >
> > > >
> > > > I build a cude with two Dimensions and two Measures(count and sum),
> the
> > > > size of the Htable is 98 MB, when i execute a query in insight, it
> > shows
> > > > Error in coprocessor; and i check the hbase log, i find blow messages
> > > >
> > > >
> > > > 2016-10-27 02:06:13,470 INFO  [B.defaultRpcServer.handler=4,
> > > queue=1,port=16020]
> > > > gridtable.GTScanRequest: pre aggregation is not beneficial, skip it
> > > > 2016-10-27 02:06:13,470 INFO  [B.defaultRpcServer.handler=4,
> > > queue=1,port=16020]
> > > > endpoint.CubeVisitService: Scanned 1 rows from HBase.
> > > >
> > > >
> > > > 2016-10-27 02:24:20,884 INFO  [B.defaultRpcServer.handler=6,
> > > queue=0,port=16020]
> > > > endpoint.CubeVisitService: Scanned 9999001 rows from HBase.
> > > > 2016-10-27 02:24:20,889 INFO  [B.defaultRpcServer.handler=6,
> > > queue=0,port=16020]
> > > > endpoint.CubeVisitService: The cube visit did not finish normally
> > because
> > > > scan num exceeds threshold
> > > > org.apache.kylin.gridtable.GTScanExceedThresholdException: Exceed
> scan
> > > > threshold at 10000001
> > > >         at org.apache.kylin.storage.hbase.cube.v2.coprocessor.
> > > > endpoint.CubeVisitService$2.hasNext(CubeVisitService.java:267)
> > > >         at org.apache.kylin.storage.hbase.cube.v2.
> > > HBaseReadonlyStore$1$1.
> > > > hasNext(HBaseReadonlyStore.java:111)
> > > >         at org.apache.kylin.storage.hbase.cube.v2.coprocessor.
> > > > endpoint.CubeVisitService.visitCube(CubeVisitService.java:299)
> > > >         at org.apache.kylin.storage.hbase.cube.v2.coprocessor.
> > > > endpoint.generated.CubeVisitProtos$CubeVisitService.callMethod(
> > > > CubeVisitProtos.java:3952)
> > > >         at org.apache.hadoop.hbase.regionserver.HRegion.
> > > > execService(HRegion.java:7815)
> > > >         at org.apache.hadoop.hbase.regionserver.RSRpcServices.
> > > > execServiceOnRegion(RSRpcServices.java:1986)
> > > >         at org.apache.hadoop.hbase.regionserver.RSRpcServices.
> > > > execService(RSRpcServices.java:1968)
> > > >         at org.apache.hadoop.hbase.protobuf.generated.
> > > > ClientProtos$ClientService$2.callBlockingMethod(
> > ClientProtos.java:33652)
> > > >         at org.apache.hadoop.hbase.ipc.
> RpcServer.call(RpcServer.java:
> > > 2178)
> > > >         at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.
> > > java:112)
> > > >         at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(
> > > > RpcExecutor.java:133)
> > > >         at org.apache.hadoop.hbase.ipc.
> RpcExecutor$1.run(RpcExecutor.
> > > > java:108)
> > > >         at java.lang.Thread.run(Thread.java:745)
> > >
> >
>
>
>
> --
> Best regards,
>
> Shaofeng Shi 史少锋
>

Re: Exceed scan threshold at 10000001

Posted by Alberto Ramón <a....@gmail.com>.

hummmm
but are you using "group by LO_CUSTKEY,LO_PARTKEY"

And limit apply to final result no to scan rows

Example:
table with two columns Letter / Number
A:1
A:2
B:1
B:2
B:3
B:4

select count (1), Letter from TB group by Letter limit 1
   Result: 2:A
   Scans 2 rows

select count (1), Letter from TB group by Letter limit 2
   Result: 2:A
               4:B
   Scans 2 +4 rows


Alb



2016-10-28 8:33 GMT+02:00 张磊 <12...@qq.com>:

> Query1:select count(1),sum(LO_REVENUE) from lineorder group by
> LO_CUSTKEY,LO_PARTKEY
> LIMIT 10000
>
>
> I find it scan 10000 rows from HBase
>
>
> Query2: select count(1),sum(LO_REVENUE) from lineorder group by
> LO_CUSTKEY,LO_PARTKEY
> LIMIT 10001
>
>
> I find it scan 10000001 rows from Hbase
>
>
> I do not know why?  Should not  scan 10001 row?
>
>
> The two query i scan the same HTable KYLIN_78ROC49NQY
> Kylin log:Endpoint RPC returned from HTable KYLIN_78ROC49NQY
>
>
>
>
> ------------------ 原始邮件 ------------------
> 发件人: "ShaoFeng Shi";<sh...@apache.org>;
> 发送时间: 2016年10月28日(星期五) 中午11:20
> 收件人: "dev"<de...@kylin.apache.org>;
>
> 主题: Re: Exceed scan threshold at 10000001
>
>
>
> Alberto, thanks for your explaination, you got the points and is already an
> Kylin expert I believe.
>
> In order to protect HBase and Kylin from crashing by bad queries (which
> scan too many rows), Kylin add this mechnisam to interrupt when reach some
> threshold. Usually in an OLAP scenario, the result wouldn't be too large.
> This is also a reminder for user to rethink the design; If you really want
> to get the threshold be enlarged, you can allocate more memory to Kylin and
> set "kylin.query.mem.budget" to bigger value.
>
> 2016-10-27 18:39 GMT+08:00 Alberto Ramón <a....@gmail.com>:
>
> > NOTE: I'm not a expert on Kylin  ;)
> >
> > Where is mandatory? No
> > Where is recommended? yes
> > Where bypass the threshold? No, I think this limit is hardcoded ¿?
> >
> > The real question must be: why this limit exists ?: (opinion)
> > - The target of Kylin is Real / Near RT, limit rows --> limit response
> time
> > - If Your are using JDBC, this is not a good option by performance
> > - Protect the HBase Coprocesor
> > - Perhaps you need a new Dim, to precalculate This Aggregate or filter by
> > this new Dim
> >
> > For Extra-Large queries, you can also check:
> >  -kylin.query.mem.budget= 3GB
> >  -hbase.server.scanner.max.result.size = 100MB  (limit from HBase, you
> can
> > disable with -1)
> >
> > Good Luck, Alb
> >
> > 2016-10-27 11:56 GMT+02:00 张磊 <12...@qq.com>:
> >
> > > Do you mean when i query, i should add where clause,
> > > but in some case, the number of records > threshold, how can i do?
> > > For example, order by all groups, the number of the  all groups >
> > > threshold
> > >
> > >
> > >
> > >
> > > ------------------ 原始邮件 ------------------
> > > 发件人: "Alberto Ramón";<a....@gmail.com>;
> > > 发送时间: 2016年10月27日(星期四) 下午5:47
> > > 收件人: "dev"<de...@kylin.apache.org>;
> > >
> > > 主题: Re: Exceed scan threshold at 10000001
> > >
> > >
> > >
> > >  ERROR: Scan row count exceeded threshold
> > >
> > > MailList
> > > <http://mail-archives.apache.org/mod_mbox/kylin-user/
> > > 201608.mbox/%3CCALjEW7M_YYi7Xs55OqPdxS6pzNvD0%
> 2BamN2AX3hetnF0%3D9uFnow%
> > > 40mail.gmail.com%3E>
> > > Kilin
> > > 1787 <https://issues.apache.org/jira/browse/KYLIN-1787>v1.5.3
> > >
> > > *Scan row count exceeded threshold: 1000000, please add filter
> condition
> > to
> > > narrow down backend scan range, like where clause*
> > >
> > >
> > > BR, Alb
> > >
> > > 2016-10-27 11:40 GMT+02:00 张磊 <12...@qq.com>:
> > >
> > > > Hi
> > > >
> > > >
> > > > When i query a sql, I do not know why should scan hbase? How can i
> do?
> > > > Thanks!
> > > >
> > > >
> > > > Table: lineorder  12,000,000 row records
> > > > Dimensions: LO_CUSTKEY,LO_PARTKEY
> > > > Measures: count(1), sum(LO_REVENUE)
> > > >
> > > >
> > > > Query SQL: select count(1),sum(LO_REVENUE) from lineorder group by
> > > > LO_CUSTKEY,LO_PARTKEY order by LO_CUSTKEY,LO_PARTKEY limit 50
> > > >
> > > >
> > > > I build a cude with two Dimensions and two Measures(count and sum),
> the
> > > > size of the Htable is 98 MB, when i execute a query in insight, it
> > shows
> > > > Error in coprocessor; and i check the hbase log, i find blow messages
> > > >
> > > >
> > > > 2016-10-27 02:06:13,470 INFO  [B.defaultRpcServer.handler=4,
> > > queue=1,port=16020]
> > > > gridtable.GTScanRequest: pre aggregation is not beneficial, skip it
> > > > 2016-10-27 02:06:13,470 INFO  [B.defaultRpcServer.handler=4,
> > > queue=1,port=16020]
> > > > endpoint.CubeVisitService: Scanned 1 rows from HBase.
> > > >
> > > >
> > > > 2016-10-27 02:24:20,884 INFO  [B.defaultRpcServer.handler=6,
> > > queue=0,port=16020]
> > > > endpoint.CubeVisitService: Scanned 9999001 rows from HBase.
> > > > 2016-10-27 02:24:20,889 INFO  [B.defaultRpcServer.handler=6,
> > > queue=0,port=16020]
> > > > endpoint.CubeVisitService: The cube visit did not finish normally
> > because
> > > > scan num exceeds threshold
> > > > org.apache.kylin.gridtable.GTScanExceedThresholdException: Exceed
> scan
> > > > threshold at 10000001
> > > >         at org.apache.kylin.storage.hbase.cube.v2.coprocessor.
> > > > endpoint.CubeVisitService$2.hasNext(CubeVisitService.java:267)
> > > >         at org.apache.kylin.storage.hbase.cube.v2.
> > > HBaseReadonlyStore$1$1.
> > > > hasNext(HBaseReadonlyStore.java:111)
> > > >         at org.apache.kylin.storage.hbase.cube.v2.coprocessor.
> > > > endpoint.CubeVisitService.visitCube(CubeVisitService.java:299)
> > > >         at org.apache.kylin.storage.hbase.cube.v2.coprocessor.
> > > > endpoint.generated.CubeVisitProtos$CubeVisitService.callMethod(
> > > > CubeVisitProtos.java:3952)
> > > >         at org.apache.hadoop.hbase.regionserver.HRegion.
> > > > execService(HRegion.java:7815)
> > > >         at org.apache.hadoop.hbase.regionserver.RSRpcServices.
> > > > execServiceOnRegion(RSRpcServices.java:1986)
> > > >         at org.apache.hadoop.hbase.regionserver.RSRpcServices.
> > > > execService(RSRpcServices.java:1968)
> > > >         at org.apache.hadoop.hbase.protobuf.generated.
> > > > ClientProtos$ClientService$2.callBlockingMethod(
> > ClientProtos.java:33652)
> > > >         at org.apache.hadoop.hbase.ipc.
> RpcServer.call(RpcServer.java:
> > > 2178)
> > > >         at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.
> > > java:112)
> > > >         at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(
> > > > RpcExecutor.java:133)
> > > >         at org.apache.hadoop.hbase.ipc.
> RpcExecutor$1.run(RpcExecutor.
> > > > java:108)
> > > >         at java.lang.Thread.run(Thread.java:745)
> > >
> >
>
>
>
> --
> Best regards,
>
> Shaofeng Shi 史少锋
>

回复： Exceed scan threshold at 10000001

Posted by 张磊 <12...@qq.com>.

Query1:select count(1),sum(LO_REVENUE) from lineorder group by LO_CUSTKEY,LO_PARTKEY
LIMIT 10000


I find it scan 10000 rows from HBase


Query2: select count(1),sum(LO_REVENUE) from lineorder group by LO_CUSTKEY,LO_PARTKEY
LIMIT 10001


I find it scan 10000001 rows from Hbase


I do not know why?  Should not  scan 10001 row?


The two query i scan the same HTable KYLIN_78ROC49NQY
Kylin log:Endpoint RPC returned from HTable KYLIN_78ROC49NQY




------------------ 原始邮件 ------------------
发件人: "ShaoFeng Shi";<sh...@apache.org>;
发送时间: 2016年10月28日(星期五) 中午11:20
收件人: "dev"<de...@kylin.apache.org>; 

主题: Re: Exceed scan threshold at 10000001



Alberto, thanks for your explaination, you got the points and is already an
Kylin expert I believe.

In order to protect HBase and Kylin from crashing by bad queries (which
scan too many rows), Kylin add this mechnisam to interrupt when reach some
threshold. Usually in an OLAP scenario, the result wouldn't be too large.
This is also a reminder for user to rethink the design; If you really want
to get the threshold be enlarged, you can allocate more memory to Kylin and
set "kylin.query.mem.budget" to bigger value.

2016-10-27 18:39 GMT+08:00 Alberto Ramón <a....@gmail.com>:

> NOTE: I'm not a expert on Kylin  ;)
>
> Where is mandatory? No
> Where is recommended? yes
> Where bypass the threshold? No, I think this limit is hardcoded ¿?
>
> The real question must be: why this limit exists ?: (opinion)
> - The target of Kylin is Real / Near RT, limit rows --> limit response time
> - If Your are using JDBC, this is not a good option by performance
> - Protect the HBase Coprocesor
> - Perhaps you need a new Dim, to precalculate This Aggregate or filter by
> this new Dim
>
> For Extra-Large queries, you can also check:
>  -kylin.query.mem.budget= 3GB
>  -hbase.server.scanner.max.result.size = 100MB  (limit from HBase, you can
> disable with -1)
>
> Good Luck, Alb
>
> 2016-10-27 11:56 GMT+02:00 张磊 <12...@qq.com>:
>
> > Do you mean when i query, i should add where clause,
> > but in some case, the number of records > threshold, how can i do?
> > For example, order by all groups, the number of the  all groups >
> > threshold
> >
> >
> >
> >
> > ------------------ 原始邮件 ------------------
> > 发件人: "Alberto Ramón";<a....@gmail.com>;
> > 发送时间: 2016年10月27日(星期四) 下午5:47
> > 收件人: "dev"<de...@kylin.apache.org>;
> >
> > 主题: Re: Exceed scan threshold at 10000001
> >
> >
> >
> >  ERROR: Scan row count exceeded threshold
> >
> > MailList
> > <http://mail-archives.apache.org/mod_mbox/kylin-user/
> > 201608.mbox/%3CCALjEW7M_YYi7Xs55OqPdxS6pzNvD0%2BamN2AX3hetnF0%3D9uFnow%
> > 40mail.gmail.com%3E>
> > Kilin
> > 1787 <https://issues.apache.org/jira/browse/KYLIN-1787>v1.5.3
> >
> > *Scan row count exceeded threshold: 1000000, please add filter condition
> to
> > narrow down backend scan range, like where clause*
> >
> >
> > BR, Alb
> >
> > 2016-10-27 11:40 GMT+02:00 张磊 <12...@qq.com>:
> >
> > > Hi
> > >
> > >
> > > When i query a sql, I do not know why should scan hbase? How can i do?
> > > Thanks!
> > >
> > >
> > > Table: lineorder  12,000,000 row records
> > > Dimensions: LO_CUSTKEY,LO_PARTKEY
> > > Measures: count(1), sum(LO_REVENUE)
> > >
> > >
> > > Query SQL: select count(1),sum(LO_REVENUE) from lineorder group by
> > > LO_CUSTKEY,LO_PARTKEY order by LO_CUSTKEY,LO_PARTKEY limit 50
> > >
> > >
> > > I build a cude with two Dimensions and two Measures(count and sum), the
> > > size of the Htable is 98 MB, when i execute a query in insight, it
> shows
> > > Error in coprocessor; and i check the hbase log, i find blow messages
> > >
> > >
> > > 2016-10-27 02:06:13,470 INFO  [B.defaultRpcServer.handler=4,
> > queue=1,port=16020]
> > > gridtable.GTScanRequest: pre aggregation is not beneficial, skip it
> > > 2016-10-27 02:06:13,470 INFO  [B.defaultRpcServer.handler=4,
> > queue=1,port=16020]
> > > endpoint.CubeVisitService: Scanned 1 rows from HBase.
> > >
> > >
> > > 2016-10-27 02:24:20,884 INFO  [B.defaultRpcServer.handler=6,
> > queue=0,port=16020]
> > > endpoint.CubeVisitService: Scanned 9999001 rows from HBase.
> > > 2016-10-27 02:24:20,889 INFO  [B.defaultRpcServer.handler=6,
> > queue=0,port=16020]
> > > endpoint.CubeVisitService: The cube visit did not finish normally
> because
> > > scan num exceeds threshold
> > > org.apache.kylin.gridtable.GTScanExceedThresholdException: Exceed scan
> > > threshold at 10000001
> > >         at org.apache.kylin.storage.hbase.cube.v2.coprocessor.
> > > endpoint.CubeVisitService$2.hasNext(CubeVisitService.java:267)
> > >         at org.apache.kylin.storage.hbase.cube.v2.
> > HBaseReadonlyStore$1$1.
> > > hasNext(HBaseReadonlyStore.java:111)
> > >         at org.apache.kylin.storage.hbase.cube.v2.coprocessor.
> > > endpoint.CubeVisitService.visitCube(CubeVisitService.java:299)
> > >         at org.apache.kylin.storage.hbase.cube.v2.coprocessor.
> > > endpoint.generated.CubeVisitProtos$CubeVisitService.callMethod(
> > > CubeVisitProtos.java:3952)
> > >         at org.apache.hadoop.hbase.regionserver.HRegion.
> > > execService(HRegion.java:7815)
> > >         at org.apache.hadoop.hbase.regionserver.RSRpcServices.
> > > execServiceOnRegion(RSRpcServices.java:1986)
> > >         at org.apache.hadoop.hbase.regionserver.RSRpcServices.
> > > execService(RSRpcServices.java:1968)
> > >         at org.apache.hadoop.hbase.protobuf.generated.
> > > ClientProtos$ClientService$2.callBlockingMethod(
> ClientProtos.java:33652)
> > >         at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:
> > 2178)
> > >         at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.
> > java:112)
> > >         at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(
> > > RpcExecutor.java:133)
> > >         at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.
> > > java:108)
> > >         at java.lang.Thread.run(Thread.java:745)
> >
>



-- 
Best regards,

Shaofeng Shi 史少锋

Re: Exceed scan threshold at 10000001

Posted by ShaoFeng Shi <sh...@apache.org>.

Alberto, thanks for your explaination, you got the points and is already an
Kylin expert I believe.

In order to protect HBase and Kylin from crashing by bad queries (which
scan too many rows), Kylin add this mechnisam to interrupt when reach some
threshold. Usually in an OLAP scenario, the result wouldn't be too large.
This is also a reminder for user to rethink the design; If you really want
to get the threshold be enlarged, you can allocate more memory to Kylin and
set "kylin.query.mem.budget" to bigger value.

2016-10-27 18:39 GMT+08:00 Alberto Ramón <a....@gmail.com>:

> NOTE: I'm not a expert on Kylin  ;)
>
> Where is mandatory? No
> Where is recommended? yes
> Where bypass the threshold? No, I think this limit is hardcoded ¿?
>
> The real question must be: why this limit exists ?: (opinion)
> - The target of Kylin is Real / Near RT, limit rows --> limit response time
> - If Your are using JDBC, this is not a good option by performance
> - Protect the HBase Coprocesor
> - Perhaps you need a new Dim, to precalculate This Aggregate or filter by
> this new Dim
>
> For Extra-Large queries, you can also check:
>  -kylin.query.mem.budget= 3GB
>  -hbase.server.scanner.max.result.size = 100MB  (limit from HBase, you can
> disable with -1)
>
> Good Luck, Alb
>
> 2016-10-27 11:56 GMT+02:00 张磊 <12...@qq.com>:
>
> > Do you mean when i query, i should add where clause,
> > but in some case, the number of records > threshold, how can i do?
> > For example, order by all groups, the number of the  all groups >
> > threshold
> >
> >
> >
> >
> > ------------------ 原始邮件 ------------------
> > 发件人: "Alberto Ramón";<a....@gmail.com>;
> > 发送时间: 2016年10月27日(星期四) 下午5:47
> > 收件人: "dev"<de...@kylin.apache.org>;
> >
> > 主题: Re: Exceed scan threshold at 10000001
> >
> >
> >
> >  ERROR: Scan row count exceeded threshold
> >
> > MailList
> > <http://mail-archives.apache.org/mod_mbox/kylin-user/
> > 201608.mbox/%3CCALjEW7M_YYi7Xs55OqPdxS6pzNvD0%2BamN2AX3hetnF0%3D9uFnow%
> > 40mail.gmail.com%3E>
> > Kilin
> > 1787 <https://issues.apache.org/jira/browse/KYLIN-1787>v1.5.3
> >
> > *Scan row count exceeded threshold: 1000000, please add filter condition
> to
> > narrow down backend scan range, like where clause*
> >
> >
> > BR, Alb
> >
> > 2016-10-27 11:40 GMT+02:00 张磊 <12...@qq.com>:
> >
> > > Hi
> > >
> > >
> > > When i query a sql, I do not know why should scan hbase? How can i do?
> > > Thanks!
> > >
> > >
> > > Table: lineorder  12,000,000 row records
> > > Dimensions: LO_CUSTKEY,LO_PARTKEY
> > > Measures: count(1), sum(LO_REVENUE)
> > >
> > >
> > > Query SQL: select count(1),sum(LO_REVENUE) from lineorder group by
> > > LO_CUSTKEY,LO_PARTKEY order by LO_CUSTKEY,LO_PARTKEY limit 50
> > >
> > >
> > > I build a cude with two Dimensions and two Measures(count and sum), the
> > > size of the Htable is 98 MB, when i execute a query in insight, it
> shows
> > > Error in coprocessor; and i check the hbase log, i find blow messages
> > >
> > >
> > > 2016-10-27 02:06:13,470 INFO  [B.defaultRpcServer.handler=4,
> > queue=1,port=16020]
> > > gridtable.GTScanRequest: pre aggregation is not beneficial, skip it
> > > 2016-10-27 02:06:13,470 INFO  [B.defaultRpcServer.handler=4,
> > queue=1,port=16020]
> > > endpoint.CubeVisitService: Scanned 1 rows from HBase.
> > >
> > >
> > > 2016-10-27 02:24:20,884 INFO  [B.defaultRpcServer.handler=6,
> > queue=0,port=16020]
> > > endpoint.CubeVisitService: Scanned 9999001 rows from HBase.
> > > 2016-10-27 02:24:20,889 INFO  [B.defaultRpcServer.handler=6,
> > queue=0,port=16020]
> > > endpoint.CubeVisitService: The cube visit did not finish normally
> because
> > > scan num exceeds threshold
> > > org.apache.kylin.gridtable.GTScanExceedThresholdException: Exceed scan
> > > threshold at 10000001
> > >         at org.apache.kylin.storage.hbase.cube.v2.coprocessor.
> > > endpoint.CubeVisitService$2.hasNext(CubeVisitService.java:267)
> > >         at org.apache.kylin.storage.hbase.cube.v2.
> > HBaseReadonlyStore$1$1.
> > > hasNext(HBaseReadonlyStore.java:111)
> > >         at org.apache.kylin.storage.hbase.cube.v2.coprocessor.
> > > endpoint.CubeVisitService.visitCube(CubeVisitService.java:299)
> > >         at org.apache.kylin.storage.hbase.cube.v2.coprocessor.
> > > endpoint.generated.CubeVisitProtos$CubeVisitService.callMethod(
> > > CubeVisitProtos.java:3952)
> > >         at org.apache.hadoop.hbase.regionserver.HRegion.
> > > execService(HRegion.java:7815)
> > >         at org.apache.hadoop.hbase.regionserver.RSRpcServices.
> > > execServiceOnRegion(RSRpcServices.java:1986)
> > >         at org.apache.hadoop.hbase.regionserver.RSRpcServices.
> > > execService(RSRpcServices.java:1968)
> > >         at org.apache.hadoop.hbase.protobuf.generated.
> > > ClientProtos$ClientService$2.callBlockingMethod(
> ClientProtos.java:33652)
> > >         at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:
> > 2178)
> > >         at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.
> > java:112)
> > >         at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(
> > > RpcExecutor.java:133)
> > >         at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.
> > > java:108)
> > >         at java.lang.Thread.run(Thread.java:745)
> >
>



-- 
Best regards,

Shaofeng Shi 史少锋

Re: Exceed scan threshold at 10000001

Posted by Alberto Ramón <a....@gmail.com>.

NOTE: I'm not a expert on Kylin  ;)

Where is mandatory? No
Where is recommended? yes
Where bypass the threshold? No, I think this limit is hardcoded ¿?

The real question must be: why this limit exists ?: (opinion)
- The target of Kylin is Real / Near RT, limit rows --> limit response time
- If Your are using JDBC, this is not a good option by performance
- Protect the HBase Coprocesor
- Perhaps you need a new Dim, to precalculate This Aggregate or filter by
this new Dim

For Extra-Large queries, you can also check:
 -kylin.query.mem.budget= 3GB
 -hbase.server.scanner.max.result.size = 100MB  (limit from HBase, you can
disable with -1)

Good Luck, Alb

2016-10-27 11:56 GMT+02:00 张磊 <12...@qq.com>:

> Do you mean when i query, i should add where clause,
> but in some case, the number of records > threshold, how can i do?
> For example, order by all groups, the number of the  all groups >
> threshold
>
>
>
>
> ------------------ 原始邮件 ------------------
> 发件人: "Alberto Ramón";<a....@gmail.com>;
> 发送时间: 2016年10月27日(星期四) 下午5:47
> 收件人: "dev"<de...@kylin.apache.org>;
>
> 主题: Re: Exceed scan threshold at 10000001
>
>
>
>  ERROR: Scan row count exceeded threshold
>
> MailList
> <http://mail-archives.apache.org/mod_mbox/kylin-user/
> 201608.mbox/%3CCALjEW7M_YYi7Xs55OqPdxS6pzNvD0%2BamN2AX3hetnF0%3D9uFnow%
> 40mail.gmail.com%3E>
> Kilin
> 1787 <https://issues.apache.org/jira/browse/KYLIN-1787>v1.5.3
>
> *Scan row count exceeded threshold: 1000000, please add filter condition to
> narrow down backend scan range, like where clause*
>
>
> BR, Alb
>
> 2016-10-27 11:40 GMT+02:00 张磊 <12...@qq.com>:
>
> > Hi
> >
> >
> > When i query a sql, I do not know why should scan hbase? How can i do?
> > Thanks!
> >
> >
> > Table: lineorder  12,000,000 row records
> > Dimensions: LO_CUSTKEY,LO_PARTKEY
> > Measures: count(1), sum(LO_REVENUE)
> >
> >
> > Query SQL: select count(1),sum(LO_REVENUE) from lineorder group by
> > LO_CUSTKEY,LO_PARTKEY order by LO_CUSTKEY,LO_PARTKEY limit 50
> >
> >
> > I build a cude with two Dimensions and two Measures(count and sum), the
> > size of the Htable is 98 MB, when i execute a query in insight, it shows
> > Error in coprocessor; and i check the hbase log, i find blow messages
> >
> >
> > 2016-10-27 02:06:13,470 INFO  [B.defaultRpcServer.handler=4,
> queue=1,port=16020]
> > gridtable.GTScanRequest: pre aggregation is not beneficial, skip it
> > 2016-10-27 02:06:13,470 INFO  [B.defaultRpcServer.handler=4,
> queue=1,port=16020]
> > endpoint.CubeVisitService: Scanned 1 rows from HBase.
> >
> >
> > 2016-10-27 02:24:20,884 INFO  [B.defaultRpcServer.handler=6,
> queue=0,port=16020]
> > endpoint.CubeVisitService: Scanned 9999001 rows from HBase.
> > 2016-10-27 02:24:20,889 INFO  [B.defaultRpcServer.handler=6,
> queue=0,port=16020]
> > endpoint.CubeVisitService: The cube visit did not finish normally because
> > scan num exceeds threshold
> > org.apache.kylin.gridtable.GTScanExceedThresholdException: Exceed scan
> > threshold at 10000001
> >         at org.apache.kylin.storage.hbase.cube.v2.coprocessor.
> > endpoint.CubeVisitService$2.hasNext(CubeVisitService.java:267)
> >         at org.apache.kylin.storage.hbase.cube.v2.
> HBaseReadonlyStore$1$1.
> > hasNext(HBaseReadonlyStore.java:111)
> >         at org.apache.kylin.storage.hbase.cube.v2.coprocessor.
> > endpoint.CubeVisitService.visitCube(CubeVisitService.java:299)
> >         at org.apache.kylin.storage.hbase.cube.v2.coprocessor.
> > endpoint.generated.CubeVisitProtos$CubeVisitService.callMethod(
> > CubeVisitProtos.java:3952)
> >         at org.apache.hadoop.hbase.regionserver.HRegion.
> > execService(HRegion.java:7815)
> >         at org.apache.hadoop.hbase.regionserver.RSRpcServices.
> > execServiceOnRegion(RSRpcServices.java:1986)
> >         at org.apache.hadoop.hbase.regionserver.RSRpcServices.
> > execService(RSRpcServices.java:1968)
> >         at org.apache.hadoop.hbase.protobuf.generated.
> > ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:33652)
> >         at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:
> 2178)
> >         at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.
> java:112)
> >         at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(
> > RpcExecutor.java:133)
> >         at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.
> > java:108)
> >         at java.lang.Thread.run(Thread.java:745)
>