Posted to user@kylin.apache.org by Johnson <it...@163.com> on 2020/03/08 10:21:18 UTC

Kylin query crashes on an ultra-high-cardinality dimension

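I recently built a cube that includes an ultra-high-cardinality dimension, and queries against it fail right away with an HBase coprocessor timeout. The SQL is:

select sum(filesize)/1024/1024 fs, path6 from impala_monitor.V_MONITOR_HDFS_INFO where par_dt = '2020-03-06' group by path6 order by fs desc limit 100

The path6 dimension has a cardinality of over 1 billion. How does everyone optimize for dimensions with cardinality this high?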


Error:
org.apache.hadoop.hbase.DoNotRetryIOException: org.apache.hadoop.hbase.DoNotRetryIOException: Coprocessor passed deadline! Maybe server is overloaded
    at org.apache.kylin.storage.hbase.cube.v2.coprocessor.endpoint.CubeVisitService.checkDeadline(CubeVisitService.java:226)
    at org.apache.kylin.storage.hbase.cube.v2.coprocessor.endpoint.CubeVisitService.visitCube(CubeVisitService.java:261)
    at org.apache.kylin.storage.hbase.cube.v2.coprocessor.endpoint.generated.CubeVisitProtos$CubeVisitService.callMethod(CubeVisitProtos.java:5555)
    at org.apache.hadoop.hbase.regionserver.HRegion.execService(HRegion.java:7996)
    at org.apache.hadoop.hbase.regionserver.RSRpcServices.execServiceOnRegion(RSRpcServices.java:1986)
    at org.apache.hadoop.hbase.regionserver.RSRpcServices.execService(RSRpcServices.java:1968)
    at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:33652)
    at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2191)
    at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:112)
    at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:183)
    at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:163)
while executing SQL: "select sum(filesize)/1024/1024 fs ,path6 from impala_monitor.V_MONITOR_HDFS_INFO where par_dt = '2020-03-06' group by path6 order by fs desc limit 100"

Re: Kylin query crashes on an ultra-high-cardinality dimension

Posted by ShaoFeng Shi <sh...@apache.org>.
Sorting on such a high-cardinality dimension in memory is very hard to
finish within seconds. Please try the Top-N pre-calculation measure:

https://kylin.apache.org/blog/2016/03/19/approximate-topn-measure/
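
For readers new to the idea, here is a minimal Python sketch of an approximate Top-N summary in the style of the Space-Saving algorithm, which, as far as I understand the linked post, is the approach Kylin's Top-N measure builds on. This is not Kylin's code: the class name SpaceSavingTopN, its methods, and the sample paths are hypothetical and only illustrate why a bounded set of counters is much cheaper than sorting a billion groups at query time.

```python
from typing import Dict, Hashable, List, Tuple


class SpaceSavingTopN:
    """Approximate heavy-hitter summary holding at most `capacity` counters.

    Hypothetical illustration only; not the Kylin implementation.
    """

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.counters: Dict[Hashable, float] = {}

    def offer(self, key: Hashable, weight: float = 1) -> None:
        """Add `weight` (e.g. a file size) to the running total for `key`."""
        if key in self.counters:
            self.counters[key] += weight
        elif len(self.counters) < self.capacity:
            self.counters[key] = weight
        else:
            # Summary is full: evict the smallest counter and let the newcomer
            # inherit its total, so a stored total can only over-estimate.
            # A linear scan keeps the sketch short; real implementations use
            # a structure with cheaper minimum lookup.
            victim = min(self.counters, key=self.counters.get)
            floor = self.counters.pop(victim)
            self.counters[key] = floor + weight

    def top(self, n: int) -> List[Tuple[Hashable, float]]:
        """Return up to `n` (key, estimated total) pairs, largest first."""
        ranked = sorted(self.counters.items(), key=lambda kv: kv[1], reverse=True)
        return ranked[:n]


def summarize(rows, capacity: int, n: int):
    """Feed (key, weight) rows into a summary and return its top `n` entries."""
    summary = SpaceSavingTopN(capacity)
    for key, weight in rows:
        summary.offer(key, weight)
    return summary.top(n)
```

Calling summarize(rows, capacity, n) with (path6, filesize) pairs returns the estimated largest paths. Memory is bounded by capacity rather than by the number of distinct path6 values, at the cost of approximate totals for keys near the cutoff.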

Best regards,

Shaofeng Shi 史少锋
Apache Kylin PMC
Email: shaofengshi@apache.org

Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
Join Kylin user mail group: user-subscribe@kylin.apache.org
Join Kylin dev mail group: dev-subscribe@kylin.apache.org




On Sun, Mar 8, 2020 at 6:21 PM, Johnson <it...@163.com> wrote:

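> I recently built a cube that includes an ultra-high-cardinality dimension,
> and queries against it fail right away with an HBase coprocessor timeout.
> The SQL is: select
> sum(filesize)/1024/1024 fs ,path6 from impala_monitor.V_MONITOR_HDFS_INFO
> where par_dt = '2020-03-06' group by path6 order by fs desc limit 100
> The path6 dimension has a cardinality of over 1 billion. How does everyone
> optimize for dimensions with cardinality this high?
>
> Error: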
> org.apache.hadoop.hbase.DoNotRetryIOException:
> org.apache.hadoop.hbase.DoNotRetryIOException: Coprocessor passed deadline!
> Maybe server is overloaded at
> org.apache.kylin.storage.hbase.cube.v2.coprocessor.endpoint.CubeVisitService.checkDeadline(CubeVisitService.java:226)
> at
> org.apache.kylin.storage.hbase.cube.v2.coprocessor.endpoint.CubeVisitService.visitCube(CubeVisitService.java:261)
> at
> org.apache.kylin.storage.hbase.cube.v2.coprocessor.endpoint.generated.CubeVisitProtos$CubeVisitService.callMethod(CubeVisitProtos.java:5555)
> at
> org.apache.hadoop.hbase.regionserver.HRegion.execService(HRegion.java:7996)
> at
> org.apache.hadoop.hbase.regionserver.RSRpcServices.execServiceOnRegion(RSRpcServices.java:1986)
> at
> org.apache.hadoop.hbase.regionserver.RSRpcServices.execService(RSRpcServices.java:1968)
> at
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:33652)
> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2191) at
> org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:112) at
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:183)
> at
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:163)
> while executing SQL: "select sum(filesize)/1024/1024 fs ,path6 from
> impala_monitor.V_MONITOR_HDFS_INFO where par_dt = '2020-03-06' group by
> path6 order by fs desc limit 100"
>
>
>
>