Posted to user@kylin.apache.org by Árki Gábor <ar...@gmail.com> on 2021/07/16 11:03:35 UTC

Unique Kylin coprocessor per table causes excessive disk usage

Dear All,

We have noticed an issue with our long-running Kylin clusters (Kylin
3.1.0 and HBase 1.4.10 / EMR 5.28).
As our data grows, the HBase region servers run out of disk space.
This seems to happen because Kylin configures a uniquely named
coprocessor jar for each table. The HBase region servers download these
jars to a tmp folder and, probably because each table has a uniquely
named jar, the same coprocessor ends up stored once for every table
created by Kylin. This issue has been reported by someone else in
KYLIN-5022 <https://issues.apache.org/jira/browse/KYLIN-5022>, and I
have added some of my findings there.
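
To check whether the jars in the tmp folder really are identical copies
under different names, a minimal sketch along these lines could be run on
a region server. It groups jar files by the SHA-256 of their contents and
estimates how much space the extra copies take. The function names are
made up for illustration, and the directory to scan is an assumption that
has to be supplied by whoever runs it (it depends on how hbase.tmp.dir /
hbase.local.dir are set on the cluster).

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_sha256(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicate_jars(jar_dir):
    """Group *.jar files under jar_dir by content hash.

    jar_dir is whatever directory the region server downloads
    coprocessor jars into (an assumption, not a fixed path).
    Only groups with more than one file are returned.
    """
    groups = defaultdict(list)
    for path in sorted(Path(jar_dir).rglob("*.jar")):
        if path.is_file():
            groups[file_sha256(path)].append(path)
    return {h: paths for h, paths in groups.items() if len(paths) > 1}


def wasted_bytes(duplicates):
    """Bytes that would be freed if each group kept a single copy."""
    return sum(
        paths[0].stat().st_size * (len(paths) - 1)
        for paths in duplicates.values()
    )
```

Calling find_duplicate_jars() on the jar folder and passing the result to
wasted_bytes() gives a rough figure for the disk space taken by duplicates.
If every group contains as many files as there are Kylin tables, that
would support the explanation above.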

For now, the only workaround I have found is to increase the disk size of
our clusters, but that is not a great solution from a scaling and cost
perspective. Is there a way to reconfigure this behavior? Is it even
intentional to use a unique jar name for every table?

Regards,
Gabor