Posted to dev@kylin.apache.org by Singh Sonu <so...@gmail.com> on 2022/06/03 13:14:16 UTC

dictionary cannot be bigger than 2GB

Hi Experts,

Any help on this will be appreciated.
We are building a cube and hitting the issue below in the Fact Distinct
Columns job step:

2022-06-03 07:41:22,088 ERROR [IPC Server handler 7 on 38377] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Task: attempt_1654084270426_0161_r_000000_0 - exited : org.apache.kylin.common.exceptions.TooBigDictionaryException: Too big dictionary, dictionary cannot be bigger than 2GB
    at org.apache.kylin.dict.TrieDictionaryForestBuilder.checkDictSize(TrieDictionaryForestBuilder.java:143)
    at org.apache.kylin.dict.TrieDictionaryForestBuilder.addTree(TrieDictionaryForestBuilder.java:132)
    at org.apache.kylin.dict.TrieDictionaryForestBuilder.build(TrieDictionaryForestBuilder.java:104)
    at org.apache.kylin.dict.DictionaryGenerator$NumberTrieDictForestBuilder.build(DictionaryGenerator.java:219)
    at org.apache.kylin.engine.mr.steps.FactDistinctColumnsReducer.doCleanup(FactDistinctColumnsReducer.java:203)
    at org.apache.kylin.engine.mr.KylinReducer.cleanup(KylinReducer.java:96)
    at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:179)
    at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:628)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:390)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168)


Please help; we are stuck on this issue.
Kylin version - 3.1.3
Hadoop - 3.1.0
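
My understanding is that the 2 GB cap comes from the trie dictionary builder itself, so a column with this much distinct data may simply be too high-cardinality for dictionary encoding. One thing we are considering (not yet tried) is switching that rowkey column away from "dict" encoding in the cube descriptor; since the failing builder is NumberTrieDictForestBuilder, the column is numeric, so integer encoding might fit. Illustrative fragment only - HIGH_CARD_COL is a placeholder name:

```json
{
  "rowkey": {
    "rowkey_columns": [
      {
        "column": "HIGH_CARD_COL",
        "encoding": "integer:8",
        "isShardBy": false
      }
    ]
  }
}
```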

In the code, I raised the default dictionary size limit from 2 GB to 4 GB,
and after this change we get the error below:

Failure task Diagnostics:
Error: java.lang.NegativeArraySizeException
    at org.apache.commons.io.output.ByteArrayOutputStream.toByteArray(ByteArrayOutputStream.java:366)
    at org.apache.kylin.engine.mr.steps.FactDistinctColumnsReducer.outputDict(FactDistinctColumnsReducer.java:235)
    at org.apache.kylin.engine.mr.steps.FactDistinctColumnsReducer.doCleanup(FactDistinctColumnsReducer.java:204)
    at org.apache.kylin.engine.mr.KylinReducer.cleanup(KylinReducer.java:96)
    at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:179)
    at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:628)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:390)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168)

    at org.apache.kylin.engine.mr.common.MapReduceExecutable.doWork(MapReduceExecutable.java:226)
    at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:172)
    at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:62)
    at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:172)
    at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:106)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
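
If it helps to narrow this down: I believe the NegativeArraySizeException is expected once the limit is raised past 2 GB, because a Java array length is a 32-bit int. Any byte[] of Integer.MAX_VALUE bytes or more cannot be represented, and a 3-4 GB size cast to int wraps negative, which is exactly what toByteArray() would then try to allocate. A minimal sketch of the overflow (my own illustration, not Kylin code):

```java
public class DictSizeOverflow {
    public static void main(String[] args) {
        // A Java array length is a 32-bit signed int, so a single byte[]
        // can never hold 2 GB (2^31 bytes) or more.
        long twoGb = 2L * 1024 * 1024 * 1024;   // 2147483648
        long threeGb = 3L * 1024 * 1024 * 1024; // 3221225472

        // Casting these sizes down to int wraps around to negative values:
        System.out.println((int) twoGb);   // -2147483648
        System.out.println((int) threeGb); // -1073741824

        // new byte[(int) threeGb] would therefore throw
        // NegativeArraySizeException - the same failure the reducer hits.
    }
}
```

So raising the hardcoded limit alone cannot work; the dictionary would have to be split or the column encoded without a dictionary.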

We also tried running this step with the Spark engine and hit the error
below:

26_0141/container_e08_1654084270426_0141_01_000010/__app__.jar!/kylin-defaults.properties
2022-06-02 17:48:19,971 WARN common.KylinConfigBase: KYLIN_HOME was not set
2022-06-02 17:48:19,974 ERROR executor.Executor: Exception in task 0.0 in stage 1.0 (TID 226)
org.apache.kylin.common.KylinConfigCannotInitException: Didn't find QUBZ_CONF or QUBZ_HOME, please set one of them
    at org.apache.kylin.common.KylinConfig.getSitePropertiesFile(KylinConfig.java:341)
    at org.apache.kylin.common.KylinConfig.buildSiteOrderedProps(KylinConfig.java:383)
    at org.apache.kylin.common.KylinConfig.buildSiteProperties(KylinConfig.java:363)
    at org.apache.kylin.common.KylinConfig.getInstanceFromEnv(KylinConfig.java:142)
    at org.apache.kylin.dict.TrieDictionaryBuilder.buildTrieBytes(TrieDictionaryBuilder.java:430)
    at org.apache.kylin.dict.TrieDictionaryBuilder.build(TrieDictionaryBuilder.java:415)
    at org.apache.kylin.dict.TrieDictionaryForestBuilder.addValue(TrieDictionaryForestBuilder.java:95)
    at org.apache.kylin.dict.TrieDictionaryForestBuilder.addValue(TrieDictionaryForestBuilder.java:72)
    at org.apache.kylin.dict.DictionaryGenerator$NumberTrieDictForestBuilder.addValue(DictionaryGenerator.java:213)
    at org.apache.kylin.engine.spark.SparkFactDistinct$MultiOutputFunction.calculateColData(SparkFactDistinct.java:823)
    at org.apache.kylin.engine.spark.SparkFactDistinct$MultiOutputFunction.call(SparkFactDistinct.java:758)
    at org.apache.kylin.engine.spark.SparkFactDistinct$MultiOutputFunction.call(SparkFactDistinct.java:642)
    at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:186)
    at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:186)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:823)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:823)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
    at org.apache.spark.scheduler.Task.run(Task.scala:123)
    at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:411)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:417)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
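
Since the executor log complains that KYLIN_HOME was not set, my guess is that the Kylin environment variables are not reaching the Spark executors. If I read the Spark configuration docs correctly, environment variables can be passed to executors via spark.executorEnv.*, which in Kylin 3.x can be set through the kylin.engine.spark-conf. prefix in kylin.properties. The paths below are placeholders for our actual install:

```properties
# kylin.properties - illustrative only; adjust paths to the real installation
kylin.engine.spark-conf.spark.executorEnv.KYLIN_HOME=/usr/local/kylin
kylin.engine.spark-conf.spark.executorEnv.KYLIN_CONF=/usr/local/kylin/conf
```

We have not verified yet whether this resolves the KylinConfigCannotInitException on the executors.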


 You can reach me at
 Mb. No- 7092292112
 Email- sonusingh.javatech@gmail.com

 with regards,
 Sonu Kumar Singh