Posted to dev@kylin.apache.org by Ahmad Hammad <a....@beyegroup.com> on 2021/01/13 11:25:25 UTC

cube build failing in step 3 -memory heap issue

Dear ,

Hope all is well.

We are looking to use Apache Kylin instead of SSAS for our business analysis dashboard product. We are facing a problem building the cube, which contains two Hive tables: one fact table and one dimension table.

The fact table has 47271784 rows and a total size of 5326550430, as reported by the show tblproperties query in the Hive command line.

The dimension table has 5261766 rows and a total size of 1174440814, as reported by the same show tblproperties query.




The build process failed in step 3:
 #3 Step Name: Extract Fact Table Distinct Columns
Data Size: 16.19 KB
Duration: 11.78 mins Waiting: 13 seconds


The logs show a Java heap space error, as follows:

org.apache.kylin.engine.mr.exception.MapReduceException: Counters: 55
File System Counters
FILE: Number of bytes read=323698
FILE: Number of bytes written=29783830
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=252673677
HDFS: Number of bytes written=16576
HDFS: Number of read operations=195
HDFS: Number of large read operations=0
HDFS: Number of write operations=3
Job Counters
Failed reduce tasks=4
Launched map tasks=47
Launched reduce tasks=5
Data-local map tasks=47
Total time spent by all maps in occupied slots (ms)=4363352
Total time spent by all reduces in occupied slots (ms)=2032100
Total time spent by all map tasks (ms)=1090838
Total time spent by all reduce tasks (ms)=508025
Total vcore-milliseconds taken by all map tasks=1090838
Total vcore-milliseconds taken by all reduce tasks=508025
Total megabyte-milliseconds taken by all map tasks=1117018112
Total megabyte-milliseconds taken by all reduce tasks=520217600
Map-Reduce Framework
Map input records=47271784
Map output records=5261813
Map output bytes=57539075
Map output materialized bytes=15536194
Input split bytes=138932
Combine input records=5261813
Combine output records=5261813
Reduce input groups=1
Reduce shuffle bytes=340412
Reduce input records=47
Reduce output records=0
Spilled Records=5261860
Shuffled Maps =47
Failed Shuffles=0
Merged Map outputs=47
GC time elapsed (ms)=68095
CPU time spent (ms)=1246430
Physical memory (bytes) snapshot=44485660672
Virtual memory (bytes) snapshot=137661587456
Total committed heap usage (bytes)=41749577728
Peak Map Physical memory (bytes)=960831488
Peak Map Virtual memory (bytes)=2891886592
Peak Reduce Physical memory (bytes)=305377280
Peak Reduce Virtual memory (bytes)=2667810816
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=0
org.apache.kylin.engine.mr.steps.FactDistinctColumnsMapper$RawDataCounter
BYTES=1563833108
Job Diagnostics:Task failed task_1610370996803_0012_r_000000
Job failed as tasks failed. failedMaps:0 failedReduces:1 killedMaps:0 killedReduces: 0

Failure task Diagnostics:
Error: Java heap space

at org.apache.kylin.engine.mr.common.MapReduceExecutable.doWork(MapReduceExecutable.java:234)
at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:179)
at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:71)
at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:179)
at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:114)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)


I tried increasing the memory allocated to Kylin to 17 GB in the setenv.sh file, as recommended, with the following setting:

export KYLIN_JVM_SETTINGS="-Xms17g -Xmx17g -Xss1024K -XX:MaxPermSize=1g -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:$KYLIN_HOME/logs/kylin.gc.%p -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=64M"

but it still gives this error.

I am using Kylin v3.1.1 on HDP 3.0. The server has 32 GB of RAM and a 4-core i7 CPU.

Please let me know if you need any more information from my side to help us find the problem, the solution, and the recommended settings.

Your quick response is highly appreciated. We need to know how reliable Kylin is and what level of support it provides.

Best regards,

Ahmad Hammad
chief technology officer
website:http://beyegroup.com/
mobile:962 79640 1490
email:a.hammad@beyegroup.com

Re: cube build failing in step 3 -memory heap issue

Posted by ShaoFeng Shi <sh...@apache.org>.
Hi,

You can check the "Extract Fact Table Distinct Columns" section in
https://kylin.apache.org/docs/howto/howto_optimize_build.html

Usually it is caused by one of the following: 1) the cube has too many
dimensions; 2) there is an ultra-high-cardinality column in the dimension list
(e.g., a UUID column, a timestamp column, etc.); 3) the Hadoop map/reduce
memory configuration is too small.
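
As a rough check on causes 2) and 3), the counters in the posted log can be turned into a per-container memory estimate. This is only a sketch: the counter values are copied from the log above, while the helper name container_mb and the one-vcore-per-task assumption are illustrative (the log does show vcore-milliseconds equal to task milliseconds, which is consistent with one vcore per task).

```python
# Counter values copied from the failed job's log in the original post.
reduce_mb_millis = 520_217_600     # Total megabyte-milliseconds taken by all reduce tasks
reduce_vcore_millis = 508_025      # Total vcore-milliseconds taken by all reduce tasks
map_mb_millis = 1_117_018_112      # Total megabyte-milliseconds taken by all map tasks
map_vcore_millis = 1_090_838       # Total vcore-milliseconds taken by all map tasks
map_output_records = 5_261_813     # Map output records
launched_map_tasks = 47            # Launched map tasks
dimension_table_rows = 5_261_766   # row count of the dimension table, from the post


def container_mb(mb_millis, vcore_millis, vcores_per_task=1):
    """Estimate container memory in MB (hypothetical helper).

    Assumes every task ran with `vcores_per_task` vcores, so that
    megabyte-milliseconds / task-milliseconds gives megabytes.
    """
    return mb_millis * vcores_per_task / vcore_millis


reduce_mb = container_mb(reduce_mb_millis, reduce_vcore_millis)
map_mb = container_mb(map_mb_millis, map_vcore_millis)

# How far the mapper output exceeds the dimension table's row count.
extra_records = map_output_records - dimension_table_rows

print(f"reduce container: about {reduce_mb:.0f} MB")
print(f"map container:    about {map_mb:.0f} MB")
print(f"map output records minus dimension rows: {extra_records}")
```

Under these assumptions both container types come out at about 1024 MB, so the failing reducer had a heap of less than 1 GB, and the mapper output is within 47 records (the number of map tasks) of the dimension table's row count, which may point to a column with roughly 5.26 million distinct values. Note that KYLIN_JVM_SETTINGS in setenv.sh sizes the Kylin server process only; the MapReduce task containers are governed by the standard Hadoop properties mapreduce.reduce.memory.mb and mapreduce.reduce.java.opts, which Kylin normally reads from its job configuration (conf/kylin_job_conf.xml in a default install).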

Best regards,

Shaofeng Shi 史少锋
Apache Kylin PMC
Email: shaofengshi@apache.org

Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
Join Kylin user mail group: user-subscribe@kylin.apache.org
Join Kylin dev mail group: dev-subscribe@kylin.apache.org




