Posted to issues@hivemall.apache.org by "Makoto Yui (Jira)" <ji...@apache.org> on 2019/11/25 07:20:00 UTC
[jira] [Updated] (HIVEMALL-199) Reduce memory usage of lda_predict
[ https://issues.apache.org/jira/browse/HIVEMALL-199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Makoto Yui updated HIVEMALL-199:
--------------------------------
Fix Version/s: 0.7.0 (was: 0.6.0)
> Reduce memory usage of lda_predict
> ----------------------------------
>
> Key: HIVEMALL-199
> URL: https://issues.apache.org/jira/browse/HIVEMALL-199
> Project: Hivemall
> Issue Type: Wish
> Affects Versions: 0.5.0
> Reporter: Makoto Yui
> Assignee: Makoto Yui
> Priority: Major
> Fix For: 0.7.0
>
>
> LDA predict does not declare [@AggregationType(estimable = true)|https://github.com/apache/incubator-hivemall/blob/master/core/src/main/java/hivemall/sketch/hll/ApproxCountDistinctUDAF.java#L233], so the optimizer does not perform reduce parallelization.
> In addition, LDAPredictUDAF should be revised to use less memory to avoid out-of-memory (OOM) errors such as the following:
> {code}
> 2018-04-23 04:04:34,081 FATAL [Thread-5] org.apache.hadoop.mapred.YarnChild: Error running child : java.lang.OutOfMemoryError: GC overhead limit exceeded
> at java.nio.ByteBuffer.wrap(ByteBuffer.java:373)
> at org.apache.hadoop.io.Text.decode(Text.java:389)
> at org.apache.hadoop.io.Text.toString(Text.java:280)
> at org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableStringObjectInspector.getPrimitiveJavaObject(WritableStringObjectInspector.java:46)
> at org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorUtils.getString(PrimitiveObjectInspectorUtils.java:823)
> at hivemall.topicmodel.LDAPredictUDAF$Evaluator.iterate(LDAPredictUDAF.java:298)
> at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator.aggregate(GenericUDAFEvaluator.java:184)
> at org.apache.hadoop.hive.ql.exec.GroupByOperator.updateAggregations(GroupByOperator.java:641)
> at org.apache.hadoop.hive.ql.exec.GroupByOperator.processHashAggr(GroupByOperator.java:838)
> at org.apache.hadoop.hive.ql.exec.GroupByOperator.processKey(GroupByOperator.java:735)
> at org.apache.hadoop.hive.ql.exec.GroupByOperator.processOp(GroupByOperator.java:803)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793)
> at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:87)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793)
> at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.internalForward(CommonJoinOperator.java:638)
> at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genUniqueJoinObject(CommonJoinOperator.java:651)
> at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genUniqueJoinObject(CommonJoinOperator.java:654)
> at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:758)
> at org.apache.hadoop.hive.ql.exec.JoinOperator.endGroup(JoinOperator.java:256)
> at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.close(ExecReducer.java:311)
> at org.apache.hadoop.io.IOUtils.cleanup(IOUtils.java:244)
> at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:459)
> at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
> at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
> {code}
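A minimal sketch of what an "estimable" aggregation buffer involves, written in Java for consistency with Hivemall. In Hive, such a buffer extends `GenericUDAFEvaluator.AbstractAggregationBuffer`, is annotated with `@AggregationType(estimable = true)` (as in the linked ApproxCountDistinctUDAF), and overrides `estimate()` to report its approximate size in bytes, which Hive can use to account for variable-size buffers during hash aggregation. To stay self-contained, the sketch replaces the Hive base class with a hypothetical stand-in, `EstimableBufferSketch`; `LdaPredictBufferSketch`, its fields, and the byte-size constants are likewise illustrative assumptions, not the actual LDAPredictUDAF internals. Keeping values in a primitive `float[]` instead of boxed `Float` objects is one assumed way to trim per-row overhead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for Hive's GenericUDAFEvaluator.AbstractAggregationBuffer,
// defined here only so the sketch compiles without Hive on the classpath.
abstract class EstimableBufferSketch {
    /** Approximate heap footprint in bytes (Hive's base implementation returns -1). */
    abstract int estimate();
}

// Hypothetical per-document buffer for an LDA prediction aggregate (not Hivemall's code).
final class LdaPredictBufferSketch extends EstimableBufferSketch {
    // Rough per-object sizes: assumptions for illustration, not measured JVM constants.
    private static final int STRING_OVERHEAD_BYTES = 40;
    private static final int REFERENCE_BYTES = 8;

    private final List<String> words = new ArrayList<>();
    private float[] values = new float[16]; // primitive storage avoids boxed Floats
    private int size = 0;
    private long wordChars = 0;             // total chars held by stored words

    void add(String word, float value) {
        if (size == values.length) {
            values = Arrays.copyOf(values, size * 2); // amortized O(1) growth
        }
        words.add(word);
        values[size++] = value;
        wordChars += word.length();
    }

    int size() {
        return size;
    }

    float valueAt(int i) {
        if (i < 0 || i >= size) {
            throw new IndexOutOfBoundsException("index " + i);
        }
        return values[i];
    }

    void reset() {
        words.clear();
        values = new float[16];
        size = 0;
        wordChars = 0;
    }

    @Override
    int estimate() {
        long bytes = 4L * values.length                  // float[] payload
                + (long) REFERENCE_BYTES * size          // list slots
                + (long) STRING_OVERHEAD_BYTES * size    // String headers
                + 2L * wordChars;                        // assumes 2 bytes/char (UTF-16)
        return (int) Math.min(Integer.MAX_VALUE, bytes);
    }
}
```

Usage under the same assumptions: create one buffer per document, call `add(word, value)` per input row, and compare `estimate()` against a memory budget; `reset()` restores the initial 64-byte estimate (16 floats of 4 bytes each).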