Posted to issues@hivemall.apache.org by "Makoto Yui (Jira)" <ji...@apache.org> on 2019/11/25 07:20:00 UTC

[jira] [Updated] (HIVEMALL-199) Reduce memory usage of lda_predict

     [ https://issues.apache.org/jira/browse/HIVEMALL-199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Makoto Yui updated HIVEMALL-199:
--------------------------------
    Fix Version/s:     (was: 0.6.0)
                   0.7.0

> Reduce memory usage of lda_predict
> ----------------------------------
>
>                 Key: HIVEMALL-199
>                 URL: https://issues.apache.org/jira/browse/HIVEMALL-199
>             Project: Hivemall
>          Issue Type: Wish
>    Affects Versions: 0.5.0
>            Reporter: Makoto Yui
>            Assignee: Makoto Yui
>            Priority: Major
>             Fix For: 0.7.0
>
>
> lda_predict does not declare [@AggregationType(estimable = true)|https://github.com/apache/incubator-hivemall/blob/master/core/src/main/java/hivemall/sketch/hll/ApproxCountDistinctUDAF.java#L233], so the optimizer does not perform reduce parallelization.
> We should also revise LDAPredictUDAF to use less memory and avoid OOM errors such as the following:
> {code}
> 2018-04-23 04:04:34,081 FATAL [Thread-5] org.apache.hadoop.mapred.YarnChild: Error running child : java.lang.OutOfMemoryError: GC overhead limit exceeded
>     at java.nio.ByteBuffer.wrap(ByteBuffer.java:373)
>     at org.apache.hadoop.io.Text.decode(Text.java:389)
>     at org.apache.hadoop.io.Text.toString(Text.java:280)
>     at org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableStringObjectInspector.getPrimitiveJavaObject(WritableStringObjectInspector.java:46)
>     at org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorUtils.getString(PrimitiveObjectInspectorUtils.java:823)
>     at hivemall.topicmodel.LDAPredictUDAF$Evaluator.iterate(LDAPredictUDAF.java:298)
>     at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator.aggregate(GenericUDAFEvaluator.java:184)
>     at org.apache.hadoop.hive.ql.exec.GroupByOperator.updateAggregations(GroupByOperator.java:641)
>     at org.apache.hadoop.hive.ql.exec.GroupByOperator.processHashAggr(GroupByOperator.java:838)
>     at org.apache.hadoop.hive.ql.exec.GroupByOperator.processKey(GroupByOperator.java:735)
>     at org.apache.hadoop.hive.ql.exec.GroupByOperator.processOp(GroupByOperator.java:803)
>     at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793)
>     at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:87)
>     at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793)
>     at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.internalForward(CommonJoinOperator.java:638)
>     at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genUniqueJoinObject(CommonJoinOperator.java:651)
>     at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genUniqueJoinObject(CommonJoinOperator.java:654)
>     at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:758)
>     at org.apache.hadoop.hive.ql.exec.JoinOperator.endGroup(JoinOperator.java:256)
>     at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.close(ExecReducer.java:311)
>     at org.apache.hadoop.io.IOUtils.cleanup(IOUtils.java:244)
>     at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:459)
>     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
>     at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
>     at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
> {code}
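
The two requests above (an estimable aggregation buffer and a smaller memory footprint) can be illustrated with a minimal, self-contained sketch. Everything in it is hypothetical: `EstimableSketch` is only a stand-in for Hive's `@AggregationType(estimable = true)` annotation so that the example compiles without Hive on the classpath, and `TopicWeightBuffer` is not the actual LDAPredictUDAF buffer. The assumption is that per-word topic weights can be packed into one flat `float[]` instead of boxed collections, which makes the buffer size cheap to report.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.Arrays;

// Hypothetical stand-in for Hive's @AggregationType(estimable = true);
// it is NOT the real annotation and exists only to keep this sketch self-contained.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface EstimableSketch {
    boolean estimable() default false;
}

// Hypothetical aggregation buffer (not the LDAPredictUDAF implementation).
// Topic weights for all words live in one row-major float[]:
// word i occupies [i * numTopics, (i + 1) * numTopics).
@EstimableSketch(estimable = true)
final class TopicWeightBuffer {
    // Rough guess for object header and fields; an assumption, not a measured value.
    private static final int BASE_OVERHEAD_BYTES = 32;

    private final int numTopics;
    private float[] weights;
    private int numWords;

    TopicWeightBuffer(int numTopics, int initialWords) {
        if (numTopics < 1) {
            throw new IllegalArgumentException("numTopics must be >= 1");
        }
        this.numTopics = numTopics;
        this.weights = new float[Math.max(1, initialWords) * numTopics];
    }

    // Appends the topic weights of one word, doubling the array when it is full.
    void add(float[] weightsForWord) {
        if (weightsForWord.length != numTopics) {
            throw new IllegalArgumentException("expected " + numTopics + " weights");
        }
        if ((numWords + 1) * numTopics > weights.length) {
            weights = Arrays.copyOf(weights, weights.length * 2);
        }
        System.arraycopy(weightsForWord, 0, weights, numWords * numTopics, numTopics);
        numWords++;
    }

    float get(int word, int topic) {
        return weights[word * numTopics + topic];
    }

    int size() {
        return numWords;
    }

    // Approximate heap usage in bytes, based on the allocated array length.
    int estimate() {
        return BASE_OVERHEAD_BYTES + Float.BYTES * weights.length;
    }
}

class EstimableBufferSketch {
    public static void main(String[] args) {
        TopicWeightBuffer buf = new TopicWeightBuffer(2, 1);
        buf.add(new float[] {0.1f, 0.9f});
        buf.add(new float[] {0.5f, 0.5f});
        System.out.println("words=" + buf.size() + ", estimated bytes=" + buf.estimate());
    }
}
```

In Hive itself, the buffer would presumably extend `GenericUDAFEvaluator.AbstractAggregationBuffer` and override its `estimate()` method; whether a flat array suits LDAPredictUDAF depends on details of its model representation that this issue does not describe.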



--
This message was sent by Atlassian Jira
(v8.3.4#803005)