Posted to issues@hivemall.apache.org by "Makoto Yui (JIRA)" <ji...@apache.org> on 2018/04/19 08:26:00 UTC

[jira] [Comment Edited] (HIVEMALL-194) Improve the throughput of LDA training

    [ https://issues.apache.org/jira/browse/HIVEMALL-194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16443724#comment-16443724 ] 

Makoto Yui edited comment on HIVEMALL-194 at 4/19/18 8:25 AM:
--------------------------------------------------------------

I'm not sure why the perplexity is increasing, though that can happen.
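For reference, perplexity is commonly defined as the exponential of the negative average per-token log-likelihood, so it rises whenever the model assigns lower probability to the observed words. The Python sketch below uses hypothetical helper names and is not Hivemall's implementation. It also illustrates that, for equal-sized mini-batches, an arithmetic mean of per-mini-batch perplexities (which is what the "Mean perplexity over mini-batches" log line appears to report; that reading is an assumption) is never smaller than the perplexity pooled over all tokens (Jensen's inequality), so a few hard mini-batches can dominate it.

```python
import math
from statistics import mean


def perplexity(token_log_probs):
    """Perplexity of one mini-batch: exp(-(sum of log p(w)) / number of tokens)."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


def mean_minibatch_perplexity(batches):
    """Hypothetical: arithmetic mean of the per-mini-batch perplexities."""
    return mean(perplexity(b) for b in batches)


def pooled_perplexity(batches):
    """Perplexity computed over all tokens of all mini-batches at once."""
    return perplexity([lp for b in batches for lp in b])


# Two equal-sized toy mini-batches: an "easy" one (p = 0.5 per token)
# and a "hard" one (p = 0.01 per token).
batches = [[math.log(0.5)] * 4, [math.log(0.01)] * 4]

print(mean_minibatch_perplexity(batches))  # about 51
print(pooled_perplexity(batches))          # about 14.14 (sqrt(200))
```

Because the per-batch mean is pulled up by the hard batch, an increase in that metric does not by itself show that every mini-batch got worse.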


was (Author: myui):
I'm not sure why perplexity is increasing.

> Improve the throughput of LDA training
> --------------------------------------
>
>                 Key: HIVEMALL-194
>                 URL: https://issues.apache.org/jira/browse/HIVEMALL-194
>             Project: Hivemall
>          Issue Type: Improvement
>    Affects Versions: 0.5.0
>            Reporter: Makoto Yui
>            Assignee: Makoto Yui
>            Priority: Minor
>             Fix For: 0.5.2
>
>
> LDA training throughput was too low for a production workload.
> It would be better to profile LDA training and improve its throughput. (cc: [~nzw])
> {code}
> 2018-04-18 06:32:01,410 INFO [Thread-5] hivemall.topicmodel.ProbabilisticTopicModelBaseUDTF: Wrote 341047 records to a temporary file for iterative training: /mnt4/hadoop/yarn/cache/yarn/nm-local-dir/usercache/18/appcache/application_1522730964147_209083/container_1522730964147_209083_01_000004/tmp/hivemall_topicmodel8295452490442575792.sgmt (259.4 MiB)
> 2018-04-18 07:50:41,979 INFO [Thread-5] hivemall.topicmodel.ProbabilisticTopicModelBaseUDTF: Mean perplexity over mini-batches: 4437.724
> 2018-04-18 09:05:55,765 INFO [Thread-5] hivemall.topicmodel.ProbabilisticTopicModelBaseUDTF: Mean perplexity over mini-batches: 4579.0825
> 2018-04-18 10:21:48,865 INFO [Thread-5] hivemall.topicmodel.ProbabilisticTopicModelBaseUDTF: Mean perplexity over mini-batches: 4651.425
> 2018-04-18 11:37:47,772 INFO [Thread-5] hivemall.topicmodel.ProbabilisticTopicModelBaseUDTF: Mean perplexity over mini-batches: 4711.779
> 2018-04-18 12:58:02,262 INFO [Thread-5] hivemall.topicmodel.ProbabilisticTopicModelBaseUDTF: Mean perplexity over mini-batches: 4739.12
> 2018-04-18 14:15:19,689 INFO [Thread-5] hivemall.topicmodel.ProbabilisticTopicModelBaseUDTF: Mean perplexity over mini-batches: 4774.822
> 2018-04-18 15:30:12,067 INFO [Thread-5] hivemall.topicmodel.ProbabilisticTopicModelBaseUDTF: Mean perplexity over mini-batches: 4788.2305
> 2018-04-18 16:51:48,425 INFO [Thread-5] hivemall.topicmodel.ProbabilisticTopicModelBaseUDTF: Mean perplexity over mini-batches: 4808.8013
> 2018-04-18 18:31:14,548 INFO [Thread-5] hivemall.topicmodel.ProbabilisticTopicModelBaseUDTF: Mean perplexity over mini-batches: 4826.866
> 2018-04-18 19:49:41,266 INFO [Thread-5] hivemall.topicmodel.ProbabilisticTopicModelBaseUDTF: Mean perplexity over mini-batches: 4834.5537
> 2018-04-18 21:13:19,976 INFO [Thread-5] hivemall.topicmodel.ProbabilisticTopicModelBaseUDTF: Mean perplexity over mini-batches: 4850.7837
> 2018-04-18 22:29:45,115 INFO [Thread-5] hivemall.topicmodel.ProbabilisticTopicModelBaseUDTF: Mean perplexity over mini-batches: 4848.2095
> 2018-04-18 23:48:47,483 INFO [Thread-5] hivemall.topicmodel.ProbabilisticTopicModelBaseUDTF: Mean perplexity over mini-batches: 4867.3945
> 2018-04-19 01:09:23,242 INFO [Thread-5] hivemall.topicmodel.ProbabilisticTopicModelBaseUDTF: Mean perplexity over mini-batches: 4861.012
> 2018-04-19 02:24:50,819 INFO [Thread-5] hivemall.topicmodel.ProbabilisticTopicModelBaseUDTF: Mean perplexity over mini-batches: 4873.5796
> 2018-04-19 03:42:27,052 INFO [Thread-5] hivemall.topicmodel.ProbabilisticTopicModelBaseUDTF: Mean perplexity over mini-batches: 4873.4126
> 2018-04-19 04:57:24,786 INFO [Thread-5] hivemall.topicmodel.ProbabilisticTopicModelBaseUDTF: Mean perplexity over mini-batches: 4880.2183
> 2018-04-19 06:12:14,056 INFO [Thread-5] hivemall.topicmodel.ProbabilisticTopicModelBaseUDTF: Mean perplexity over mini-batches: 4889.6064
> 2018-04-19 07:27:26,864 INFO [Thread-5] hivemall.topicmodel.ProbabilisticTopicModelBaseUDTF: Mean perplexity over mini-batches: 4885.1523
> 2018-04-19 07:27:26,865 INFO [Thread-5] hivemall.topicmodel.ProbabilisticTopicModelBaseUDTF: Performed 20 iterations of 341,047 training examples on a secondary storage (thus 6,820,940 training updates in total)
> 2018-04-19 07:27:27,078 WARN [Thread-5] org.apache.hadoop.hive.ql.exec.GroupByOperator: Disable Hash Aggr: #hash table = 99999 #total = 100000 reduction = 0.0 minReduction = 0.5
> {code}
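The log already carries enough information to quantify the slowdown. The Python sketch below is not part of Hivemall and all of its names are illustrative: it parses the timestamps copied from the log above and computes the wall-clock time per pass over the 341,047 records and the resulting records-per-second rate. It assumes each "Mean perplexity" line marks the end of one pass. Note that the excerpt contains 19 such lines while the final summary reports 20 iterations, so only 19 intervals are measured, starting from the "Wrote 341047 records" line.

```python
from datetime import datetime

TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"  # timestamp format used in the log
RECORDS_PER_PASS = 341_047           # from "Wrote 341047 records to a temporary file"

# Timestamp of the "Wrote ... records" line (start of iterative training).
START = "2018-04-18 06:32:01,410"

# (timestamp, mean perplexity) for every "Mean perplexity" line in the log.
PERPLEXITY_LOG = [
    ("2018-04-18 07:50:41,979", 4437.724),
    ("2018-04-18 09:05:55,765", 4579.0825),
    ("2018-04-18 10:21:48,865", 4651.425),
    ("2018-04-18 11:37:47,772", 4711.779),
    ("2018-04-18 12:58:02,262", 4739.12),
    ("2018-04-18 14:15:19,689", 4774.822),
    ("2018-04-18 15:30:12,067", 4788.2305),
    ("2018-04-18 16:51:48,425", 4808.8013),
    ("2018-04-18 18:31:14,548", 4826.866),
    ("2018-04-18 19:49:41,266", 4834.5537),
    ("2018-04-18 21:13:19,976", 4850.7837),
    ("2018-04-18 22:29:45,115", 4848.2095),
    ("2018-04-18 23:48:47,483", 4867.3945),
    ("2018-04-19 01:09:23,242", 4861.012),
    ("2018-04-19 02:24:50,819", 4873.5796),
    ("2018-04-19 03:42:27,052", 4873.4126),
    ("2018-04-19 04:57:24,786", 4880.2183),
    ("2018-04-19 06:12:14,056", 4889.6064),
    ("2018-04-19 07:27:26,864", 4885.1523),
]


def parse(ts: str) -> datetime:
    """Parse a log timestamp such as '2018-04-18 06:32:01,410'."""
    return datetime.strptime(ts, TS_FORMAT)


def pass_durations(start: str, entries) -> list:
    """Seconds elapsed between consecutive logged checkpoints."""
    stamps = [parse(start)] + [parse(ts) for ts, _ in entries]
    return [(b - a).total_seconds() for a, b in zip(stamps, stamps[1:])]


durations = pass_durations(START, PERPLEXITY_LOG)
mean_secs = sum(durations) / len(durations)

print(f"passes measured: {len(durations)}")
print(f"mean per pass  : {mean_secs / 60:.1f} min")
print(f"min / max      : {min(durations) / 60:.1f} / {max(durations) / 60:.1f} min")
print(f"throughput     : {RECORDS_PER_PASS / mean_secs:.1f} records/s")
print(f"perplexity     : {PERPLEXITY_LOG[0][1]} -> {PERPLEXITY_LOG[-1][1]}")
```

On this excerpt it reports roughly 79 minutes per pass on average (about 75 minutes at best and 99 minutes at worst), i.e. about 72 records per second, while the mean perplexity drifts from 4437.724 to 4885.1523.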


