Posted to issues@hivemall.apache.org by "Makoto Yui (JIRA)" <ji...@apache.org> on 2018/04/19 08:25:00 UTC

[jira] [Created] (HIVEMALL-194) Improve the throughput of LDA training

Makoto Yui created HIVEMALL-194:
-----------------------------------

             Summary: Improve the throughput of LDA training
                 Key: HIVEMALL-194
                 URL: https://issues.apache.org/jira/browse/HIVEMALL-194
             Project: Hivemall
          Issue Type: Improvement
    Affects Versions: 0.5.0
            Reporter: Makoto Yui
            Assignee: Makoto Yui
             Fix For: 0.5.2


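LDA training throughput was too low for a production workload. In the log below, each logged pass over 341,047 records took roughly 75 to 100 minutes, and the whole run took about 25 hours.
We should profile the training and improve its throughput. (cc: [~nzw] )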

{code}
2018-04-18 06:32:01,410 INFO [Thread-5] hivemall.topicmodel.ProbabilisticTopicModelBaseUDTF: Wrote 341047 records to a temporary file for iterative training: /mnt4/hadoop/yarn/cache/yarn/nm-local-dir/usercache/18/appcache/application_1522730964147_209083/container_1522730964147_209083_01_000004/tmp/hivemall_topicmodel8295452490442575792.sgmt (259.4 MiB)
2018-04-18 07:50:41,979 INFO [Thread-5] hivemall.topicmodel.ProbabilisticTopicModelBaseUDTF: Mean perplexity over mini-batches: 4437.724
2018-04-18 09:05:55,765 INFO [Thread-5] hivemall.topicmodel.ProbabilisticTopicModelBaseUDTF: Mean perplexity over mini-batches: 4579.0825
2018-04-18 10:21:48,865 INFO [Thread-5] hivemall.topicmodel.ProbabilisticTopicModelBaseUDTF: Mean perplexity over mini-batches: 4651.425
2018-04-18 11:37:47,772 INFO [Thread-5] hivemall.topicmodel.ProbabilisticTopicModelBaseUDTF: Mean perplexity over mini-batches: 4711.779
2018-04-18 12:58:02,262 INFO [Thread-5] hivemall.topicmodel.ProbabilisticTopicModelBaseUDTF: Mean perplexity over mini-batches: 4739.12
2018-04-18 14:15:19,689 INFO [Thread-5] hivemall.topicmodel.ProbabilisticTopicModelBaseUDTF: Mean perplexity over mini-batches: 4774.822
2018-04-18 15:30:12,067 INFO [Thread-5] hivemall.topicmodel.ProbabilisticTopicModelBaseUDTF: Mean perplexity over mini-batches: 4788.2305
2018-04-18 16:51:48,425 INFO [Thread-5] hivemall.topicmodel.ProbabilisticTopicModelBaseUDTF: Mean perplexity over mini-batches: 4808.8013
2018-04-18 18:31:14,548 INFO [Thread-5] hivemall.topicmodel.ProbabilisticTopicModelBaseUDTF: Mean perplexity over mini-batches: 4826.866
2018-04-18 19:49:41,266 INFO [Thread-5] hivemall.topicmodel.ProbabilisticTopicModelBaseUDTF: Mean perplexity over mini-batches: 4834.5537
2018-04-18 21:13:19,976 INFO [Thread-5] hivemall.topicmodel.ProbabilisticTopicModelBaseUDTF: Mean perplexity over mini-batches: 4850.7837
2018-04-18 22:29:45,115 INFO [Thread-5] hivemall.topicmodel.ProbabilisticTopicModelBaseUDTF: Mean perplexity over mini-batches: 4848.2095
2018-04-18 23:48:47,483 INFO [Thread-5] hivemall.topicmodel.ProbabilisticTopicModelBaseUDTF: Mean perplexity over mini-batches: 4867.3945
2018-04-19 01:09:23,242 INFO [Thread-5] hivemall.topicmodel.ProbabilisticTopicModelBaseUDTF: Mean perplexity over mini-batches: 4861.012
2018-04-19 02:24:50,819 INFO [Thread-5] hivemall.topicmodel.ProbabilisticTopicModelBaseUDTF: Mean perplexity over mini-batches: 4873.5796
2018-04-19 03:42:27,052 INFO [Thread-5] hivemall.topicmodel.ProbabilisticTopicModelBaseUDTF: Mean perplexity over mini-batches: 4873.4126
2018-04-19 04:57:24,786 INFO [Thread-5] hivemall.topicmodel.ProbabilisticTopicModelBaseUDTF: Mean perplexity over mini-batches: 4880.2183
2018-04-19 06:12:14,056 INFO [Thread-5] hivemall.topicmodel.ProbabilisticTopicModelBaseUDTF: Mean perplexity over mini-batches: 4889.6064
2018-04-19 07:27:26,864 INFO [Thread-5] hivemall.topicmodel.ProbabilisticTopicModelBaseUDTF: Mean perplexity over mini-batches: 4885.1523
2018-04-19 07:27:26,865 INFO [Thread-5] hivemall.topicmodel.ProbabilisticTopicModelBaseUDTF: Performed 20 iterations of 341,047 training examples on a secondary storage (thus 6,820,940 training updates in total)
2018-04-19 07:27:27,078 WARN [Thread-5] org.apache.hadoop.hive.ql.exec.GroupByOperator: Disable Hash Aggr: #hash table = 99999 #total = 100000 reduction = 0.0 minReduction = 0.5
{code}
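
To quantify the throughput from the log above, here is a minimal Python sketch that computes the gap between consecutive log lines. It assumes each gap approximates one training pass; the log reports 20 iterations but shows only 19 perplexity lines, so that mapping is an assumption rather than something the log confirms. The names `TIMESTAMPS`, `RECORDS_PER_PASS`, and `pass_durations_minutes` are illustrative and not part of Hivemall.

```python
from datetime import datetime

# Timestamps copied from the log above: the temporary-file write (first entry)
# followed by each "Mean perplexity over mini-batches" line.
TIMESTAMPS = [
    "2018-04-18 06:32:01,410",
    "2018-04-18 07:50:41,979",
    "2018-04-18 09:05:55,765",
    "2018-04-18 10:21:48,865",
    "2018-04-18 11:37:47,772",
    "2018-04-18 12:58:02,262",
    "2018-04-18 14:15:19,689",
    "2018-04-18 15:30:12,067",
    "2018-04-18 16:51:48,425",
    "2018-04-18 18:31:14,548",
    "2018-04-18 19:49:41,266",
    "2018-04-18 21:13:19,976",
    "2018-04-18 22:29:45,115",
    "2018-04-18 23:48:47,483",
    "2018-04-19 01:09:23,242",
    "2018-04-19 02:24:50,819",
    "2018-04-19 03:42:27,052",
    "2018-04-19 04:57:24,786",
    "2018-04-19 06:12:14,056",
    "2018-04-19 07:27:26,864",
]

# Record count from the "Wrote 341047 records" log line.
RECORDS_PER_PASS = 341047


def pass_durations_minutes(timestamps):
    """Return the gap in minutes between each pair of consecutive timestamps."""
    times = [datetime.strptime(t, "%Y-%m-%d %H:%M:%S,%f") for t in timestamps]
    return [(b - a).total_seconds() / 60.0 for a, b in zip(times, times[1:])]


durations = pass_durations_minutes(TIMESTAMPS)
total_hours = sum(durations) / 60.0
mean_seconds = sum(durations) / len(durations) * 60.0
# Assumption: one gap corresponds to one full pass over the records.
records_per_second = RECORDS_PER_PASS / mean_seconds

print(f"gaps: {len(durations)}")
print(f"fastest gap: {min(durations):.1f} min, slowest gap: {max(durations):.1f} min")
print(f"total: {total_hours:.1f} h, mean rate: {records_per_second:.1f} records/s")
```

A records-per-second figure like this gives a baseline to compare against after profiling, provided the same input and the same number of iterations are used.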


