Posted to issues@madlib.apache.org by "Himanshu Pandey (Jira)" <ji...@apache.org> on 2019/08/19 16:05:00 UTC

[jira] [Comment Edited] (MADLIB-1351) Add stopping criteria on perplexity to LDA

    [ https://issues.apache.org/jira/browse/MADLIB-1351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16909396#comment-16909396 ] 

Himanshu Pandey edited comment on MADLIB-1351 at 8/19/19 4:04 PM:
------------------------------------------------------------------

[~fmcquillan]
{code:java}
If 'iter_num=5' and 'evaluate_every=1', then 'perplexity_iters' value would be {1,2,3,4,5}{code}
However, we also update the model table one final time after all iterations are completed. So perplexity will have 6 values, something like this:
{code:java}
{74.9531135523,70.7078733742,69.531331269,68.3480936661,72.3446381087,68.940249051}{code}
What should we record in perplexity_iters for the "final update" of the model table?
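
To make the mismatch concrete, here is a minimal Python sketch (not MADlib code). The helper name `perplexity_iters` and the rule that perplexity is evaluated at every multiple of 'evaluate_every' are assumptions for illustration; only the example values come from this comment.

```python
def perplexity_iters(iter_num, evaluate_every):
    """Iterations at which perplexity would be evaluated.

    Assumption: evaluation happens at every multiple of evaluate_every,
    and a value of 0 or less disables evaluation.
    """
    if evaluate_every <= 0:
        return []
    return list(range(evaluate_every, iter_num + 1, evaluate_every))


iters = perplexity_iters(iter_num=5, evaluate_every=1)

# Perplexity values from the example above: 5 iterations plus the final update.
perplexity = [74.9531135523, 70.7078733742, 69.531331269,
              68.3480936661, 72.3446381087, 68.940249051]

# Number of perplexity values that have no matching entry in perplexity_iters.
unlabelled = len(perplexity) - len(iters)
print(iters, unlabelled)
```

With these inputs the sixth perplexity value is the one that has no corresponding iteration number.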


was (Author: hpandey@pivotal.io):
[~fmcquillan]


{code}If 'iter_num=5' and 'evaluate_every=1', then 'perplexity_iters' value would be {1,2,3,4,5}{code}

However, we also update the model table one final time after all iterations are completed. So perplexity will have 6 values, something like this:
{code:java}
{73.7550415613786,70.5237666023843,70.6146354978257,71.6661000896055,69.7403205794835, 72.8881000896057}{code}
What should we record in perplexity_iters for the "final update" of the model table?

> Add stopping criteria on perplexity to LDA
> ------------------------------------------
>
>                 Key: MADLIB-1351
>                 URL: https://issues.apache.org/jira/browse/MADLIB-1351
>             Project: Apache MADlib
>          Issue Type: Improvement
>          Components: Module: Parallel Latent Dirichlet Allocation
>            Reporter: Frank McQuillan
>            Assignee: Himanshu Pandey
>            Priority: Major
>             Fix For: v1.17
>
>
> In LDA 
> http://madlib.apache.org/docs/latest/group__grp__lda.html
> add a stopping criterion based on perplexity rather than just the number of iterations.
> Suggested approach is to do what scikit-learn does
> https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.LatentDirichletAllocation.html
> evaluate_every : int, optional (default=0)
> How often to evaluate perplexity. Set it to 0 or negative number to not evaluate perplexity in training at all. Evaluating perplexity can help you check convergence in training process, but it will also increase total training time. Evaluating perplexity in every iteration might increase training time up to two-fold.
> perplexity_tol : float, optional (default=1e-1)
> Perplexity tolerance to stop iterating. Only used when evaluate_every is greater than 0.
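
A rough Python sketch of how the two parameters quoted above could interact in a training loop. This is not the MADlib implementation: the function `run_until_converged`, the callable `perplexity_at`, and the rule "stop when the change between two consecutive evaluations is below perplexity_tol" are assumptions for illustration.

```python
def run_until_converged(perplexity_at, iter_num, evaluate_every, perplexity_tol=1e-1):
    """Run up to iter_num iterations, stopping early on small perplexity change.

    perplexity_at is a hypothetical callable that returns the perplexity
    after a given iteration; a real implementation would train the model
    inside the loop and compute perplexity from it.
    """
    iters, values = [], []
    last = None
    for it in range(1, iter_num + 1):
        # (one training iteration would run here)
        if evaluate_every > 0 and it % evaluate_every == 0:
            current = perplexity_at(it)
            iters.append(it)
            values.append(current)
            # Assumed rule: stop once two consecutive evaluations are close.
            if last is not None and abs(last - current) < perplexity_tol:
                break
            last = current
    return iters, values


# Made-up perplexity values, for illustration only.
fake = {1: 80.0, 2: 75.0, 3: 74.95, 4: 74.9}
iters, values = run_until_converged(fake.get, iter_num=10, evaluate_every=1)
print(iters, values)
```

Here the loop stops after iteration 3, because the change from 75.0 to 74.95 is smaller than the default tolerance of 0.1. With evaluate_every set to 0 no perplexity is computed and all iter_num iterations run.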



--
This message was sent by Atlassian Jira
(v8.3.2#803003)