Posted to issues@madlib.apache.org by "Rahul Iyer (JIRA)" <ji...@apache.org> on 2016/02/10 02:01:25 UTC

[jira] [Resolved] (MADLIB-933) MADlib LDA term_frequency function bugs

     [ https://issues.apache.org/jira/browse/MADLIB-933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rahul Iyer resolved MADLIB-933.
-------------------------------
    Resolution: Fixed

Completed with commit [5952569|https://github.com/apache/incubator-madlib/commit/5952569bff0d721a1c54a4b6ac9b60a64f0111e9]

> MADlib LDA term_frequency function bugs
> ---------------------------------------
>
>                 Key: MADLIB-933
>                 URL: https://issues.apache.org/jira/browse/MADLIB-933
>             Project: Apache MADlib
>          Issue Type: Bug
>          Components: Module: Parallel Latent Dirichlet Allocation
>            Reporter: Srivatsan
>            Assignee: Rahul Iyer
>             Fix For: v1.9
>
>
> 1. The madlib.term_frequency() function (http://doc.madlib.net/latest/group__grp__text__utilities.html) takes the docid column and the words column as inputs, which suggests that the columns can be named anything. However, it complains if the columns are not actually named "docid" and "words".
> 2. Secondly, it takes an output table (e.g. documents_tf) as well as an input table, but it creates a temp table for the vocabulary (therefore I can't specify a schema name like vatsan.documents_tf). This is annoying for two reasons:
> a. The user can't immediately tell what the vocabulary table is for, or why it is a temp table while the documents_tf table itself is not.
> b. If I have a real-world dataset for LDA, my models are going to run for quite some time. I may even terminate one session and run the LDA model in another session, which would mean the vocabulary temp table won't be available in the other session (or will have been dropped).
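
The session-scoping problem in point 2b can be illustrated with a small, self-contained sketch. It is only an analogy: it uses Python's built-in sqlite3 module rather than PostgreSQL/Greenplum and MADlib, and the column layouts of documents_tf and vocabulary are hypothetical. What it shows is the general behavior the reporter describes: a temp table exists only for the connection that created it, while a regular table is visible to other sessions.

```python
import os
import sqlite3
import tempfile


def _table_exists(conn: sqlite3.Connection, name: str) -> bool:
    """Return True if `name` can be queried from this connection."""
    try:
        conn.execute(f"SELECT 1 FROM {name} LIMIT 1")
        return True
    except sqlite3.OperationalError:
        # SQLite reports "no such table" as an OperationalError.
        return False


def tables_seen_by_second_session() -> dict:
    """Create a regular and a TEMP table in one session, then check a second one."""
    with tempfile.TemporaryDirectory() as tmpdir:
        db_path = os.path.join(tmpdir, "demo.db")

        # Session A stands in for the session that ran term_frequency().
        # The column layouts below are hypothetical, not MADlib's actual schema.
        session_a = sqlite3.connect(db_path)
        session_a.execute(
            "CREATE TABLE documents_tf (docid INTEGER, wordid INTEGER, count INTEGER)"
        )
        session_a.execute("CREATE TEMP TABLE vocabulary (wordid INTEGER, word TEXT)")
        session_a.commit()

        # Session B stands in for a later session that wants to run LDA.
        session_b = sqlite3.connect(db_path)
        result = {
            "documents_tf": _table_exists(session_b, "documents_tf"),
            "vocabulary": _table_exists(session_b, "vocabulary"),
        }

        session_a.close()
        session_b.close()
        return result


if __name__ == "__main__":
    print(tables_seen_by_second_session())
```

Run as a script, this prints {'documents_tf': True, 'vocabulary': False}: the second session can read the regular output table but not the temp vocabulary table, which is the situation described above for a long-running LDA workflow split across sessions.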


