Posted to issues@madlib.apache.org by "Frank McQuillan (JIRA)" <ji...@apache.org> on 2017/09/22 19:44:01 UTC
[jira] [Updated] (MADLIB-1160) Change term frequency indexes to start at 1 not 0
[ https://issues.apache.org/jira/browse/MADLIB-1160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Frank McQuillan updated MADLIB-1160:
------------------------------------
Description:
Context
Please see this thread from the user mailing list
http://mail-archives.apache.org/mod_mbox/incubator-madlib-user/201709.mbox/%3CCA%2B9JwyW78-aoe-NCQZc_iMuqW6SpKXs0H4JeTMfo3b-G4cxm0w%40mail.gmail.com%3E
Currently term frequency
http://madlib.apache.org/docs/latest/group__grp__text__utilities.html
creates indexes that start at 0 (e.g., docid)
whereas LDA
http://madlib.apache.org/docs/latest/group__grp__lda.html
creates indexes that start at 1 (e.g., topicid)
Since these are often used together, they should be consistent. Recommend changing term frequency to start at 1.
Setting to 2.0 fix in case this is a breaking change for upgrading models.
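As a rough illustration of the mismatch described above, here is a minimal Python sketch of shifting 0-based index columns to 1-based before combining term frequency output with LDA output. The function name shift_indexes, the row layout, and the column names other than docid are hypothetical and are not part of MADlib; which columns actually need shifting would have to be checked against the real output tables.

```python
def shift_indexes(rows, columns, offset=1):
    """Return new row dicts with each named index column shifted by offset.

    The input rows are left unmodified; columns not listed are copied as-is.
    """
    return [
        {key: (value + offset if key in columns else value)
         for key, value in row.items()}
        for row in rows
    ]


# Hypothetical 0-based term frequency rows (layout is illustrative only).
tf_rows = [
    {"docid": 0, "word": "madlib", "count": 2},
    {"docid": 1, "word": "lda", "count": 1},
]

# Shift only the index column so numbering starts at 1, as LDA's does.
one_based = shift_indexes(tf_rows, columns={"docid"})
```

Applying a shift like this on the user side is only a workaround; the change proposed in this issue would make it unnecessary.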
was:
Currently term frequency
http://madlib.apache.org/docs/latest/group__grp__text__utilities.html
creates indexes that start at 0 (e.g., docid)
whereas LDA
http://madlib.apache.org/docs/latest/group__grp__lda.html
creates indexes that start at 1 (e.g., topicid)
Since these are often used together, they should be consistent. Recommend changing term frequency to start at 1.
Setting to 2.0 fix in case this is a breaking change for upgrading models.
> Change term frequency indexes to start at 1 not 0
> -------------------------------------------------
>
> Key: MADLIB-1160
> URL: https://issues.apache.org/jira/browse/MADLIB-1160
> Project: Apache MADlib
> Issue Type: Improvement
> Components: Module: Utilities
> Reporter: Frank McQuillan
> Fix For: v2.0