Posted to issues@madlib.apache.org by "Frank McQuillan (JIRA)" <ji...@apache.org> on 2017/09/22 19:44:01 UTC

[jira] [Updated] (MADLIB-1160) Change term frequency indexes to start at 1 not 0

     [ https://issues.apache.org/jira/browse/MADLIB-1160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Frank McQuillan updated MADLIB-1160:
------------------------------------
    Description: 
Context

Please see this thread from the user mailing list:
http://mail-archives.apache.org/mod_mbox/incubator-madlib-user/201709.mbox/%3CCA%2B9JwyW78-aoe-NCQZc_iMuqW6SpKXs0H4JeTMfo3b-G4cxm0w%40mail.gmail.com%3E

Currently, term frequency
(http://madlib.apache.org/docs/latest/group__grp__text__utilities.html)
creates indexes that start at 0 (e.g., docid), whereas LDA
(http://madlib.apache.org/docs/latest/group__grp__lda.html)
creates indexes that start at 1 (e.g., topicid).

Since these modules are often used together, their indexing should be consistent. Recommend changing term frequency to start at 1.

Setting Fix Version to 2.0 in case this is a breaking change for upgrading models.
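To make the proposed convention concrete, here is a minimal Python sketch. It is not MADlib code: `build_vocabulary`, `term_counts`, and the `(docid, wordid, count)` row layout are hypothetical stand-ins for term frequency output. The only difference between the current (0-based) and proposed (1-based) behavior is the `start` value passed to `enumerate`.

```python
from collections import Counter
from typing import Dict, List, Tuple


def build_vocabulary(docs: List[List[str]], start: int = 1) -> Dict[str, int]:
    """Assign each distinct word an integer id, beginning at `start`.

    Hypothetical helper: start=0 mimics the current 0-based behavior,
    start=1 the proposed 1-based convention that matches LDA topic ids.
    """
    # Sort the distinct words so that ids are deterministic.
    words = sorted({w for doc in docs for w in doc})
    return {w: i for i, w in enumerate(words, start=start)}


def term_counts(docs: List[List[str]], start: int = 1) -> List[Tuple[int, int, int]]:
    """Emit hypothetical (docid, wordid, count) rows using one index base."""
    vocab = build_vocabulary(docs, start)
    rows = []
    for docid, doc in enumerate(docs, start=start):
        # Sort by word so the row order is deterministic.
        for word, count in sorted(Counter(doc).items()):
            rows.append((docid, vocab[word], count))
    return rows


if __name__ == "__main__":
    docs = [["b", "a", "b"], ["c"]]
    print(term_counts(docs, start=0))  # [(0, 0, 1), (0, 1, 2), (1, 2, 1)]
    print(term_counts(docs, start=1))  # [(1, 1, 1), (1, 2, 2), (2, 3, 1)]
```

With `start=1`, the first document and the first word both get id 1, the same base LDA uses for topicid. Every id shifts up by one relative to the 0-based output, which is the kind of change that could break models built on the old ids.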


> Change term frequency indexes to start at 1 not 0
> -------------------------------------------------
>
>                 Key: MADLIB-1160
>                 URL: https://issues.apache.org/jira/browse/MADLIB-1160
>             Project: Apache MADlib
>          Issue Type: Improvement
>          Components: Module: Utilities
>            Reporter: Frank McQuillan
>             Fix For: v2.0



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)