Posted to dev@madlib.apache.org by fmcquillan99 <gi...@git.apache.org> on 2018/02/14 00:46:52 UTC

[GitHub] madlib pull request #235: update KNN, DT and RF docs to match recent commits

GitHub user fmcquillan99 opened a pull request:

    https://github.com/apache/madlib/pull/235

    update KNN, DT and RF docs to match recent commits

    KNN
    * describe weighted average in more detail
    
    DT & RF
    * correct some doc errors and omissions
    * update example to show positive variable importance in RF

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/madlib/madlib master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/madlib/pull/235.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #235
    
----
commit d15f9fcc1ea625514aeeb7418f52f3e5b80c532c
Author: Frank McQuillan <fm...@...>
Date:   2018-02-14T00:16:57Z

    update KNN, DT and RF docs to match recent commits

----


---

[GitHub] madlib pull request #235: update KNN, DT and RF docs to match recent commits

Posted by iyerr3 <gi...@git.apache.org>.
GitHub user iyerr3 commented on a diff in the pull request:

    https://github.com/apache/madlib/pull/235#discussion_r168523757
  
    --- Diff: src/ports/postgres/modules/recursive_partitioning/random_forest.sql_in ---
    @@ -208,13 +208,26 @@ forest_train(training_table_name,
     
         <tr>
         <th>dependent_var_levels</th>
    -    <td>itext. For classification, the distinct levels of the dependent variable.</td>
    +    <td>text. For classification, the distinct levels of the dependent variable.</td>
         </tr>
     
         <tr>
         <th>dependent_var_type</th>
         <td>text. The type of dependent variable.</td>
         </tr>
    +
    +    <tr>
    +    <th>independent_var_types</th>
    +    <td>text. A comma separated string for the types of independent variables.</td>
    +    </tr>
    +
    +    <tr>
    +    <th>null_proxy</th>
    +    <td>text. Describes how NULLs are handled.  If NULL is not 
    +    treated as a separate categorical variable, this will be blank.
    --- End diff --
    
    Again, `this will be NULL` is more appropriate.


---

[GitHub] madlib issue #235: update KNN, DT and RF docs to match recent commits

Posted by asfgit <gi...@git.apache.org>.
GitHub user asfgit commented on the issue:

    https://github.com/apache/madlib/pull/235
  
    
    Refer to this link for build results (access rights to CI server needed): 
    https://builds.apache.org/job/madlib-pr-build/347/



---

[GitHub] madlib pull request #235: update KNN, DT and RF docs to match recent commits

Posted by asfgit <gi...@git.apache.org>.
GitHub user asfgit closed the pull request at:

    https://github.com/apache/madlib/pull/235


---

[GitHub] madlib pull request #235: update KNN, DT and RF docs to match recent commits

Posted by fmcquillan99 <gi...@git.apache.org>.
GitHub user fmcquillan99 commented on a diff in the pull request:

    https://github.com/apache/madlib/pull/235#discussion_r168557191
  
    --- Diff: src/ports/postgres/modules/recursive_partitioning/random_forest.sql_in ---
    @@ -208,13 +208,26 @@ forest_train(training_table_name,
     
         <tr>
         <th>dependent_var_levels</th>
    -    <td>itext. For classification, the distinct levels of the dependent variable.</td>
    +    <td>text. For classification, the distinct levels of the dependent variable.</td>
         </tr>
     
         <tr>
         <th>dependent_var_type</th>
         <td>text. The type of dependent variable.</td>
         </tr>
    +
    +    <tr>
    +    <th>independent_var_types</th>
    +    <td>text. A comma separated string for the types of independent variables.</td>
    +    </tr>
    +
    +    <tr>
    +    <th>null_proxy</th>
    +    <td>text. Describes how NULLs are handled.  If NULL is not 
    +    treated as a separate categorical variable, this will be blank.
    --- End diff --
    
    Thanks, I made the suggested changes.


---

[GitHub] madlib issue #235: update KNN, DT and RF docs to match recent commits

Posted by asfgit <gi...@git.apache.org>.
GitHub user asfgit commented on the issue:

    https://github.com/apache/madlib/pull/235
  
    
    Refer to this link for build results (access rights to CI server needed): 
    https://builds.apache.org/job/madlib-pr-build/344/



---

[GitHub] madlib pull request #235: update KNN, DT and RF docs to match recent commits

Posted by iyerr3 <gi...@git.apache.org>.
GitHub user iyerr3 commented on a diff in the pull request:

    https://github.com/apache/madlib/pull/235#discussion_r168523662
  
    --- Diff: src/ports/postgres/modules/recursive_partitioning/decision_tree.sql_in ---
    @@ -355,6 +355,19 @@ tree_train(
         <th>independent_var_types</th>
         <td>TEXT. A comma separated string for the types of independent variables.</td>
         </tr>
    +
    +    <tr>
    +    <th>n_folds</th>
    +    <td>BIGINT. Number of cross-validation folds used.</td>
    +    </tr>
    +
    +    <tr>
    +    <th>null_proxy</th>
    +    <td>TEXT. Describes how NULLs are handled.  If NULL is not 
    +    treated as a separate categorical variable, this will be blank.
    --- End diff --
    
    I suggest replacing `this will be blank` with `this will be NULL`. psql displays NULL as blank by default, but that display setting can easily be changed.
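
    A minimal sketch of that display setting, assuming an interactive psql session. `\pset null` is the psql meta-command that sets the string printed in place of NULL; the column aliases are hypothetical and chosen only for this example.

    ```sql
    -- With default psql settings, NULL and the empty string both print as blank.
    SELECT NULL::text AS null_value, ''::text AS empty_string;

    -- Tell psql to print a visible marker in place of NULL.
    \pset null '(null)'

    -- The first column now prints (null); the second still prints blank.
    SELECT NULL::text AS null_value, ''::text AS empty_string;
    ```

    Because the rendering depends on the client, describing the stored value as NULL stays accurate regardless of how psql is configured.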


---