Posted to reviews@spark.apache.org by yueguoguo <gi...@git.apache.org> on 2018/08/13 08:27:40 UTC

[GitHub] spark pull request #22090: [DOCS] Fixed NDCG formula issues

GitHub user yueguoguo opened a pull request:

    https://github.com/apache/spark/pull/22090

    [DOCS] Fixed NDCG formula issues

    When j is 0, log(j+1) is 0, which leads to a division-by-zero issue.
    
    ## What changes were proposed in this pull request?
    
    (Please fill in changes proposed in this fix)
    
    ## How was this patch tested?
    
    (Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests)
    (If this patch involves UI changes, please attach a screenshot; otherwise, remove this)
    
    Please review http://spark.apache.org/contributing.html before opening a pull request.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/yueguoguo/spark patch-1

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22090.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #22090
    
----
commit b3e00edc3db4f05c9cb4eabfb21cbb36b86d32b0
Author: Zhang Le <yu...@...>
Date:   2018-08-13T08:27:24Z

    [DOCS] Fixed NDCG formula issues
    
    When j is 0, log(j+1) will be 0, and this leads to division by 0 issue.

----
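
The division-by-zero problem described in the commit message can be reproduced in a few lines. The sketch below is illustrative only: `discount` is a hypothetical helper, not a Spark function, and it assumes the natural logarithm used in the documented formula.

```python
import math


def discount(j):
    """Discount term 1 / ln(j + 1) from the original formula (hypothetical helper)."""
    return 1.0 / math.log(j + 1)


# With the sum starting at j = 0, the first term divides by ln(1) = 0.
try:
    discount(0)
    zero_index_fails = False
except ZeroDivisionError:
    zero_index_fails = True

# With the sum starting at j = 1, the first term is 1 / ln(2), which is finite.
first_term = discount(1)
```

Starting the sum at j = 1, as this patch proposes, is one way to keep every denominator nonzero.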


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22090: [DOCS] Fixed NDCG formula issues

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22090
  
    Can one of the admins verify this patch?


---



[GitHub] spark issue #22090: [DOCS] Fixed NDCG formula issues

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22090
  
    **[Test build #4281 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4281/testReport)** for PR 22090 at commit [`f87cf61`](https://github.com/apache/spark/commit/f87cf61c3a4ef41aeab4c9368b7fc9aa4983ab3e).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #22090: [DOCS] Fixed NDCG formula issues

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the issue:

    https://github.com/apache/spark/pull/22090
  
    Merged to master/2.3


---



[GitHub] spark pull request #22090: [DOCS] Fixed NDCG formula issues

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22090#discussion_r210772686
  
    --- Diff: docs/mllib-evaluation-metrics.md ---
    @@ -461,11 +461,11 @@ $$rel_D(r) = \begin{cases}1 & \text{if $r \in D$}, \\ 0 & \text{otherwise}.\end{
         <tr>
           <td>Normalized Discounted Cumulative Gain</td>
           <td>
    -        $NDCG(k)=\frac{1}{M} \sum_{i=0}^{M-1} {\frac{1}{IDCG(D_i, k)}\sum_{j=0}^{n-1}
    +        $NDCG(k)=\frac{1}{M} \sum_{i=0}^{M-1} {\frac{1}{IDCG(D_i, k)}\sum_{j=1}^{n}
               \frac{rel_{D_i}(R_i(j))}{\text{ln}(j+1)}} \\
             \text{Where} \\
             \hspace{5 mm} n = \text{min}\left(\text{max}\left(|R_i|,|D_i|\right),k\right) \\
    -        \hspace{5 mm} IDCG(D, k) = \sum_{j=0}^{\text{min}(\left|D\right|, k) - 1} \frac{1}{\text{ln}(j+1)}$
    +        \hspace{5 mm} IDCG(D, k) = \sum_{j=1}^{\text{min}(\left|D\right|, k)} \frac{1}{\text{ln}(j+1)}$
           </td>
           <td>
             <a href="https://en.wikipedia.org/wiki/Information_retrieval#Discounted_cumulative_gain">NDCG at k</a> is a
    --- End diff --
    
    We can update the link here to https://en.wikipedia.org/wiki/Discounted_cumulative_gain#Normalized_DCG


---



[GitHub] spark issue #22090: [DOCS] Fixed NDCG formula issues

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the issue:

    https://github.com/apache/spark/pull/22090
  
    Ping @yueguoguo 


---



[GitHub] spark pull request #22090: [DOCS] Fixed NDCG formula issues

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/22090


---



[GitHub] spark issue #22090: [DOCS] Fixed NDCG formula issues

Posted by yueguoguo <gi...@git.apache.org>.
Github user yueguoguo commented on the issue:

    https://github.com/apache/spark/pull/22090
  
    @srowen Thanks Sean. Good suggestion and I have pushed new commits. 


---



[GitHub] spark issue #22090: [DOCS] Fixed NDCG formula issues

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22090
  
    **[Test build #4281 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4281/testReport)** for PR 22090 at commit [`f87cf61`](https://github.com/apache/spark/commit/f87cf61c3a4ef41aeab4c9368b7fc9aa4983ab3e).


---



[GitHub] spark pull request #22090: [DOCS] Fixed NDCG formula issues

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22090#discussion_r210772634
  
    --- Diff: docs/mllib-evaluation-metrics.md ---
    @@ -461,11 +461,11 @@ $$rel_D(r) = \begin{cases}1 & \text{if $r \in D$}, \\ 0 & \text{otherwise}.\end{
         <tr>
           <td>Normalized Discounted Cumulative Gain</td>
           <td>
    -        $NDCG(k)=\frac{1}{M} \sum_{i=0}^{M-1} {\frac{1}{IDCG(D_i, k)}\sum_{j=0}^{n-1}
    +        $NDCG(k)=\frac{1}{M} \sum_{i=0}^{M-1} {\frac{1}{IDCG(D_i, k)}\sum_{j=1}^{n}
    --- End diff --
    
    We do need to fix this, but starting the sum at j=1 makes the subscripts incorrect for R_i(j). I think the expression should instead change to ln(j+2) on the next line, which is what the code does. For consistency I'd do the same below too.
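
To make the indexing concrete, here is a minimal sketch of NDCG at k for a single query, using zero-based positions and an ln(j+2) discount as the comment suggests. It is written in Python for illustration and is not Spark's implementation. The function name `ndcg_at_k`, its arguments, and the choice to return 0.0 when there are no relevant documents are all assumptions.

```python
import math


def ndcg_at_k(predicted, relevant, k):
    """NDCG at k for one query with binary relevance (illustrative sketch).

    predicted: ranked list of document ids, best first (R_i in the formula).
    relevant:  collection of ground-truth relevant ids (D_i in the formula).
    """
    relevant_set = set(relevant)
    if not relevant_set:
        return 0.0  # assumption: no relevant documents means a score of 0

    # n = min(max(|R_i|, |D_i|), k), as in the documented formula.
    n = min(max(len(predicted), len(relevant_set)), k)

    # Zero-based j with ln(j + 2): the first position is discounted by ln(2).
    dcg = 0.0
    for j in range(n):
        if j < len(predicted) and predicted[j] in relevant_set:
            dcg += 1.0 / math.log(j + 2)

    # Ideal DCG: every one of the first min(|D|, k) positions is relevant.
    ideal_n = min(len(relevant_set), k)
    idcg = sum(1.0 / math.log(j + 2) for j in range(ideal_n))
    return dcg / idcg
```

For example, `ndcg_at_k(["x", "a"], ["a"], 2)` evaluates to ln(2)/ln(3), about 0.63, because the only relevant document sits at zero-based position 1 instead of position 0.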


---



[GitHub] spark pull request #22090: [DOCS] Fixed NDCG formula issues

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22090#discussion_r211133578
  
    --- Diff: docs/mllib-evaluation-metrics.md ---
    @@ -462,13 +462,13 @@ $$rel_D(r) = \begin{cases}1 & \text{if $r \in D$}, \\ 0 & \text{otherwise}.\end{
           <td>Normalized Discounted Cumulative Gain</td>
           <td>
             $NDCG(k)=\frac{1}{M} \sum_{i=0}^{M-1} {\frac{1}{IDCG(D_i, k)}\sum_{j=0}^{n-1}
    -          \frac{rel_{D_i}(R_i(j))}{\text{ln}(j+1)}} \\
    +          \frac{rel_{D_i}(R_i(j))}{\text{ln}(j+2)}} \\
             \text{Where} \\
             \hspace{5 mm} n = \text{min}\left(\text{max}\left(|R_i|,|D_i|\right),k\right) \\
    -        \hspace{5 mm} IDCG(D, k) = \sum_{j=0}^{\text{min}(\left|D\right|, k) - 1} \frac{1}{\text{ln}(j+1)}$
    +        \hspace{5 mm} IDCG(D, k) = \sum_{j=0}^{\text{min}(\left|D\right|, k)} \frac{1}{\text{ln}(j+2)}$
    --- End diff --
    
    @yueguoguo I think the "- 1" in the upper bound of the IDCG sum needs to be restored here, since the sum still starts at j=0.
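
With the "- 1" restored, the two sums in this diff would presumably read as follows. This is a reconstruction from the diff lines and the comment above, not a copy of the merged documentation:

```latex
NDCG(k) = \frac{1}{M} \sum_{i=0}^{M-1} \frac{1}{IDCG(D_i, k)}
          \sum_{j=0}^{n-1} \frac{rel_{D_i}(R_i(j))}{\ln(j+2)},
\qquad
IDCG(D, k) = \sum_{j=0}^{\min(|D|, k) - 1} \frac{1}{\ln(j+2)}
```

Both sums then run over zero-based positions, and the first discount is 1/ln(2) rather than 1/ln(1).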


---
