Posted to reviews@spark.apache.org by Krimit <gi...@git.apache.org> on 2017/02/05 20:21:42 UTC

[GitHub] spark pull request #16811: [SPARK-17629][ML] methods to return synonyms dire...

GitHub user Krimit opened a pull request:

    https://github.com/apache/spark/pull/16811

    [SPARK-17629][ML] methods to return synonyms directly

    ## What changes were proposed in this pull request?
    Provide methods to return synonyms directly, without wrapping them in a DataFrame.
    
    In performance-sensitive applications (such as user-facing APIs), the round trip to and from DataFrames is costly and unnecessary.
    
    The name for these methods is tricky. If anyone has a better suggestion than ``findSynonymsLocal``, I'm happy to hear it.
    
    ## How was this patch tested?
    
    Existing Word2Vec tests.
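To make the proposed change concrete, here is a minimal, Spark-free sketch of what a local synonym lookup returns: an `Array[(String, Double)]` of (word, cosine similarity) pairs, highest similarity first, instead of a DataFrame. The object `LocalSynonymsSketch`, its methods `synonymsForVector` and `synonymsForWord`, and the toy vectors are hypothetical illustrations, not Spark's `Word2VecModel` code. The one behaviour taken from the thread is the one described in the Scaladoc quoted in the review diffs: the word-based lookup excludes the query word, while the vector-based lookup may return it.

```scala
// Hypothetical, Spark-free sketch of a local "find synonyms" lookup.
// NOT Spark's implementation; it only illustrates the shape of the result.
object LocalSynonymsSketch {

  // Cosine similarity between two dense vectors of equal length.
  def cosine(a: Array[Double], b: Array[Double]): Double = {
    require(a.length == b.length, "vectors must have the same dimension")
    val dot = a.zip(b).map { case (x, y) => x * y }.sum
    val normA = math.sqrt(a.map(x => x * x).sum)
    val normB = math.sqrt(b.map(x => x * x).sum)
    if (normA == 0.0 || normB == 0.0) 0.0 else dot / (normA * normB)
  }

  // Top `num` words by cosine similarity to `vec`, highest first.
  // A word equal to `exclude` (if given) is skipped.
  def synonymsForVector(
      vectors: Map[String, Array[Double]],
      vec: Array[Double],
      num: Int,
      exclude: Option[String] = None): Array[(String, Double)] =
    vectors.toArray
      .filterNot { case (w, _) => exclude.contains(w) }
      .map { case (w, v) => (w, cosine(vec, v)) }
      .sortWith((x, y) => x._2 > y._2) // descending similarity
      .take(num)

  // Word-based lookup: same as above, but never returns the word itself.
  def synonymsForWord(
      vectors: Map[String, Array[Double]],
      word: String,
      num: Int): Array[(String, Double)] = {
    val vec = vectors.getOrElse(
      word, throw new IllegalStateException(s"$word not in vocabulary"))
    synonymsForVector(vectors, vec, num, exclude = Some(word))
  }
}
```

In the PR itself, the DataFrame-returning `findSynonyms` becomes a thin wrapper around the local result (`spark.createDataFrame(...).toDF("word", "similarity")`), so callers that only need the array can skip the DataFrame round trip entirely.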


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/Krimit/spark w2vFindSynonymsLocal

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/16811.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #16811
    
----
commit 51c260dbf9b0f85d24f8a313dd692763e1cdfb09
Author: Asher Krim <ak...@hubspot.com>
Date:   2017-02-05T20:05:32Z

    provide methods to return synonyms directly, without wrapping them in a dataframe
    
    In performance sensitive applications (such as user facing apis) the roundtrip to and from dataframes is costly and unnecessary

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #16811: [SPARK-17629][ML] methods to return synonyms dire...

Posted by MLnick <gi...@git.apache.org>.
Github user MLnick commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16811#discussion_r104861465
  
    --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/Word2VecSuite.scala ---
    @@ -134,13 +134,20 @@ class Word2VecSuite extends SparkFunSuite with MLlibTestSparkContext with Defaul
           .fit(docDF)
     
         val expectedSimilarity = Array(0.2608488929093532, -0.8271274846926078)
    -    val (synonyms, similarity) = model.findSynonyms("a", 2).rdd.map {
    +    val result = model.findSynonyms("a", 2).rdd.map {
           case Row(w: String, sim: Double) => (w, sim)
    -    }.collect().unzip
    +    }.collect()
    +    val (synonyms, similarity) = result.unzip
     
         assert(synonyms === Array("b", "c"))
         expectedSimilarity.zip(similarity).foreach {
    -      case (expected, actual) => assert(math.abs((expected - actual) / expected) < 1E-5)
    +      case (expected, actual) => assert(expected ~== actual absTol 1E-5)
    +    }
    +
    +    result.zip(model.findSynonymsArray("a", 2)).foreach {
    --- End diff --
    
    It has. Usually in this case, either a Map as suggested or explicitly sorting the result of `collect` before comparison is best.
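The second option mentioned here, sorting before comparison, can be sketched in plain Scala. `UnorderedCompareSketch.sameSynonyms` is a hypothetical helper, not part of Spark's test utilities: it sorts both (word, similarity) arrays by word, so the row order produced by `collect()` no longer matters, and then compares the paired similarities with an absolute tolerance.

```scala
// Hypothetical helper: order-insensitive comparison of (word, similarity)
// results by sorting both sides on the word before pairing them up.
object UnorderedCompareSketch {
  def sameSynonyms(
      a: Array[(String, Double)],
      b: Array[(String, Double)],
      absTol: Double): Boolean =
    a.length == b.length &&
      a.sortBy(_._1).zip(b.sortBy(_._1)).forall {
        case ((w1, s1), (w2, s2)) => w1 == w2 && math.abs(s1 - s2) <= absTol
      }
}
```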


[GitHub] spark issue #16811: [SPARK-17629][ML] methods to return synonyms directly

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on the issue:

    https://github.com/apache/spark/pull/16811
  
    LGTM
    Merging with master
    Thank you!


[GitHub] spark issue #16811: [SPARK-17629][ML] methods to return synonyms directly

Posted by Krimit <gi...@git.apache.org>.
Github user Krimit commented on the issue:

    https://github.com/apache/spark/pull/16811
  
    Thanks for your comments @jkbradley, updated. I also took the opportunity to replace the kinda-janky fuzzyEquals in the test with the ``TestingUtils`` implementation.
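This swap changes the semantics of the check, not just its spelling: the old assertion in the test diff measured relative error, `math.abs((expected - actual) / expected) < 1E-5`, while `expected ~== actual absTol 1E-5` bounds the absolute difference. The helpers below are hypothetical plain-Scala stand-ins, not Spark's `TestingUtils`; they mirror only the arithmetic of the two styles and assume a strict `<` comparison for both.

```scala
// Hypothetical helpers contrasting the two tolerance styles in the diff.
// These are NOT Spark's TestingUtils; they only reproduce the arithmetic.
object ToleranceSketch {
  // Relative check from the original test: |(expected - actual) / expected| < eps.
  // Always false when expected == 0 (division by zero yields NaN or Infinity),
  // and its absolute slack grows with |expected|.
  def relClose(expected: Double, actual: Double, eps: Double): Boolean =
    math.abs((expected - actual) / expected) < eps

  // Absolute check in the spirit of `absTol`: |expected - actual| < eps,
  // independent of the magnitude of the values.
  def absClose(expected: Double, actual: Double, eps: Double): Boolean =
    math.abs(expected - actual) < eps
}
```

For similarities of order one, like the expected values 0.26 and -0.83 in this test, the two checks are within roughly an order of magnitude of each other in strictness; they diverge mainly near zero and for large magnitudes.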


[GitHub] spark issue #16811: [SPARK-17629][ML] methods to return synonyms directly

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16811
  
    **[Test build #72961 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72961/testReport)** for PR 16811 at commit [`7988385`](https://github.com/apache/spark/commit/7988385e8412a0176d1595f9d59dda843d0e4e23).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark issue #16811: [SPARK-17629][ML] methods to return synonyms directly

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16811
  
    Merged build finished. Test PASSed.


[GitHub] spark pull request #16811: [SPARK-17629][ML] methods to return synonyms dire...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16811#discussion_r99572391
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala ---
    @@ -232,19 +232,40 @@ class Word2VecModel private[ml] (
       @Since("1.5.0")
       def findSynonyms(word: String, num: Int): DataFrame = {
         val spark = SparkSession.builder().getOrCreate()
    -    spark.createDataFrame(wordVectors.findSynonyms(word, num)).toDF("word", "similarity")
    +    spark.createDataFrame(findSynonymsLocal(word, num)).toDF("word", "similarity")
       }
     
       /**
    -   * Find "num" number of words whose vector representation most similar to the supplied vector.
    +   * Find "num" number of words whose vector representation is most similar to the supplied vector.
        * If the supplied vector is the vector representation of a word in the model's vocabulary,
        * that word will be in the results.  Returns a dataframe with the words and the cosine
        * similarities between the synonyms and the given word vector.
        */
       @Since("2.0.0")
       def findSynonyms(vec: Vector, num: Int): DataFrame = {
         val spark = SparkSession.builder().getOrCreate()
    -    spark.createDataFrame(wordVectors.findSynonyms(vec, num)).toDF("word", "similarity")
    +    spark.createDataFrame(findSynonymsLocal(vec, num)).toDF("word", "similarity")
    +  }
    +
    +  /**
    +   * Find "num" number of words whose vector representation is most similar to the supplied vector.
    +   * If the supplied vector is the vector representation of a word in the model's vocabulary,
    +   * that word will be in the results. Returns an array of the words and the cosine
    +   * similarities between the synonyms and the given word vector.
    +   */
    +  @Since("2.2.0")
    +  def findSynonymsLocal(vec: Vector, num: Int): Array[(String, Double)] = {
    +    wordVectors.findSynonyms(vec, num)
    +  }
    +
    +  /**
    +   * Find "num" number of words closest in similarity to the given word, not
    +   * including the word itself. Returns a dataframe with the words and the
    --- End diff --
    
    (This doesn't return a DataFrame)


[GitHub] spark issue #16811: [SPARK-17629][ML] methods to return synonyms directly

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16811
  
    Merged build finished. Test PASSed.


[GitHub] spark issue #16811: [SPARK-17629][ML] methods to return synonyms directly

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the issue:

    https://github.com/apache/spark/pull/16811
  
    Jenkins add to whitelist


[GitHub] spark issue #16811: [SPARK-17629][ML] methods to return synonyms directly

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16811
  
    **[Test build #73933 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73933/testReport)** for PR 16811 at commit [`2cca29a`](https://github.com/apache/spark/commit/2cca29a9b2c4e779ab6d3282d4025786868d9dbb).


[GitHub] spark issue #16811: [SPARK-17629][ML] methods to return synonyms directly

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the issue:

    https://github.com/apache/spark/pull/16811
  
    Hm, you should be able to restart it by asking here. If it doesn't restart, it may be stuck or busy. You can also trigger it at spark-prs.appspot.com.


[GitHub] spark issue #16811: [SPARK-17629][ML] methods to return synonyms directly

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16811
  
    Merged build finished. Test PASSed.


[GitHub] spark pull request #16811: [SPARK-17629][ML] methods to return synonyms dire...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16811#discussion_r103338146
  
    --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/Word2VecSuite.scala ---
    @@ -144,6 +144,31 @@ class Word2VecSuite extends SparkFunSuite with MLlibTestSparkContext with Defaul
         }
       }
     
    +  test("findSynonymsArray") {
    --- End diff --
    
    Can you please combine this with the findSynonyms test to avoid fitting the same model twice and to share code?


[GitHub] spark issue #16811: [SPARK-17629][ML] methods to return synonyms directly

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on the issue:

    https://github.com/apache/spark/pull/16811
  
    Thanks for the PR!
    
    What about findSynonymsArray?  That still implies a local value and is more specific.
    
    Also, can you please add a unit test for this?


[GitHub] spark pull request #16811: [SPARK-17629][ML] methods to return synonyms dire...

Posted by Krimit <gi...@git.apache.org>.
Github user Krimit commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16811#discussion_r99612768
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala ---
    @@ -232,19 +232,40 @@ class Word2VecModel private[ml] (
       @Since("1.5.0")
       def findSynonyms(word: String, num: Int): DataFrame = {
         val spark = SparkSession.builder().getOrCreate()
    -    spark.createDataFrame(wordVectors.findSynonyms(word, num)).toDF("word", "similarity")
    +    spark.createDataFrame(findSynonymsLocal(word, num)).toDF("word", "similarity")
       }
     
       /**
    -   * Find "num" number of words whose vector representation most similar to the supplied vector.
    +   * Find "num" number of words whose vector representation is most similar to the supplied vector.
        * If the supplied vector is the vector representation of a word in the model's vocabulary,
        * that word will be in the results.  Returns a dataframe with the words and the cosine
        * similarities between the synonyms and the given word vector.
        */
       @Since("2.0.0")
       def findSynonyms(vec: Vector, num: Int): DataFrame = {
         val spark = SparkSession.builder().getOrCreate()
    -    spark.createDataFrame(wordVectors.findSynonyms(vec, num)).toDF("word", "similarity")
    +    spark.createDataFrame(findSynonymsLocal(vec, num)).toDF("word", "similarity")
    +  }
    +
    +  /**
    +   * Find "num" number of words whose vector representation is most similar to the supplied vector.
    +   * If the supplied vector is the vector representation of a word in the model's vocabulary,
    +   * that word will be in the results. Returns an array of the words and the cosine
    +   * similarities between the synonyms and the given word vector.
    +   */
    +  @Since("2.2.0")
    +  def findSynonymsLocal(vec: Vector, num: Int): Array[(String, Double)] = {
    +    wordVectors.findSynonyms(vec, num)
    +  }
    +
    +  /**
    +   * Find "num" number of words closest in similarity to the given word, not
    +   * including the word itself. Returns a dataframe with the words and the
    --- End diff --
    
    🔍


[GitHub] spark issue #16811: [SPARK-17629][ML] methods to return synonyms directly

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16811
  
    **[Test build #72961 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72961/testReport)** for PR 16811 at commit [`7988385`](https://github.com/apache/spark/commit/7988385e8412a0176d1595f9d59dda843d0e4e23).


[GitHub] spark issue #16811: [SPARK-17629][ML] methods to return synonyms directly

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on the issue:

    https://github.com/apache/spark/pull/16811
  
    AFAIK Jenkins won't listen to me or Yanbo. We tried a couple of times with other PRs; we can try again next time.
    



[GitHub] spark issue #16811: [SPARK-17629][ML] methods to return synonyms directly

Posted by Krimit <gi...@git.apache.org>.
Github user Krimit commented on the issue:

    https://github.com/apache/spark/pull/16811
  
    Updated. I do kind of wish we had access to ``assertJ``, which would make unordered assertions a cakewalk.


[GitHub] spark issue #16811: [SPARK-17629][ML] methods to return synonyms directly

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on the issue:

    https://github.com/apache/spark/pull/16811
  
    @srowen could you please kick Jenkins to test this PR? :)



[GitHub] spark pull request #16811: [SPARK-17629][ML] methods to return synonyms dire...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/16811


[GitHub] spark pull request #16811: [SPARK-17629][ML] methods to return synonyms dire...

Posted by Krimit <gi...@git.apache.org>.
Github user Krimit commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16811#discussion_r104402251
  
    --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/Word2VecSuite.scala ---
    @@ -134,13 +134,20 @@ class Word2VecSuite extends SparkFunSuite with MLlibTestSparkContext with Defaul
           .fit(docDF)
     
         val expectedSimilarity = Array(0.2608488929093532, -0.8271274846926078)
    -    val (synonyms, similarity) = model.findSynonyms("a", 2).rdd.map {
    +    val result = model.findSynonyms("a", 2).rdd.map {
           case Row(w: String, sim: Double) => (w, sim)
    -    }.collect().unzip
    +    }.collect()
    +    val (synonyms, similarity) = result.unzip
     
         assert(synonyms === Array("b", "c"))
         expectedSimilarity.zip(similarity).foreach {
    -      case (expected, actual) => assert(math.abs((expected - actual) / expected) < 1E-5)
    +      case (expected, actual) => assert(expected ~== actual absTol 1E-5)
    +    }
    +
    +    result.zip(model.findSynonymsArray("a", 2)).foreach {
    --- End diff --
    
    Sure thing! I wonder if ``.collect()`` has ever caused flaky tests for this reason, here or elsewhere.


[GitHub] spark issue #16811: [SPARK-17629][ML] methods to return synonyms directly

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16811
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72961/


[GitHub] spark issue #16811: [SPARK-17629][ML] methods to return synonyms directly

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16811
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73934/


[GitHub] spark issue #16811: [SPARK-17629][ML] methods to return synonyms directly

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16811
  
    **[Test build #73933 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73933/testReport)** for PR 16811 at commit [`2cca29a`](https://github.com/apache/spark/commit/2cca29a9b2c4e779ab6d3282d4025786868d9dbb).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark issue #16811: [SPARK-17629][ML] methods to return synonyms directly

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16811
  
    **[Test build #74001 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74001/testReport)** for PR 16811 at commit [`3a02800`](https://github.com/apache/spark/commit/3a02800dbb3bf03f5ecb58f15691848d8df77cf0).


[GitHub] spark issue #16811: [SPARK-17629][ML] methods to return synonyms directly

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16811
  
    **[Test build #73934 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73934/testReport)** for PR 16811 at commit [`881353d`](https://github.com/apache/spark/commit/881353d7d104f3bff9246418c6149af6513c858b).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark pull request #16811: [SPARK-17629][ML] methods to return synonyms dire...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16811#discussion_r104330610
  
    --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/Word2VecSuite.scala ---
    @@ -134,13 +134,20 @@ class Word2VecSuite extends SparkFunSuite with MLlibTestSparkContext with Defaul
           .fit(docDF)
     
         val expectedSimilarity = Array(0.2608488929093532, -0.8271274846926078)
    -    val (synonyms, similarity) = model.findSynonyms("a", 2).rdd.map {
    +    val result = model.findSynonyms("a", 2).rdd.map {
           case Row(w: String, sim: Double) => (w, sim)
    -    }.collect().unzip
    +    }.collect()
    +    val (synonyms, similarity) = result.unzip
     
         assert(synonyms === Array("b", "c"))
         expectedSimilarity.zip(similarity).foreach {
    -      case (expected, actual) => assert(math.abs((expected - actual) / expected) < 1E-5)
    +      case (expected, actual) => assert(expected ~== actual absTol 1E-5)
    +    }
    +
    +    result.zip(model.findSynonymsArray("a", 2)).foreach {
    --- End diff --
    
    Technically, collect() may not return elements in the same order each time, so this will be more robust if we convert each result to a map (synonym -> similarity) and compare the maps. Could you please do that?
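A minimal sketch of the order-insensitive check suggested above, written in plain Scala so it runs without Spark. The object `SynonymCheck`, its method `sameSynonyms`, and the `absTol` parameter are hypothetical names for illustration, not Spark API. The absolute-tolerance comparison mirrors the `~== ... absTol 1E-5` check in the diff. It assumes each word appears at most once per result, because `toMap` keeps only the last similarity for a duplicated word.

```scala
object SynonymCheck {
  // Compares two (word, similarity) collections without relying on element order.
  // Both inputs are first turned into Map(word -> similarity).
  def sameSynonyms(
      expected: Iterable[(String, Double)],
      actual: Iterable[(String, Double)],
      absTol: Double): Boolean = {
    val e = expected.toMap
    val a = actual.toMap
    // The same set of words must be present, and every similarity
    // must agree within the absolute tolerance.
    e.keySet == a.keySet && e.forall { case (w, sim) => math.abs(sim - a(w)) <= absTol }
  }
}
```

In the suite, assuming `findSynonymsArray` returns `Array[(String, Double)]` as the `zip` in the diff suggests, this would be used as `assert(SynonymCheck.sameSynonyms(result, model.findSynonymsArray("a", 2), 1E-5))`.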


[GitHub] spark issue #16811: [SPARK-17629][ML] methods to return synonyms directly

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on the issue:

    https://github.com/apache/spark/pull/16811
  
    Thanks!  I made a follow-up JIRA for updating the Python API: https://issues.apache.org/jira/browse/SPARK-19866


[GitHub] spark issue #16811: [SPARK-17629][ML] methods to return synonyms directly

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16811
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74001/


[GitHub] spark issue #16811: [SPARK-17629][ML] methods to return synonyms directly

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16811
  
    **[Test build #73934 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73934/testReport)** for PR 16811 at commit [`881353d`](https://github.com/apache/spark/commit/881353d7d104f3bff9246418c6149af6513c858b).


[GitHub] spark pull request #16811: [SPARK-17629][ML] methods to return synonyms dire...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16811#discussion_r103338261
  
    --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/Word2VecSuite.scala ---
    @@ -144,6 +144,31 @@ class Word2VecSuite extends SparkFunSuite with MLlibTestSparkContext with Defaul
         }
       }
     
    +  test("findSynonymsArray") {
    --- End diff --
    
    Once you combine them, you can check findSynonymsArray by comparing it against the output of findSynonyms(...).collect().


[GitHub] spark issue #16811: [SPARK-17629][ML] methods to return synonyms directly

Posted by Krimit <gi...@git.apache.org>.
Github user Krimit commented on the issue:

    https://github.com/apache/spark/pull/16811
  
    @jkbradley - updated
    
    Added an explicit new test as requested, although the existing test already covers it, since the existing methods call the new methods.
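The remark that the existing test already covers the new method follows from the delegation structure: the DataFrame-returning method calls the local one. Below is a minimal, Spark-free sketch of that structure. It assumes word vectors held in a plain `Map[String, Array[Double]]` and cosine similarity as the score. `SynonymDelegation`, `SynonymRow`, and both method signatures are hypothetical stand-ins, not the actual `Word2VecModel` code. A Spark version would build a DataFrame where this sketch builds `SynonymRow`s.

```scala
object SynonymDelegation {
  // Hypothetical stand-in for a DataFrame row (word, similarity).
  final case class SynonymRow(word: String, similarity: Double)

  // Cosine similarity between two dense vectors of equal length.
  private def cosine(x: Array[Double], y: Array[Double]): Double = {
    val dot = x.zip(y).map { case (a, b) => a * b }.sum
    val nx = math.sqrt(x.map(v => v * v).sum)
    val ny = math.sqrt(y.map(v => v * v).sum)
    if (nx == 0.0 || ny == 0.0) 0.0 else dot / (nx * ny)
  }

  // Local path: returns plain (word, similarity) pairs, most similar first,
  // excluding the query word itself. No distributed round trip is involved.
  def findSynonymsArray(
      vectors: Map[String, Array[Double]],
      word: String,
      num: Int): Array[(String, Double)] = {
    val query = vectors(word)
    vectors.iterator
      .filter { case (w, _) => w != word }
      .map { case (w, v) => (w, cosine(query, v)) }
      .toArray
      .sortBy { case (_, sim) => -sim }
      .take(num)
  }

  // Wrapper path: reuses the local method and only adds a conversion step,
  // so any test of this method also exercises findSynonymsArray.
  def findSynonyms(
      vectors: Map[String, Array[Double]],
      word: String,
      num: Int): Seq[SynonymRow] =
    findSynonymsArray(vectors, word, num).toSeq.map { case (w, s) => SynonymRow(w, s) }
}
```

Because the wrapper only converts the result, an explicit test of the array method mainly guards against the two paths drifting apart if either one is changed later.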


[GitHub] spark issue #16811: [SPARK-17629][ML] methods to return synonyms directly

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16811
  
    Merged build finished. Test PASSed.


[GitHub] spark issue #16811: [SPARK-17629][ML] methods to return synonyms directly

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16811
  
    **[Test build #74001 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74001/testReport)** for PR 16811 at commit [`3a02800`](https://github.com/apache/spark/commit/3a02800dbb3bf03f5ecb58f15691848d8df77cf0).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark issue #16811: [SPARK-17629][ML] methods to return synonyms directly

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16811
  
    Can one of the admins verify this patch?


[GitHub] spark issue #16811: [SPARK-17629][ML] methods to return synonyms directly

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16811
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73933/


[GitHub] spark issue #16811: [SPARK-17629][ML] methods to return synonyms directly

Posted by Krimit <gi...@git.apache.org>.
Github user Krimit commented on the issue:

    https://github.com/apache/spark/pull/16811
  
    cc @jkbradley 

