Posted to issues@spark.apache.org by "Sean Owen (JIRA)" <ji...@apache.org> on 2016/09/21 08:46:22 UTC

[jira] [Updated] (SPARK-17595) Inefficient selection in Word2VecModel.findSynonyms

     [ https://issues.apache.org/jira/browse/SPARK-17595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen updated SPARK-17595:
------------------------------
    Assignee: William Benton

> Inefficient selection in Word2VecModel.findSynonyms
> ---------------------------------------------------
>
>                 Key: SPARK-17595
>                 URL: https://issues.apache.org/jira/browse/SPARK-17595
>             Project: Spark
>          Issue Type: Improvement
>          Components: MLlib
>    Affects Versions: 2.0.0
>            Reporter: William Benton
>            Assignee: William Benton
>            Priority: Minor
>             Fix For: 2.1.0
>
>
> The code in `Word2VecModel.findSynonyms` to choose the vocabulary elements with the highest similarity to the query vector currently sorts the similarities for every vocabulary element.  This involves making multiple copies of the collection of similarities while doing a (relatively) expensive sort.  It would be more efficient to find the best matches by maintaining a bounded priority queue and populating it with a single pass over the vocabulary.
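
As a rough illustration of the approach proposed above (not the actual Spark patch, which would be written in Scala against MLlib internals), the following Python sketch selects the top k matches with a bounded min-heap in a single pass. The names `top_k_similar` and `vocab_vectors` are hypothetical, and cosine similarity over plain Python lists is assumed here for simplicity.

```python
import heapq
import math


def top_k_similar(query, vocab_vectors, k):
    """Return the k (word, similarity) pairs closest to `query`, best first.

    `vocab_vectors` is a hypothetical dict mapping each word to its vector.
    Cosine similarity is assumed as the similarity measure.
    """
    if k <= 0:
        return []

    q_norm = math.sqrt(sum(x * x for x in query))
    if q_norm == 0.0:
        return []

    heap = []  # min-heap of (similarity, word); never holds more than k items
    for word, vec in vocab_vectors.items():
        v_norm = math.sqrt(sum(x * x for x in vec))
        if v_norm == 0.0:
            continue  # similarity is undefined for a zero vector
        sim = sum(a * b for a, b in zip(query, vec)) / (q_norm * v_norm)

        if len(heap) < k:
            heapq.heappush(heap, (sim, word))
        elif sim > heap[0][0]:
            # Evict the current weakest candidate in O(log k).
            heapq.heapreplace(heap, (sim, word))

    # Only the k survivors are sorted, not the whole vocabulary.
    return [(word, sim) for sim, word in sorted(heap, reverse=True)]


if __name__ == "__main__":
    vectors = {
        "a": [1.0, 0.0],
        "b": [0.0, 1.0],
        "c": [1.0, 1.0],
        "d": [-1.0, 0.0],
    }
    for word, sim in top_k_similar([1.0, 0.0], vectors, 2):
        print(word, round(sim, 4))
```

With a vocabulary of n words, this does O(n log k) work and keeps only k candidates in memory, whereas sorting every similarity costs O(n log n) and materializes the full collection.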



