Posted to issues@spark.apache.org by "William Benton (JIRA)" <ji...@apache.org> on 2016/09/19 14:35:20 UTC
[jira] [Created] (SPARK-17595) Inefficient selection in Word2VecModel.findSynonyms
William Benton created SPARK-17595:
--------------------------------------
Summary: Inefficient selection in Word2VecModel.findSynonyms
Key: SPARK-17595
URL: https://issues.apache.org/jira/browse/SPARK-17595
Project: Spark
Issue Type: Improvement
Components: MLlib
Affects Versions: 2.0.0
Reporter: William Benton
Priority: Minor
The code in `Word2VecModel.findSynonyms` that chooses the vocabulary elements with the highest similarity to the query vector currently sorts the similarities for every vocabulary element. This makes multiple copies of the collection of similarities and performs a comparatively expensive sort, O(V log V) for a vocabulary of V words, even when only a few synonyms are requested. It would be more efficient to find the best matches by maintaining a bounded priority queue of the requested size k and populating it in a single pass over the vocabulary, which takes O(V log k) time and O(k) extra space.
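A minimal sketch of the proposed bounded-priority-queue selection, written in Scala (the language MLlib is implemented in). It assumes the similarities are already available as an iterator of `(word, similarity)` pairs; the object `TopKSelection` and its method `topK` are hypothetical names for illustration, not Spark APIs, and this is not the code that was eventually committed for this issue.

```scala
import scala.collection.mutable

// Hypothetical helper for illustration; not Spark's actual implementation.
object TopKSelection {

  // "Greater" in this ordering means *lower* similarity, so the max-heap
  // mutable.PriorityQueue keeps the weakest retained candidate at its head.
  private val weakestFirst: Ordering[(String, Double)] =
    Ordering.fromLessThan[(String, Double)]((a, b) => a._2 > b._2)

  /** Returns the k (word, similarity) pairs with the highest similarity,
    * best first, after a single pass over `similarities`. The heap never
    * holds more than k entries, so the pass costs O(V log k) time and
    * O(k) extra space for V input pairs. */
  def topK(similarities: Iterator[(String, Double)], k: Int): List[(String, Double)] = {
    require(k >= 0, s"k must be non-negative, got $k")
    val heap = mutable.PriorityQueue.empty[(String, Double)](weakestFirst)
    if (k > 0) {
      similarities.foreach { case entry @ (_, sim) =>
        if (heap.size < k) {
          heap.enqueue(entry)            // still filling the queue
        } else if (sim > heap.head._2) {
          heap.dequeue()                 // evict the weakest retained candidate
          heap.enqueue(entry)
        }                                // otherwise skip: it cannot make the top k
      }
    }
    // Only the retained (at most k) entries are sorted: O(k log k).
    heap.toList.sortWith(_._2 > _._2)
  }
}
```

For example, `TopKSelection.topK(Iterator("king" -> 0.9, "apple" -> 0.1, "queen" -> 0.8, "prince" -> 0.7), 2)` returns `List((king,0.9), (queen,0.8))`. Keeping the weakest candidate on top of the heap is what lets each new element be accepted or rejected with a single comparison against `heap.head`, so the full vocabulary never needs to be copied or sorted.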
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)