Posted to user@spark.apache.org by "Kim, Min-Seok" <ms...@gmail.com> on 2016/09/09 19:21:16 UTC
Approximate Nearest Neighbors (ann) for Scala Spark
Hi,
I wrote a Scala implementation of Annoy (https://github.com/spotify/annoy),
which is an approximate nearest neighbors (ANN) library:
https://github.com/mskimm/annoy4s
Because Annoy builds its trees on a single node,
I came up with the following approach:
- build the tree (index file) on the driver using `toLocalIterator` of the RDD,
- then query nearest neighbors on the executors using the index file, which
each executor downloads after `sc.addFile`
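
A minimal sketch of these two steps, to make the data flow concrete. It does not use the annoy4s API: `ToyIndex` is a hypothetical brute-force stand-in for the real index, and the sample vectors, the file format, and the `local[2]` master are assumptions made only for illustration. The Spark calls used are `toLocalIterator`, `sc.addFile`, and `SparkFiles.get`.

```scala
import java.io.{File, PrintWriter}
import scala.io.Source
import org.apache.spark.{SparkConf, SparkContext, SparkFiles}

// Hypothetical stand-in for an ANN index: exact brute-force search over a text file.
// A real job would build and load an Annoy index at these points instead.
object ToyIndex {
  // Write one "id v1 v2 ..." line per item.
  def save(items: Iterator[(Int, Array[Float])], path: String): Unit = {
    val out = new PrintWriter(new File(path))
    try {
      items.foreach { case (id, v) => out.println(s"$id ${v.mkString(" ")}") }
    } finally {
      out.close()
    }
  }

  // Read the whole file back into memory.
  def load(path: String): Array[(Int, Array[Float])] = {
    val src = Source.fromFile(path)
    try {
      src.getLines().map { line =>
        val cols = line.split(" ")
        (cols.head.toInt, cols.tail.map(_.toFloat))
      }.toArray
    } finally {
      src.close()
    }
  }

  // Return the k closest (id, Euclidean distance) pairs, nearest first.
  def query(index: Array[(Int, Array[Float])], q: Array[Float], k: Int): Array[(Int, Double)] =
    index.map { case (id, v) =>
      val d = math.sqrt(v.zip(q).map { case (a, b) => (a - b).toDouble * (a - b) }.sum)
      (id, d)
    }.sortBy(_._2).take(k)
}

object AnnSketch {
  def main(args: Array[String]): Unit = {
    // Assumption: local master, for illustration only.
    val sc = new SparkContext(new SparkConf().setAppName("ann-sketch").setMaster("local[2]"))
    val items = sc.parallelize(Seq(
      (0, Array(0f, 0f)), (1, Array(1f, 0f)), (2, Array(0f, 5f))))

    // Step 1: pull the RDD to the driver one partition at a time
    // and write the index file there.
    val indexFile = File.createTempFile("toy-index", ".txt")
    ToyIndex.save(items.toLocalIterator, indexFile.getPath)

    // Step 2: register the file so that every executor downloads a copy.
    sc.addFile(indexFile.getPath)
    val name = indexFile.getName

    // Step 3: on the executors, load the index once per partition and query it.
    val neighbors = items.mapPartitions { it =>
      val index = ToyIndex.load(SparkFiles.get(name))
      it.map { case (id, v) => (id, ToyIndex.query(index, v, 2).map(_._1).toSeq) }
    }
    neighbors.collect().foreach(println)
    sc.stop()
  }
}
```

Loading inside `mapPartitions` rather than `map` means the index file is read once per partition instead of once per record. The whole index still has to fit on the driver's disk and in each executor's memory.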
Could anybody review the code and the idea?
I tested this implementation on Spark 1.6.2, and it seems to work.
The code I tested was similar to the example at:
```
https://github.com/mskimm/annoy4s#item-similarity-computation
```
Minseok