Posted to user@spark.apache.org by "Kim, Min-Seok" <ms...@gmail.com> on 2016/09/09 19:21:16 UTC

Approximate Nearest Neighbors (ann) for Scala Spark

Hi,

I wrote a Scala implementation of Annoy (https://github.com/spotify/annoy),
an approximate nearest neighbors (ANN) library:

https://github.com/mskimm/annoy4s

Because Annoy builds its tree on a single node,
I came up with the following approach:
 - build the tree (index file) on the driver using the RDD's `toLocalIterator`,
 - then query the nearest neighbors on the executors using the index file,
   which is distributed to them by `sc.addFile`.
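
The two steps can be sketched roughly as follows. This is only an illustration of the pattern, not the annoy4s code: `AnnSketch`, `writeIndex`, `loadIndex`, and `nearest` are hypothetical names, and a plain text file with brute-force search stands in for the Annoy tree. The Spark calls used are `RDD.toLocalIterator`, `SparkContext.addFile`, and `SparkFiles.get`.

```scala
import java.io.{File, PrintWriter}
import scala.io.Source
import org.apache.spark.{SparkContext, SparkFiles}
import org.apache.spark.rdd.RDD

object AnnSketch {
  type Item = (Int, Array[Float])

  // Hypothetical stand-in for building the Annoy index:
  // writes one "id v1 v2 ..." line per item to a local file.
  def writeIndex(items: Iterator[Item], path: String): Unit = {
    val out = new PrintWriter(new File(path))
    try items.foreach { case (id, v) => out.println(s"$id ${v.mkString(" ")}") }
    finally out.close()
  }

  // Hypothetical stand-in for opening the index file.
  def loadIndex(path: String): Array[Item] = {
    val src = Source.fromFile(path)
    try src.getLines().map { line =>
      val parts = line.split(" ")
      (parts.head.toInt, parts.tail.map(_.toFloat))
    }.toArray
    finally src.close()
  }

  // Brute-force k nearest ids by squared Euclidean distance
  // (a real ANN index would replace this exact search).
  def nearest(index: Array[Item], query: Array[Float], k: Int): Array[Int] =
    index
      .map { case (id, v) =>
        (id, v.zip(query).map { case (a, b) => (a - b) * (a - b) }.sum)
      }
      .sortWith(_._2 < _._2)
      .take(k)
      .map(_._1)

  def run(sc: SparkContext, items: RDD[Item], k: Int): RDD[(Int, Array[Int])] = {
    // Step 1: stream the RDD to the driver and build the index file there.
    val indexFile = File.createTempFile("ann-index", ".txt")
    writeIndex(items.toLocalIterator, indexFile.getAbsolutePath)

    // Step 2: ship the file to every executor and query it there.
    sc.addFile(indexFile.getAbsolutePath)
    val name = indexFile.getName
    items.mapPartitions { part =>
      // Load the index once per partition, not once per item.
      val index = loadIndex(SparkFiles.get(name))
      part.map { case (id, v) => (id, nearest(index, v, k)) }
    }
  }
}
```

Note that each item's own id is among its results here, since the item itself is in the index. `toLocalIterator` pulls one partition at a time, so the driver only needs to hold a single partition plus the index being built, rather than the whole RDD.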

Could anybody review the code and the idea?

I tested this implementation on Spark 1.6.2, and it seems to work.

The code I tested was similar to the example at:
```
https://github.com/mskimm/annoy4s#item-similarity-computation
```

Minseok