Posted to user@spark.apache.org by naveenkumarmarri <na...@gmail.com> on 2016/02/24 17:37:16 UTC
Implementing random walk in spark
Hi,
I'm new to Spark. I'm trying to compute similarity between users/products.
I have a huge table on which I can't do a self join with the cluster I have.
I'm trying to approximate the self join using a random walk, which
will give approximate results. The table is a bipartite graph with 2
columns.
Idea:
- take an element (t1) from the first column at random
- pick a corresponding element (t2) for t1 in the graph
- look up a possible element in the graph for t2 at random, say t3
- create an edge between t1 and t3
- iterate on the order of at least n*n times so that the results are a
good approximation
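For what it's worth, the steps above can be sketched locally in plain Python before moving to Spark (the function name random_walk_edges and the co-occurrence-count output are my own assumptions, not anything from Spark's API):

```python
import random
from collections import defaultdict

def random_walk_edges(edges, iterations, seed=None):
    """Sketch of the 2-step random walk over a bipartite (user, product)
    edge list: t1 -> t2 -> t3, emitting (t1, t3) as an approximate
    similarity edge. Returns a dict of (t1, t3) -> visit count."""
    rng = random.Random(seed)
    user_to_products = defaultdict(list)
    product_to_users = defaultdict(list)
    for u, p in edges:
        user_to_products[u].append(p)
        product_to_users[p].append(u)
    users = list(user_to_products)
    counts = defaultdict(int)
    for _ in range(iterations):
        t1 = rng.choice(users)                 # random element from column 1
        t2 = rng.choice(user_to_products[t1])  # its neighbour in the graph
        t3 = rng.choice(product_to_users[t2])  # random element reachable via t2
        if t3 != t1:
            counts[(t1, t3)] += 1              # edge between t1 and t3
    return dict(counts)
```

In Spark the same idea should map to sampling an RDD of (user, product) pairs and doing two keyed joins per step instead of a full self join, though I haven't benchmarked that.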
Questions
- Is Spark a suitable environment for this?
- I've coded the logic for picking elements at random, but I'm facing
issues when building the graph
- Should I consider GraphX?
Any help is highly appreciated.
Regards,
Naveen