Posted to user@spark.apache.org by naveenkumarmarri <na...@gmail.com> on 2016/02/24 17:37:16 UTC

Implementing random walk in Spark

Hi,

I'm new to Spark, and I'm trying to compute similarity between users/products.
I have a huge table, and the cluster I have can't handle a self join on it.

I'm trying to approximate the self join using a random walk approach, which
should give approximately the same results. The table is a bipartite graph
with 2 columns.

Idea:

   - pick an element (t1) from the first column at random
   - pick a corresponding element (t2) for t1 from the graph
   - look up, again at random, an element (t3) reachable from t2 in the graph
   - create an edge between t1 and t3
   - repeat on the order of at least n*n times so that the results
   approximate the full self join (a rough sketch of one round is below)
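
For concreteness, here is roughly what I have in mind for one round of the
sampling. This is only a sketch: the input path, the tab delimiter, and the
column layout are placeholders for my actual table, and the names are mine.

import scala.util.Random
import org.apache.spark.{SparkConf, SparkContext}

// Sketch of one round of two-hop sampling over the two-column table.
// Assumes a tab-separated file with columns (t1, t2); the path and all
// names are placeholders.
object RandomWalkSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("random-walk-similarity"))

    val edges = sc.textFile("hdfs:///path/to/table")   // placeholder path
      .map(_.split("\t"))
      .map(a => (a(0), a(1)))                          // (t1, t2) pairs

    // adjacency lists in both directions so we can hop t1 -> t2 -> t3
    val leftToRight = edges.groupByKey().mapValues(_.toArray).cache()
    val rightToLeft = edges.map(_.swap).groupByKey().mapValues(_.toArray).cache()

    // pick a random t2 for each t1, then a random t3 for that t2
    val t2ToT1 = leftToRight.map { case (t1, ts) =>
      (ts(Random.nextInt(ts.length)), t1)              // keyed by t2
    }
    val sampledPairs = t2ToT1.join(rightToLeft).map { case (_, (t1, candidates)) =>
      (t1, candidates(Random.nextInt(candidates.length)))   // sampled (t1, t3) edge
    }

    // repeating this many times and counting repetitions of each (t1, t3)
    // should approximate the co-occurrence counts of the full self join
    val counts = sampledPairs.map(p => (p, 1L)).reduceByKey(_ + _)
    counts.take(10).foreach(println)

    sc.stop()
  }
}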

Questions:

   - Is Spark a suitable environment for this?
   - I've coded the logic for picking elements at random, but I'm running
   into issues when building the graph.
   - Should I consider GraphX? (A sketch of how I'm trying to build the
   graph is below.)
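
On the GraphX question, the part I'm unsure about is turning the two string
columns into vertex ids. Something like the following is what I've been
attempting; deriving ids by hashing is my own assumption (it can collide),
and the path and delimiter are again placeholders.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph}

// Sketch of building a bipartite GraphX graph from the two-column table.
// Vertex ids come from hashing prefixed string ids, which is an assumption
// of this sketch and can collide on very large id spaces.
object BuildBipartiteGraph {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("build-bipartite-graph"))

    val rows = sc.textFile("hdfs:///path/to/table")    // placeholder path
      .map(_.split("\t"))
      .map(a => (a(0), a(1)))

    // prefix the two columns so left and right ids never share a vertex id
    def vid(s: String): Long = s.hashCode.toLong & 0xFFFFFFFFL

    val edges = rows.map { case (l, r) => Edge(vid("L:" + l), vid("R:" + r), 1) }
    val vertices = rows.flatMap { case (l, r) =>
      Seq((vid("L:" + l), "L:" + l), (vid("R:" + r), "R:" + r))
    }.distinct()

    val graph = Graph(vertices, edges)
    println("vertices=" + graph.numVertices + ", edges=" + graph.numEdges)

    sc.stop()
  }
}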

Any help is highly appreciated.

Regards,
Naveen