Posted to user@spark.apache.org by Guillaume Pitel <gu...@exensa.com> on 2014/03/27 16:12:02 UTC

Spark powered wikipedia analysis and exploration

Hi Spark users,

I'm not sure this is the right place to announce it, but Spark has a new, visible
use case: a demo we have put online here:

http://wikinsights.org

It lets you explore the English Wikipedia with a few extra features provided by
our proprietary semantic and relational analysis method. You can see similar
pages (based on text content or on links), see the most relevant words for a
page, and more.

Spark is used both to process the English Wikipedia and to run the computation
itself. Three iterations of our method on the full 4.4M documents x 2.1M words
matrix take about 30 minutes on a smallish cluster of 7 nodes, each with 4 cores
and 32 GB of RAM.

Any feedback is welcome (except on the aesthetics: we already know the UI is
really bad).

Enjoy exploring Wikipedia in your spare time :)

Guillaume
-- 
eXenSa

*Guillaume PITEL, Président*
+33(0)6 25 48 86 80

eXenSa S.A.S. <http://www.exensa.com/>
41, rue Périer - 92120 Montrouge - FRANCE
Tel +33(0)1 84 16 36 77 / Fax +33(0)9 72 28 37 05