Posted to user@spark.apache.org by Guillaume Pitel <gu...@exensa.com> on 2014/03/27 16:12:02 UTC
Spark powered wikipedia analysis and exploration
Hi Spark users,
I don't know if this is the right place to announce it, but Spark has a new visible
use case through a demo we put online here:
http://wikinsights.org
It allows you to explore the English Wikipedia with a few added benefits from
our proprietary semantic and relations analysis method, so that you can see
similar pages (based on text content or links), see the most relevant words for
a page, and other stuff.
Spark is used both for processing the English Wikipedia and for the computation
itself. Three iterations of our method on the whole 4.4M documents x 2.1M words
matrix take about 30 minutes on a smallish cluster of 7 nodes, each with 4 cores
and 32GB RAM.
Any feedback is welcome (except on the aesthetic aspect; we already know the UI
is really bad).
Enjoy exploring Wikipedia in your spare time :)
Guillaume
--
eXenSa
*Guillaume PITEL, Président*
+33(0)6 25 48 86 80
eXenSa S.A.S. <http://www.exensa.com/>
41, rue Périer - 92120 Montrouge - FRANCE
Tel +33(0)1 84 16 36 77 / Fax +33(0)9 72 28 37 05