Posted to dev@nutch.apache.org by "Thamme Gowda N." <tg...@gmail.com> on 2016/05/04 00:05:21 UTC

Cluster Nutch Output based on Style and Structure: Join us at ApacheCon North America 2016

Hello everybody,

  I will be presenting clustering techniques for Nutch output at ApacheCon
NA 2016 next week. I hope to see you there!
   Links to the event [1] and the presentation [2] are below.

In addition, we are planning to contribute our toolkit to Nutch, as
clustering is a useful post-processing step for a crawler.
At present our algorithms run on top of Apache Spark (Distributed Matrices
and GraphX are really helpful).
Let us know your thoughts. Details are in the presentation [2]; the source
code and wiki are at [3].

Oh, I forgot to introduce myself.
   I am Thamme Gowda, a grad student at the University of Southern California
(USC) and a research assistant to Dr. Chris Mattmann.
Before starting my graduate studies, I was building http://datoin.com
as a tech co-founder. I am excited to be at ApacheCon.


[1] http://sched.co/6OJN
[2] http://schd.ws/hosted_files/apachecon2016/11/Apache%20Con%20Slides-Nutch-Clustering.pdf
[3] https://github.com/uscdataScience/autoextractor/wiki


Best,
Thamme

--
*Thamme Gowda N. *
@thammegowda <https://twitter.com/thammegowda>