Posted to user@nutch.apache.org by Sebastian Nagel <wa...@googlemail.com> on 2016/05/22 20:02:35 UTC

[ANNOUNCE] New Nutch committer and PMC - Thamme Gowda N.

Dear all,

it is my pleasure to announce that Thamme Gowda N. has joined us
as committer and member of the Nutch PMC.  Congratulations on your
new role within the Apache Nutch community!

Thamme, would you mind telling us about yourself, your relation
to Nutch, what you've done so far, etc.?

Cheers and welcome on board!

Sebastian (on behalf of the Nutch PMC)

Re: [ANNOUNCE] New Nutch committer and PMC - Thamme Gowda N.

Posted by Thamme Gowda <tg...@gmail.com>.

Hi Sebastian,
 thanks for the invitation and setting this up.

Hello everybody,

I am so glad to be on board.

About me:
  I'm currently a grad student (master's) at the Univ. of Southern California
(USC), Los Angeles. I was fortunate enough to meet Professor Chris Mattmann
at USC.
Prior to my grad studies, I worked as a full-stack developer at a few
startups in Bangalore, India. I am also a tech co-founder of a text
analysis platform, http://datoin.com. I found my interest in A.I., so here I
am at USC grad school. I am on my way to an internship at NASA JPL this
summer.

How I met Nutch:
 In 2014, my team at Datoin.com and I integrated a Crawler/Input component
into our platform. We picked Nutch because we had the rest of the platform on
Hadoop. Boom! That was when I first put my hands on Nutch code.
 Last fall I took a graduate-level Information Retrieval (IR) course at USC
taught by Prof. Mattmann. I then joined hands with his team at NASA JPL to
work on IR-related projects. We use and improve Nutch.

Some of my recent work related to Nutch:
Added an extension point and an extension to pass certain external URLs
when db.ignore.external is set. Fixed bugs in and improved the common crawl
dumper. Built a clustering toolkit for clustering Nutch output based on CSS
styles and DOM structures [2]...

More coming soon this summer!

I am interested in after-crawl analysis and in bringing the results back to
Nutch as extensions.
I also presented "Clustering the output of Nutch ...." at the recent ApacheCon
NA [1].

I would also love to work on these:

   - reusable JVM containers to make Nutch fast and efficient. *Thinking of
   a Spark execution backend* (a step ahead: a switchable execution backend
   to support MR and Spark, just like what Gora did for the storage backend).
   - stats and analytics of crawl jobs in real time

I am excited to be involved with the community to improve Nutch.

-
Thanks and Regards,
Thamme

[1] http://www.slideshare.net/thammegowda/clustering-output-of-apache-nutch-using-apache-spark
[2] https://github.com/uscdataScience/autoextractor/wiki/Clustering-Tutorial

--
*Thamme Gowda *
Grad Student at USC <http://usc.edu>
@thammegowda <https://twitter.com/thammegowda> | 213-536-3552
http://scf.usc.edu/~tnarayan/

On Sun, May 22, 2016 at 1:02 PM, Sebastian Nagel <wastl.nagel@googlemail.com> wrote:

> Dear all,
>
> it is my pleasure to announce that Thamme Gowda N. has joined us
> as committer and member of the Nutch PMC.  Congratulations on your
> new role within the Apache Nutch community!
>
> Thamme, would you mind telling us about yourself, your relation
> to Nutch, what you've done so far, etc.?
>
> Cheers and welcome on board!
>
> Sebastian (on behalf of the Nutch PMC)
>

RE: [ANNOUNCE] New Nutch committer and PMC - Thamme Gowda N.

Posted by Markus Jelsma <ma...@openindex.io>.

Welcome Thamme Gowda!

Cheers,
Markus

 
 
-----Original message-----
> From:Thamme Gowda <tg...@gmail.com>
> Sent: Monday 23rd May 2016 0:56
> To: dev@nutch.apache.org; user@nutch.apache.org
> Subject: Re: [ANNOUNCE] New Nutch committer and PMC - Thamme Gowda N.
> 
> Hi Sebastian, 
>  thanks for the invitation and setting this up. 
> 
> Hello everybody, 
> 
> I am so glad to be on board. 
> 
> About me: 
>   I'm currently a grad student (master's) at the Univ. of Southern California (USC), Los Angeles. I was fortunate enough to meet Professor Chris Mattmann at USC.
> Prior to my grad studies, I worked as a full-stack developer at a few startups in Bangalore, India. I am also a tech co-founder of a text analysis platform, http://datoin.com. I found my interest in A.I., so here I am at USC grad school. I am on my way to an internship at NASA JPL this summer.
> 
> How I met Nutch: 
>  In 2014, my team at Datoin.com and I integrated a Crawler/Input component into our platform. We picked Nutch because we had the rest of the platform on Hadoop. Boom! That was when I first put my hands on Nutch code.
>  Last fall I took a graduate-level Information Retrieval (IR) course at USC taught by Prof. Mattmann. I then joined hands with his team at NASA JPL to work on IR-related projects. We use and improve Nutch.
> 
> Some of my recent work related to Nutch: 
> Added an extension point and an extension to pass certain external URLs when db.ignore.external is set. Fixed bugs in and improved the common crawl dumper. Built a clustering toolkit for clustering Nutch output based on CSS styles and DOM structures [2]...
> 
> More coming soon this summer! 
> 
> I am interested in after-crawl analysis and in bringing the results back to Nutch as extensions.
> I also presented "Clustering the output of Nutch ...." at the recent ApacheCon NA [1].
> 
> I would also love to work on these:
>
>    - reusable JVM containers to make Nutch fast and efficient. Thinking of a Spark execution backend (a step ahead: a switchable execution backend to support MR and Spark, just like what Gora did for the storage backend).
>    - stats and analytics of crawl jobs in real time
>
> I am excited to be involved with the community to improve Nutch.
> 
> - 
> Thanks and Regards, 
> Thamme 
> 
> [1] http://www.slideshare.net/thammegowda/clustering-output-of-apache-nutch-using-apache-spark
> [2] https://github.com/uscdataScience/autoextractor/wiki/Clustering-Tutorial
> 
> -- 
> Thamme Gowda  
> Grad Student at USC <http://usc.edu>  
> @thammegowda <https://twitter.com/thammegowda> | 213-536-3552 
> http://scf.usc.edu/~tnarayan/
> 
> On Sun, May 22, 2016 at 1:02 PM, Sebastian Nagel <wastl.nagel@googlemail.com> wrote:
> Dear all,
>
> it is my pleasure to announce that Thamme Gowda N. has joined us
> as committer and member of the Nutch PMC.  Congratulations on your
> new role within the Apache Nutch community!
>
> Thamme, would you mind telling us about yourself, your relation
> to Nutch, what you've done so far, etc.?
>
> Cheers and welcome on board!
>
> Sebastian (on behalf of the Nutch PMC)
>
