Posted to user@nutch.apache.org by "Mattmann, Chris A (388J)" <ch...@jpl.nasa.gov> on 2011/02/15 17:49:40 UTC

Welcome Alexis Detreglode as a Nutch Committer

Hi Folks,

A while back I nominated Alexis Detreglode for Nutch committership and PMC
membership. The VOTE tallies in Nutch PMC-ville have occurred and I'm happy
to announce that Alexis is now a Nutch committer!

Alexis, feel free to say a little bit about yourself, and, welcome aboard!

Cheers,
Chris

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattmann@nasa.gov
WWW:   http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++


Re: Welcome Alexis Detreglode as a Nutch Committer

Posted by Alexis <al...@gmail.com>.
Dear Nutch users & developers,


Thank you for the warm welcome. I guess I'm now part of the family. I
hope it will grow exponentially with the new version.


My Nutch story started in 2007 but only lasted for a few months. I
resumed it recently in November 2010 through an exchange of comments
on Julien's blog, about whether to use Nutch 1.2 or Nutch 2
(trunk) for my personal purposes. The new design he suggested has
shifted radically from a full-fledged solution for search applications
to a minimalistic project that does no indexing, storing, or
parsing, but just crawling. Delegating all the subsidiary
tasks to more specialized projects should allow the Nutch community to
focus on its core activity: downloading pages from the web
automatically as fast as possible and preparing the data for analysis,
while still respecting the web standards regarding robots.


I take advantage of this announcement to urge all new and more
familiar users to migrate their crawls to this 2.0 version, even
though it is still very much in alpha. It works, provided you
apply a few patches here and there. Help will be very much
appreciated, especially in helping kickstart Gora, an embryonic
project for data access in Map/Reduce. IMHO, the high priorities on
the road map would be:
- Set up an Ivy configuration to build the first Gora release.
Currently the Nutch build fails because the Gora dependency is missing
from the Maven repository.
- Port the http-protocol plugin that fetches content from the web to
HttpComponents' httpcore-nio in order to leverage non-blocking I/O.
- Design and improve Gora & Nutch unit tests.


Don't hesitate to share your own impressions of the new design, the
road map, and the potential improvements. If you wish to participate,
please refer to the Nutch 2.0 section in the wiki. There are many ways to
contribute: send a message to the mailing list, create an issue on
JIRA with or without a patch attached, update the wiki...


Give it a shot!

Alexis
http://techvineyard.blogspot.com


On Tue, Feb 15, 2011 at 6:00 PM, Markus Jelsma
<ma...@openindex.io> wrote:
> Great!
>
> On Tuesday 15 February 2011 17:49:40 Mattmann, Chris A (388J) wrote:
>> Hi Folks,
>>
>> A while back I nominated Alexis Detreglode for Nutch committership and PMC
>> membership. The VOTE tallies in Nutch PMC-ville have occurred and I'm happy
>> to announce that Alexis is now an Nutch committer!
>>
>> Alexis, feel free to say a little bit about yourself, and, welcome aboard!
>>
>> Cheers,
>> Chris
>>
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> Chris Mattmann, Ph.D.
>> Senior Computer Scientist
>> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>> Office: 171-266B, Mailstop: 171-246
>> Email: chris.a.mattmann@nasa.gov
>> WWW:   http://sunset.usc.edu/~mattmann/
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> Adjunct Assistant Professor, Computer Science Department
>> University of Southern California, Los Angeles, CA 90089 USA
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>
> --
> Markus Jelsma - CTO - Openindex
> http://www.linkedin.com/in/markus17
> 050-8536620 / 06-50258350
>

Re: Welcome Alexis Detreglode as a Nutch Committer

Posted by Markus Jelsma <ma...@openindex.io>.
Great!

On Tuesday 15 February 2011 17:49:40 Mattmann, Chris A (388J) wrote:
> Hi Folks,
> 
> A while back I nominated Alexis Detreglode for Nutch committership and PMC
> membership. The VOTE tallies in Nutch PMC-ville have occurred and I'm happy
> to announce that Alexis is now an Nutch committer!
> 
> Alexis, feel free to say a little bit about yourself, and, welcome aboard!
> 
> Cheers,
> Chris
> 
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Chris Mattmann, Ph.D.
> Senior Computer Scientist
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 171-266B, Mailstop: 171-246
> Email: chris.a.mattmann@nasa.gov
> WWW:   http://sunset.usc.edu/~mattmann/
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Adjunct Assistant Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350