Posted to user@nutch.apache.org by Andrzej Bialecki <ab...@getopt.org> on 2011/01/04 21:27:54 UTC

Release planning

Hi users & devs,

As you probably know, there are currently two active lines of 
development for Nutch:

* Nutch trunk, a.k.a. Nutch 2.0: this is based on a completely 
redesigned storage layer that uses Apache Gora, which in turn can use 
various storage implementations such as HBase, Cassandra, and MySQL. 
This branch is still largely experimental and unstable, but work is 
progressing, and at the current pace I think a release should be 
possible within the next ~6 months. Another important addition on this 
branch is a REST API that allows using Nutch as a black-box crawling 
service.

* Nutch branch-1.3: this started as a snapshot of Nutch trunk just 
before merging with nutchbase (i.e. switching to Gora as a storage 
layer). This branch is still largely similar to the previous versions of 
Nutch, and uses Hadoop MapFile/SequenceFile and "segments". As compared 
with release 1.2 it does NOT ship with any search infrastructure, 
because all search functionality has been delegated to Solr (via 
SolrIndexer). This is BTW also true about Nutch trunk.

Regarding branch-1.2 (which is a maintenance branch after release 1.2) 
there have been pretty much no updates there, if any. Nutch committer 
resources are very limited (when it comes to active committers), so I 
don't expect any maintenance release from this branch to happen...

I think that considering the relatively remote release date for Nutch 
2.0 it would make sense to roll out a 1.3 release based on branch-1.3, 
after making sure that all critical patches from trunk have been merged 
in there.

What do you think?

-- 
Best regards,
Andrzej Bialecki     <><
  ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com


Re: Release planning

Posted by Julien Nioche <li...@gmail.com>.
+1 from me. I've committed today a bunch of patches which were in 1.2 but
not in 1.3 (just one last one to do) but haven't compared with 2.0

Having a release based on 1.3 would be great as it would be a nice
transition towards 2.0 (delegate indexing/search, dependency management with
Ivy, separation between local and remote deployment, removal of redundant
plugins etc...).

Julien

-- 
Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com

On 4 January 2011 20:27, Andrzej Bialecki <ab...@getopt.org> wrote:

> Hi users & devs,
>
> As you probably know, there are currently two active lines of development
> for Nutch:
>
> * Nutch trunk, a.k.a. Nutch 2.0: this is based on a completely redesigned
> storage layer that uses Apache Gora, which in turn can use various storage
> implementations such as HBase, Cassandra, and MySQL. This branch is still
> largely experimental and unstable, but work is progressing, and at the
> current pace I think a release should be possible within the next ~6 months.
> Another important addition on this branch is a REST API that allows using
> Nutch as a black-box crawling service.
>
> * Nutch branch-1.3: this started as a snapshot of Nutch trunk just before
> merging with nutchbase (i.e. switching to Gora as a storage layer). This
> branch is still largely similar to the previous versions of Nutch, and uses
> Hadoop MapFile/SequenceFile and "segments". As compared with release 1.2 it
> does NOT ship with any search infrastructure, because all search
> functionality has been delegated to Solr (via SolrIndexer). This is BTW also
> true about Nutch trunk.
>
> Regarding branch-1.2 (which is a maintenance branch after release 1.2)
> there have been pretty much no updates there, if any. Nutch committer resources
> are very limited (when it comes to active committers), so I don't expect any
> maintenance release from this branch to happen...
>
> I think that considering the relatively remote release date for Nutch 2.0
> it would make sense to roll out a 1.3 release based on branch-1.3, after
> making sure that all critical patches from trunk have been merged in there.
>
> What do you think?
>
> --
> Best regards,
> Andrzej Bialecki     <><
>  ___. ___ ___ ___ _ _   __________________________________
> [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
> ___|||__||  \|  ||  |  Embedded Unix, System Integration
> http://www.sigram.com  Contact: info at sigram dot com
>
>

Re: Release planning

Posted by Markus Jelsma <ma...@openindex.io>.
Splendid +1!

On Tuesday 04 January 2011 21:27:54 Andrzej Bialecki wrote:
> Hi users & devs,
> 
> As you probably know, there are currently two active lines of
> development for Nutch:
> 
> * Nutch trunk, a.k.a. Nutch 2.0: this is based on a completely
> redesigned storage layer that uses Apache Gora, which in turn can use
> various storage implementations such as HBase, Cassandra, and MySQL.
> This branch is still largely experimental and unstable, but work is
> progressing, and at the current pace I think a release should be
> possible within the next ~6 months. Another important addition on this
> branch is a REST API that allows using Nutch as a black-box crawling
> service.
> 
> * Nutch branch-1.3: this started as a snapshot of Nutch trunk just
> before merging with nutchbase (i.e. switching to Gora as a storage
> layer). This branch is still largely similar to the previous versions of
> Nutch, and uses Hadoop MapFile/SequenceFile and "segments". As compared
> with release 1.2 it does NOT ship with any search infrastructure,
> because all search functionality has been delegated to Solr (via
> SolrIndexer). This is BTW also true about Nutch trunk.
> 
> Regarding branch-1.2 (which is a maintenance branch after release 1.2)
> there have been pretty much no updates there, if any. Nutch committer
> resources are very limited (when it comes to active committers), so I
> don't expect any maintenance release from this branch to happen...
> 
> I think that considering the relatively remote release date for Nutch
> 2.0 it would make sense to roll out a 1.3 release based on branch-1.3,
> after making sure that all critical patches from trunk have been merged
> in there.
> 
> What do you think?

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350

Re: Release planning

Posted by "Mattmann, Chris A (388J)" <ch...@jpl.nasa.gov>.
(cc to dev@nutch since you are addressing devs too)

Hey Andrzej:

> 
> As you probably know, there are currently two active lines of 
> development for Nutch:
> [...snip...]
> 
> Regarding branch-1.2 (which is a maintenance branch after release 1.2) 
> there have been pretty much no updates there, if any. Nutch committer 
> resources are very limited (when it comes to active committers), so I 
> don't expect any maintenance release from this branch to happen...

+1, agreed.

> 
> I think that considering the relatively remote release date for Nutch 
> 2.0 it would make sense to roll out a 1.3 release based on branch-1.3, 
> after making sure that all critical patches from trunk have been merged 
> in there.
> 
> What do you think?

Sounds good to me. Count me in to RM it if you guys are OK with that!

Cheers,
Chris

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattmann@nasa.gov
WWW:   http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

