Posted to dev@nutch.apache.org by Julien Nioche <li...@gmail.com> on 2011/01/05 11:28:48 UTC
Backport to 1.3 (was: Release planning)
I've finished porting the changes from 1.2 that were missing in 1.3 and
were not related to Lucene indexing or search:
- NUTCH-878 ScoringFilters should not override the injected score
- NUTCH-901 Make index-more plug-in configurable (Markus Jelsma via mattmann)
- NUTCH-905 Configurable file protocol parent directory crawling (Thorsten Scherler, mattmann, ab)
- NUTCH-855 ScoringFilter and IndexingFilter: To allow for the propagation of URL Metatags and their subsequent indexing (Scott Gonyea via mattmann)
- NUTCH-716 Make subcollection index field multivalued (Dmitry Lihachev via jnioche)
I've compared the changes in 2.0 with 1.3 and found the following
differences (excluding anything specific to 2.0/Gora):
- *NUTCH-564 External parser supports encoding attribute (Antony Bowesman, mattmann)*
- NUTCH-714 Need a SFTP and SCP Protocol Handler (Sanjoy Ghosh, mattmann)
- *NUTCH-825 Publish nutch artifacts to central maven repository (mattmann)*
- NUTCH-851 Port logging to slf4j (jnioche)
- NUTCH-861 Renamed HTMLParseFilter into ParseFilter
- *NUTCH-872 Change the default fetcher.parse to FALSE (ab)*
- *NUTCH-876 Remove remaining robots/IP blocking code in lib-http (ab)*
- NUTCH-880 REST API for Nutch (ab)
- *NUTCH-883 Remove unused parameters from nutch-default.xml (jnioche)*
- *NUTCH-884 FetcherJob should run more reduce tasks than default (ab)*
- *NUTCH-886 A .gitignore file for Nutch (dogacan)*
- *NUTCH-894 Move statistical language identification from indexing to parsing step*
- *NUTCH-921 Reduce dependency of Nutch on config files (ab)*
- *NUTCH-930 Remove remaining dependencies on Lucene API (ab)*
- NUTCH-931 Simple admin API to fetch status and stop the service (ab)
- NUTCH-932 Bulk REST API to retrieve crawl results as JSON (ab)
I've created a new issue,
https://issues.apache.org/jira/browse/NUTCH-951, to track this. I'd be
in favour of porting only the changes that are not new
functionality; I've put those in bold above.
Any thoughts on this?
Julien
On 4 January 2011 21:44, Julien Nioche <li...@gmail.com> wrote:
> +1 from me. Today I've committed a bunch of patches that were in 1.2 but
> not in 1.3 (just one last one to do), but I haven't compared with 2.0.
>
> Having a release based on 1.3 would be great, as it would be a nice
> transition towards 2.0 (delegated indexing/search, dependency management with
> Ivy, separation between local and remote deployment, removal of redundant
> plugins, etc.).
>
> Julien
>
> --
> Open Source Solutions for Text Engineering
>
> http://digitalpebble.blogspot.com/
> http://www.digitalpebble.com
>
>
> On 4 January 2011 20:27, Andrzej Bialecki <ab...@getopt.org> wrote:
>
>> Hi users & devs,
>>
>> As you probably know, there are currently two active lines of development
>> for Nutch:
>>
>> * Nutch trunk, a.k.a. Nutch 2.0: this is based on a completely redesigned
>> storage layer that uses Apache Gora, which in turn can use various storage
>> implementations such as HBase, Cassandra, and MySQL. This branch is still
>> largely experimental and unstable, but work is progressing, and at the
>> current pace I think a release should be possible within the next ~6 months.
>> Another important addition on this branch is a REST API that allows using
>> Nutch as a black-box crawling service.
>>
>> * Nutch branch-1.3: this started as a snapshot of Nutch trunk just before
>> merging with nutchbase (i.e. switching to Gora as a storage layer). This
>> branch is still largely similar to the previous versions of Nutch, and uses
>> Hadoop MapFile/SequenceFile and "segments". Compared with release 1.2, it
>> does NOT ship with any search infrastructure, because all search
>> functionality has been delegated to Solr (via SolrIndexer). By the way, this
>> is also true of Nutch trunk.
>>
>> Regarding branch-1.2 (which is a maintenance branch after release 1.2),
>> there have been practically no updates there, if any. Nutch committer
>> resources are very limited (when it comes to active committers), so I don't
>> expect any maintenance release from this branch to happen...
>>
>> I think that, considering the relatively remote release date for Nutch 2.0,
>> it would make sense to roll out a 1.3 release based on branch-1.3, after
>> making sure that all critical patches from trunk have been merged in there.
>>
>> What do you think?
>>
>> --
>> Best regards,
>> Andrzej Bialecki <><
>> Information Retrieval, Semantic Web
>> Embedded Unix, System Integration
>> http://www.sigram.com Contact: info at sigram dot com
>>
>>
>
>
>