Posted to dev@nutch.apache.org by Lewis John Mcgibbney <le...@gmail.com> on 2012/07/02 20:01:19 UTC

Re: [VOTE] Apache Nutch 1.5.1 Release Candidate

Hi Guys,

Just an update on this.

Please take a look at CHANGES.txt on the new branch I created [0].

I'm waiting on Sebastian's comments, as currently the zip-src and tar-src
targets produce the desired output; however, the tar-bin and zip-bin
targets do not.

If this is not a blocker, then I can release the artifacts for a VOTE,
but I wanted to hear from you guys before I do so.

Best

Lewis

[0] http://svn.apache.org/repos/asf/nutch/branches/branch-1.5.1/CHANGES.txt

On Thu, Jun 28, 2012 at 6:42 PM, Lewis John Mcgibbney
<le...@gmail.com> wrote:
> OK this will be done ASAP.
>
> Thanks for the comments and the time.
>
> Lewis
>
> On Thu, Jun 28, 2012 at 8:32 AM, Markus Jelsma
> <ma...@openindex.io> wrote:
>> Hello,
>>
>> I'd opt for these additional patches
>> * NUTCH-1398 Upgrade to Hadoop 1.0.3 (jnioche)
>> * NUTCH-1384 Typo in ParseSegments's run-method (Matthias Agethle via markus)
>> * NUTCH-1400 Remove developer -core option for bin/nutch (jnioche)
>>
>> -----Original message-----
>>> From:Lewis John Mcgibbney <le...@gmail.com>
>>> Sent: Wed 27-Jun-2012 20:33
>>> To: dev@nutch.apache.org
>>> Subject: Re: [VOTE] Apache Nutch 1.5.1 Release Candidate
>>>
>>> Hi,
>>>
>>>
>>> On Wed, Jun 27, 2012 at 2:11 PM, Markus Jelsma
>>> <ma...@openindex.io> wrote:
>>> > Hello,
>>> >
>>> > I would prefer a minimal bugfix release. The stuff that I committed to trunk may still have some quirks that I haven't found yet; the HostURLNormalizer thing Sebastian noted was just one of them.
>>> >
>>>
>>> OK so based on the 1.5.1RC#1 CHANGES.txt [0] we currently have the
>>> following commits...
>>>
>>> * NUTCH-1400 Remove developer -core option for bin/nutch (jnioche)
>>>
>>> * NUTCH-1404 Nutch script fails to find job file in deploy mode
>>> (sidabatra, jnioche)
>>>
>>> * NUTCH-1398 Upgrade to Hadoop 1.0.3 (jnioche)
>>>
>>> * NUTCH-1300 Indexer to filter normalize URL's (markus)
>>>
>>> * NUTCH-1330 WebGraph OutlinkDB to preserve back up (markus)
>>>
>>> * NUTCH-1319 HostNormalizer plugin (markus)
>>>
>>> * NUTCH-1386 Headings filter not to add empty values (markus)
>>>
>>> * NUTCH-1356 ParseUtil use ExecutorService instead of manually thread
>>> handling (ferdy via markus)
>>>
>>> * NUTCH-1352 Improve regex urlfilters/normalizers synchronization
>>> (ferdy via markus)
>>>
>>> * NUTCH-1024 Dynamically set fetchInterval by MIME-type (markus)
>>>
>>> * NUTCH-1364 Add a counter in Generator for malformed urls (lewismc)
>>>
>>> * NUTCH-1360 Support the storing of IP address connected to when web
>>> crawling (lewismc)
>>>
>>> * NUTCH-1262 Map `duplicating` content-types to a single type (markus)
>>>
>>> * NUTCH-1384 Typo in ParseSegments's run-method (Matthias Agethle via markus)
>>>
>>> * NUTCH-1385 More robust plug-in order properties in nutch-site.xml
>>> (Andy Xue via markus)
>>>
>>> * NUTCH-1336 Optionally not index db_notmodified pages (markus)
>>>
>>> * NUTCH-1346 Follow outlinks to ignore external (markus)
>>>
>>> * NUTCH-1320 IndexChecker and ParseChecker choke on IDN's (markus)
>>>
>>> * NUTCH-1351 DomainStatistics to aggregate by TLD (markus)
>>>
>>> * NUTCH-1381 Allow to override default subcollection field name (markus)
>>>
>>> * NUTCH-XX Commit to add configuration for separation of ant
>>> distribution targets (lewismc + jnioche)
>>>
>>> Do we just wish to include
>>>
>>> * NUTCH-1404 Nutch script fails to find job file in deploy mode
>>> (sidabatra, jnioche)?
>>>
>>> I can run this tomorrow. Thanks
>>>
>>> [0] http://people.apache.org/~lewismc/apache-nutch-1.5.1-rc1/CHANGES.txt
>>>
>
>
>
> --
> Lewis



-- 
Lewis