Posted to commits@nutch.apache.org by sn...@apache.org on 2017/12/18 19:09:36 UTC

[nutch] branch branch-1.14 created (now a8e60bd)

This is an automated email from the ASF dual-hosted git repository.

snagel pushed a change to branch branch-1.14
in repository https://gitbox.apache.org/repos/asf/nutch.git.


      at a8e60bd  Nutch 1.14 release - update version number - add changes / release notes

This branch includes the following new commits:

     new a8e60bd  Nutch 1.14 release - update version number - add changes / release notes

The 1 revision listed above as "new" is entirely new to this
repository and will be described in a separate email.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


-- 
To stop receiving notification emails like this one, please contact
"commits@nutch.apache.org" <co...@nutch.apache.org>.

[nutch] 01/01: Nutch 1.14 release - update version number - add changes / release notes

Posted by sn...@apache.org.

snagel pushed a commit to branch branch-1.14
in repository https://gitbox.apache.org/repos/asf/nutch.git

commit a8e60bdfb79b368612f068ed5aeeb690e29b448d
Author: Sebastian Nagel <sn...@apache.org>
AuthorDate: Mon Dec 18 20:07:35 2017 +0100

    Nutch 1.14 release
    - update version number
    - add changes / release notes
---
 CHANGES.txt            | 99 +++++++++++++++++++++++++++++++++++++++++++++++---
 conf/nutch-default.xml |  2 +-
 default.properties     |  2 +-
 src/bin/nutch          |  2 +-
 4 files changed, 97 insertions(+), 8 deletions(-)

diff --git a/CHANGES.txt b/CHANGES.txt
index c9946e7..eec205b 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -1,15 +1,104 @@
 # Nutch Change Log
 
-Nutch 1.14 Release (dd/mm/yyyy)
+Nutch 1.14 Release 18/12/2017 (dd/mm/yyyy)
 
 Comments
 
-Fellow committers, Nutch 1.14 contains a breaking change NUTCH-2046. Please use the note below and
-in the release announcement and keep it on top in this CHANGES.txt for the Nutch 1.14 release.
-* the bin/crawl script now expects the path to the seed to be preceded by -s
+Breaking Changes
+
+    - the bin/crawl script now expects the path to the seed to be preceded by -s  (NUTCH-2046)
+
+Bug
+
+    [NUTCH-2071] - A parser failure on a single document may fail crawling job
+    [NUTCH-2235] - Classpath discrepancy with protocol-selenium in deploy mode
+    [NUTCH-2269] - Clean not working after crawl
+    [NUTCH-2295] - Nutch master docker container broken
+    [NUTCH-2297] - CrawlDbReader -stats wrong values for earliest fetch time and shortest interval
+    [NUTCH-2316] - Library conflict with Parser-Tika Plugin and Lib Folder
+    [NUTCH-2317] - Plugin jars don't get added to classpath while running in local
+    [NUTCH-2322] - URL not available for Jexl operations
+    [NUTCH-2354] - Upgrade Hadoop dependencies to 2.7.4
+    [NUTCH-2365] - HTTP Redirects to SubDomains don't get crawled if db.ignore.external.links.mode == byDomain
+    [NUTCH-2371] - Injector to support noFilter and noNormalize
+    [NUTCH-2372] - Javadocs build failing.
+    [NUTCH-2386] - BasicURLNormalizer does not encode curly braces
+    [NUTCH-2391] - Spurious Duplications for MD5
+    [NUTCH-2394] - Possible bugs in the source code
+    [NUTCH-2398] - Fetcher saving redirected robots.txt under redirect target URL
+    [NUTCH-2399] - indexer-elastic does not index multi-value fields (only the first value is indexed)
+    [NUTCH-2401] - headings plugin does not trim values
+    [NUTCH-2403] - Nutch Selenium: Wrong documentation about PhantomJS
+    [NUTCH-2413] - Parsing fetcher to respect property "parse.filter.urls"
+    [NUTCH-2420] - Bug in variable generate.max.count and fetcher.server.delay
+    [NUTCH-2436] - Remove empty comment, and redundant semicolon from CommandRunner
+    [NUTCH-2442] - Injector to stop if job fails to avoid loss of CrawlDb
+    [NUTCH-2444] - HostDB CSV dumper to emit field header by default
+    [NUTCH-2446] - URLFiltersCheck fix
+    [NUTCH-2448] - Allow Sending an empty http.agent.version
+    [NUTCH-2451] - protocol-ftp to resolve relative URL when following redirects
+    [NUTCH-2452] - Problem retrieving encoded URLs via FTP?
+    [NUTCH-2456] - Allow to index pages/URLs not contained in CrawlDb
+    [NUTCH-2458] - TikaParser doesn't work with tika-config.xml set
+    [NUTCH-2464] - Plugin headings: Headers That Contain HTML Elements Are Not Parsed
+    [NUTCH-2465] - Broken Eclipse project. Classpaths and interactiveselenium should be fixed.
+    [NUTCH-2472] - Sitemap processor does not honour db.ignore.external.links
+    [NUTCH-2473] - Elasticsearch REST Indexer broken due to wrong depenency
+    [NUTCH-2474] - CrawlDbReader -stats fails with ClassCastException
+    [NUTCH-2478] - // is not a valid base URL
+    [NUTCH-2483] - Remove/replace indirect dependencies to org.json
+
+Improvement
+
+    [NUTCH-1763] - Improving comments on the Injector Class
+    [NUTCH-2034] - CrawlDB filtered documents counter.
+    [NUTCH-2035] - Regex filter using case sensitive rules.
+    [NUTCH-2046] - The crawl script should be able to skip an initial injection.
+    [NUTCH-2135] - Ant Eclipse build does not include protocol-interactiveselenium
+    [NUTCH-2193] - Upgrade feed parser plugin to use rome 1.5
+    [NUTCH-2216] - db.ignore.*.links to optionally follow internal redirects
+    [NUTCH-2281] - Support non-default FileSystem
+    [NUTCH-2296] - Elasticsearch Indexing Over Rest
+    [NUTCH-2320] - URLFilterChecker to run as TCP Telnet service
+    [NUTCH-2335] - Injector not to filter and normalize existing URLs in CrawlDb
+    [NUTCH-2362] - Upgrade MaxMind GeoIP version in index-geoip
+    [NUTCH-2368] - Variable generate.max.count and fetcher.server.delay
+    [NUTCH-2370] - FileDumper: save JSON mapping file -> URL
+    [NUTCH-2376] - Improve configurability of HTTP Accept* header fields
+    [NUTCH-2378] - ChildFirst plugin classloader
+    [NUTCH-2380] - indexer-elastic version upgrade to 5.3.0
+    [NUTCH-2397] - Parser to add paragraph line breaks
+    [NUTCH-2400] - Solr 6.6.0 compatibility
+    [NUTCH-2406] - Sum up constants, make minor changes
+    [NUTCH-2408] - CrawlDb: allow update from unparsed segments
+    [NUTCH-2409] - Injector: complete command-line help and counters
+    [NUTCH-2414] - Allow LanguageIndexingFilter to actually filter documents by language.
+    [NUTCH-2430] - Complete plugin build configuration
+    [NUTCH-2431] - URLFilterchecker to implement Tool-interface
+    [NUTCH-2439] - Upgrade to Apache Tika 1.17
+    [NUTCH-2443] - Extract links from the video tag with the parse-html plugin
+    [NUTCH-2445] - Fetcher following outlinks to keep track of already fetched items
+    [NUTCH-2463] - Enable sampling CrawlDB
+    [NUTCH-2468] - should filter out invalid URLs by default
+    [NUTCH-2470] - CrawlDbReader -stats to show quantiles of score
+    [NUTCH-2477] - Refactor *Checker classes to use base class for common code
+    [NUTCH-2480] - Upgrade crawler-commons dependency to 0.9
 
 New Feature
-    [NUTCH-2046] -  The crawl script should be able to skip an initial injection
+
+    [NUTCH-1465] - Support sitemaps in Nutch
+    [NUTCH-1932] - Automatically remove orphaned pages
+    [NUTCH-2333] - Indexer for RabbitMQ
+    [NUTCH-2338] - URLNormalizerChecker to run as TCP Telnet service
+    [NUTCH-2415] - Create a JEXL based IndexingFilter
+    [NUTCH-2433] - Html Parser: keep htmltag where the outlinks are found
+    [NUTCH-2435] - New configuration allowing to choose whether to store 'parse_text' directory or not.
+    [NUTCH-2484] - Extend indexer-elastic-rest to support languages
+
+Task
+
+    [NUTCH-2181] - Add Webpage for 3rd Party Connectors/Libraries to Apache Nutch
+
 
 Nutch 1.13 Release 28/03/2017 (dd/mm/yyyy)
 Release Report: https://s.apache.org/wq3x
diff --git a/conf/nutch-default.xml b/conf/nutch-default.xml
index c88e5b9..797e348 100644
--- a/conf/nutch-default.xml
+++ b/conf/nutch-default.xml
@@ -164,7 +164,7 @@
 
 <property>
   <name>http.agent.version</name>
-  <value>Nutch-1.14-SNAPSHOT</value>
+  <value>Nutch-1.14</value>
   <description>A version string to advertise in the User-Agent 
    header.</description>
 </property>
diff --git a/default.properties b/default.properties
index c057518..bf466f9 100644
--- a/default.properties
+++ b/default.properties
@@ -14,7 +14,7 @@
 # limitations under the License.
 
 name=apache-nutch
-version=1.14-SNAPSHOT
+version=1.14
 final.name=${name}-${version}
 year=2017
 
diff --git a/src/bin/nutch b/src/bin/nutch
index 10e8c29..f42abfd 100755
--- a/src/bin/nutch
+++ b/src/bin/nutch
@@ -53,7 +53,7 @@ done
 
 # if no args specified, show usage
 if [ $# = 0 ]; then
-  echo "nutch 1.14-SNAPSHOT"
+  echo "nutch 1.14"
   echo "Usage: nutch COMMAND"
   echo "where COMMAND is one of:"
   echo "  readdb            read / dump crawl db"
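The breaking change at the top of CHANGES.txt (NUTCH-2046) puts the seed path behind a `-s` flag so that, per the issue title, the crawl script can skip the initial injection. The snippet below is a hypothetical stand-in, not the real bin/crawl. Its function `plan_crawl` is a name invented for this sketch. It assumes the arguments left after the options are a crawl directory and a number of rounds, and it only prints what it would do without calling Nutch. It also shows how, in this sketch, an old-style call with the seed path as a bare first argument gets rejected.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, NOT the real bin/crawl: an optional "-s <seed_dir>"
# flag decides whether the initial inject step runs. The function only
# prints its plan; it does not call Nutch.
plan_crawl() {
  local usage="usage: plan_crawl [-s seed_dir] crawl_dir num_rounds"
  local seed_dir="" opt OPTIND=1   # local OPTIND lets getopts restart per call
  while getopts "s:" opt; do
    case "$opt" in
      s) seed_dir="$OPTARG" ;;
      *) echo "$usage" >&2; return 1 ;;
    esac
  done
  shift $((OPTIND - 1))
  # Assumption: exactly two positional arguments remain (crawl dir, rounds).
  if [ $# -ne 2 ]; then
    echo "$usage" >&2
    return 1
  fi
  local crawl_dir="$1" rounds="$2"
  if [ -n "$seed_dir" ]; then
    echo "inject seeds from $seed_dir, then run $rounds round(s) in $crawl_dir"
  else
    echo "skip inject, run $rounds round(s) in $crawl_dir"
  fi
}

plan_crawl -s urls/ crawl 2      # seeds given with -s: plan injects first
plan_crawl crawl 2               # no -s: plan skips injection
plan_crawl urls/ crawl 2 || echo "old positional form rejected"
```

Because the seed directory is now an option instead of a positional argument, a run without `-s` has nothing to inject and can go straight to the crawl rounds. This commit does not show whether the real script checks its argument count the same way.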

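This commit updates the release version in three places: default.properties, the http.agent.version default in conf/nutch-default.xml, and the usage banner in src/bin/nutch. In default.properties, final.name is defined as ${name}-${version}, so the build's final name follows directly from the version line. Below is a minimal sketch that assumes a file containing the same lines as that hunk. The helpers `prop` and `final_name` are hypothetical. They handle only simple key=value lines (no escapes or continuations) and reproduce the ${...} expansion by hand for just these two keys, rather than using the build tool's own resolution.

```shell
#!/usr/bin/env bash
# Sketch only: read simple key=value lines (no escapes, no continuations)
# and rebuild final.name=${name}-${version} by hand for these two keys.
prop() {
  # Print the value of key $2 from properties file $1 (first exact match wins).
  awk -v k="$2" 'index($0, k "=") == 1 { print substr($0, length(k) + 2); exit }' "$1"
}

final_name() {
  echo "$(prop "$1" name)-$(prop "$1" version)"
}

# Same lines as the default.properties hunk of this commit.
props=$(mktemp)
trap 'rm -f "$props"' EXIT
cat > "$props" <<'EOF'
name=apache-nutch
version=1.14
final.name=${name}-${version}
year=2017
EOF

final_name "$props"                    # prints apache-nutch-1.14
echo "Nutch-$(prop "$props" version)"  # prints Nutch-1.14, the new http.agent.version default
```

Deriving both strings from the single version line is one way to check that the three edited files agree before tagging a release.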