You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@nutch.apache.org by sn...@apache.org on 2020/06/18 10:05:47 UTC

[nutch] annotated tag release-1.17 updated (e68bd87 -> eff98db)

This is an automated email from the ASF dual-hosted git repository.

snagel pushed a change to annotated tag release-1.17
in repository https://gitbox.apache.org/repos/asf/nutch.git.


*** WARNING: tag release-1.17 was modified! ***

    from e68bd87  (tag)
      to eff98db  (tag)
 tagging 1386c5a815f706a3fe6d9e1960924502285c3282 (commit)
 replaces release-1.13
      by Sebastian Nagel
      on Thu Jun 18 12:05:24 2020 +0200

- Log -----------------------------------------------------------------
Apache Nutch 1.17 RC#1
-----------------------------------------------------------------------

    omit 77fa56e  Nutch 1.16 release - update current year in API docs etc. - update version number - add changes / release notes
    omit 46f7bf0  Nutch 1.16 release - update links to Hadoop and Solr API docs
     new dd93c81  Nutch 1.17 release - update links to Hadoop and Solr API docs
     new 1386c5a  Nutch 1.17 release - update current year in API docs etc. - update version number - add changes / release notes

This update added new revisions after undoing existing revisions.
That is to say, some revisions that were in the old version of the
annotated tag are not in the new version.  This situation occurs
when a user --force pushes a change and generates a repository
containing something like this:

 * -- * -- B -- O -- O -- O   (e68bd87)
            \
             N -- N -- N   refs/tags/release-1.17 (eff98db)

You should already have received notification emails for all of the O
revisions, and so the following emails describe only the N revisions
from the common base, B.

Any revisions marked "omit" are not gone; other references still
refer to them.  Any revisions marked "discard" are gone forever.

The 2 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:


[nutch] 01/02: Nutch 1.17 release - update links to Hadoop and Solr API docs

Posted by sn...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

snagel pushed a commit to annotated tag release-1.17
in repository https://gitbox.apache.org/repos/asf/nutch.git

commit dd93c81d4b8f8039a3c87e4ca1a50bcbe0c45cb4
Author: Sebastian Nagel <sn...@apache.org>
AuthorDate: Wed Jun 17 23:00:09 2020 +0200

    Nutch 1.17 release
    - update links to Hadoop and Solr API docs
---
 default.properties | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/default.properties b/default.properties
index e96c555..4181800 100644
--- a/default.properties
+++ b/default.properties
@@ -44,10 +44,10 @@ test.junit.output.format = plain
 javadoc.proxy.host=-J-DproxyHost=
 javadoc.proxy.port=-J-DproxyPort=
 javadoc.link.java=https://docs.oracle.com/javase/8/docs/api/
-javadoc.link.hadoop=https://hadoop.apache.org/docs/r2.9.2/api/
-javadoc.link.lucene.core=https://lucene.apache.org/core/5_5_0/core/
-javadoc.link.lucene.analyzers-common=https://lucene.apache.org/core/5_5_0/analyzers-common/
-javadoc.link.solr-solrj=https://lucene.apache.org/solr/5_5_0/solr-solrj/
+javadoc.link.hadoop=https://hadoop.apache.org/docs/r3.1.3/api/
+javadoc.link.lucene.core=https://lucene.apache.org/core/8_5_1/core/
+javadoc.link.lucene.analyzers-common=https://lucene.apache.org/core/8_5_1/analyzers-common/
+javadoc.link.solr-solrj=https://lucene.apache.org/solr/8_5_1/solr-solrj/
 javadoc.packages=org.apache.nutch.*
 
 dist.dir=./dist


[nutch] 02/02: Nutch 1.17 release - update current year in API docs etc. - update version number - add changes / release notes

Posted by sn...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

snagel pushed a commit to annotated tag release-1.17
in repository https://gitbox.apache.org/repos/asf/nutch.git

commit 1386c5a815f706a3fe6d9e1960924502285c3282
Author: Sebastian Nagel <sn...@apache.org>
AuthorDate: Wed Jun 17 23:10:35 2020 +0200

    Nutch 1.17 release
    - update current year in API docs etc.
    - update version number
    - add changes / release notes
---
 CHANGES.txt            | 82 +++++++++++++++++++++++++++++++++++++++++++++++++-
 NOTICE.txt             |  2 +-
 conf/nutch-default.xml |  2 +-
 default.properties     |  4 +--
 src/bin/nutch          |  2 +-
 5 files changed, 86 insertions(+), 6 deletions(-)

diff --git a/CHANGES.txt b/CHANGES.txt
index 3f26a8d..dcdc6e2 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -1,6 +1,86 @@
 # Nutch Change Log
 
-Nutch 1.17 Development
+Nutch 1.17 Release 18/06/2020 (dd/mm/yyyy)
+Release Report: https://s.apache.org/ovhry
+
+Bug
+
+    [NUTCH-1559] - parse-metatags duplicates extracted metatags
+    [NUTCH-2379] - crawl script dedup's crawldb update is slow
+    [NUTCH-2419] - Some URL filters and normalizers do not respect command-line override for rule file
+    [NUTCH-2507] - NutchTutorial wiki pages as a lot of outdated command line calls when it starts with the solr interaction
+    [NUTCH-2511] - SitemapProcessor limited by http.content.limit
+    [NUTCH-2525] - Metadata indexer cannot handle uppercase parse metadata
+    [NUTCH-2567] - parse-metatags writes all meta tags twice
+    [NUTCH-2720] - ROBOTS metatag ignored when capitalized
+    [NUTCH-2745] - Solr schema.xml not shipped in binary release
+    [NUTCH-2748] - Fetch status gone (redirect exceeded) not to overwrite existing items in CrawlDb
+    [NUTCH-2751] - nutch clean does not work with secured solr cloud
+    [NUTCH-2753] - Add -listen option to command-line help of CrawlDbReader and LinkDbReader
+    [NUTCH-2754] - fetcher.max.crawl.delay ignored if exceeding 5 min. / 300 sec.
+    [NUTCH-2760] - protocol-okhttp: properly record HTTP version in request message header
+    [NUTCH-2761] - ivy jar fails to download
+    [NUTCH-2763] - protocol-okhttp (store.http.headers): add whitespace in status line after status code also when message is empty
+    [NUTCH-2770] - Subcollection logic allows empty string as a whitelist value, thus matching every incoming document.
+    [NUTCH-2778] - indexer-elastic to properly log errors
+    [NUTCH-2787] - CrawlDb JSON dump does not export metadata primitive data types correctly
+    [NUTCH-2789] - Documentation: update links to point to cwiki
+    [NUTCH-2790] - CSVIndexWriter does not escape leading quotes properly
+    [NUTCH-2791] - domainstats, protocolstats and crawlcomplete do not handle GCS URLs
+
+New Feature
+
+    [NUTCH-1863] - Add JSON format dump output to readdb command
+
+Improvement
+
+    [NUTCH-1194] - Generator: CrawlDB lock should be released earlier
+    [NUTCH-2002] - ParserChecker and IndexingFiltersChecker to check robots.txt
+    [NUTCH-2184] - Enable IndexingJob to function with no crawldb
+    [NUTCH-2495] - Use -deleteGone instead of clean job in crawler script while indexing
+    [NUTCH-2496] - Speed up link inversion step in crawling script
+    [NUTCH-2501] - allow to set Java heap size when using crawl script in distributed mode
+    [NUTCH-2649] - Optionally skip TLS/SSL certificate validation for protocol-selenium and protocol-htmlunit
+    [NUTCH-2733] - protocol-okhttp: add support for Brotli compression (Content-Encoding)
+    [NUTCH-2739] - indexer-elastic: Upgrade ES and migrate to REST client
+    [NUTCH-2743] - Add list of Nutch properties (nutch-default.xml) to documentation
+    [NUTCH-2746] - Basic URL normalizer to normalize Unicode domain names
+    [NUTCH-2747] - Replace remaining o.a.commons.logging by org.slf4j
+    [NUTCH-2750] - Improve CrawlDbReader & LinkDbReader reader handling
+    [NUTCH-2752] - indexer-solr: Upgrade to latest Solr version
+    [NUTCH-2755] - Remove obsolete plugin indexer-elastic-rest
+    [NUTCH-2757] - indexer-elastic: add authentication options
+    [NUTCH-2758] - Add plugin READMEs to binary release packages
+    [NUTCH-2759] - bin/crawl: Rename option --num-slaves
+    [NUTCH-2762] - Replace http:// URLs by https:// (build files and documentation)
+    [NUTCH-2767] - Fetcher to stop filling queues skipped due to repeated exceptions
+    [NUTCH-2768] - FetcherThread: unnecessary usage of class casts
+    [NUTCH-2772] - Debugging parse filter to show serialized DOM tree
+    [NUTCH-2773] - SegmentReader (-dump or -get): show HTML content as UTF-8
+    [NUTCH-2774] - Annotate methods implementing the Hadoop API by @Override
+    [NUTCH-2775] - Fetcher to guarantee minimum delay even if robots.txt defines shorter Crawl-delay
+    [NUTCH-2776] - Fetcher to temporarily deduplicate followed redirects
+    [NUTCH-2777] - Upgrade to Hadoop 3.1
+    [NUTCH-2779] - Upgrade to Tika 1.24.1
+    [NUTCH-2780] - Upgrade index-solr to use Solr 8.5.1
+    [NUTCH-2781] - Increase default Java heap size
+    [NUTCH-2783] - Use (more) parametrized logging
+    [NUTCH-2784] - Add tool to list Nutch and Hadoop properties
+    [NUTCH-2785] - FreeGenerator: command-line option to define number of generated fetch lists
+    [NUTCH-2788] - ParseData: improve presentation of Metadata in method toString()
+    [NUTCH-2794] - Add additional ciphers to HTTP base's default cipher suite
+
+Test
+
+    [NUTCH-1945] - Test for XLSX parser
+
+Task
+
+    [NUTCH-2434] - Add methods to reset parameters HTMLMetaTags
+
+Sub-task
+
+    [NUTCH-2735] - Update the indexer-solr documentation about the schema.xml usage
 
 
 Nutch 1.16 Release 02/10/2019 (dd/mm/yyyy)
diff --git a/NOTICE.txt b/NOTICE.txt
index 5b46045..71f29fa 100644
--- a/NOTICE.txt
+++ b/NOTICE.txt
@@ -1,5 +1,5 @@
 Apache Nutch
-Copyright 2019 The Apache Software Foundation
+Copyright 2020 The Apache Software Foundation
 
 This product includes software developed by The Apache Software
 Foundation (http://www.apache.org/).
diff --git a/conf/nutch-default.xml b/conf/nutch-default.xml
index 23af74b..b7c9570 100644
--- a/conf/nutch-default.xml
+++ b/conf/nutch-default.xml
@@ -164,7 +164,7 @@
 
 <property>
   <name>http.agent.version</name>
-  <value>Nutch-1.17-SNAPSHOT</value>
+  <value>Nutch-1.17</value>
   <description>A version string to advertise in the User-Agent 
    header.</description>
 </property>
diff --git a/default.properties b/default.properties
index 4181800..960f788 100644
--- a/default.properties
+++ b/default.properties
@@ -14,9 +14,9 @@
 # limitations under the License.
 
 name=apache-nutch
-version=1.17-SNAPSHOT
+version=1.17
 final.name=${name}-${version}
-year=2019
+year=2020
 
 basedir = ./
 src.dir = ./src/java
diff --git a/src/bin/nutch b/src/bin/nutch
index 244d812..57bf970 100755
--- a/src/bin/nutch
+++ b/src/bin/nutch
@@ -60,7 +60,7 @@ done
 
 # if no args specified, show usage
 if [ $# = 0 ]; then
-  echo "nutch 1.17-SNAPSHOT"
+  echo "nutch 1.17"
   echo "Usage: nutch COMMAND [-Dproperty=value]... [command-specific args]..."
   echo "where COMMAND is one of:"
   echo "  readdb            read / dump crawl db"