Posted to commits@nutch.apache.org by sn...@apache.org on 2018/06/02 11:35:16 UTC
[nutch] branch master updated (0cec7b5 -> 2544fad)
This is an automated email from the ASF dual-hosted git repository.
snagel pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/nutch.git.
from 0cec7b5 Merge pull request #335 from r0ann3l/NUTCH-2580
add 4f73c63 NUTCH-2583 Upgrading Nutch's dependencies - apply patch contributed by Ralf
add 20ecad2 NUTCH-2584 Upgrade parse-tika to use Tika 1.18
add f5e3a30 NUTCH-2584 Upgrade parse-tika to use Tika 1.18 - fix failing unit tests - use Tika parser to get DOM tree of test documents - fix HTMLMetaProcessor to extract no-cache and base-href attributes on DOM tree modified by Tika - ignore links from FORM and SOURCE elements which are not extracted by Tika parser
add 217e646 Add target "report" to view dependency tree of plugins
add 107b364 NUTCH-2589 HTML redirections are not followed when using parse-tika - extract meta-refresh redirects from DOM tree normalized by Tika - add unit test to check whether meta-refresh redirects are extracted and parse status holds the redirect target
new 2544fad Merge pull request #336 from sebastian-nagel/NUTCH-2583-upgrade-dependencies
The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails. The revisions
listed as "add" were already present in the repository and have only
been added to this reference.
Summary of changes:
 ivy/ivy.xml                                        |  67 +++++------
 src/plugin/build-plugin.xml                        |   4 +
 src/plugin/parse-tika/build.xml                    |  15 +--
 src/plugin/parse-tika/howto_upgrade_tika.txt       |  16 ++-
 src/plugin/parse-tika/ivy.xml                      |   2 +-
 src/plugin/parse-tika/plugin.xml                   |  65 +++++++----
 .../apache/nutch/parse/tika/HTMLMetaProcessor.java | 125 +++++++++++++--------
 .../org/apache/nutch/parse/tika/TikaParser.java    |  20 ++--
 .../{ => parse}/tika/TestDOMContentUtils.java      |  78 +++++++------
 .../nutch/{ => parse}/tika/TestFeedParser.java     |   2 +-
 .../nutch/{ => parse}/tika/TestHtmlParser.java     |   2 +-
 .../nutch/{ => parse}/tika/TestImageMetadata.java  |   2 +-
 .../nutch/{ => parse}/tika/TestMSWordParser.java   |   2 +-
 .../nutch/{ => parse}/tika/TestOOParser.java       |   2 +-
 .../nutch/{ => parse}/tika/TestPdfParser.java      |   2 +-
 .../nutch/{ => parse}/tika/TestRTFParser.java      |   2 +-
 .../{ => parse}/tika/TestRobotsMetaProcessor.java  |  70 ++++++++----
 17 files changed, 276 insertions(+), 200 deletions(-)
rename src/plugin/parse-tika/src/test/org/apache/nutch/{ => parse}/tika/TestDOMContentUtils.java (89%)
rename src/plugin/parse-tika/src/test/org/apache/nutch/{ => parse}/tika/TestFeedParser.java (99%)
rename src/plugin/parse-tika/src/test/org/apache/nutch/{ => parse}/tika/TestHtmlParser.java (99%)
rename src/plugin/parse-tika/src/test/org/apache/nutch/{ => parse}/tika/TestImageMetadata.java (98%)
rename src/plugin/parse-tika/src/test/org/apache/nutch/{ => parse}/tika/TestMSWordParser.java (98%)
rename src/plugin/parse-tika/src/test/org/apache/nutch/{ => parse}/tika/TestOOParser.java (98%)
rename src/plugin/parse-tika/src/test/org/apache/nutch/{ => parse}/tika/TestPdfParser.java (98%)
rename src/plugin/parse-tika/src/test/org/apache/nutch/{ => parse}/tika/TestRTFParser.java (98%)
rename src/plugin/parse-tika/src/test/org/apache/nutch/{ => parse}/tika/TestRobotsMetaProcessor.java (68%)
--
To stop receiving notification emails like this one, please contact
snagel@apache.org.
[nutch] 01/01: Merge pull request #336 from sebastian-nagel/NUTCH-2583-upgrade-dependencies
Posted by sn...@apache.org.
This is an automated email from the ASF dual-hosted git repository.
snagel pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/nutch.git
commit 2544fad223faeaafef966d0f04ff00da9f749641
Merge: 0cec7b5 107b364
Author: Sebastian Nagel <sn...@apache.org>
AuthorDate: Sat Jun 2 13:35:11 2018 +0200
Merge pull request #336 from sebastian-nagel/NUTCH-2583-upgrade-dependencies
NUTCH-2583 Upgrading Nutch's dependencies (contributed by Ralf)
NUTCH-2584 Upgrade parse-tika to use Tika 1.18
NUTCH-2589 HTML redirections are not followed when using parse-tika