Posted to commits@nutch.apache.org by sn...@apache.org on 2017/09/29 11:46:43 UTC
[nutch] branch master updated (da64358 -> 777e759)
This is an automated email from the ASF dual-hosted git repository.
snagel pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/nutch.git.
from da64358 Merge pull request #227 from kpm1985/NUTCH-2436
add 7db1173 NUTCH-2433 / Html Parser: keep htmltag where the outlinks are found
add ca59744 New configuration parameter: 'parser.html.outlinks.htmlnode_metadata_name' set empty value as default.
add bfd47db Small adjustment: Keep a reference to the last outlink to set metadata.
add 3067753 Apply new parameter "parser.html.outlinks.ignore_tags" to the tika parser, as well. Some extra [eclipse-codeformat.xml] formatting changes applied as well.
new 777e759 Merge pull request #224 from maborec/NUTCH-2433
The 1 revision listed above as "new" is entirely new to this
repository and will be described in a separate email. The revisions
listed as "add" were already present in the repository and have only
been added to this reference.
Summary of changes:
 conf/nutch-default.xml                             |  7 ++++
 .../apache/nutch/parse/html/DOMContentUtils.java   | 24 ++++++++++++--
 .../apache/nutch/parse/tika/DOMContentUtils.java   | 37 +++++++++++++++++-----
 3 files changed, 58 insertions(+), 10 deletions(-)
--
To stop receiving notification emails like this one, please contact
"commits@nutch.apache.org" <co...@nutch.apache.org>.
[nutch] 01/01: Merge pull request #224 from maborec/NUTCH-2433
Posted by sn...@apache.org.
This is an automated email from the ASF dual-hosted git repository.
snagel pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/nutch.git
commit 777e759ada24eac84072a5f1722938442432eadc
Merge: da64358 3067753
Author: Sebastian Nagel <sn...@apache.org>
AuthorDate: Fri Sep 29 13:46:40 2017 +0200
Merge pull request #224 from maborec/NUTCH-2433
Nutch 2433 - New configuration for HTML parser to keep the HTML nodes in outlinks metadata
 conf/nutch-default.xml                             |  7 ++++
 .../apache/nutch/parse/html/DOMContentUtils.java   | 24 ++++++++++++--
 .../apache/nutch/parse/tika/DOMContentUtils.java   | 37 +++++++++++++++++-----
 3 files changed, 58 insertions(+), 10 deletions(-)