Posted to dev@nutch.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2020/06/09 16:03:01 UTC
[jira] [Commented] (NUTCH-2788) ParseData: improve presentation of Metadata in method toString()
[ https://issues.apache.org/jira/browse/NUTCH-2788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17129461#comment-17129461 ]
ASF GitHub Bot commented on NUTCH-2788:
---------------------------------------
sebastian-nagel opened a new pull request #529:
URL: https://github.com/apache/nutch/pull/529
- switch to multi-line presentation of Metadata in ParseData::toString
- default implementation of Metadata::toString is still single-line
- replace StringBuffer with StringBuilder in modified methods
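To illustrate the difference between the single-line and multi-line forms described above, here is a minimal Java sketch. It is not Nutch's `Metadata` or `ParseData` code. The class `MetadataFormatSketch` and its methods `add`, `toSingleLine` and `toMultiLine` are hypothetical, and so are the assumptions that each name maps to an array of values and that the layouts are exactly `name=value ` and `name = value`. Both forms are built with `StringBuilder`, which, unlike `StringBuffer`, is not synchronized and is the usual choice for a string assembled locally in one thread.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical sketch, NOT Nutch's actual Metadata class: each name maps to
 * one or more values, and the map can be rendered on one line or on many.
 */
final class MetadataFormatSketch {

  // Assumption: insertion-ordered map so the output order is predictable.
  private final Map<String, String[]> entries = new LinkedHashMap<>();

  /** Adds a value, keeping any earlier values stored under the same name. */
  void add(String name, String value) {
    String[] old = entries.get(name);
    if (old == null) {
      entries.put(name, new String[] { value });
    } else {
      String[] grown = Arrays.copyOf(old, old.length + 1);
      grown[old.length] = value;
      entries.put(name, grown);
    }
  }

  /** Single-line form: "name=value " pairs, one per value, all on one line. */
  String toSingleLine() {
    StringBuilder sb = new StringBuilder();
    for (Map.Entry<String, String[]> e : entries.entrySet()) {
      for (String v : e.getValue()) {
        sb.append(e.getKey()).append('=').append(v).append(' ');
      }
    }
    return sb.toString();
  }

  /** Multi-line form: one indented "name = value" line per value. */
  String toMultiLine(String indent) {
    StringBuilder sb = new StringBuilder();
    for (Map.Entry<String, String[]> e : entries.entrySet()) {
      for (String v : e.getValue()) {
        sb.append(indent).append(e.getKey()).append(" = ").append(v).append('\n');
      }
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    MetadataFormatSketch md = new MetadataFormatSketch();
    md.add("Server", "Apache/2.4.41 (Ubuntu)");
    md.add("Content-Type", "text/html");
    // Multi-line form, as a parse-checking tool might print it.
    System.out.print("Content Metadata:\n" + md.toMultiLine("  "));
    // Single-line form, as a default toString() might return it.
    System.out.println(md.toSingleLine());
  }
}
```

The sketch uses a `LinkedHashMap` only to keep its own output predictable. The sample parsechecker output in this PR is neither alphabetical nor obviously insertion-ordered, so the real ordering may differ.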
Parsechecker will now show metadata as follows:
```
$> bin/nutch parsechecker -Dplugin.includes='parse-(tika|metatags)|protocol-okhttp' http://localhost/
fetching: http://localhost/
...
Title: Apache2 Ubuntu Default Page: It works
Outlinks: 2
outlink: toUrl: http://localhost/icons/ubuntu-logo.png anchor: Ubuntu Logo
outlink: toUrl: http://localhost/manual anchor: manual
Content Metadata:
Accept-Ranges = bytes
Keep-Alive = timeout=5, max=100
nutch.fetch.time = 1591696071739
Server = Apache/2.4.41 (Ubuntu)
ETag = "2aa6-59647cb960db3-gzip"
Connection = Keep-Alive
Vary = Accept-Encoding
Last-Modified = Fri, 01 Nov 2019 12:06:26 GMT
Date = Tue, 09 Jun 2020 09:47:51 GMT
Content-Type = text/html
Parse Metadata:
dc:title = Apache2 Ubuntu Default Page: It works
Content-Encoding = UTF-8
Content-Type-Hint = text/html; charset=UTF-8
Content-Type = application/xhtml+xml; charset=UTF-8
```
----------------------------------------------------------------
> ParseData: improve presentation of Metadata in method toString()
> ----------------------------------------------------------------
>
> Key: NUTCH-2788
> URL: https://issues.apache.org/jira/browse/NUTCH-2788
> Project: Nutch
> Issue Type: Improvement
> Components: metadata, parser
> Affects Versions: 1.16
> Reporter: Sebastian Nagel
> Assignee: Sebastian Nagel
> Priority: Minor
> Fix For: 1.18
>
>
> See NUTCH-2567:
> bq. I would also suggest making the output of Metadata::toString more readable (for instance by adding a newline before each new metadata value). It would have made this bug way easier to spot inside the output of the parsechecker.