Posted to dev@nutch.apache.org by "Giuseppe Totaro (JIRA)" <ji...@apache.org> on 2015/03/11 18:04:40 UTC
[jira] [Created] (NUTCH-1959) Improving CommonCrawlFormat implementations
Giuseppe Totaro created NUTCH-1959:
--------------------------------------
Summary: Improving CommonCrawlFormat implementations
Key: NUTCH-1959
URL: https://issues.apache.org/jira/browse/NUTCH-1959
Project: Nutch
Issue Type: Improvement
Affects Versions: 1.9
Reporter: Giuseppe Totaro
Priority: Minor
{{CommonCrawlFormat}} is an interface for Java classes that implement methods for writing data into Common Crawl format. {{AbstractCommonCrawlFormat}} is an abstract class that implements {{CommonCrawlFormat}} and provides abstract methods for "CommonCrawl formatter" classes.
The attached patch includes some improvements for {{CommonCrawlFormat}}-based classes:
* {{CommonCrawlFormat}} and {{AbstractCommonCrawlFormat}} now expose only the {{getJsonData()}} method, which returns the JSON data.
* {{AbstractCommonCrawlFormat}} also declares the abstract methods that each subclass has to implement in order to handle JSON objects.
* {{CommonCrawlFormatSimple}} is a {{StringBuilder}}-based formatter that now also escapes JSON string values.
This patch aims to provide a cleaner interface for implementing and extending {{CommonCrawlFormat}} classes.
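To illustrate the structure described above, here is a minimal, self-contained sketch. It is not the code from the patch: every class and method name except {{getJsonData()}} is hypothetical, and the two fields (url, content) are assumed only for the example. It shows an interface with a single JSON-returning method, an abstract class that drives abstract hooks, and a {{StringBuilder}}-based subclass that escapes string values.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for CommonCrawlFormat: one method that returns JSON.
interface SketchFormat {
    String getJsonData();
}

// Hypothetical stand-in for AbstractCommonCrawlFormat: getJsonData() is a
// template method that calls hooks each subclass must implement.
abstract class AbstractSketchFormat implements SketchFormat {
    protected final Map<String, String> fields = new LinkedHashMap<>();

    AbstractSketchFormat(String url, String content) {
        // Assumed fields, chosen only for illustration.
        fields.put("url", url);
        fields.put("content", content);
    }

    protected abstract void startObject();
    protected abstract void writeKeyValue(String key, String value);
    protected abstract void closeObject();
    protected abstract String generateJson();

    @Override
    public String getJsonData() {
        startObject();
        for (Map.Entry<String, String> e : fields.entrySet()) {
            writeKeyValue(e.getKey(), e.getValue());
        }
        closeObject();
        return generateJson();
    }
}

// Hypothetical StringBuilder-based formatter with JSON string escaping.
class SimpleSketchFormat extends AbstractSketchFormat {
    private final StringBuilder sb = new StringBuilder();
    private boolean first = true;

    SimpleSketchFormat(String url, String content) {
        super(url, content);
    }

    @Override
    protected void startObject() {
        sb.setLength(0); // reset so repeated calls give the same result
        first = true;
        sb.append('{');
    }

    @Override
    protected void writeKeyValue(String key, String value) {
        if (!first) {
            sb.append(',');
        }
        first = false;
        sb.append('"').append(escape(key)).append("\":\"")
          .append(escape(value)).append('"');
    }

    @Override
    protected void closeObject() {
        sb.append('}');
    }

    @Override
    protected String generateJson() {
        return sb.toString();
    }

    // Escapes quotes, backslashes, and control characters in a JSON string.
    static String escape(String s) {
        StringBuilder out = new StringBuilder(s.length() + 8);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  out.append("\\\""); break;
                case '\\': out.append("\\\\"); break;
                case '\n': out.append("\\n"); break;
                case '\r': out.append("\\r"); break;
                case '\t': out.append("\\t"); break;
                case '\b': out.append("\\b"); break;
                case '\f': out.append("\\f"); break;
                default:
                    if (c < 0x20) {
                        // Remaining control characters: 4-digit hex escape.
                        out.append(String.format("\\u%04x", (int) c));
                    } else {
                        out.append(c);
                    }
            }
        }
        return out.toString();
    }
}

class SketchDemo {
    public static void main(String[] args) {
        SketchFormat f = new SimpleSketchFormat("http://example.com/", "say \"hi\"\n");
        System.out.println(f.getJsonData());
    }
}
```

With this layout, callers depend only on {{getJsonData()}}, so a formatter backed by a JSON library could replace the {{StringBuilder}}-based one without changing calling code.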
I would really appreciate your feedback.
Thanks a lot,
Giuseppe
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)