Posted to dev@nutch.apache.org by "Alfonso Nishikawa (JIRA)" <ji...@apache.org> on 2016/10/08 17:43:20 UTC
[jira] [Updated] (NUTCH-1741) Support of Sitemaps in Nutch 2.x
[ https://issues.apache.org/jira/browse/NUTCH-1741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alfonso Nishikawa updated NUTCH-1741:
-------------------------------------
Attachment: NUTCH-1741-webpage-avsc.patch
Attached a proposed patch for webpage.avsc.
I suspect that whoever created the final patch pressed backspace or moved a bracket unnoticed just before generating NUTCH-1741v7.patch, since the Persistent WebPage.SCHEMA$ still has the right schema. In the version currently in the repository [1], near the end it shows:
{code}
\"default\":{}},{\"name\":\"stmPriority\"
{code}
But the schema definition webpage.avsc at [2] shows:
{code}
"default": {
},
{
"name": "stmPriority",
{code}
The patch only fixes the schema definition in webpage.avsc. Since the generated WebPage.SCHEMA$ is already correct, no recompilation should be needed.
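A misplaced bracket like this makes the .avsc file invalid JSON, so one quick way to catch it before generating a patch is simply to parse the file. The following is only a sketch under stated assumptions: it uses a hypothetical, trimmed-down record in which only the field name stmPriority comes from the real schema (the map field someMapField and all types are placeholders), and it checks JSON syntax only. Full Avro validation would need an Avro parser such as the Java library's Schema.Parser.

{code}
import json
from typing import Optional

# Hypothetical, trimmed-down Avro record modelled on the excerpts above.
# Only "stmPriority" comes from the real webpage.avsc; "someMapField"
# and every type here are placeholders chosen for illustration.
BROKEN_AVSC = """
{
  "type": "record",
  "name": "WebPage",
  "fields": [
    {
      "name": "someMapField",
      "type": {"type": "map", "values": "string"},
      "default": {
      },
    {
      "name": "stmPriority",
      "type": "float"
    }
  ]
}
"""

# Same record with the missing "}" restored: the first "}" closes the empty
# default map and the second closes the field object (the "{}}" in SCHEMA$).
FIXED_AVSC = """
{
  "type": "record",
  "name": "WebPage",
  "fields": [
    {
      "name": "someMapField",
      "type": {"type": "map", "values": "string"},
      "default": {}
    },
    {
      "name": "stmPriority",
      "type": "float"
    }
  ]
}
"""


def check_schema(text: str) -> Optional[str]:
    """Return None if text is well-formed JSON, else 'line:col: message'.

    Only JSON syntax is checked here, not Avro semantics.
    """
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        return f"{err.lineno}:{err.colno}: {err.msg}"
    return None


if __name__ == "__main__":
    print("broken:", check_schema(BROKEN_AVSC))
    print("fixed: ", check_schema(FIXED_AVSC))
{code}

For the broken copy the parser stops at the "{" that opens the stmPriority field with an "Expecting property name enclosed in double quotes" error, because the previous field object was never closed; the fixed copy parses cleanly. To check the real file, one could pass the contents of src/gora/webpage.avsc to check_schema.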
[1] - https://github.com/apache/nutch/blob/ffa04e1b4b11d17109e870e73ed34f64e9e2c2ef/src/java/org/apache/nutch/storage/WebPage.java#L31
[2] - https://github.com/apache/nutch/blob/ffa04e1b4b11d17109e870e73ed34f64e9e2c2ef/src/gora/webpage.avsc#L294
> Support of Sitemaps in Nutch 2.x
> --------------------------------
>
> Key: NUTCH-1741
> URL: https://issues.apache.org/jira/browse/NUTCH-1741
> Project: Nutch
> Issue Type: New Feature
> Components: fetcher, generator
> Reporter: Alparslan Avcı
> Assignee: Cihad Guzel
> Labels: gsoc2015
> Fix For: 2.4
>
> Attachments: NUTCH-1741-v2.patch, NUTCH-1741-v3.patch, NUTCH-1741-v4.patch, NUTCH-1741-webpage-avsc.patch, NUTCH-1741.patch, NUTCH-1741v5.patch, NUTCH-1741v6.patch, NUTCH-1741v7.patch, SitemapCrawlerLifeCycle.pdf, SitemapDevelopmentFor2x.pdf
>
>
> Sitemap support has to be implemented for the 2.x branch. It is being discussed in NUTCH-1465 for trunk.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)