Posted to dev@nutch.apache.org by "Alfonso Nishikawa (JIRA)" <ji...@apache.org> on 2016/10/08 14:45:21 UTC

[jira] [Commented] (NUTCH-1741) Support of Sitemaps in Nutch 2.x

    [ https://issues.apache.org/jira/browse/NUTCH-1741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15558096#comment-15558096 ] 

Alfonso Nishikawa commented on NUTCH-1741:
------------------------------------------

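I believe the `sitemaps` field in webpage.avsc is wrong in this patch. It needs a `"default": {}` entry, and its closing brace must be followed by a comma before the `stmPriority` field. It should be: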

{code}
    {
      "name": "sitemaps",
      "type": {
        "type": "map",
        "values": [
          "null",
          "string"
        ]
      },
      "doc": "Sitemap urls in robots.txt",
      "default": {}
    },
    {
      "name": "stmPriority",
      "type": "float",
      "doc": "",
      "default": 0
    }
{code}

The schema embedded in WebPage.SCHEMA$ is already correct.
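The default matters because of Avro schema resolution. When a field exists in the reader's schema but not in the data being read, Avro fills it from the field's `default`. If no default is declared, resolution fails. The sketch below uses only the Python standard library. It wraps the two corrected fields in a hypothetical record named `WebPageFragment` so that they parse as one schema, then lists any fields that lack a `default`. This is an illustrative sanity check, not Avro's own validator. The "broken" variant is also hypothetical and simply drops the `sitemaps` default.

```python
import json

# The two field definitions from the comment, wrapped in a hypothetical
# record ("WebPageFragment") so they form a single parseable JSON schema.
SCHEMA_TEXT = """
{
  "type": "record",
  "name": "WebPageFragment",
  "fields": [
    {
      "name": "sitemaps",
      "type": {"type": "map", "values": ["null", "string"]},
      "doc": "Sitemap urls in robots.txt",
      "default": {}
    },
    {
      "name": "stmPriority",
      "type": "float",
      "doc": "",
      "default": 0
    }
  ]
}
"""


def fields_without_default(schema: dict) -> list:
    """Return the names of record fields that declare no "default" key.

    Under Avro schema resolution, a reader-schema field that is absent from
    the written data can only be filled in when it has a default.
    """
    return [field["name"] for field in schema["fields"] if "default" not in field]


# The corrected fragment: every field has a default.
fixed = json.loads(SCHEMA_TEXT)
print(fields_without_default(fixed))

# Hypothetical variant with the "sitemaps" default removed.
broken = json.loads(SCHEMA_TEXT)
del broken["fields"][0]["default"]
print(fields_without_default(broken))
```

The first call prints an empty list. The second prints `['sitemaps']`. Note that `json.loads` would also reject the fragment if the comma between the two field objects were missing, which is the other half of the fix.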

> Support of Sitemaps in Nutch 2.x
> --------------------------------
>
>                 Key: NUTCH-1741
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1741
>             Project: Nutch
>          Issue Type: New Feature
>          Components: fetcher, generator
>            Reporter: Alparslan Avcı
>            Assignee: Cihad Guzel
>              Labels: gsoc2015
>             Fix For: 2.4
>
>         Attachments: NUTCH-1741-v2.patch, NUTCH-1741-v3.patch, NUTCH-1741-v4.patch, NUTCH-1741.patch, NUTCH-1741v5.patch, NUTCH-1741v6.patch, NUTCH-1741v7.patch, SitemapCrawlerLifeCycle.pdf, SitemapDevelopmentFor2x.pdf
>
>
> Sitemap support has to be implemented for 2.x branch. It is being discussed in NUTCH-1465 for trunk. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)