Posted to dev@nutch.apache.org by "Alfonso Nishikawa (JIRA)" <ji...@apache.org> on 2016/10/08 17:43:20 UTC

[jira] [Updated] (NUTCH-1741) Support of Sitemaps in Nutch 2.x

     [ https://issues.apache.org/jira/browse/NUTCH-1741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alfonso Nishikawa updated NUTCH-1741:
-------------------------------------
    Attachment: NUTCH-1741-webpage-avsc.patch

Attached a proposed patch for webpage.avsc. 

I suspect that whoever created the final patch pressed backspace or moved a bracket unnoticed just before generating NUTCH-1741v7.patch, because the schema embedded in the persistent class (WebPage.SCHEMA$) is correct:

If you look at the schema in the version currently in the repository [1], near the end it shows:
{code}
\"default\":{}},{\"name\":\"stmPriority\"
{code}

But the schema definition webpage.avsc at [2] shows:

{code}
      "default": {

      },
      {
        "name": "stmPriority",
{code}

The patch only fixes the schema file. No recompilation should be needed, since the generated WebPage class already contains the correct schema.
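To make the missing bracket concrete, here is a small Java sketch (not part of Nutch or of the attached patch; `BraceBalance` and `netDepth` are hypothetical names) that counts `{` and `}` outside string literals in the two fragments quoted above. Both fragments start inside the field that precedes `stmPriority`, so a correct fragment should come out at net depth 0 (close the default map, close the field, open the next field). The fragment from webpage.avsc comes out at +1, i.e. one `}` short. This is only a brace counter, not a JSON or Avro validator; checking the real file would mean parsing it with Avro itself.

```java
// Hypothetical helper, not part of Nutch: counts JSON braces that
// appear outside double-quoted string literals.
final class BraceBalance {

    private BraceBalance() {}

    /**
     * Returns the net number of unclosed '{' in the fragment.
     * Braces inside string literals (including escaped quotes) are ignored.
     * A negative value means more '}' than '{'.
     */
    static int netDepth(String json) {
        int depth = 0;
        boolean inString = false;
        boolean escaped = false;
        for (int i = 0; i < json.length(); i++) {
            char c = json.charAt(i);
            if (inString) {
                if (escaped) {
                    escaped = false;      // this char was escaped, skip it
                } else if (c == '\\') {
                    escaped = true;       // next char is escaped
                } else if (c == '"') {
                    inString = false;     // end of string literal
                }
                continue;
            }
            if (c == '"') {
                inString = true;
            } else if (c == '{') {
                depth++;
            } else if (c == '}') {
                depth--;
            }
        }
        return depth;
    }

    public static void main(String[] args) {
        // Fragment from WebPage.SCHEMA$ (shown unescaped): map closed, field closed, next field opened.
        String generated = "\"default\":{}},{\"name\":\"stmPriority\"";
        // Fragment from webpage.avsc: the field object is never closed before the next one opens.
        String avsc = "\"default\": {\n\n      },\n      {\n        \"name\": \"stmPriority\",";
        System.out.println("WebPage.SCHEMA$ fragment: " + netDepth(generated)); // prints 0
        System.out.println("webpage.avsc fragment:    " + netDepth(avsc));      // prints 1
    }
}
```

The difference of one brace corresponds to the `}` that closes the field object, which is what the attached patch restores in webpage.avsc.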


[1] - https://github.com/apache/nutch/blob/ffa04e1b4b11d17109e870e73ed34f64e9e2c2ef/src/java/org/apache/nutch/storage/WebPage.java#L31

[2] - https://github.com/apache/nutch/blob/ffa04e1b4b11d17109e870e73ed34f64e9e2c2ef/src/gora/webpage.avsc#L294

> Support of Sitemaps in Nutch 2.x
> --------------------------------
>
>                 Key: NUTCH-1741
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1741
>             Project: Nutch
>          Issue Type: New Feature
>          Components: fetcher, generator
>            Reporter: Alparslan Avcı
>            Assignee: Cihad Guzel
>              Labels: gsoc2015
>             Fix For: 2.4
>
>         Attachments: NUTCH-1741-v2.patch, NUTCH-1741-v3.patch, NUTCH-1741-v4.patch, NUTCH-1741-webpage-avsc.patch, NUTCH-1741.patch, NUTCH-1741v5.patch, NUTCH-1741v6.patch, NUTCH-1741v7.patch, SitemapCrawlerLifeCycle.pdf, SitemapDevelopmentFor2x.pdf
>
>
> Sitemap support has to be implemented for the 2.x branch. It is being discussed in NUTCH-1465 for trunk.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)