Posted to dev@nutch.apache.org by "Kaidul Islam (JIRA)" <ji...@apache.org> on 2017/05/23 09:32:04 UTC

[jira] [Updated] (NUTCH-2389) Precise data parsing using Jsoup CSS selectors

     [ https://issues.apache.org/jira/browse/NUTCH-2389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kaidul Islam updated NUTCH-2389:
--------------------------------
    Description: 
Currently, neither Nutch 1.x nor 2.x has a feature for extracting/parsing exact content from specific websites. For my current project I've developed a plugin that uses Jsoup to extract precise content for site-specific crawling, driven by an XML configuration.

Please let me know if this feature seems relevant and is not already present in Nutch. I also plan to port it to Nutch 1.x.

  was:
Currently, neither Nutch 1.x nor 2.x has a feature for extracting/parsing exact content from specific websites. For my current project I've developed a plugin that uses Jsoup to extract precise content for site-specific crawling.

Please let me know if this feature seems relevant and is not already present in Nutch. I also plan to port it to Nutch 1.x.


> Precise data parsing using Jsoup CSS selectors
> ----------------------------------------------
>
>                 Key: NUTCH-2389
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2389
>             Project: Nutch
>          Issue Type: New Feature
>          Components: parser
>    Affects Versions: 2.3
>            Reporter: Kaidul Islam
>            Assignee: Kaidul Islam
>             Fix For: 2.4
>
>   Original Estimate: 0.05h
>  Remaining Estimate: 0.05h
>
> Currently, neither Nutch 1.x nor 2.x has a feature for extracting/parsing exact content from specific websites. For my current project I've developed a plugin that uses Jsoup to extract precise content for site-specific crawling, driven by an XML configuration.
> Please let me know if this feature seems relevant and is not already present in Nutch. I also plan to port it to Nutch 1.x.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)