Posted to dev@nutch.apache.org by "Julien Nioche (JIRA)" <ji...@apache.org> on 2012/07/06 17:11:34 UTC

[jira] [Commented] (NUTCH-1414) Date extraction parse filter

    [ https://issues.apache.org/jira/browse/NUTCH-1414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13408055#comment-13408055 ] 

Julien Nioche commented on NUTCH-1414:
--------------------------------------

I'm concerned about the proliferation of micro-functionalities such as this one. What do we keep as part of the distribution, and what can be stored and maintained somewhere else? One could imagine endless variants of this one, e.g. extractors for all sorts of entities (people, locations, emails, etc.). Each is certainly useful for whoever wrote the plugin and possibly a few more people, but it means we have more code to maintain, document, debug, etc.

We could have a page on the wiki ("Plugin Market"?) pointing to external resources. Obviously, the ones that are well maintained, mature and widely used could make it into our repo. With the Nutch artefacts being accessible through Ivy/Maven, it would be trivial to write a script to build and test a standalone plugin.
                
> Date extraction parse filter
> ----------------------------
>
>                 Key: NUTCH-1414
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1414
>             Project: Nutch
>          Issue Type: New Feature
>          Components: parser
>            Reporter: Markus Jelsma
>            Assignee: Markus Jelsma
>             Fix For: 1.6
>
>         Attachments: NUTCH-1414-1.6-1.patch
>
>
> Date extraction parse filter for Nutch to provide means to extract an arbitrary page date (article date) from the parse text.
