Posted to dev@nutch.apache.org by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2005/12/11 19:11:08 UTC

[jira] Commented: (NUTCH-34) Parsing different content formats

    [ http://issues.apache.org/jira/browse/NUTCH-34?page=comments#action_12360147 ] 

Chris A. Mattmann commented on NUTCH-34:
----------------------------------------

Hi Folks,

Just wondering: is this issue taken care of by NUTCH-88? At least some elements of it seem to have been (e.g., a single location for parse plugin ordering, and the ability to register different parse plugins for the same mimeType, with priority). The only thing NUTCH-88 doesn't really handle is the contentLength addition to the plugin.xml file, but that could (and IMO should) be split into a separate issue.

I recommend closing this issue as the bulk of it was handled by NUTCH-88.

Cheers,
  Chris


> Parsing different content formats
> ---------------------------------
>
>          Key: NUTCH-34
>          URL: http://issues.apache.org/jira/browse/NUTCH-34
>      Project: Nutch
>         Type: Improvement
>   Components: fetcher
>     Reporter: Stephan Strittmatter
>     Priority: Trivial

>
> At the moment Nutch is set up to filter content via the XML config file.
> The number of bytes to load is also set globally there.
> I think it would be better to let the parser plugins "register" themselves in some registry where every plugin could tell the fetcher that:
> 1. this document type is wanted (because the parser plugin is 
>    installed and activated)
> 2. how much of the content is required (some plugins need the whole 
>    content and some not)
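
As a rough illustration of the registry idea described in the issue above, here is a minimal sketch in Java. Everything in it is an assumption made for illustration: the class ParserRegistrySketch, its methods, the WHOLE_CONTENT sentinel, the "lower priority value is tried first" rule, and the example plugin ids are hypothetical and are not taken from Nutch or from NUTCH-88.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.OptionalLong;

// Hypothetical sketch: none of these names are taken from Nutch.
class ParserRegistrySketch {

    // Assumed sentinel meaning "this parser needs the whole document".
    static final long WHOLE_CONTENT = -1L;

    // One registration made by a parser plugin for a MIME type.
    static final class Registration {
        final String parserId;
        final int priority;       // assumption: lower value is tried first
        final long requiredBytes; // WHOLE_CONTENT or a positive byte count

        Registration(String parserId, int priority, long requiredBytes) {
            this.parserId = parserId;
            this.priority = priority;
            this.requiredBytes = requiredBytes;
        }
    }

    private final Map<String, List<Registration>> byMimeType = new HashMap<>();

    // Called by each activated parser plugin to announce what it handles.
    void register(String mimeType, String parserId, int priority, long requiredBytes) {
        List<Registration> regs = byMimeType.computeIfAbsent(mimeType, k -> new ArrayList<>());
        regs.add(new Registration(parserId, priority, requiredBytes));
        regs.sort(Comparator.comparingInt((Registration r) -> r.priority));
    }

    // Point 1: a document type is wanted only if some parser registered for it.
    boolean isWanted(String mimeType) {
        return byMimeType.containsKey(mimeType);
    }

    // Parser ids registered for a MIME type, in priority order.
    List<String> parsersFor(String mimeType) {
        List<String> ids = new ArrayList<>();
        for (Registration r : byMimeType.getOrDefault(mimeType, new ArrayList<>())) {
            ids.add(r.parserId);
        }
        return ids;
    }

    // Point 2: how much content the fetcher should load for this type.
    // Returns the largest requirement among the registered parsers, or
    // WHOLE_CONTENT if any of them needs everything; empty if unwanted.
    OptionalLong requiredBytes(String mimeType) {
        List<Registration> regs = byMimeType.get(mimeType);
        if (regs == null) {
            return OptionalLong.empty();
        }
        long max = 0L;
        for (Registration r : regs) {
            if (r.requiredBytes == WHOLE_CONTENT) {
                return OptionalLong.of(WHOLE_CONTENT);
            }
            max = Math.max(max, r.requiredBytes);
        }
        return OptionalLong.of(max);
    }

    public static void main(String[] args) {
        ParserRegistrySketch registry = new ParserRegistrySketch();
        // Example plugin ids and byte counts are made up for this demo.
        registry.register("text/html", "example-html-parser", 1, 65536L);
        registry.register("application/pdf", "example-pdf-parser", 1, WHOLE_CONTENT);

        System.out.println("text/html wanted: " + registry.isWanted("text/html"));
        System.out.println("image/png wanted: " + registry.isWanted("image/png"));
        System.out.println("pdf bytes: " + registry.requiredBytes("application/pdf"));
    }
}
```

In this sketch the fetcher would ask the registry before downloading: skip the document if isWanted() is false, otherwise load as many bytes as requiredBytes() reports. Taking the largest requirement per MIME type is one possible policy among several; it was chosen here so that every registered parser gets enough content.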
