Posted to dev@nutch.apache.org by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2006/10/04 19:51:22 UTC

[jira] Updated: (NUTCH-379) ParseUtil does not pass through the content's URL to the ParserFactory

     [ http://issues.apache.org/jira/browse/NUTCH-379?page=all ]

Chris A. Mattmann updated NUTCH-379:
------------------------------------

    Attachment: NUTCH-379.Mattmann.100406.patch.txt

Small patch that at least starts to fix the larger issue of content URLs and parser mapping: it forwards the content URL (which the ParserFactory interface already expects) to the getParsers method of the ParserFactory.
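
To make the change concrete, here is a minimal Java sketch. Content, ParserFactory, RecordingFactory and ParseUtil below are simplified, hypothetical stand-ins, not the real Nutch classes, and the exact getParsers signature in Nutch may differ. The pre-patch behaviour is modelled as passing an empty string for the URL, which is an assumption made only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, simplified stand-ins for Nutch's Content, ParserFactory and
 * ParseUtil. They only illustrate the NUTCH-379 idea: ParseUtil should pass
 * the content's own URL to getParsers() instead of a placeholder.
 */
public class ParseUtilSketch {

    /** Hypothetical minimal view of fetched content. */
    static final class Content {
        final String url;
        final String contentType;

        Content(String url, String contentType) {
            this.url = url;
            this.contentType = contentType;
        }
    }

    /** Hypothetical factory interface; the real Nutch signature may differ. */
    interface ParserFactory {
        List<String> getParsers(String contentType, String url);
    }

    /** Factory that records the arguments it receives, for illustration. */
    static final class RecordingFactory implements ParserFactory {
        String lastContentType;
        String lastUrl;

        @Override
        public List<String> getParsers(String contentType, String url) {
            lastContentType = contentType;
            lastUrl = url;
            List<String> parsers = new ArrayList<>();
            // e.g. "application/pdf" -> "parse-pdf" (made-up naming rule)
            parsers.add("parse-" + contentType.substring(contentType.indexOf('/') + 1));
            return parsers;
        }
    }

    static final class ParseUtil {
        private final ParserFactory factory;

        ParseUtil(ParserFactory factory) {
            this.factory = factory;
        }

        /** Assumed pre-patch behaviour: the URL is not forwarded. */
        List<String> parsersBeforeFix(Content content) {
            return factory.getParsers(content.contentType, "");
        }

        /** Patched behaviour: the content's URL reaches the factory. */
        List<String> parsersAfterFix(Content content) {
            return factory.getParsers(content.contentType, content.url);
        }
    }

    public static void main(String[] args) {
        RecordingFactory factory = new RecordingFactory();
        ParseUtil util = new ParseUtil(factory);
        Content content = new Content("http://example.com/report.pdf", "application/pdf");

        util.parsersBeforeFix(content);
        System.out.println("before: url=\"" + factory.lastUrl + "\"");
        util.parsersAfterFix(content);
        System.out.println("after:  url=\"" + factory.lastUrl + "\"");
    }
}
```

The factory is injected through the constructor so a recording implementation can show exactly which URL arrives; a real factory would use it to help choose plugins.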

> ParseUtil does not pass through the content's URL to the ParserFactory
> ----------------------------------------------------------------------
>
>                 Key: NUTCH-379
>                 URL: http://issues.apache.org/jira/browse/NUTCH-379
>             Project: Nutch
>          Issue Type: Bug
>          Components: fetcher
>    Affects Versions: 0.8, 0.9.0, 0.8.1
>         Environment: Power Mac Dual G5, 2.0 GHz, although the fix is environment-independent
>            Reporter: Chris A. Mattmann
>         Assigned To: Chris A. Mattmann
>             Fix For: 0.8, 0.9.0, 0.8.1, 0.8.2
>
>         Attachments: NUTCH-379.Mattmann.100406.patch.txt
>
>
> Currently the ParseUtil class, which the Fetcher calls to actually parse content, does not forward the content's URL to the ParserFactory. A bigger issue, however, is that the URL (and, for that matter, the pathSuffix) is no longer used to determine which parsing plugin should be called. My colleague at JPL discovered this more serious bug and will soon file a JIRA issue for it. In the meantime, this small patch at least sets up the forwarding of the content's URL to the ParserFactory.
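
The pathSuffix point is easier to see with a small example. The Java sketch below is purely illustrative: ParserEntry, demoRegistry() and select() are hypothetical names, the registry contents are invented, and the real Nutch ParserFactory logic (including how pathSuffix is declared for parse plugins) may work differently. It only shows why choosing a parser by URL requires the URL to be forwarded in the first place.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/**
 * Hypothetical sketch: choose parsing plugins using both the content type and
 * the URL's path suffix. ParserEntry, select() and demoRegistry() are
 * illustrative names, not Nutch APIs.
 */
public class SuffixSelectionSketch {

    /** A registered parser: the content type it handles plus an optional path suffix. */
    static final class ParserEntry {
        final String id;
        final String contentType;
        final String pathSuffix; // "" means the parser accepts any suffix

        ParserEntry(String id, String contentType, String pathSuffix) {
            this.id = id;
            this.contentType = contentType;
            this.pathSuffix = pathSuffix;
        }
    }

    /** Made-up registry in which two parsers share one content type. */
    static List<ParserEntry> demoRegistry() {
        return Arrays.asList(
            new ParserEntry("parse-html", "text/html", ""),
            new ParserEntry("parse-pdf", "application/octet-stream", "pdf"),
            new ParserEntry("parse-msword", "application/octet-stream", "doc"));
    }

    /**
     * Returns the ids of parsers whose content type matches and whose path
     * suffix (if one is set) matches the URL's path, case-insensitively.
     * URI.create throws IllegalArgumentException for malformed URLs.
     */
    static List<String> select(List<ParserEntry> registry, String contentType, String url) {
        String path = URI.create(url).getPath();
        path = (path == null) ? "" : path.toLowerCase(Locale.ROOT);
        List<String> matches = new ArrayList<>();
        for (ParserEntry entry : registry) {
            if (!entry.contentType.equals(contentType)) {
                continue; // content type must always match
            }
            boolean suffixOk = entry.pathSuffix.isEmpty()
                || path.endsWith("." + entry.pathSuffix.toLowerCase(Locale.ROOT));
            if (suffixOk) {
                matches.add(entry.id);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<ParserEntry> registry = demoRegistry();
        // Same content type, different URLs: only the URL tells them apart.
        System.out.println(select(registry, "application/octet-stream",
            "http://example.com/files/report.pdf"));      // [parse-pdf]
        System.out.println(select(registry, "application/octet-stream",
            "http://example.com/files/REPORT.DOC?v=2"));  // [parse-msword]
    }
}
```

If no real URL reached select() (for example an empty string), the suffix test in this sketch could never distinguish parse-pdf from parse-msword, which is why forwarding the URL is a prerequisite for any URL-based selection.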

-- 
This message is automatically generated by JIRA.