Posted to dev@nutch.apache.org by "Julien Nioche (JIRA)" <ji...@apache.org> on 2010/05/24 10:32:25 UTC

[jira] Assigned: (NUTCH-826) Mailing list is broken.

     [ https://issues.apache.org/jira/browse/NUTCH-826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Julien Nioche reassigned NUTCH-826:
-----------------------------------

    Assignee: Julien Nioche

> Mailing list is broken.
> -----------------------
>
>                 Key: NUTCH-826
>                 URL: https://issues.apache.org/jira/browse/NUTCH-826
>             Project: Nutch
>          Issue Type: Bug
>            Reporter: John Sherwood
>            Assignee: Julien Nioche
>            Priority: Blocker
>
> All of the following addresses are failing:
> nutch-user@nutch.apache.org
> nutch-user-subscribe@nutch.apache.org
> nutch-user-subscribe@lucene.apache.org
> For the last one, the mailer daemon said 
> "This mailing list has moved to user at nutch.apache.org."
> Below is the message I tried to send:
> Hi people,
> I've been banging my head against this problem for two days now.
> Simply, I want to add a field with the value of a given meta tag.
> I've been trying the parse-xml plugin, but it seems that it doesn't
> work with version 1.0.  I've tried the code at
> http://sujitpal.blogspot.com/2009/07/nutch-getting-my-feet-wet.html
> and it hasn't worked.  I don't even know why.  I don't even know if my
> plugin is being used... or even looked for!  Nutch seems to have an
> infuriating "Fail silently" policy for plugins.  I put a
> System.exit(1) in my filters just to see whether my code is even being
> reached.  It is not, in spite of my config telling it to.
> Here's my config:
> nutch-site.xml
> ...
> <property>
>  <name>plugin.includes</name>
>  <value>protocol-http|urlfilter-regex|parse-html|index-(basic|anchor)|query-(basic|site|url)|response-(json|xml)|summary-basic|scoring-opic|urlnormalizer-(pass|regex|basic)|metadata</value>
> </property>
> ...
> parse-plugins.xml
> ...
> <mimeType name="application/xhtml+xml">
>   <plugin id="parse-html" />
>   <plugin id="metadata" />
> </mimeType>
> <mimeType name="text/html">
>   <plugin id="parse-html" />
>   <plugin id="metadata" />
> </mimeType>
> <mimeType name="text/sgml">
>   <plugin id="parse-html" />
>   <plugin id="metadata" />
> </mimeType>
> <mimeType name="text/xml">
>   <plugin id="parse-html" />
>   <plugin id="parse-rss" />
>   <plugin id="metadata" />
>   <plugin id="feed" />
> </mimeType>
> ...
> <alias name="metadata"
>        extension-id="com.example.website.nutch.parsing.MetaTagExtractorParseFilter" />
> ...
> I've also copied the plugin.xml and jar from my build/metadata to the
> plugins root dir.
> Nonetheless, Nutch runs and puts data in solr for me.  Afaik, Nutch is
> completely unaware of my plugin despite my config options.  Is there
> some other place I need to tell Nutch to use my plugin?  Is there some
> other approach to do this without having to write a plugin?  This does
> seem like a lot of work to simply get a meta tag into a field.  Any
> help would be appreciated.
> Sincerely,
> John Sherwood
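
One thing in the quoted report can be checked outside Nutch: whether the plugin id "metadata" is actually covered by the plugin.includes value. The sketch below assumes that Nutch treats plugin.includes as a regular expression that must match the whole plugin id, which is what the value's syntax suggests. The class name PluginIncludesCheck and the method isIncluded are hypothetical helpers for illustration, not part of Nutch.

```java
import java.util.List;
import java.util.regex.Pattern;

public class PluginIncludesCheck {
    // The plugin.includes value copied from the nutch-site.xml quoted above.
    static final String PLUGIN_INCLUDES =
        "protocol-http|urlfilter-regex|parse-html|index-(basic|anchor)"
        + "|query-(basic|site|url)|response-(json|xml)|summary-basic"
        + "|scoring-opic|urlnormalizer-(pass|regex|basic)|metadata";

    // Assumption: an id is included only if the whole id matches the regex.
    static boolean isIncluded(String pluginId) {
        return Pattern.compile(PLUGIN_INCLUDES).matcher(pluginId).matches();
    }

    public static void main(String[] args) {
        // Print, for a few plugin ids, whether the regex covers them.
        for (String id : List.of("metadata", "parse-html", "index-basic", "parse-xml")) {
            System.out.println(id + " -> " + isIncluded(id));
        }
    }
}
```

If "metadata" passes this check, the regex is not what excludes the plugin. This says nothing about whether Nutch found the plugin's plugin.xml and jar on disk, or whether the id declared in plugin.xml is really "metadata".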

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.