Posted to dev@nutch.apache.org by "Julien Nioche (JIRA)" <ji...@apache.org> on 2010/05/24 10:32:25 UTC
[jira] Assigned: (NUTCH-826) Mailing list is broken.
[ https://issues.apache.org/jira/browse/NUTCH-826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Julien Nioche reassigned NUTCH-826:
-----------------------------------
Assignee: Julien Nioche
> Mailing list is broken.
> -----------------------
>
> Key: NUTCH-826
> URL: https://issues.apache.org/jira/browse/NUTCH-826
> Project: Nutch
> Issue Type: Bug
> Reporter: John Sherwood
> Assignee: Julien Nioche
> Priority: Blocker
>
> All of the following addresses are failing:
> nutch-user@nutch.apache.org
> nutch-user-subscribe@nutch.apache.org
> nutch-user-subscribe@lucene.apache.org
> For the last one, the mailer daemon said
> "This mailing list has moved to user at nutch.apache.org."
> Below is the message I tried to send:
> Hi people,
> I've been banging my head against this problem for two days now.
> Simply, I want to add a field with the value of a given meta tag.
> I've been trying the parse-xml plugin, but it seems that it doesn't
> work with version 1.0. I've also tried the code at
> http://sujitpal.blogspot.com/2009/07/nutch-getting-my-feet-wet.html
> and it hasn't worked. I don't even know why. I don't even know if my
> plugin is being used... or even looked for! Nutch seems to have an
> infuriating "Fail silently" policy for plugins. I put a
> System.exit(1) in my filters just to see if my code is even being
> reached. It is not, in spite of my config telling Nutch to use it.
> Here's my config:
> nutch-site.xml
> ...
> <property>
> <name>plugin.includes</name>
> <value>protocol-http|urlfilter-regex|parse-html|index-(basic|anchor)|query-(basic|site|url)|response-(json|xml)|summary-basic|scoring-opic|urlnormalizer-(pass|regex|basic)|metadata</value>
> </property>
> ...
> parse-plugins.xml
> ...
> <mimeType name="application/xhtml+xml">
> <plugin id="parse-html" />
> <plugin id="metadata" />
> </mimeType>
> <mimeType name="text/html">
> <plugin id="parse-html" />
> <plugin id="metadata" />
> </mimeType>
> <mimeType name="text/sgml">
> <plugin id="parse-html" />
> <plugin id="metadata" />
> </mimeType>
> <mimeType name="text/xml">
> <plugin id="parse-html" />
> <plugin id="parse-rss" />
> <plugin id="metadata" />
> <plugin id="feed" />
> </mimeType>
> ...
> <alias name="metadata"
> extension-id="com.example.website.nutch.parsing.MetaTagExtractorParseFilter"
> />
> ...
> I've also copied the plugin.xml and jar from my build/metadata to the
> plugins root dir.
> Nonetheless, Nutch runs and puts data in Solr for me. AFAIK, Nutch is
> completely unaware of my plugin despite my config options. Is there
> some other place I need to tell Nutch to use my plugin? Is there some
> other approach to do this without having to write a plugin? This does
> seem like a lot of work to simply get a meta tag into a field. Any
> help would be appreciated.
> Sincerely,
> John Sherwood
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.