Posted to user@nutch.apache.org by Andy Morris <an...@woodward.edu> on 2006/02/02 23:18:48 UTC

Xml?

 
What is this error from?
060202 141539 ParserFactory:Plugin: parse-text mapped to contentType
text/xml via parse-plugins.xml, but its plugin.xml file does not claim
to support contentType: text/xml
060202 141539 ParserFactory:Plugin: parse-html mapped to contentType
text/xml via parse-plugins.xml, but its plugin.xml file does not claim
to support contentType: text/xml
060202 141539 ParserFactory: Plugin: parse-rss mapped to contentType
text/xml via parse-plugins.xml, but not enabled via plugin.includes in
nutch-default.xml

Andy

Re: plugins directory

Posted by Doug Cutting <cu...@apache.org>.
mikeyc wrote:
> Any idea how the 'plugins' directory gets populated?  I noticed
> microformats-hreview was not there.  It does exist in the build directory
> with its jar and class files.  Could this be the issue?  

The plugins directory exists in release builds.  When developing, 
plugins live in build/plugins.  If you're developing you should 
generally work from a subversion checkout, not a downloaded release.

Doug

Success

Posted by mikeyc <mc...@gmail.com>.
Hey Guys - Just to let you know.  That was the issue.  I updated the
nutch-default.xml file to use 'build/plugins' as the plugin directory. 
Works now.  Still not really sure why microformats-hreview directory did not
get copied to the 'plugins' directory.  
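
For reference, the change described above would look roughly like the
fragment below in nutch-default.xml (or as an override in nutch-site.xml).
This is only a sketch: it assumes the property controlling the plugin
directory is named plugin.folders, which is not stated in this message, and
the relative path depends on the directory Nutch is run from.

```xml
<property>
  <!-- Assumed property name; check nutch-default.xml for the exact key -->
  <name>plugin.folders</name>
  <!-- Development build output, instead of the release "plugins" directory -->
  <value>build/plugins</value>
</property>
```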

Regards,
Mike
--
View this message in context: http://www.nabble.com/Xml--t1050112.html#a3887051
Sent from the Nutch - User forum at Nabble.com.


plugins directory

Posted by mikeyc <mc...@gmail.com>.
Hi Guys,
Any idea how the 'plugins' directory gets populated?  I noticed
microformats-hreview was not there.  It does exist in the build directory
with its jar and class files.  Could this be the issue?  

-Mike


Re: Same Error (Version 0.8)

Posted by mikeyc <mc...@gmail.com>.
Jerome / Chris,
Thanks for all your help.  I'll re-check my configuration.  Must be
something I did.

-Mike


Re: Same Error (Version 0.8)

Posted by Jérôme Charron <je...@gmail.com>.
Mike,

I have just tested the plugin from the tar file you sent me.
Everything works fine: the plugin is loaded!
Please check your configuration one more time.

Regards


Jérôme

Re: Same Error (Version 0.8)

Posted by mikeyc <mc...@gmail.com>.
Ok.  Just sent it.

-Mike


Re: Same Error (Version 0.8)

Posted by Jérôme Charron <je...@gmail.com>.
Mike,

Could you please send me (at my private email address) a tar of your plugin?
Regards

Jérôme

--
http://motrech.free.fr/
http://www.frutch.org/

Re: Same Error (Version 0.8)

Posted by mikeyc <mc...@gmail.com>.
Yes, I always run a global build after making any changes.  Even checked the
.job file for my plugin and it's there.

-Mike


Re: Same Error (Version 0.8)

Posted by Jérôme Charron <je...@gmail.com>.
> No, I don't see it in this list and yes I have added my plugin to
> nutch-site.xml.

Did you run a global ant build
(in order to create a .job file for Hadoop that will contain your plugin
code)?

Jérôme

Re: Same Error (Version 0.8)

Posted by mikeyc <mc...@gmail.com>.
No, I don't see it in this list and yes I have added my plugin to
nutch-site.xml.

060412 121836 Plugin Auto-activation mode: [true]
060412 121836 Registered Plugins:
060412 121836   CyberNeko HTML Parser (lib-nekohtml)
060412 121836   Site Query Filter (query-site)
060412 121836   Html Parse Plug-in (parse-html)
060412 121836   Regex URL Filter Framework (lib-regex-filter)
060412 121836   Jakarta Commons HTTP Client (lib-commons-httpclient)
060412 121836   Basic Indexing Filter (index-basic)
060412 121836   File Protocol Plug-in (protocol-file)
060412 121836   Text Parse Plug-in (parse-text)
060412 121836   Regex URL Filter (urlfilter-regex)
060412 121836   Basic Query Filter (query-basic)
060412 121836   HTTP Framework (lib-http)
060412 121836   XML Libraries (lib-xml)
060412 121836   URL Query Filter (query-url)
060412 121836   Log4j (lib-log4j)
060412 121836   Http Protocol Plug-in (protocol-http)
060412 121836   the nutch core extension points (nutch-extensionpoints)
060412 121836   RSS Parse Plug-in (parse-rss)
060412 121836 Registered Extension-Points:
060412 121836   Nutch Protocol (org.apache.nutch.protocol.Protocol)
060412 121836   Nutch URL Filter (org.apache.nutch.net.URLFilter)
060412 121836   HTML Parse Filter (org.apache.nutch.parse.HtmlParseFilter)
060412 121836   Nutch Online Search Results Clustering Plugin (org.apache.nutch.clustering.OnlineClusterer)
060412 121836   Nutch Indexing Filter (org.apache.nutch.indexer.IndexingFilter)
060412 121836   Nutch Content Parser (org.apache.nutch.parse.Parser)
060412 121836   Ontology Model Loader (org.apache.nutch.ontology.Ontology)
060412 121836   Nutch Analysis (org.apache.nutch.analysis.NutchAnalyzer)
060412 121836   Nutch Query Filter (org.apache.nutch.searcher.QueryFilter)

-Mike


Re: Same Error (Version 0.8)

Posted by Jérôme Charron <je...@gmail.com>.
> plugins.misc=\
>    org.apache.nutch.analysis.lang*:\
>    org.apache.nutch.microformats.reltag*:\
>    org.apache.nutch.microformats.hreview*:\
>    org.creativecommons.nutch*

No. This is just for javadoc purposes.

Do you see your plugin listed as activated in the traces?
(When the plugin repository initializes itself, it displays the list of
activated plugins.)
Did you add your plugin to the plugin.includes property (nutch-site.xml or
nutch-default.xml)?

Jérôme


Re: Same Error (Version 0.8)

Posted by mikeyc <mc...@gmail.com>.
Hi Chris,
Yup, build directory looks fine.  Has my folder with the appropriate jar and
class files.  Also, I did add my plugin to the default.properties file. 
Does this make a difference?

plugins.misc=\
   org.apache.nutch.analysis.lang*:\
   org.apache.nutch.microformats.reltag*:\
   org.apache.nutch.microformats.hreview*:\
   org.creativecommons.nutch*

-Mike


Re: Same Error (Version 0.8)

Posted by Chris Mattmann <ch...@jpl.nasa.gov>.
Hi Mike,

 Another thing is: are you making sure that your plugin is being built? That
is, did you add an entry in $NUTCH_HOME/src/build.xml for your plugin,
underneath the "deploy" tag (at least)? This will cause your plugin to
be built when the rest of the plugins are built, and then copied to
$NUTCH_HOME/build, which is where the plugin repository will look for the
runtime for plugins. Your plugin might not be loaded because of that. Please
check and let us know.
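
As an illustration, the entry described above might look like the fragment
below. This is a sketch under assumptions: it supposes that each plugin is
listed as an <ant> call inside the "deploy" target of the plugin build file
(in Nutch source trees this is typically src/plugin/build.xml), and the
parse-html entry is shown only for context.

```xml
<target name="deploy">
  <!-- existing entries, one per plugin -->
  <ant dir="parse-html" target="deploy"/>
  <!-- hypothetical new entry for the plugin discussed in this thread -->
  <ant dir="microformats-hreview" target="deploy"/>
</target>
```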

Cheers,
  Chris



On 4/12/06 8:56 AM, "mikeyc" <mc...@gmail.com> wrote:

> 
> Chris / Jerome,
> Ok.  So, now the error message is gone, but my plugin doesn't seem to be
> getting called (not seeing any of my messages).  As listed below, I updated
> my plugin.xml (similar to microformats-reltag) and removed any entries in
> the parse-plugins.xml file.
> 
> Any ideas?  
> 
> Again, thanks for helping me work through these issues - didn't have half as
> many with version 0.7. ;)
> 
> -Mike

______________________________________________
Chris A. Mattmann
Chris.Mattmann@jpl.nasa.gov
Staff Member
Modeling and Data Management Systems Section (387)
Data Management Systems and Technologies Group

_________________________________________________
Jet Propulsion Laboratory            Pasadena, CA
Office: 171-266B                        Mailstop:  171-246
_______________________________________________________

Disclaimer:  The opinions presented within are my own and do not reflect
those of either NASA, JPL, or the California Institute of Technology.



Re: Same Error (Version 0.8)

Posted by mikeyc <mc...@gmail.com>.
Chris / Jerome,
Ok.  So, now the error message is gone, but my plugin doesn't seem to be
getting called (not seeing any of my messages).  As listed below, I updated
my plugin.xml (similar to microformats-reltag) and removed any entries in
the parse-plugins.xml file.  

Any ideas?  

Again, thanks for helping me work through these issues - didn't have half as
many with version 0.7. ;) 

-Mike


Re: Same Error (Version 0.8)

Posted by Jérôme Charron <je...@gmail.com>.
Mike,

First of all, remove the <extension-point ...> element from your plugin.xml
(it is already defined in the nutch-extensionpoints plugin).
Then, add the required <requires> directive:
   <requires>
      <import plugin="nutch-extensionpoints"/>
   </requires>
Finally, remove the alias and the declaration of your plugin in
parse-plugins.xml: your parser is not a Parser but an HtmlParseFilter (called
by the parse-html plugin),
so there is no need for your plugin to be mapped to a content-type.
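
Applied to the plugin.xml fragment posted earlier in this thread, those
changes would give something like the sketch below. The enclosing <plugin>
element and its attributes are omitted because they were not posted; the
order of <runtime> and <requires> is an assumption.

```xml
   <!-- The <extension-point> element is removed: it is already defined
        by the nutch-extensionpoints plugin -->
   <runtime>
      <library name="microformats-hreview.jar">
         <export name="*"/>
      </library>
   </runtime>

   <!-- Added: declare the dependency on the core extension points -->
   <requires>
      <import plugin="nutch-extensionpoints"/>
   </requires>

   <extension id="org.apache.nutch.microformats.hreview.HReviewParser"
              name="HReview parser"
              point="org.apache.nutch.parse.HtmlParseFilter">
      <implementation id="org.apache.nutch.microformats.hreview.HReviewParser"
                      class="org.apache.nutch.microformats.hreview.HReviewParser"/>
   </extension>
```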

Regards

Jérôme


Re: Same Error (Version 0.8)

Posted by mikeyc <mc...@gmail.com>.
Chris,
Yeah, sorry that was a mistake when I posted.  My alias tags are
appropriately in parse-plugins.xml.  I thought the issue could have been a
typo in the plugin.xml file (implementation id), but I re-built / re-ran and
I'm still getting the same error.

1 - Yes

2 & 3 - Yes (listed below)

   <extension-point
      id="org.apache.nutch.parse.HtmlParseFilter"
      name="Nutch HTML Parse Filter"/>

   <runtime>
      <library name="microformats-hreview.jar">
         <export name="*"/>
      </library>
   </runtime>

   <extension id="org.apache.nutch.microformats.hreview.HReviewParser"
              name="HReview parser"
              point="org.apache.nutch.parse.HtmlParseFilter">
      <implementation id="org.apache.nutch.microformats.hreview.HReviewParser"
                      class="org.apache.nutch.microformats.hreview.HReviewParser"/>
   </extension>

Thanks,
Mike



Re: Same Error (Version 0.8)

Posted by Chris Mattmann <ch...@jpl.nasa.gov>.
Hi Mike,

 Well one thing that I notice off the bat is that you specify the alias tag
in nutch-site.xml (or maybe this was a typo when you posted the message). If
it wasn't, the alias tag should go into $NUTCH_HOME/conf/parse-plugins.xml,
the same place where you mapped the mimeTypes to plugin ids. Second, I would
ask that you verify that the following are true:

1. you have a plugin called "microformats-hreview" located in
$NUTCH_HOME/src/plugin/microformats-hreview

2. the plugin "microformats-hreview" has a plugin.xml file

3. the implementation id attribute inside of the plugin.xml file for the
microformats-hreview plugin is set to the value
"org.apache.nutch.microformats.hreview.HReviewParser"
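
For orientation, the general layout of parse-plugins.xml with a mimeType
mapping and its alias might look like the sketch below. It uses the stock
parse-html plugin as the example and assumes that aliases are grouped in an
<aliases> element; compare it against the parse-plugins.xml shipped with your
version before relying on it.

```xml
<parse-plugins>
  <!-- Map a content type to one or more plugin ids, in order of preference -->
  <mimeType name="text/html">
    <plugin id="parse-html" />
  </mimeType>

  <!-- Assumed layout: aliases map a plugin id to the extension id
       declared in that plugin's plugin.xml -->
  <aliases>
    <alias name="parse-html"
           extension-id="org.apache.nutch.parse.html.HtmlParser" />
  </aliases>
</parse-plugins>
```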

Check on those things and let me know what you find out. We'll get to the
bottom of this.

Cheers,
  Chris



Re: Same Error (Version 0.8)

Posted by mikeyc <mc...@gmail.com>.
Sure no problem. 

log message
060411 235725 ParserFactory: Plugin:
org.apache.nutch.microformats.hreview.HReviewParser mapped to contentType
text/html via parse-plugins.xml, but not enabled via plugin.includes in
nutch-default.xml

parse-plugins.xml
<mimeType name="application/xhtml+xml">
                <plugin id="microformats-hreview" />
</mimeType>
<mimeType name="text/html">
                <plugin id="microformats-hreview"/>
                <plugin id="parse-html" />
</mimeType>

nutch-site.xml
<alias name="microformats-hreview"
       extension-id="org.apache.nutch.microformats.hreview.HReviewParser" />

<configuration>
        <property>
          <name>plugin.includes</name>
          <value>nutch-extensionpoints|protocol-(file|http)|urlfilter-regex|index-basic|parse-(html|text|rss)|query-(basic|site|url)|microformats-hreview</value>
          <description>Regular expression naming plugin directory names to</description>
        </property>

</configuration>


Re: Same Error (Version 0.8)

Posted by Chris Mattmann <ch...@jpl.nasa.gov>.
Hi Mike,

   Could you post the snippet from your nutch-site.xml where you enable the
plugin org.apache.nutch.xxx.xxx.xxx? Could you also be more specific and
post the entire name of the plugin as it was printed in your log file? This
warning message basically means that there was an entry in the
parse-plugins.xml file for your plugin org.apache.nutch.xxx.xxx.xxx, but it
was never enabled in nutch-site.xml (or nutch-default.xml).

Thanks,
  Chris



Same Error (Version 0.8)

Posted by mikeyc <mc...@gmail.com>.
Hey Chris,
Any idea why I would get the same error message even though I updated my
nutch-site.xml and parse-plugins.xml files?

060411 230237 ParserFactory: Plugin: org.apache.nutch.xxx.xxx.xxx mapped to
contentType text/html via parse-plugins.xml, but not enabled via
plugin.includes in nutch-default.xml

-Mike


RE: Xml?

Posted by Chris Mattmann <ch...@jpl.nasa.gov>.
Hi Andy,

> What is this error from?

Wow, super cool! You're the first post I've seen to the list regarding these
log messages that I put in :-) For that matter, they're log warnings, not
errors really:

> 060202 141539 ParserFactory:Plugin: parse-text mapped to contentType
> text/xml via parse-plugins.xml, but its plugin.xml file does not claim
> to support contentType: text/xml

This one says that you have the parse-text plugin mapped to the contentType
"text/xml" in the parse-plugins.xml file. However, this is kind of weird
because the plugin.xml file for the parse-text plugin does not claim to
support "text/xml". So, it's just a warning.

> 060202 141539 ParserFactory:Plugin: parse-html mapped to contentType
> text/xml via parse-plugins.xml, but its plugin.xml file does not claim
> to support contentType: text/xml

Same issue here.

> 060202 141539 ParserFactory: Plugin: parse-rss mapped to contentType
> text/xml via parse-plugins.xml, but not enabled via plugin.includes in
> nutch-default.xml

This is another cool one (in my opinion :-) ). It says that you went ahead
and mapped parse-rss to the contentType "text/xml" in parse-plugins.xml,
however, you didn't enable parse-rss in the plugin.includes property in
nutch-default.xml, or nutch-site.xml.
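
To make that last warning go away, parse-rss would need to match the
plugin.includes regular expression. A minimal sketch of such an override in
nutch-site.xml follows; the value is modeled on the one posted elsewhere in
this thread, and your own list of plugins may well differ.

```xml
<property>
  <name>plugin.includes</name>
  <!-- parse-(html|text|rss) enables parse-rss alongside the other parsers -->
  <value>nutch-extensionpoints|protocol-(file|http)|urlfilter-regex|index-basic|parse-(html|text|rss)|query-(basic|site|url)</value>
</property>
```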

Does that make sense?

Cheers,
  Chris
