Posted to user@nutch.apache.org by rk_sharma <rk...@yahoo.com> on 2014/01/11 16:10:43 UTC

data not stored in creating plugin for nutch-2.1

Hello,

I am creating a plugin for nutch-2.1.
I am using a ParseFilter to get the parsed content. I have created an extra
field in MySQL named filtered_content, but I am getting a null value in
this field. I am unable to find what mistake I have made.

The steps I am following to create the plugin are:
1 - I created the classes myPlugin/src/java/nutch/plugin/selecto/MyParseFilter and
myPlugin/src/java/nutch/plugin/selecto/MyParseIndex in the plugin
directory, and created build.xml, ivy.xml and plugin.xml in the myPlugin directory.

2 - Next, I added an entry for myPlugin to build.xml in the plugin
directory.

3 - Then I made some changes in nutch-site.xml:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>



<configuration>
<property>
  <name>http.agent.name</name>
  <value>spineor</value>
  <description>HTTP 'User-Agent' request header. MUST NOT be empty - 
  please set this to a single word uniquely related to your organization.

  NOTE: You should also check other related properties:

	http.robots.agents
	http.agent.description
	http.agent.url
	http.agent.email
	http.agent.version

  and set their values appropriately.

  </description>
</property>

<property>
  <name>http.robots.agents</name>
  <value>spineor</value>
  <description>The agent strings we'll look for in robots.txt files,
  comma-separated, in decreasing order of precedence. You should
  put the value of http.agent.name as the first agent name, and keep the
  default * at the end of the list. E.g.: BlurflDev,Blurfl,*
  </description>
</property>
<property>
<name>http.accept.language</name>
<value>ja-jp, en-us,en-gb,en;q=0.7,*;q=0.3</value>
<description>Value of the “Accept-Language” request header field.
This allows selecting non-English language as default one to retrieve.
It is a useful setting for search engines build for certain national group.
</description>
</property>

<property>
<name>parser.character.encoding.default</name>
<value>utf-8</value>
<description>The character encoding to fall back to when no other
information
is available</description>
</property>

<property>
<name>storage.data.store.class</name>
<value>org.apache.gora.sql.store.SqlStore</value>
<description>blah blah
</description>
</property>
<property>
    <name>plugin.includes</name>
    <value>protocol-http|urlfilter-regex|parse-(html|tika)|myPlugin|index-(basic|anchor)|urlnormalizer-(pass|regex|basic)|scoring-opic</value>
    <description>blah blahh</description>
</property>
<property>
    <name>parser.html.selector.blacklist</name>
    <value>div.bottom,div.desc,div</value>
    <description>blah blah</description>
</property>
<property>
    <name>parser.html.selector.storage_field</name>
    <value>filtered_content</value>
    <description>blah blah</description>
</property>
</configuration>


4 - In conf/schema.xml I added this mapping line:
<field name="filtered_content" type="text" stored="false" indexed="true"/>

I have also created the filtered_content field in MySQL. So what could be wrong in
my configuration?
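
For reference, steps 1 and 2 above can be sketched roughly as follows. This is only a minimal sketch and not the actual files from this setup: the package name nutch.plugin.selecto is inferred from the source paths in step 1, and the extension ids, names, version and provider-name are hypothetical placeholders. The extension point names are the ones the plugins bundled with Nutch 2.x appear to use. The plugin id has to match the myPlugin entry in plugin.includes.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- myPlugin/plugin.xml (sketch; ids and names below are placeholders) -->
<plugin id="myPlugin"
        name="Blacklist and Whitelist Parser and Indexer"
        version="1.0.0"
        provider-name="example.org">

   <runtime>
      <!-- jar produced by the plugin's own build.xml -->
      <library name="myPlugin.jar">
         <export name="*"/>
      </library>
   </runtime>

   <requires>
      <import plugin="nutch-extensionpoints"/>
   </requires>

   <!-- registers the parse filter class from step 1 -->
   <extension id="nutch.plugin.selecto.parse"
              name="My Parse Filter"
              point="org.apache.nutch.parse.ParseFilter">
      <implementation id="MyParseFilter"
                      class="nutch.plugin.selecto.MyParseFilter"/>
   </extension>

   <!-- registers the indexing class from step 1 -->
   <extension id="nutch.plugin.selecto.index"
              name="My Indexing Filter"
              point="org.apache.nutch.indexer.IndexingFilter">
      <implementation id="MyParseIndex"
                      class="nutch.plugin.selecto.MyParseIndex"/>
   </extension>
</plugin>
```

The entry from step 2 would then be a line like the ones already present for the bundled plugins, assuming the plugin directory is src/plugin and its build.xml has the usual deploy and clean targets:

```xml
<!-- inside the "deploy" target of src/plugin/build.xml -->
<ant dir="myPlugin" target="deploy"/>

<!-- inside the "clean" target of src/plugin/build.xml -->
<ant dir="myPlugin" target="clean"/>
```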


Thanks
Rajni Kant

	



--
View this message in context: http://lucene.472066.n3.nabble.com/data-not-stored-in-creating-plugin-for-nutch-2-1-tp4110881.html
Sent from the Nutch - User mailing list archive at Nabble.com.

Re: data not stored in creating plugin for nutch-2.1

Posted by rk_sharma <rk...@yahoo.com>.
Thanks, Lewis, for your suggestion.

I want to add some more details to clarify my problem.
I have checked the hadoop.log file, and it lists my plugin:
2014-01-13 02:08:09,362 INFO  plugin.PluginRepository - 	Blacklist and Whitelist Parser and Indexer(myPlugin)
so that part appears to be correct. Also, I got errors when I first built my plugin; I removed those
errors by configuring some jar files.


I am following this link for the configuration:
<https://github.com/veggen/nutch-element-selector>


Thanks



--
View this message in context: http://lucene.472066.n3.nabble.com/data-not-stored-in-creating-plugin-for-nutch-2-1-tp4110881p4111005.html