Posted to user@nutch.apache.org by Rahul Thathoo <ra...@gmail.com> on 2009/06/11 03:05:46 UTC

Cannot seem to get Custom Query Filter working

Hi All,
I am working through the example for building a Nutch plugin here:
http://wiki.apache.org/nutch/WritingPluginExample-0.9

I am using nutch-0.9 on Windows XP.

I have set things up exactly as described, and some of it works. The HTML
parser extension grabs the contents of the recommended meta tag and adds
them to the document being parsed. The indexing filter extension also works:
it adds the field 'recommended' to the Lucene index with the content of the
meta tag. I used Luke to confirm that the meta tag content is in the new
recommended field, and querying on that content in Luke works too. The query
filter, however, does not do what it is supposed to. My query filter code
looks like this:

package org.apache.nutch.parse.recommended;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.conf.Configuration;
import org.apache.nutch.searcher.FieldQueryFilter;

/** Query filter for the 'recommended' field, boosted by 5. */
public class RecommendedQueryFilter extends FieldQueryFilter {
    // Logger named after this class, so its messages are easy to find in the log.
    private static final Log LOG =
        LogFactory.getLog(RecommendedQueryFilter.class.getName());

    public RecommendedQueryFilter() {
        super("recommended", 5f);
        LOG.info("Added a recommended query");
    }

    public void setConf(Configuration conf) {
        super.setConf(conf);
    }
}
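
A rough way to picture what a field-scoped filter like this one does, assuming
it only acts on query clauses addressed to its own field (the plugin.xml
comment about "fields=DEFAULT" suggests as much). This is a simplified
stand-in, not Nutch code: Clause, FieldFilterSketch and the string output
format are hypothetical names made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Simplified model of a field-scoped query filter. NOT the Nutch
 * implementation; all names here are hypothetical.
 */
public class FieldFilterSketch {

    /** One parsed query clause: a field name plus a search term. */
    public static final class Clause {
        public final String field;
        public final String term;

        public Clause(String field, String term) {
            this.field = field;
            this.term = term;
        }
    }

    private final String field;
    private final float boost;

    public FieldFilterSketch(String field, float boost) {
        this.field = field;
        this.boost = boost;
    }

    /** Returns boosted terms only for the clauses addressed to this filter's field. */
    public List<String> filter(List<Clause> clauses) {
        List<String> out = new ArrayList<>();
        for (Clause c : clauses) {
            if (!c.field.equals(field)) {
                continue; // clause belongs to another field, so this filter skips it
            }
            out.add(field + ":" + c.term + "^" + boost);
        }
        return out;
    }

    public static void main(String[] args) {
        FieldFilterSketch filter = new FieldFilterSketch("recommended", 5f);

        // Assumed shape of a plain query such as: plugins
        List<Clause> plain = new ArrayList<>();
        plain.add(new Clause("DEFAULT", "plugins"));

        // Assumed shape of a fielded query such as: recommended:plugins
        List<Clause> fielded = new ArrayList<>();
        fielded.add(new Clause("recommended", "plugins"));

        System.out.println(filter.filter(plain));   // prints []
        System.out.println(filter.filter(fielded)); // prints [recommended:plugins^5.0]
    }
}
```

If the real filter behaves like this model, a plain search term would never
reach it, while a query written as recommended:term would. That may be worth
testing before looking elsewhere.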

And my nutch-site.xml looks like:
<property>
  <name>plugin.includes</name>

  <value>recommended|nutch-extensionpoints|protocol-http|urlfilter-regex|parse-(text|html|js)|index-basic|query-(basic|site|url|recommendedSearcher)|summary-basic|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
</property>

<property>
    <name>searcher.dir</name>
    <value>C:\nutch-0.9\yatry1\</value>
</property>

And my plugin.xml file looks like:

<?xml version="1.0" encoding="UTF-8"?>
<plugin
   id="recommended"
   name="Recommended Parser/Filter"
   version="0.0.1"
   provider-name="nutch.org">

   <runtime>
      <!-- As defined in build.xml this plugin will end up bundled as
           recommended.jar -->
      <library name="recommended.jar">
         <export name="*"/>
      </library>
   </runtime>

   <!-- The RecommendedParser extends the HtmlParseFilter to grab the
        contents of any recommended meta tags -->
   <extension id="org.apache.nutch.parse.recommended.recommendedfilter"
              name="Recommended Parser"
              point="org.apache.nutch.parse.HtmlParseFilter">
      <implementation id="RecommendedParser"
                      class="org.apache.nutch.parse.recommended.RecommendedParser"/>
   </extension>

   <!-- The RecommendedIndexer extends the IndexingFilter in order to add
        the contents of the recommended meta tags (as found by the
        RecommendedParser) to the lucene index. -->
   <extension id="org.apache.nutch.parse.recommended.recommendedindexer"
              name="Recommended identifier filter"
              point="org.apache.nutch.indexer.IndexingFilter">
      <implementation id="RecommendedIndexer"
                      class="org.apache.nutch.parse.recommended.RecommendedIndexer"/>
   </extension>

   <!-- The RecommendedQueryFilter gets called when you perform a search.
        It runs a search for the user's query against the recommended
        fields.  In order to add this to the list of filters that gets run
        by default, you have to use "fields=DEFAULT". -->
   <extension id="org.apache.nutch.parse.recommended.recommendedSearcher"
              name="Recommended Search Query Filter"
              point="org.apache.nutch.searcher.QueryFilter">
      <implementation id="RecommendedQueryFilter"
                      class="org.apache.nutch.parse.recommended.RecommendedQueryFilter">
         <parameter name="fields" value="recommended"/>
      </implementation>
   </extension>
</plugin>

I build Nutch with ant and deploy a new war file after making my changes,
but I am running out of ideas about what could be wrong. Any ideas or
clues worth exploring would be greatly appreciated.

thanks
rahul

Re: Cannot seem to get Custom Query Filter working

Posted by Rahul Thathoo <ra...@gmail.com>.
Yes, I have made this addition in src/plugin/build.xml:
<ant dir="recommended" target="deploy" />

Also - and maybe this is a very naive question - where is the query log
located, or is there a flag that needs to be set for the query filter to
start logging?
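
One thing that may help when hunting for the filter's messages: a logger
reports under whatever name was handed to the factory when it was created, so
that name is what to search for in the log output. The sketch below shows the
idea with java.util.logging so that it runs without extra jars. It assumes
commons-logging names its loggers the same way from the string passed to
LogFactory.getLog; ParserClass and FilterClass are hypothetical stand-ins.

```java
import java.util.logging.Logger;

/**
 * Shows that a logger's category is the name given at creation time.
 * Uses java.util.logging only; the nested classes are hypothetical.
 */
public class LoggerNameSketch {

    static class ParserClass { }
    static class FilterClass { }

    /** Returns the category name a logger created for the given class reports under. */
    public static String categoryFor(Class<?> owner) {
        return Logger.getLogger(owner.getName()).getName();
    }

    public static void main(String[] args) {
        // Each logger carries the name of the class it was created for.
        System.out.println(categoryFor(ParserClass.class));
        System.out.println(categoryFor(FilterClass.class));
    }
}
```

Where those messages end up depends on the logging configuration in use,
which this sketch does not cover.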

thanks
rahul

On Wed, Jun 10, 2009 at 10:01 PM, Ankur Garg <ga...@gmail.com> wrote:

> Have you made an entry in the src/plugin/build.xml file for your plugin?

Re: Cannot seem to get Custom Query Filter working

Posted by Ankur Garg <ga...@gmail.com>.
Have you made an entry in the src/plugin/build.xml file for your plugin?

-- 
Ankur Garg
CDAC Noida
Anusandhan Bhawan
Sec-62, Noida
Mob. 9968274797

अँकुर गर्ग
प्रगत संगणन विकास केंद्र
अनुसन्धान भवन
सेक्टर-६२, नॉएडा
मोबाइल नंबर - ९९६८२७४७९७