Posted to user@nutch.apache.org by student_t <cc...@cscinfo.com> on 2008/09/30 15:25:57 UTC

Please help with QueryFilter configuration

Hi! Nutch Experts,

Please help me with the following two questions.

1. It seems to me that Lucene's QueryFilter works differently from Nutch's
QueryFilter. Lucene's QueryFilter filters out hits, while Nutch's QueryFilter
modifies the underlying Query. Is my understanding correct?

2. How do I configure Nutch so that it passes the query to the underlying
QueryFilter before issuing its search? I have all the plugins in the right
directories, and my log file shows that these plugins are loaded. But when
I issue a search, I don't see any INFO log entries from the QueryFilter
code. I must be missing a configuration entry, and I think it belongs
in nutch-default.xml. But what are the entries, and can I load
the configuration file whenever I want? It seems to me that when I construct a
Nutch Query, it attempts to load the configuration dynamically.

Thanks very much in advance for any insights!

-student_t



-- 
View this message in context: http://www.nabble.com/Please-help-with-QueryFilter-configuration-tp19742072p19742072.html
Sent from the Nutch - User mailing list archive at Nabble.com.


Re: Please help with QueryFilter configuration

Posted by student_t <cc...@cscinfo.com>.
Thanks so much, Doğacan,

At which point does the Nutch query processing system call the query
filter? Is it on the server side or the client side? On the client side, I see that
NutchBean.search is called with a Nutch Query. At that (client-side)
point, the Query instance looks like

content:"+(content:pepsi) +((host:ca^10.0))"

I want to remove the first "content" field so that it looks like a regular
Lucene BooleanQuery. But I don't know whether I should do that on the client
or the server side. Would you please elaborate on this?

In addition, I am using raw-fields, but the filter doesn't seem to be
called. I thought I had added entries to log4j.properties to have it logged, but
I don't know why there is no log output even after I put in 

Thanks again!
-student_t




-- 
View this message in context: http://www.nabble.com/Please-help-with-QueryFilter-configuration-tp19742072p19763425.html
Sent from the Nutch - User mailing list archive at Nabble.com.


Re: Please help with QueryFilter configuration

Posted by Doğacan Güney <do...@gmail.com>.
On Tue, Sep 30, 2008 at 4:25 PM, student_t <cc...@cscinfo.com> wrote:
>
> Hi! Nutch Experts,
>
> Please help me with the following two questions.
>
> 1. It seems to me that Lucene's QueryFilter works differently from Nutch's
> QueryFilter. Lucene's QueryFilter filters out hits, while Nutch's QueryFilter
> modifies the underlying Query. Is my understanding correct?
>

Yes.
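To make the difference concrete, here is a minimal, hypothetical Java sketch that uses plain strings instead of real Lucene or Nutch types. restrictHits stands in for the Lucene style (run the query, then keep only hits that pass a separate condition), and translate stands in for the Nutch style (each filter adds clauses to the query being built, before anything is searched). FilterStyles, ClauseFilter and both methods are illustrative names, not library API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical illustration only: neither method is Lucene or Nutch code.
public class FilterStyles {

    // Lucene-style: the query itself is untouched; the hits it produced are
    // restricted to those that also satisfy a separate filter condition.
    public static List<String> restrictHits(List<String> hits, Predicate<String> filter) {
        List<String> kept = new ArrayList<>();
        for (String hit : hits) {
            if (filter.test(hit)) {
                kept.add(hit);
            }
        }
        return kept;
    }

    // Nutch-style: each filter receives the translation built so far and
    // returns it with its own clauses added; no hits are involved yet.
    public interface ClauseFilter {
        List<String> filter(String userQuery, List<String> translation);
    }

    public static List<String> translate(String userQuery, List<ClauseFilter> filters) {
        List<String> translation = new ArrayList<>();
        for (ClauseFilter f : filters) {
            translation = f.filter(userQuery, translation);
        }
        return translation;
    }

    public static void main(String[] args) {
        List<String> hits = List.of("http://a.ca/", "http://b.com/", "http://c.ca/x");
        // Keeps only the .ca hosts: [http://a.ca/, http://c.ca/x]
        System.out.println(restrictHits(hits, url -> url.contains(".ca/")));

        ClauseFilter content = (q, t) -> { t.add("+content:" + q); return t; };
        ClauseFilter hostCa = (q, t) -> { t.add("+host:ca"); return t; };
        // Builds the query instead: [+content:pepsi, +host:ca]
        System.out.println(translate("pepsi", List.of(content, hostCa)));
    }
}
```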

> 2. How do I configure Nutch so that it passes the query to the underlying
> QueryFilter before issuing its search? I have all the plugins in the right
> directories, and my log file shows that these plugins are loaded. But when
> I issue a search, I don't see any INFO log entries from the QueryFilter
> code. I must be missing a configuration entry, and I think it belongs
> in nutch-default.xml. But what are the entries, and can I load
> the configuration file whenever I want? It seems to me that when I construct a
> Nutch Query, it attempts to load the configuration dynamically.
>

Have you modified your plugin.xml?

For example, query-basic's plugin.xml looks like this:

.....
      <implementation id="BasicQueryFilter"
                      class="org.apache.nutch.searcher.basic.BasicQueryFilter">
        <parameter name="fields" value="DEFAULT"/>
      </implementation>
....

And query-url like this:

.....
<parameter name="fields" value="url"/>
.....
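The "fields" value in these snippets is ordinary plugin.xml data attached to the implementation element. As a small illustration of reading it, here is a minimal sketch that pulls a named parameter out of an implementation fragment using only the JDK's DOM parser. It is not Nutch's plugin loader; FieldsParameter and parameterValue are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper for illustration; Nutch's own plugin loader is not shown here.
public class FieldsParameter {

    // Returns the value of <parameter name="..."> inside an <implementation>
    // fragment, or null if no such parameter is declared.
    public static String parameterValue(String implementationXml, String name) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(implementationXml.getBytes(StandardCharsets.UTF_8)));
        NodeList params = doc.getElementsByTagName("parameter");
        for (int i = 0; i < params.getLength(); i++) {
            Element param = (Element) params.item(i);
            if (name.equals(param.getAttribute("name"))) {
                return param.getAttribute("value");
            }
        }
        return null;
    }

    public static void main(String[] args) throws Exception {
        String basic = "<implementation id=\"BasicQueryFilter\""
                + " class=\"org.apache.nutch.searcher.basic.BasicQueryFilter\">"
                + "<parameter name=\"fields\" value=\"DEFAULT\"/>"
                + "</implementation>";
        System.out.println(parameterValue(basic, "fields")); // DEFAULT
    }
}
```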

A query filter is only called if Nutch's query processing system
determines that the query contains fields the filter handles.
So query-basic is called on every term without an explicit field
(hence DEFAULT), and query-url is called only
if the query contains "url:......"
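The dispatch rule described above can be modelled roughly as follows. This is a minimal sketch under simplifying assumptions, not Nutch's QueryFilters implementation: FieldDispatch, register and filtersFor are hypothetical names, terms are split on whitespace only, and a term whose field no filter declares is simply skipped. It shows why a filter whose field never appears in the query is never invoked, which would be consistent with the missing INFO log lines.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of field-based dispatch; not Nutch's QueryFilters class.
public class FieldDispatch {

    // Declared field name (the "fields" value from plugin.xml) -> plugin id.
    private final Map<String, String> filtersByField = new LinkedHashMap<>();

    public void register(String field, String pluginId) {
        filtersByField.put(field, pluginId);
    }

    // Returns, in order of first use, the plugins whose declared field occurs in
    // the query. Terms without a "field:" prefix count as DEFAULT. Terms whose
    // field has no registered filter are skipped (a simplification).
    public List<String> filtersFor(String query) {
        List<String> called = new ArrayList<>();
        for (String term : query.trim().split("\\s+")) {
            int colon = term.indexOf(':');
            String field = colon > 0 ? term.substring(0, colon) : "DEFAULT";
            String pluginId = filtersByField.get(field);
            if (pluginId != null && !called.contains(pluginId)) {
                called.add(pluginId);
            }
        }
        return called;
    }

    public static void main(String[] args) {
        FieldDispatch dispatch = new FieldDispatch();
        dispatch.register("DEFAULT", "query-basic");
        dispatch.register("url", "query-url");
        System.out.println(dispatch.filtersFor("pepsi"));           // [query-basic]
        System.out.println(dispatch.filtersFor("pepsi url:pepsi")); // [query-basic, query-url]
    }
}
```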


> Thanks very much in advance for any insights!
>
> -student_t
>
>
>
>
>



-- 
Doğacan Güney