Posted to user@nutch.apache.org by Gautham Pai <bu...@gmail.com> on 2007/10/09 21:40:04 UTC

Custom field query

I have seen this question asked multiple times in this forum. However, the
answers have confused me more, because each one takes its own approach to
solving the issue and no one has outlined the steps in one place. The
tutorials seem to be a bit outdated too.

The version of Nutch I am using is 0.9.

I have 3 custom fields that I have added via an IndexingFilter. The fields
are: author, title and description. I now intend to provide support for
querying these fields as:
author:Gautham
title:Nutch
etc.

I added an Author class as follows:

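package query;

import org.apache.hadoop.conf.Configuration;
import org.apache.nutch.searcher.RawFieldQueryFilter;

// Query filter that handles clauses of the form author:<term>.
public class Author extends RawFieldQueryFilter {
	private Configuration conf;

	public Author() {
		// Field name "author", with a boost of 5.
		super("author", 5f);
	}

	public void setConf(Configuration conf) {
		this.conf = conf;
	}

	public Configuration getConf() {
		return this.conf;
	}
}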

and made an entry in plugin.xml as:

<extension id="query.Author"
           name="Author"
           point="org.apache.nutch.searcher.QueryFilter">
   <implementation id="Author"
                   class="query.Author">
      <parameter name="fields" value="author"/>
   </implementation>
</extension>

When I use NutchBean to perform the query, I see no results. I also tried
changing RawFieldQueryFilter to QueryFilter and following the approach used
in the query-more plugin. That does not seem to work either.

The questions I have specifically are:
* Do I need to create one class per custom field that I intend to provide
support for query?
* Should I use RawFieldQueryFilter or QueryFilter?
* Should I make an entry as: <parameter name="fields" value="author"/> or
<parameter name="fields" value="DEFAULT"/> in plugin.xml?
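
[Editor's sketch] To make the third question concrete, here is a toy model in
plain Java of how a comma-separated "fields" value could be turned into a set
of queryable field names. It is only an illustration and an assumption about
the general idea; it is not the code of org.apache.nutch.searcher.QueryFilters,
and the class and method names (FieldsParameterSketch, parseFieldNames,
isQueryable) are hypothetical.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Toy model only: this is NOT Nutch's QueryFilters implementation.
class FieldsParameterSketch {

    // Split a "fields" value such as "author, title, description"
    // on commas and whitespace, dropping empty pieces.
    static Set<String> parseFieldNames(String fieldsAttribute) {
        Set<String> names = new LinkedHashSet<>();
        if (fieldsAttribute == null) {
            return names;
        }
        for (String piece : fieldsAttribute.split("[,\\s]+")) {
            if (!piece.isEmpty()) {
                names.add(piece);
            }
        }
        return names;
    }

    // A clause like "author:Gautham" counts as a field query here only
    // if its field name is in the registered set.
    static boolean isQueryable(Set<String> registered, String clause) {
        int colon = clause.indexOf(':');
        return colon > 0 && registered.contains(clause.substring(0, colon));
    }

    public static void main(String[] args) {
        Set<String> fields = parseFieldNames("author, title, description");
        System.out.println(fields);
        System.out.println(isQueryable(fields, "author:Gautham"));
        System.out.println(isQueryable(fields, "publisher:Apache"));
    }
}
```

In this model a single plugin entry can register several field names at once,
which is the kind of setup the "fields" question is about.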

Any help or pointers are greatly appreciated.

Thanks,
Gautham.
-- 
View this message in context: http://www.nabble.com/Custom-field-query-tf4596454.html#a13123141
Sent from the Nutch - User mailing list archive at Nabble.com.


Re: Custom field query

Posted by Gautham Pai <bu...@gmail.com>.
Just a related question:

If LIMO can perform custom field search, why does Nutch require us to add a
new plug-in to perform searches on custom fields?

Gautham.


Re: Custom field query

Posted by Jasper Kamperman <ja...@openwaternet.com>.
I added some explanation to the Wiki page.

On Oct 18, 2007, at 12:10 PM, Gautham Pai wrote:

>
> Hi,
>
> I was able to solve the problem today. The problem was with the way I do the
> indexing. I used UN_TOKENIZED while doing my indexing, because that is what
> is mentioned in this tutorial:
> http://wiki.apache.org/nutch/WritingPluginExample
>
> I changed that to TOKENIZED and everything works fine now.
>
> Why is it mentioned as UN_TOKENIZED in the tutorial?
>
> Jasper and Julien, thanks for the help.
>
> Gautham.
>
>
> Jasper Kamperman wrote:
>>
>> Yup, the patch solved our problem. Actually it's more the other way
>> around, Julien Nioche published the patch as a result of solving our
>> problem :-).
>>
>> Jasper
>>
>> On Oct 10, 2007, at 1:53 PM, Gautham Pai wrote:
>>
>>>
>>> I see you had a similar issue here:
>>> http://www.nabble.com/Field-based-search-on-metadata-tf4213684.html#a12045840
>>>
>>> Were you able to solve the problem? I am facing the exact same issue as is
>>> mentioned in the thread.
>>>
>>> The problem of being able to query multiple fields using just one class is
>>> secondary. I am right now trying to solve the basic problem of querying one
>>> custom field at a time.
>>>
>>> Does the patch help me with this?
>>>
>>> Gautham.
>>>
>>> Jasper Kamperman wrote:
>>>>
>>>> You might want to check out this patch:
>>>> https://issues.apache.org/jira/browse/NUTCH-563
>>>> From what I understand of your questions, it might help solve your issues.
>>>>
>>>> Jasper
>>>>
>>>> On Oct 10, 2007, at 9:08 AM, Milan Krendzelak wrote:
>>>>
>>>>> Hi Gautham,
>>>>>
>>>>> I am using Nutch 0.8 and implemented a new searchable field by
>>>>> following the query-lang plugin.
>>>>> Try doing the same as query-lang, at least for testing.
>>>>> Also, don't forget to create a new plugin.xml and define the fields
>>>>> parameter.
>>>>> It works for me, and I think it should work for you too.
>>>>>
>>>>> BasicQueryFilter is used to query the index on different fields, but
>>>>> with the same term,
>>>>> for example +(url:java anchor:java content:java title:java ...).
>>>>> In your case, as I understand it, you want to query the index with
>>>>> different terms, like: +author:Gautham +title:Nutch +description:Java
>>>>> In this case you have to build your own query and then pass the
>>>>> query as a parameter to the search function (for example in NutchBean).
>>>>>
>>>>> Actually, you are right about the tutorial and documentation.
>>>>> Compared to other Apache products, Nutch is really poorly documented.
>>>>> Thank god we have this mailing list, otherwise I would be lost :-)
>>>>>
>>>>> Regards,
>>>>> M
>>>>>
>>>>> Milan Krendzelak
>>>>> Senior Software Developer
>>>>>
>>>>> ________________________________
>>>>>
>>>>> From: Gautham Pai [mailto:buzypi@gmail.com]
>>>>> Sent: Wed 10/10/2007 16:24
>>>>> To: nutch-user@lucene.apache.org
>>>>> Subject: Re: Custom field query
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Still, no luck. I am not able to search on a single field, let alone
>>>>> multiple fields per class.
>>>>>
>>>>> I tried debugging the code and this is what I found:
>>>>>
>>>>> * I see the field listed in the FIELD_NAMES HashSet in QueryFilters.java.
>>>>> * LuceneQueryOptimizer's optimize method has a call to searcher.search,
>>>>> and this returns no TopDocs in the case of author. If I do a search on
>>>>> "url" it works fine and I see results.
>>>>> * I tried changing the boost value. No effect.
>>>>>
>>>>> The fields that I am searching on are not tokenized. I don't have any
>>>>> analyzers defined. Is this a problem?
>>>>>
>>>>> What else could be wrong?
>>>>>
>>>>> Could this be a problem with Lucene or am I missing some configuration?
>>>>>
>>>>> Thanks,
>>>>> Gautham
>>>>>
>>>>> Sagar Naik-2 wrote:
>>>>>>
>>>>>> Hey,
>>>>>> Please see the answers to the questions below.
>>>>>> Gautham Pai wrote:
>>>>>>> The questions I have specifically are:
>>>>>>> * Do I need to create one class per custom field that I intend to
>>>>>>> provide support for query?
>>>>>>>
>>>>>> Generally, one class for all the custom fields is sufficient. In your
>>>>>> case too, you should be able to make do with one class.
>>>>>>> * Should I use RawFieldQueryFilter or QueryFilter?
>>>>>>>
>>>>>> RawFieldQueryFilter implements QueryFilter, so I would use
>>>>>> RawFieldQueryFilter.
>>>>>>> * Should I make an entry as: <parameter name="fields" value="author"/> or
>>>>>>> <parameter name="fields" value="DEFAULT"/> in plugin.xml?
>>>>>>>
>>>>>> In your case,
>>>>>>
>>>>>> <parameter name="fields" value="author, title, description"/>
>>>>>> should solve the problem.
>>>>>> Check out the constructor of the org.apache.nutch.searcher.QueryFilters class.
>>>>>>
>
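
[Editor's sketch] Milan's distinction in the quoted thread above, between one
term searched across several fields and a different term required in each
field, can be illustrated with a small string-building example in plain Java.
This is only a sketch of the two query shapes; it does not use Nutch or Lucene
classes, and the names QueryShapeSketch, sameTermAcrossFields and
differentTermPerField are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy string builder only; it does not use Nutch or Lucene classes.
class QueryShapeSketch {

    // One term searched in several fields: +(url:java anchor:java ...)
    static String sameTermAcrossFields(List<String> fields, String term) {
        List<String> parts = new ArrayList<>();
        for (String field : fields) {
            parts.add(field + ":" + term);
        }
        return "+(" + String.join(" ", parts) + ")";
    }

    // A different required term per field: +author:Gautham +title:Nutch ...
    static String differentTermPerField(Map<String, String> termByField) {
        List<String> parts = new ArrayList<>();
        for (Map.Entry<String, String> entry : termByField.entrySet()) {
            parts.add("+" + entry.getKey() + ":" + entry.getValue());
        }
        return String.join(" ", parts);
    }

    public static void main(String[] args) {
        System.out.println(sameTermAcrossFields(
                Arrays.asList("url", "anchor", "content", "title"), "java"));

        // LinkedHashMap keeps the clauses in insertion order.
        Map<String, String> terms = new LinkedHashMap<>();
        terms.put("author", "Gautham");
        terms.put("title", "Nutch");
        terms.put("description", "Java");
        System.out.println(differentTermPerField(terms));
    }
}
```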


Re: Custom field query

Posted by Gautham Pai <bu...@gmail.com>.
Hi,

I was able to solve the problem today. The problem was with the way I do the
indexing. I used UN_TOKENIZED while doing my indexing, because that is what
is mentioned in this tutorial:
http://wiki.apache.org/nutch/WritingPluginExample

I changed that to TOKENIZED and everything works fine now.
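
[Editor's sketch] One plausible explanation for this behaviour, which the
thread itself does not confirm, is that an untokenized field is indexed as a
single exact term, while the query side looks up a lowercased term. The toy
model below, in plain Java, shows how that combination would produce no hits.
It is not Lucene code; the lowercasing on the query side is an assumption, and
the names TokenizationSketch, indexUntokenized, indexTokenized and matches are
hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Toy model of term matching; real Lucene analysis is more involved.
class TokenizationSketch {

    // Untokenized: the whole field value is indexed as one term, unchanged.
    static List<String> indexUntokenized(String value) {
        List<String> terms = new ArrayList<>();
        terms.add(value);
        return terms;
    }

    // Tokenized (simplified): split on non-alphanumeric characters and
    // lowercase each token.
    static List<String> indexTokenized(String value) {
        List<String> terms = new ArrayList<>();
        for (String token : value.split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                terms.add(token.toLowerCase(Locale.ROOT));
            }
        }
        return terms;
    }

    // Assumption: the query term is lowercased before an exact term lookup.
    static boolean matches(List<String> indexedTerms, String queryTerm) {
        return indexedTerms.contains(queryTerm.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        String author = "Gautham Pai";
        System.out.println(matches(indexUntokenized(author), "Gautham"));
        System.out.println(matches(indexTokenized(author), "Gautham"));
    }
}
```

Under these assumptions, the untokenized index holds only the exact string
"Gautham Pai", so a lookup for the lowercased term finds nothing, while the
tokenized index contains a matching term.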

Why is it mentioned as UN_TOKENIZED in the tutorial?

Jasper and Julien, thanks for the help.

Gautham.




Re: Custom field query

Posted by Jasper Kamperman <ja...@openwaternet.com>.
Yup, the patch solved our problem. Actually it's more the other way  
around, Julien Nioche published the patch as a result of solving our  
problem :-).

Jasper



Re: Custom field query

Posted by Gautham Pai <bu...@gmail.com>.
I see you had a similar issue here:
http://www.nabble.com/Field-based-search-on-metadata-tf4213684.html#a12045840

Were you able to solve the problem? I am facing the exact same issue as is
mentioned in the thread.

The problem of being able to query multiple fields using just one class is
secondary. I am right now trying to solve the basic problem of querying one
custom field at a time.

Does the patch help me with this?

Gautham.



Re: Custom field query

Posted by Jasper Kamperman <ja...@openwaternet.com>.
You might want to check out this patch:
https://issues.apache.org/jira/browse/NUTCH-563
From what I understand of your questions, it might help solve your issues.

Jasper

On Oct 10, 2007, at 9:08 AM, Milan Krendzelak wrote:

> Hi Gautham,
>
> I am using Nutch 0.8 and implemented a new searchable field by
> following the query-lang plugin.
> Try doing the same as query-lang, at least for testing.
> Also, don't forget to create a new plugin.xml and define the fields
> parameter.
> It works for me, and I think it should work for you too.
>
> BasicQueryFilter is used to query the index on different fields, but
> with the same term,
> for example +(url:java anchor:java content:java title:java ...).
> In your case, as I understand it, you want to query the index with
> different terms, like: +author:Gautham +title:Nutch +description:Java
> In this case you have to build your own query and then pass the
> query as a parameter to the search function (for example in NutchBean).
>
> Actually, you are right about the tutorial and documentation.
> Compared to other Apache products, Nutch is really poorly documented.
> Thank god we have this mailing list, otherwise I would be lost :-)
>
> Regards,
> M
>
> Milan Krendzelak
> Senior Software Developer
>
> ________________________________
>
> From: Gautham Pai [mailto:buzypi@gmail.com]
> Sent: Wed 10/10/2007 16:24
> To: nutch-user@lucene.apache.org
> Subject: Re: Custom field query
>
>
>
>
> Still, no luck. I am not able to search on a single field let alone  
> multiple
> fields per class.
>
> I tried debugging the code and this is what I found:
>
> * I see the field  listed in the FIELD_NAMES HashSet in  
> QueryFilters.java.
> * LuceneQueryOptimizer's method: optimize has a call to  
> searcher.search and
> this returns no TopDocs in the case of author. If I do a search on  
> "url" it
> works fine and I see results.
> * I tried changing the boost value. No effect.
>
> The fields that I am searching on are not tokenized. I don't have any
> analyzers defined. Is this a problem?
>
> What else could be wrong?
>
> Could this be a problem with Lucene or am I missing some  
> configuration?
>
> Thanks,
> Gautham
>
> Sagar Naik-2 wrote:
>>
>> Hey,
>> Pl see the answers to the questions below.
>> Gautham Pai wrote:
>>> I have seen this question being asked multiple times in this forum.
>>> However
>>> this has confused me more because each has its own approach to  
>>> solving
>>> the
>>> issue and no one has outlined the steps in one place. The  
>>> tutorials seem
>>> to
>>> be a bit outdated too.
>>>
>>> The version of Nutch I am using is 0.9.
>>>
>>> I have 3 custom fields that I have added via an IndexingFilter. The
>>> fields
>>> are: author, title and description. I now intend to provide  
>>> support for
>>> querying these fields as:
>>> author:Gautham
>>> title:Nutch
>>> etc.
>>>
>>> I added an Author class as follows:
>>>
>>> public class Author extends RawFieldQueryFilter {
>>>      private Configuration conf;
>>>
>>>      public Author() {
>>>              super("author", 5f);
>>>      }
>>>
>>>      public void setConf(Configuration conf) {
>>>              this.conf = conf;
>>>      }
>>>
>>>      public Configuration getConf() {
>>>              return this.conf;
>>>      }
>>> }
>>>
>>> and made an entry in plugin.xml as:
>>>
>>>  <extension id="query.Author"
>>>               name="Author"
>>>               point="org.apache.nutch.searcher.QueryFilter">
>>>       <implementation id="Author"
>>>                       class="query.Author">
>>>              <parameter name="fields" value="author"/>
>>>       </implementation>
>>>    </extension>
>>>
>>> When I use NutchBean to perform the query, I see no results. I  
>>> also tried
>>> changing the RawFieldQueryFilter to QueryFilter and following the
>>> approach
>>> used in the query-more plugin. It does not seem to work either.
>>>
>>> The questions I have specifically are:
>>> * Do I need to create one class per custom field that I intend to  
>>> provide
>>> support for query?
>>>
>> Generally, one class for all the custom fields is sufficient. In your
>> case too, u should be able to do with one class
>>> * Should I use RawFieldQueryFilter or QueryFilter?
>>>
>> RawFieldQueryFilter implements  QueryFilter , So I would use
>> RawfieldQueryFilter.
>>> * Should I make an entry as: <parameter name="fields"  
>>> value="author"/> or
>>> <parameter name="fields" value="DEFAULT"/> in plugin.xml?
>>>
>>>
>> In your case,
>>
>> <parameter name="fields" value="author, title, description"/>  
>> should solve
>> the problem.
>> Check "out org.apache.nutch.searcher.QueryFilters" class's Ctor.
>>
>>> Any help or pointers is greatly appreciated.
>>>
>>> Thanks,
>>> Gautham.
>>>
>>
>>
>> --
>> This message has been scanned for viruses and
>> dangerous content and is believed to be clean.
>>
>>
>>
>
> --
> View this message in context: http://www.nabble.com/Custom-field- 
> query-tf4596454.html#a13138143
> Sent from the Nutch - User mailing list archive at Nabble.com.
>
>
>


RE: Custom field query

Posted by Gautham Pai <bu...@gmail.com>.
I guess you are talking about language-identifier plugin. It might seem
strange, but even this one is not working.

I tried using Nutch 0.8 in order to be on the same page, but I am still
facing the problem. I see a field called 'lang' in the index, but cannot
query on it.

Until now, the only field on which I was able to query is 'url'.

Here are the steps I followed to use the language-identifier plugin:
* I enabled the plugin in conf/nutch-site.xml through the plugin.includes property:
<property>
	<name>plugin.includes</name>
	<value>nutch-extensionpoints|protocol-http|urlfilter-regex|parse-(html|xml|text|js|pdf)|index-(basic)|query-(basic|site|url|more)|scoring-opic|language-identifier</value>
</property>
* I run the crawl using bin/nutch crawl.
* I verify that the field has been added by using Luke and see that all
documents have the 'lang' field in them.
* I run a query on the url field and verify that there are results.
* I run a query on the 'lang' field. No results. :(
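
One thing that may be worth ruling out with a setup like this is whether every plugin involved is actually enabled by the plugin.includes value. The sketch below is only an illustration in plain Java, not Nutch code. It assumes a plugin is enabled only when its id matches the whole expression, and "query-author" is a hypothetical id for a custom query plugin that is not named anywhere in this thread.

```java
import java.util.regex.Pattern;

class PluginIncludesCheck {
    // The plugin.includes value quoted in the steps above.
    static final String INCLUDES =
        "nutch-extensionpoints|protocol-http|urlfilter-regex"
        + "|parse-(html|xml|text|js|pdf)|index-(basic)"
        + "|query-(basic|site|url|more)|scoring-opic|language-identifier";

    // Assumption: a plugin counts as enabled only if its id matches
    // the entire regular expression, not just a part of it.
    static boolean isIncluded(String pluginId) {
        return Pattern.compile(INCLUDES).matcher(pluginId).matches();
    }

    public static void main(String[] args) {
        // "query-author" is a hypothetical custom plugin id.
        String[] ids = {"language-identifier", "query-more", "query-author"};
        for (String id : ids) {
            System.out.println(id + " -> " + isIncluded(id));
        }
    }
}
```

With this value, "language-identifier" and "query-more" match, while the hypothetical "query-author" does not, so a custom plugin would need its own entry in the expression.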

Thanks for helping out.

Gautham.



-- 
View this message in context: http://www.nabble.com/Custom-field-query-tf4596454.html#a13144511
Sent from the Nutch - User mailing list archive at Nabble.com.


RE: Custom field query

Posted by Milan Krendzelak <mk...@mtld.mobi>.
Hi Gautham,
 
I am using Nutch 0.8 and implemented a new searchable field by following the query-lang plugin.
Try doing the same as query-lang, just for testing.
Also, don't forget to create a new plugin.xml and define the fields parameter.
It works for me, and I think it should work for you too.

BasicQueryFilter queries the index on several fields, but with the same term in each,
for example +(url:java anchor:java content:java title:java ...).
In your case, as I understand it, you want to query the index with a different term per field, like: +author:Gautham +title:Nutch +description:Java
In that case you have to build your own query and then pass it as a parameter to the search function (for example in NutchBean).
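
The difference between the two query shapes can be sketched in plain Java. This is only an illustration that builds the textual form of each query. It does not use the Nutch or Lucene query classes, and the class and method names are made up for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

class FieldQuerySketch {
    // One term spread across several fields, as in
    // +(url:java anchor:java content:java title:java).
    static String sameTerm(String term, String... fields) {
        StringJoiner joiner = new StringJoiner(" ", "+(", ")");
        for (String field : fields) {
            joiner.add(field + ":" + term);
        }
        return joiner.toString();
    }

    // A different required term for each field, as in
    // +author:Gautham +title:Nutch +description:Java.
    static String perField(Map<String, String> termsByField) {
        StringJoiner joiner = new StringJoiner(" ");
        for (Map.Entry<String, String> e : termsByField.entrySet()) {
            joiner.add("+" + e.getKey() + ":" + e.getValue());
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        System.out.println(sameTerm("java", "url", "anchor", "content", "title"));

        Map<String, String> terms = new LinkedHashMap<>();
        terms.put("author", "Gautham");
        terms.put("title", "Nutch");
        terms.put("description", "Java");
        System.out.println(perField(terms));
    }
}
```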
 
Actually, you are right about the tutorials and documentation.
Compared to other Apache products, Nutch is really poorly documented.
Thank God we have this mailing list, otherwise I would be lost :-)
 
Regards,
M
 
Milan Krendzelak
Senior Software Developer
 
mTLD Top Level Domain Limited is a private limited company incorporated and registered in the Republic of Ireland with registered number 398040 and registered office at Arthur Cox Building, Earlsfort Terrace, Dublin 2

________________________________

From: Gautham Pai [mailto:buzypi@gmail.com]
Sent: Wed 10/10/2007 16:24
To: nutch-user@lucene.apache.org
Subject: Re: Custom field query




Still, no luck. I am not able to search on a single field let alone multiple
fields per class.

I tried debugging the code and this is what I found:

* I see the field listed in the FIELD_NAMES HashSet in QueryFilters.java.
* The optimize method of LuceneQueryOptimizer calls searcher.search, which
returns no TopDocs for author. If I search on "url" it
works fine and I see results.
* I tried changing the boost value. No effect.

The fields that I am searching on are not tokenized. I don't have any
analyzers defined. Is this a problem?

What else could be wrong?

Could this be a problem with Lucene or am I missing some configuration?

Thanks,
Gautham





Re: Custom field query

Posted by Gautham Pai <bu...@gmail.com>.
Still, no luck. I am not able to search on a single field let alone multiple
fields per class.

I tried debugging the code and this is what I found:

* I see the field listed in the FIELD_NAMES HashSet in QueryFilters.java.
* The optimize method of LuceneQueryOptimizer calls searcher.search, which
returns no TopDocs for author. If I search on "url" it
works fine and I see results.
* I tried changing the boost value. No effect.

The fields that I am searching on are not tokenized. I don't have any
analyzers defined. Is this a problem?
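
To make the question concrete, here is a toy model in plain Java of how an untokenized field behaves. It is not Lucene code. The assumption is that an untokenized field stores each value as one exact term, so a single-term query matches only when it equals the whole stored value, including case. Whether the query side in Nutch 0.9 changes the term before searching (for example by lowercasing it) is not established in this thread and would need to be checked in the RawFieldQueryFilter source.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

class UntokenizedMatchSketch {
    // Toy stand-in for a term lookup on one untokenized field:
    // the query term must equal a stored value exactly.
    static boolean termMatches(Set<String> storedTerms, String queryTerm) {
        return storedTerms.contains(queryTerm);
    }

    public static void main(String[] args) {
        Set<String> authorTerms = new HashSet<>();
        authorTerms.add("Gautham Pai"); // whole value kept as a single term

        // Exact value: matches.
        System.out.println(termMatches(authorTerms, "Gautham Pai"));
        // Only part of the value: no match, the field was never split into words.
        System.out.println(termMatches(authorTerms, "Gautham"));
        // Same value in a different case: no match either.
        System.out.println(termMatches(authorTerms, "Gautham Pai".toLowerCase(Locale.ROOT)));
    }
}
```

Comparing the exact terms stored for the field in Luke with the term the query filter produces is one way to see whether such a mismatch is involved.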

What else could be wrong?

Could this be a problem with Lucene or am I missing some configuration?

Thanks,
Gautham


-- 
View this message in context: http://www.nabble.com/Custom-field-query-tf4596454.html#a13138143
Sent from the Nutch - User mailing list archive at Nabble.com.


Re: Custom field query

Posted by Sagar Naik <sa...@visvo.com>.
Hey,
Please see the answers to the questions below.
Gautham Pai wrote:
> I have seen this question being asked multiple times in this forum. However
> this has confused me more because each has its own approach to solving the
> issue and no one has outlined the steps in one place. The tutorials seem to
> be a bit outdated too.
>
> The version of Nutch I am using is 0.9.
>
> I have 3 custom fields that I have added via an IndexingFilter. The fields
> are: author, title and description. I now intend to provide support for
> querying these fields as:
> author:Gautham
> title:Nutch
> etc.
>
> I added an Author class as follows:
>
> public class Author extends RawFieldQueryFilter {
> 	private Configuration conf;
>
> 	public Author() {
> 		super("author", 5f);
> 	}
>
> 	public void setConf(Configuration conf) {
> 		this.conf = conf;
> 	}
>
> 	public Configuration getConf() {
> 		return this.conf;
> 	}
> }
>
> and made an entry in plugin.xml as:
>
>  <extension id="query.Author"
>               name="Author"
>               point="org.apache.nutch.searcher.QueryFilter">
>       <implementation id="Author"
>                       class="query.Author">
>              <parameter name="fields" value="author"/>
>       </implementation>
>    </extension>
>
> When I use NutchBean to perform the query, I see no results. I also tried
> changing the RawFieldQueryFilter to QueryFilter and following the approach
> used in the query-more plugin. It does not seem to work either.
>
> The questions I have specifically are:
> * Do I need to create one class per custom field that I intend to provide
> support for query?
>   
Generally, one class for all the custom fields is sufficient. In your
case too, you should be able to manage with one class.
> * Should I use RawFieldQueryFilter or QueryFilter?
>   
RawFieldQueryFilter implements QueryFilter, so I would use
RawFieldQueryFilter.
> * Should I make an entry as: <parameter name="fields" value="author"/> or
> <parameter name="fields" value="DEFAULT"/> in plugin.xml?
>
>   
In your case,

<parameter name="fields" value="author, title, description"/> should solve the problem.
Check out the constructor of the "org.apache.nutch.searcher.QueryFilters" class.
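
As a rough illustration of how a value such as "author, title, description" could be turned into separate field names, here is a minimal sketch in plain Java. It assumes the value is split on commas and whitespace. The class and method names are made up for the example, and the actual parsing in the QueryFilters constructor should be confirmed from the Nutch source.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class FieldsParameterSketch {
    // Assumption: field names are separated by commas and/or whitespace.
    static List<String> parseFieldNames(String value) {
        List<String> names = new ArrayList<>();
        if (value == null) {
            return names; // a missing attribute yields no field names
        }
        StringTokenizer tokens = new StringTokenizer(value, " ,\t\n\r");
        while (tokens.hasMoreTokens()) {
            names.add(tokens.nextToken());
        }
        return names;
    }

    public static void main(String[] args) {
        // prints [author, title, description]
        System.out.println(parseFieldNames("author, title, description"));
    }
}
```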

> Any help or pointers is greatly appreciated.
>
> Thanks,
> Gautham.
>   

