Posted to user@nutch.apache.org by Anupkumar Putane <An...@caritor.com> on 2005/10/27 16:15:10 UTC

Issue in executing multi-field search in Nutch

Dear Nutch-users,

We are looking for some assistance with getting Nutch 0.7.1 to work on a
multilingual website, where we are building support for English and French
content.

The issue is that we are not able to use the "lang" field of the indexed
documents to search for and retrieve language-specific results.

So far we have done the following:
1.  Added lang="fr" to the meta tag of the French web pages.
2.  Activated the language-identifier plugin by adding it to the list of
values of the plugin.includes property in the Nutch configuration file,
nutch-default.xml (we do not have any overriding config in nutch-site.xml).
3.  Indexed the website and verified that the indexed documents contain the
'lang' field with the appropriate value.  Verification was done using Luke.
4.  Using the Search facility in Luke, we verified retrieval of
language-specific documents with a multi-field query like
"content:"service"+lang:fr"

But when we tried to do the same with Nutch using a URL, no results were
retrieved.  We tried the following URL formats:
1.  /search.jsp?query=content:"service"+lang:fr  (Non-encoded url, with 
all field names specified)
2.  /search.jsp?query=content%3A%22service%22%2Blang%3Afr  (Encoded url, 
with all field names specified)
3.  /search.jsp?query="service"+lang:fr (we did not specify the content 
field as it is the default field)
4.  /search.jsp?query=%22service%22%2Blang%3Afr

In all these cases, the entire query string was treated as a single search
phrase, instead of being split into separate field clauses.
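
To make the difference between the URL formats above concrete, here is a
minimal sketch using only the standard java.net classes. It shows how a
form-encoded query parameter is typically decoded: a bare "+" becomes a
space, while "%2B" becomes a literal "+". The class and method names are
hypothetical and are not part of Nutch, and we are only assuming that the
servlet container decodes the parameter this way before search.jsp sees it.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Nutch: illustrates URL encoding only.
public class QueryUrlSketch {

    // Build a search URL by form-encoding the raw query text.
    static String buildSearchUrl(String query) {
        return "/search.jsp?query="
                + URLEncoder.encode(query, StandardCharsets.UTF_8);
    }

    // Decode a query parameter value the way form decoding usually does.
    static String decodeQueryParam(String raw) {
        return URLDecoder.decode(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Format 3: the bare '+' is decoded as a space.
        System.out.println(decodeQueryParam("\"service\"+lang:fr"));
        // Format 4: '%2B' is decoded as a literal '+' character.
        System.out.println(decodeQueryParam("%22service%22%2Blang%3Afr"));
        // Encoding a query with a space between the two clauses.
        System.out.println(buildSearchUrl("\"service\" lang:fr"));
    }
}
```

Under that assumption, formats 1 and 3 would reach the query parser as
'"service" lang:fr' (with a space), whereas formats 2 and 4 would reach it
with a literal "+" between the two clauses.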

We also noticed that Nutch did not respond to any multi-field queries
(the query-basic plugin continues to be included in the configuration).

We would appreciate it if you could let us know:
1.  Whether multi-field queries are supported by Nutch 0.7.1 and, if so,
what corrections we should make to our setup.
2.  If multi-field queries are not supported, how you would advise us to
build the functionality: by modifying the Nutch code, or by using an
existing tool, maybe NutchWax?

Thanks for your time,
Anup