Posted to user@nutch.apache.org by Stefan Groschupf <sg...@media-style.com> on 2005/10/27 15:55:31 UTC

Re: [Nutch-general] Issue in executing multi-field search in Nutch

You mentioned that the lang tag is set correctly, right?
Then you may simply be missing the language query filter plugin
in your war.
Check the logs to see whether the lang query filter plugin is installed
correctly, and turn it on if not. That should solve your problem.
HTH
Stefan

P.S. language tagging and search filtering works fine for me...
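
One way to act on this advice is to check whether the plugin id is covered by the plugin.includes property of the configuration that ships inside the war. Below is a minimal Python sketch. It assumes that the property value is a regular expression matched against whole plugin ids and that the config uses the usual property/name/value layout. The sample config, the plugin list in it, and the function name plugin_enabled are made up for illustration. Python's regex dialect stands in for Java's, which should be close enough for simple alternations like these.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical config fragment; the plugin list is illustrative only.
SAMPLE_CONFIG = """
<nutch-conf>
  <property>
    <name>plugin.includes</name>
    <value>protocol-http|parse-(text|html)|index-basic|query-(basic|site|url)|language-identifier</value>
  </property>
</nutch-conf>
"""


def plugin_enabled(config_xml, plugin_id):
    """Return True if plugin_id fully matches the plugin.includes regex."""
    root = ET.fromstring(config_xml)
    for prop in root.iter("property"):
        if prop.findtext("name", "").strip() == "plugin.includes":
            pattern = prop.findtext("value", "").strip()
            # Assumption: the whole plugin id must match the pattern.
            return re.fullmatch(pattern, plugin_id) is not None
    # No plugin.includes property found in this file.
    return False


if __name__ == "__main__":
    for plugin in ("language-identifier", "query-basic", "index-more"):
        print(plugin, plugin_enabled(SAMPLE_CONFIG, plugin))
```

Running the same check against the config files packaged in the war, rather than the ones used for indexing, is the point: the two can differ.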

Verify that you have
On 27.10.2005 at 15:45, Anupkumar Putane wrote:

>
> Dear Nutch-users,
>
> We are looking for some assistance on getting Nutch 0.7.1 to work  
> in a Multilingual website, where we are building support for  
> English and French content.
>
> The issue is that we are not able to use the "lang" field of the  
> indexed documents to search and retrieve language specific results.
>
> So far we have done the following:
> 1.  Added lang="fr" to the meta tag of the French web pages.
> 2.  Activated the language-identifier plugin by adding it to the
> value of the plugin.includes property in the Nutch configuration,
> nutch-default.xml (we have no overriding config in nutch-site.xml).
> 3.  Indexed the website and verified that the indexed documents
> contain the 'lang' field with the appropriate value.  Verification
> was done using Luke.
> 4.  Using the search facility in Luke, we verified retrieval of
> language-specific documents with a multi-field query like
> "content:"service"+lang:fr"
>
> But when we tried to do the same with Nutch using a URL, no results
> were retrieved.  We tried the following URL formats:
> 1.  /search.jsp?query=content:"service"+lang:fr  (Non-encoded url,  
> with all field names specified)
> 2.  /search.jsp?query=content%3A%22service%22%2Blang%3Afr  (Encoded  
> url, with all field names specified)
> 3.  /search.jsp?query="service"+lang:fr (we did not specify the  
> content field as it is the default field)
> 4.  /search.jsp?query=%22service%22%2Blang%3Afr
>
> In all these cases, the entire query string was treated as the
> search phrase, instead of being split into fields.
>
> We also noticed that Nutch was not accepting any multi-field
> queries (the query-basic plugin is still included in the
> configuration).
>
> We would appreciate it if you could let us know
> 1.  whether multi-field queries are supported by Nutch 0.7.1 and,
> if so, suggest any corrections to our setup
> 2.  if multi-field queries are not supported, how you would advise
> us to build the functionality: rebuild the Nutch code, or use an
> existing tool, maybe NutchWax?
>
> Thanks for your assistance,
> Anup


Re: [Nutch-general] Issue in executing multi-field search in Nutch

Posted by Anupkumar Putane <An...@caritor.com>.
Hi,

Thanks for the pointers.  We had indeed missed adding the
language-identifier plugin to the war file.  Adding it resolved the
problem.

Best Regards,
Anup




On 10/27/2005 08:40 PM, Jérôme Charron <je...@gmail.com> wrote
(to nutch-user@lucene.apache.org):

As mentioned previously to Deepa Devanathan in the earlier mail thread
"Search only in one language", I tried to reproduce your problem, but
everything works fine for me too.
From your mail I understand that you are trying to access the URL
directly (not using the search JSP page).
In that case, the correct URL encoding is:
/search.jsp?query=service+lang%3Afr

Regards

Jérôme


Re: [Nutch-general] Issue in executing multi-field search in Nutch

Posted by Jérôme Charron <je...@gmail.com>.
As mentioned previously to Deepa Devanathan in the earlier mail thread
"Search only in one language", I tried to reproduce your problem, but
everything works fine for me too.
From your mail I understand that you are trying to access the URL
directly (not using the search JSP page).
In that case, the correct URL encoding is:
/search.jsp?query=service+lang%3Afr

Regards

Jérôme
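
The difference between the encoded URLs tried earlier in the thread and the one suggested here can be checked with a few lines of Python. This is only a sketch using the standard library. The helper name search_url is made up for illustration, and how Nutch parses the decoded string afterwards is not shown.

```python
from urllib.parse import urlencode, unquote_plus


def search_url(query, path="/search.jsp"):
    """Build a search URL with a form-encoded query parameter."""
    # urlencode turns spaces into '+' and ':' into '%3A'.
    return path + "?" + urlencode({"query": query})


if __name__ == "__main__":
    # The query as a user would type it: two terms separated by a space.
    print(search_url("service lang:fr"))
    # /search.jsp?query=service+lang%3Afr

    # Decoding the suggested form gives the space back.
    print(unquote_plus("service+lang%3Afr"))
    # service lang:fr

    # Decoding the form with %2B gives a literal plus sign, not a space.
    print(unquote_plus("%22service%22%2Blang%3Afr"))
    # "service"+lang:fr
```

In other words, a '+' in a query string already stands for a space, so encoding it again as %2B sends a literal plus character to the server and the two terms are no longer separated by whitespace.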


--
http://motrech.free.fr/
http://www.frutch.org/