Posted to dev@nutch.apache.org by Fr...@bnc.ca on 2006/12/13 17:21:54 UTC

NUTCH 0.8.1: Difficulties with Analyzers

I am having a hard time implementing the French analyzer... Any help would 
be immensely appreciated. Here are the details: first I tried the 
official plugin (analysis-fr), and then I tried another one 
(analyze-french).

1- ATTEMPT TO USE THE PLUGIN: analysis-fr (by Jérôme)

- I get the following in the logs, so I know the plugin gets loaded:
[12/13/06 10:38:48:417 EST]  176a8c8 PluginReposit I org.apache.nutch.plugin.PluginRepository   French Analysis Plug-in (analysis-fr)

- When I perform an actual search in French (lang: fr), I get absolutely 
no results, whereas the same query with lang: en returns one hit:
[12/13/06 10:39:20:134 EST]   fa0bf4 NutchBean     I org.apache.nutch.searcher.NutchBean  query request from 10.133.35.153
[12/13/06 10:39:20:137 EST]   fa0bf4 NutchBean     I org.apache.nutch.searcher.NutchBean  query: fr?quentes
[12/13/06 10:39:20:138 EST]   fa0bf4 NutchBean     I org.apache.nutch.searcher.NutchBean  lang: en
[12/13/06 10:39:20:139 EST]   fa0bf4 NutchBean     I org.apache.nutch.searcher.NutchBean  searching for 20 raw hits
[12/13/06 10:39:20:145 EST]   fa0bf4 NutchBean     I org.apache.nutch.searcher.NutchBean  total hits: 1
[12/13/06 10:39:28:406 EST]  176a8c8 NutchBean     I org.apache.nutch.searcher.NutchBean  query request from 10.133.35.153
[12/13/06 10:39:28:409 EST]  176a8c8 NutchBean     I org.apache.nutch.searcher.NutchBean  query: fr?quentes
[12/13/06 10:39:28:411 EST]  176a8c8 NutchBean     I org.apache.nutch.searcher.NutchBean  lang: fr
[12/13/06 10:39:28:413 EST]  176a8c8 NutchBean     I org.apache.nutch.searcher.NutchBean  searching for 20 raw hits
[12/13/06 10:39:28:421 EST]  176a8c8 NutchBean     I org.apache.nutch.searcher.NutchBean  total hits: 0


2- ATTEMPT TO USE THE PLUGIN: analyze-french (by Christophe)

I get a different error with that one:
2006-12-13 11:14:48,891 DEBUG plugin.PluginRepository - parsing: /opt/Nutch-0.8/plugins/analyze-french/plugin.xml
2006-12-13 11:14:48,898 DEBUG plugin.PluginRepository - plugin: id=analyze-french name=French Analyzer Filter version=1.0.0 provider=nutch.orgclass=null
2006-12-13 11:14:48,899 DEBUG plugin.PluginRepository - impl: point=org.apache.nutch.searcher.QueryFilter class=org.apache.nutch.searcher.more.TypeQueryFilter
2006-12-13 11:14:48,899 DEBUG plugin.PluginRepository - impl: point=org.apache.nutch.searcher.QueryFilter class=org.apache.nutch.searcher.more.DateQueryFilter
Exception in thread "main" java.lang.NullPointerException
        at org.apache.nutch.searcher.QueryFilters.filter(QueryFilters.java:108)
        at org.apache.nutch.searcher.IndexSearcher.search(IndexSearcher.java:93)
        at org.apache.nutch.searcher.NutchBean.search(NutchBean.java:180)
        at org.apache.nutch.searcher.NutchBean.search(NutchBean.java:173)
        at org.apache.nutch.searcher.NutchBean.main(NutchBean.java:389)

Thank you in advance !!!
___________________________________________
François McNeil


Re: NUTCH 0.8.1: Difficulties with Analyzers

Posted by Fr...@bnc.ca.
Hello Jérôme, thank you very much for getting back to me. 

Here are the answers:

>1. Were the documents indexed with the French analyzer activated?
 
        Yes, my hadoop-site.xml (in /opt/nutch-0.8/conf/) contained the 
following plugin.includes before I crawled/indexed the site.
        Also, the file nutch-site.xml within the webapp's WEB-INF/classes/ 
folder contains the same plugin.includes.

nutch-extensionpoints|language-identifier|lib-lucene-analyzers|protocol-httpclient|urlfilter-regex|parse-(text|pdf|msword|html)|index-basic|analysis-fr|query-(basic|site|url)|summary-basic|scoring-opic
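
Nutch reads plugin.includes as a regular expression over plugin ids. The minimal Java sketch below checks which ids the value above would activate. It assumes that a plugin is enabled only when its entire id matches the pattern; the class and method names are illustrative, not part of the Nutch API.

```java
import java.util.regex.Pattern;

public class PluginIncludesCheck {
    // The plugin.includes value from the message, joined back onto one line.
    static final String INCLUDES =
        "nutch-extensionpoints|language-identifier|lib-lucene-analyzers|protocol-httpclient"
        + "|urlfilter-regex|parse-(text|pdf|msword|html)|index-basic|analysis-fr"
        + "|query-(basic|site|url)|summary-basic|scoring-opic";

    private static final Pattern PATTERN = Pattern.compile(INCLUDES);

    // Assumption: a plugin is active only if its whole id matches the pattern.
    public static boolean isIncluded(String pluginId) {
        return PATTERN.matcher(pluginId).matches();
    }

    public static void main(String[] args) {
        String[] ids = {"analysis-fr", "lib-lucene-analyzers", "parse-html", "analyze-french"};
        for (String id : ids) {
            System.out.println(id + " -> " + (isIncluded(id) ? "included" : "excluded"));
        }
    }
}
```

With this value, analysis-fr, lib-lucene-analyzers and parse-html are matched, while the analyze-french id from the second attempt is not.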

>2. Could you perform a search with a non-accented query?

        Yes, a non-accented search returns adequate results. 
        In addition, if I perform an accented search from a different 
locale (i.e., from a location other than http://myhost:8080/nutch/fr), I do 
get adequate results.
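
Jérôme quoted the log line `query: fr?quentes`, and accented queries fail only when sent from http://myhost:8080/nutch/fr. One possible explanation (an assumption, not something confirmed in this thread) is that the "é" is lost between the browser and NutchBean. The Java sketch below shows one hypothetical path that turns "fréquentes" into exactly "fr?quentes": the bytes are sent as ISO-8859-1, decoded as UTF-8, and then written out as US-ASCII. The charsets and the class name are illustrative choices, not a diagnosis of this setup.

```java
import java.nio.charset.StandardCharsets;

public class AccentedQueryDemo {
    // Hypothetical round trip: bytes sent in ISO-8859-1, decoded as UTF-8,
    // then written to a log that only handles US-ASCII.
    public static String simulateMangling(String query) {
        // In ISO-8859-1, "é" becomes the single byte 0xE9.
        byte[] sent = query.getBytes(StandardCharsets.ISO_8859_1);
        // 0xE9 followed by 'q' is not valid UTF-8, so the decoder substitutes U+FFFD.
        String decoded = new String(sent, StandardCharsets.UTF_8);
        // US-ASCII cannot represent U+FFFD, so getBytes substitutes '?'.
        byte[] logged = decoded.getBytes(StandardCharsets.US_ASCII);
        return new String(logged, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        System.out.println(simulateMangling("fr\u00e9quentes")); // prints fr?quentes
        System.out.println(simulateMangling("frequentes"));      // prints frequentes
    }
}
```

In this scenario the term is already wrong after the decoding step, before anything is logged, so it would not match an indexed "fréquentes"; a query without accents passes through unchanged, which is consistent with the answer to question 2.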

___________________________________________
François McNeil





"Jérôme Charron" <je...@gmail.com>
2006-12-13 17:01
Please reply to nutch-user
 
        To:  nutch-user@lucene.apache.org
        cc: 
        Subject:  Re: NUTCH 0.8.1: Difficulties with Analyzers

> org.apache.nutch.searcher.NutchBean  query: fr?quentes

François, two basic points I would like to check first:
1. Were the documents indexed with the French analyzer activated?
2. Could you perform a search with a non-accented query?

Jérôme

-- 
http://motrech.free.fr/
http://www.frutch.org/


Re: NUTCH 0.8.1: Difficulties with Analyzers

Posted by Dennis Kubes <nu...@dragonflymc.com>.
I have not used the French analyzer, but did you use the French 
analyzer for both indexing and searching?

Dennis
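
Dennis's question and Jérôme's first check point at the same requirement: the terms produced at search time must match the terms produced at index time. Below is a toy Java sketch, assuming a hypothetical "French" analyzer that only lowercases and strips a trailing plural "s" (a stand-in, not Lucene's FrenchAnalyzer) and a small in-memory inverted index instead of a Lucene index.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

public class AnalyzerMismatchDemo {
    // Hypothetical default analyzer: lowercase only.
    public static String defaultAnalyze(String word) {
        return word.toLowerCase(Locale.ROOT);
    }

    // Hypothetical "French" analyzer: lowercase, then strip one trailing 's'.
    // A toy stand-in for real stemming, not Lucene's FrenchAnalyzer.
    public static String frenchAnalyze(String word) {
        String w = word.toLowerCase(Locale.ROOT);
        return w.endsWith("s") ? w.substring(0, w.length() - 1) : w;
    }

    // Inverted index: analyzed term -> ids of the documents containing it.
    private final Map<String, Set<Integer>> index = new HashMap<>();

    public void add(int docId, String text, Function<String, String> analyzer) {
        for (String word : text.split("\\s+")) {
            index.computeIfAbsent(analyzer.apply(word), k -> new HashSet<>()).add(docId);
        }
    }

    public Set<Integer> search(String word, Function<String, String> analyzer) {
        return index.getOrDefault(analyzer.apply(word), Collections.emptySet());
    }

    public static void main(String[] args) {
        AnalyzerMismatchDemo idx = new AnalyzerMismatchDemo();
        // Indexed with the French analyzer: "fréquentes" is stored as "fréquente".
        idx.add(1, "questions fr\u00e9quentes", AnalyzerMismatchDemo::frenchAnalyze);
        // Same analyzer at query time: the terms line up.
        System.out.println(idx.search("fr\u00e9quentes", AnalyzerMismatchDemo::frenchAnalyze)); // [1]
        // Different analyzer at query time: "fréquentes" is not in the index.
        System.out.println(idx.search("fr\u00e9quentes", AnalyzerMismatchDemo::defaultAnalyze)); // []
    }
}
```

A document indexed with the French analyzer is found only when the query goes through the same analyzer; the reverse mismatch (indexed with the default analyzer, queried with the French one) fails the same way.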
