Posted to dev@nutch.apache.org by Fr...@bnc.ca on 2006/12/13 17:21:54 UTC
NUTCH 0.8.1: Difficulties with Analyzers
I am having a hard time implementing the French Analyzer. Any help would
be immensely appreciated. Here are the details: first I tried the
official plugin (analysis-fr), and then I tried another one
(analyze-french).
1- ATTEMPT TO USE THE PLUGIN: analysis-fr (by Jérôme)
- I get the following in the logs, so I know the plugin gets loaded.
[12/13/06 10:38:48:417 EST] 176a8c8 PluginReposit I org.apache.nutch.plugin.PluginRepository French Analysis Plug-in (analysis-fr)
- When performing an actual search, if it is done in French I get
absolutely no results.
[12/13/06 10:39:20:134 EST] fa0bf4 NutchBean I org.apache.nutch.searcher.NutchBean query request from 10.133.35.153
[12/13/06 10:39:20:137 EST] fa0bf4 NutchBean I org.apache.nutch.searcher.NutchBean query: fr?quentes
[12/13/06 10:39:20:138 EST] fa0bf4 NutchBean I org.apache.nutch.searcher.NutchBean lang: en
[12/13/06 10:39:20:139 EST] fa0bf4 NutchBean I org.apache.nutch.searcher.NutchBean searching for 20 raw hits
[12/13/06 10:39:20:145 EST] fa0bf4 NutchBean I org.apache.nutch.searcher.NutchBean total hits: 1
[12/13/06 10:39:28:406 EST] 176a8c8 NutchBean I org.apache.nutch.searcher.NutchBean query request from 10.133.35.153
[12/13/06 10:39:28:409 EST] 176a8c8 NutchBean I org.apache.nutch.searcher.NutchBean query: fr?quentes
[12/13/06 10:39:28:411 EST] 176a8c8 NutchBean I org.apache.nutch.searcher.NutchBean lang: fr
[12/13/06 10:39:28:413 EST] 176a8c8 NutchBean I org.apache.nutch.searcher.NutchBean searching for 20 raw hits
[12/13/06 10:39:28:421 EST] 176a8c8 NutchBean I org.apache.nutch.searcher.NutchBean total hits: 0
2- ATTEMPT TO USE THE PLUGIN: analyze-french (by Christophe)
I get a different error with that one:
2006-12-13 11:14:48,891 DEBUG plugin.PluginRepository - parsing: /opt/Nutch-0.8/plugins/analyze-french/plugin.xml
2006-12-13 11:14:48,898 DEBUG plugin.PluginRepository - plugin: id=analyze-french name=French Analyzer Filter version=1.0.0 provider=nutch.orgclass=null
2006-12-13 11:14:48,899 DEBUG plugin.PluginRepository - impl: point=org.apache.nutch.searcher.QueryFilter class=org.apache.nutch.searcher.more.TypeQueryFilter
2006-12-13 11:14:48,899 DEBUG plugin.PluginRepository - impl: point=org.apache.nutch.searcher.QueryFilter class=org.apache.nutch.searcher.more.DateQueryFilter
Exception in thread "main" java.lang.NullPointerException
at org.apache.nutch.searcher.QueryFilters.filter(QueryFilters.java:108)
at org.apache.nutch.searcher.IndexSearcher.search(IndexSearcher.java:93)
at org.apache.nutch.searcher.NutchBean.search(NutchBean.java:180)
at org.apache.nutch.searcher.NutchBean.search(NutchBean.java:173)
at org.apache.nutch.searcher.NutchBean.main(NutchBean.java:389)
Thank you in advance!
___________________________________________
François McNeil
Ref.: Re: NUTCH 0.8.1: Difficulties with Analyzers
Posted by Fr...@bnc.ca.
Hello Jérôme, thank you very much for getting back to me.
Here are the answers:
>1. Were the documents indexed with the French analyzer activated?
Yes, my hadoop-site.xml (in /opt/nutch-0.8/conf/) contained the
following plugin.includes before I crawled/indexed the site.
Also, the file nutch-site.xml within the webapp's WEB-INF/classes/
folder contains the same plugin.includes.
nutch-extensionpoints|language-identifier|lib-lucene-analyzers|protocol-httpclient|urlfilter-regex|parse-(text|pdf|msword|html)|index-basic|analysis-fr|query-(basic|site|url)|summary-basic|scoring-opic
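A quick way to see which plugin ids this value covers, assuming Nutch treats plugin.includes as a regular expression that must match a plugin id in full. The sketch below is not Nutch code: the class PluginIncludesCheck and the method isIncluded are made up for illustration, and the pattern is the value quoted above joined onto one line.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Nutch: tests plugin ids against a
// plugin.includes-style regular expression.
class PluginIncludesCheck {
    // The value quoted above, joined onto a single line.
    static final String INCLUDES =
        "nutch-extensionpoints|language-identifier|lib-lucene-analyzers|"
        + "protocol-httpclient|urlfilter-regex|parse-(text|pdf|msword|html)|"
        + "index-basic|analysis-fr|query-(basic|site|url)|summary-basic|scoring-opic";

    // Assumption: a plugin is loaded only if its whole id matches the expression.
    static boolean isIncluded(String pluginId) {
        return Pattern.compile(INCLUDES).matcher(pluginId).matches();
    }

    public static void main(String[] args) {
        String[] ids = {"analysis-fr", "parse-pdf", "analyze-french", "query-more"};
        for (String id : ids) {
            System.out.println(id + " -> " + isIncluded(id));
        }
    }
}
```

Under that assumption, analysis-fr is covered by this value, while analyze-french would need its own entry before it could be loaded.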
>2. Could you perform a search with an unaccented query?
Yes, an unaccented search returns adequate results.
In addition, if I perform an accented search from a different
locale (i.e., a location other than http://myhost:8080/nutch/fr), I do
get adequate results.
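The "fr?quentes" in the log, together with the fact that only accented queries fail, is consistent with (but does not prove) the accented character being lost in a character-set conversion somewhere between the browser and the searcher. A minimal sketch of that effect using only the Java standard library follows; the class AccentLossDemo and the method throughAscii are hypothetical, and which charset is actually involved in this setup is unknown.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: shows how an accented character can become '?'.
class AccentLossDemo {
    // Encode to US-ASCII and decode again. Characters that US-ASCII cannot
    // represent are replaced by the charset's replacement byte, which is '?'.
    static String throughAscii(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.US_ASCII);
        return new String(bytes, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        // "fréquentes", written with an escape so the source file encoding does not matter.
        String query = "fr\u00e9quentes";
        System.out.println(throughAscii(query));
        System.out.println(throughAscii("frequentes"));
    }
}
```

An unaccented query passes through such a conversion unchanged, which would fit the observation that unaccented searches return results.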
___________________________________________
François McNeil
"Jérôme Charron" <je...@gmail.com>
2006-12-13 17:01
Please reply to nutch-user
To: nutch-user@lucene.apache.org
cc:
Subject: Re: NUTCH 0.8.1: Difficulties with Analyzers
> org.apache.nutch.searcher.NutchBean query: fr?quentes
François, two basic points I would like to check first:
1. Were the documents indexed with the French analyzer activated?
2. Could you perform a search with an unaccented query?
Jérôme
--
http://motrech.free.fr/
http://www.frutch.org/
Re: NUTCH 0.8.1: Difficulties with Analyzers
Posted by Jérôme Charron <je...@gmail.com>.
> org.apache.nutch.searcher.NutchBean query: fr?quentes
François, two basic points I would like to check first:
1. Were the documents indexed with the French analyzer activated?
2. Could you perform a search with an unaccented query?
Jérôme
--
http://motrech.free.fr/
http://www.frutch.org/
Re: NUTCH 0.8.1: Difficulties with Analyzers
Posted by Dennis Kubes <nu...@dragonflymc.com>.
I have not used the french analyzer...but did you use the french
analyzer for both indexing and searching?
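A toy illustration of why that question matters, in plain Java with no Lucene dependency. The two "analyzers" below are stand-ins invented for this sketch (they are not Lucene's FrenchAnalyzer or the Nutch default analyzer): one only lowercases, the other lowercases and strips accents. A term indexed with one and queried with the other is not found.

```java
import java.text.Normalizer;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.function.UnaryOperator;

// Toy model of index-time versus query-time analysis; all names are hypothetical.
class AnalyzerMismatchDemo {
    // Stand-in analyzer A: lowercase only.
    static final UnaryOperator<String> PLAIN = t -> t.toLowerCase(Locale.ROOT);

    // Stand-in analyzer B: lowercase, decompose, then drop combining marks (accents).
    static final UnaryOperator<String> FOLDING = t ->
        Normalizer.normalize(t.toLowerCase(Locale.ROOT), Normalizer.Form.NFD)
                  .replaceAll("\\p{M}", "");

    // Build a tiny "index": the set of analyzed terms.
    static Set<String> index(UnaryOperator<String> analyzer, String... words) {
        Set<String> terms = new HashSet<>();
        for (String w : words) {
            terms.add(analyzer.apply(w));
        }
        return terms;
    }

    // A query term hits only if its analyzed form is present in the index.
    static boolean hits(Set<String> index, UnaryOperator<String> analyzer, String query) {
        return index.contains(analyzer.apply(query));
    }

    public static void main(String[] args) {
        String word = "fr\u00e9quentes"; // "fréquentes"
        Set<String> idx = index(FOLDING, word);
        System.out.println("same analyzer:      " + hits(idx, FOLDING, word));
        System.out.println("different analyzer: " + hits(idx, PLAIN, word));
    }
}
```

In this toy model an unaccented query still hits with either analyzer, while the accented query hits only when the query side applies the same folding as the index side.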
Dennis