Posted to java-user@lucene.apache.org by Kai_testing Middleton <ka...@yahoo.com> on 2007/08/08 01:22:09 UTC
StandardAnalyzer vs KeywordAnalyzer in Luke
I'm invoking Luke like this:
java -jar lukeall-0.7.1.jar
I run this query:
content:Nyarubuye
When I use the StandardAnalyzer I get results but when I use the
KeywordAnalyzer I don't get results. Can someone explain this?
My corpus was crawled and indexed using a nightly build of nutch (with Lucene
2.2, just like my Luke 0.7.1), crawling a bunch of news sites. A legitimate
result page would be:
http://news.bbc.co.uk/2/hi/programmes/panorama/3582267.stm
SimpleAnalyzer also works, as does StopAnalyzer. WhitespaceAnalyzer fails.
(SnowballAnalyzer gives me a ClassDefNotFound exception.) PerFieldAnalyzer
gives me a PerFieldAnalyzerWrapper error.
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
Re: StandardAnalyzer vs KeywordAnalyzer in Luke
Posted by Erick Erickson <er...@gmail.com>.
From the documentation for both SimpleAnalyzer and
StopAnalyzer:
"...with LowerCaseFilter..."
So I assume that your problem is the capital "N".
Erick
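To illustrate the point about the capital "N", here is a rough sketch in plain Java. It is not Lucene code: the three methods are simplified, hypothetical stand-ins for what keyword-style, whitespace-style and lowercasing analyzers do to the query text, and the lowercase indexed term is an assumption about how the page was indexed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class AnalyzerSketch {
    // Keyword-style: the whole input is a single token, case untouched.
    static List<String> keywordStyle(String text) {
        return List.of(text);
    }

    // Whitespace-style: split on whitespace, case untouched.
    static List<String> whitespaceStyle(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    // Lowercasing style: split on non-letters, then lowercase each token.
    static List<String> lowercasingStyle(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.split("[^\\p{L}]+")) {
            if (!t.isEmpty()) tokens.add(t.toLowerCase(Locale.ROOT));
        }
        return tokens;
    }

    // A term lookup succeeds only when a query token equals an indexed term exactly.
    static boolean matches(Set<String> indexedTerms, List<String> queryTokens) {
        for (String t : queryTokens) {
            if (indexedTerms.contains(t)) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        // ASSUMPTION: the indexing analyzer lowercased the term.
        Set<String> indexedTerms = Set.of("nyarubuye");
        String query = "Nyarubuye";

        System.out.println("keyword-style:    " + matches(indexedTerms, keywordStyle(query)));
        System.out.println("whitespace-style: " + matches(indexedTerms, whitespaceStyle(query)));
        System.out.println("lowercasing:      " + matches(indexedTerms, lowercasingStyle(query)));
    }
}
```

Under that assumption only the lowercasing variant finds the term, which is consistent with SimpleAnalyzer and StopAnalyzer working while KeywordAnalyzer and WhitespaceAnalyzer fail.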
Re: StandardAnalyzer vs KeywordAnalyzer in Luke
Posted by Grant Ingersoll <gs...@apache.org>.
Nutch uses its own Analyzer. You should use the Analyzer that Nutch
uses in order to get proper results. That may mean adding the Nutch
Analyzer to your Luke classpath.
-Grant
--------------------------
Grant Ingersoll
http://lucene.grantingersoll.com
Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ
Re: StandardAnalyzer vs KeywordAnalyzer in Luke
Posted by Andrzej Bialecki <ab...@getopt.org>.
elguillelmo wrote:
>
> Kai_testing Middleton wrote:
>> The nutch analyzer is NutchDocumentAnalyzer. Does anyone know how to add
>> this to the Luke classpath? I tried this kind of thing but it didn't work
>>
>
> I'm trying to work out the same thing, to no avail. Would anybody be able to
> detail how to add Nutch's Analyzer to Luke's classpath?
>
> What I'm doing at the moment is:
>
> java -classpath lukeall-0.8.1.jar:/path/to/nutchAnalyzer.jar
> org.getopt.luke.Luke
Well ... It could be done, but not easily.
First, NutchDocumentAnalyzer is dependent on other Nutch classes (so you
need nutch-${version}.jar) but they in turn depend on Hadoop (so you
need hadoop-core*.jar), which in turn depends on a dozen or so other
jars ... All of this needs to be added to classpath.
Second, this analyzer doesn't have a no-args constructor, it needs a
Hadoop Configuration argument. Luke can handle only no-args or single
String arg constructors. I would have to change the way Analyzers are
instantiated in Luke so that you can pass an existing instance (e.g. one
that you created in the scripting plugin context).
Third, NutchDocumentAnalyzer uses CommonGrams, which in turn _require_
the presence of a common-grams.utf8 resource on the classpath.
To summarize: unless you want to get your hands dirty with Luke
internals, it can't be done.
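The second point, about constructors, can be sketched roughly as follows. This is not Luke's actual code: it is a minimal illustration of a loader that tries a no-args constructor and then a single-String constructor, and every class name below (including FakeConfiguration, which merely stands in for Hadoop's Configuration) is hypothetical.

```java
import java.lang.reflect.Constructor;

final class AnalyzerLoaderSketch {
    // Hypothetical stand-in for an analyzer with a no-args constructor.
    static class NoArgAnalyzer {
        public NoArgAnalyzer() {}
    }

    // Hypothetical stand-in for an analyzer with a single String constructor.
    static class StringArgAnalyzer {
        final String arg;
        public StringArgAnalyzer(String arg) { this.arg = arg; }
    }

    // Hypothetical stand-in for a configuration object such as Hadoop's Configuration.
    static class FakeConfiguration {}

    // Hypothetical stand-in for an analyzer that can only be built from a configuration.
    static class ConfigArgAnalyzer {
        public ConfigArgAnalyzer(FakeConfiguration conf) {}
    }

    // Tries the no-args constructor, then the single-String one.
    // Returns null when the class offers neither.
    static Object instantiate(Class<?> cls, String arg) {
        try {
            try {
                return cls.getDeclaredConstructor().newInstance();
            } catch (NoSuchMethodException noDefault) {
                Constructor<?> c = cls.getDeclaredConstructor(String.class);
                return c.newInstance(arg);
            }
        } catch (NoSuchMethodException unsupported) {
            return null; // neither supported constructor shape exists
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("no-args:    " + (instantiate(NoArgAnalyzer.class, null) != null));
        System.out.println("String arg: " + (instantiate(StringArgAnalyzer.class, "en") != null));
        System.out.println("config arg: " + (instantiate(ConfigArgAnalyzer.class, "en") != null));
    }
}
```

A loader of this shape has no way to supply the configuration object, so a class like ConfigArgAnalyzer cannot be created regardless of what is on the classpath.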
--
Best regards,
Andrzej Bialecki <><
___. ___ ___ ___ _ _ __________________________________
[__ || __|__/|__||\/| Information Retrieval, Semantic Web
___|||__|| \| || | Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com
Re: StandardAnalyzer vs KeywordAnalyzer in Luke
Posted by elguillelmo <gg...@lsi.uned.es>.
Kai_testing Middleton wrote:
>
> The nutch analyzer is NutchDocumentAnalyzer. Does anyone know how to add
> this to the Luke classpath? I tried this kind of thing but it didn't work
>
I'm trying to work out the same thing, to no avail. Would anybody be able to
detail how to add Nutch's Analyzer to Luke's classpath?
What I'm doing at the moment is:
java -classpath lukeall-0.8.1.jar:/path/to/nutchAnalyzer.jar
org.getopt.luke.Luke
Thanks,
--
guille