Posted to java-user@lucene.apache.org by Kai_testing Middleton <ka...@yahoo.com> on 2007/08/08 01:22:09 UTC

StandardAnalyzer vs KeywordAnalyzer in Luke

I'm invoking Luke like this: 
   java -jar lukeall-0.7.1.jar
I run this query:
   content:Nyarubuye

When I use the StandardAnalyzer I get results but when I use the
KeywordAnalyzer I don't get results.  Can someone explain this?  

My corpus was crawled and indexed using a nightly build of nutch (with Lucene
2.2, just like my Luke 0.7.1), crawling a bunch of news sites.  A legitimate
result page would be:
http://news.bbc.co.uk/2/hi/programmes/panorama/3582267.stm

SimpleAnalyzer also works as does StopAnalyzer.  WhitespaceAnalyzer fails.
(SnowballAnalyzer gives me a ClassDefNotFound exception).  PerfieldAnalyzer
gives me a PerfieldAnalyzerWrapper error.




       

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: StandardAnalyzer vs KeywordAnalyzer in Luke

Posted by Erick Erickson <er...@gmail.com>.
From the documentation for both SimpleAnalyzer and
StopAnalyzer...

...with LowerCaseFilter...

So I assume that your problem is the capital "N"...
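
To make the case issue concrete, here is a minimal sketch in plain Java. It does not use the Lucene API; the three helper methods are hypothetical stand-ins that only approximate how a lowercasing analyzer, WhitespaceAnalyzer and KeywordAnalyzer treat the query text. It also assumes the index itself was built with a lowercasing analyzer, which is what the behaviour described above suggests.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Simplified stand-ins for analyzer behaviour. This is NOT the Lucene API.
class AnalyzerCaseSketch {

    // Split on non-letters and lowercase each piece (assumed to approximate
    // SimpleAnalyzer-style analysis, which includes a LowerCaseFilter).
    static List<String> lowercasingTerms(String text) {
        List<String> terms = new ArrayList<>();
        for (String piece : text.split("[^\\p{L}]+")) {
            if (!piece.isEmpty()) {
                terms.add(piece.toLowerCase(Locale.ROOT));
            }
        }
        return terms;
    }

    // Split on whitespace only and leave case untouched
    // (assumed to approximate WhitespaceAnalyzer).
    static List<String> whitespaceTerms(String text) {
        List<String> terms = new ArrayList<>();
        for (String piece : text.trim().split("\\s+")) {
            if (!piece.isEmpty()) {
                terms.add(piece);
            }
        }
        return terms;
    }

    // Emit the whole input as one unchanged term
    // (assumed to approximate KeywordAnalyzer).
    static List<String> keywordTerms(String text) {
        List<String> terms = new ArrayList<>();
        terms.add(text);
        return terms;
    }

    // A term lookup succeeds only if the exact term string is in the index.
    static boolean matches(Set<String> indexedTerms, List<String> queryTerms) {
        return !queryTerms.isEmpty() && indexedTerms.containsAll(queryTerms);
    }

    public static void main(String[] args) {
        // Assumption: the index was built with a lowercasing analyzer.
        Set<String> index = new HashSet<>(lowercasingTerms("The Nyarubuye massacre"));
        String query = "Nyarubuye";

        System.out.println("lowercasing: " + matches(index, lowercasingTerms(query)));
        System.out.println("whitespace:  " + matches(index, whitespaceTerms(query)));
        System.out.println("keyword:     " + matches(index, keywordTerms(query)));
    }
}
```

Under these assumptions only the lowercased query term is found, because the index holds "nyarubuye" while the other two helpers look up "Nyarubuye" unchanged.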

Erick

On 8/7/07, Kai_testing Middleton <ka...@yahoo.com> wrote:
>
> I'm invoking Luke like this:
>    java -jar lukeall-0.7.1.jar
> I run this query:
>    content:Nyarubuye
>
> When I use the StandardAnalyzer I get results but when I use the
> KeywordAnalyzer I don't get results.  Can someone explain this?
>
> My corpus was crawled and indexed using a nightly build of nutch (with
> Lucene
> 2.2, just like my Luke 0.7.1), crawling a bunch of news sites.  A
> legitimate
> result page would be:
> http://news.bbc.co.uk/2/hi/programmes/panorama/3582267.stm
>
> SimpleAnalyzer also works as does StopAnalyzer.  WhitespaceAnalyzer fails.
> (SnowballAnalyzer gives me a ClassDefNotFound
> exception).  PerfieldAnalyzer
> gives me a PerfieldAnalyzerWrapper error.

Re: StandardAnalyzer vs KeywordAnalyzer in Luke

Posted by Grant Ingersoll <gs...@apache.org>.
Nutch uses its own Analyzer.  You should use the Analyzer that Nutch
uses in order to get proper results.  That may mean adding the Nutch  
Analyzer to your Luke classpath.

-Grant

On Aug 7, 2007, at 7:22 PM, Kai_testing Middleton wrote:

> I'm invoking Luke like this:
>    java -jar lukeall-0.7.1.jar
> I run this query:
>    content:Nyarubuye
>
> When I use the StandardAnalyzer I get results but when I use the
> KeywordAnalyzer I don't get results.  Can someone explain this?
>
> My corpus was crawled and indexed using a nightly build of nutch  
> (with Lucene
> 2.2, just like my Luke 0.7.1), crawling a bunch of news sites.  A  
> legitimate
> result page would be:
> http://news.bbc.co.uk/2/hi/programmes/panorama/3582267.stm
>
> SimpleAnalyzer also works as does StopAnalyzer.  WhitespaceAnalyzer  
> fails.
> (SnowballAnalyzer gives me a ClassDefNotFound exception).   
> PerfieldAnalyzer
> gives me a PerfieldAnalyzerWrapper error.

--------------------------
Grant Ingersoll
http://lucene.grantingersoll.com

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ





Re: StandardAnalyzer vs KeywordAnalyzer in Luke

Posted by Andrzej Bialecki <ab...@getopt.org>.
elguillelmo wrote:
> 
> Kai_testing Middleton wrote:
>> The nutch analyzer is NutchDocumentAnalyzer.  Does anyone know how to add
>> this to the Luke classpath?  I tried this kind of thing but it didn't work
>>
> 
> I'm trying to work out the same thing, to no avail. Would anybody be able to
> detail how to add Nutch's Analyzer to the Luke's classpath?
> 
> What I'm doing at the moment is:
> 
> java -classpath lukeall-0.8.1.jar:/path/to/nutchAnalyzer.jar
> org.getopt.luke.Luke

Well ... It could be done, but not easily.

First, NutchDocumentAnalyzer is dependent on other Nutch classes (so you 
need nutch-${version}.jar) but they in turn depend on Hadoop (so you 
need hadoop-core*.jar), which in turn depends on a dozen or so other 
jars ... All of this needs to be added to classpath.

Second, this analyzer doesn't have a no-args constructor, it needs a 
Hadoop Configuration argument. Luke can handle only no-args or single 
String arg constructors. I would have to change the way Analyzers are 
instantiated in Luke so that you can pass an existing instance (e.g. one 
that you created in the scripting plugin context).

Third, NutchDocumentAnalyzer uses CommonGrams, which in turn _require_ 
the presence of a common-grams.utf8 resource on the classpath.

To summarize: unless you want to get your hands dirty with Luke
internals, it can't be done.
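
The constructor limitation in the second point can be illustrated with a small reflection sketch. This is not Luke's actual code; the loader and all class names below are hypothetical, and FakeConfiguration merely stands in for a Hadoop Configuration. It only shows why a loader that tries a no-args constructor and then a single-String constructor has no way to build a class that needs some other argument type.

```java
import java.lang.reflect.Constructor;

// Hypothetical loader sketch. This is NOT Luke's implementation.
class AnalyzerLoaderSketch {

    // Stand-in for a Hadoop Configuration object (assumption for illustration).
    static class FakeConfiguration {}

    // Analyzer-like class with a no-args constructor.
    static class NoArgAnalyzer {
        public NoArgAnalyzer() {}
    }

    // Analyzer-like class with a single String constructor.
    static class StringArgAnalyzer {
        final String arg;
        public StringArgAnalyzer(String arg) { this.arg = arg; }
    }

    // Analyzer-like class that can only be built from a configuration object.
    static class ConfigOnlyAnalyzer {
        public ConfigOnlyAnalyzer(FakeConfiguration conf) {}
    }

    // Try a no-args constructor first, then a single-String constructor.
    // Returns null when neither constructor exists.
    static Object instantiate(Class<?> type, String arg) {
        try {
            Constructor<?> noArgs = type.getConstructor();
            return noArgs.newInstance();
        } catch (NoSuchMethodException e) {
            // No no-args constructor; fall through to the String variant.
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
        try {
            Constructor<?> stringArg = type.getConstructor(String.class);
            return stringArg.newInstance(arg);
        } catch (NoSuchMethodException e) {
            return null; // Neither supported constructor is available.
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(instantiate(NoArgAnalyzer.class, "x") != null);
        System.out.println(instantiate(StringArgAnalyzer.class, "x") != null);
        System.out.println(instantiate(ConfigOnlyAnalyzer.class, "x") != null);
    }
}
```

With a loader shaped like this, the only way to support the third class is to let the caller hand over an already constructed instance, which is the change described above.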

-- 
Best regards,
Andrzej Bialecki     <><
  ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com




Re: StandardAnalyzer vs KeywordAnalyzer in Luke

Posted by elguillelmo <gg...@lsi.uned.es>.

Kai_testing Middleton wrote:
> 
> The nutch analyzer is NutchDocumentAnalyzer.  Does anyone know how to add
> this to the Luke classpath?  I tried this kind of thing but it didn't work
> 

I'm trying to work out the same thing, to no avail. Would anybody be able to
detail how to add Nutch's Analyzer to the Luke's classpath?

What I'm doing at the moment is:

java -classpath lukeall-0.8.1.jar:/path/to/nutchAnalyzer.jar
org.getopt.luke.Luke

Thanks,

--
guille

