You are viewing a plain text version of this content. The canonical link for it is here.
Posted to java-user@lucene.apache.org by Delip Rao <de...@gmail.com> on 2009/06/07 00:25:52 UTC

Search fails every time

Hi,

Mentioned below are snippets from my indexing and searching code. For some
reason, I get zero hits all the time even for terms present in the document
collection. Can somebody point out where I'm going wrong? I'm using
lucene-core-2.4.0.jar.

Thanks!
Delip


-----------------------
Indexer.java

Directory dir = FSDirectory.getDirectory(path);
indexWriter = new IndexWriter(dir, new KeywordAnalyzer(),
     true, IndexWriter.MaxFieldLength.UNLIMITED);

// initialize docid and fileContents (in a loop)
...

currentDocument = new Document();
currentDocument.add(new Field("id", docid,
Field.Store.YES, Field.Index.NOT_ANALYZED));
currentDocument.add(new Field("content", fileContents,
Field.Store.YES, Field.Index.ANALYZED));

indexWriter.addDocument(currentDocument);


----------------------
Searcher.java

indexSearcher = new IndexSearcher(indexPath);
queryParser = new QueryParser("content", new KeywordAnalyzer());

Query query = queryParser.parse(queryString);
TopDocCollector collector = new TopDocCollector(100);
indexSearcher.search(query, collector);
System.err.println("Hits: " + collector.getTotalHits());

Re: Search fails every time

Posted by Delip Rao <de...@gmail.com>.
Thanks Simon & Grant.

Yes, the indexWriter was close()'d before searching. As Grant pointed out,
the issue really was with the Analyzer. Everything worked when I replaced
the KeywordAnalyzer with a SimpleAnalyzer.

Luke seems to be god-sent (no pun) -- With a KeywordAnalyzer, the entire
"content" field is stored as a blob of text even after adding it with
Field.Index.ANALYZED option while SimpleAnalyzer nicely lowercases and
tokenizes the text.

Cheers,
Delip

On Sun, Jun 7, 2009 at 12:51 PM, Simon Willnauer <
simon.willnauer@googlemail.com> wrote:

> Another question,
> do you commit your indexwriter before you open your searcher. You
> could also check how many docs in the index using IndexReader#numDocs
> and pass the index reader to the indexsearchers constructor.
>
> Just a guess too...
>
> simon
>
> On Sun, Jun 7, 2009 at 6:40 PM, Grant Ingersoll<gs...@apache.org>
> wrote:
> > If I had to guess, I'd say you have some type of Analysis mismatch
> between
> > what you are indexing and what you are searching.  Do you really want to
> use
> > the KeywordAnalyzer?
> >
> > You might use Luke (http://www.getopt.org/luke) to have a look at your
> index
> > and see if that sheds some light.
> >
> > Also have a look at the Lucene FAQ:
> > http://wiki.apache.org/lucene-java/LuceneFAQ
> >
> > -Grant
> >
> >
> > On Jun 6, 2009, at 6:25 PM, Delip Rao wrote:
> >
> >> Hi,
> >>
> >> Mentioned below are snippets from my indexing and searching code. For
> some
> >> reason, I get zero hits all the time even for terms present in the
> >> document
> >> collection. Can somebody point out where I'm going wrong? I'm using
> >> lucene-core-2.4.0.jar.
> >>
> >> Thanks!
> >> Delip
> >>
> >>
> >> -----------------------
> >> Indexer.java
> >>
> >> Directory dir = FSDirectory.getDirectory(path);
> >> indexWriter = new IndexWriter(dir, new KeywordAnalyzer(),
> >>    true, IndexWriter.MaxFieldLength.UNLIMITED);
> >>
> >> // initialize docid and fileContents (in a loop)
> >> ...
> >>
> >> currentDocument = new Document();
> >> currentDocument.add(new Field("id", docid,
> >> Field.Store.YES, Field.Index.NOT_ANALYZED));
> >> currentDocument.add(new Field("content", fileContents,
> >> Field.Store.YES, Field.Index.ANALYZED));
> >>
> >> indexWriter.addDocument(currentDocument);
> >>
> >>
> >> ----------------------
> >> Searcher.java
> >>
> >> indexSearcher = new IndexSearcher(indexPath);
> >> queryParser = new QueryParser("content", new KeywordAnalyzer());
> >>
> >> Query query = queryParser.parse(queryString);
> >> TopDocCollector collector = new TopDocCollector(100);
> >> indexSearcher.search(query, collector);
> >> System.err.println("Hits: " + collector.getTotalHits());
> >
> > --------------------------
> > Grant Ingersoll
> > http://www.lucidimagination.com/
> >
> > Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using
> > Solr/Lucene:
> > http://www.lucidimagination.com/search
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
> >
> >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

Re: Search fails every time

Posted by Simon Willnauer <si...@googlemail.com>.
Another question,
do you commit your indexwriter before you open your searcher. You
could also check how many docs in the index using IndexReader#numDocs
and pass the index reader to the indexsearchers constructor.

Just a guess too...

simon

On Sun, Jun 7, 2009 at 6:40 PM, Grant Ingersoll<gs...@apache.org> wrote:
> If I had to guess, I'd say you have some type of Analysis mismatch between
> what you are indexing and what you are searching.  Do you really want to use
> the KeywordAnalyzer?
>
> You might use Luke (http://www.getopt.org/luke) to have a look at your index
> and see if that sheds some light.
>
> Also have a look at the Lucene FAQ:
> http://wiki.apache.org/lucene-java/LuceneFAQ
>
> -Grant
>
>
> On Jun 6, 2009, at 6:25 PM, Delip Rao wrote:
>
>> Hi,
>>
>> Mentioned below are snippets from my indexing and searching code. For some
>> reason, I get zero hits all the time even for terms present in the
>> document
>> collection. Can somebody point out where I'm going wrong? I'm using
>> lucene-core-2.4.0.jar.
>>
>> Thanks!
>> Delip
>>
>>
>> -----------------------
>> Indexer.java
>>
>> Directory dir = FSDirectory.getDirectory(path);
>> indexWriter = new IndexWriter(dir, new KeywordAnalyzer(),
>>    true, IndexWriter.MaxFieldLength.UNLIMITED);
>>
>> // initialize docid and fileContents (in a loop)
>> ...
>>
>> currentDocument = new Document();
>> currentDocument.add(new Field("id", docid,
>> Field.Store.YES, Field.Index.NOT_ANALYZED));
>> currentDocument.add(new Field("content", fileContents,
>> Field.Store.YES, Field.Index.ANALYZED));
>>
>> indexWriter.addDocument(currentDocument);
>>
>>
>> ----------------------
>> Searcher.java
>>
>> indexSearcher = new IndexSearcher(indexPath);
>> queryParser = new QueryParser("content", new KeywordAnalyzer());
>>
>> Query query = queryParser.parse(queryString);
>> TopDocCollector collector = new TopDocCollector(100);
>> indexSearcher.search(query, collector);
>> System.err.println("Hits: " + collector.getTotalHits());
>
> --------------------------
> Grant Ingersoll
> http://www.lucidimagination.com/
>
> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using
> Solr/Lucene:
> http://www.lucidimagination.com/search
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Search fails every time

Posted by Grant Ingersoll <gs...@apache.org>.
If I had to guess, I'd say you have some type of Analysis mismatch  
between what you are indexing and what you are searching.  Do you  
really want to use the KeywordAnalyzer?

You might use Luke (http://www.getopt.org/luke) to have a look at your  
index and see if that sheds some light.

Also have a look at the Lucene FAQ: http://wiki.apache.org/lucene-java/LuceneFAQ

-Grant


On Jun 6, 2009, at 6:25 PM, Delip Rao wrote:

> Hi,
>
> Mentioned below are snippets from my indexing and searching code.  
> For some
> reason, I get zero hits all the time even for terms present in the  
> document
> collection. Can somebody point out where I'm going wrong? I'm using
> lucene-core-2.4.0.jar.
>
> Thanks!
> Delip
>
>
> -----------------------
> Indexer.java
>
> Directory dir = FSDirectory.getDirectory(path);
> indexWriter = new IndexWriter(dir, new KeywordAnalyzer(),
>     true, IndexWriter.MaxFieldLength.UNLIMITED);
>
> // initialize docid and fileContents (in a loop)
> ...
>
> currentDocument = new Document();
> currentDocument.add(new Field("id", docid,
> Field.Store.YES, Field.Index.NOT_ANALYZED));
> currentDocument.add(new Field("content", fileContents,
> Field.Store.YES, Field.Index.ANALYZED));
>
> indexWriter.addDocument(currentDocument);
>
>
> ----------------------
> Searcher.java
>
> indexSearcher = new IndexSearcher(indexPath);
> queryParser = new QueryParser("content", new KeywordAnalyzer());
>
> Query query = queryParser.parse(queryString);
> TopDocCollector collector = new TopDocCollector(100);
> indexSearcher.search(query, collector);
> System.err.println("Hits: " + collector.getTotalHits());

--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
using Solr/Lucene:
http://www.lucidimagination.com/search


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org