Posted to dev@lucene.apache.org by Tomas Kliegr <to...@vse.cz> on 2010/05/03 16:36:16 UTC

No hits to API query over existing Mediawiki index

Hi all,

I am trying to query the Lucene 2.1.3 index created by the Mediawiki
Lucene search extensions
(http://www.mediawiki.org/wiki/Extension:Lucene-search). I am able to
search most fields in the index, except the links field in the
wiki.links index: any query I issue against it, either from the Lucene
3.0.1 API or through Luke, returns an empty result set. To debug this,
I added a println to the next() method of the SplitTokenStream
tokenizer, which is used to add terms to the links field:

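import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.Tokenizer;

public class SplitAnalyzer extends Analyzer {
    public final static int GROUP_GAP = 200;

    class SplitTokenStream extends Tokenizer {

        public Token next()
        {
            ...
            // debug output: quote each token so leading/trailing spaces show up
            System.out.println("token:'" + t.termText() + "'");
            return t;
        }
    }
}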

Placed towards the end of next(), this statement prints tokens like:
0:Anarchism
0:Oxford University Press
0:Social anarchism
0:Individualist anarchism
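A TermQuery (like IndexReader.termDocs(Term)) only matches when the query text is exactly equal to the indexed term text, so console output that merely looks identical can still differ, for example by trailing whitespace or control characters. A minimal sketch, assuming such invisible characters are worth ruling out: the hypothetical helper TokenDump.visible (not part of Lucene or the Mediawiki extension) could wrap t.termText() in the debug println to make them visible.

```java
// Hypothetical debugging helper (not part of Lucene or the Mediawiki extension).
public class TokenDump {

    // Wraps the token in brackets so leading/trailing spaces are visible,
    // keeps printable ASCII as-is, and shows every other char as <U+XXXX>.
    static String visible(String token) {
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < token.length(); i++) {
            char c = token.charAt(i);
            if (c >= 0x20 && c < 0x7F) {
                sb.append(c);
            } else {
                sb.append(String.format("<U+%04X>", (int) c));
            }
        }
        return sb.append(']').toString();
    }

    public static void main(String[] args) {
        System.out.println(visible("0:Anarchism"));   // prints [0:Anarchism]
        System.out.println(visible("0:Anarchism "));  // prints [0:Anarchism ]
        System.out.println(visible("0:Anarchism\t")); // prints [0:Anarchism<U+0009>]
    }
}
```

Non-ASCII characters are escaped as well, which would also expose differences between visually identical Unicode forms of a page title.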

If I try to retrieve documents containing the same tokens, either like this:
TermDocs tds = linksReader.termDocs(new Term("links", "0:Anarchism"));

or like this:

Term t = new Term("links", "0:Anarchism");
TermQuery query = new TermQuery(t);
TopDocs tds = linksSearcher.search(query, 1);

tds.next() always returns false, and tds.totalHits is always zero.
Searching from Luke with each of its predefined analyzers gives the
same empty result: no document is returned, although the very same
string appears in multiple documents when I browse the index in Luke.
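To illustrate how a string can be visibly present and still produce zero hits: both TermDocs and TermQuery look the term up verbatim, without passing it through an analyzer. The sketch below is a hypothetical, in-memory toy (ToyInvertedIndex is not Lucene code), and the trailing space in the indexed term is an assumed example, not something observed in the wiki.links index.

```java
import java.util.*;

// Toy in-memory inverted index (hypothetical; NOT Lucene's implementation).
// It mimics one property of TermQuery / IndexReader.termDocs(Term):
// lookup is an exact match on (field, term text), with no analysis of the query term.
public class ToyInvertedIndex {
    // field name -> term text -> ids of documents containing that term
    private final Map<String, Map<String, SortedSet<Integer>>> postings = new HashMap<>();

    void add(int docId, String field, List<String> tokens) {
        Map<String, SortedSet<Integer>> terms = postings.computeIfAbsent(field, f -> new HashMap<>());
        for (String token : tokens) {
            terms.computeIfAbsent(token, t -> new TreeSet<>()).add(docId);
        }
    }

    SortedSet<Integer> termDocs(String field, String text) {
        Map<String, SortedSet<Integer>> terms = postings.getOrDefault(field, Collections.emptyMap());
        return terms.getOrDefault(text, Collections.emptySortedSet());
    }

    public static void main(String[] args) {
        ToyInvertedIndex index = new ToyInvertedIndex();
        // Assumed: the tokenizer emitted terms with a trailing space that println hid.
        index.add(1, "links", Arrays.asList("0:Anarchism ", "0:Social anarchism"));
        index.add(2, "links", Arrays.asList("0:Anarchism "));
        System.out.println(index.termDocs("links", "0:Anarchism"));  // prints []
        System.out.println(index.termDocs("links", "0:Anarchism ")); // prints [1, 2]
    }
}
```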

I also tried many variations of the query without success, such as
searching for 0:Anarchism,anarchism, 0:Anarchism,...

The links field is indexed, stored and tokenized, and has term vectors.

Thanks in advance for any hints
-- 
Tomas Kliegr
