Posted to java-user@lucene.apache.org by Lutischán Ferenc <lu...@gmail.com> on 2010/01/28 14:04:45 UTC
Lucene full text search
Hi,
I have a problem with Lucene:
I have indexed an English phrase list with Lucene:
    doc.add(new Field("r1", r1.toLowerCase(), Field.Store.NO, Field.Index.ANALYZED));
I searched for the word 'arabic':
    Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_CURRENT);
    QueryParser parser = new QueryParser(Version.LUCENE_CURRENT, this.searchedField, analyzer);
    Query query = parser.parse(searchedStr);
    TopScoreDocCollector collector = TopScoreDocCollector.create(10, true);
    this.memDict.isearcher.search(query, collector);
    foundCnt = collector.getTotalHits();
    System.out.println(searchedStr + ":" + foundCnt);
    // Iterate through the results:
    ScoreDoc[] hits = collector.topDocs().scoreDocs;
    for (int i = 0; i < hits.length; i++) {
        Document hitDoc = this.memDict.isearcher.doc(hits[i].doc);
        System.out.println("\"r1\"=" + hitDoc.get("r1"));
    }
The result list is:
    arabic
    arabic numerals
    gum arabic
But this entry is not in the result list:
    mozarabic
How can I use Lucene to find all the words that contain 'arabic'?
Regards,
Ferenc
Re: Lucene full text search
Posted by Erick Erickson <er...@gmail.com>.
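Well, there are a couple of approaches:
1> enable leading wildcards (QueryParser.setAllowLeadingWildcard(true))
and search for *arabic*. You probably don't want to do this, it's really,
really expensive: a leading wildcard has to be checked against every term
in the field.
2> use the ngram tokenizers (plain ngram rather than edgengram, since
edge n-grams only cover one end of a token). This'll cost you some index
space, but that may be acceptable.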
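To make the n-gram idea concrete, here is a minimal sketch in plain Java. It is not Lucene code and does not use Lucene's n-gram tokenizers; the class name NGramSketch, its methods, and the gram size of 3 are all made up for illustration. It only shows why indexing overlapping character grams lets a lookup for 'arabic' reach 'mozarabic', which whole-word terms cannot do.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy substring index built from character trigrams (hypothetical, not a Lucene API). */
class NGramSketch {
    private static final int N = 3; // gram size: an arbitrary choice for this sketch

    // gram -> ids of the phrases that contain it
    private final Map<String, Set<Integer>> postings = new HashMap<>();
    private final List<String> phrases = new ArrayList<>();

    /** Split text into overlapping grams of length N. */
    static List<String> grams(String text) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + N <= text.length(); i++) {
            out.add(text.substring(i, i + N));
        }
        return out;
    }

    /** Index one phrase under every gram it contains. */
    void add(String phrase) {
        String lower = phrase.toLowerCase();
        int id = phrases.size();
        phrases.add(lower);
        for (String g : grams(lower)) {
            postings.computeIfAbsent(g, k -> new LinkedHashSet<>()).add(id);
        }
    }

    /** Return indexed phrases containing needle; needles shorter than N match nothing. */
    List<String> search(String needle) {
        String lower = needle.toLowerCase();
        List<String> hits = new ArrayList<>();
        List<String> needleGrams = grams(lower);
        if (needleGrams.isEmpty()) {
            return hits;
        }
        Set<Integer> candidates = null;
        for (String g : needleGrams) {
            Set<Integer> ids = postings.get(g);
            if (ids == null) {
                return hits; // some gram never occurs, so nothing can match
            }
            if (candidates == null) {
                candidates = new LinkedHashSet<>(ids);
            } else {
                candidates.retainAll(ids);
            }
        }
        for (int id : candidates) {
            // Sharing every gram does not prove the grams are adjacent, so verify.
            if (phrases.get(id).contains(lower)) {
                hits.add(phrases.get(id));
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        NGramSketch index = new NGramSketch();
        index.add("arabic");
        index.add("arabic numerals");
        index.add("gum arabic");
        index.add("mozarabic");
        index.add("english");
        System.out.println(index.search("arabic"));
    }
}
```

The trade-off is the one mentioned above: every phrase is stored under many small grams, so the index grows, but a lookup only touches the postings for the grams of the query instead of scanning every term.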
HTH
Erick