Posted to java-user@lucene.apache.org by Lutischán Ferenc <lu...@gmail.com> on 2010/01/28 14:04:45 UTC

Lucene full text search

Hi,

I have a problem with Lucene.
I indexed an English phrase list with Lucene:
             doc.add(new Field("r1", r1.toLowerCase(), Field.Store.NO, Field.Index.ANALYZED));

I searched for the word 'arabic':

             Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_CURRENT);
             QueryParser parser = new QueryParser(Version.LUCENE_CURRENT, this.searchedField, analyzer);
             Query query = parser.parse(searchedStr);
             TopScoreDocCollector collector = TopScoreDocCollector.create(10, true);
             this.memDict.isearcher.search(query, collector);
             foundCnt = collector.getTotalHits();
             System.out.println(searchedStr + ":" + foundCnt);

             // Iterate through the results:
             ScoreDoc[] hits = collector.topDocs().scoreDocs;
             for (int i = 0; i < hits.length; i++) {
                 Document hitDoc = this.memDict.isearcher.doc(hits[i].doc);
                 System.out.println("\"r1\"=" + hitDoc.get("r1"));
             }

The result list is:
arabic
arabic numerals
gum arabic

But this one is not in the result list:
mozarabic

How can I use Lucene to find all the words that contain 'arabic'?

Regards,
     Ferenc

Re: Lucene full text search

Posted by Erick Erickson <er...@gmail.com>.

Well, there are a couple of approaches:

1> Enable leading wildcards and search for *arabic*. You
     probably don't want to do this; it's really, really expensive,
     because every term in the field has to be checked.
2> Use the ngram (edgengram?) tokenizers. This'll cost
     you some index space, but that may be acceptable.

HTH
Erick
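
For context: 'mozarabic' is indexed as a single term, and the query 'arabic' only matches terms that are exactly 'arabic', which is why the phrase is missing from the results. The n-gram idea from approach 2 can be sketched in plain Java, without Lucene, to show why it finds matches in the middle of a word. This is only an illustration under stated assumptions: the class NGramIndexSketch, its methods, and the gram size of 3 are hypothetical names and choices, not Lucene APIs. In Lucene itself the grams would come from an n-gram tokenizer or filter applied at index time. Note that edge n-grams alone would only cover one end of each word, so full n-grams are assumed here.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration class; it is not part of Lucene.
class NGramIndexSketch {
    private final int gramSize;
    // Maps each n-gram to the phrases that contain a word with that n-gram.
    private final Map<String, Set<String>> postings = new HashMap<>();

    NGramIndexSketch(int gramSize) {
        this.gramSize = gramSize;
    }

    // Returns all character n-grams of length n, e.g. "arabic" -> ara, rab, abi, bic.
    static List<String> ngrams(String word, int n) {
        List<String> grams = new ArrayList<>();
        for (int i = 0; i + n <= word.length(); i++) {
            grams.add(word.substring(i, i + n));
        }
        return grams;
    }

    // Indexes a phrase: lowercases it, splits on whitespace, stores every n-gram of every word.
    void add(String phrase) {
        for (String word : phrase.toLowerCase().split("\\s+")) {
            for (String gram : ngrams(word, gramSize)) {
                postings.computeIfAbsent(gram, k -> new TreeSet<>()).add(phrase);
            }
        }
    }

    // Finds phrases containing the given substring. Queries shorter than
    // the gram size are not supported by this sketch and return no hits.
    Set<String> search(String substring) {
        String query = substring.toLowerCase();
        Set<String> result = new TreeSet<>();
        List<String> grams = ngrams(query, gramSize);
        if (grams.isEmpty()) {
            return result;
        }
        // Intersect the postings of all query n-grams to get candidates.
        Set<String> candidates = null;
        for (String gram : grams) {
            Set<String> hits = postings.getOrDefault(gram, Collections.<String>emptySet());
            if (candidates == null) {
                candidates = new TreeSet<>(hits);
            } else {
                candidates.retainAll(hits);
            }
        }
        // Sharing all n-grams does not guarantee they are adjacent, so verify.
        for (String candidate : candidates) {
            if (candidate.toLowerCase().contains(query)) {
                result.add(candidate);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        NGramIndexSketch index = new NGramIndexSketch(3);
        for (String phrase : new String[] {"arabic", "arabic numerals", "gum arabic", "mozarabic"}) {
            index.add(phrase);
        }
        System.out.println(index.search("arabic"));
    }
}
```

The trade-off is the one mentioned above: every word contributes several grams, so the index grows, but a substring query becomes a handful of exact term lookups instead of a scan over all terms.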
