Posted to java-user@lucene.apache.org by Rajiv Roopan <ra...@gmail.com> on 2006/08/01 00:38:40 UTC

Search matching

Hello, I have an index of locations for example. I'm indexing one field
using SimpleAnalyzer.

doc1: albany ny
doc2: hudson ny
doc3: new york ny
doc4: new york mills ny

When I search for "new york ny", the first result returned is always "new
york mills ny". Am I doing something incorrect?

thanks in advance,
rajiv

Re: Search matching

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
Rajiv,

Have a look at the details provided by IndexSearcher.explain() for  
those documents, and you'll get some insight into the factors used to  
rank them.  Since both scores are 1.0, you'll probably want to  
implement your own custom Similarity and override the lengthNorm() to  
adjust that factor.
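
To make the length factor concrete, here is a minimal sketch in plain Java with no Lucene dependency. It assumes the default formula is 1/sqrt(numTerms), which is what DefaultSimilarity.lengthNorm() computes; the class name LengthNormSketch and the steeper 1/numTerms variant are hypothetical and only illustrate what an override might return. Note also that Lucene stores norms in a lossy one-byte form, so nearby values can end up equal in the index; explain() shows the value actually used.

```java
// Sketch only: reproduces the length-normalization arithmetic outside Lucene.
final class LengthNormSketch {

    // Formula used by DefaultSimilarity.lengthNorm(): 1 / sqrt(numTerms).
    static float defaultLengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    // Hypothetical steeper variant (an assumption, not a Lucene built-in):
    // penalizes each extra term more strongly than the default does.
    static float steeperLengthNorm(int numTerms) {
        return 1.0f / numTerms;
    }

    public static void main(String[] args) {
        // "new york ny" has 3 terms, "new york mills ny" has 4.
        for (int numTerms = 3; numTerms <= 4; numTerms++) {
            System.out.printf("terms=%d default=%.4f steeper=%.4f%n",
                    numTerms,
                    defaultLengthNorm(numTerms),
                    steeperLengthNorm(numTerms));
        }
    }
}
```

In a real custom Similarity the same arithmetic would go into an override of lengthNorm(String fieldName, int numTerms), and the index would need to be rebuilt, since norms are computed at indexing time.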

Another technique you can use is to expand a user's query into a more
sophisticated boolean query, such that a user's query for "new york
ny" would become (in Query.toString format): +new +york +ny "new york
ny", which would boost exact matches.
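
As a rough illustration of that expansion, the sketch below builds the expanded query string in plain Java. The class QueryExpander and its methods are hypothetical names, and the tokenizer only approximates SimpleAnalyzer (lowercase, split on non-letters); the resulting string is meant to be handed to QueryParser rather than treated as a finished query.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Sketch only: turns user input into "+t1 +t2 ... "t1 t2 ..."" query syntax.
final class QueryExpander {

    // Approximates SimpleAnalyzer: lowercase, then split on runs of non-letters.
    static List<String> tokenize(String input) {
        List<String> terms = new ArrayList<>();
        for (String piece : input.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (!piece.isEmpty()) {
                terms.add(piece);
            }
        }
        return terms;
    }

    // Requires every term, then appends the whole input as an optional phrase
    // so documents containing the exact phrase receive an extra score boost.
    static String expand(String input) {
        List<String> terms = tokenize(input);
        if (terms.isEmpty()) {
            return "";
        }
        StringBuilder required = new StringBuilder();
        for (String term : terms) {
            required.append('+').append(term).append(' ');
        }
        return required + "\"" + String.join(" ", terms) + "\"";
    }

    public static void main(String[] args) {
        System.out.println(expand("New York, NY"));
    }
}
```

Note that the phrase clause alone would match both "new york, ny" and a longer field containing the same adjacent terms, so it helps most when combined with a length factor that separates the two.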

	Erik


On Aug 1, 2006, at 1:19 PM, Rajiv Roopan wrote:

> Ok, this is how I'm indexing. For both indexing and searching I'm using
> SimpleAnalyzer()
>
> String loc = "New York, NY";
> doc.add(new Field("location", loc, Field.Store.NO, Field.Index.TOKENIZED));
>
> String loc2 = "New York Mills, NY";
> doc.add(new Field("location", loc2, Field.Store.NO, Field.Index.TOKENIZED));
>
> and this is how I'm searching...
>
> String searchStr = "New York, NY";
> Analyzer analyzer = new SimpleAnalyzer();
> QueryParser parser = new QueryParser("location", analyzer);
> parser.setDefaultOperator(QueryParser.AND_OPERATOR);
> Query query = parser.parse(searchStr);
>
> Hits hits = searcher.search(query);
>
> I've tried all query types and every time "new york mills, ny" is in hits(0).
> Both results have a score of 1.0. I know I can add some kind of sort to
> always put the shorter field first. But shouldn't the first result by default
> be "new york, ny", since the scoring algorithm favors shorter fields?
>
> Let me know if I'm missing something. Thanks!
>
> rajiv


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Search matching

Posted by Rajiv Roopan <ra...@gmail.com>.
Ok, this is how I'm indexing. For both indexing and searching I'm using
SimpleAnalyzer()

String loc = "New York, NY";
doc.add(new Field("location", loc, Field.Store.NO, Field.Index.TOKENIZED));

String loc2 = "New York Mills, NY";
doc.add(new Field("location", loc2, Field.Store.NO, Field.Index.TOKENIZED));


and this is how I'm searching...

String searchStr = "New York, NY";
Analyzer analyzer = new SimpleAnalyzer();
QueryParser parser = new QueryParser("location", analyzer);
parser.setDefaultOperator(QueryParser.AND_OPERATOR);
Query query = parser.parse(searchStr);

Hits hits = searcher.search(query);

I've tried all query types and every time "new york mills, ny" is in hits(0).
Both results have a score of 1.0. I know I can add some kind of sort to
always put the shorter field first. But shouldn't the first result by default
be "new york, ny", since the scoring algorithm favors shorter fields?

Let me know if I'm missing something. Thanks!

rajiv

On 8/1/06, Simon Willnauer <si...@googlemail.com> wrote:
>
> I guess so, but without any information about your code nobody can tell
> what.
> If you provide more information you will get help!
>
> regards simon

Re: Search matching

Posted by Simon Willnauer <si...@googlemail.com>.
I guess so, but without any information about your code nobody can tell what.
If you provide more information you will get help!

regards simon

On 8/1/06, Rajiv Roopan <ra...@gmail.com> wrote:
> Hello, I have an index of locations for example. I'm indexing one field
> using SimpleAnalyzer.
>
> doc1: albany ny
> doc2: hudson ny
> doc3: new york ny
> doc4: new york mills ny
>
> When I search for "new york ny", the first result returned is always "new
> york mills ny". Am I doing something incorrect?
>
> thanks in advance,
> rajiv
>
>
