Posted to dev@lucene.apache.org by shb <sh...@gmail.com> on 2009/02/25 15:31:11 UTC

sort lucene results

Hi, I need help.

I need to search for a word in sentences with Lucene. For example, searching
for the word "bbb" correctly returns all the matching sentences:

"text 1 ok ok ok bbb", "text 2 bbb text", "bbb text 4...".

But I need the results ordered by the word's position in the sentence, like this:

"bbb text 4...", "text 2 bbb text", "text 1 ok ok ok bbb".

Waiting for ideas. Thanks.

 
-- 
View this message in context: http://www.nabble.com/sort-lucene-results-tp22203922p22203922.html
Sent from the Lucene - Java Developer mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


RE: sort lucene results

Posted by Uwe Schindler <uw...@thetaphi.de>.
Go to the analysis package description; there is an example of a TokenFilter
there.

Just step into your analyzer's TokenStream and implement a TokenFilter for
it. The method next() is called for each token when the Field is indexed.

The documentation of
http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/analysis/Token.html#setPositionIncrement(int)
says:
"Set it to zero to put multiple terms in the same position. This is useful
if, e.g., a word has multiple stems. Searches for phrases including either
stem will match. In this case, all but the first stem's increment should be
set to zero: the increment of the first instance should be one. Repeating a
token with an increment of zero can also be used to boost the scores of
matches on that token."

So just modify the Tokens from the tokenizer and return them more than once
to raise the boost (and set the position increment for the second, third, ...
instance of the same token to 0). This sounds complicated, but may be simple
to implement (not sure).

You can then add the field using the Field constructors that take a
TokenStream.
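
As a rough, library-free sketch of that idea (this is not actual Lucene code):
the class below models a token as a term plus a position increment and repeats
earlier words more often than later ones. The names PositionBoostSketch, Tok,
repeatByPosition and the maxRepeats parameter are made up for illustration. In
a real TokenFilter the same logic would presumably live in next() and call
Token.setPositionIncrement(int); how strongly the repeats affect the score
depends on the Similarity in use.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java model of "repeat a token with increment 0 to boost it".
// It has no Lucene dependency; all names here are hypothetical.
class PositionBoostSketch {

    // Minimal stand-in for a Lucene Token: a term and its position increment.
    static final class Tok {
        final String term;
        final int positionIncrement;

        Tok(String term, int positionIncrement) {
            this.term = term;
            this.positionIncrement = positionIncrement;
        }
    }

    // Emits the word at index i (maxRepeats - i) times, but at least once.
    // The first copy advances the position by 1; every extra copy uses
    // increment 0 so that it stays at the same position as the first copy.
    static List<Tok> repeatByPosition(List<String> words, int maxRepeats) {
        List<Tok> out = new ArrayList<Tok>();
        for (int i = 0; i < words.size(); i++) {
            int copies = Math.max(1, maxRepeats - i);
            for (int c = 0; c < copies; c++) {
                out.add(new Tok(words.get(i), c == 0 ? 1 : 0));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("bbb", "text", "4");
        for (Tok t : repeatByPosition(words, 3)) {
            System.out.println(t.term + " +" + t.positionIncrement);
        }
    }
}
```

With maxRepeats = 3, the sentence "bbb text 4" yields "bbb" three times,
"text" twice and "4" once, so a hit on "bbb" in the first position has a higher
term frequency than the same word further back in another sentence.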

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de



RE: sort lucene results

Posted by shb <sh...@gmail.com>.
I set the index field like this:

Field nameField = null;
while (rs.next()) {
    String name = rs.getString("name");
    nameField = new Field("name", name.trim(), Field.Store.YES, Field.Index.TOKENIZED);
    doc.add(nameField);
    writer.addDocument(doc);
}

Can you write an example of how I can use a Tokenizer to boost the words?

Thanks for the quick answer.





RE: sort lucene results

Posted by Uwe Schindler <uw...@thetaphi.de>.
With a custom Tokenizer/Analyzer you could boost the words (tokens) during
indexing by their position, e.g. the first word gets factor 100, the second
99, and so on. As results are sorted by relevance by default, hits where the
word is nearer the beginning get a higher ranking because of the boost.

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de



Re: sort lucene results

Posted by Chris Hostetter <ho...@fucit.org>.
: but i need the result by the word place in the sentence like this:
: 
: "bbb text 4...". , "text 2 bbb text " , "text 1 ok ok ok bbb" ..

1) SpanFirstQuery should work; it scores higher the closer the nested
query is to the start. Just use a really high limit. If you are only
dealing with simple Term/Phrase queries, it's easy to switch to using
SpanTermQuery and SpanNearQuery instances inside of a SpanFirstQuery.

2) Please Use "java-user@lucene" Not "java-dev@lucene"
http://people.apache.org/~hossman/#java-dev

Your question is better suited for the java-user@lucene mailing list ...
not the java-dev@lucene list.  java-dev is for discussing development of
the internals of the Lucene Java library ... it is *not* the appropriate
place to ask questions about how to use the Lucene Java library when
developing your own applications.  Please resend your message to
the java-user mailing list, where you are likely to get more/better
responses since that list also has a larger number of subscribers.


-Hoss

