Posted to solr-user@lucene.apache.org by elisabeth benoit <el...@gmail.com> on 2011/09/16 14:39:00 UTC

getting answers starting with a requested string first

Hello,

If I have a request with

fq=NAME_ANALYZED:tour eiffel

and I have different answers like

Restaurant la tour Eiffel
Hotel la tour Eiffel
Tour Eiffel
...

Is there a way to get answers with NAME_ANALYZED beginning with "tour
Eiffel" first?

Thanks,
Elisabeth

Re: getting answers starting with a requested string first

Posted by elisabeth benoit <el...@gmail.com>.
Thanks a lot for your advice.

What really matters to me is that answers with NAME_ANALYZED=Tour Eiffel
appear first. Whether "Tour Eiffel Tower By Helicopter" then appears before
or after "Hotel la tour Eiffel" doesn't really matter.

Since I send fq=NAME_ANALYZED:tour eiffel, I am sure NAME_ANALYZED will at
least contain those two words. So I figured out that if I sort answers by
the length of this field, I'll get those called "Tour eiffel" first.

But I'll check the QParser anyway since it seems to be an interesting one.

Best regards,
Elisabeth

2011/9/28 Chris Hostetter <ho...@fucit.org>

>
> : 1) giving NAME_ANALYZED a type where omitNorms=false: I thought this would
> : give answers with shorter NAME_ANALYZED field a higher score. I've tested
> : that solution, but it's not working. I guess this is because there is no
> : score for fq parameter (all my answers have same score)
>
> both of those statements are correct.  omitNorms=false will cause length
> normalization to apply, so with the default similarity, shorter field
> values will generally score higher, but norms are very coarse, so it
> won't be very precise; and "fq" queries filter the results,
> but do not affect the score.
>
> : 2) sorting my answers by length desc, and I guess in this case I would need
> : to store the length of NAME_ANALYZED field to avoid having to compute it on
> : the fly. at this point, this is the only solution I can think of.
>
> that will also be a good way to sort on the length of the field, and will
> give you a lot of precise control.
>
> but sorting on length isn't what you asked about...
>
> : > and I have different answers like
> : >
> : > Restaurant la tour Eiffel
> : > Hotel la tour Eiffel
> : > Tour Eiffel
>        ...
> : > Is there a way to get answers with NAME_ANALYZED beginning with "tour
> : > Eiffel" first?
>
> If you want to score documents higher because they appear at the beginning
> of the field value, that is a different problem than scoring documents
> higher because they are shorter -- ie: "Tour Eiffel Tower By Helicopter"
> is longer than "Hotel la tour Eiffel", which one do you want to come
> first?
>
> If you want documents to score higher if they appear "early" in the field
> value, you can either index a "marker" token at the beginning of the field
> (ie: "S_T_A_R_T Tour Eiffel") and then do all queries on that field as
> phrase queries including that token (shorter matches score higher in
> phrase queries); or you can look into using the "surround" QParser that
> was recently committed to the trunk.  the surround parser has special
> syntax for generating "Span" Queries, which support a "SpanFirst" query
> that scores documents higher based on how close to the beginning of a field
> value the match is.
>
>
> -Hoss
>

Re: getting answers starting with a requested string first

Posted by Chris Hostetter <ho...@fucit.org>.
: 1) giving NAME_ANALYZED a type where omitNorms=false: I thought this would
: give answers with shorter NAME_ANALYZED field a higher score. I've tested
: that solution, but it's not working. I guess this is because there is no
: score for fq parameter (all my answers have same score)

both of those statements are correct.  omitNorms=false will cause length 
normalization to apply, so with the default similarity, shorter field 
values will generally score higher, but norms are very coarse, so it 
won't be very precise; and "fq" queries filter the results, 
but do not affect the score.
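
To make the fq point concrete, here is a minimal sketch of the two request shapes. The helper name build_params is hypothetical, and the parenthesized form NAME_ANALYZED:(tour eiffel) is an assumption added here so that both words apply to the same field. The code only builds query strings and does not contact a server.

```python
from urllib.parse import urlencode


def build_params(name_query, scored):
    """Return Solr request parameters for a NAME_ANALYZED search.

    build_params is a hypothetical helper, not part of any Solr client.
    scored=False puts the clause in fq (filter only, so it adds nothing
    to the score); scored=True puts it in q so it contributes to the score.
    """
    clause = "NAME_ANALYZED:(%s)" % name_query
    if scored:
        return {"q": clause, "fl": "*,score"}
    # With q=*:* every document that passes the filter gets the same score.
    return {"q": "*:*", "fq": clause, "fl": "*,score"}


filter_only = build_params("tour eiffel", scored=False)
scored_params = build_params("tour eiffel", scored=True)

print(urlencode(filter_only))
print(urlencode(scored_params))
```

Only the second form gives length normalization (or any other scoring factor) something to act on.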

: 2) sorting my answers by length desc, and I guess in this case I would need
: to store the length of NAME_ANALYZED field to avoid having to compute it on
: the fly. at this point, this is the only solution I can think of.

that will also be a good way to sort on the length of the field, and will 
give you a lot of precise control.
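
A rough sketch of that length-field idea, assuming a hypothetical integer field called NAME_LENGTH that is filled in by the indexing client. The field name, the helper add_name_length, and the choice of counting whitespace-separated words are all assumptions, not something from this thread. Note that putting the shortest names first corresponds to an ascending sort.

```python
def add_name_length(doc, source_field="NAME_ANALYZED", length_field="NAME_LENGTH"):
    """Return a copy of doc with a word-count field computed before indexing.

    NAME_LENGTH is a hypothetical field; it would need to be declared in the
    schema as a sortable numeric field.
    """
    enriched = dict(doc)
    enriched[length_field] = len(doc[source_field].split())
    return enriched


docs = [
    {"NAME_ANALYZED": "Restaurant la tour Eiffel"},
    {"NAME_ANALYZED": "Hotel la tour Eiffel"},
    {"NAME_ANALYZED": "Tour Eiffel"},
]
indexed = [add_name_length(d) for d in docs]

# Request parameters: filter as before, then ask for the shortest names first.
params = {
    "q": "*:*",
    "fq": "NAME_ANALYZED:(tour eiffel)",
    "sort": "NAME_LENGTH asc",
}

# Local simulation of that sort order, just to check the intent.
ranked = sorted(indexed, key=lambda d: d["NAME_LENGTH"])
print([d["NAME_ANALYZED"] for d in ranked])
```

Counting words rather than characters is a design choice: it keeps a long single word from being penalized, but either measure would put "Tour Eiffel" ahead of the other two examples.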

but sorting on length isn't what you asked about...

: > and I have different answers like
: >
: > Restaurant la tour Eiffel
: > Hotel la tour Eiffel
: > Tour Eiffel
	...
: > Is there a way to get answers with NAME_ANALYZED beginning with "tour
: > Eiffel" first?

If you want to score documents higher because they appear at the beginning
of the field value, that is a different problem than scoring documents
higher because they are shorter -- ie: "Tour Eiffel Tower By Helicopter"
is longer than "Hotel la tour Eiffel", which one do you want to come
first?

If you want documents to score higher if they appear "early" in the field
value, you can either index a "marker" token at the beginning of the field
(ie: "S_T_A_R_T Tour Eiffel") and then do all queries on that field as
phrase queries including that token (shorter matches score higher in
phrase queries); or you can look into using the "surround" QParser that
was recently committed to the trunk.  the surround parser has special
syntax for generating "Span" Queries, which support a "SpanFirst" query
that scores documents higher based on how close to the beginning of a field
value the match is.
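
The marker-token variant could be sketched roughly as follows. The helper names, the slop value of 10, and the assumption that the field's analyzer keeps S_T_A_R_T as a single token are all illustrative and not confirmed by this thread. The gap function is only a local stand-in for "how far the match is from the marker"; it is not how Lucene computes scores.

```python
START_MARKER = "S_T_A_R_T"  # marker token from the example above


def mark_start(value):
    """Prefix the field value with the marker before sending it to the index."""
    return "%s %s" % (START_MARKER, value)


def start_phrase_query(field, user_terms, slop=10):
    """Build a sloppy phrase query that includes the marker token.

    This belongs in q (not fq) so that the phrase match can affect the score.
    """
    return '%s:"%s %s"~%d' % (field, START_MARKER, user_terms, slop)


def gap_after_marker(indexed_value, user_terms):
    """Count the tokens between the marker and the first match of user_terms.

    Returns None when the terms do not occur as a contiguous run.
    """
    tokens = indexed_value.lower().split()
    wanted = user_terms.lower().split()
    for i in range(1, len(tokens) - len(wanted) + 1):
        if tokens[i:i + len(wanted)] == wanted:
            return i - 1
    return None


names = ["Restaurant la tour Eiffel", "Hotel la tour Eiffel", "Tour Eiffel"]
gaps = {name: gap_after_marker(mark_start(name), "tour eiffel") for name in names}

print(start_phrase_query("NAME_ANALYZED", "tour eiffel"))
print(gaps)
```

A smaller gap means fewer position moves are needed to satisfy the sloppy phrase, which is why names starting with the requested words would be expected to score higher under this scheme.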


-Hoss

Re: getting answers starting with a requested string first

Posted by elisabeth benoit <el...@gmail.com>.
Hello all,

I'm answering my own post, hoping someone will comment.

I thought about two possibilities to solve my problem:

1) giving NAME_ANALYZED a type where omitNorms=false: I thought this would
give answers with shorter NAME_ANALYZED field a higher score. I've tested
that solution, but it's not working. I guess this is because there is no
score for fq parameter (all my answers have same score)

2) sorting my answers by length desc, and I guess in this case I would need
to store the length of NAME_ANALYZED field to avoid having to compute it on
the fly. at this point, this is the only solution I can think of.

Any comment would be appreciated,
Thanks,
Elisabeth


