
Posted to solr-user@lucene.apache.org by "Nourredine K." <no...@yahoo.com> on 2009/09/30 17:33:54 UTC

Questions about synonyms and highlighting

Hi,

Could you please answer the following questions?

1 - How can I get the synonyms that were actually matched for a keyword?

For example, I search for "foo", and my synonyms.txt file contains the line "foo, foobar, fee" (with expand="true").
My index contains "foo" and "foobar" but not "fee". I want to display a message on the results page (in the header, for example) that lists only the two matched tokens and not "fee", such as "Results found for foo and foobar".

2 - Can Solr analyze an index to extract associations between tokens?

For example, if "foo" often appears with "fee" in a field, Solr would associate the two tokens.

3 - Is it possible to configure Solr to turn accent-insensitive highlighting on or off for tokens with diacritics, and if so, how?

With "highlight all variants" settings, query "vélo" ==> both words, "<em>vélo</em>" and "<em>velo</em>", are highlighted
With the other settings, query "vélo" ==> only the first word, "<em>vélo</em>", is highlighted, not the second: "velo"

4 - The same question for highlighting with lemmatisation.

With "highlight all variants" settings, query "manage" ==> both words, "<em>manage</em>" and "<em>management</em>", are highlighted
With the other settings, query "manage" ==> only the first word, "<em>manage</em>", is highlighted, not the second: "management"


Thanks in advance.

Regards 

Nourredine.


__________________________________________________
Do You Yahoo!?
Tired of spam? Yahoo! Mail offers you the best possible protection against unsolicited messages
http://mail.yahoo.fr Yahoo! Mail

Re : Re : Questions about synonyms and highlighting

Posted by "Nourredine K." <no...@yahoo.com>.

Thanks Avlesh.

Now I understand better how highlighting works.

As you said, since highlighting is based on the analyzers, it treats terms the same way the search does.

To clarify the examples in #3 and #4: the two settings are mutually exclusive. I wanted to know how to do highlighting with stemming OR without it (not both at the same time).

So I think you've answered #3 too :) It all depends on the analyzers. In my case, ISOLatin1AccentFilterFactory could do the job.

Thanks again Shalin and Avlesh.

Regards,

Nourredine.



Re: Re : Questions about synonyms and highlighting

Posted by Avlesh Singh <av...@gmail.com>.

>
> 4 - The same question for highlighting with lemmatisation.
> With "highlight all variants" settings, query "manage" ==> both words,
> "<em>manage</em>" and "<em>management</em>", are highlighted
> With the other settings, query "manage" ==> only the first word,
> "<em>manage</em>", is highlighted, not the second: "management"
>

There is no lemmatisation support in Solr as of now; the only support you
get is stemming.
Let me make sure I understand this correctly: you basically want searches to
happen on the stemmed base form, but want to selectively highlight the
original and/or stemmed words. Right? If so, then AFAIK this is not possible.
Search passes through your field's analyzers (tokenizers and filters).
Highlighters typically use the same set of analyzers, so their behavior is
the same as in search; this essentially means that the keywords "manage",
"managing", "management" and "manager" are REDUCED to "manage" for both
searchers and highlighters.
If this can be done at all, the only place to enable your "feature" would be
the Lucene highlighter APIs. Someone more knowledgeable can tell you whether
that is possible.
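To make the point above concrete, here is a toy Python sketch (not Solr or Lucene code): one "analyzer" runs over the query and over every word of the text, and a word is highlighted when its analyzed form equals the analyzed query. `toy_stem` is a hypothetical, deliberately crude suffix stripper standing in for a real stemming filter; it happens to reduce these words to "manag" rather than "manage". Highlighting with or without stemming then comes down to analyzing the field with or without the stem step.

```python
import re

# Hypothetical toy stemmer: strips the first matching suffix, keeping at least
# 3 characters. It only stands in for a real stemming filter (it is NOT Porter).
SUFFIXES = ("ement", "ing", "er", "e")


def toy_stem(token: str) -> str:
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def analyze(token: str, stem: bool) -> str:
    """Toy analysis chain: lowercase, then optionally stem."""
    token = token.lower()
    return toy_stem(token) if stem else token


def highlight(text: str, query: str, stem: bool) -> str:
    """Wrap every word whose analyzed form equals the analyzed query in <em>."""
    target = analyze(query, stem)

    def mark(match):
        word = match.group(0)
        return f"<em>{word}</em>" if analyze(word, stem) == target else word

    return re.sub(r"\w+", mark, text)


if __name__ == "__main__":
    text = "manage the management team"
    print(highlight(text, "manage", stem=True))   # <em>manage</em> the <em>management</em> team
    print(highlight(text, "manage", stem=False))  # <em>manage</em> the management team
```

Because the same `analyze` decides both matching and marking, there is no setting here that searches on stems but highlights only exact forms, which mirrors the behavior described above.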

I have no idea about your #3, though my approach to handling accents is to
apply an ISOLatin1AccentFilterFactory and get rid of them altogether :)
I am curious to know the answer, though.
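A similar toy sketch for #3, assuming an accent filter that simply strips diacritics. It uses Python's `unicodedata` (NFD decomposition, then dropping combining marks) as a rough stand-in for ISOLatin1AccentFilterFactory, which also handles some characters this does not. With folding in the analysis chain, "vélo" and "velo" analyze to the same term, so both are highlighted; without it, only the exact form is.

```python
import re
import unicodedata


def fold_accents(token: str) -> str:
    """Rough stand-in for an accent filter: decompose, then drop combining marks."""
    decomposed = unicodedata.normalize("NFD", token)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def analyze_accents(token: str, fold: bool) -> str:
    """Toy analysis chain: lowercase, then optionally strip diacritics."""
    token = token.lower()
    return fold_accents(token) if fold else token


def highlight_accents(text: str, query: str, fold: bool) -> str:
    # NFC keeps each accented letter as a single character, so \w+ sees whole words
    text = unicodedata.normalize("NFC", text)
    target = analyze_accents(unicodedata.normalize("NFC", query), fold)

    def mark(match):
        word = match.group(0)
        return f"<em>{word}</em>" if analyze_accents(word, fold) == target else word

    return re.sub(r"\w+", mark, text)


if __name__ == "__main__":
    print(highlight_accents("vélo et velo", "vélo", fold=True))   # <em>vélo</em> et <em>velo</em>
    print(highlight_accents("vélo et velo", "vélo", fold=False))  # <em>vélo</em> et velo
```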

Cheers
Avlesh

On Wed, Oct 7, 2009 at 3:17 PM, Nourredine K. <no...@yahoo.com> wrote:

> > I'm not an expert on hit highlighting, but please find some answers
> > inline:
>
> Thanks, Shalin, for your answers. They help a lot.
>
> I'm reposting questions #3 and #4 for the others :)
>
> 3 - Is it possible to configure Solr to turn accent-insensitive
> highlighting on or off for tokens with diacritics, and if so, how?
>
> With "highlight all variants" settings, query "vélo" ==> both words,
> "<em>vélo</em>" and "<em>velo</em>", are highlighted
> With the other settings, query "vélo" ==> only the first word,
> "<em>vélo</em>", is highlighted, not the second: "velo"
>
> 4 - The same question for highlighting with lemmatisation.
>
> With "highlight all variants" settings, query "manage" ==> both words,
> "<em>manage</em>" and "<em>management</em>", are highlighted
> With the other settings, query "manage" ==> only the first word,
> "<em>manage</em>", is highlighted, not the second: "management"
>
> Regards,
>
> Nourredine.

Re : Questions about synonyms and highlighting

Posted by "Nourredine K." <no...@yahoo.com>.

> I'm not an expert on hit highlighting, but please find some answers inline:

Thanks, Shalin, for your answers. They help a lot.

I'm reposting questions #3 and #4 for the others :)


3 - Is it possible to configure Solr to turn accent-insensitive highlighting
on or off for tokens with diacritics, and if so, how?

With "highlight all variants" settings, query "vélo" ==> both words,
"<em>vélo</em>" and "<em>velo</em>", are highlighted
With the other settings, query "vélo" ==> only the first word,
"<em>vélo</em>", is highlighted, not the second: "velo"

4 - The same question for highlighting with lemmatisation.

With "highlight all variants" settings, query "manage" ==> both words,
"<em>manage</em>" and "<em>management</em>", are highlighted
With the other settings, query "manage" ==> only the first word,
"<em>manage</em>", is highlighted, not the second: "management"

Regards,

Nourredine.

Re: Questions about synonyms and highlighting

Posted by Shalin Shekhar Mangar <sh...@gmail.com>.

I'm not an expert on hit highlighting, but please find some answers inline:

On Wed, Sep 30, 2009 at 9:03 PM, Nourredine K. <no...@yahoo.com> wrote:

> Hi,
>
> Could you please answer the following questions?
>
> 1 - How can I get the synonyms that were actually matched for a keyword?
>
> For example, I search for "foo", and my synonyms.txt file contains the
> line "foo, foobar, fee" (with expand="true").
> My index contains "foo" and "foobar" but not "fee". I want to display a
> message on the results page (in the header, for example) that lists only
> the two matched tokens and not "fee", such as "Results found for foo and
> foobar".
>
>
Whichever tokens are present in the index will be matched, but I don't think
it is possible to show only those synonyms that matched some documents.
Adding debugQuery=on can give you more information, such as how the score
for a particular document was calculated for the given query.
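One possible application-side workaround, sketched in Python under stated assumptions: expand the keyword with the same synonyms.txt line yourself, check each synonym separately, and keep only those that return hits. `count_hits` is a hypothetical callback (for example, a rows=0 query that reads numFound). It must search a field that is not synonym-expanded at index or query time; otherwise every synonym in the group would report the same hits. Only simple comma-separated lines are handled; "=>" mappings are skipped.

```python
from typing import Callable, Iterable, List


def parse_synonym_groups(lines: Iterable[str]) -> List[List[str]]:
    """Parse comma-separated equivalence lines such as "foo, foobar, fee"."""
    groups = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#") or "=>" in line:
            continue  # skip blanks, comments and explicit "a => b" mappings
        groups.append([t.strip() for t in line.split(",") if t.strip()])
    return groups


def matched_synonyms(keyword: str, lines: Iterable[str],
                     count_hits: Callable[[str], int]) -> List[str]:
    """Return the keyword's synonyms (itself included) that match at least one doc."""
    expansion = [keyword]
    for group in parse_synonym_groups(lines):
        if keyword in group:
            expansion = group
            break
    # count_hits is hypothetical: one separate, non-synonym-expanded lookup per term
    return [term for term in expansion if count_hits(term) > 0]


def header_message(terms: List[str]) -> str:
    if not terms:
        return "No results found"
    if len(terms) == 1:
        return f"Results found for {terms[0]}"
    return "Results found for " + ", ".join(terms[:-1]) + " and " + terms[-1]


if __name__ == "__main__":
    fake_index = {"foo": 3, "foobar": 1}  # made-up hit counts for illustration
    terms = matched_synonyms("foo", ["foo, foobar, fee"],
                             lambda t: fake_index.get(t, 0))
    print(header_message(terms))  # Results found for foo and foobar
```

This costs one extra request per synonym, so it is only reasonable for small synonym groups.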


> 2 - Can Solr analyze an index to extract associations between tokens?
>
> For example, if "foo" often appears with "fee" in a field, Solr would
> associate the two tokens.
>
>
Solr won't compute associations, but there are ways of achieving something
similar. For example, the MoreLikeThis functionality finds documents related
to a given document based on the most significant terms in a given field.
Also, the TermVectorComponent can give you position information for terms in
a document (if the field stores term vectors with positions). You can use
that to build your own co-occurrence associations.
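A minimal sketch of that last idea, assuming each document's term positions have already been fetched (e.g. from the TermVectorComponent) into a plain dict of term -> positions; that input shape is an assumption, not the component's response format. It counts, per document, the term pairs that appear within `window` positions of each other; pairs with high counts are candidate associations.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List

# Assumed input shape: one dict per document, mapping term -> token positions.
DocPositions = Dict[str, List[int]]


def cooccurrences(docs: List[DocPositions], window: int) -> Counter:
    """Count documents in which each term pair occurs within `window` positions."""
    counts: Counter = Counter()
    for positions in docs:
        # sorted() gives each unordered pair a stable (a, b) key with a < b
        for a, b in combinations(sorted(positions), 2):
            close = any(abs(pa - pb) <= window
                        for pa in positions[a] for pb in positions[b])
            if close:
                counts[(a, b)] += 1  # at most once per document
    return counts


if __name__ == "__main__":
    docs = [
        {"foo": [0], "fee": [1], "bar": [10]},
        {"foo": [5], "fee": [3]},
        {"foo": [0], "fee": [20]},
    ]
    print(cooccurrences(docs, window=2).most_common(1))  # [(('fee', 'foo'), 2)]
```

The pairwise position comparison is quadratic, which is fine for a sketch but would need a smarter sweep for large documents.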

If you just want to query for two words within a given distance of each
other, you can use proximity searches.

http://lucene.apache.org/java/2_9_0/queryparsersyntax.html#Proximity%20Searches
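For example, with the syntax described at that link, "foo" and "fee" within 5 positions of each other is written "foo fee"~5. The snippet below only builds such a query string and a select URL; the base URL, the field name `text` and the slop of 5 are assumptions for illustration.

```python
from urllib.parse import parse_qs, urlencode, urlparse


def proximity_query(field: str, term_a: str, term_b: str, slop: int) -> str:
    """Build a Lucene proximity query: both terms within `slop` positions."""
    return f'{field}:"{term_a} {term_b}"~{slop}'


def select_url(base: str, q: str, rows: int = 10) -> str:
    """Append the URL-encoded query to a (hypothetical) Solr select handler URL."""
    return f"{base}?{urlencode({'q': q, 'rows': rows})}"


if __name__ == "__main__":
    q = proximity_query("text", "foo", "fee", 5)
    url = select_url("http://localhost:8983/solr/select", q)
    print(q)  # text:"foo fee"~5
    # Decoding the URL gives back the exact query string
    print(parse_qs(urlparse(url).query)["q"][0])
```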

Perhaps somebody else can weigh in on your questions #3 and #4.

-- 
Regards,
Shalin Shekhar Mangar.