Posted to solr-user@lucene.apache.org by "joe.cohen.m@gmail.com" <jo...@gmail.com> on 2012/12/12 17:26:51 UTC

Can a field with defined synonym be searched without the synonym?

Hi
I have a field type with a defined synonym.txt, so searching for either
"home" or "house" retrieves records that contain either word.

I want to be able to search this field for the exact value that I enter,
without the synonym filter.

Is it possible?

Thanks.



--
View this message in context: http://lucene.472066.n3.nabble.com/Can-a-field-with-defined-synonym-be-searched-without-the-synonym-tp4026381.html
Sent from the Solr - User mailing list archive at Nabble.com.

RE: DataDirectory: relative path doesn't work

Posted by Patrick Mi <pa...@touchpointgroup.com>.
Thanks for fixing the wiki page http://wiki.apache.org/solr/SolrConfigXml;
it now says:
'If this directory is not absolute, then it is relative to the directory
you're in when you start SOLR.'

It would be nice if you dropped me a line here after making the change to the
document ...

-----Original Message-----
From: Patrick Mi [mailto:patrick.mi@touchpointgroup.com] 
Sent: Tuesday, 26 February 2013 5:49 p.m.
To: solr-user@lucene.apache.org
Subject: DataDirectory: relative path doesn't work 

I am running Solr 4.0 with Tomcat 7 on CentOS 6.

According to this page http://wiki.apache.org/solr/SolrConfigXml if
<dataDir> is not absolute, then it is relative to the instanceDir of the
SolrCore.

However, the index directory is always created under the directory from which
I start Tomcat (startup.sh), rather than under the instanceDir of the SolrCore.

Am I doing something wrong in configuration?

Regards,
Patrick


DataDirectory: relative path doesn't work

Posted by Patrick Mi <pa...@touchpointgroup.com>.
I am running Solr 4.0 with Tomcat 7 on CentOS 6.

According to this page http://wiki.apache.org/solr/SolrConfigXml if
<dataDir> is not absolute, then it is relative to the instanceDir of the
SolrCore.

However, the index directory is always created under the directory from which
I start Tomcat (startup.sh), rather than under the instanceDir of the SolrCore.

Am I doing something wrong in configuration?
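
To make the two behaviours concrete, here is a small sketch of how a relative
<dataDir> would resolve under each rule. It is only an illustration of path
resolution, not Solr's actual code, and the directory names
(/opt/solr/collection1, /usr/share/tomcat7/bin) are made up for the example.

```python
from pathlib import PurePosixPath


def resolve_data_dir(data_dir, base_dir):
    """Resolve data_dir against base_dir unless data_dir is already absolute."""
    path = PurePosixPath(data_dir)
    if path.is_absolute():
        return str(path)
    return str(PurePosixPath(base_dir) / path)


# Hypothetical locations, chosen only for this illustration.
INSTANCE_DIR = "/opt/solr/collection1"
TOMCAT_START_DIR = "/usr/share/tomcat7/bin"

# What the wiki page described: relative to the instanceDir of the SolrCore.
documented = resolve_data_dir("data", INSTANCE_DIR)

# What I actually see: relative to the directory Tomcat was started from.
observed = resolve_data_dir("data", TOMCAT_START_DIR)

print(documented)
print(observed)
```

An absolute <dataDir> gives the same result under both rules, which is
presumably the simplest way to avoid the ambiguity.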

Regards,
Patrick


Re: Can a field with defined synonym be searched without the synonym?

Posted by Walter Underwood <wu...@wunderwood.org>.
I prefer fuzzy search for misspellings. Solr does a very nice job with those, weighting them by the similarity to the matched term.

wunder

On Dec 12, 2012, at 4:45 PM, Jack Krupansky wrote:

> Another great use case for synonyms is misspellings. I saw one synonym list in which the top synonym was the phrase "dead mouse" (which doesn't look misspelled at all); I won't tell you what its "proper" synonym was, other than to say that it was VERY app/culture-dependent. It was also interesting because the user's original query phrase needed to be given a much lower weighting in order to find what the user was "likely" looking for.
> 
> -- Jack Krupansky
> 
> -----Original Message----- From: Walter Underwood
> Sent: Wednesday, December 12, 2012 7:16 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Can a field with defined synonym be searched without the synonym?
> 
> If you have tons of content, you can do selective reindexing. You only need to reindex the docs containing the new terms. If I add a synonym for "babysitter" and "baby sitter", then I can do a search for documents containing either of those, and only reindex those.
> 
> Reverse weighting to even out the IDF would work, but it could be pretty tweaky. If one synonym is very rare, you put in a small weight, but then you index several documents with that term and then it is overweighted.
> 
> wunder
> 
> On Dec 12, 2012, at 4:09 PM, Jack Krupansky wrote:
> 
>> Sure, synonyms have lots of issues and choosing index vs. query is simply picking your poison, but it all depends on your app and your data and your user expectations, and you, the developer, have tools to moderate a lot of these issues.
>> 
>> Index-time synonyms have the problem (among others) that they cannot be changed without reindexing.
>> 
>> One technique is to simulate the query-time synonym filter expansion by having your app preprocess user queries to expand to the OR of the synonyms and then boost or de-boost the synonyms as makes sense for your app.
>> 
>> For example,
>> 
>>  (tv^0.5 OR television^2.5 OR "boob tube"^0.0001)
>> 
>> -- Jack Krupansky
>> 
>> -----Original Message----- From: Steve Rowe
>> Sent: Wednesday, December 12, 2012 5:28 PM
>> To: solr-user@lucene.apache.org
>> Subject: Re: Can a field with defined synonym be searched without the synonym?
>> 
>> Hmm, I've gotten this very wrong :) - DisjunctionMaxQuery will operate per-doc, so using it in the way I suggested will not allow for synonym IDF leveling across documents.  Also, scoring obviously includes more factors than IDF.
>> 
>> On Dec 12, 2012, at 5:18 PM, Steve Rowe <sa...@gmail.com> wrote:
>> 
>>> But couldn't the IDF problem be fixed by applying the same IDF to all synonyms, e.g. via DisjunctionMaxQuery?  (Maybe the ideal would be an average, not a max.)
>>> 
>>> (E)dismax applies this query per-field, but AFAICT there is nothing stopping anybody (modulo query parser construction :) ) from using it on synonyms in the same field.
>>> 
>>> Steve
>>> 
>>> On Dec 12, 2012, at 12:50 PM, Walter Underwood <wu...@wunderwood.org> wrote:
>>> 
>>>> Query parsers cannot fix the IDF problem or make query-time synonyms faster. Query synonym expansion makes more search terms. More search terms are more work at query time.
>>>> 
>>>> The IDF problem is real; I've run up against it. The rarest variant of the synonym has the highest score. This is probably the opposite of what you want. For me, it was "TV" and "television". Documents with "TV" had higher scores than those with "television".
>>>> 
>>>> wunder
>>>> 
>>>> On Dec 12, 2012, at 9:45 AM, Roman Chyla wrote:
>>>> 
>>>>> @wunder
>>>>> It is a misconception (well, supported by that wiki description) that the
>>>>> query-time synonym filter has these problems. It is actually the default
>>>>> parser that is causing these problems. Look at this if you still think
>>>>> that index-time synonyms are a cure-all:
>>>>> https://issues.apache.org/jira/browse/LUCENE-4499
>>>>> 
>>>>> @joe
>>>>> If you can use the flexible query parser (as linked in by @Swati) then all
>>>>> you need to do is to define a different field with a different tokenizer
>>>>> chain and then swap the field names before the analyzers process the
>>>>> document (and then rewrite the field name back - for example, we have
>>>>> fields called "author" and "author_nosyn")
>>>>> 
>>>>> roman
>>>>> 
>>>>> On Wed, Dec 12, 2012 at 12:38 PM, Walter Underwood <wu...@wunderwood.org>wrote:
>>>>> 
>>>>>> Query time synonyms have known problems. They are slower, cause incorrect
>>>>>> IDF, and don't work for phrase synonyms.
>>>>>> 
>>>>>> Apply synonyms at index time and you will have none of those problems.
>>>>>> 
>>>>>> See:
>>>>>> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory
>>>>>> 
>>>>>> wunder
>>>>>> 
>>>>>> On Dec 12, 2012, at 9:34 AM, Swati Swoboda wrote:
>>>>>> 
>>>>>>> Query-time analyzers are still applied, even if you include a string in
>>>>>> quotes. Would you expect "foo" to not match "Foo" just because it's
>>>>>> enclosed in quotes?
>>>>>>> 
>>>>>>> Also look at this, someone who had similar requirements:
>>>>>>> 
>>>>>> http://lucene.472066.n3.nabble.com/Synonym-Filter-disable-at-query-time-td2919876.html
>>>>>>> 
>>>>>>> 
>>>>>>> -----Original Message-----
>>>>>>> From: joe.cohen.m@gmail.com [mailto:joe.cohen.m@gmail.com]
>>>>>>> Sent: Wednesday, December 12, 2012 12:09 PM
>>>>>>> To: solr-user@lucene.apache.org
>>>>>>> Subject: Re: Can a field with defined synonym be searched without the
>>>>>> synonym?
>>>>>>> 
>>>>>>> 
>>>>>>> I'm applying only query-time synonyms, so I have the original values
>>>>>>> stored and indexed.
>>>>>>> I would've expected that if I search for a string in quotation marks,
>>>>>>> I'll get the exact match, without applying a synonym.
>>>>>>> 
>>>>>>> any way to achieve that?
>>>>>>> 
>>>>>>> 
>>>>>>> Upayavira wrote
>>>>>>>> You can only search against terms that are stored in your index. If
>>>>>>>> you have applied index time synonyms, you can't remove them at query
>>>>>> time.
>>>>>>>> 
>>>>>>>> You can, however, use copyField to clone an incoming field to another
>>>>>>>> field that doesn't use synonyms, and search against that field instead.
>>>>>>>> 
>>>>>>>> Upayavira
>>>>>>>> 
>>>>>>>> On Wed, Dec 12, 2012, at 04:26 PM,
>>>>>>> 
>>>>>>>> joe.cohen.m@
>>>>>>> 
>>>>>>>> wrote:
>>>>>>>>> Hi
>>>>>>>>> I have a field type with a defined synonym.txt which retrieves both
>>>>>>>>> records with "home" and "house" when I search either one of them.
>>>>>>>>> 
>>>>>>>>> I want to be able to search this field on the specific value that I
>>>>>>>>> enter, without the synonym filter.
>>>>>>>>> 
>>>>>>>>> is it possible?
>>>>>>>>> 
>>>>>>>>> thanks.
>>>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> --
>>>>>>> View this message in context:
>>>>>> http://lucene.472066.n3.nabble.com/Can-a-field-with-defined-synonym-be-searched-without-the-synonym-tp4026381p4026405.html
>>>>>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>>>>> 
>>>>>> --
>>>>>> Walter Underwood
>>>>>> wunder@wunderwood.org
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>> 
>>>> --
>>>> Walter Underwood
>>>> wunder@wunderwood.org
>>>> 
>>>> 
>>>> 
> 
> --
> Walter Underwood
> wunder@wunderwood.org
> 
> 
> 

--
Walter Underwood
wunder@wunderwood.org




Re: Can a field with defined synonym be searched without the synonym?

Posted by Jack Krupansky <ja...@basetechnology.com>.
Another great use case for synonyms is misspellings. I saw one synonym list 
in which the top synonym was the phrase "dead mouse" (which doesn't look 
misspelled at all); I won't tell you what its "proper" synonym was, other 
than to say that it was VERY app/culture-dependent. It was also interesting 
because the user's original query phrase needed to be given a much lower 
weighting in order to find what the user was "likely" looking for.

-- Jack Krupansky

-----Original Message----- 
From: Walter Underwood
Sent: Wednesday, December 12, 2012 7:16 PM
To: solr-user@lucene.apache.org
Subject: Re: Can a field with defined synonym be searched without the 
synonym?

If you have tons of content, you can do selective reindexing. You only need 
to reindex the docs containing the new terms. If I add a synonym for 
"babysitter" and "baby sitter", then I can do a search for documents 
containing either of those, and only reindex those.

Reverse weighting to even out the IDF would work, but it could be pretty 
tweaky. If one synonym is very rare, you put in a small weight, but then you 
index several documents with that term and then it is overweighted.

wunder

On Dec 12, 2012, at 4:09 PM, Jack Krupansky wrote:

> Sure, synonyms have lots of issues and choosing index vs. query is simply 
> picking your poison, but it all depends on your app and your data and your 
> user expectations, and you, the developer, have tools to moderate a lot of 
> these issues.
>
> Index-time synonyms have the problem (among others) that they cannot be 
> changed without reindexing.
>
> One technique is to simulate the query-time synonym filter expansion by 
> having your app preprocess user queries to expand to the OR of the 
> synonyms and then boost or de-boost the synonyms as makes sense for your 
> app.
>
> For example,
>
>   (tv^0.5 OR television^2.5 OR "boob tube"^0.0001)
>
> -- Jack Krupansky
>
> -----Original Message----- From: Steve Rowe
> Sent: Wednesday, December 12, 2012 5:28 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Can a field with defined synonym be searched without the 
> synonym?
>
> Hmm, I've gotten this very wrong :) - DisjunctionMaxQuery will operate 
> per-doc, so using it in the way I suggested will not allow for synonym IDF 
> leveling across documents.  Also, scoring obviously includes more factors 
> than IDF.
>
> On Dec 12, 2012, at 5:18 PM, Steve Rowe <sa...@gmail.com> wrote:
>
>> But couldn't the IDF problem be fixed by applying the same IDF to all 
>> synonyms, e.g. via DisjunctionMaxQuery?  (Maybe the ideal would be an 
>> average, not a max.)
>>
>> (E)dismax applies this query per-field, but AFAICT there is nothing 
>> stopping anybody (modulo query parser construction :) ) from using it on 
>> synonyms in the same field.
>>
>> Steve
>>
>> On Dec 12, 2012, at 12:50 PM, Walter Underwood <wu...@wunderwood.org> 
>> wrote:
>>
>>> Query parsers cannot fix the IDF problem or make query-time synonyms 
>>> faster. Query synonym expansion makes more search terms. More search 
>>> terms are more work at query time.
>>>
>>> The IDF problem is real; I've run up against it. The rarest variant
>>> of the synonym has the highest score. This is probably the opposite of
>>> what you want. For me, it was "TV" and "television". Documents with "TV"
>>> had higher scores than those with "television".
>>>
>>> wunder
>>>
>>> On Dec 12, 2012, at 9:45 AM, Roman Chyla wrote:
>>>
>>>> @wunder
>>>> It is a misconception (well, supported by that wiki description) that the
>>>> query-time synonym filter has these problems. It is actually the default
>>>> parser that is causing these problems. Look at this if you still think
>>>> that index-time synonyms are a cure-all:
>>>> https://issues.apache.org/jira/browse/LUCENE-4499
>>>>
>>>> @joe
>>>> If you can use the flexible query parser (as linked in by @Swati) then all
>>>> you need to do is to define a different field with a different tokenizer
>>>> chain and then swap the field names before the analyzers process the
>>>> document (and then rewrite the field name back - for example, we have
>>>> fields called "author" and "author_nosyn")
>>>>
>>>> roman
>>>>
>>>> On Wed, Dec 12, 2012 at 12:38 PM, Walter Underwood 
>>>> <wu...@wunderwood.org>wrote:
>>>>
>>>>> Query time synonyms have known problems. They are slower, cause 
>>>>> incorrect
>>>>> IDF, and don't work for phrase synonyms.
>>>>>
>>>>> Apply synonyms at index time and you will have none of those problems.
>>>>>
>>>>> See:
>>>>> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory
>>>>>
>>>>> wunder
>>>>>
>>>>> On Dec 12, 2012, at 9:34 AM, Swati Swoboda wrote:
>>>>>
>>>>>> Query-time analyzers are still applied, even if you include a string 
>>>>>> in
>>>>> quotes. Would you expect "foo" to not match "Foo" just because it's
>>>>> enclosed in quotes?
>>>>>>
>>>>>> Also look at this, someone who had similar requirements:
>>>>>>
>>>>> http://lucene.472066.n3.nabble.com/Synonym-Filter-disable-at-query-time-td2919876.html
>>>>>>
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: joe.cohen.m@gmail.com [mailto:joe.cohen.m@gmail.com]
>>>>>> Sent: Wednesday, December 12, 2012 12:09 PM
>>>>>> To: solr-user@lucene.apache.org
>>>>>> Subject: Re: Can a field with defined synonym be searched without the
>>>>> synonym?
>>>>>>
>>>>>>
>>>>>> I'm applying only query-time synonyms, so I have the original values
>>>>>> stored and indexed.
>>>>>> I would've expected that if I search for a string in quotation marks,
>>>>>> I'll get the exact match, without applying a synonym.
>>>>>>
>>>>>> any way to achieve that?
>>>>>>
>>>>>>
>>>>>> Upayavira wrote
>>>>>>> You can only search against terms that are stored in your index. If
>>>>>>> you have applied index time synonyms, you can't remove them at query
>>>>> time.
>>>>>>>
>>>>>>> You can, however, use copyField to clone an incoming field to 
>>>>>>> another
>>>>>>> field that doesn't use synonyms, and search against that field 
>>>>>>> instead.
>>>>>>>
>>>>>>> Upayavira
>>>>>>>
>>>>>>> On Wed, Dec 12, 2012, at 04:26 PM,
>>>>>>
>>>>>>> joe.cohen.m@
>>>>>>
>>>>>>> wrote:
>>>>>>>> Hi
>>>>>>>> I have a field type with a defined synonym.txt which retrieves
>>>>>>>> both
>>>>>>>> records with "home" and "house" when I search either one of them.
>>>>>>>>
>>>>>>>> I want to be able to search this field on the specific value that I
>>>>>>>> enter, without the synonym filter.
>>>>>>>>
>>>>>>>> is it possible?
>>>>>>>>
>>>>>>>> thanks.
>>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> View this message in context:
>>>>> http://lucene.472066.n3.nabble.com/Can-a-field-with-defined-synonym-be-searched-without-the-synonym-tp4026381p4026405.html
>>>>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>>>>
>>>>> --
>>>>> Walter Underwood
>>>>> wunder@wunderwood.org
>>>>>
>>>>>
>>>>>
>>>>>
>>>
>>> --
>>> Walter Underwood
>>> wunder@wunderwood.org
>>>
>>>
>>>

--
Walter Underwood
wunder@wunderwood.org




Re: Can a field with defined synonym be searched without the synonym?

Posted by Walter Underwood <wu...@wunderwood.org>.
If you have tons of content, you can do selective reindexing. You only need to reindex the docs containing the new terms. If I add a synonym for "babysitter" and "baby sitter", then I can do a search for documents containing either of those, and only reindex those.
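
As a rough sketch of that selection step (an illustration only, assuming the
searchable text lives in a hypothetical field called "text" and that each
document has an "id" field), the query for the affected documents could be
built like this. The helper names are invented for the example; q, fl and
rows are the usual Solr request parameters.

```python
from urllib.parse import urlencode


def quote_term(term):
    """Wrap multi-word variants in double quotes so they match as phrases."""
    escaped = term.replace('"', '\\"')
    return f'"{escaped}"' if " " in escaped else escaped


def reindex_candidates_query(field, variants):
    """Build a query matching documents that contain any synonym variant."""
    return f"{field}:(" + " OR ".join(quote_term(v) for v in variants) + ")"


def select_params(field, variants, rows=1000):
    """URL-encode the parameters for a request that returns only document ids."""
    return urlencode({
        "q": reindex_candidates_query(field, variants),
        "fl": "id",    # only the ids are needed to drive the reindex
        "rows": rows,  # page through the results for larger sets
    })


print(reindex_candidates_query("text", ["babysitter", "baby sitter"]))
print(select_params("text", ["babysitter", "baby sitter"]))
```

The ids that come back are then the only documents that need to be fed
through indexing again after the synonym file changes.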

Reverse weighting to even out the IDF would work, but it could be pretty tweaky. If one synonym is very rare, you put in a small weight, but then you index several documents with that term and then it is overweighted.

wunder

On Dec 12, 2012, at 4:09 PM, Jack Krupansky wrote:

> Sure, synonyms have lots of issues and choosing index vs. query is simply picking your poison, but it all depends on your app and your data and your user expectations, and you, the developer, have tools to moderate a lot of these issues.
> 
> Index-time synonyms have the problem (among others) that they cannot be changed without reindexing.
> 
> One technique is to simulate the query-time synonym filter expansion by having your app preprocess user queries to expand to the OR of the synonyms and then boost or de-boost the synonyms as makes sense for your app.
> 
> For example,
> 
>   (tv^0.5 OR television^2.5 OR "boob tube"^0.0001)
> 
> -- Jack Krupansky
> 
> -----Original Message----- From: Steve Rowe
> Sent: Wednesday, December 12, 2012 5:28 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Can a field with defined synonym be searched without the synonym?
> 
> Hmm, I've gotten this very wrong :) - DisjunctionMaxQuery will operate per-doc, so using it in the way I suggested will not allow for synonym IDF leveling across documents.  Also, scoring obviously includes more factors than IDF.
> 
> On Dec 12, 2012, at 5:18 PM, Steve Rowe <sa...@gmail.com> wrote:
> 
>> But couldn't the IDF problem be fixed by applying the same IDF to all synonyms, e.g. via DisjunctionMaxQuery?  (Maybe the ideal would be an average, not a max.)
>> 
>> (E)dismax applies this query per-field, but AFAICT there is nothing stopping anybody (modulo query parser construction :) ) from using it on synonyms in the same field.
>> 
>> Steve
>> 
>> On Dec 12, 2012, at 12:50 PM, Walter Underwood <wu...@wunderwood.org> wrote:
>> 
>>> Query parsers cannot fix the IDF problem or make query-time synonyms faster. Query synonym expansion makes more search terms. More search terms are more work at query time.
>>> 
>>> The IDF problem is real; I've run up against it. The rarest variant of the synonym has the highest score. This is probably the opposite of what you want. For me, it was "TV" and "television". Documents with "TV" had higher scores than those with "television".
>>> 
>>> wunder
>>> 
>>> On Dec 12, 2012, at 9:45 AM, Roman Chyla wrote:
>>> 
>>>> @wunder
>>>> It is a misconception (well, supported by that wiki description) that the
>>>> query-time synonym filter has these problems. It is actually the default
>>>> parser that is causing these problems. Look at this if you still think
>>>> that index-time synonyms are a cure-all:
>>>> https://issues.apache.org/jira/browse/LUCENE-4499
>>>> 
>>>> @joe
>>>> If you can use the flexible query parser (as linked in by @Swati) then all
>>>> you need to do is to define a different field with a different tokenizer
>>>> chain and then swap the field names before the analyzers process the
>>>> document (and then rewrite the field name back - for example, we have
>>>> fields called "author" and "author_nosyn")
>>>> 
>>>> roman
>>>> 
>>>> On Wed, Dec 12, 2012 at 12:38 PM, Walter Underwood <wu...@wunderwood.org>wrote:
>>>> 
>>>>> Query time synonyms have known problems. They are slower, cause incorrect
>>>>> IDF, and don't work for phrase synonyms.
>>>>> 
>>>>> Apply synonyms at index time and you will have none of those problems.
>>>>> 
>>>>> See:
>>>>> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory
>>>>> 
>>>>> wunder
>>>>> 
>>>>> On Dec 12, 2012, at 9:34 AM, Swati Swoboda wrote:
>>>>> 
>>>>>> Query-time analyzers are still applied, even if you include a string in
>>>>> quotes. Would you expect "foo" to not match "Foo" just because it's
>>>>> enclosed in quotes?
>>>>>> 
>>>>>> Also look at this, someone who had similar requirements:
>>>>>> 
>>>>> http://lucene.472066.n3.nabble.com/Synonym-Filter-disable-at-query-time-td2919876.html
>>>>>> 
>>>>>> 
>>>>>> -----Original Message-----
>>>>>> From: joe.cohen.m@gmail.com [mailto:joe.cohen.m@gmail.com]
>>>>>> Sent: Wednesday, December 12, 2012 12:09 PM
>>>>>> To: solr-user@lucene.apache.org
>>>>>> Subject: Re: Can a field with defined synonym be searched without the
>>>>> synonym?
>>>>>> 
>>>>>> 
>>>>>> I'm applying only query-time synonyms, so I have the original values
>>>>>> stored and indexed.
>>>>>> I would've expected that if I search for a string in quotation marks,
>>>>>> I'll get the exact match, without applying a synonym.
>>>>>> 
>>>>>> any way to achieve that?
>>>>>> 
>>>>>> 
>>>>>> Upayavira wrote
>>>>>>> You can only search against terms that are stored in your index. If
>>>>>>> you have applied index time synonyms, you can't remove them at query
>>>>> time.
>>>>>>> 
>>>>>>> You can, however, use copyField to clone an incoming field to another
>>>>>>> field that doesn't use synonyms, and search against that field instead.
>>>>>>> 
>>>>>>> Upayavira
>>>>>>> 
>>>>>>> On Wed, Dec 12, 2012, at 04:26 PM,
>>>>>> 
>>>>>>> joe.cohen.m@
>>>>>> 
>>>>>>> wrote:
>>>>>>>> Hi
>>>>>>>> I have a field type with a defined synonym.txt which retrieves both
>>>>>>>> records with "home" and "house" when I search either one of them.
>>>>>>>> 
>>>>>>>> I want to be able to search this field on the specific value that I
>>>>>>>> enter, without the synonym filter.
>>>>>>>> 
>>>>>>>> is it possible?
>>>>>>>> 
>>>>>>>> thanks.
>>>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> --
>>>>>> View this message in context:
>>>>> http://lucene.472066.n3.nabble.com/Can-a-field-with-defined-synonym-be-searched-without-the-synonym-tp4026381p4026405.html
>>>>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>>>> 
>>>>> --
>>>>> Walter Underwood
>>>>> wunder@wunderwood.org
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>> 
>>> --
>>> Walter Underwood
>>> wunder@wunderwood.org
>>> 
>>> 
>>> 

--
Walter Underwood
wunder@wunderwood.org




Re: Can a field with defined synonym be searched without the synonym?

Posted by Jack Krupansky <ja...@basetechnology.com>.
Sure, synonyms have lots of issues and choosing index vs. query is simply 
picking your poison, but it all depends on your app and your data and your 
user expectations, and you, the developer, have tools to moderate a lot of 
these issues.

Index-time synonyms have the problem (among others) that they cannot be 
changed without reindexing.

One technique is to simulate the query-time synonym filter expansion by 
having your app preprocess user queries to expand to the OR of the synonyms 
and then boost or de-boost the synonyms as makes sense for your app.

For example,

    (tv^0.5 OR television^2.5 OR "boob tube"^0.0001)
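
A minimal sketch of that preprocessing step, assuming the app keeps its own
table of synonym groups with boosts (the SYNONYM_BOOSTS table and both
function names are made up for illustration; only the boost values come from
the example above):

```python
# Hypothetical app-side synonym table: query word -> list of (term, boost).
SYNONYM_BOOSTS = {
    "tv": [("tv", 0.5), ("television", 2.5), ("boob tube", 0.0001)],
}


def expand_with_boosts(synonyms):
    """Build a boosted OR clause from (term, boost) pairs."""
    clauses = []
    for term, boost in synonyms:
        # Multi-word synonyms must be quoted so they are parsed as phrases.
        text = f'"{term}"' if " " in term else term
        clauses.append(f"{text}^{boost}")
    return "(" + " OR ".join(clauses) + ")"


def preprocess(user_query):
    """Replace each word that has a synonym group with its boosted OR clause."""
    parts = []
    for word in user_query.split():
        group = SYNONYM_BOOSTS.get(word.lower())
        parts.append(expand_with_boosts(group) if group else word)
    return " ".join(parts)


print(preprocess("cheap TV stand"))
```

This only handles single-word triggers; matching multi-word phrases in the
user's input would need a proper tokenizing pass, which is left out here.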

-- Jack Krupansky

-----Original Message----- 
From: Steve Rowe
Sent: Wednesday, December 12, 2012 5:28 PM
To: solr-user@lucene.apache.org
Subject: Re: Can a field with defined synonym be searched without the 
synonym?

Hmm, I've gotten this very wrong :) - DisjunctionMaxQuery will operate 
per-doc, so using it in the way I suggested will not allow for synonym IDF 
leveling across documents.  Also, scoring obviously includes more factors 
than IDF.

On Dec 12, 2012, at 5:18 PM, Steve Rowe <sa...@gmail.com> wrote:

> But couldn't the IDF problem be fixed by applying the same IDF to all 
> synonyms, e.g. via DisjunctionMaxQuery?  (Maybe the ideal would be an 
> average, not a max.)
>
> (E)dismax applies this query per-field, but AFAICT there is nothing 
> stopping anybody (modulo query parser construction :) ) from using it on 
> synonyms in the same field.
>
> Steve
>
> On Dec 12, 2012, at 12:50 PM, Walter Underwood <wu...@wunderwood.org> 
> wrote:
>
>> Query parsers cannot fix the IDF problem or make query-time synonyms 
>> faster. Query synonym expansion makes more search terms. More search 
>> terms are more work at query time.
>>
>> The IDF problem is real; I've run up against it. The rarest variant of
>> the synonym has the highest score. This is probably the opposite of what
>> you want. For me, it was "TV" and "television". Documents with "TV" had
>> higher scores than those with "television".
>>
>> wunder
>>
>> On Dec 12, 2012, at 9:45 AM, Roman Chyla wrote:
>>
>>> @wunder
>>> It is a misconception (well, supported by that wiki description) that the
>>> query-time synonym filter has these problems. It is actually the default
>>> parser that is causing these problems. Look at this if you still think
>>> that index-time synonyms are a cure-all:
>>> https://issues.apache.org/jira/browse/LUCENE-4499
>>>
>>> @joe
>>> If you can use the flexible query parser (as linked in by @Swati) then all
>>> you need to do is to define a different field with a different tokenizer
>>> chain and then swap the field names before the analyzers process the
>>> document (and then rewrite the field name back - for example, we have
>>> fields called "author" and "author_nosyn")
>>>
>>> roman
>>>
>>> On Wed, Dec 12, 2012 at 12:38 PM, Walter Underwood 
>>> <wu...@wunderwood.org>wrote:
>>>
>>>> Query time synonyms have known problems. They are slower, cause 
>>>> incorrect
>>>> IDF, and don't work for phrase synonyms.
>>>>
>>>> Apply synonyms at index time and you will have none of those problems.
>>>>
>>>> See:
>>>> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory
>>>>
>>>> wunder
>>>>
>>>> On Dec 12, 2012, at 9:34 AM, Swati Swoboda wrote:
>>>>
>>>>> Query-time analyzers are still applied, even if you include a string 
>>>>> in
>>>> quotes. Would you expect "foo" to not match "Foo" just because it's
>>>> enclosed in quotes?
>>>>>
>>>>> Also look at this, someone who had similar requirements:
>>>>>
>>>> http://lucene.472066.n3.nabble.com/Synonym-Filter-disable-at-query-time-td2919876.html
>>>>>
>>>>>
>>>>> -----Original Message-----
>>>>> From: joe.cohen.m@gmail.com [mailto:joe.cohen.m@gmail.com]
>>>>> Sent: Wednesday, December 12, 2012 12:09 PM
>>>>> To: solr-user@lucene.apache.org
>>>>> Subject: Re: Can a field with defined synonym be searched without the
>>>> synonym?
>>>>>
>>>>>
>>>>> I'm applying only query-time synonyms, so I have the original values
>>>>> stored and indexed.
>>>>> I would've expected that if I search for a string in quotation marks,
>>>>> I'll get the exact match, without applying a synonym.
>>>>>
>>>>> any way to achieve that?
>>>>>
>>>>>
>>>>> Upayavira wrote
>>>>>> You can only search against terms that are stored in your index. If
>>>>>> you have applied index time synonyms, you can't remove them at query
>>>> time.
>>>>>>
>>>>>> You can, however, use copyField to clone an incoming field to another
>>>>>> field that doesn't use synonyms, and search against that field 
>>>>>> instead.
>>>>>>
>>>>>> Upayavira
>>>>>>
>>>>>> On Wed, Dec 12, 2012, at 04:26 PM,
>>>>>
>>>>>> joe.cohen.m@
>>>>>
>>>>>> wrote:
>>>>>>> Hi
>>>>>>> I have a field type with a defined synonym.txt which retrieves both
>>>>>>> records with "home" and "house" when I search either one of them.
>>>>>>>
>>>>>>> I want to be able to search this field on the specific value that I
>>>>>>> enter, without the synonym filter.
>>>>>>>
>>>>>>> is it possible?
>>>>>>>
>>>>>>> thanks.
>>>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> View this message in context:
>>>> http://lucene.472066.n3.nabble.com/Can-a-field-with-defined-synonym-be-searched-without-the-synonym-tp4026381p4026405.html
>>>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>>>
>>>> --
>>>> Walter Underwood
>>>> wunder@wunderwood.org
>>>>
>>>>
>>>>
>>>>
>>
>> --
>> Walter Underwood
>> wunder@wunderwood.org
>>
>>
>>
> 

Re: Can a field with defined synonym be searched without the synonym?

Posted by Steve Rowe <sa...@gmail.com>.
Hmm, I've gotten this very wrong :) - DisjunctionMaxQuery will operate per-doc, so using it in the way I suggested will not allow for synonym IDF leveling across documents.  Also, scoring obviously includes more factors than IDF.
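
To spell out why, here is a toy calculation. It is a deliberately simplified
sketch: the score is IDF only (no tf, norms, or boosts), the document
frequencies are invented, and the idf formula is the one I understand
Lucene's classic TF-IDF similarity to use, 1 + ln(numDocs / (docFreq + 1)).

```python
import math


def classic_idf(doc_freq, num_docs):
    """IDF in the style of Lucene's classic TF-IDF similarity."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


# Made-up index statistics: "tv" is rare, "television" is common.
NUM_DOCS = 10_000
DOC_FREQ = {"tv": 10, "television": 1_000}

idf = {term: classic_idf(df, NUM_DOCS) for term, df in DOC_FREQ.items()}


def dismax_score(doc_terms, query_terms):
    """Per-document max over the synonym clauses that match (tie breaker 0)."""
    return max((idf[t] for t in query_terms if t in doc_terms), default=0.0)


query = ["tv", "television"]
score_tv_doc = dismax_score({"tv"}, query)
score_television_doc = dismax_score({"television"}, query)

print(score_tv_doc, score_television_doc)
```

Because the max is taken inside each document, a document containing only
"tv" still gets the rare term's IDF and a document containing only
"television" still gets the common term's IDF, so the imbalance across
documents is unchanged.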

On Dec 12, 2012, at 5:18 PM, Steve Rowe <sa...@gmail.com> wrote:

> But couldn't the IDF problem be fixed by applying the same IDF to all synonyms, e.g. via DisjunctionMaxQuery?  (Maybe the ideal would be an average, not a max.)
> 
> (E)dismax applies this query per-field, but AFAICT there is nothing stopping anybody (modulo query parser construction :) ) from using it on synonyms in the same field.
> 
> Steve
> 
> On Dec 12, 2012, at 12:50 PM, Walter Underwood <wu...@wunderwood.org> wrote:
> 
>> Query parsers cannot fix the IDF problem or make query-time synonyms faster. Query synonym expansion makes more search terms. More search terms are more work at query time.
>> 
>> The IDF problem is real; I've run up against it. The rarest variant of the synonym has the highest score. This is probably the opposite of what you want. For me, it was "TV" and "television". Documents with "TV" had higher scores than those with "television".
>> 
>> wunder
>> 
>> On Dec 12, 2012, at 9:45 AM, Roman Chyla wrote:
>> 
>>> @wunder
>>> It is a misconception (well, supported by that wiki description) that the
>>> query-time synonym filter has these problems. It is actually the default
>>> parser that is causing these problems. Look at this if you still think
>>> that index-time synonyms are a cure-all:
>>> https://issues.apache.org/jira/browse/LUCENE-4499
>>> 
>>> @joe
>>> If you can use the flexible query parser (as linked in by @Swati) then all
>>> you need to do is to define a different field with a different tokenizer
>>> chain and then swap the field names before the analyzers process the
>>> document (and then rewrite the field name back - for example, we have
>>> fields called "author" and "author_nosyn")
>>> 
>>> roman
>>> 
>>> On Wed, Dec 12, 2012 at 12:38 PM, Walter Underwood <wu...@wunderwood.org>wrote:
>>> 
>>>> Query time synonyms have known problems. They are slower, cause incorrect
>>>> IDF, and don't work for phrase synonyms.
>>>> 
>>>> Apply synonyms at index time and you will have none of those problems.
>>>> 
>>>> See:
>>>> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory
>>>> 
>>>> wunder
>>>> 
>>>> On Dec 12, 2012, at 9:34 AM, Swati Swoboda wrote:
>>>> 
>>>>> Query-time analyzers are still applied, even if you include a string in
>>>> quotes. Would you expect "foo" to not match "Foo" just because it's
>>>> enclosed in quotes?
>>>>> 
>>>>> Also look at this, someone who had similar requirements:
>>>>> 
>>>> http://lucene.472066.n3.nabble.com/Synonym-Filter-disable-at-query-time-td2919876.html
>>>>> 
>>>>> 
>>>>> -----Original Message-----
>>>>> From: joe.cohen.m@gmail.com [mailto:joe.cohen.m@gmail.com]
>>>>> Sent: Wednesday, December 12, 2012 12:09 PM
>>>>> To: solr-user@lucene.apache.org
>>>>> Subject: Re: Can a field with defined synonym be searched without the
>>>> synonym?
>>>>> 
>>>>> 
>>>>> I'm aplying only query-time synonym, so I have the original values
>>>> stored and indexed.
>>>>> I would've expected that if I search a strin with quotations, i'll get
>>>> the exact match, without applying a synonym.
>>>>> 
>>>>> any way to achieve that?
>>>>> 
>>>>> 
>>>>> Upayavira wrote
>>>>>> You can only search against terms that are stored in your index. If
>>>>>> you have applied index time synonyms, you can't remove them at query
>>>> time.
>>>>>> 
>>>>>> You can, however, use copyField to clone an incoming field to another
>>>>>> field that doesn't use synonyms, and search against that field instead.
>>>>>> 
>>>>>> Upayavira
>>>>>> 
>>>>>> On Wed, Dec 12, 2012, at 04:26 PM,
>>>>> 
>>>>>> joe.cohen.m@
>>>>> 
>>>>>> wrote:
>>>>>>> Hi
>>>>>>> I hava a field type without defined synonym.txt which retrieves both
>>>>>>> records with "home" and "house" when I search either one of them.
>>>>>>> 
>>>>>>> I want to be able to search this field on the specific value that I
>>>>>>> enter, without the synonym filter.
>>>>>>> 
>>>>>>> is it possible?
>>>>>>> 
>>>>>>> thanks.
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> --
>>>>>>> View this message in context:
>>>>>>> http://lucene.472066.n3.nabble.com/Can-a-field-with-defined-synonym-b
>>>>>>> e-searched-without-the-synonym-tp4026381.html
>>>>>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> --
>>>>> View this message in context:
>>>> http://lucene.472066.n3.nabble.com/Can-a-field-with-defined-synonym-be-searched-without-the-synonym-tp4026381p4026405.html
>>>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>>> 
>>>> --
>>>> Walter Underwood
>>>> wunder@wunderwood.org
>>>> 
>>>> 
>>>> 
>>>> 
>> 
>> --
>> Walter Underwood
>> wunder@wunderwood.org
>> 
>> 
>> 
> 


Re: Can a field with defined synonym be searched without the synonym?

Posted by Steve Rowe <sa...@gmail.com>.
But couldn't the IDF problem be fixed by applying the same IDF to all synonyms, e.g. via DisjunctionMaxQuery?  (Maybe the ideal would be an average, not a max.)
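
A rough Python sketch of what "the same IDF for all synonyms" could mean, using an average. This is only an illustration of the arithmetic, not an existing Solr or Lucene feature. It assumes the classic Lucene IDF formula, and the corpus size, document frequencies, and the helper `leveled_idf` are all made up.

```python
import math

def idf(doc_freq, num_docs):
    # Classic Lucene TF-IDF similarity: 1 + ln(numDocs / (docFreq + 1)).
    return 1.0 + math.log(num_docs / (doc_freq + 1))

def leveled_idf(group, doc_freqs, num_docs):
    # Give every member of a synonym group the mean of the members' IDFs.
    mean = sum(idf(doc_freqs[t], num_docs) for t in group) / len(group)
    return {t: mean for t in group}

NUM_DOCS = 10000  # hypothetical corpus size
doc_freqs = {"tv": 20, "television": 400, "antenna": 3000}
group = ["tv", "television"]

# Start from the ordinary per-term IDF, then overwrite the synonym group.
weights = {t: idf(df, NUM_DOCS) for t, df in doc_freqs.items()}
weights.update(leveled_idf(group, doc_freqs, NUM_DOCS))
print(weights)
```

Terms outside the group keep their own IDF, so IDF still does its job for non-synonym searches.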

(E)dismax applies this query per-field, but AFAICT there is nothing stopping anybody (modulo query parser construction :) ) from using it on synonyms in the same field.

Steve

On Dec 12, 2012, at 12:50 PM, Walter Underwood <wu...@wunderwood.org> wrote:

> Query parsers cannot fix the IDF problem or make query-time synonyms faster. Query synonym expansion makes more search terms. More search terms are more work at query time.
> 
> The IDF problem is real; I've run up against it. The rarest variant of the synonym has the highest score. This is probably the opposite of what you want. For me, it was "TV" and "television". Documents with "TV" had higher scores than those with "television".
> 
> wunder
> 
> On Dec 12, 2012, at 9:45 AM, Roman Chyla wrote:
> 
>> @wunder
>> It is a misconception (well, supported by that wiki description) that the
>> query time synonym filter have these problems. It is actually the default
>> parser, that is causing these problems. Look at this if you still think
>> that index time synonyms are cure for all:
>> https://issues.apache.org/jira/browse/LUCENE-4499
>> 
>> @joe
>> If you can use the flexible query parser (as linked in by @Swati) then all
>> you need to do is to define a different field with a different tokenizer
>> chain and then swap the field names before the analyzers processes the
>> document (and then rewrite the field name back - for example, we have
>> fields called "author" and "author_nosyn")
>> 
>> roman
>> 
>> On Wed, Dec 12, 2012 at 12:38 PM, Walter Underwood <wu...@wunderwood.org>wrote:
>> 
>>> Query time synonyms have known problems. They are slower, cause incorrect
>>> IDF, and don't work for phrase synonyms.
>>> 
>>> Apply synonyms at index time and you will have none of those problems.
>>> 
>>> See:
>>> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory
>>> 
>>> wunder
>>> 
>>> On Dec 12, 2012, at 9:34 AM, Swati Swoboda wrote:
>>> 
>>>> Query-time analyzers are still applied, even if you include a string in
>>> quotes. Would you expect "foo" to not match "Foo" just because it's
>>> enclosed in quotes?
>>>> 
>>>> Also look at this, someone who had similar requirements:
>>>> 
>>> http://lucene.472066.n3.nabble.com/Synonym-Filter-disable-at-query-time-td2919876.html
>>>> 
>>>> 
>>>> -----Original Message-----
>>>> From: joe.cohen.m@gmail.com [mailto:joe.cohen.m@gmail.com]
>>>> Sent: Wednesday, December 12, 2012 12:09 PM
>>>> To: solr-user@lucene.apache.org
>>>> Subject: Re: Can a field with defined synonym be searched without the
>>> synonym?
>>>> 
>>>> 
>>>> I'm aplying only query-time synonym, so I have the original values
>>> stored and indexed.
>>>> I would've expected that if I search a strin with quotations, i'll get
>>> the exact match, without applying a synonym.
>>>> 
>>>> any way to achieve that?
>>>> 
>>>> 
>>>> Upayavira wrote
>>>>> You can only search against terms that are stored in your index. If
>>>>> you have applied index time synonyms, you can't remove them at query
>>> time.
>>>>> 
>>>>> You can, however, use copyField to clone an incoming field to another
>>>>> field that doesn't use synonyms, and search against that field instead.
>>>>> 
>>>>> Upayavira
>>>>> 
>>>>> On Wed, Dec 12, 2012, at 04:26 PM,
>>>> 
>>>>> joe.cohen.m@
>>>> 
>>>>> wrote:
>>>>>> Hi
>>>>>> I hava a field type without defined synonym.txt which retrieves both
>>>>>> records with "home" and "house" when I search either one of them.
>>>>>> 
>>>>>> I want to be able to search this field on the specific value that I
>>>>>> enter, without the synonym filter.
>>>>>> 
>>>>>> is it possible?
>>>>>> 
>>>>>> thanks.
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> --
>>>>>> View this message in context:
>>>>>> http://lucene.472066.n3.nabble.com/Can-a-field-with-defined-synonym-b
>>>>>> e-searched-without-the-synonym-tp4026381.html
>>>>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> --
>>>> View this message in context:
>>> http://lucene.472066.n3.nabble.com/Can-a-field-with-defined-synonym-be-searched-without-the-synonym-tp4026381p4026405.html
>>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>> 
>>> --
>>> Walter Underwood
>>> wunder@wunderwood.org
>>> 
>>> 
>>> 
>>> 
> 
> --
> Walter Underwood
> wunder@wunderwood.org
> 
> 
> 


Re: Can a field with defined synonym be searched without the synonym?

Posted by Walter Underwood <wu...@wunderwood.org>.
Perhaps you could use two indexed fields, one with synonym expansion and one without.
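
A minimal Python sketch of the client side of that idea, assuming two hypothetical fields, `body_syn` (analyzed with the synonym filter) and `body_nosyn` (analyzed without it), both fed from the same source text, for example with copyField. The function name and the field names are illustrative assumptions, not anything Solr defines.

```python
def build_params(user_query, expand_synonyms=True,
                 syn_field="body_syn", nosyn_field="body_nosyn"):
    # Pick the field whose analyzer chain matches what the user asked for.
    field = syn_field if expand_synonyms else nosyn_field
    # Escape backslashes and quotes, then send the input as one phrase.
    escaped = user_query.replace("\\", "\\\\").replace('"', '\\"')
    return {"q": f'{field}:"{escaped}"'}

print(build_params("house"))
print(build_params("house", expand_synonyms=False))
```

The application decides per request which field to query, so the user can switch synonym expansion on or off without reindexing.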

wunder

On Dec 12, 2012, at 11:33 PM, Burgmans, Tom wrote:

> In our case it's the opposite. For our clients it is very important that every synonym gets equal chances in the relevancy calculation. The fact that "nol" scores higher than "net operating loss", simply because its document frequency is lower, is unacceptable and a reason to look for ways to disable the IDF from the score calculation. But that is in fact something I don't like to do since IDF is such an elementary part of the algorithm (and very useful for non-synonym searches).
> 
> Pre-processing synonyms to apply 'reverse weighting' is also a strategy to consider but I agree with Walter that this very error-prone, things could get easily out of sync. Moreover, none of our Dev-, QA-, STG-, PRD- environment contain exactly the same content, so it would require different tuned synonyms dictionary for each of them...meh...
> 
> In our previous search engine (FAST ESP) we basically switched off IDF, but I am still a bit hoping that there is a more sophisticated solution with Solr.
> 
> 
> -----Original Message-----
> From: Walter Underwood [mailto:wunder@wunderwood.org]
> Sent: Thursday 13 December 2012 02:30
> To: solr-user@lucene.apache.org
> Subject: Re: Can a field with defined synonym be searched without the synonym?
> 
> All of the applications I've seen with user control over synonym expansion were recall-oriented. The "give me all matches for X" kind of problem. So ranking is not as important.
> 
> wunder
> 
> On Dec 12, 2012, at 5:23 PM, Roman Chyla wrote:
> 
>> Well, this IDF problem has more sides. So, let's say your synonym file
>> contains multi-token synonyms (it does, right? or perhaps you don't need
>> it? well, some people do)
>> 
>> "TV, TV set, TV foo, television"
>> 
>> if you use the default synonym expansion, when you index 'television'
>> 
>> you have increased frequency of also 'set', 'foo', so, the IDF of 'TV' is
>> the same as that of 'television' - but IDF of 'foo' and 'set' has changed
>> (their frequency increased, their IDF decreased) -- TV's have in fact made
>> 'foo' term very frequent and undesirable
>> 
>> So, you might be sure that IDF of 'TV' and 'television' are the same, but
>> you are not aware it has 'screwed' other (desirable) terms - so it really
>> depends. And I wouldn't argue these cases are esoteric.
>> 
>> And finally: there are use cases out there, where people NEED to switch off
>> synonym expansion at will (find only these documents, that contain the word
>> 'TV' and not that bloody 'foo'). This cannot be done if the index contains
>> all synonym terms (unless you have a way to mark the original and the
>> synonym in the index).
>> 
>> roman
>> 
>> 
>> On Wed, Dec 12, 2012 at 12:50 PM, Walter Underwood <wu...@wunderwood.org>wrote:
>> 
>>> Query parsers cannot fix the IDF problem or make query-time synonyms
>>> faster. Query synonym expansion makes more search terms. More search terms
>>> are more work at query time.
>>> 
>>> The IDF problem is real; I've run up against it. The most rare variant of
>>> the synonym have the highest score. This probably the opposite of what you
>>> want. For me, it was "TV" and "television". Documents with "TV" had higher
>>> scores than those with "television".
>>> 
>>> wunder
>>> 
>>> On Dec 12, 2012, at 9:45 AM, Roman Chyla wrote:
>>> 
>>>> @wunder
>>>> It is a misconception (well, supported by that wiki description) that the
>>>> query time synonym filter have these problems. It is actually the default
>>>> parser, that is causing these problems. Look at this if you still think
>>>> that index time synonyms are cure for all:
>>>> https://issues.apache.org/jira/browse/LUCENE-4499
>>>> 
>>>> @joe
>>>> If you can use the flexible query parser (as linked in by @Swati) then
>>> all
>>>> you need to do is to define a different field with a different tokenizer
>>>> chain and then swap the field names before the analyzers processes the
>>>> document (and then rewrite the field name back - for example, we have
>>>> fields called "author" and "author_nosyn")
>>>> 
>>>> roman
>>>> 
>>>> On Wed, Dec 12, 2012 at 12:38 PM, Walter Underwood <
>>> wunder@wunderwood.org>wrote:
>>>> 
>>>>> Query time synonyms have known problems. They are slower, cause
>>> incorrect
>>>>> IDF, and don't work for phrase synonyms.
>>>>> 
>>>>> Apply synonyms at index time and you will have none of those problems.
>>>>> 
>>>>> See:
>>>>> 
>>> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory
>>>>> 
>>>>> wunder
>>>>> 
>>>>> On Dec 12, 2012, at 9:34 AM, Swati Swoboda wrote:
>>>>> 
>>>>>> Query-time analyzers are still applied, even if you include a string in
>>>>> quotes. Would you expect "foo" to not match "Foo" just because it's
>>>>> enclosed in quotes?
>>>>>> 
>>>>>> Also look at this, someone who had similar requirements:
>>>>>> 
>>>>> 
>>> http://lucene.472066.n3.nabble.com/Synonym-Filter-disable-at-query-time-td2919876.html
>>>>>> 
>>>>>> 
>>>>>> -----Original Message-----
>>>>>> From: joe.cohen.m@gmail.com [mailto:joe.cohen.m@gmail.com]
>>>>>> Sent: Wednesday, December 12, 2012 12:09 PM
>>>>>> To: solr-user@lucene.apache.org
>>>>>> Subject: Re: Can a field with defined synonym be searched without the
>>>>> synonym?
>>>>>> 
>>>>>> 
>>>>>> I'm aplying only query-time synonym, so I have the original values
>>>>> stored and indexed.
>>>>>> I would've expected that if I search a strin with quotations, i'll get
>>>>> the exact match, without applying a synonym.
>>>>>> 
>>>>>> any way to achieve that?
>>>>>> 
>>>>>> 
>>>>>> Upayavira wrote
>>>>>>> You can only search against terms that are stored in your index. If
>>>>>>> you have applied index time synonyms, you can't remove them at query
>>>>> time.
>>>>>>> 
>>>>>>> You can, however, use copyField to clone an incoming field to another
>>>>>>> field that doesn't use synonyms, and search against that field
>>> instead.
>>>>>>> 
>>>>>>> Upayavira
>>>>>>> 
>>>>>>> On Wed, Dec 12, 2012, at 04:26 PM,
>>>>>> 
>>>>>>> joe.cohen.m@
>>>>>> 
>>>>>>> wrote:
>>>>>>>> Hi
>>>>>>>> I hava a field type without defined synonym.txt which retrieves both
>>>>>>>> records with "home" and "house" when I search either one of them.
>>>>>>>> 
>>>>>>>> I want to be able to search this field on the specific value that I
>>>>>>>> enter, without the synonym filter.
>>>>>>>> 
>>>>>>>> is it possible?
>>>>>>>> 
>>>>>>>> thanks.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> --
>>>>>>>> View this message in context:
>>>>>>>> 
>>> http://lucene.472066.n3.nabble.com/Can-a-field-with-defined-synonym-b
>>>>>>>> e-searched-without-the-synonym-tp4026381.html
>>>>>>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> --
>>>>>> View this message in context:
>>>>> 
>>> http://lucene.472066.n3.nabble.com/Can-a-field-with-defined-synonym-be-searched-without-the-synonym-tp4026381p4026405.html
>>>>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>>>> 
>>>>> --
>>>>> Walter Underwood
>>>>> wunder@wunderwood.org
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>> 
>>> --
>>> Walter Underwood
>>> wunder@wunderwood.org
>>> 
>>> 
>>> 
>>> 
> 
> --
> Walter Underwood
> wunder@wunderwood.org
> 
> 
> 
> 

--
Walter Underwood
wunder@wunderwood.org




RE: Can a field with defined synonym be searched without the synonym?

Posted by "Burgmans, Tom" <to...@wolterskluwer.com>.
In our case it's the opposite. For our clients it is very important that every synonym gets equal chances in the relevancy calculation. The fact that "nol" scores higher than "net operating loss", simply because its document frequency is lower, is unacceptable and a reason to look for ways to disable the IDF from the score calculation. But that is in fact something I don't like to do since IDF is such an elementary part of the algorithm (and very useful for non-synonym searches).

Pre-processing synonyms to apply 'reverse weighting' is also a strategy to consider, but I agree with Walter that this is very error-prone; things could easily get out of sync. Moreover, none of our Dev, QA, STG, and PRD environments contain exactly the same content, so it would require a differently tuned synonym dictionary for each of them...meh...

In our previous search engine (FAST ESP) we basically switched off IDF, but I am still hoping that there is a more sophisticated solution with Solr.


-----Original Message-----
From: Walter Underwood [mailto:wunder@wunderwood.org]
Sent: Thursday 13 December 2012 02:30
To: solr-user@lucene.apache.org
Subject: Re: Can a field with defined synonym be searched without the synonym?

All of the applications I've seen with user control over synonym expansion were recall-oriented. The "give me all matches for X" kind of problem. So ranking is not as important.

wunder

On Dec 12, 2012, at 5:23 PM, Roman Chyla wrote:

> Well, this IDF problem has more sides. So, let's say your synonym file
> contains multi-token synonyms (it does, right? or perhaps you don't need
> it? well, some people do)
>
> "TV, TV set, TV foo, television"
>
> if you use the default synonym expansion, when you index 'television'
>
> you have increased frequency of also 'set', 'foo', so, the IDF of 'TV' is
> the same as that of 'television' - but IDF of 'foo' and 'set' has changed
> (their frequency increased, their IDF decreased) -- TV's have in fact made
> 'foo' term very frequent and undesirable
>
> So, you might be sure that IDF of 'TV' and 'television' are the same, but
> you are not aware it has 'screwed' other (desirable) terms - so it really
> depends. And I wouldn't argue these cases are esoteric.
>
> And finally: there are use cases out there, where people NEED to switch off
> synonym expansion at will (find only these documents, that contain the word
> 'TV' and not that bloody 'foo'). This cannot be done if the index contains
> all synonym terms (unless you have a way to mark the original and the
> synonym in the index).
>
> roman
>
>
> On Wed, Dec 12, 2012 at 12:50 PM, Walter Underwood <wu...@wunderwood.org>wrote:
>
>> Query parsers cannot fix the IDF problem or make query-time synonyms
>> faster. Query synonym expansion makes more search terms. More search terms
>> are more work at query time.
>>
>> The IDF problem is real; I've run up against it. The most rare variant of
>> the synonym have the highest score. This probably the opposite of what you
>> want. For me, it was "TV" and "television". Documents with "TV" had higher
>> scores than those with "television".
>>
>> wunder
>>
>> On Dec 12, 2012, at 9:45 AM, Roman Chyla wrote:
>>
>>> @wunder
>>> It is a misconception (well, supported by that wiki description) that the
>>> query time synonym filter have these problems. It is actually the default
>>> parser, that is causing these problems. Look at this if you still think
>>> that index time synonyms are cure for all:
>>> https://issues.apache.org/jira/browse/LUCENE-4499
>>>
>>> @joe
>>> If you can use the flexible query parser (as linked in by @Swati) then
>> all
>>> you need to do is to define a different field with a different tokenizer
>>> chain and then swap the field names before the analyzers processes the
>>> document (and then rewrite the field name back - for example, we have
>>> fields called "author" and "author_nosyn")
>>>
>>> roman
>>>
>>> On Wed, Dec 12, 2012 at 12:38 PM, Walter Underwood <
>> wunder@wunderwood.org>wrote:
>>>
>>>> Query time synonyms have known problems. They are slower, cause
>> incorrect
>>>> IDF, and don't work for phrase synonyms.
>>>>
>>>> Apply synonyms at index time and you will have none of those problems.
>>>>
>>>> See:
>>>>
>> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory
>>>>
>>>> wunder
>>>>
>>>> On Dec 12, 2012, at 9:34 AM, Swati Swoboda wrote:
>>>>
>>>>> Query-time analyzers are still applied, even if you include a string in
>>>> quotes. Would you expect "foo" to not match "Foo" just because it's
>>>> enclosed in quotes?
>>>>>
>>>>> Also look at this, someone who had similar requirements:
>>>>>
>>>>
>> http://lucene.472066.n3.nabble.com/Synonym-Filter-disable-at-query-time-td2919876.html
>>>>>
>>>>>
>>>>> -----Original Message-----
>>>>> From: joe.cohen.m@gmail.com [mailto:joe.cohen.m@gmail.com]
>>>>> Sent: Wednesday, December 12, 2012 12:09 PM
>>>>> To: solr-user@lucene.apache.org
>>>>> Subject: Re: Can a field with defined synonym be searched without the
>>>> synonym?
>>>>>
>>>>>
>>>>> I'm aplying only query-time synonym, so I have the original values
>>>> stored and indexed.
>>>>> I would've expected that if I search a strin with quotations, i'll get
>>>> the exact match, without applying a synonym.
>>>>>
>>>>> any way to achieve that?
>>>>>
>>>>>
>>>>> Upayavira wrote
>>>>>> You can only search against terms that are stored in your index. If
>>>>>> you have applied index time synonyms, you can't remove them at query
>>>> time.
>>>>>>
>>>>>> You can, however, use copyField to clone an incoming field to another
>>>>>> field that doesn't use synonyms, and search against that field
>> instead.
>>>>>>
>>>>>> Upayavira
>>>>>>
>>>>>> On Wed, Dec 12, 2012, at 04:26 PM,
>>>>>
>>>>>> joe.cohen.m@
>>>>>
>>>>>> wrote:
>>>>>>> Hi
>>>>>>> I hava a field type without defined synonym.txt which retrieves both
>>>>>>> records with "home" and "house" when I search either one of them.
>>>>>>>
>>>>>>> I want to be able to search this field on the specific value that I
>>>>>>> enter, without the synonym filter.
>>>>>>>
>>>>>>> is it possible?
>>>>>>>
>>>>>>> thanks.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> View this message in context:
>>>>>>>
>> http://lucene.472066.n3.nabble.com/Can-a-field-with-defined-synonym-b
>>>>>>> e-searched-without-the-synonym-tp4026381.html
>>>>>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> View this message in context:
>>>>
>> http://lucene.472066.n3.nabble.com/Can-a-field-with-defined-synonym-be-searched-without-the-synonym-tp4026381p4026405.html
>>>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>>>
>>>> --
>>>> Walter Underwood
>>>> wunder@wunderwood.org
>>>>
>>>>
>>>>
>>>>
>>
>> --
>> Walter Underwood
>> wunder@wunderwood.org
>>
>>
>>
>>

--
Walter Underwood
wunder@wunderwood.org




This email and any attachments may contain confidential or privileged information
and is intended for the addressee only. If you are not the intended recipient, please
immediately notify us by email or telephone and delete the original email and attachments
without using, disseminating or reproducing its contents to anyone other than the intended
recipient. Wolters Kluwer shall not be liable for the incorrect or incomplete transmission of
this email or any attachments, nor for unauthorized use by its employees.

Wolters Kluwer nv has its registered address in Alphen aan den Rijn, The Netherlands, and is registered
with the Trade Registry of the Dutch Chamber of Commerce under number 33202517.

Re: Can a field with defined synonym be searched without the synonym?

Posted by Walter Underwood <wu...@wunderwood.org>.
All of the applications I've seen with user control over synonym expansion were recall-oriented. The "give me all matches for X" kind of problem. So ranking is not as important.

wunder

On Dec 12, 2012, at 5:23 PM, Roman Chyla wrote:

> Well, this IDF problem has more sides. So, let's say your synonym file
> contains multi-token synonyms (it does, right? or perhaps you don't need
> it? well, some people do)
> 
> "TV, TV set, TV foo, television"
> 
> if you use the default synonym expansion, when you index 'television'
> 
> you have increased frequency of also 'set', 'foo', so, the IDF of 'TV' is
> the same as that of 'television' - but IDF of 'foo' and 'set' has changed
> (their frequency increased, their IDF decreased) -- TV's have in fact made
> 'foo' term very frequent and undesirable
> 
> So, you might be sure that IDF of 'TV' and 'television' are the same, but
> you are not aware it has 'screwed' other (desirable) terms - so it really
> depends. And I wouldn't argue these cases are esoteric.
> 
> And finally: there are use cases out there, where people NEED to switch off
> synonym expansion at will (find only these documents, that contain the word
> 'TV' and not that bloody 'foo'). This cannot be done if the index contains
> all synonym terms (unless you have a way to mark the original and the
> synonym in the index).
> 
> roman
> 
> 
> On Wed, Dec 12, 2012 at 12:50 PM, Walter Underwood <wu...@wunderwood.org>wrote:
> 
>> Query parsers cannot fix the IDF problem or make query-time synonyms
>> faster. Query synonym expansion makes more search terms. More search terms
>> are more work at query time.
>> 
>> The IDF problem is real; I've run up against it. The most rare variant of
>> the synonym have the highest score. This probably the opposite of what you
>> want. For me, it was "TV" and "television". Documents with "TV" had higher
>> scores than those with "television".
>> 
>> wunder
>> 
>> On Dec 12, 2012, at 9:45 AM, Roman Chyla wrote:
>> 
>>> @wunder
>>> It is a misconception (well, supported by that wiki description) that the
>>> query time synonym filter have these problems. It is actually the default
>>> parser, that is causing these problems. Look at this if you still think
>>> that index time synonyms are cure for all:
>>> https://issues.apache.org/jira/browse/LUCENE-4499
>>> 
>>> @joe
>>> If you can use the flexible query parser (as linked in by @Swati) then
>> all
>>> you need to do is to define a different field with a different tokenizer
>>> chain and then swap the field names before the analyzers processes the
>>> document (and then rewrite the field name back - for example, we have
>>> fields called "author" and "author_nosyn")
>>> 
>>> roman
>>> 
>>> On Wed, Dec 12, 2012 at 12:38 PM, Walter Underwood <
>> wunder@wunderwood.org>wrote:
>>> 
>>>> Query time synonyms have known problems. They are slower, cause
>> incorrect
>>>> IDF, and don't work for phrase synonyms.
>>>> 
>>>> Apply synonyms at index time and you will have none of those problems.
>>>> 
>>>> See:
>>>> 
>> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory
>>>> 
>>>> wunder
>>>> 
>>>> On Dec 12, 2012, at 9:34 AM, Swati Swoboda wrote:
>>>> 
>>>>> Query-time analyzers are still applied, even if you include a string in
>>>> quotes. Would you expect "foo" to not match "Foo" just because it's
>>>> enclosed in quotes?
>>>>> 
>>>>> Also look at this, someone who had similar requirements:
>>>>> 
>>>> 
>> http://lucene.472066.n3.nabble.com/Synonym-Filter-disable-at-query-time-td2919876.html
>>>>> 
>>>>> 
>>>>> -----Original Message-----
>>>>> From: joe.cohen.m@gmail.com [mailto:joe.cohen.m@gmail.com]
>>>>> Sent: Wednesday, December 12, 2012 12:09 PM
>>>>> To: solr-user@lucene.apache.org
>>>>> Subject: Re: Can a field with defined synonym be searched without the
>>>> synonym?
>>>>> 
>>>>> 
>>>>> I'm aplying only query-time synonym, so I have the original values
>>>> stored and indexed.
>>>>> I would've expected that if I search a strin with quotations, i'll get
>>>> the exact match, without applying a synonym.
>>>>> 
>>>>> any way to achieve that?
>>>>> 
>>>>> 
>>>>> Upayavira wrote
>>>>>> You can only search against terms that are stored in your index. If
>>>>>> you have applied index time synonyms, you can't remove them at query
>>>> time.
>>>>>> 
>>>>>> You can, however, use copyField to clone an incoming field to another
>>>>>> field that doesn't use synonyms, and search against that field
>> instead.
>>>>>> 
>>>>>> Upayavira
>>>>>> 
>>>>>> On Wed, Dec 12, 2012, at 04:26 PM,
>>>>> 
>>>>>> joe.cohen.m@
>>>>> 
>>>>>> wrote:
>>>>>>> Hi
>>>>>>> I hava a field type without defined synonym.txt which retrieves both
>>>>>>> records with "home" and "house" when I search either one of them.
>>>>>>> 
>>>>>>> I want to be able to search this field on the specific value that I
>>>>>>> enter, without the synonym filter.
>>>>>>> 
>>>>>>> is it possible?
>>>>>>> 
>>>>>>> thanks.
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> --
>>>>>>> View this message in context:
>>>>>>> 
>> http://lucene.472066.n3.nabble.com/Can-a-field-with-defined-synonym-b
>>>>>>> e-searched-without-the-synonym-tp4026381.html
>>>>>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> --
>>>>> View this message in context:
>>>> 
>> http://lucene.472066.n3.nabble.com/Can-a-field-with-defined-synonym-be-searched-without-the-synonym-tp4026381p4026405.html
>>>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>>> 
>>>> --
>>>> Walter Underwood
>>>> wunder@wunderwood.org
>>>> 
>>>> 
>>>> 
>>>> 
>> 
>> --
>> Walter Underwood
>> wunder@wunderwood.org
>> 
>> 
>> 
>> 

--
Walter Underwood
wunder@wunderwood.org




Re: Can a field with defined synonym be searched without the synonym?

Posted by Roman Chyla <ro...@gmail.com>.
Well, this IDF problem has more sides. So, let's say your synonym file
contains multi-token synonyms (it does, right? or perhaps you don't need
it? well, some people do)

"TV, TV set, TV foo, television"

If you use the default synonym expansion, then when you index 'television'
you also increase the frequency of 'set' and 'foo'. So the IDF of 'TV' is
the same as that of 'television', but the IDF of 'foo' and 'set' has changed
(their frequency increased, so their IDF decreased). The TV synonyms have in
fact made the term 'foo' very frequent and undesirable.

So, you might be sure that the IDFs of 'TV' and 'television' are the same, but
you may not be aware that this has 'screwed' other (desirable) terms - so it
really depends. And I wouldn't call these cases esoteric.
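
A small Python sketch of the effect, as a crude stand-in for index-time expansion rather than what SynonymFilter really does: if a document contains any single-word member of the group, every token of every member is added to it. The documents and the helper names are made up for illustration.

```python
from collections import Counter

SYNONYMS = ["tv", "tv set", "tv foo", "television"]  # one synonym group

def index_tokens(text, expand):
    tokens = set(text.lower().split())
    if expand:
        # Trigger on single-word members only, then add all member tokens.
        single = {s for s in SYNONYMS if " " not in s}
        if tokens & single:
            for s in SYNONYMS:
                tokens.update(s.split())
    return tokens

def doc_freqs(docs, expand):
    # Document frequency: in how many documents each token occurs.
    df = Counter()
    for d in docs:
        df.update(index_tokens(d, expand))
    return df

docs = ["television repair guide", "cheap tv deals",
        "foo bar baz", "television history"]

plain = doc_freqs(docs, expand=False)
expanded = doc_freqs(docs, expand=True)
print(plain["foo"], expanded["foo"])
print(expanded["tv"], expanded["television"])
```

With expansion, 'tv' and 'television' end up with the same document frequency, but 'foo' now also occurs in every document that mentions a TV, so its IDF drops.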

And finally: there are use cases out there where people NEED to switch off
synonym expansion at will (find only those documents that contain the word
'TV' and not that bloody 'foo'). This cannot be done if the index contains
all synonym terms (unless you have a way to mark the original and the
synonym in the index).

roman


On Wed, Dec 12, 2012 at 12:50 PM, Walter Underwood <wu...@wunderwood.org>wrote:

> Query parsers cannot fix the IDF problem or make query-time synonyms
> faster. Query synonym expansion makes more search terms. More search terms
> are more work at query time.
>
> The IDF problem is real; I've run up against it. The most rare variant of
> the synonym have the highest score. This probably the opposite of what you
> want. For me, it was "TV" and "television". Documents with "TV" had higher
> scores than those with "television".
>
> wunder

Re: Can a field with defined synonym be searched without the synonym?

Posted by Walter Underwood <wu...@wunderwood.org>.
Query parsers cannot fix the IDF problem or make query-time synonyms faster. Query synonym expansion makes more search terms. More search terms are more work at query time.

The IDF problem is real; I've run up against it. The rarest variant of the synonym has the highest score. This is probably the opposite of what you want. For me, it was "TV" and "television". Documents with "TV" had higher scores than those with "television".
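
A minimal sketch of the effect described above, assuming the classic Lucene IDF formula, 1 + ln(numDocs / (docFreq + 1)). The document counts are made up for illustration and are not taken from any real index.

```python
import math


def classic_idf(doc_freq: int, num_docs: int) -> float:
    """Classic Lucene-style IDF: rarer terms get a larger weight."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


# Hypothetical corpus statistics (assumptions, not measured values).
NUM_DOCS = 1000
DOC_FREQ = {"television": 200, "tv": 20}

# Query-time expansion: each variant keeps its own document frequency,
# so the rarer variant ("tv") receives the higher IDF.
query_time_idf = {t: classic_idf(df, NUM_DOCS) for t, df in DOC_FREQ.items()}

# Index-time expansion: every document containing either variant is
# indexed under both terms, so both share one document frequency.
DOCS_WITH_EITHER = 210  # assumed size of the union of the two sets
index_time_idf = {t: classic_idf(DOCS_WITH_EITHER, NUM_DOCS) for t in DOC_FREQ}

if __name__ == "__main__":
    print("query-time:", query_time_idf)
    print("index-time:", index_time_idf)
```

With these assumed counts, "tv" outweighs "television" under query-time expansion, while index-time expansion gives both terms the same weight.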

wunder

On Dec 12, 2012, at 9:45 AM, Roman Chyla wrote:

> @wunder
> It is a misconception (well, supported by that wiki description) that the
> query-time synonym filter has these problems. It is actually the default
> parser that is causing these problems. Look at this if you still think
> that index-time synonyms are a cure-all:
> https://issues.apache.org/jira/browse/LUCENE-4499
> 
> @joe
> If you can use the flexible query parser (as linked in by @Swati) then all
> you need to do is to define a different field with a different tokenizer
> chain and then swap the field names before the analyzers process the
> document (and then rewrite the field name back - for example, we have
> fields called "author" and "author_nosyn")
> 
> roman

--
Walter Underwood
wunder@wunderwood.org




Re: Can a field with defined synonym be searched without the synonym?

Posted by Roman Chyla <ro...@gmail.com>.
@wunder
It is a misconception (well, supported by that wiki description) that the
query-time synonym filter has these problems. It is actually the default
parser that is causing these problems. Look at this if you still think
that index-time synonyms are a cure-all:
https://issues.apache.org/jira/browse/LUCENE-4499

@joe
If you can use the flexible query parser (as linked in by @Swati) then all
you need to do is to define a different field with a different tokenizer
chain and then swap the field names before the analyzers process the
document (and then rewrite the field name back - for example, we have
fields called "author" and "author_nosyn")
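
The field-swap idea can be sketched as a small query pre-processing step. This is only an illustration in Python, not the flexible query parser itself. It assumes that quoted terms should be routed to a synonym-free twin field, and the "_nosyn" suffix and the function name are hypothetical choices modelled on the "author" / "author_nosyn" pair mentioned above.

```python
import re

NOSYN_SUFFIX = "_nosyn"  # hypothetical naming convention for the twin field

# Matches a field name followed by a colon, but only when a quoted
# phrase comes right after it, e.g. author:"smith".
_QUOTED_FIELD = re.compile(r'\b(?P<field>[A-Za-z_]\w*):(?="[^"]*")')


def route_quoted_to_nosyn(query: str, fields_with_nosyn: set) -> str:
    """Rewrite field:"phrase" to field_nosyn:"phrase" for known fields."""

    def swap(match):
        field = match.group("field")
        if field in fields_with_nosyn:
            return field + NOSYN_SUFFIX + ":"
        return match.group(0)  # leave unknown fields untouched

    return _QUOTED_FIELD.sub(swap, query)


if __name__ == "__main__":
    print(route_quoted_to_nosyn('author:"smith" AND author:jones', {"author"}))
```

Unquoted clauses keep the original field name, so they still go through the analysis chain that includes synonyms.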

roman

On Wed, Dec 12, 2012 at 12:38 PM, Walter Underwood <wu...@wunderwood.org> wrote:

> Query time synonyms have known problems. They are slower, cause incorrect
> IDF, and don't work for phrase synonyms.
>
> Apply synonyms at index time and you will have none of those problems.
>
> See:
> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory
>
> wunder

Re: Can a field with defined synonym be searched without the synonym?

Posted by Walter Underwood <wu...@wunderwood.org>.
Query time synonyms have known problems. They are slower, cause incorrect IDF, and don't work for phrase synonyms.

Apply synonyms at index time and you will have none of those problems.

See: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory

wunder

On Dec 12, 2012, at 9:34 AM, Swati Swoboda wrote:

> Query-time analyzers are still applied, even if you include a string in quotes. Would you expect "foo" to not match "Foo" just because it's enclosed in quotes?
> 
> Also look at this, someone who had similar requirements:
> http://lucene.472066.n3.nabble.com/Synonym-Filter-disable-at-query-time-td2919876.html

--
Walter Underwood
wunder@wunderwood.org




RE: Can a field with defined synonym be searched without the synonym?

Posted by Swati Swoboda <ss...@igloosoftware.com>.
Query-time analyzers are still applied, even if you include a string in quotes. Would you expect "foo" to not match "Foo" just because it's enclosed in quotes?

Also look at this, someone who had similar requirements:
http://lucene.472066.n3.nabble.com/Synonym-Filter-disable-at-query-time-td2919876.html
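
A toy model of why quoting does not bypass analysis. This is an assumed simplification (whitespace tokenizer, lowercasing, one synonym table), not Solr's actual code. The point it illustrates is that quotes only mark a phrase for the parser, and the text inside them still goes through the same query-time chain.

```python
# Hypothetical synonym table, mirroring the "home" / "house" example.
SYNONYMS = {"home": {"home", "house"}, "house": {"home", "house"}}


def analyze(text: str) -> list:
    """Toy query-time chain: split on whitespace, lowercase, expand synonyms."""
    return [SYNONYMS.get(token, {token}) for token in text.lower().split()]


def parse_and_analyze(query: str):
    """Return (is_phrase, analyzed_terms) for a single query clause."""
    # The quotes are consumed by the parser; the analyzer never sees them.
    is_phrase = len(query) >= 2 and query[0] == query[-1] == '"'
    text = query[1:-1] if is_phrase else query
    return is_phrase, analyze(text)


if __name__ == "__main__":
    print(parse_and_analyze('"Foo"'))   # still lowercased
    print(parse_and_analyze('"home"'))  # still expanded to both variants
```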


-----Original Message-----
From: joe.cohen.m@gmail.com [mailto:joe.cohen.m@gmail.com] 
Sent: Wednesday, December 12, 2012 12:09 PM
To: solr-user@lucene.apache.org
Subject: Re: Can a field with defined synonym be searched without the synonym?


I'm applying only query-time synonyms, so I have the original values stored and indexed.
I would've expected that if I search a string in quotation marks, I'll get the exact match, without applying a synonym.

any way to achieve that?



Re: Can a field with defined synonym be searched without the synonym?

Posted by "joe.cohen.m@gmail.com" <jo...@gmail.com>.
I'm applying only query-time synonyms, so I have the original values stored
and indexed.
I would've expected that if I search a string in quotation marks, I'll get the
exact match, without applying a synonym.

any way to achieve that?


Upayavira wrote
> You can only search against terms that are stored in your index. If you
> have applied index time synonyms, you can't remove them at query time.
> 
> You can, however, use copyField to clone an incoming field to another
> field that doesn't use synonyms, and search against that field instead.
> 
> Upayavira

--
View this message in context: http://lucene.472066.n3.nabble.com/Can-a-field-with-defined-synonym-be-searched-without-the-synonym-tp4026381p4026405.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Can a field with defined synonym be searched without the synonym?

Posted by Upayavira <uv...@odoko.co.uk>.
You can only search against terms that are stored in your index. If you
have applied index time synonyms, you can't remove them at query time.

You can, however, use copyField to clone an incoming field to another
field that doesn't use synonyms, and search against that field instead.
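
On the client side, the copyField approach might look like the sketch below. It assumes the schema already copies a field named "text" into a synonym-free twin named "text_nosyn", and that Solr answers at the URL shown. Both field names and the endpoint are hypothetical.

```python
from urllib.parse import urlencode

# Assumed endpoint; adjust to the actual host, port and core.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def build_select_url(term: str, field: str = "text", exact: bool = False) -> str:
    """Build a select URL, targeting the synonym-free copy when exact=True."""
    # Hypothetical convention: the copyField destination is "<field>_nosyn".
    target = field + "_nosyn" if exact else field
    params = {"q": f'{target}:"{term}"'}
    return SOLR_SELECT_URL + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_select_url("home"))              # synonyms applied
    print(build_select_url("home", exact=True))  # synonym-free field
```

The application decides per request which field to query, so the same index serves both the expanded and the exact search.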

Upayavira

On Wed, Dec 12, 2012, at 04:26 PM, joe.cohen.m@gmail.com wrote:
> Hi
> I have a field type with a defined synonym.txt which retrieves both
> records with "home" and "house" when I search either one of them.
> 
> I want to be able to search this field on the specific value that I
> enter, without the synonym filter.
> 
> is it possible?
> 
> thanks.