Posted to solr-user@lucene.apache.org by Patrick Jungermann <pa...@googlemail.com> on 2009/10/09 01:16:03 UTC

multi-word synonyms and analysis.jsp vs real field analysis (query, index)

Hi list,

I am working on a field type and its analysis chain, in which I want to
use the SynonymFilter with entries similar to:

foo bar=>foo_bar

While developing it, I used the /admin/analysis.jsp view to test the
output of the field type's analysis chain. The output shows that a
query "foo bar" is first split by the WhitespaceTokenizer into the two
tokens "foo" and "bar", and that the SynonymFilter then replaces both
tokens with "foo_bar". But when I tried this at "real" query time with
the request handler "standard" and also with "dismax", the tokens "foo"
and "bar" were not replaced. The parsedQueryString was something similar
to "field:foo field:bar". At index time, it works as expected.

Has anybody experienced this, and does anyone know a workaround or a
solution for it?


Thanks, Patrick






Re: multi-word synonyms and analysis.jsp vs real field analysis (query, index)

Posted by Patrick Jungermann <pa...@googlemail.com>.
Thanks Hoss,

your hints partially confirmed my own considerations, so I ran some
tests with the FieldQParser. I had some problems at first, but in the
end I was able to solve the problem of multi-word synonyms at query
time in a way that is suitable for us, and possibly for others, too.

For my solution, I reused the FieldQParserPlugin. First, I ported it to
the new API (incrementToken instead of next, etc.), and then I modified
the code so that it creates no PhraseQueries, only BooleanQueries.

Now, with my new QParserPlugin, which is based on the
FieldQParserPlugin, it is possible to search for things like
"foo bar baz": "foo bar" is changed to "foo_bar", and in the end the
tokens "foo_bar" and "baz" are created, so that both can match
independently.
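
A toy sketch of the idea in Python, not the actual plugin code: the
whole input string goes through the analyzer once, and multiple
resulting tokens become independent optional clauses instead of one
phrase. All names here (SYNONYMS, analyze, field_query) are
hypothetical, and no real Lucene or Solr classes are used.

```python
# Models the single rule "foo bar=>foo_bar" (assumption for illustration).
SYNONYMS = {("foo", "bar"): ["foo_bar"]}


def analyze(text):
    """Whitespace tokenizer followed by a greedy multi-word synonym filter."""
    tokens = text.split()
    out, i = [], 0
    while i < len(tokens):
        # Try the longest possible match starting at position i first.
        for length in range(len(tokens) - i, 0, -1):
            key = tuple(tokens[i:i + length])
            if key in SYNONYMS:
                out.extend(SYNONYMS[key])
                i += length
                break
        else:
            out.append(tokens[i])
            i += 1
    return out


def field_query(field, text, phrase_for_multiple=True):
    """Analyze the whole string, then build a query string from the tokens.

    phrase_for_multiple=True mimics the phrase-query behavior described
    for the stock field parser; False mimics the modified variant that
    emits one optional clause per token.
    """
    tokens = analyze(text)
    if len(tokens) == 1:
        return f"{field}:{tokens[0]}"
    if phrase_for_multiple:
        phrase = " ".join(tokens)
        return f'{field}:"{phrase}"'
    return " ".join(f"{field}:{t}" for t in tokens)


print(field_query("field", "foo bar baz"))
# field:"foo_bar baz"
print(field_query("field", "foo bar baz", phrase_for_multiple=False))
# field:foo_bar field:baz
```

With the second form, a document that contains only "foo_bar" can still
match the query "foo bar baz".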


Patrick





Re: multi-word synonyms and analysis.jsp vs real field analysis (query, index)

Posted by Chris Hostetter <ho...@fucit.org>.
: The cause of my problem should be the query parsing, but I don't know,
: if there is any solution for it. I need a possibility that works like
: the analysis/query parsing within /admin/analysis.jsp view.

The behavior you are describing is very well documented on the wiki...
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory

In general, QueryParsers parse input strings according to their
parsing rules, then send each component of the input string to the
analyzer. This is a fundamental behavior; without it the query parser
would have no way of knowing when to make a phrase query, or a term
query, or which field to use.
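
A toy model of this in Python may help. It is only a sketch under
assumptions: SYNONYMS, analyze and parse_then_analyze are hypothetical
names, the analyzer is reduced to a whitespace split plus one
multi-word synonym rule, and the "parser" knows nothing except
whitespace as a separator.

```python
# Models the single rule "foo bar=>foo_bar" (assumption for illustration).
SYNONYMS = {("foo", "bar"): ["foo_bar"]}


def analyze(text):
    """Whitespace tokenizer followed by a greedy multi-word synonym filter."""
    tokens = text.split()
    out, i = [], 0
    while i < len(tokens):
        # Try the longest possible match starting at position i first.
        for length in range(len(tokens) - i, 0, -1):
            key = tuple(tokens[i:i + length])
            if key in SYNONYMS:
                out.extend(SYNONYMS[key])
                i += length
                break
        else:
            out.append(tokens[i])
            i += 1
    return out


def parse_then_analyze(query):
    """Mimic a parser that splits on whitespace before calling the analyzer."""
    terms = []
    for chunk in query.split():
        # The analyzer only ever sees one chunk, never "foo bar" together.
        terms.extend(analyze(chunk))
    return terms


print(analyze("foo bar"))             # ['foo_bar']   (what analysis.jsp shows)
print(parse_then_analyze("foo bar"))  # ['foo', 'bar'] (what the query yields)
```

Because the synonym filter never sees "foo" and "bar" in the same token
stream, the multi-word rule cannot fire.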

You may find something like the FieldQParserPlugin helpful, as it has
*no* markup of its own; it just hands the string off to an analyzer
based on the specified field ... but it will still generate a phrase
query when a single piece of input generates multiple tokens with
non-zero offsets from each other, which also confuses people sometimes
(not sure if that's what you'd want).

: >> SynonymFilter will replace the both tokens with "foo_bar". But as I
: >> tried this at "real" query time with the request handler "standard" and

You've used the phrase '"real" query time' (in contrast to analysis.jsp)
a few times in this thread ... to be clear about something: there is
nothing different between analysis.jsp and what happens when a query is
executed. The reason you see different behavior is that you are pasting
what you consider a "query string" into the analysis form, but that's
not what happens at query time, and it's not what that form expects.
That form is designed for users to paste in the strings that the query
parser would extract from its query syntax. It's not surprising that
you get something different than if you ran a straight search on the
same input, any more than it would be surprising that pasting
"fieldname:value +otherfield:value" into analysis.jsp doesn't produce
the same tokens as a query for that string.


-Hoss


Re: multi-word synonyms and analysis.jsp vs real field analysis (query, index)

Posted by Patrick Jungermann <pa...@googlemail.com>.
Hi Koji,

the problem is that this doesn't fit all of our requirements. We have
some Solr documents that must not be matched by "foo" or "bar" alone,
but only by "foo bar" as part of the query. We also have other
documents that may be matched by "foo" and "foo bar", or by "bar" and
"foo bar".

The best way to handle this seems to be synonyms, which allow precise
configuration of these cases and can be managed by an editorial staff.

Besides, foo bar=>foo_bar works everywhere (index time, analysis.jsp)
except at query time.


Patrick




Re: multi-word synonyms and analysis.jsp vs real field analysis (query, index)

Posted by Koji Sekiguchi <ko...@r.email.ne.jp>.
Hi Patrick,

Why don't you define:

foo bar, foo_bar (and expand="true")

instead of:

foo bar=>foo_bar

on the indexing side only? Wouldn't that improve things?
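
To make the difference between the two rule styles concrete, here is a
small Python sketch. parse_rule is a hypothetical helper, not part of
Solr; it only models how a comma-separated rule and an explicit "=>"
rule map input phrases to output phrases, assuming the documented
meaning of the expand option.

```python
def parse_rule(line, expand=True):
    """Return {input phrase: [output phrases]} for one synonym rule line."""
    if "=>" in line:
        # Explicit mapping: everything on the left becomes the right side.
        left, right = line.split("=>", 1)
        inputs = [s.strip() for s in left.split(",")]
        outputs = [s.strip() for s in right.split(",")]
    else:
        # Equivalence list: with expand, each entry maps to all entries;
        # without expand, each entry maps to the first entry only.
        inputs = [s.strip() for s in line.split(",")]
        outputs = inputs if expand else inputs[:1]
    return {phrase: list(outputs) for phrase in inputs}


print(parse_rule("foo bar=>foo_bar"))
# {'foo bar': ['foo_bar']}
print(parse_rule("foo bar, foo_bar"))
# {'foo bar': ['foo bar', 'foo_bar'], 'foo_bar': ['foo bar', 'foo_bar']}
```

With the equivalence form applied at index time, the original words are
kept next to "foo_bar", whereas the "=>" form replaces them.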

Koji




Re: multi-word synonyms and analysis.jsp vs real field analysis (query, index)

Posted by Patrick Jungermann <pa...@googlemail.com>.
Hi Chantal,

yes, I'm using the SynonymFilter in both the index and the query chain.
Using it only at query time or only at index time was something we
considered earlier, but neither option fits all of our requirements.

But as I wrote in my first mail, it works only within the
/admin/analysis.jsp view and not at "real" query time.


Patrick




Re: multi-word synonyms and analysis.jsp vs real field analysis (query, index)

Posted by Chantal Ackermann <ch...@btelligent.de>.
Hi Patrick,

have you added that SynonymFilter to the index chain and the query
chain? You have to add it to both if you want the replacement to happen
at index and query time. It might also be enough to add it to the query
chain only. Then your index still preserves the original data.

Cheers,
Chantal



Re: multi-word synonyms and analysis.jsp vs real field analysis (query, index)

Posted by Patrick Jungermann <pa...@googlemail.com>.
Hi Koji,

using phrase queries is not an alternative for us, because all query
parts have to be optional. The phrase query workaround works for the
query "foo bar", but only for this exact query. If the user queries for
"foo bar baz", it is changed to "foo_bar baz", which does not match
indexed documents that contain only "foo_bar". And that is what we need
here.

The cause of my problem seems to be the query parsing, but I don't know
whether there is a solution for it. I need something that works like
the analysis/query parsing in the /admin/analysis.jsp view.


Patrick





Re: multi-word synonyms and analysis.jsp vs real field analysis (query, index)

Posted by Koji Sekiguchi <ko...@r.email.ne.jp>.
Patrick,

 > parsedQueryString was something similar to "field:foo field:bar". At
 > index time, it works like expected.

I guess this is because you are searching with q=foo bar, which
produces an OR query. Use q="foo bar" instead.
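
For comparing the two variants, a small Python sketch that builds both
request URLs with debugQuery enabled, so the parsed query can be
inspected in the response. The host, port and path in BASE are
assumptions for a local test instance, and select_url is a hypothetical
helper.

```python
from urllib.parse import urlencode

# Assumed location of a local Solr instance; adjust to your setup.
BASE = "http://localhost:8983/solr/select"


def select_url(q):
    """Build a select URL with debug output switched on."""
    return BASE + "?" + urlencode({"q": q, "debugQuery": "on"})


print(select_url("foo bar"))
# http://localhost:8983/solr/select?q=foo+bar&debugQuery=on
print(select_url('"foo bar"'))
# http://localhost:8983/solr/select?q=%22foo+bar%22&debugQuery=on
```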

Koji

