Posted to solr-user@lucene.apache.org by Way Cool <wa...@gmail.com> on 2012/02/29 07:45:49 UTC

Couple issues with edismax in 3.5

Hi, Guys,

I am having the following issues with edismax:

1. Search for 4X6 generated the following parsed query:
+DisjunctionMaxQuery((((id:4 id:x id:6)^1.2) | ((name:4 name:x
name:6)^1.025) )
while a search for "4 X 6" (with spaces in between) generated the query
below, which is the one I want:
+((DisjunctionMaxQuery((id:4^1.2 | name:4^1.025)
+((DisjunctionMaxQuery((id:x^1.2 | name:x^1.025)
+((DisjunctionMaxQuery((id:6^1.2 | name:6^1.025)

Is that really intentional? The first query looks wrong to me because it
returns every doc that contains any one of 4, x, or 6.

Is there an easy way to force a "4X6" search to behave the same as "4 X 6"?

2. Multi-word synonyms do not work, because edismax splits the user query
into separate clauses via the line below:
clauses = splitIntoClauses(userQuery, false);
It also seems that edismax doesn't fully respect the fieldType at query time;
for example, it handles stopwords differently from what the schema specifies.

For example: I have the following synonym:
AAA BBB, AAABBB, AAA-BBB, CCC DDD

When I search for "AAA-BBB", it works; however, a search for "CCC DDD" does
not return results containing AAABBB. What is interesting is that
admin/analysis.jsp shows the expected results.
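
A possible client-side workaround for the first issue, sketched under the assumption that it is acceptable to rewrite user input before it is sent to Solr: insert a space at every letter/digit boundary, so that edismax receives "4 X 6" and builds one clause per term. The class and method names below are hypothetical and not part of Solr.

```java
import java.util.regex.Pattern;

/** Hypothetical client-side helper; not part of Solr. */
final class AlphaNumericSplitter {
    // Zero-width match at every digit-to-letter or letter-to-digit boundary.
    private static final Pattern BOUNDARY =
        Pattern.compile("(?<=\\d)(?=\\p{L})|(?<=\\p{L})(?=\\d)");

    /** Inserts a single space at each boundary found in the user query. */
    static String split(String userQuery) {
        return BOUNDARY.matcher(userQuery).replaceAll(" ");
    }

    public static void main(String[] args) {
        System.out.println(split("4X6"));
    }
}
```

Note that this would also split tokens such as model numbers (for example "A4" becomes "A 4"), so it probably only suits fields where that is desired.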


Thanks,

YH

Re: Couple issues with edismax in 3.5

Posted by William Bell <bi...@gmail.com>.
Actually the results are great with lucene. The issue is with edismax.
I did figure out the issue...

The scoring was ranking results differently based on distance, when I
really need the scoring to be:

score=tf(user_query,"smith"), adding geodist() only if tf > 0. This is
pretty difficult to do in Solr 3.5, but trivial in 4.0.

When are we getting tf() in 3.5?

Bill


On Mon, Mar 5, 2012 at 9:31 AM, Ahmet Arslan <io...@yahoo.com> wrote:
>> I also get an issue with "." with
>> edismax.
>>
>> For example: Dr. Smith gives me different results than "dr
>> Smith"
>
> I believe this is related to analysis ( rather than query parser). You can inspect output admin/analysis.jsp.
>
> What happens when you switch to &defType=lucene ? Dr. Smith yields same results with dr Smith?



-- 
Bill Bell
billnbell@gmail.com
cell 720-256-8076

Re: Couple issues with edismax in 3.5

Posted by Ahmet Arslan <io...@yahoo.com>.
> I also get an issue with "." with
> edismax.
> 
> For example: Dr. Smith gives me different results than "dr
> Smith"

I believe this is related to analysis (rather than the query parser). You can inspect the output of admin/analysis.jsp.

What happens when you switch to &defType=lucene? Does Dr. Smith yield the same results as dr Smith?

Re: Couple issues with edismax in 3.5

Posted by William Bell <bi...@gmail.com>.
I also get an issue with "." with edismax.

For example: Dr. Smith gives me different results than "dr Smith"

On Thu, Mar 1, 2012 at 10:18 PM, Way Cool <wa...@gmail.com> wrote:
> Thanks Ahmet! That's good to know someone else also tried to make  phrase
> queries to fix multi-word synonym issue. :-)
>
>
> On Thu, Mar 1, 2012 at 1:42 AM, Ahmet Arslan <io...@yahoo.com> wrote:
>
>> > I don't think mm will help here because it defaults to 100%
>> > already by the
>> > following code.
>>
>> Default behavior of mm has changed recently. So it is a good idea to
>> explicitly set it to 100%. Then all of the search terms must match.
>>
>> > Regarding multi-word synonym, what is the best way to handle
>> > it now? Make
>> > it as a phrase with " or adding -  in between?
>> > I don't like index time expansion because it adds lots of
>> > noise.
>>
>> Solr wiki advises to use them at index time for various reasons.
>>
>> "... The recommended approach for dealing with synonyms like this, is to
>> expand the synonym when indexing..."
>>
>>
>> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory
>>
>> However index time synonyms have their own problems as well. If you add a new
>> synonym, you need to re-index those documents that contain this  newly
>> added synonym.
>>
>> Also highlighting highlights whole phrases. For example you have :
>>    us, united states
>> Searching for states will highlight both united and states.
>> Not sure but this seems fixed with LUCENE-3668
>>
>> I was thinking to have query expansion module to handle multi-word
>> synonyms at query time only. Either using o.a.l.search.Query manipulation
>> or String manipulation. Similar to Lukas' posting here
>> http://www.searchworkings.org/forum/-/message_boards/view_message/146097
>>
>>
>>
>>



-- 
Bill Bell
billnbell@gmail.com
cell 720-256-8076

Re: Couple issues with edismax in 3.5

Posted by Way Cool <wa...@gmail.com>.
Thanks Ahmet! It's good to know someone else has also tried phrase
queries to work around the multi-word synonym issue. :-)


On Thu, Mar 1, 2012 at 1:42 AM, Ahmet Arslan <io...@yahoo.com> wrote:

> > I don't think mm will help here because it defaults to 100%
> > already by the
> > following code.
>
> Default behavior of mm has changed recently. So it is a good idea to
> explicitly set it to 100%. Then all of the search terms must match.
>
> > Regarding multi-word synonym, what is the best way to handle
> > it now? Make
> > it as a phrase with " or adding -  in between?
> > I don't like index time expansion because it adds lots of
> > noise.
>
> Solr wiki advises to use them at index time for various reasons.
>
> "... The recommended approach for dealing with synonyms like this, is to
> expand the synonym when indexing..."
>
>
> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory
>
> However index time synonyms have their own problems as well. If you add a new
> synonym, you need to re-index those documents that contain this  newly
> added synonym.
>
> Also highlighting highlights whole phrases. For example you have :
>    us, united states
> Searching for states will highlight both united and states.
> Not sure but this seems fixed with LUCENE-3668
>
> I was thinking to have query expansion module to handle multi-word
> synonyms at query time only. Either using o.a.l.search.Query manipulation
> or String manipulation. Similar to Lukas' posting here
> http://www.searchworkings.org/forum/-/message_boards/view_message/146097
>
>
>
>

Re: Couple issues with edismax in 3.5

Posted by Ahmet Arslan <io...@yahoo.com>.
> I don't think mm will help here because it defaults to 100%
> already by the
> following code.

The default behavior of mm changed recently, so it is a good idea to set it explicitly to 100%. Then all of the search terms must match.

> Regarding multi-word synonym, what is the best way to handle
> it now? Make
> it as a phrase with " or adding -  in between?
> I don't like index time expansion because it adds lots of
> noise.

The Solr wiki advises using them at index time, for various reasons.

"... The recommended approach for dealing with synonyms like this, is to expand the synonym when indexing..."

http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory

However, index-time synonyms have their own problems as well. If you add a new synonym, you need to re-index the documents that contain it.

Also, highlighting highlights whole phrases. For example, if you have:
    us, united states
searching for states will highlight both united and states.
I'm not sure, but this seems to be fixed by LUCENE-3668.

I was thinking of writing a query expansion module to handle multi-word synonyms at query time only, using either o.a.l.search.Query manipulation or String manipulation, similar to Lukas' posting here:
http://www.searchworkings.org/forum/-/message_boards/view_message/146097
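
The string-manipulation variant could look roughly like the sketch below. It is not the approach from the linked posting and not an existing Solr component; SynonymQueryExpander and its methods are hypothetical names, and the synonym group is the one from the original question. It assumes exact, case-sensitive matches and rewrites only the first synonym it finds.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical pre-processor that runs before the query string is sent to Solr. */
final class SynonymQueryExpander {
    private final List<List<String>> groups;

    SynonymQueryExpander(List<List<String>> groups) {
        this.groups = groups;
    }

    /** Replaces the first synonym found with an OR group of quoted phrases. */
    String expand(String userQuery) {
        for (List<String> group : groups) {
            for (String term : group) {
                // Match the term only when it is delimited by whitespace
                // or by the start/end of the query string.
                Pattern p = Pattern.compile("(?<!\\S)" + Pattern.quote(term) + "(?!\\S)");
                Matcher m = p.matcher(userQuery);
                if (m.find()) {
                    // Return immediately so the inserted text is never re-expanded.
                    return m.replaceFirst(Matcher.quoteReplacement(orGroup(group)));
                }
            }
        }
        return userQuery;
    }

    private static String orGroup(List<String> group) {
        StringBuilder sb = new StringBuilder("(");
        for (int i = 0; i < group.size(); i++) {
            if (i > 0) {
                sb.append(" OR ");
            }
            sb.append('"').append(group.get(i)).append('"');
        }
        return sb.append(')').toString();
    }

    public static void main(String[] args) {
        List<List<String>> groups = Arrays.asList(
            Arrays.asList("AAA BBB", "AAABBB", "AAA-BBB", "CCC DDD"));
        System.out.println(new SynonymQueryExpander(groups).expand("CCC DDD"));
    }
}
```

How edismax treats the rewritten string (in particular how mm interacts with the explicit OR group) would still need to be checked with debugQuery=true.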




Re: Couple issues with edismax in 3.5

Posted by Way Cool <wa...@gmail.com>.
Thanks Ahmet for your reply.

I don't think mm will help here, because it already defaults to 100% in the
following code.

    if (parsedUserQuery != null && doMinMatched) {
      String minShouldMatch = solrParams.get(DMP.MM, "100%");
      if (parsedUserQuery instanceof BooleanQuery) {
        U.setMinShouldMatch((BooleanQuery) parsedUserQuery, minShouldMatch);
      }
    }

Regarding multi-word synonyms, what is the best way to handle them now? Make
the query a phrase with ", or add a - in between?
I don't like index-time expansion because it adds a lot of noise.

It's good to know that Analysis.jsp does not perform actual query parsing. I
was hoping edismax could do something similar to the analysis tool, because it
shows everything I need for multi-word synonyms.

Thanks.

On Wed, Feb 29, 2012 at 1:23 AM, Ahmet Arslan <io...@yahoo.com> wrote:

> > 1. Search for 4X6 generated the following parsed query:
> > +DisjunctionMaxQuery((((id:4 id:x id:6)^1.2) | ((name:4
> > name:x
> > name:6)^1.025) )
> > while the search for "4 X 6" (with space in between)
> > generated the query
> > below: (I like this one)
> > +((DisjunctionMaxQuery((id:4^1.2 | name:4^1.025)
> > +((DisjunctionMaxQuery((id:x^1.2 | name:x^1.025)
> > +((DisjunctionMaxQuery((id:6^1.2 | name:6^1.025)
> >
> > Is that really intentional? The first query is pretty weird
> > because it will
> > return all of the docs with one of 4, x, 6.
>
> Minimum Should Match (mm) parameter is used to control how many search
> terms should match. For example, you can set it to &mm=100%.
>
> Also you can tweak relevancy by setting phrase fields (pf) parameter.
>
> > Any easy way we can force "4X6" search to be the same as "4
> > X 6"?
> >
> > 2. Issue with multi words synonym because edismax separates
> > keywords to
> > multiple words via the line below:
> > clauses = splitIntoClauses(userQuery, false);
> > and seems like edismax doesn't quite respect fieldType at
> > query time, for
> > example, handling stopWords differently than what's
> > specified in schema.
> >
> > For example: I have the following synonym:
> > AAA BBB, AAABBB, AAA-BBB, CCC DDD
> >
> > When I search for "AAA-BBB", it works, however search for
> > "CCC DDD" was not
> > returning results containing AAABBB. What is interesting is
> > that
> > admin/analysis.jsp is returning great results.
>
> Query string is tokenized (according to white spaces) before it reaches
> analyzer. https://issues.apache.org/jira/browse/LUCENE-2605
> That's why multi-word synonyms are not advised to use at query time.
>
> Analysis.jsp does not perform actual query parsing.
>

Re: Couple issues with edismax in 3.5

Posted by Ahmet Arslan <io...@yahoo.com>.
> 1. Search for 4X6 generated the following parsed query:
> +DisjunctionMaxQuery((((id:4 id:x id:6)^1.2) | ((name:4
> name:x
> name:6)^1.025) )
> while the search for "4 X 6" (with space in between) 
> generated the query
> below: (I like this one)
> +((DisjunctionMaxQuery((id:4^1.2 | name:4^1.025)
> +((DisjunctionMaxQuery((id:x^1.2 | name:x^1.025)
> +((DisjunctionMaxQuery((id:6^1.2 | name:6^1.025)
> 
> Is that really intentional? The first query is pretty weird
> because it will
> return all of the docs with one of 4, x, 6.

The Minimum Should Match (mm) parameter controls how many search terms must match. For example, you can set &mm=100%.

You can also tweak relevancy by setting the phrase fields (pf) parameter.
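
As a rough illustration of setting these parameters explicitly on a request, the sketch below builds an edismax parameter string with plain JDK classes. The qf boosts are inferred from the parsed queries quoted above; the pf value and the class name are assumptions made for illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper that builds the parameter string for a select request. */
final class EdismaxParams {
    static String build(String userQuery) {
        Map<String, String> params = new LinkedHashMap<String, String>();
        params.put("defType", "edismax");
        params.put("q", userQuery);
        params.put("qf", "id^1.2 name^1.025"); // boosts inferred from the parsed query
        params.put("mm", "100%");              // set explicitly instead of relying on the default
        params.put("pf", "name^2");            // assumed phrase boost, tune as needed
        params.put("debugQuery", "true");      // include the parsed query in the response

        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (sb.length() > 0) {
                sb.append('&');
            }
            sb.append(e.getKey()).append('=').append(encode(e.getValue()));
        }
        return sb.toString();
    }

    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException ex) {
            throw new IllegalStateException("UTF-8 is always available", ex);
        }
    }

    public static void main(String[] args) {
        System.out.println(build("4 X 6"));
    }
}
```

The resulting string would be appended to the select URL of the Solr core; with debugQuery enabled, the parsed query in the response shows whether mm and pf had the intended effect.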

> Any easy way we can force "4X6" search to be the same as "4
> X 6"?
> 
> 2. Issue with multi words synonym because edismax separates
> keywords to
> multiple words via the line below:
> clauses = splitIntoClauses(userQuery, false);
> and seems like edismax doesn't quite respect fieldType at
> query time, for
> example, handling stopWords differently than what's
> specified in schema.
> 
> For example: I have the following synonym:
> AAA BBB, AAABBB, AAA-BBB, CCC DDD
> 
> When I search for "AAA-BBB", it works, however search for
> "CCC DDD" was not
> returning results containing AAABBB. What is interesting is
> that
> admin/analysis.jsp is returning great results.

The query string is tokenized (on whitespace) before it reaches the analyzer; see https://issues.apache.org/jira/browse/LUCENE-2605
That's why using multi-word synonyms at query time is not advised.

Analysis.jsp does not perform actual query parsing.
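
To make the difference concrete, here is a toy model (not Solr or Lucene code) of why the analysis page and the query parser can disagree. The analyze method is a hypothetical stand-in for a field analyzer with a synonym filter, and the synonym group is the one from the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model only; real synonym filters work on token streams, not whole strings. */
final class WhitespaceSplitDemo {
    // Toy synonym table keyed by the full surface form of each entry.
    static final Map<String, List<String>> SYNONYMS = new HashMap<String, List<String>>();
    static {
        List<String> group = Arrays.asList("AAA BBB", "AAABBB", "AAA-BBB", "CCC DDD");
        for (String entry : group) {
            SYNONYMS.put(entry, group);
        }
    }

    /** Stand-in for analysis: returns the synonym group on a match, else the input itself. */
    static List<String> analyze(String text) {
        List<String> group = SYNONYMS.get(text);
        return group != null ? group : Collections.singletonList(text);
    }

    /** Analysis-page style: the whole string reaches the analyzer at once. */
    static List<String> analyzeWhole(String query) {
        return analyze(query);
    }

    /** Query-parser style: split on whitespace first, then analyze each chunk. */
    static List<String> splitThenAnalyze(String query) {
        List<String> out = new ArrayList<String>();
        for (String chunk : query.trim().split("\\s+")) {
            out.addAll(analyze(chunk));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(analyzeWhole("CCC DDD"));
        System.out.println(splitThenAnalyze("CCC DDD"));
        System.out.println(splitThenAnalyze("AAA-BBB"));
    }
}
```

In this model, "CCC DDD" passed whole expands to all four forms, while the pre-split version yields only CCC and DDD. "AAA-BBB" contains no whitespace, so it still reaches the synonym table intact, which is consistent with the behavior reported in the original question.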