Posted to solr-user@lucene.apache.org by abhayd <aj...@hotmail.com> on 2012/04/30 08:13:42 UTC
solr.WordDelimiterFilterFactory query time
hi
I am using solr.WordDelimiterFilterFactory for a text_en field at query
time.
The title of my document is: blackberry torch 9810
My query torch9810 works fine: it splits the alphanumeric token and gets me
the document.
But when the query is blackberry9810, it splits into blackberry 9810, yet I
don't get the document mentioned above.
If I change the query to blackberry 9810 (two words), I get the document.
Can anyone explain what I'm doing wrong? When I query blackberry9810, I would
like to get the same results as for blackberry 9810.
thanks
abhay
--
View this message in context: http://lucene.472066.n3.nabble.com/solr-WordDelimiterFilterFactory-query-time-tp3950045.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: solr.WordDelimiterFilterFactory query time
Posted by Jack Krupansky <ja...@basetechnology.com>.
When WDF splits blackberry9810, the resulting tokens are treated as a
phrase, like "blackberry 9810", with the two terms required to be adjacent,
at least with the edismax query parser. I'm not sure what the other query
parsers do.
If you are using edismax, you can set the qs (query phrase slop) request
parameter to 1 (rather than 0), so that blackberry9810 will be treated as
"blackberry 9810"~1, which allows one additional term to appear between the
two terms of the phrase.
In other words, "blackberry 9810"~1 would match blackberry torch 9810 as well
as blackberry 9810.
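A minimal Python sketch of the slop idea, for illustration only. It is not Lucene's implementation: it handles just two terms in order and checks that the second term follows the first with at most `slop` extra positions in between (Lucene's sloppy phrase matching is more general and can also match reordered terms given enough slop). The function name `phrase_matches` is hypothetical.

```python
def phrase_matches(doc_tokens, first, second, slop):
    """Simplified two-term sloppy phrase check (illustration only).

    Returns True if `second` occurs after `first` with at most `slop`
    extra positions between them; slop=0 means the terms must be adjacent.
    """
    # Map each term to the list of positions where it occurs.
    positions = {}
    for pos, tok in enumerate(doc_tokens):
        positions.setdefault(tok, []).append(pos)
    for p1 in positions.get(first, []):
        for p2 in positions.get(second, []):
            # Adjacent means p2 == p1 + 1; each unit of slop allows one more gap.
            if 0 < p2 - p1 <= 1 + slop:
                return True
    return False


doc = ["blackberry", "torch", "9810"]  # the indexed title, one token per position
print(phrase_matches(doc, "blackberry", "9810", slop=0))  # False
print(phrase_matches(doc, "blackberry", "9810", slop=1))  # True
print(phrase_matches(doc, "torch", "9810", slop=0))       # True
```

The last line is consistent with the original report that torch9810 works: torch and 9810 are already adjacent in the title, so no slop is needed.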
-- Jack Krupansky
-----Original Message-----
From: abhayd
Sent: Monday, April 30, 2012 2:13 AM
To: solr-user@lucene.apache.org
Subject: solr.WordDelimiterFilterFactory query time
Re: solr.WordDelimiterFilterFactory query time
Posted by abhayd <aj...@hotmail.com>.
hi jack & erick,
Thanks.
I do have qs set in solrconfig, in the dismax settings of my query handler:
<str name="qs">10</str>
It still does not work.
abhay
Re: solr.WordDelimiterFilterFactory query time
Posted by abhayd <aj...@hotmail.com>.
hi jack,
I tried &qs=10, but unfortunately it does not seem to help.
I'm not sure what else could be wrong.
abhay
Re: solr.WordDelimiterFilterFactory query time
Posted by abhayd <aj...@hotmail.com>.
hi jack,
It worked with dismax. I had been using a wrapper around dismax provided by
our search partner, and it seems to have a bug.
I switched to plain dismax and everything works fine now.
Thanks for the help.
Re: solr.WordDelimiterFilterFactory query time
Posted by Jack Krupansky <ja...@basetechnology.com>.
Great. But could you tell us all which settings you had wrong and how you
changed them, so that somebody else with the same problem who searches the
email archive will be able to find your solution? Thanks.
-- Jack Krupansky
-----Original Message-----
From: abhayd
Sent: Monday, April 30, 2012 4:51 PM
To: solr-user@lucene.apache.org
Subject: Re: solr.WordDelimiterFilterFactory query time
Re: solr.WordDelimiterFilterFactory query time
Posted by abhayd <aj...@hotmail.com>.
hi jack,
thanks, I figured out the issue. It was the settings at query time and index time.
Re: solr.WordDelimiterFilterFactory query time
Posted by Jack Krupansky <ja...@basetechnology.com>.
Just to be clear, I used the Solr example schema and indexed two test
documents, one with "Blackberry 9810" and one with "Blackberry torch 9810"
in the sku field (which uses the field type text_en_splitting_tight, which
uses WDF), and the following query returns both documents:
http://localhost:8983/solr/select/?q=blackberry9810&debug=true&qs=1&qf=sku&defType=dismax
You might try the same and at least verify that it works for you.
With &qs=0 it returns only one document.
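The test request can also be built programmatically, which makes it easy to flip qs between 0 and 1 (or switch defType to edismax) and compare results. A minimal Python sketch using only the standard library's urllib.parse.urlencode; the helper name build_select_url and its defaults (the localhost URL from the example) are assumptions for illustration.

```python
from urllib.parse import urlencode


def build_select_url(query, qs, qf, def_type="dismax",
                     base="http://localhost:8983/solr/select/"):
    """Hypothetical helper: build a Solr select URL with the parameters
    used in the test request (q, debug, qs, qf, defType)."""
    params = {
        "q": query,
        "debug": "true",
        "qs": str(qs),        # query phrase slop
        "qf": qf,             # field(s) to search
        "defType": def_type,  # query parser: dismax or edismax
    }
    # urlencode keeps dict insertion order, so the output is predictable.
    return base + "?" + urlencode(params)


for slop in (0, 1):
    print(build_select_url("blackberry9810", qs=slop, qf="sku"))
# http://localhost:8983/solr/select/?q=blackberry9810&debug=true&qs=0&qf=sku&defType=dismax
# http://localhost:8983/solr/select/?q=blackberry9810&debug=true&qs=1&qf=sku&defType=dismax
```

Against the two test documents described here, the qs=0 URL should return one document and the qs=1 URL both.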
Assuming the example works fine for you, that suggests something else is
going on with your analyzer. There might be a difference between the WDF
settings for the index and query analyzers. You should attach your schema
and config as well as the &debug output.
-- Jack Krupansky
-----Original Message-----
From: abhayd
Sent: Monday, April 30, 2012 11:56 AM
To: solr-user@lucene.apache.org
Subject: Re: solr.WordDelimiterFilterFactory query time
Re: solr.WordDelimiterFilterFactory query time
Posted by Jack Krupansky <ja...@basetechnology.com>.
The &qs=1 request parameter should work for the dismax query parser as well
as edismax.
-- Jack Krupansky
-----Original Message-----
From: Erick Erickson
Sent: Monday, April 30, 2012 10:58 AM
To: solr-user@lucene.apache.org
Subject: Re: solr.WordDelimiterFilterFactory query time
Re: solr.WordDelimiterFilterFactory query time
Posted by Erick Erickson <er...@gmail.com>.
See Jack's comments about phrases: all your parsed
queries are phrases, and your indexed terms aren't
next to each other.
Best
Erick
On Mon, Apr 30, 2012 at 10:54 AM, abhayd <aj...@hotmail.com> wrote:
Re: solr.WordDelimiterFilterFactory query time
Posted by abhayd <aj...@hotmail.com>.
hi Erick,
autoGeneratePhraseQueries="false" is set for the field type, and it works
fine with the standard query parser.
The problem seems to appear when I start using dismax. As you suggested, I
checked the analysis tool, and even after the word delimiter is applied I see
the search term as "blackberry 9801", so I don't think it's the stemmer.
Here is the debug output (partial only):
-----------------------------------
<lst name="debug">
  <lst name="queryBoosting">
    <str name="q">blackberry9801</str>
    <null name="match"/>
  </lst>
  <str name="rawquerystring">blackberry9801</str>
  <str name="querystring">blackberry9801</str>
  <str name="parsedquery">
    DisjunctionMaxQuery((click_terms:"blackberry 9801"^5.0 |
      description:"blackberry 9801"^3.0 |
      displayName:"blackberry 9801"^15.0 |
      displayNameEscaped:"blackberry 9801"^15.0 |
      manufacturer:"blackberry 9801"^10.0 |
      text_all:"blackberry 9801" |
      title:"blackberry 9801"^5.0)~0.01)
  </str>
  <str name="parsedquery_toString">
    (click_terms:"blackberry 9801"^5.0 |
     description:"blackberry 9801"^3.0 |
     displayName:"blackberry 9801"^15.0 |
     displayNameEscaped:"blackberry 9801"^15.0 |
     manufacturer:"blackberry 9801"^10.0 |
     text_all:"blackberry 9801" |
     title:"blackberry 9801"^5.0)~0.01
  </str>
-----------------------
field definition
----------------
<fieldType class="solr.TextField" name="text_en"
    positionIncrementGap="100" autoGeneratePhraseQueries="false">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter catenateAll="0" catenateNumbers="1" catenateWords="1"
        class="solr.WordDelimiterFilterFactory" generateNumberParts="1"
        generateWordParts="1" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" expand="true"
        ignoreCase="true" synonyms="synonyms.txt"/>
    <filter catenateAll="0" catenateNumbers="0" catenateWords="0"
        class="solr.WordDelimiterFilterFactory" generateNumberParts="1"
        generateWordParts="1" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
----------------------------------------------
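To see why blackberry9810 and blackberry 9810 behave differently with this setup, here is a rough Python model of the posted chain (whitespace tokenizer, then WordDelimiterFilter, then lowercase) plus the whitespace pre-splitting done by the query parser. It is a simplification, not Solr code: synonyms, catenation and splitOnCaseChange are not modelled, and the rule "a chunk that yields several tokens becomes a phrase" is taken from the parsedquery output above rather than from the dismax source. All function names are hypothetical.

```python
import re


def wdf_parts(token):
    # generateWordParts=1 / generateNumberParts=1: letter runs and digit runs
    # become separate parts; anything else acts as a delimiter.
    return re.findall(r"[A-Za-z]+|[0-9]+", token)


def analyze(text):
    """Whitespace tokenizer -> simplified WDF -> lowercase.
    Returns (term, position) pairs; each part gets the next position."""
    terms, pos = [], 0
    for tok in text.split():
        for part in wdf_parts(tok):
            terms.append((part.lower(), pos))
            pos += 1
    return terms


def parse_query(user_query):
    """Rough model of the parser: split the raw query on whitespace first,
    analyze each chunk, and turn a chunk with several tokens into a phrase."""
    clauses = []
    for chunk in user_query.split():
        toks = [term for term, _ in analyze(chunk)]
        kind = "phrase" if len(toks) > 1 else "term"
        clauses.append((kind, toks))
    return clauses


print(analyze("blackberry torch 9810"))
# [('blackberry', 0), ('torch', 1), ('9810', 2)]
print(parse_query("blackberry9810"))
# [('phrase', ['blackberry', '9810'])]
print(parse_query("blackberry 9810"))
# [('term', ['blackberry']), ('term', ['9810'])]
```

In the indexed title, blackberry and 9810 sit at positions 0 and 2, so the single phrase produced from blackberry9810 needs a slop of at least 1, while the two-word query yields independent term clauses with no adjacency requirement.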
Re: solr.WordDelimiterFilterFactory query time
Posted by Erick Erickson <er...@gmail.com>.
Try attaching &debugQuery=on to your query and see whether that helps
you understand what's going on. If that doesn't help, also look at
admin/analysis. If all that doesn't help, post your schema definition
for the field type and the results of &debugQuery=on (you might
look at: http://wiki.apache.org/solr/UsingMailingLists).
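With &debugQuery=on, the entry to look at first is parsedquery: a quoted clause there means the chunk became a phrase query. A minimal Python sketch, assuming the response is in Solr's XML format, that pulls that entry out with the standard library's xml.etree.ElementTree. The helper name parsed_query and the sample response are made up for illustration, and checking for a double quote is only a rough heuristic.

```python
import xml.etree.ElementTree as ET


def parsed_query(response_xml):
    """Hypothetical helper: return the text of <str name="parsedquery">
    from a Solr XML response, or None if it is absent."""
    root = ET.fromstring(response_xml)
    node = root.find(".//str[@name='parsedquery']")
    if node is None or node.text is None:
        return None
    return node.text.strip()


# Hand-written fragment for illustration, not real Solr output.
sample = """<response>
  <lst name="debug">
    <str name="rawquerystring">blackberry9810</str>
    <str name="parsedquery">title:"blackberry 9810"</str>
  </lst>
</response>"""

pq = parsed_query(sample)
print(pq)         # title:"blackberry 9810"
print('"' in pq)  # True: a quoted clause suggests a phrase query
```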
But my first guess is that you have the stemmer filter in front of
your WDF filter, so your input is stemmed to something like
blackberri at index time, but if your stemmer is after WDF, at
query time you search for blackberry.
Or you have phrases enabled and it's looking for blackberry right
next to 9810.
But those are guesses.
Best
Erick
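The stemmer guess hinges on the stemmer and WDF seeing different tokens depending on their order. A toy Python illustration, assuming a made-up one-rule stemmer (a terminal y becomes i when the rest of the word contains a vowel, which is enough to turn blackberry into blackberri) instead of a real English stemmer, and the same chain at index and query time: when the stemmer runs before WDF, it sees blackberry9810 as a single token and leaves it alone, so the query term blackberry no longer matches the indexed blackberri. The helper names are hypothetical.

```python
import re


def toy_stem(term):
    """Made-up one-rule stemmer (NOT a real English stemmer): a terminal
    'y' becomes 'i' when the rest of the word contains a vowel."""
    if term.endswith("y") and re.search(r"[aeiou]", term[:-1]):
        return term[:-1] + "i"
    return term


def wdf_parts(token):
    # Simplified WDF: letter runs and digit runs become separate parts.
    return re.findall(r"[a-z]+|[0-9]+", token)


def analyze(text, stem_first):
    """Lowercase + whitespace split, then stemmer and WDF in the given order."""
    terms = []
    for tok in text.lower().split():
        if stem_first:
            terms.extend(wdf_parts(toy_stem(tok)))
        else:
            terms.extend(toy_stem(part) for part in wdf_parts(tok))
    return terms


for stem_first in (True, False):
    doc = analyze("blackberry torch 9810", stem_first)
    query = analyze("blackberry9810", stem_first)
    # Same chain on both sides; do all query terms exist in the document?
    print(stem_first, doc, query, all(t in doc for t in query))
# True ['blackberri', 'torch', '9810'] ['blackberry', '9810'] False
# False ['blackberri', 'torch', '9810'] ['blackberri', '9810'] True
```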