Posted to solr-user@lucene.apache.org by alexw <aw...@crossview.com> on 2011/04/08 18:02:50 UTC

Special characters during indexing and searching

Hi,

I have a field named "productName" in my schema which uses the standard
"text" field type, and one of my product names is "star/bit". When I search
for "star/bit" (without quotes) using the dismax request handler, NO results
are found.

After some research, it looks like during indexing "star/bit" was tokenized
into "star", "bit" and "starbit" by the WordDelimiterFilterFactory. At
search time, "star/bit" (without quotes) was turned into a "star bit" phrase
query, so there are no matches, since there is no "star bit" phrase in the
index.
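
To make the tokenization described above concrete, here is a minimal Python
sketch. It is only a simplified stand-in for WordDelimiterFilterFactory (the
real filter has many more options and also tracks token positions), and the
function name word_delimiter_sketch is made up for illustration.

```python
import re


def word_delimiter_sketch(text):
    """Split on non-alphanumeric characters and also emit the joined form.

    Simplified illustration only; not the behaviour of the real Solr filter.
    """
    parts = [p for p in re.split(r"[^A-Za-z0-9]+", text) if p]
    tokens = list(parts)
    if len(parts) > 1:
        # Catenated form of all parts, e.g. "star" + "bit" becomes "starbit".
        tokens.append("".join(parts))
    return tokens


print(word_delimiter_sketch("star/bit"))
```

Under this simplified model, "star/bit" yields the three tokens listed above.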

My questions are:

1. Am I understanding it correctly? If yes, how can I make sure a match can
be found? If not, what's really happening?

2. Why does dismax turn "star/bit" into a "star bit" phrase search instead
of searching for "star" or "bit" or "starbit"?

Thanks for your help!


Alex


--
View this message in context: http://lucene.472066.n3.nabble.com/Special-characters-during-indexing-and-searching-tp2795914p2795914.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Special characters during indexing and searching

Posted by alexw <aw...@crossview.com>.
Sorry, wrong link to the thread. Here is the correct one:

http://lucene.472066.n3.nabble.com/Special-characters-during-indexing-and-searching-td2795914.html


Re: Special characters during indexing and searching

Posted by alexw <aw...@crossview.com>.
I am using Nabble to view the thread, and the format seems to be ok:

http://lucene.472066.n3.nabble.com/template/NamlServlet.jtp?macro=reply&node=2796849

1> what version of Solr.

Solr 1.4

2> have you looked in your index (admin page and/or luke) to see if what you
have indexed there is what you think you have indexed there?

Yes. When I search for "bit" only, I get 3 results as follows:

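[Solr XML response; the markup was stripped by the archive. Surviving values, in order:]
0, 15, on, productName, on, 0, bit, dismax, 10, 2.2
Three result documents are listed, each showing: bit/star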



3> What happens if you just query for the raw phrase, q=productName:"bit
star"~3 or something? 

This is the response:


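[Solr XML response; the markup was stripped by the archive. Surviving values, in order:]
0, 0, on, 0, productName:"bit star"~3, 10, 2.2
No result documents are listed.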






Re: Special characters during indexing and searching

Posted by Erick Erickson <er...@gmail.com>.
I'm having real trouble with the formatting. Either Google has changed or
somehow all the markup is getting stripped on your end. Could you send as
plain text and see if that works?

But from what I can make out, we're doing *something* different. Because I
get parsed queries like below, and they're quite a bit different. So,
additional questions:
1> what version of Solr.
2> have you looked in your index (admin page and/or luke) to see if what you
   have indexed there is what you think you have indexed there?
3> What happens if you just query for the raw phrase,
   q=productName:"bit star"~3 or something?

Best
Erick

Here's what I get.....

+DisjunctionMaxQuery((id:bit/star^10.0 | text:"bit (star bitstar)"^0.5 |
cat:bit/star^1.4 | manu:"bit star"^1.1 | name:"bit star"^1.2 | features:"bit
(star bitstar)" | sku:bitstar^1.5)~0.01)
DisjunctionMaxQuery((manu_exact:bitstar^1.9 | features:"bit (star
bitstar)"~100^1.1 | text:"bit (star bitstar)"~100^0.2 | manu:"bit
star"~100^1.4 | name:"bit star"~100^1.5)~0.01)
FunctionQuery((1000.0/(1.0*float(float(price))+1000.0))^0.3)
FunctionQuery((int(popularity))^0.5)

On Fri, Apr 8, 2011 at 3:37 PM, alexw <aw...@crossview.com> wrote:

> Thanks Erick. Here is the Solr response with debug on. The productName IS
> in
> the qf parameter in dismax. I have also pasted my dismax definition and the
> "text" field type definition:

Re: Special characters during indexing and searching

Posted by alexw <aw...@crossview.com>.
Thanks Erick. Here is the Solr response with debug on. The productName IS in
the qf parameter in dismax. I have also pasted my dismax definition and the
"text" field type definition:


[Solr XML debug response; the markup was stripped by the archive. Surviving values, in order:]
0, 47, on, on, 0, bit/star, dismax, 10, 2.2
bit/star, bit/star, bit/star

Parsed query:
+DisjunctionMaxQuery((longDesc:"bit star" | shortDesc:"bit star"^1.2 | productName:"bit star"^10.0)~0.01) DisjunctionMaxQuery((longDesc:"bit star"~10^5.0 | shortDesc:"bit star"~10^6.0 | productName:"bit star"~10^50.0)~0.01)

Parsed query, string form:
+(longDesc:"bit star" | shortDesc:"bit star"^1.2 | productName:"bit star"^10.0)~0.01 (longDesc:"bit star"~10^5.0 | shortDesc:"bit star"~10^6.0 | productName:"bit star"~10^50.0)~0.01

Query parser: DisMaxQParser

Timing values, in order: 15.0; 0.0 (nine times); 15.0; 15.0; 0.0 (seven times)

=============================================
Dismax Definition:


    
     dismax
     explicit
     0.01
     
        productName^10 shortDesc^1.2 longDesc^1.0
     
     
        productName^50 shortDesc^6.0 longDesc^5.0
     
    
     
        productName,shortDesc,thumbnail
     
     3<75%
     10
     *:*
     
     text features name
     
     0
     
     name
     regex 
    
    
       elevator
       spellcheck
    
  


=============================================
"text" field type definition:

[The field type XML was stripped by the archive; none of its content survives.]


Re: Special characters during indexing and searching

Posted by Erick Erickson <er...@gmail.com>.
This works fine for me. Tack on &debugQuery=on to your URL
and post that please unless the stuff below helps....
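
A minimal sketch of building such a debug URL in Python. The host, port and
path (localhost:8983/solr/select) and the qt=dismax handler name are
assumptions about the setup, and debug_url is a hypothetical helper, not
part of Solr.

```python
from urllib.parse import urlencode

# Assumed location of the select handler; adjust for the actual deployment.
SOLR_SELECT = "http://localhost:8983/solr/select"


def debug_url(query, handler="dismax"):
    """Build a select URL with debugQuery=on, URL-encoding the user query."""
    params = {"q": query, "qt": handler, "debugQuery": "on"}
    return SOLR_SELECT + "?" + urlencode(params)


print(debug_url("star/bit"))
```

Encoding the query this way keeps characters such as "/" and quotes from
being mangled before they reach Solr.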

But note a couple of things
1> productName isn't part of the default dismax configuration in your
solrconfig.xml file, so unless you put it there it's not being searched
on. Try putting it in the "qf" field.
2> the phrase queries are optional, just used to boost.
3> this is exactly how I'd expect these queries to be parsed, so one of
us is missing something <G>..

Best
Erick

Also see below...

On Fri, Apr 8, 2011 at 12:02 PM, alexw <aw...@crossview.com> wrote:

> Hi,
>
> I have a field named "productName" in my schema which uses the standard
> "text" field type, and one of my product names is "star/bit". When I search
> for "star/bit" (without quotes) using the dismax request handler, NO results
> are found.
>
> After some research, looks like during indexing, "star/bit" was tokenized
> into "star", "bit" and "starbit" by the WordDelimiterFilterFactory. And at
> search time, "star/bit" (without quotes) was turned into a "star bit"
> phrase
> query, therefore no matches since there is no "star bit" phrase in the
> index.
>
>
True, this is expected. The phrase queries are defined by the pf field in
your dismax definition. Is your productName in there? But notice that those
are all optional; their presence is used to boost the query if the phrase is
present.
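
As a rough illustration of that structure, the sketch below rebuilds the two
clauses seen in the debug output posted in this thread: a required clause
built from qf and an optional boosting clause built from pf with the phrase
slop. It is not Solr's actual parser; dismax_sketch and its parameters are
hypothetical names, and the boosts, slop and tie are copied from the dismax
definition posted in this thread.

```python
def dismax_sketch(terms, qf, pf, phrase_slop, tie):
    """Return the required and optional clauses as strings (illustration only)."""
    phrase = '"' + " ".join(terms) + '"'
    # Required clause: one alternative per qf field, combined dismax-style.
    required = " | ".join(f"{field}:{phrase}^{boost}" for field, boost in qf)
    # Optional clause: the same phrase per pf field, with slop, used only to boost.
    optional = " | ".join(
        f"{field}:{phrase}~{phrase_slop}^{boost}" for field, boost in pf
    )
    return {"required": f"({required})~{tie}", "optional": f"({optional})~{tie}"}


qf = [("productName", 10.0), ("shortDesc", 1.2), ("longDesc", 1.0)]
pf = [("productName", 50.0), ("shortDesc", 6.0), ("longDesc", 5.0)]
clauses = dismax_sketch(["bit", "star"], qf, pf, phrase_slop=10, tie=0.01)
print(clauses["required"])
print(clauses["optional"])
```

Note that in the posted debug output the required clause is itself a phrase
on each qf field, so it is not only the pf clause that contains phrases.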


> My question is:
>
> 1. Am I understanding it correctly? If yes, how can I make sure a match can
> be found? If not, what's really happening?
>
>
Pretty much, except see above.


> 2. Why does dismax turn "star/bit" into a "star bit" phrase search instead
> of searching for "star" or "bit" or "starbit"?
>
>
Because it was told to by the configuration (see above).


> Thanks for your help!
>
>
> Alex
>
>