Posted to solr-user@lucene.apache.org by Kevin Osborn <os...@yahoo.com> on 2009/04/21 00:59:14 UTC

query on part number not matching

 
I have a manufacturer part number: CISCO7204VXR-CH. The indexer produces:

1        2       3      4
cisco    7204    vxr    ch
                        vxrch
                        cisco7204vxrch

If I query on CISCO7204VXR-CH, I get:

1        2       3      4
cisco    7204    vxr    ch

Everything matches. But if I query on CISCO7204VXRCH, I get:

1        2       3
cisco    7204    vxrch

This does not match on term 3, so the match fails and returns no
results. It seems to demand that every term in the index match, which
doesn't make a whole lot of sense. Only the query terms should need to
match, right?


      

Re: query on part number not matching

Posted by Kevin Osborn <os...@yahoo.com>.
Or in this case, I was using DisMax. My ps was 5, but I hadn't set the qs parameter. Setting qs to a small value did the trick.
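
For reference, a request carrying both slop settings might be built as in the sketch below. Only the use of DisMax and ps=5 come from the message above; the field name part_number, the qs value of 2, and the /select path are assumptions made for illustration.

```python
from urllib.parse import urlencode

# Assumed values are marked; only DisMax and ps=5 come from the thread.
params = {
    "defType": "dismax",
    "q": "CISCO7204VXRCH",
    "qf": "part_number",  # hypothetical field holding the part numbers
    "pf": "part_number",  # hypothetical, same field used for phrase boosting
    "ps": 5,              # phrase slop for the pf boost (value from the thread)
    "qs": 2,              # query phrase slop; "a small value", exact number assumed
}

query_string = urlencode(params)
print("/select?" + query_string)  # /select is an assumed handler path
```

The point is that ps alone does not help here: the phrase produced from the query text itself is governed by qs.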





Re: query on part number not matching

Posted by Yonik Seeley <yo...@lucidimagination.com>.
On Mon, Apr 20, 2009 at 8:50 PM, Kevin Osborn <os...@yahoo.com> wrote:
> Looks like the format didn't come through in the email. ch, vxrch, and cisco7204vxrch are all in position 4.

Ah... the traditional way to "handle" that case is to use a little
slop with the phrase query.

-Yonik
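
To see why a little slop helps, here is a rough sketch using the token positions reported in this thread. The function phrase_matches is a hypothetical, simplified model of a sloppy phrase match, not Lucene's actual scorer: it only checks how far the matched positions drift from a strictly consecutive sequence.

```python
from itertools import product

# Token positions for CISCO7204VXR-CH as reported in the thread.
index_positions = {
    "cisco": [1],
    "7204": [2],
    "vxr": [3],
    "ch": [4],
    "vxrch": [4],
    "cisco7204vxrch": [4],
}

def phrase_matches(query_terms, positions, slop=0):
    """Simplified sloppy-phrase check (illustrative only).

    Query term i is expected at start + i. For each combination of indexed
    positions, compute each term's shift from that expectation; the phrase
    matches when the spread of shifts is within `slop`.
    """
    candidates = [positions.get(term, []) for term in query_terms]
    for combo in product(*candidates):
        shifts = [pos - i for i, pos in enumerate(combo)]
        if max(shifts) - min(shifts) <= slop:
            return True
    return False

hyphenated = ["cisco", "7204", "vxr", "ch"]  # query CISCO7204VXR-CH
joined = ["cisco", "7204", "vxrch"]          # query CISCO7204VXRCH

print(phrase_matches(hyphenated, index_positions))      # exact phrase
print(phrase_matches(joined, index_positions))          # exact phrase
print(phrase_matches(joined, index_positions, slop=1))  # slop of 1
```

In this model the joined query fails as an exact phrase because vxrch sits at position 4 rather than 3, and succeeds once a slop of 1 is allowed.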

Re: query on part number not matching

Posted by Kevin Osborn <os...@yahoo.com>.
Looks like the format didn't come through in the email. ch, vxrch, and cisco7204vxrch are all in position 4.

But your suggestion of turning off catenateAll may work out. I'll have to do some testing to make sure that it doesn't have any unintended consequences. Specifically, I am worried about a case like "XYZ123-3" and the customer searching on "XYZ1233". Ideally, that would produce a match.





Re: query on part number not matching

Posted by Yonik Seeley <yo...@lucidimagination.com>.
On Mon, Apr 20, 2009 at 6:59 PM, Kevin Osborn <os...@yahoo.com> wrote:
>
> I have a manufacturer part number: CISCO7204VXR-CH. The indexer produces:
>
> 1        2       3      4
> cisco    7204    vxr    ch
>                        vxrch
>                        cisco7204vxrch

It looks like you're using catenateAll, which doesn't do any good if
the query analyzer splits on alpha-numeric transitions.  Turn that off
to save yourself some space.
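
A rough way to see this: if the query side splits on letter/digit transitions, neither form of the query ever produces the fully catenated token. The function split_part_number below is a hypothetical stand-in for that splitting, not the real Solr filter.

```python
import re

def split_part_number(text):
    """Rough stand-in for query-time splitting (illustrative only).

    Lowercases the input, then splits on non-alphanumeric characters and on
    letter/digit transitions by collecting runs of letters or digits.
    """
    return re.findall(r"[a-z]+|[0-9]+", text.lower())

with_hyphen = split_part_number("CISCO7204VXR-CH")
without_hyphen = split_part_number("CISCO7204VXRCH")

print(with_hyphen)     # ['cisco', '7204', 'vxr', 'ch']
print(without_hyphen)  # ['cisco', '7204', 'vxrch']
```

Neither result contains cisco7204vxrch, so under this assumption the catenateAll token in the index is never looked up by these queries.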


> If I query on CISCO7204VXR-CH, I get:
>
> 1        2       3      4
> cisco    7204    vxr    ch
>
> Everything matches. But if I query on CISCO7204VXRCH, I get
>
> 1        2       3
> cisco    7204    vxrch
>
> This does not match on term 3.

But it does...  The index has vxr, vxrch, and cisco7204vxrch all at position 3.


> So, the match fails in this case and returns
> no results. It seems like it is demanding that every term in the index
> matches, which doesn't make a whole lot of sense. Should just be the
> query, right?

Right.  Lucene doesn't really have phrase queries with optional terms
in it though.

-Yonik