Posted to solr-user@lucene.apache.org by Ben <be...@autonomic.net> on 2009/06/29 15:29:57 UTC

Excluding Characters and SubStrings in a Faceted Wildcard Query

Hello,

I've been using SOLR for a while now, but am stuck for information on
two issues:

1) Is it possible to exclude characters in a SOLR facet wildcard query?
e.g.
[^,]* to match any character except a ","?

2) Can one set up the facet wildcard query to return the exact
substrings it matched in the queried facet, rather than the whole string?

I hope somebody can help :)

Thanks,

Ben


Re: Excluding Characters and SubStrings in a Faceted Wildcard Query

Posted by Norberto Meijome <nu...@gmail.com>.
On Mon, 29 Jun 2009 15:10:59 +0100
Ben <be...@autonomic.net> wrote:

> Hi Erik,
> 
> I'm not sure exactly how much context you need here, so I'll try to keep 
> it short and expand as needed.
> 
> The column I am faceting on contains a comma-delimited set of vectors.
> Each vector is made up of {Make,Year,Model} e.g. 
> _ford_1996_focus,mercedes_1996_clk,ford_2000_focus
> 
> I have a custom request handler, where if I want to find all the cars 
> from 1996 I pass in a facet query for the Year (1996) which is 
> transformed to a wildcard facet query :
> 
> _*_1996_*
> 
> In other words, it'll match any record whose vector column contains a
> string that somewhere has a car from 1996.
> 
> Why not put the Make, Year and Model in separate columns and do a facet 
> query of multiple columns?... because once we've selected 1996, we 
> should (in the above example) then be offering "ford and mercedes" as 
> further facet choices, and nothing more. If the parts were in their own 
> columns, there would be no way to tie the Makes and Models to specific 
> years, for example.
> 
[...]

Hi,
It must be late and I probably need more $coffee... but isn't what you just
described (search for 1996, show 'ford', 'mercedes') how facets DO work?
Once you have the facet on the make field, and Solr has told you that both
'ford' and 'mercedes' are available in that field, it is up to you to search
for 'make=ford and date=1996' if you ONLY want fords, generation 1996...

cheers,
B
_________________________
{Beto|Norberto|Numard} Meijome

"He has the attention span of a lightning bolt."
  Robert Redford

I speak for myself, not my employer. Contents may be hot. Slippery when wet.
Reading disclaimers makes you go blind. Writing them is worse. You have been
Warned.
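
The separate-field approach Norberto describes could look roughly like the request below. This is only a sketch: the field names `make` and `year` (Norberto writes `date`), and the idea that each part is indexed as its own field, are assumptions not taken from the thread. `q`, `fq`, `facet`, `facet.field` and `rows` are standard Solr request parameters.

```python
from urllib.parse import urlencode


def facet_request_params(year: str) -> str:
    """Build a query string that filters on a year and facets on the make.

    The field names "year" and "make" are hypothetical.
    """
    params = [
        ("q", "*:*"),            # match all documents
        ("fq", f"year:{year}"),  # restrict to the selected year
        ("facet", "true"),       # turn faceting on
        ("facet.field", "make"), # count the makes among the matches
        ("rows", "0"),           # only the facet counts are needed
    ]
    return urlencode(params)


query_string = facet_request_params("1996")
print(query_string)
```

Appending this string to a Solr select URL would return one count per make among the records that match 1996. As Ben points out in the thread, when a single record holds several vehicles, this loses the pairing between a make and its year.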

Re: Excluding Characters and SubStrings in a Faceted Wildcard Query

Posted by Ben <be...@autonomic.net>.
Hi Erik,

I'm not sure exactly how much context you need here, so I'll try to keep 
it short and expand as needed.

The column I am faceting on contains a comma-delimited set of vectors.
Each vector is made up of {Make,Year,Model} e.g.
_ford_1996_focus,mercedes_1996_clk,ford_2000_focus

I have a custom request handler, where if I want to find all the cars 
from 1996 I pass in a facet query for the Year (1996) which is 
transformed to a wildcard facet query :

_*_1996_*

In other words, it'll match any record whose vector column contains a
string that somewhere has a car from 1996.

Why not put the Make, Year and Model in separate columns and do a facet 
query of multiple columns?... because once we've selected 1996, we 
should (in the above example) then be offering "ford and mercedes" as 
further facet choices, and nothing more. If the parts were in their own 
columns, there would be no way to tie the Makes and Models to specific 
years, for example.

At any rate, the wildcard search returns the entire matched value
(_ford_1996_focus,mercedes_1996_clk,ford_2000_focus). I then have to run
another RegExp over it to extract only the two parts (the first ford and
the mercedes) that were from 1996. This isn't using SOLR's cache very
effectively.

It would be excellent if SOLR could break up that comma-separated list
into three different parts, and run the RegExp over each, returning
only those which match. Is that what you're implying with Analysis? If
that were the case, I'd not need to worry about character exclusion.
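
The per-vector matching described above can be sketched outside Solr as follows. This is an illustration of the idea only, not something Solr does for you: the function name `vectors_for_year` is made up, and the pattern assumes that makes and models contain neither underscores nor commas.

```python
import re
from typing import List


def vectors_for_year(column_value: str, year: str) -> List[str]:
    """Split the comma-separated column and keep the vectors for one year."""
    # [^_,]+ plays the role of the "exclude characters" wildcard from the
    # question: it cannot run across a "," or "_" into a neighbouring part.
    pattern = re.compile(rf"^_?[^_,]+_{re.escape(year)}_[^_,]+$")
    return [v for v in column_value.split(",") if pattern.match(v)]


value = "_ford_1996_focus,mercedes_1996_clk,ford_2000_focus"
matches = vectors_for_year(value, "1996")
makes = [v.lstrip("_").split("_")[0] for v in matches]

print(matches)  # ['_ford_1996_focus', 'mercedes_1996_clk']
print(makes)    # ['ford', 'mercedes']
```

Because each vector is tested on its own, the 2000 ford never appears in the result, and no character exclusion across commas is needed.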

Sorry if that's a bit fuzzy... it's hard trying to explain enough to be
useful, but not so much that it turns into an essay!

Thanks,
Ben

The solution I'm using is to form a vector

Erik Hatcher wrote:
> Ben,
>
> Could you post an example of the type of data you're dealing with and 
> how you want it handled?   I suspect there is a way to accomplish what 
> you want using an analyzed field, or by preprocessing the data you're 
> indexing.
>
>     Erik


Re: Excluding Characters and SubStrings in a Faceted Wildcard Query

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
Ben,

Could you post an example of the type of data you're dealing with and  
how you want it handled?   I suspect there is a way to accomplish what  
you want using an analyzed field, or by preprocessing the data you're  
indexing.

	Erik
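
One way the preprocessing Erik mentions might look is sketched below. Everything specific here is an assumption rather than something stated in the thread: the multi-valued field name `vehicle`, the year-first token order, and the helper name `to_year_first_tokens`. `facet.field` and `facet.prefix` are standard Solr faceting parameters; `facet.prefix` limits the returned facet values to those starting with the given string.

```python
from typing import List
from urllib.parse import urlencode


def to_year_first_tokens(column_value: str) -> List[str]:
    """Turn "_make_year_model,..." into one "year_make_model" value per vehicle.

    Assumes every vector has exactly three underscore-separated parts;
    anything else raises ValueError on unpacking.
    """
    tokens = []
    for vector in column_value.split(","):
        make, year, model = vector.strip("_").split("_")
        tokens.append(f"{year}_{make}_{model}")
    return tokens


# Hypothetical document to index, with "vehicle" as a multi-valued field.
doc = {
    "id": "1",
    "vehicle": to_year_first_tokens(
        "_ford_1996_focus,mercedes_1996_clk,ford_2000_focus"
    ),
}

# Facet on the individual values, keeping only those that start with "1996_".
facet_params = urlencode([
    ("q", "*:*"),
    ("rows", "0"),
    ("facet", "true"),
    ("facet.field", "vehicle"),
    ("facet.prefix", "1996_"),
])

print(doc["vehicle"])
print(facet_params)
```

With the values indexed separately, each facet entry is a single vehicle rather than the whole comma-separated string, so the make and model stay tied to their year.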
