Posted to solr-user@lucene.apache.org by Abeba Tensai <ab...@gmail.com> on 2008/04/12 05:51:19 UTC

filtering search using regex

hi,

I have a question: I need to be able to filter a search using a regex. I
cannot use facets, as the filtering is pretty complex (but easy to perform
using a regex).
For instance, I have stored the value 12G in the field ID, and I want to
keep only the results that are > 12 with G, so for instance
14G will match but 8G and 14B will not. Using a regex this is simply
"[1-9]+[3-9]G".
I am wondering what the right approach is to tackle such a situation.

thanks.
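
A minimal Python sketch, not part of the original post, that checks the posted pattern against the stated rule ("greater than 12, with G"). The helper names regex_match and numeric_match are hypothetical, and the sketch assumes every value is a run of digits followed by a single letter.

```python
import re

# The pattern exactly as posted, applied to the whole field value.
POSTED = re.compile(r"[1-9]+[3-9]G")


def regex_match(value):
    """True if the posted pattern matches the entire value."""
    return POSTED.fullmatch(value) is not None


def numeric_match(value, minimum=12, suffix="G"):
    """True if the value is <digits><letter>, the letter equals
    suffix, and the number is strictly greater than minimum."""
    m = re.fullmatch(r"(\d+)([A-Za-z])", value)
    return bool(m) and m.group(2) == suffix and int(m.group(1)) > minimum


# Compare the two checks on the values from the post plus two more.
for value in ["14G", "8G", "14B", "20G", "21G"]:
    print(value, regex_match(value), numeric_match(value))
```

The two checks agree on 14G, 8G and 14B, but the pattern rejects 20G and 21G although both are greater than 12, because the last digit must fall in 3-9 and no digit may be 0. Encoding a numeric comparison in a character-class pattern is fragile in this way.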

Re: filtering search using regex

Posted by Chris Hostetter <ho...@fucit.org>.
Solr doesn't provide any regex-based searching features out of the box.

There are some regex-based query classes in Lucene; if you wrote a custom
Solr plugin to do the query parsing, you could use them.

Your question appears to be an "XY Problem" ... that is: you are dealing
with "X", you are assuming "Y" will help you, and you are asking about "Y"
without giving more details about the "X" so that we can understand the
full issue.  Perhaps the best solution doesn't involve "Y" at all?
See Also: http://www.perlmonks.org/index.pl?node_id=542341
http://people.apache.org/~hossman/#xyproblem

If you could elaborate a little more on the exact use case you are trying
to solve, people might be able to offer you alternative solutions you've
never thought of ... supporting regex search is a much harder problem than
finding creative ways to support range queries on unclean data (which is
what the root of your issue seems to be).

Tell us more about your data, and the types of queries you need to support
(without making the assumption that regexes are the best way to
support them).


-Hoss


Re: filtering search using regex

Posted by Abeba Tensai <ab...@gmail.com>.
hi mathieu,

I cannot split the data in a very meaningful way. My example was a bit
misleading.
Basically, the field FILTER can be indexed with values such as 12K or
13B-14K, and the user might want results where FILTER is > 13 with K. The
first one (12K) won't match because 12 < 13, but the second will (the string
contains 14K, and 14 > 13 with K). So I don't think I can alter the way I
index the data to be able to use facets; it is just not possible.
I am just trying to let people use regular free-text search and then filter
the results using a regex.

thanks.

On 4/12/08, Mathieu Lecarme <ma...@garambrogne.net> wrote:
> A regex match is only useful when you first select a prefix, which is a
> basic Lucene feature: put the pointer just up to the first term beginning
> with "toto".
> Your query doesn't have any prefix.
> What happens if you split your data into two fields, "12" and "G", "14" and
> "B", or, better, if it's a number, index "12G" as "12000000"?
>
> M.
>

Re: filtering search using regex

Posted by Mathieu Lecarme <ma...@garambrogne.net>.
A regex match is only useful when you first select a prefix, which is a
basic Lucene feature: put the pointer just up to the first term
beginning with "toto".
Your query doesn't have any prefix.
What happens if you split your data into two fields, "12" and "G", "14"
and "B", or, better, if it's a number, index "12G" as "12000000"?

M.
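
The splitting idea can also be applied to compound values such as 13B-14K. Below is a minimal Python sketch, not taken from the thread, of tokenizing a value into (number, letter) pairs before indexing. The names split_filter_value and matches are hypothetical, and the sketch assumes each token is a run of digits followed by one uppercase letter.

```python
import re

# One token = digits followed by a single uppercase letter, e.g. "14K".
TOKEN = re.compile(r"(\d+)([A-Z])")


def split_filter_value(raw):
    """Return {letter: [numbers]} for a raw value such as '13B-14K'."""
    parsed = {}
    for number, letter in TOKEN.findall(raw):
        parsed.setdefault(letter, []).append(int(number))
    return parsed


def matches(raw, letter, minimum):
    """True if any token with the given letter has a number
    strictly greater than minimum."""
    numbers = split_filter_value(raw).get(letter, [])
    return any(n > minimum for n in numbers)


# The example from the thread: FILTER > 13 with K.
for raw in ["12K", "13B-14K"]:
    print(raw, split_filter_value(raw), matches(raw, "K", 13))
```

Keeping the numbers grouped per letter matters: with one multi-valued number field and a separate multi-valued letter field, 13B-14K would wrongly satisfy "> 13 with B", since 14 and B both occur in the document. One numeric field per letter would avoid that and would let an ordinary numeric range filter replace the regex.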