Posted to solr-user@lucene.apache.org by "Arumugam, Suresh" <Su...@emc.com> on 2015/01/24 05:58:06 UTC

Solr regex query help

Hi All,

We have indexed our documents into Solr but are not able to query them using a regex.

Our data sits in a text field indexed using the ClassicTokenizer, and looks like this:

                1b ::PIPE:: 04/14/2014 ::PIPE:: 01:32:48 ::PIPE:: BMC Power/Reset action  ::PIPE:: Delayed shutdown timer disabled ::PIPE:: Asserted

We tried to look up this string with the following regex:
PIPE*[0-9]{2}\/[0-9}{2}\/[0-9]{4}*Delayed shutdown*Asserted

Since the analyzer tokenized the data, the regex match is happening on the individual terms, and it's not working as we expect.

Can you please help us find an equivalent way to query this in Solr?

The following are the details about our environment.


1. Solr 4.10.3 as well as Solr 4.8
2. JDK 1.7_51
3. SolrConfig.xml & Schema.xml attached.

The regex query below works:
msg:/[0-9]{2}/

But when we want to match more than one term, the regex doesn't seem to work.
Please help us resolve this issue.

Thanks in advance.

Regards,
Suresh.A

Re: Solr regex query help

Posted by Jack Krupansky <ja...@gmail.com>.
When I first read your post I thought this example had something to do with
"pipe", but now I realize that "::PIPE::" is simply a symbolic
representation of what we software people call a "pipe", namely the
vertical bar character used as a field separator. Usually, terms and tokens
are all of the same type and for a single field, but your use case seems to
be a set of discrete fields that have been mashed into a single field with
"pipe" separators.

My suggestion is that either you fix the problem upstream, so that separate
fields are sent to Solr, or maybe use a Solr update processor to pull the
string apart and store the individual pieces as separate fields.
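
As a rough illustration of pulling the string apart before it reaches Solr, here is a minimal Python sketch. The field names are hypothetical, since the thread never names the columns, and the sketch assumes every record has exactly six "::PIPE::"-separated parts.

```python
# Hypothetical field names; the thread does not say what each column means.
FIELD_NAMES = ["record_id", "event_date", "event_time",
               "sensor", "description", "state"]


def split_record(raw, separator="::PIPE::"):
    """Split one pipe-delimited log line into a dict of named fields."""
    parts = [part.strip() for part in raw.split(separator)]
    if len(parts) != len(FIELD_NAMES):
        raise ValueError(
            "expected %d fields, got %d" % (len(FIELD_NAMES), len(parts)))
    return dict(zip(FIELD_NAMES, parts))


raw = ("1b ::PIPE:: 04/14/2014 ::PIPE:: 01:32:48 ::PIPE:: "
       "BMC Power/Reset action  ::PIPE:: "
       "Delayed shutdown timer disabled ::PIPE:: Asserted")

doc = split_record(raw)
for name, value in doc.items():
    print(name, "=", value)
```

Each key of the resulting dict could then be sent to Solr as its own field, so a query can name the date, the description, or the state directly.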

As always, the first question is not how to store your data, but how your
users intend to access your data. Post some sample queries. I imagine that
any sane user would like to reference individual fields by name.


-- Jack Krupansky


Re: Solr regex query help

Posted by Erik Hatcher <er...@gmail.com>.
If you make your field type "string" the regex may work as expected.  

But as others said, splitting into separate fields is likely more flexible. 
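
To see why an untokenized field changes the picture, the sketch below imitates a whole-value match in Python. It rests on several assumptions: the "*" wildcards in the original attempt are read as ".*", Python's re.fullmatch stands in for Lucene's whole-term matching (the two regex dialects are not identical), and the field name msg_str is made up for the example.

```python
import re

value = ("1b ::PIPE:: 04/14/2014 ::PIPE:: 01:32:48 ::PIPE:: "
         "BMC Power/Reset action  ::PIPE:: "
         "Delayed shutdown timer disabled ::PIPE:: Asserted")

# ".*" replaces the bare "*" wildcards used in the original attempt.
pattern = r".*PIPE.*[0-9]{2}/[0-9]{2}/[0-9]{4}.*Delayed shutdown.*Asserted"

# With a string field the whole value is one term, so the regex has to
# cover the entire value; re.fullmatch imitates that requirement.
whole_value_matches = re.fullmatch(pattern, value) is not None
print(whole_value_matches)

# Hypothetical Solr query against an untokenized copy of the field.
# Forward slashes are escaped because "/" delimits the regex in the query.
solr_query = "msg_str:/" + pattern.replace("/", r"\/") + "/"
print(solr_query)
```

Whether the spaces inside the regex need extra escaping depends on the query parser in use, so the generated query is worth trying by hand first.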

    Erik



RE: Solr regex query help

Posted by "Arumugam, Suresh" <Su...@emc.com>.
Hi Erick,

Thanks for the response.

I understand why the regex match is not working.

The help I am looking for from this forum is as follows:

1. All the example regex queries match a single term only. Is there a way in Solr to match multiple terms?
2. How can we write a similar query in Solr? A sample query would help us more than just the name of a query parser.
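
One possible shape for such a sample query, sketched in Python, is to combine several per-term clauses rather than one regex spanning terms. This is only a sketch under assumptions: it presumes the date survives analysis as a single term (worth checking on the admin/analysis page), it does not enforce the order of the clauses, and only the field name msg comes from the thread.

```python
from urllib.parse import urlencode

clauses = [
    r"msg:/[0-9]{2}\/[0-9]{2}\/[0-9]{4}/",  # regex applied to one term (the date)
    'msg:"Delayed shutdown"',               # phrase query: two adjacent terms
    "msg:Asserted",                         # plain single-term query
]

# Every clause must match somewhere in the same document.
q = " AND ".join(clauses)
print(q)

# URL-encoded parameters, ready to append to a /select request.
params = urlencode({"q": q, "wt": "json"})
print(params)
```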

The lack of documentation on these features is causing my confusion.

Thanks in advance for your help.

Regards,
Suresh.A

-----Original Message-----
From: Erick Erickson [mailto:erickerickson@gmail.com] 
Sent: Friday, January 23, 2015 9:57 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr regex query help

Right. As I mentioned on the original JIRA, the regex match is happening on _terms_.
You are conflating the original input (the entire field) with the individual terms that the regex is applied to.

I suggest that you look at the admin/analysis page. There you'll see the terms that are indexed and you'll see that the regex simply cannot work since it assumes that the regex is applied to the entire input rather than the results of the analysis chain.

I further suggest that you explore tokenization and how individual terms are searched. The admin/analysis page is invaluable in this endeavor.

The root cause of your confusion is that, given you're using ClassicTokenizer, you have a bunch of individual terms that are being searched, _not_ the whole input. So the regex is bound to fail since you're thinking in terms of the entire input rather than the result of your analysis chain, i.e. tokenization + filters as defined in schema.xml.

FWIW,
Erick


Re: Solr regex query help

Posted by Erick Erickson <er...@gmail.com>.
Right. As I mentioned on the original JIRA, the regex match is happening on
_terms_. You are conflating the original input (the entire field) with the
individual terms that the regex is applied to.

I suggest that you look at the admin/analysis page. There you'll see the
terms that are indexed and you'll see that the regex simply cannot work since
it assumes that the regex is applied to the entire input rather than the
results of the analysis chain.

I further suggest that you explore tokenization and how individual terms are
searched. The admin/analysis page is invaluable in this endeavor.

The root cause of your confusion is that, given you're using
ClassicTokenizer, you have a bunch of individual terms that are being
searched, _not_ the whole input. So the regex is bound to fail since you're
thinking in terms of the entire input rather than the result of your analysis
chain, i.e. tokenization + filters as defined in schema.xml.
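
The difference between matching the whole input and matching individual terms can be imitated outside Solr. The Python sketch below uses a deliberately crude tokenizer (runs of letters, digits and "/"), which is an assumption and NOT the real ClassicTokenizer; the admin/analysis page remains the authority on the actual terms.

```python
import re

value = ("1b ::PIPE:: 04/14/2014 ::PIPE:: 01:32:48 ::PIPE:: "
         "BMC Power/Reset action  ::PIPE:: "
         "Delayed shutdown timer disabled ::PIPE:: Asserted")

# Crude stand-in for an analysis chain, not the real ClassicTokenizer.
tokens = re.findall(r"[A-Za-z0-9/]+", value)
print(tokens)

multi_term = r".*[0-9]{2}/[0-9]{2}/[0-9]{4}.*Delayed shutdown.*Asserted"
single_term = r"[0-9]{2}"

# A regex query is tested against each term separately, never the full input.
multi_hits = [t for t in tokens if re.fullmatch(multi_term, t)]
single_hits = [t for t in tokens if re.fullmatch(single_term, t)]

print(multi_hits)   # no single term contains the date, the phrase and the state
print(single_hits)  # two-digit terms produced by this toy tokenizer
```

Under these assumptions the multi-term pattern finds nothing while the two-digit pattern finds several terms, which mirrors the behaviour reported in the original question.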

FWIW,
Erick
