Posted to solr-user@lucene.apache.org by Jai <ja...@gmail.com> on 2013/08/27 20:31:31 UTC

Solr 4.2 Regular expression, returning only matched substring

Hi,

Is it possible to get only the matched substring of a text/string type
field in the response?
I am trying to search with a regular expression and facet on the different
strings (substrings of the field) that match this regular expression.

For example, if I write a regular expression to match email addresses, is
there any way to return only the matched email address from the indexed
sentence, so that I can facet on it?

I would really appreciate any help.

thanks and regards
jai

Re: Solr 4.2 Regular expression, returning only matched substring

Posted by jai2 <ja...@gmail.com>.
Hi Erick,

Thanks a lot for your reply. I am still looking for a feasible solution;
currently I can only think of creating another core whose schema has field
types using the PatternTokenizer class, loading it, and re-indexing the
search results into this temporary core.

Is there any way to provide a list of patterns for tokenizing, like we do
for the stopword filter with a text file? Otherwise we will have to create
a field for each pattern individually.
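A client-side sketch of the idea in the question: patterns are read one per line, in the spirit of a stopwords file, and each pattern's whole matches (group 0) become the values to index into a facetable field. This is plain Python, not a Solr feature; as far as I know, Solr's PatternTokenizerFactory is configured with a single `pattern` (plus an optional `group`). The pattern file contents and the names `load_patterns` and `extract_matches` are hypothetical.

```python
import re
from typing import Dict, Iterable, List


def load_patterns(lines: Iterable[str]) -> List[re.Pattern]:
    """Compile one regex per non-blank line, skipping '#' comment lines
    (a convention assumed here, similar to Solr word-list files)."""
    patterns = []
    for line in lines:
        line = line.strip()
        if line and not line.startswith("#"):
            patterns.append(re.compile(line))
    return patterns


def extract_matches(text: str, patterns: List[re.Pattern]) -> Dict[str, List[str]]:
    """Map each pattern to the whole-match substrings (group 0) found in text."""
    return {p.pattern: [m.group(0) for m in p.finditer(text)] for p in patterns}


# Hypothetical pattern file: one regex per line.
PATTERN_FILE = r"""
# three-digit numbers
\b[0-9]{3}\b
# a deliberately simple e-mail pattern, not RFC-complete
[\w.+-]+@[\w-]+\.[\w.]+
""".splitlines()

patterns = load_patterns(PATTERN_FILE)
matches = extract_matches("Call 100 or mail jai@example.com, ref 500.", patterns)
print(matches)
```

The extracted values would then be indexed into a separate multivalued string field, so that faceting on that field counts each distinct match.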

thanks and regards
jai



On Wed, Aug 28, 2013 at 4:45 PM, Erick Erickson [via Lucene] wrote:

> Ah, OK. Nothing springs to mind. Even faceting on the individual values
> of the field counts _documents_ that match, but doesn't tell you
> which particular values matched. I suppose that in that case you could
> run your regex over the returned facet labels.
>
> But that's a really ugly solution. The problem is that in a field with 1M
> unique values you'd perhaps get a list 1M long, which wouldn't perform
> well at all.
>
> Depending on your setup, you could enumerate your terms up front (see
> TermsComponent), using terms.regex to get a list of all terms that match
> your regex, then do some relatively painful facet querying on the long list
> of returned values. Again, not something I'd do in a high-query-volume
> environment; it depends, I guess, on how busy your website is.
>
> Best
> Erick
>

--
View this message in context: http://lucene.472066.n3.nabble.com/Solr-4-2-Regular-expression-returning-only-matched-substring-tp4086868p4087790.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Solr 4.2 Regular expression, returning only matched substring

Posted by Erick Erickson <er...@gmail.com>.
Ah, OK. Nothing springs to mind. Even faceting on the individual values
of the field counts _documents_ that match, but doesn't tell you
which particular values matched. I suppose that in that case you could
run your regex over the returned facet labels.

But that's a really ugly solution. The problem is that in a field with 1M
unique values you'd perhaps get a list 1M long, which wouldn't perform
well at all.
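The first suggestion, filtering the returned facet labels with the regex on the client, might look like the sketch below. It assumes the default JSON layout for facet_fields (a flat [label, count, label, count, ...] list) and a hypothetical field named `body`. The counts are document counts, as noted above, and every value has to be returned (for example with facet.limit=-1), which is exactly where a field with 1M unique values hurts.

```python
import re
from typing import Dict, List, Union


def filter_facet_labels(flat_counts: List[Union[str, int]], regex: str) -> Dict[str, int]:
    """Keep only facet labels that fully match `regex`.

    Assumes the flat [label, count, label, count, ...] list that Solr's JSON
    writer produces for facet_fields by default.
    """
    pattern = re.compile(regex)
    pairs = zip(flat_counts[0::2], flat_counts[1::2])
    return {str(label): int(count) for label, count in pairs
            if pattern.fullmatch(str(label))}


# Hypothetical facet_fields section of a response (document counts per term).
facet_fields = {"body": ["the", 40, "100", 10, "call", 7, "500", 5, "2013", 3]}
print(filter_facet_labels(facet_fields["body"], r"[0-9]{3}"))
```

Using `fullmatch` keeps "2013" out of the result, since a facet label is a whole term and the regex should describe the whole term.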

Depending on your setup, you could enumerate your terms up front (see
TermsComponent), using terms.regex to get a list of all terms that match
your regex, then do some relatively painful facet querying on the long list
of returned values. Again, not something I'd do in a high-query-volume
environment; it depends, I guess, on how busy your website is.
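The TermsComponent route could be sketched as two requests: one that lists indexed terms matching the regex via terms.regex, then one select request with a facet.query per returned term, so each term gets its own count. The core URL, the `/terms` handler path (assumed to be registered, as in the Solr 4 example configuration; check your own solrconfig.xml), and the field name `body` are assumptions. The code only builds the URLs and does not contact Solr.

```python
from typing import List
from urllib.parse import urlencode

SOLR = "http://localhost:8983/solr/collection1"  # assumed core URL


def terms_request(field: str, regex: str, limit: int = -1) -> str:
    """Build a TermsComponent request for indexed terms of `field` matching `regex`.

    A negative terms.limit asks for all matching terms instead of the default 10.
    """
    params = {"terms.fl": field, "terms.regex": regex,
              "terms.limit": limit, "wt": "json"}
    return f"{SOLR}/terms?{urlencode(params)}"


def phrase(term: str) -> str:
    """Wrap a term in double quotes, backslash-escaping backslashes and quotes."""
    return '"' + term.replace("\\", "\\\\").replace('"', '\\"') + '"'


def facet_query_request(field: str, terms: List[str]) -> str:
    """Build one select request with a facet.query per term (one count each)."""
    # q=*:* is a placeholder; substitute the real user query here.
    params = [("q", "*:*"), ("rows", 0), ("facet", "true"), ("wt", "json")]
    params += [("facet.query", f"{field}:{phrase(t)}") for t in terms]
    return f"{SOLR}/select?{urlencode(params)}"


print(terms_request("body", "[0-9]{3}"))
print(facet_query_request("body", ["100", "500"]))
```

The terms counts are index-wide, while each facet.query is counted only within the documents matching q, which is why the second step is needed at all. It also adds one parameter per term, so a long term list makes for a long and slow request.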

Best
Erick


On Wed, Aug 28, 2013 at 4:18 AM, jai2 <ja...@gmail.com> wrote:

> Hi Erick,
>
> I appreciate your reply, but facet.query gives the count of matches, not
> the count of unique pattern matches.
>
> If I give the regular expression [0-9]{3} to match a 3-digit number, it
> will return the total occurrences of three-digit numbers, but I want to know
> the occurrences of each unique 3-digit number. Let's say the number 100
> occurred 10 times and 500 occurred 5 times: facet.query will return a count
> of 15 instead of giving the counts for 100 and 500 individually.
>
> I hope I made myself clear. Is there any way to do this?
>
> thanks and regards
> jai
>

Re: Solr 4.2 Regular expression, returning only matched substring

Posted by jai2 <ja...@gmail.com>.
Hi Erick,

I appreciate your reply, but facet.query gives the count of matches, not
the count of unique pattern matches.

If I give the regular expression [0-9]{3} to match a 3-digit number, it
will return the total occurrences of three-digit numbers, but I want to know
the occurrences of each unique 3-digit number. Let's say the number 100
occurred 10 times and 500 occurred 5 times: facet.query will return a count
of 15 instead of giving the counts for 100 and 500 individually.
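To make the difference concrete, here is a small client-side count of distinct matches over hypothetical documents built to reproduce the numbers above. The per-value counts are what the question asks for; their sum corresponds to the single number that one regex facet.query reports (in this data each document holds exactly one match, so document and occurrence counts coincide). The `\b` word boundaries are an addition so that digits inside longer numbers are not counted.

```python
import re
from collections import Counter
from typing import Iterable


def count_distinct_matches(texts: Iterable[str], regex: str) -> Counter:
    """Count each distinct matched substring across all texts (occurrences, not documents)."""
    pattern = re.compile(regex)
    counts = Counter()
    for text in texts:
        counts.update(m.group(0) for m in pattern.finditer(text))
    return counts


# Hypothetical data reproducing the example: "100" ten times, "500" five times.
docs = ["order 100 shipped"] * 10 + ["order 500 pending"] * 5
per_value = count_distinct_matches(docs, r"\b[0-9]{3}\b")
print(dict(per_value))            # per-value counts: 100 -> 10, 500 -> 5
print(sum(per_value.values()))    # the single aggregate: 15
```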

I hope I made myself clear. Is there any way to do this?

thanks and regards
jai




Re: Solr 4.2 Regular expression, returning only matched substring

Posted by Erick Erickson <er...@gmail.com>.
You can facet by an arbitrary query; does that work? See facet.query.
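For reference, a single facet.query can use the regular-expression query syntax of Solr 4's standard query parser (`field:/regex/`). Below is a sketch of such a request, assuming a core URL and a field named `body`. As jai explains elsewhere in this thread, it returns one count for all matching documents rather than one count per distinct value.

```python
from urllib.parse import urlencode

SOLR = "http://localhost:8983/solr/collection1"  # assumed core URL


def regex_facet_request(field: str, regex: str) -> str:
    """Build a select request whose facet.query is a Lucene regex query.

    The regex must match a whole indexed term, and a '/' inside it would need
    escaping (not handled here). The response holds a single count under
    facet_counts/facet_queries for the entire expression.
    """
    params = [("q", "*:*"), ("rows", 0), ("facet", "true"),
              ("facet.query", f"{field}:/{regex}/"), ("wt", "json")]
    return f"{SOLR}/select?{urlencode(params)}"


print(regex_facet_request("body", "[0-9]{3}"))
```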


Best
Erick

