Posted to solr-user@lucene.apache.org by Sebastian M <mi...@yahoo.com> on 2011/01/11 17:22:01 UTC
default RegexFragmenter
Hello,
I'm investigating an issue where spellcheck queries are tokenized without
being explicitly told to do so, resulting in suggestions such as
"www.www.product4sale.com.com" for queries such as
"www.product4sale.com".
The default RegexFragmenter (name="regex") uses the regular
expression:
[-\w ,/\n\"']{20,200}
I understand parts of it, but I'm not sure what the leading - sign or the
slash midway through it does.
I would like to tailor this regular expression so that query terms such as
"www.product4sale.com" are not broken apart at the periods but are kept
as they are.
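A minimal sketch of one possible tailoring, using Python's re module as a stand-in for Java's regex engine (an assumption; the two behave the same for this pattern except that \w is Unicode-aware in Python and ASCII-only by default in Java). The pattern WITH_PERIOD is hypothetical: it is the default class with "." appended. Inside a character class "." is a literal period, so a period no longer ends a run. The sample query string is made up for illustration, and this only shows the regex in isolation, not whether Solr's spellchecker consults it.

```python
import re

# Default pattern as quoted above.
DEFAULT_PATTERN = re.compile(r"""[-\w ,/\n\"']{20,200}""")
# Hypothetical tailored pattern: same class plus a literal "." at the end.
WITH_PERIOD = re.compile(r"""[-\w ,/\n\"'.]{20,200}""")

query = "see www.product4sale.com for details"

# Default: every period ends a run, and each piece ("see www",
# "product4sale", "com for details") is shorter than 20 characters.
print(DEFAULT_PATTERN.findall(query))

# Tailored: periods are allowed, so the whole 36-character string is one run.
print(WITH_PERIOD.findall(query))
```

The first call prints an empty list and the second prints the whole query as a single match, which is the "kept as they are" behavior described above, at least at the level of the regular expression itself.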
Any suggestions or answers are highly appreciated!
Sebastian
--
View this message in context: http://lucene.472066.n3.nabble.com/default-RegexFragmenter-tp2235106p2235106.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: default RegexFragmenter
Posted by Otis Gospodnetic <ot...@yahoo.com>.
Sebastian,
If I remember my regular expressions, that - and / are really just that. The
stuff inside the square brackets means "any one of the characters between [ and ]".
- and / are just two of those characters, along with newline, space, comma, etc.
Because the - comes first in the class, it is a literal hyphen rather than a
range marker.
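A minimal Python sketch of what the character class accepts, assuming Python's re behaves like Java's regex engine here (it does for this pattern, except that \w is Unicode-aware in Python and ASCII-only by default in Java). The helper fragments() and the sample text are hypothetical; the sketch reproduces only the regex matching, not the rest of Solr's RegexFragmenter logic.

```python
import re

# Default pattern from the thread: runs of 20-200 characters drawn from
# hyphen, word characters, space, comma, slash, newline, double and single quote.
DEFAULT_PATTERN = re.compile(r"""[-\w ,/\n\"']{20,200}""")

def fragments(text, pattern=DEFAULT_PATTERN):
    """Hypothetical helper: all non-overlapping runs the pattern matches."""
    return pattern.findall(text)

text = 'red-and/blue widgets, "new" stock. Visit www.product4sale.com today'

# "-" and "/" are ordinary members of the class, so "red-and/blue" stays
# inside one run; the period is not in the class, so the run stops there.
# Everything after it splits at periods into pieces shorter than 20 chars.
print(fragments(text))
```

This prints a single match, `red-and/blue widgets, "new" stock`. The text after the first period produces no match at all, because " Visit www", "product4sale" and "com today" are each under the 20-character minimum.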
Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/