Posted to dev@lucene.apache.org by Jack Krupansky <ja...@basetechnology.com> on 2013/06/26 18:35:35 UTC
Re: [Solr Wiki] Update of "AnalyzersTokenizersTokenFilters" by ErikHatcher
Doc bug: "solr.PatternCaptureGroupTokenFilter" should be
"solr.PatternCaptureGroupTokenFilterFactory".
Both the wiki and the Javadoc have the same issue.
Also, I just happened to notice that there is no unit test for the factory,
unlike other filter factories.
-- Jack Krupansky
-----Original Message-----
From: Apache Wiki
Sent: Tuesday, June 25, 2013 10:48 AM
To: Apache Wiki
Subject: [Solr Wiki] Update of "AnalyzersTokenizersTokenFilters" by ErikHatcher
Dear Wiki user,
You have subscribed to a wiki page or wiki category on "Solr Wiki" for
change notification.
The "AnalyzersTokenizersTokenFilters" page has been changed by ErikHatcher:
https://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters?action=diff&rev1=146&rev2=147
Comment:
added PatternCaptureGroupFilterFactory
. Example: `" Kittens! ", "Duck" ==> "Kittens!", "Duck"`.
Optionally, the "updateOffsets" attribute will update the start and end
position offsets.
+
+ <<Anchor(PatternCaptureGroupFilter)>>
+
+ === solr.PatternCaptureGroupFilterFactory ===
+ <!> [[Solr4.4]]
+
+ Emits a token for each capture group in a regular expression.
+
+ For example, the following definition will tokenize the input text "http://www.foo.com/index" into "http://www.foo.com" and "www.foo.com".
+
+ {{{
+ <fieldType name="url_base" class="solr.TextField" positionIncrementGap="100">
+   <analyzer>
+     <tokenizer class="solr.KeywordTokenizerFactory"/>
+     <filter class="solr.PatternCaptureGroupTokenFilter" pattern="(https?://([a-zA-Z\-_0-9.]+))" preserve_original="false"/>
+   </analyzer>
+ </fieldType>
+ }}}
+
+ If none of the patterns match, the original token is emitted unchanged. If preserve_original is true, the original token is emitted in addition to the captured groups.
+
=== solr.PatternReplaceFilterFactory ===
Like the !PatternReplaceCharFilterFactory, but operates post-tokenization.
See "When to use a Char Filter vs. a Token Filter" above.
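A rough way to see what the PatternCaptureGroupFilterFactory entry above describes is to model it with plain java.util.regex. This is a simplified approximation, not Lucene's PatternCaptureGroupTokenFilter: the class and method names (CaptureGroupSketch, captureGroupTokens) are hypothetical, it handles a single pattern, and it ignores offsets and position increments. It assumes, per the wiki text, that each non-empty capture group becomes a token and that the original token is kept when preserve_original is true or when no group is emitted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical, simplified model of the capture-group behavior described in
// the wiki entry. NOT Lucene's implementation: single pattern only, no
// offsets, no position increments.
public class CaptureGroupSketch {

    /** Returns the token texts that a single input token would expand to. */
    public static List<String> captureGroupTokens(String token, Pattern pattern,
                                                  boolean preserveOriginal) {
        List<String> out = new ArrayList<>();
        if (preserveOriginal) {
            out.add(token); // keep the original token when requested
        }
        Matcher m = pattern.matcher(token);
        boolean emittedAny = false;
        while (m.find()) {
            // Group 0 is the whole match; only explicit groups 1..n are emitted.
            for (int g = 1; g <= m.groupCount(); g++) {
                String group = m.group(g);
                if (group == null || group.isEmpty()) {
                    continue; // unmatched or empty groups yield no token
                }
                if (preserveOriginal && group.equals(token)) {
                    continue; // avoid emitting the original token twice
                }
                out.add(group);
                emittedAny = true;
            }
        }
        if (!preserveOriginal && !emittedAny) {
            out.add(token); // nothing captured: fall back to the original token
        }
        return out;
    }

    public static void main(String[] args) {
        // Same pattern as the url_base example (backslash doubled for Java).
        Pattern url = Pattern.compile("(https?://([a-zA-Z\\-_0-9.]+))");
        System.out.println(captureGroupTokens("http://www.foo.com/index", url, false));
        // [http://www.foo.com, www.foo.com]
        System.out.println(captureGroupTokens("not a url", url, false));
        // [not a url]
    }
}
```

With preserve_original set to true, the same URL would expand to the original token followed by both captured groups.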
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org
Re: [Solr Wiki] Update of "AnalyzersTokenizersTokenFilters" by ErikHatcher
Posted by Jack Krupansky <ja...@basetechnology.com>.
Sigh... make that "solr.PatternCaptureGroupFilterFactory"
-- Jack Krupansky