Posted to solr-dev@lucene.apache.org by "Koji Sekiguchi (JIRA)" <ji...@apache.org> on 2009/12/13 17:53:18 UTC
[jira] Created: (SOLR-1653) add PatternReplaceCharFilter
add PatternReplaceCharFilter
----------------------------
Key: SOLR-1653
URL: https://issues.apache.org/jira/browse/SOLR-1653
Project: Solr
Issue Type: New Feature
Components: Schema and Analysis
Affects Versions: 1.4
Reporter: Koji Sekiguchi
Priority: Minor
Fix For: 1.5
Add a new CharFilter that uses a regular expression to select the text to be replaced in the char stream.
Usage:
{code:title=schema.xml}
<fieldType name="textCharNorm" class="solr.TextField" positionIncrementGap="100" >
  <analyzer>
    <charFilter class="solr.PatternReplaceCharFilterFactory"
                groupedPattern="([nN][oO]\.)\s*(\d+)"
                replaceGroups="1,2" blockDelimiters=":;"/>
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
</fieldType>
{code}
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1653) add PatternReplaceCharFilter
Posted by "Koji Sekiguchi (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/SOLR-1653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12798271#action_12798271 ]
Koji Sekiguchi commented on SOLR-1653:
--------------------------------------
Thanks, Paul! I've just committed revision 897357.
[jira] Issue Comment Edited: (SOLR-1653) add PatternReplaceCharFilter
Posted by "Koji Sekiguchi (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/SOLR-1653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12790056#action_12790056 ]
Koji Sekiguchi edited comment on SOLR-1653 at 12/14/09 9:30 AM:
----------------------------------------------------------------
Ok. I'll show you some samples ;-)
||INPUT||groupedPattern||replaceGroups||OUTPUT||comment||
|see-ing looking|(\w+)(ing)|1|see-ing look|remove "ing" from the end of word|
|see-ing looking|(\w+)ing|1|see-ing look|same as above. 2nd parentheses can be omitted|
|No.1 NO. no. 543|[nN][oO]\.\s*(\d+)|{#},1|#1 NO. #543|sample for literal. do not forget to set blockDelimiters other than period when you use period in groupedPattern|
|abc=1234=5678|(\w+)=(\d+)=(\d+)|3,{=},1,{=},2|5678=abc=1234|change the order of the groups|
was (Author: koji):
Ok. I'll show you some samples ;-)
||INPUT||groupedPattern||replaceGroups||OUTPUT||comment||
|see-ing looking|(\w+)(ing)|1|see-ing look|remove "ing" from the end of word|
|see-ing looking|(\w+)ing|1|see-ing look|same as above. 2nd parentheses can be omitted|
|No.1 NO. no. 543|[nN][oO]\.\s*(\d+)|{#},1|#1 NO. #543|sample for literal. do not forget to set blockDelimiters other than period when you use period in groupedPattern|
|abc-1234-5678|(\w+)=(\d+)=(\d+)|3,{=},1,{=},2|5678=abc=1234|change the order of the groups|
[jira] Commented: (SOLR-1653) add PatternReplaceCharFilter
Posted by "Noble Paul (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/SOLR-1653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12790067#action_12790067 ]
Noble Paul commented on SOLR-1653:
----------------------------------
I guess this can be achieved with Matcher#replaceAll() directly:
input = see-ing looking
regex = (\w+)(ing)
replaceWith = $1
input = abc=1234=5678
regex = (\w+)=(\d+)=(\d+)
replaceWith = $3=$1=$2
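The two cases above can be checked with plain java.util.regex. This is only a sketch of the suggestion, not code from the patch; the class and method names are made up for illustration.

```java
import java.util.regex.Pattern;

public class ReplaceAllSketch {
    // Standard replacement syntax: $n in replaceWith refers to capture group n.
    public static String replace(String regex, String replaceWith, String input) {
        return Pattern.compile(regex).matcher(input).replaceAll(replaceWith);
    }

    public static void main(String[] args) {
        // Drops the trailing "ing" from words made only of word characters.
        System.out.println(replace("(\\w+)(ing)", "$1", "see-ing looking"));
        // Reorders the three groups, keeping "=" as a literal separator.
        System.out.println(replace("(\\w+)=(\\d+)=(\\d+)", "$3=$1=$2", "abc=1234=5678"));
    }
}
```

Note that replaceAll() returns only the rewritten string; it does not report how character positions moved.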
[jira] Commented: (SOLR-1653) add PatternReplaceCharFilter
Posted by "Shalin Shekhar Mangar (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/SOLR-1653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12790577#action_12790577 ]
Shalin Shekhar Mangar commented on SOLR-1653:
---------------------------------------------
bq. If there are no objections, I'll commit later today.
+1
Thanks Koji!
[jira] Resolved: (SOLR-1653) add PatternReplaceCharFilter
Posted by "Koji Sekiguchi (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/SOLR-1653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Koji Sekiguchi resolved SOLR-1653.
----------------------------------
Resolution: Fixed
Committed revision 890798. Thanks Shalin and Noble for taking time to review the patch.
[jira] Commented: (SOLR-1653) add PatternReplaceCharFilter
Posted by "Koji Sekiguchi (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/SOLR-1653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12790056#action_12790056 ]
Koji Sekiguchi commented on SOLR-1653:
--------------------------------------
Ok. I'll show you some samples ;-)
||INPUT||groupedPattern||replaceGroups||OUTPUT||comment||
|see-ing looking|(\w+)(ing)|1|see-ing look|remove "ing" from the end of word|
|see-ing looking|(\w+)ing|1|see-ing look|same as above. 2nd parentheses can be omitted|
|No.1 NO. no. 543|[nN][oO]\.\s*(\d+)|{#},1|#1 NO. #543|sample for literal. do not forget to set blockDelimiters other than period when you use period in groupedPattern|
|abc-1234-5678|(\w+)-(\d+)-(\d+)|3,{-},1,{-},2|5678-abc-1234|change the order of the groups|
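Reading the table, a replaceGroups value looks like a comma-separated list in which a bare number names a capture group and {text} is a literal. Under that assumption (the thread does not spell out the grammar, and blockDelimiters is ignored here), the rows can be reproduced by translating the list into a standard replacement string. The helper names below are hypothetical and this is not the parser from the patch.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReplaceGroupsSketch {
    // Hypothetical translation: "3,{-},1,{-},2" becomes "$3-$1-$2".
    // Assumes literals contain no comma, since the list is split on commas.
    public static String toReplacement(String replaceGroups) {
        StringBuilder sb = new StringBuilder();
        for (String item : replaceGroups.split(",")) {
            if (item.startsWith("{") && item.endsWith("}")) {
                // Literal text: quote it so '$' and '\' are not treated specially.
                sb.append(Matcher.quoteReplacement(item.substring(1, item.length() - 1)));
            } else {
                // Bare number: reference to a capture group.
                sb.append('$').append(Integer.parseInt(item.trim()));
            }
        }
        return sb.toString();
    }

    // Applies the translated replacement with plain java.util.regex.
    public static String apply(String groupedPattern, String replaceGroups, String input) {
        return Pattern.compile(groupedPattern).matcher(input)
                .replaceAll(toReplacement(replaceGroups));
    }
}
```

With this reading, removing a group simply means leaving its number out of the list.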
[jira] Commented: (SOLR-1653) add PatternReplaceCharFilter
Posted by "Koji Sekiguchi (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/SOLR-1653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12790572#action_12790572 ]
Koji Sekiguchi commented on SOLR-1653:
--------------------------------------
I see that the existing "PatternReplaceFilter" (not CharFilter) uses "pattern". But it uses "replacement", not "replaceWith". I think I'll use "pattern" and "replacement".
[jira] Issue Comment Edited: (SOLR-1653) add PatternReplaceCharFilter
Posted by "Koji Sekiguchi (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/SOLR-1653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12790056#action_12790056 ]
Koji Sekiguchi edited comment on SOLR-1653 at 12/14/09 9:28 AM:
----------------------------------------------------------------
Ok. I'll show you some samples ;-)
||INPUT||groupedPattern||replaceGroups||OUTPUT||comment||
|see-ing looking|(\w+)(ing)|1|see-ing look|remove "ing" from the end of word|
|see-ing looking|(\w+)ing|1|see-ing look|same as above. 2nd parentheses can be omitted|
|No.1 NO. no. 543|[nN][oO]\.\s*(\d+)|{#},1|#1 NO. #543|sample for literal. do not forget to set blockDelimiters other than period when you use period in groupedPattern|
|abc-1234-5678|(\w+)=(\d+)=(\d+)|3,{=},1,{=},2|5678-abc-1234|change the order of the groups|
was (Author: koji):
Ok. I'll show you some samples ;-)
||INPUT||groupedPattern||replaceGroups||OUTPUT||comment||
|see-ing looking|(\w+)(ing)|1|see-ing look|remove "ing" from the end of word|
|see-ing looking|(\w+)ing|1|see-ing look|same as above. 2nd parentheses can be omitted|
|No.1 NO. no. 543|[nN][oO]\.\s*(\d+)|{#},1|#1 NO. #543|sample for literal. do not forget to set blockDelimiters other than period when you use period in groupedPattern|
|abc-1234-5678|(\w+)--(\d+)--(\d+)|3,{--},1,{--},2|5678-abc-1234|change the order of the groups|
[jira] Issue Comment Edited: (SOLR-1653) add PatternReplaceCharFilter
Posted by "Koji Sekiguchi (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/SOLR-1653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12790056#action_12790056 ]
Koji Sekiguchi edited comment on SOLR-1653 at 12/14/09 9:29 AM:
----------------------------------------------------------------
Ok. I'll show you some samples ;-)
||INPUT||groupedPattern||replaceGroups||OUTPUT||comment||
|see-ing looking|(\w+)(ing)|1|see-ing look|remove "ing" from the end of word|
|see-ing looking|(\w+)ing|1|see-ing look|same as above. 2nd parentheses can be omitted|
|No.1 NO. no. 543|[nN][oO]\.\s*(\d+)|{#},1|#1 NO. #543|sample for literal. do not forget to set blockDelimiters other than period when you use period in groupedPattern|
|abc-1234-5678|(\w+)=(\d+)=(\d+)|3,{=},1,{=},2|5678=abc=1234|change the order of the groups|
was (Author: koji):
Ok. I'll show you some samples ;-)
||INPUT||groupedPattern||replaceGroups||OUTPUT||comment||
|see-ing looking|(\w+)(ing)|1|see-ing look|remove "ing" from the end of word|
|see-ing looking|(\w+)ing|1|see-ing look|same as above. 2nd parentheses can be omitted|
|No.1 NO. no. 543|[nN][oO]\.\s*(\d+)|{#},1|#1 NO. #543|sample for literal. do not forget to set blockDelimiters other than period when you use period in groupedPattern|
|abc-1234-5678|(\w+)=(\d+)=(\d+)|3,{=},1,{=},2|5678-abc-1234|change the order of the groups|
[jira] Updated: (SOLR-1653) add PatternReplaceCharFilter
Posted by "Koji Sekiguchi (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/SOLR-1653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Koji Sekiguchi updated SOLR-1653:
---------------------------------
Attachment: SOLR-1653.patch
[jira] Commented: (SOLR-1653) add PatternReplaceCharFilter
Posted by "Noble Paul (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/SOLR-1653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12790129#action_12790129 ]
Noble Paul commented on SOLR-1653:
----------------------------------
bq. I need to process one match at a time.
I guess regex can process one match at a time.
The most important point is that we don't need to educate users about this new syntax (I am still not clear about the syntax), and there is no need to write and maintain any parsing code.
[jira] Issue Comment Edited: (SOLR-1653) add PatternReplaceCharFilter
Posted by "Koji Sekiguchi (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/SOLR-1653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12790056#action_12790056 ]
Koji Sekiguchi edited comment on SOLR-1653 at 12/14/09 9:27 AM:
----------------------------------------------------------------
Ok. I'll show you some samples ;-)
||INPUT||groupedPattern||replaceGroups||OUTPUT||comment||
|see-ing looking|(\w+)(ing)|1|see-ing look|remove "ing" from the end of word|
|see-ing looking|(\w+)ing|1|see-ing look|same as above. 2nd parentheses can be omitted|
|No.1 NO. no. 543|[nN][oO]\.\s*(\d+)|{#},1|#1 NO. #543|sample for literal. do not forget to set blockDelimiters other than period when you use period in groupedPattern|
|abc-1234-5678|(\w+)--(\d+)--(\d+)|3,{--},1,{--},2|5678-abc-1234|change the order of the groups|
was (Author: koji):
Ok. I'll show you some samples ;-)
||INPUT||groupedPattern||replaceGroups||OUTPUT||comment||
|see-ing looking|(\w+)(ing)|1|see-ing look|remove "ing" from the end of word|
|see-ing looking|(\w+)ing|1|see-ing look|same as above. 2nd parentheses can be omitted|
|No.1 NO. no. 543|[nN][oO]\.\s*(\d+)|{#},1|#1 NO. #543|sample for literal. do not forget to set blockDelimiters other than period when you use period in groupedPattern|
|abc-1234-5678|(\w+)-(\d+)-(\d+)|3,{-},1,{-},2|5678-abc-1234|change the order of the groups|
[jira] Commented: (SOLR-1653) add PatternReplaceCharFilter
Posted by "Noble Paul (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/SOLR-1653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12790565#action_12790565 ]
Noble Paul commented on SOLR-1653:
----------------------------------
In Solr we refer to regular expression strings as 'regex'. If you think 'pattern' is OK, go ahead.
[jira] Commented: (SOLR-1653) add PatternReplaceCharFilter
Posted by "Paul taylor (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/SOLR-1653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12797601#action_12797601 ]
Paul taylor commented on SOLR-1653:
-----------------------------------
Hi, I'm using this outside Solr, in a Lucene analyzer, and I think there may be a performance issue because you cannot pass a compiled Pattern. In the reusableTokenStream() method you cannot reset a CharFilter the way you can a tokenizer, so it has to recompile the pattern every time,
i.e.
public TokenStream reusableTokenStream(String fieldName, Reader reader) throws IOException {
    SavedStreams streams = (SavedStreams) getPreviousTokenStream();
    if (streams == null) {
        streams = new SavedStreams();
        setPreviousTokenStream(streams);
        // The pattern string is compiled here, when the chain is first built...
        streams.tokenStream = new StandardTokenizer(Version.LUCENE_CURRENT, new PatternReplaceCharFilter("(no\\.) ([0-9]+)", "$1$2", reader));
        // The first filter wraps the tokenizer; the later filters wrap the filter before them.
        streams.filteredTokenStream = new StandardFilter(streams.tokenStream);
        streams.filteredTokenStream = new AccentFilter(streams.filteredTokenStream);
        streams.filteredTokenStream = new LowerCaseFilter(streams.filteredTokenStream);
    }
    else {
        // ...and again on every reuse, because a new CharFilter has to wrap each new Reader.
        streams.tokenStream.reset(new PatternReplaceCharFilter("(no\\.) ([0-9]+)", "$1$2", reader));
    }
    return streams.filteredTokenStream;
}
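The cost being described is that of calling Pattern.compile() once per document. In plain java.util.regex the usual way around it is to compile once and create only a cheap Matcher per input, as in the sketch below. Whether PatternReplaceCharFilter offers a constructor that accepts a compiled Pattern is not stated in this thread, so the sketch deliberately stays outside Lucene; the class and field names are illustrative.

```java
import java.util.regex.Pattern;

public class PrecompiledPatternSketch {
    // Compiled once per class load. Pattern is immutable and safe to share across threads.
    private static final Pattern NO_PATTERN = Pattern.compile("(no\\.) ([0-9]+)");

    // Each call creates only a Matcher, which is much cheaper than recompiling the regex.
    public static String normalize(CharSequence text) {
        return NO_PATTERN.matcher(text).replaceAll("$1$2");
    }

    public static void main(String[] args) {
        System.out.println(normalize("no. 12 and no. 7"));
    }
}
```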
[jira] Updated: (SOLR-1653) add PatternReplaceCharFilter
Posted by "Koji Sekiguchi (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/SOLR-1653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Koji Sekiguchi updated SOLR-1653:
---------------------------------
Attachment: SOLR-1653.patch
My apologies. When I started the first patch I tried to correct offsets per group within a match, which is why I introduced my own syntax. I have now implemented offset correction per match, so I can use the standard syntax. Here is the new patch.
Usage:
{code:title=schema.xml}
<fieldType name="textCharNorm" class="solr.TextField" positionIncrementGap="100" >
  <analyzer>
    <charFilter class="solr.PatternReplaceCharFilterFactory"
                pattern="([nN][oO]\.)\s*(\d+)"
                replaceWith="$1$2"/>
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
</fieldType>
{code}
If there are no objections, I'll commit later today.
[jira] Commented: (SOLR-1653) add PatternReplaceCharFilter
Posted by "Koji Sekiguchi (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/SOLR-1653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12790127#action_12790127 ]
Koji Sekiguchi commented on SOLR-1653:
--------------------------------------
bq. I guess this can be achieved with the matcher#replaceAll() directly
You're right, as long as we don't need to correct the offsets of the output char stream. To correct them, I need to process one match at a time.
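To illustrate why per-match processing helps: stepping through matches with find() and appendReplacement() exposes, after each replacement, how far output positions have drifted from input positions. The sketch below records that drift per match. It is not the code from the patch and does not use Lucene's offset-correction API; the class, method, and the int[] layout are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PerMatchOffsetSketch {
    // Replaces every match and appends one {outputOffset, diff} pair per match to
    // corrections: for output positions at or after outputOffset (until the next
    // pair), the corresponding input position is outputPosition + diff.
    public static String replace(String regex, String replacement, String input,
                                 List<int[]> corrections) {
        Matcher m = Pattern.compile(regex).matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Copies the unmatched text before this match, then the expanded replacement.
            m.appendReplacement(out, replacement);
            // Input has been consumed up to m.end(); output has been written up to out.length().
            corrections.add(new int[] { out.length(), m.end() - out.length() });
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        List<int[]> corrections = new ArrayList<>();
        String out = replace("([nN][oO]\\.)\\s*(\\d+)", "$1$2", "No. 543 and no.  7", corrections);
        System.out.println(out);
        for (int[] c : corrections) {
            System.out.println("from output offset " + c[0] + ": add " + c[1]);
        }
    }
}
```

A single replaceAll() call produces the same output string but gives no hook for recording these per-match positions.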
[jira] Commented: (SOLR-1653) add PatternReplaceCharFilter
Posted by "Koji Sekiguchi (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/SOLR-1653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12789957#action_12789957 ]
Koji Sekiguchi commented on SOLR-1653:
--------------------------------------
I'll commit in a few days.
[jira] Assigned: (SOLR-1653) add PatternReplaceCharFilter
Posted by "Koji Sekiguchi (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/SOLR-1653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Koji Sekiguchi reassigned SOLR-1653:
------------------------------------
Assignee: Koji Sekiguchi
[jira] Commented: (SOLR-1653) add PatternReplaceCharFilter
Posted by "Shalin Shekhar Mangar (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/SOLR-1653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12790026#action_12790026 ]
Shalin Shekhar Mangar commented on SOLR-1653:
---------------------------------------------
Koji, even after reading through the test, I do not understand how to use it. Are the characters in curly braces written down for non-groups only? What if I want to remove one particular group?
It is always good to write a use-case and an example in the issue description itself.