Posted to dev@lucene.apache.org by "Robert Muir (JIRA)" <ji...@apache.org> on 2010/04/22 13:11:49 UTC

[jira] Created: (LUCENE-2409) add a tokenfilter for icu transforms

add a tokenfilter for icu transforms
------------------------------------

                 Key: LUCENE-2409
                 URL: https://issues.apache.org/jira/browse/LUCENE-2409
             Project: Lucene - Java
          Issue Type: New Feature
          Components: contrib/*
    Affects Versions: 3.1
            Reporter: Robert Muir
            Priority: Minor
             Fix For: 3.1


I pulled the ICUTransformFilter out of LUCENE-1488 and created an issue for it here.

This is a tokenfilter that applies an ICU Transliterator, which is a context-sensitive way
to transform text. 

These are typically rule-based; you can use the ones included with ICU (such as Traditional-Simplified)
or build your own from a custom set of rules.

User's Guide: http://userguide.icu-project.org/transforms/general
Rule Tutorial: http://userguide.icu-project.org/transforms/general/rules
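
To illustrate what "context-sensitive" means here, the following is a minimal sketch that does not use ICU at all. It swaps in a plain java.util.regex lookahead as a stand-in for a single transform rule: a Greek sigma is rewritten to its final form only when no letter follows it. The class name, method name, and the rule itself are assumptions made for illustration; they are not taken from the patch or from ICU's rule syntax.

```java
import java.util.regex.Pattern;

// Hypothetical illustration only: a regex stand-in for one context-sensitive rule.
class ContextSensitiveSketch {
    // U+03C3 (small sigma) NOT followed by a letter; the lookahead is the "context".
    private static final Pattern SIGMA_AT_WORD_END = Pattern.compile("\u03C3(?!\\p{L})");

    // Rewrites word-final sigma to U+03C2 (final sigma); other sigmas are left alone.
    static String transform(String text) {
        return SIGMA_AT_WORD_END.matcher(text).replaceAll("\u03C2");
    }

    public static void main(String[] args) {
        // Input has a sigma at the start and at the end of the word;
        // only the one at the end is changed.
        System.out.println(transform("\u03C3\u03BF\u03C6\u03BF\u03C3"));
    }
}
```

The same character is handled differently depending on what surrounds it, which is what a simple character-for-character mapping cannot express.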


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Commented: (LUCENE-2409) add a tokenfilter for icu transforms

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12859801#action_12859801 ] 

Robert Muir commented on LUCENE-2409:
-------------------------------------

Thanks Uwe, I will remove the "crude benchmark" (since you can benchmark tokenfilters with the benchmark contrib) and add some examples to overview.html.





[jira] Assigned: (LUCENE-2409) add a tokenfilter for icu transforms

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-2409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir reassigned LUCENE-2409:
-----------------------------------

    Assignee: Robert Muir




[jira] Commented: (LUCENE-2409) add a tokenfilter for icu transforms

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12859793#action_12859793 ] 

Uwe Schindler commented on LUCENE-2409:
---------------------------------------

Go for it, it's a private impl class; what else should we do? Speed, speed, speed. It's better than copying into a StringBuilder before and after. Even Java 6 has no Replaceable interface!




[jira] Updated: (LUCENE-2409) add a tokenfilter for icu transforms

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-2409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated LUCENE-2409:
--------------------------------

    Attachment: LUCENE-2409.patch

Attached is a patch; it's a little ugly since CharTermAttribute doesn't implement Replaceable :)
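
For readers unfamiliar with the trade-off discussed in this thread, here is a rough sketch of the idea of editing a term's characters in place rather than copying them into a StringBuilder and back. It is not the code from the patch: the class below is a hypothetical, stdlib-only stand-in whose replace method is only loosely modeled on the kind of range-replacement operation a Replaceable-style wrapper would need, and all names in it are assumptions.

```java
import java.util.Arrays;

// Hypothetical stand-in for a wrapper around a token's char[] term buffer.
class InPlaceTermBuffer {
    private char[] buffer;
    private int length;

    InPlaceTermBuffer(String term) {
        buffer = term.toCharArray();
        length = buffer.length;
    }

    int length() {
        return length;
    }

    char charAt(int index) {
        return buffer[index];
    }

    // Replaces the chars in [start, limit) with text, shifting the tail inside
    // the same array; the array is reallocated only when it is too small.
    void replace(int start, int limit, String text) {
        int newLength = length - (limit - start) + text.length();
        if (newLength > buffer.length) {
            buffer = Arrays.copyOf(buffer, Math.max(newLength, buffer.length * 2));
        }
        // System.arraycopy handles overlapping source and destination ranges.
        System.arraycopy(buffer, limit, buffer, start + text.length(), length - limit);
        text.getChars(0, text.length(), buffer, start);
        length = newLength;
    }

    @Override
    public String toString() {
        return new String(buffer, 0, length);
    }

    public static void main(String[] args) {
        InPlaceTermBuffer term = new InPlaceTermBuffer("colour");
        term.replace(3, 5, "o"); // shrink: "ou" becomes "o"
        System.out.println(term); // prints "color"
    }
}
```

When the replacement is no longer than the replaced range, no allocation happens at all, which is the kind of saving the "speed, speed, speed" comment is about.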





[jira] Resolved: (LUCENE-2409) add a tokenfilter for icu transforms

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-2409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir resolved LUCENE-2409.
---------------------------------

    Resolution: Fixed

Committed revision 937039.




[jira] Updated: (LUCENE-2409) add a tokenfilter for icu transforms

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-2409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated LUCENE-2409:
--------------------------------

    Attachment: LUCENE-2409.patch

Attached is an updated patch, with examples in the overview, etc.

I would like to commit at the end of the day if no one objects.

