Posted to solr-dev@lucene.apache.org by "Ryan McKinley (JIRA)" <ji...@apache.org> on 2007/04/24 00:02:15 UTC

[jira] Updated: (SOLR-211) regex split() Tokenizer

     [ https://issues.apache.org/jira/browse/SOLR-211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ryan McKinley updated SOLR-211:
-------------------------------

    Attachment: SOLR-211-RegexSplitTokenizer.patch

Attached is a simple regex tokenizer and a test. Example configuration:


<fieldType name="splitText" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.RegexSplitTokenizerFactory" regex="--"/>
    <filter class="solr.TrimFilterFactory"/>
  </analyzer>
</fieldType>


Given a field value:
  "Architecture--United States--19th century"

the analyzer will create the tokens:
  "Architecture"
  "United States"
  "19th century"
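
For illustration, the split-then-trim behavior described above can be sketched in plain Java. This is not the code in the attached patch; the class RegexSplitSketch and the helper splitAndTrim are hypothetical names, and the sketch ignores token offsets and everything else a real Lucene Tokenizer has to track. It only shows what splitting on the regex and then trimming each piece does to the example value.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical stand-alone sketch; not the RegexSplitTokenizerFactory from the patch.
class RegexSplitSketch {

    // Split the value on the regex, then trim each piece
    // (roughly the tokenizer step followed by the trim filter step).
    static List<String> splitAndTrim(String value, String regex) {
        Pattern pattern = Pattern.compile(regex);
        List<String> tokens = new ArrayList<>();
        for (String piece : pattern.split(value)) {
            tokens.add(piece.trim());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [Architecture, United States, 19th century]
        System.out.println(
            splitAndTrim("Architecture--United States--19th century", "--"));
    }
}
```

Note that Java's split() drops trailing empty strings, so in this sketch a value such as "a--b--" yields two pieces rather than three.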



> regex split() Tokenizer
> -----------------------
>
>                 Key: SOLR-211
>                 URL: https://issues.apache.org/jira/browse/SOLR-211
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>            Reporter: Ryan McKinley
>         Attachments: SOLR-211-RegexSplitTokenizer.patch
>
>
> A TokenizerFactory that makes tokens from:
>   string.split( regex );

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.