Posted to dev@lucene.apache.org by "Grant Ingersoll (JIRA)" <ji...@apache.org> on 2007/12/04 17:17:43 UTC

[jira] Updated: (LUCENE-1077) Analysis Sinks package

     [ https://issues.apache.org/jira/browse/LUCENE-1077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Grant Ingersoll updated LUCENE-1077:
------------------------------------

    Attachment: LUCENE-1077.patch

This is a fairly trivial start to this, but it creates the sinks package in the contrib/Analysis section and adds a simple TokenRangeSinkTokenizer and test.  This can be used to siphon off tokens that fall in a range.  All it does is count the tokens that go by and add those that fall in the range.  It might be useful for documents that you know have certain structures, for instance, if you know the first 5 tokens of your docs are X.
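
To make the counting idea concrete, here is a rough standalone sketch.  It is not the code from the patch and does not use the Lucene classes: TokenRangeSink, add(), tokens() and reset() are hypothetical names, tokens are plain Strings, and treating the lower bound as inclusive and the upper bound as exclusive is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical stand-in for a range-based sink; not the class in the patch.
 * It counts every token offered to it and keeps only those whose position
 * falls in [lower, upper). The bound semantics are an assumption.
 */
class TokenRangeSink {
    private final int lower;   // first position to keep (inclusive, assumed)
    private final int upper;   // first position to skip again (exclusive, assumed)
    private int count = 0;     // number of tokens seen so far
    private final List<String> kept = new ArrayList<>();

    TokenRangeSink(int lower, int upper) {
        if (lower < 0 || upper < lower) {
            throw new IllegalArgumentException("need 0 <= lower <= upper");
        }
        this.lower = lower;
        this.upper = upper;
    }

    /** Offer a token; it is stored only if its position is in range. */
    void add(String token) {
        if (count >= lower && count < upper) {
            kept.add(token);
        }
        count++; // every token advances the position, kept or not
    }

    /** Tokens siphoned off so far, in arrival order. */
    List<String> tokens() {
        return kept;
    }

    /** Clear the state so the sink can be reused for the next document. */
    void reset() {
        count = 0;
        kept.clear();
    }
}
```

With new TokenRangeSink(0, 5), feeding in a whole document's tokens leaves only the first five in tokens(), which is the "first 5 tokens are X" case above.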

More to follow.

> Analysis Sinks package
> ----------------------
>
>                 Key: LUCENE-1077
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1077
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Analysis, contrib/*
>            Reporter: Grant Ingersoll
>            Assignee: Grant Ingersoll
>            Priority: Minor
>             Fix For: 2.3
>
>         Attachments: LUCENE-1077.patch
>
>
> With the advent of the new TeeTokenFilter and SinkTokenizer, there now exist some interesting new things that can be done in the analysis phase of indexing.  See LUCENE-1058.
> This patch provides some new implementations of SinkTokenizer that may be useful.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org