You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@lucene.apache.org by "Robert Muir (JIRA)" <ji...@apache.org> on 2010/04/06 14:29:33 UTC
[jira] Commented: (LUCENE-2302) Replacement for TermAttribute+Impl with extended capabilities (byte[] support, CharSequence, Appendable)

    [ https://issues.apache.org/jira/browse/LUCENE-2302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12853905#action_12853905 ] 

Robert Muir commented on LUCENE-2302:
-------------------------------------

bq. Token now implemnts CharSequence but violates its contract

I don't think this is correct. I don't care at all about Token's .toString method, but i care that the analysis api isn't broken.
if we do this, then the analysis API is completely wrong when using a Token Attribute Factory.
In my opinion we should do one of the following two things in the backwards compatibility section, but not break the analysis API:

# Token and TokenAttributeFactory was completely removed due to its backwards compatibility problems.
# Token's toString method was changed to match the CharSequence interface.


> Replacement for TermAttribute+Impl with extended capabilities (byte[] support, CharSequence, Appendable)
> --------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-2302
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2302
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Analysis
>    Affects Versions: Flex Branch
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: Flex Branch
>
>         Attachments: LUCENE-2302.patch, LUCENE-2302.patch, LUCENE-2302.patch, LUCENE-2302.patch, LUCENE-2302.patch
>
>
> For flexible indexing terms can be simple byte[] arrays, while the current TermAttribute only supports char[]. This is fine for plain text, but e.g NumericTokenStream should directly work on the byte[] array.
> Also TermAttribute lacks of some interfaces that would make it simplier for users to work with them: Appendable and CharSequence
> I propose to create a new interface "CharTermAttribute" with a clean new API that concentrates on CharSequence and Appendable.
> The implementation class will simply support the old and new interface working on the same term buffer. DEFAULT_ATTRIBUTE_FACTORY will take care of this. So if somebody adds a TermAttribute, he will get an implementation class that can be also used as CharTermAttribute. As both attributes create the same impl instance both calls to addAttribute are equal. So a TokenFilter that adds CharTermAttribute to the source will work with the same instance as the Tokenizer that requested the (deprecated) TermAttribute.
> To also support byte[] only terms like Collation or NumericField needs, a separate getter-only interface will be added, that returns a reusable BytesRef, e.g. BytesRefGetterAttribute. The default implementation class will also support this interface. For backwards compatibility with old self-made-TermAttribute implementations, the indexer will check with hasAttribute(), if the BytesRef getter interface is there and if not will wrap a old-style TermAttribute (a deprecated wrapper class will be provided): new BytesRefGetterAttributeWrapper(TermAttribute), that is used by the indexer then.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org