Posted to dev@lucene.apache.org by "Yonik Seeley (JIRA)" <ji...@apache.org> on 2009/08/12 15:09:16 UTC

[jira] Created: (LUCENE-1804) Can't specify AttributeSource for Tokenizer

Can't specify AttributeSource for Tokenizer
-------------------------------------------

                 Key: LUCENE-1804
                 URL: https://issues.apache.org/jira/browse/LUCENE-1804
             Project: Lucene - Java
          Issue Type: Bug
            Reporter: Yonik Seeley


One can't currently specify the attribute source for a Tokenizer like one can with any other TokenStream.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1804) Can't specify AttributeSource for Tokenizer

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12742354#action_12742354 ] 

Yonik Seeley commented on LUCENE-1804:
--------------------------------------

It makes delegation possible.  Say one wanted to create a new Tokenizer by wrapping an existing Tokenizer or TokenStream.
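
A minimal sketch of the delegation case described above. The classes below are simplified, hypothetical stand-ins written only for illustration; they are not Lucene's AttributeSource, TokenStream, or Tokenizer, and the thread does not show the patch, so the constructor shape is an assumption. The point is only that a Tokenizer constructor accepting an existing attribute source lets the wrapper and the wrapped tokenizer share one set of attribute state.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Simplified stand-ins for illustration only; these are NOT Lucene's classes.
class SketchAttributeSource {
    // One shared "term" attribute; a real attribute source would hold several attributes.
    final StringBuilder term;

    SketchAttributeSource() { this.term = new StringBuilder(); }

    // Reuse the attribute state of another source instead of creating new state.
    SketchAttributeSource(SketchAttributeSource shared) { this.term = shared.term; }
}

abstract class SketchTokenStream extends SketchAttributeSource {
    SketchTokenStream() { super(); }
    SketchTokenStream(SketchAttributeSource shared) { super(shared); }
    abstract boolean incrementToken() throws IOException;
}

abstract class SketchTokenizer extends SketchTokenStream {
    protected Reader input;

    SketchTokenizer(Reader input) { super(); this.input = input; }

    // Assumed shape of the constructor the issue asks for: take an existing attribute source.
    SketchTokenizer(SketchAttributeSource shared) { super(shared); }

    void reset(Reader input) { this.input = input; }
}

// Splits the input on whitespace and writes each token into the term attribute.
class WhitespaceSketchTokenizer extends SketchTokenizer {
    WhitespaceSketchTokenizer(Reader input) { super(input); }

    @Override
    boolean incrementToken() throws IOException {
        term.setLength(0);
        int c = input.read();
        while (c != -1 && Character.isWhitespace(c)) c = input.read(); // skip leading whitespace
        if (c == -1) return false;                                     // end of input
        while (c != -1 && !Character.isWhitespace(c)) {
            term.append((char) c);
            c = input.read();
        }
        return true;
    }
}

// A tokenizer that delegates to another tokenizer and shares its attribute state.
class DelegatingSketchTokenizer extends SketchTokenizer {
    private final SketchTokenizer delegate;

    DelegatingSketchTokenizer(SketchTokenizer delegate) {
        super(delegate);            // share the delegate's attributes
        this.delegate = delegate;
    }

    @Override
    boolean incrementToken() throws IOException { return delegate.incrementToken(); }

    @Override
    void reset(Reader input) { delegate.reset(input); }
}

class DelegationSketch {
    // Drains a stream, reading each token from the stream's own (possibly shared) attribute.
    static List<String> tokens(SketchTokenStream ts) {
        List<String> out = new ArrayList<>();
        try {
            while (ts.incrementToken()) out.add(ts.term.toString());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        SketchTokenizer inner = new WhitespaceSketchTokenizer(new StringReader("hello shared attributes"));
        SketchTokenizer outer = new DelegatingSketchTokenizer(inner);
        System.out.println(tokens(outer)); // prints [hello, shared, attributes]
    }
}
```

Because the wrapper is constructed from the delegate's attribute source, tokens produced by the delegate are visible through the wrapper without any copying.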



[jira] Commented: (LUCENE-1804) Can't specify AttributeSource for Tokenizer

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12742385#action_12742385 ] 

Uwe Schindler commented on LUCENE-1804:
---------------------------------------

OK, I was wondering, because TokenFilter exists for this purpose, and TokenStream only provides the AttributeSource ctor because the TokenFilter subclass needs it. So one could also simply create a TokenFilter and put it on top of the Tokenizer to wrap it: new TokenFilter(new WrappedTokenizer()). Why is a Tokenizer needed for that when TokenFilter is made for it?

But for completeness, this ctor should also get the Reader/CharStream (as all other ctors have the Reader param).



[jira] Commented: (LUCENE-1804) Can't specify AttributeSource for Tokenizer

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12742388#action_12742388 ] 

Yonik Seeley commented on LUCENE-1804:
--------------------------------------

bq. But for completeness, this ctor should also get the Reader/CharStream (as all other ctors have the Reader param).

Wouldn't tokenizer.reset(reader) serve the same purpose?  I'm not sure why all those different constructors are there.



[jira] Commented: (LUCENE-1804) Can't specify AttributeSource for Tokenizer

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12742441#action_12742441 ] 

Yonik Seeley commented on LUCENE-1804:
--------------------------------------

OO design principle of not removing functionality: Tokenizer's superclass can specify its AttributeSource... why can't Tokenizer?  We shouldn't disallow it just because we can't immediately think of a use case.

bq. I am still not sure why a simple TokenFilter does not serve the same purpose you would like to have with Tokenizer here.

Simplest case: a Tokenizer that delegates to an existing Tokenizer or TokenStream?



[jira] Commented: (LUCENE-1804) Can't specify AttributeSource for Tokenizer

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12742439#action_12742439 ] 

Uwe Schindler commented on LUCENE-1804:
---------------------------------------

Normally it would be OK. E.g., when reusing TokenStreams, the simplest approach would be to create the tokenizer with a null Reader first and only reset(Reader) it before first use. I think the existing Reader ctors are there for historical reasons, and to stay consistent we should add the ctors. Or deprecate all Reader ctors and state that you should create a reusable Tokenizer and call reset(Reader).
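
A minimal sketch of the reuse pattern mentioned above: construct the tokenizer once with a null Reader, then call reset(Reader) before each use. ReusableSketchTokenizer and ReuseSketch are hypothetical stand-ins for illustration, not Lucene classes, and the whitespace splitting is only there to make the example runnable.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Simplified stand-in for illustration only; this is NOT Lucene's Tokenizer.
class ReusableSketchTokenizer {
    private Reader input;                            // may be null until reset(Reader) is called
    final StringBuilder term = new StringBuilder();  // current token text

    ReusableSketchTokenizer(Reader input) { this.input = input; }

    // Point the same tokenizer instance at a new input.
    void reset(Reader input) { this.input = input; }

    boolean incrementToken() throws IOException {
        if (input == null) {
            throw new IllegalStateException("call reset(Reader) before first use");
        }
        term.setLength(0);
        int c = input.read();
        while (c != -1 && Character.isWhitespace(c)) c = input.read(); // skip leading whitespace
        if (c == -1) return false;                                     // end of input
        while (c != -1 && !Character.isWhitespace(c)) {
            term.append((char) c);
            c = input.read();
        }
        return true;
    }
}

class ReuseSketch {
    // Resets the given tokenizer onto the text and collects all tokens.
    static List<String> tokenize(ReusableSketchTokenizer tokenizer, String text) {
        tokenizer.reset(new StringReader(text));
        List<String> out = new ArrayList<>();
        try {
            while (tokenizer.incrementToken()) out.add(tokenizer.term.toString());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        ReusableSketchTokenizer tokenizer = new ReusableSketchTokenizer(null); // no Reader yet
        System.out.println(tokenize(tokenizer, "first document"));  // prints [first, document]
        System.out.println(tokenize(tokenizer, "second one here")); // prints [second, one, here]
    }
}
```

With this pattern the Reader never needs to be a constructor argument, which is the argument for not adding a constructor for every parameter combination.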

I am still not sure why a simple TokenFilter does not serve the same purpose you would like to have with Tokenizer here. Why not simply wrap the Tokenizer with a TokenFilter, which already has the ability to delegate? If it is because you miss the reset(Reader) call, we could think about adding it to TokenFilter, which would pass it on to the delegated Tokenizer (using instanceof checks).



[jira] Commented: (LUCENE-1804) Can't specify AttributeSource for Tokenizer

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12742352#action_12742352 ] 

Uwe Schindler commented on LUCENE-1804:
---------------------------------------

Why do you need this?



[jira] Updated: (LUCENE-1804) Can't specify AttributeSource for Tokenizer

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yonik Seeley updated LUCENE-1804:
---------------------------------

    Attachment: LUCENE-1804.patch



[jira] Resolved: (LUCENE-1804) Can't specify AttributeSource for Tokenizer

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yonik Seeley resolved LUCENE-1804.
----------------------------------

       Resolution: Fixed
    Fix Version/s: 2.9

Committed.

I'm not sure it's worth adding constructors for all combinations of parameters, especially when the trend is toward reuse and specifying the reader separately, but I think that can be a different issue (whether or not to remove some of the existing constructors).
