Posted to dev@lucene.apache.org by "Nadav Har'El (JIRA)" <ji...@apache.org> on 2007/01/18 17:21:29 UTC
[jira] Commented: (LUCENE-580) Pre-analyzed fields

    [ https://issues.apache.org/jira/browse/LUCENE-580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12465797 ] 

Nadav Har'El commented on LUCENE-580:
-------------------------------------

This patch will be useful for users of LUCENE-755, the payloads patch. That patch adds "payloads" to tokens, but using it to attach payloads to a few tokens in some field can be ugly, because the code ends up split across two places: in one place you add the field as plain text, and in another you write a special analyzer that works only on that field, recognizes the specific tokens, and adds the payloads to them. This patch makes this easier: when you add a field, you can add it pre-analyzed, already as a list of tokens, and these tokens will already carry their payloads.
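
To illustrate the idea, here is a minimal sketch in plain Java. SketchToken, SketchField and PreAnalyzedSketch are hypothetical stand-ins invented for this example; they are not Lucene classes and not the API of either patch. The sketch only shows the token list, payloads included, being built at the same place where the field is created, so no field-specific analyzer is needed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a token; NOT a Lucene class.
final class SketchToken {
    final String term;
    final byte[] payload; // null when the token carries no payload

    SketchToken(String term, byte[] payload) {
        this.term = term;
        this.payload = payload;
    }
}

// Hypothetical stand-in for a pre-analyzed field: a name plus ready-made tokens.
final class SketchField {
    final String name;
    final List<SketchToken> tokens;

    SketchField(String name, List<SketchToken> tokens) {
        if (name == null || tokens == null) {
            throw new NullPointerException("name and tokens must not be null");
        }
        this.name = name;
        this.tokens = new ArrayList<>(tokens); // defensive copy
    }
}

class PreAnalyzedSketch {
    // The tokens and their payloads are decided here, where the field is added.
    static SketchField buildField() {
        List<SketchToken> tokens = Arrays.asList(
            new SketchToken("lucene", null),
            new SketchToken("payloads", new byte[] {42}), // only this token has a payload
            new SketchToken("patch", null));
        return new SketchField("body", tokens);
    }

    public static void main(String[] args) {
        SketchField field = buildField();
        for (SketchToken t : field.tokens) {
            String suffix = (t.payload == null) ? "" : " (payload bytes: " + t.payload.length + ")";
            System.out.println(field.name + ": " + t.term + suffix);
        }
    }
}
```

In the two-place approach described above, the payload on "payloads" would instead have to be attached by a separate analyzer that knows about the "body" field.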

I have just a few comments on this patch:

1. The issue description suggests that it might not work if the same field name is used for two Fields, one stored and the other pre-analyzed. I think it is important that this combination (as well as all other combinations) is supported. I actually use all these combinations in my code, and I don't see why it should cause problems.

2. The patch has some strange changes in the comments, changing the word "Index" to "NotificationService". I bet this wasn't intentional :-)

3. The new Field constructor still has an "Index" parameter, taking TOKENIZED, UN_TOKENIZED or NO_NORMS (only NO is forbidden). I wonder, what's the difference between TOKENIZED and UN_TOKENIZED in this case? NO_NORMS is a very useful case, because it allows you to do something not previously possible in Lucene (a tokenized field, but without norms). Perhaps this parameter should be better documented in the javadoc comment.

4. In the new Field constructor's comment, the phrase "if name or reader" should be "if name or tokenStream".
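
As a rough illustration of points 3 and 4, the following plain-Java sketch models a constructor that rejects a null name or token stream and forbids only the NO option. SketchIndex and TokenStreamFieldSketch are hypothetical names, the Iterator stands in for a TokenStream, and the exception types are assumptions; none of this is taken from the patch itself.

```java
import java.util.Arrays;
import java.util.Iterator;

// Hypothetical mirror of the Index options named above; NOT Lucene's Field.Index.
enum SketchIndex { NO, TOKENIZED, UN_TOKENIZED, NO_NORMS }

final class TokenStreamFieldSketch {
    final String name;
    final Iterator<String> tokenStream; // stand-in for a TokenStream
    final SketchIndex index;

    TokenStreamFieldSketch(String name, Iterator<String> tokenStream, SketchIndex index) {
        // Assumed behaviour: "if name or tokenStream" is null, fail fast.
        if (name == null || tokenStream == null) {
            throw new NullPointerException("name or tokenStream is null");
        }
        // Only NO is forbidden: a pre-analyzed field that is not indexed makes no sense.
        if (index == SketchIndex.NO) {
            throw new IllegalArgumentException("a pre-analyzed field must be indexed");
        }
        this.name = name;
        this.tokenStream = tokenStream;
        this.index = index;
    }

    // Assumed reading of NO_NORMS: the supplied tokens are indexed, but norms are omitted.
    boolean omitsNorms() {
        return index == SketchIndex.NO_NORMS;
    }

    public static void main(String[] args) {
        TokenStreamFieldSketch field = new TokenStreamFieldSketch(
            "body", Arrays.asList("pre", "analyzed").iterator(), SketchIndex.NO_NORMS);
        System.out.println(field.name + " omits norms: " + field.omitsNorms());
    }
}
```

In this sketch TOKENIZED and UN_TOKENIZED behave identically, which is exactly the ambiguity raised in point 3.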

Thanks!

> Pre-analyzed fields
> -------------------
>
>                 Key: LUCENE-580
>                 URL: https://issues.apache.org/jira/browse/LUCENE-580
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Analysis
>    Affects Versions: 1.9
>            Reporter: Karl Wettin
>            Priority: Minor
>         Attachments: preanalyze.tar
>
>
> Adds the possibility to set a TokenStream at Field construction time, available as tokenStreamValue in addition to stringValue, readerValue and binaryValue.
> There might be some problems with mixing stored fields with the same name as a field with tokenStreamValue.
