Posted to dev@lucene.apache.org by "Steven Rowe (Created) (JIRA)" <ji...@apache.org> on 2011/12/23 23:52:31 UTC

[jira] [Created] (LUCENE-3666) Update org.apache.lucene.analysis package summary

Update org.apache.lucene.analysis package summary
-------------------------------------------------

                 Key: LUCENE-3666
                 URL: https://issues.apache.org/jira/browse/LUCENE-3666
             Project: Lucene - Java
          Issue Type: Improvement
          Components: general/javadocs
    Affects Versions: 3.5
            Reporter: Steven Rowe
            Assignee: Steven Rowe
            Priority: Minor


{{package.html}} in {{lucene/src/java/org/apache/lucene/analysis/}} is out of date.

It looks like the contents of the branch_3x version haven't changed substantially since the Lucene 2.9 release, e.g. it refers to {{TermAttribute}} instead of {{CharTermAttribute}}.

The trunk version is more modern - it refers to {{CharTermAttribute}} - but it also has some issues.  E.g., I can see that the {{LengthFilter}} discussion doesn't refer to {{FilteringTokenFilter}}.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Issue Comment Edited] (LUCENE-3666) Update org.apache.lucene.analysis package summary

Posted by "Steven Rowe (Issue Comment Edited) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13175616#comment-13175616 ] 

Steven Rowe edited comment on LUCENE-3666 at 12/25/11 4:56 AM:
---------------------------------------------------------------

Patch for branch_3x.

Changes:

# Added {{CharStream}}/{{-Filter}} to analysis components discussion
# {{TermAttribute}} -> {{CharTermAttribute}}
# Added {{KeywordAttribute}} to the list of out-of-the-box attributes
# {{Version}} parameter added to analysis component c-tors.
# Custom {{MyAnalyzer}} extends {{ReusableAnalyzerBase}}
# Added {{@Override}} annotation to overridden methods
# {{LengthFilter}} extends {{FilteringTokenFilter}}
                
      was (Author: steve_rowe):
    Patch for branch_3x.

Changes:

# Added {{CharStream}}/{{-Filter}} to analysis components discussion
# {{TermAttribute}} -> {{CharTermAttribute}}
# {{Version}} parameter added to analysis component c-tors.
# Custom {{MyAnalyzer}} extends {{ReusableAnalyzerBase}}
# Added {{@Override}} annotation to overridden methods
# {{LengthFilter}} extends {{FilteringTokenFilter}}
                  
> Update org.apache.lucene.analysis package summary
> -------------------------------------------------
>
>                 Key: LUCENE-3666
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3666
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: general/javadocs
>    Affects Versions: 3.5
>            Reporter: Steven Rowe
>            Assignee: Steven Rowe
>            Priority: Minor
>         Attachments: LUCENE-3666-branch_3x.patch, LUCENE-3666-branch_3x.patch, LUCENE-3666-branch_3x.patch, LUCENE-3666-trunk.patch
>
>
> {{package.html}} in {{lucene/src/java/org/apache/lucene/analysis/}} is out of date.
> It looks like the contents of the branch_3x version haven't changed substantially since the Lucene 2.9 release, e.g. it refers to {{TermAttribute}} instead of {{CharTermAttribute}}.
> The trunk version is more modern - it refers to {{CharTermAttribute}} - but it also has some issues.  E.g., I can see that the {{LengthFilter}} discussion doesn't refer to {{FilteringTokenFilter}}.



[jira] [Updated] (LUCENE-3666) Update org.apache.lucene.analysis package summary

Posted by "Steven Rowe (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steven Rowe updated LUCENE-3666:
--------------------------------

    Attachment: LUCENE-3666-branch_3x.patch

Patch for branch_3x.

Changes:

# Added {{CharStream}}/{{-Filter}} to analysis components discussion
# {{TermAttribute}} -> {{CharTermAttribute}}
# {{Version}} parameter added to analysis component c-tors.
# Custom {{MyAnalyzer}} extends {{ReusableAnalyzerBase}}
# Added {{@Override}} annotation to overridden methods
# {{LengthFilter}} extends {{FilteringTokenFilter}}
                
> Update org.apache.lucene.analysis package summary
> -------------------------------------------------
>
>                 Key: LUCENE-3666
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3666
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: general/javadocs
>    Affects Versions: 3.5
>            Reporter: Steven Rowe
>            Assignee: Steven Rowe
>            Priority: Minor
>         Attachments: LUCENE-3666-branch_3x.patch
>
>
> {{package.html}} in {{lucene/src/java/org/apache/lucene/analysis/}} is out of date.
> It looks like the contents of the branch_3x version haven't changed substantially since the Lucene 2.9 release, e.g. it refers to {{TermAttribute}} instead of {{CharTermAttribute}}.
> The trunk version is more modern - it refers to {{CharTermAttribute}} - but it also has some issues.  E.g., I can see that the {{LengthFilter}} discussion doesn't refer to {{FilteringTokenFilter}}.



[jira] [Commented] (LUCENE-3666) Update org.apache.lucene.analysis package summary

Posted by "Uwe Schindler (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13188493#comment-13188493 ] 

Uwe Schindler commented on LUCENE-3666:
---------------------------------------

Oh, small changes needed:

This example consumer code is incomplete:

{noformat}
+<PRE class="prettyprint">
+    Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_XY); // or any other analyzer
+    TokenStream ts = analyzer.tokenStream("myfield",new StringReader("some text goes here"));
+    while (ts.incrementToken()) {
+      System.out.println("token: "+ts);
+    }
+</PRE>
{noformat}

- The consumer must call reset() on the TokenStream before incrementing tokens (that's the contract)
- It should call end() after incrementToken() has returned false
- It must call close() at the end (ideally in try/finally)

Finally, TokenStream is no longer required to implement toString(), so this may produce useless default toString() output (in 4.0 it prints TokenStreamClass@hashcode; in 3.x, for backwards compatibility, it prints the same as reflectAsString).

To get token debug output, use [http://lucene.apache.org/java/3_5_0/api/core/org/apache/lucene/util/AttributeSource.html#reflectAsString(boolean)], e.g. {code}System.out.println("token: "+ts.reflectAsString(true)){code}.

Ideally the example code would use one attribute as an example.

The example attribute impl's copyTo casts to the actual Attribute interface (not the impl), but the interface has no fields, only methods. copyTo must call the setPos() setter method of the attribute interface.

That's all.
                
> Update org.apache.lucene.analysis package summary
> -------------------------------------------------
>
>                 Key: LUCENE-3666
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3666
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: general/javadocs
>    Affects Versions: 3.5
>            Reporter: Steven Rowe
>            Assignee: Steven Rowe
>            Priority: Minor
>             Fix For: 3.6, 4.0
>
>         Attachments: LUCENE-3666-branch_3x.patch, LUCENE-3666-branch_3x.patch, LUCENE-3666-branch_3x.patch, LUCENE-3666-branch_3x.patch, LUCENE-3666-trunk.patch, LUCENE-3666-trunk.patch
>
>
> {{package.html}} in {{lucene/src/java/org/apache/lucene/analysis/}} is out of date.
> It looks like the contents of the branch_3x version haven't changed substantially since the Lucene 2.9 release, e.g. it refers to {{TermAttribute}} instead of {{CharTermAttribute}}.
> The trunk version is more modern - it refers to {{CharTermAttribute}} - but it also has some issues.  E.g., I can see that the {{LengthFilter}} discussion doesn't refer to {{FilteringTokenFilter}}.



[jira] [Updated] (LUCENE-3666) Update org.apache.lucene.analysis package summary

Posted by "Steven Rowe (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steven Rowe updated LUCENE-3666:
--------------------------------

    Attachment: LUCENE-3666-branch_3x.patch
                LUCENE-3666-trunk.patch

Patches incorporating Uwe's suggested changes.

Committing shortly.
                
> Update org.apache.lucene.analysis package summary
> -------------------------------------------------
>
>                 Key: LUCENE-3666
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3666
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: general/javadocs
>    Affects Versions: 3.5
>            Reporter: Steven Rowe
>            Assignee: Steven Rowe
>            Priority: Minor
>         Attachments: LUCENE-3666-branch_3x.patch, LUCENE-3666-branch_3x.patch, LUCENE-3666-branch_3x.patch, LUCENE-3666-branch_3x.patch, LUCENE-3666-trunk.patch, LUCENE-3666-trunk.patch
>
>
> {{package.html}} in {{lucene/src/java/org/apache/lucene/analysis/}} is out of date.
> It looks like the contents of the branch_3x version haven't changed substantially since the Lucene 2.9 release, e.g. it refers to {{TermAttribute}} instead of {{CharTermAttribute}}.
> The trunk version is more modern - it refers to {{CharTermAttribute}} - but it also has some issues.  E.g., I can see that the {{LengthFilter}} discussion doesn't refer to {{FilteringTokenFilter}}.



[jira] [Commented] (LUCENE-3666) Update org.apache.lucene.analysis package summary

Posted by "Steven Rowe (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13188600#comment-13188600 ] 

Steven Rowe commented on LUCENE-3666:
-------------------------------------

Committed to branch_3x and trunk.

Thanks Uwe!
                
> Update org.apache.lucene.analysis package summary
> -------------------------------------------------
>
>                 Key: LUCENE-3666
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3666
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: general/javadocs
>    Affects Versions: 3.5
>            Reporter: Steven Rowe
>            Assignee: Steven Rowe
>            Priority: Minor
>             Fix For: 3.6, 4.0
>
>         Attachments: LUCENE-3666-branch_3x.patch, LUCENE-3666-branch_3x.patch, LUCENE-3666-branch_3x.patch, LUCENE-3666-branch_3x.patch, LUCENE-3666-trunk.patch, LUCENE-3666-trunk.patch
>
>
> {{package.html}} in {{lucene/src/java/org/apache/lucene/analysis/}} is out of date.
> It looks like the contents of the branch_3x version haven't changed substantially since the Lucene 2.9 release, e.g. it refers to {{TermAttribute}} instead of {{CharTermAttribute}}.
> The trunk version is more modern - it refers to {{CharTermAttribute}} - but it also has some issues.  E.g., I can see that the {{LengthFilter}} discussion doesn't refer to {{FilteringTokenFilter}}.



[jira] [Updated] (LUCENE-3666) Update org.apache.lucene.analysis package summary

Posted by "Steven Rowe (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steven Rowe updated LUCENE-3666:
--------------------------------

    Attachment: LUCENE-3666-branch_3x.patch

Updated branch_3x patch to remove javadoc warnings about @Override and @Deprecated annotations in sample code by wrapping them with {@literal ...}
                
> Update org.apache.lucene.analysis package summary
> -------------------------------------------------
>
>                 Key: LUCENE-3666
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3666
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: general/javadocs
>    Affects Versions: 3.5
>            Reporter: Steven Rowe
>            Assignee: Steven Rowe
>            Priority: Minor
>         Attachments: LUCENE-3666-branch_3x.patch, LUCENE-3666-branch_3x.patch
>
>
> {{package.html}} in {{lucene/src/java/org/apache/lucene/analysis/}} is out of date.
> It looks like the contents of the branch_3x version haven't changed substantially since the Lucene 2.9 release, e.g. it refers to {{TermAttribute}} instead of {{CharTermAttribute}}.
> The trunk version is more modern - it refers to {{CharTermAttribute}} - but it also has some issues.  E.g., I can see that the {{LengthFilter}} discussion doesn't refer to {{FilteringTokenFilter}}.



[jira] [Updated] (LUCENE-3666) Update org.apache.lucene.analysis package summary

Posted by "Steven Rowe (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steven Rowe updated LUCENE-3666:
--------------------------------

    Description: 
{{package.html}} in {{lucene/src/java/org/apache/lucene/analysis/}} is out of date.

It looks like the contents of the branch_3x version haven't changed substantially since the Lucene 2.9 release, e.g. it refers to {{TermAttribute}} instead of {{CharTermAttribute}}.

The trunk version is more modern - it refers to {{CharTermAttribute}} - but it also has some issues.  E.g., I can see that the {{LengthFilter}} discussion doesn't refer to {{FilteringTokenFilter}}.

  was:
{{package.html}} in {{lucene/src/java/org/apache/lucene/analysis/}} is out of date.

It looks like the contents of the branch_3x version haven't changed substantially since the Lucene 2.9 release, e.g. it refers to {{TermAttribute}} instead of {{CharTermAttribute}}.

The trunk version is more modern - it refers to {{CharTermAttrubute}} - but it also has some issues.  E.g., I can see that the {{LengthFilter}} discussion doesn't refer to {{FilteringTokenFilter}}.

    
> Update org.apache.lucene.analysis package summary
> -------------------------------------------------
>
>                 Key: LUCENE-3666
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3666
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: general/javadocs
>    Affects Versions: 3.5
>            Reporter: Steven Rowe
>            Assignee: Steven Rowe
>            Priority: Minor
>         Attachments: LUCENE-3666-branch_3x.patch
>
>
> {{package.html}} in {{lucene/src/java/org/apache/lucene/analysis/}} is out of date.
> It looks like the contents of the branch_3x version haven't changed substantially since the Lucene 2.9 release, e.g. it refers to {{TermAttribute}} instead of {{CharTermAttribute}}.
> The trunk version is more modern - it refers to {{CharTermAttribute}} - but it also has some issues.  E.g., I can see that the {{LengthFilter}} discussion doesn't refer to {{FilteringTokenFilter}}.



[jira] [Updated] (LUCENE-3666) Update org.apache.lucene.analysis package summary

Posted by "Steven Rowe (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steven Rowe updated LUCENE-3666:
--------------------------------

    Attachment: LUCENE-3666-branch_3x.patch

Minor fixes to the branch_3x patch.
                
> Update org.apache.lucene.analysis package summary
> -------------------------------------------------
>
>                 Key: LUCENE-3666
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3666
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: general/javadocs
>    Affects Versions: 3.5
>            Reporter: Steven Rowe
>            Assignee: Steven Rowe
>            Priority: Minor
>         Attachments: LUCENE-3666-branch_3x.patch, LUCENE-3666-branch_3x.patch, LUCENE-3666-branch_3x.patch
>
>
> {{package.html}} in {{lucene/src/java/org/apache/lucene/analysis/}} is out of date.
> It looks like the contents of the branch_3x version haven't changed substantially since the Lucene 2.9 release, e.g. it refers to {{TermAttribute}} instead of {{CharTermAttribute}}.
> The trunk version is more modern - it refers to {{CharTermAttribute}} - but it also has some issues.  E.g., I can see that the {{LengthFilter}} discussion doesn't refer to {{FilteringTokenFilter}}.



[jira] [Resolved] (LUCENE-3666) Update org.apache.lucene.analysis package summary

Posted by "Steven Rowe (Resolved) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steven Rowe resolved LUCENE-3666.
---------------------------------

       Resolution: Fixed
    Fix Version/s: 4.0
                   3.6

Committed to branch_3x and trunk.
                
> Update org.apache.lucene.analysis package summary
> -------------------------------------------------
>
>                 Key: LUCENE-3666
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3666
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: general/javadocs
>    Affects Versions: 3.5
>            Reporter: Steven Rowe
>            Assignee: Steven Rowe
>            Priority: Minor
>             Fix For: 3.6, 4.0
>
>         Attachments: LUCENE-3666-branch_3x.patch, LUCENE-3666-branch_3x.patch, LUCENE-3666-branch_3x.patch, LUCENE-3666-branch_3x.patch, LUCENE-3666-trunk.patch, LUCENE-3666-trunk.patch
>
>
> {{package.html}} in {{lucene/src/java/org/apache/lucene/analysis/}} is out of date.
> It looks like the contents of the branch_3x version haven't changed substantially since the Lucene 2.9 release, e.g. it refers to {{TermAttribute}} instead of {{CharTermAttribute}}.
> The trunk version is more modern - it refers to {{CharTermAttribute}} - but it also has some issues.  E.g., I can see that the {{LengthFilter}} discussion doesn't refer to {{FilteringTokenFilter}}.



[jira] [Updated] (LUCENE-3666) Update org.apache.lucene.analysis package summary

Posted by "Steven Rowe (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steven Rowe updated LUCENE-3666:
--------------------------------

    Attachment: LUCENE-3666-trunk.patch

Trunk patch.
                
> Update org.apache.lucene.analysis package summary
> -------------------------------------------------
>
>                 Key: LUCENE-3666
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3666
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: general/javadocs
>    Affects Versions: 3.5
>            Reporter: Steven Rowe
>            Assignee: Steven Rowe
>            Priority: Minor
>         Attachments: LUCENE-3666-branch_3x.patch, LUCENE-3666-branch_3x.patch, LUCENE-3666-branch_3x.patch, LUCENE-3666-trunk.patch
>
>
> {{package.html}} in {{lucene/src/java/org/apache/lucene/analysis/}} is out of date.
> It looks like the contents of the branch_3x version haven't changed substantially since the Lucene 2.9 release, e.g. it refers to {{TermAttribute}} instead of {{CharTermAttribute}}.
> The trunk version is more modern - it refers to {{CharTermAttribute}} - but it also has some issues.  E.g., I can see that the {{LengthFilter}} discussion doesn't refer to {{FilteringTokenFilter}}.



[jira] [Commented] (LUCENE-3666) Update org.apache.lucene.analysis package summary

Posted by "Steven Rowe (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13175804#comment-13175804 ] 

Steven Rowe commented on LUCENE-3666:
-------------------------------------

I think this is ready to commit.

I'll wait a few days before committing, though, to give people a chance to review.
                
> Update org.apache.lucene.analysis package summary
> -------------------------------------------------
>
>                 Key: LUCENE-3666
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3666
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: general/javadocs
>    Affects Versions: 3.5
>            Reporter: Steven Rowe
>            Assignee: Steven Rowe
>            Priority: Minor
>         Attachments: LUCENE-3666-branch_3x.patch, LUCENE-3666-branch_3x.patch, LUCENE-3666-branch_3x.patch, LUCENE-3666-trunk.patch
>
>
> {{package.html}} in {{lucene/src/java/org/apache/lucene/analysis/}} is out of date.
> It looks like the contents of the branch_3x version haven't changed substantially since the Lucene 2.9 release, e.g. it refers to {{TermAttribute}} instead of {{CharTermAttribute}}.
> The trunk version is more modern - it refers to {{CharTermAttribute}} - but it also has some issues.  E.g., I can see that the {{LengthFilter}} discussion doesn't refer to {{FilteringTokenFilter}}.



[jira] [Commented] (LUCENE-3666) Update org.apache.lucene.analysis package summary

Posted by "Steven Rowe (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13188592#comment-13188592 ] 

Steven Rowe commented on LUCENE-3666:
-------------------------------------

bq. This example consumer code is incomplete:

[snip]

The fixed version:

{code:java}
<PRE class="prettyprint">
    Version matchVersion = Version.LUCENE_XY; // Substitute desired Lucene version for XY
    Analyzer analyzer = new StandardAnalyzer(matchVersion); // or any other analyzer
    TokenStream ts = analyzer.tokenStream("myfield", new StringReader("some text goes here"));
    OffsetAttribute offsetAtt = ts.addAttribute(OffsetAttribute.class);
    
    try {
      ts.reset(); // Resets this stream to the beginning. (Required)
      while (ts.incrementToken()) {
        // Use {@link org.apache.lucene.util.AttributeSource#reflectAsString(boolean)}
        // for token stream debugging.
        System.out.println("token: " + ts.reflectAsString(true));

        System.out.println("token start offset: " + offsetAtt.startOffset());
        System.out.println("  token end offset: " + offsetAtt.endOffset());
      }
      ts.end();   // Perform end-of-stream operations, e.g. set the final offset.
    } finally {
      ts.close(); // Release resources associated with this stream.
    }
</PRE>
{code}

I also wrapped the other {{TokenStream}} examples with {code:java}try { ... } finally { ts.close(); }{code}

bq. The copyTo must call the setPos() setter method of the attribute interface.

Here's the fixed version:

{code:java}
  {@literal @Override}
  public void copyTo(AttributeImpl target) {
    ((PartOfSpeechAttribute) target).setPartOfSpeech(pos);
  }
{code}

I'll commit shortly.
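
For readers who want to see why copyTo should cast to the interface rather than to the Impl class, here is a rough stand-alone sketch. It does not use Lucene at all: AttributeImplSketch, CombinedToken and the String-typed part of speech are made-up stand-ins (assumptions for illustration only). Only the names PartOfSpeechAttribute, PartOfSpeechAttributeImpl, setPartOfSpeech and copyTo are taken from the example above.

```java
// Stand-alone sketch; none of these types are the real Lucene classes.
final class CopyToSketch {
  public static void main(String[] args) {
    PartOfSpeechAttributeImpl source = new PartOfSpeechAttributeImpl();
    source.setPartOfSpeech("noun");

    // The target is a different class that also implements the interface,
    // similar in spirit to copying a single attribute into a Token.
    CombinedToken target = new CombinedToken();
    source.copyTo(target);
    System.out.println("copied part of speech: " + target.getPartOfSpeech());
  }
}

// Hypothetical attribute interface: methods only, no fields.
interface PartOfSpeechAttribute {
  void setPartOfSpeech(String pos);
  String getPartOfSpeech();
}

// Hypothetical stand-in for Lucene's AttributeImpl base class.
abstract class AttributeImplSketch {
  public abstract void copyTo(AttributeImplSketch target);
}

class PartOfSpeechAttributeImpl extends AttributeImplSketch
    implements PartOfSpeechAttribute {
  private String pos = "unknown";

  @Override
  public void setPartOfSpeech(String pos) { this.pos = pos; }

  @Override
  public String getPartOfSpeech() { return pos; }

  @Override
  public void copyTo(AttributeImplSketch target) {
    // Cast to the interface and use its setter. A cast to
    // PartOfSpeechAttributeImpl would fail for a CombinedToken target.
    ((PartOfSpeechAttribute) target).setPartOfSpeech(pos);
  }
}

// Hypothetical class bundling several attributes, loosely like Token.
class CombinedToken extends AttributeImplSketch implements PartOfSpeechAttribute {
  private String pos = "unknown";

  @Override
  public void setPartOfSpeech(String pos) { this.pos = pos; }

  @Override
  public String getPartOfSpeech() { return pos; }

  @Override
  public void copyTo(AttributeImplSketch target) {
    ((PartOfSpeechAttribute) target).setPartOfSpeech(pos);
  }
}
```

The point of the design is that the copy only relies on the attribute interface, so the target can be any class implementing it.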
                
> Update org.apache.lucene.analysis package summary
> -------------------------------------------------
>
>                 Key: LUCENE-3666
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3666
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: general/javadocs
>    Affects Versions: 3.5
>            Reporter: Steven Rowe
>            Assignee: Steven Rowe
>            Priority: Minor
>             Fix For: 3.6, 4.0
>
>         Attachments: LUCENE-3666-branch_3x.patch, LUCENE-3666-branch_3x.patch, LUCENE-3666-branch_3x.patch, LUCENE-3666-branch_3x.patch, LUCENE-3666-trunk.patch, LUCENE-3666-trunk.patch
>
>
> {{package.html}} in {{lucene/src/java/org/apache/lucene/analysis/}} is out of date.
> It looks like the contents of the branch_3x version haven't changed substantially since the Lucene 2.9 release, e.g. it refers to {{TermAttribute}} instead of {{CharTermAttribute}}.
> The trunk version is more modern - it refers to {{CharTermAttribute}} - but it also has some issues.  E.g., I can see that the {{LengthFilter}} discussion doesn't refer to {{FilteringTokenFilter}}.



[jira] [Commented] (LUCENE-3666) Update org.apache.lucene.analysis package summary

Posted by "Uwe Schindler (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13188103#comment-13188103 ] 

Uwe Schindler commented on LUCENE-3666:
---------------------------------------

Here are my comments as posted on IRC:

22:38	ThetaPh1	+ A CharStream adds character offset correction functionality over
22:38	ThetaPh1	+ {@link java.io.Reader}. All Tokenizers accept a CharStream instead of
22:38	ThetaPh1	+ Reader as input, which enables arbitrary character based filtering
22:38	ThetaPh1	+ before tokenization.
22:39	ThetaPh1	ah charfilters are also there
22:39	ThetaPh1	because that description is a little bit limited, CharStreams on their own are never used
22:40	sarowe	right
22:40	ThetaPh1	but some general information about what CharFilters do is missing, at least I don't see it in the patch
22:40	ThetaPh1	the reader will simply say: wtf is this CharStream good for?
22:40	sarowe	good point
22:40	sarowe	I'll revisit
22:41	ThetaPh1	in the following para I would replace CharStream with CharFilter
22:41	sarowe	(I know more about CharFilter guts after working on HTMLStripCharFilter replacement)
22:41	ThetaPh1	the input is in all cases a Reader
22:41	ThetaPh1	hehe yes
22:41	ThetaPh1	in my opinion the charfilters are horrible by design
22:41	ThetaPh1	we changed it shortly before 2.9 to fix some very bad behaviour
22:41	sarowe	right, I recall that - performance fixes
22:41	ThetaPh1	but it's still hard to understand what's going on
22:42	sarowe	yes, and no docs
22:42	ThetaPh1	the problem is that they wrap Readers
22:42	ThetaPh1	and instanceof checks in Tokenizer and so on
22:42	sarowe	I've added a little more docs in the JFlexHTMLStripCharFilter issue
22:42	ThetaPh1	to prevent those instanceof checks everywhere in code, Tokenizer has a correctOffset method, right?
22:43	sarowe	ok, I know about the method, didn't know that was why it was there
22:43	ThetaPh1	+ <b>Lucene 2.9 introduced a new TokenStream API. Please see the section "New TokenStream API" below for more details.</b>
22:43	ThetaPh1	we should change the second sentence, there is no old api anymore
22:43	sarowe	right
22:44	sarowe	in trunk, anyway
22:45	ThetaPh1	in 3.x, the same
22:45	ThetaPh1	and remove "new"
22:45	ThetaPh1	the example with LengthFilter is good
22:45	sarowe	cool
22:45	ThetaPh1	as it shows by example how it's implemented (for filtering tokens based on accept())
22:46	ThetaPh1	but also what a conventional filter would look like
22:46	sarowe	right
22:47	ThetaPh1	equals and hashCode no longer need to be implemented
22:47	ThetaPh1	it's no longer required
22:47	sarowe	ok
22:48	ThetaPh1	+ {@literal @Override}
22:48	ThetaPh1	public void copyTo(AttributeImpl target) {
22:48	ThetaPh1	((PartOfSpeechAttributeImpl) target).pos = pos;
22:48	ThetaPh1	}
22:48	ThetaPh1	this one should not cast to *Impl
22:48	ThetaPh1	it should simply cast to the interface
22:48	sarowe	ok
22:48	ThetaPh1	it's done like this in all attributes in lucene, maybe we missed that one in docs
22:49	sarowe	I'll check
22:49	ThetaPh1	the idea is that e.g. a CharTermAttribute can be copied to a good old Token (die,die,die)
22:49	ThetaPh1	so the copy operation should not rely on the type
22:49	ThetaPh1	i mean impl
22:49	sarowe	right, the interface instead
22:50	ThetaPh1	((PartOfSpeechAttributeImpl) target).setPos(pos);
22:50	ThetaPh1	something like that
22:50	ThetaPh1	a without impl
22:50	sarowe	:) right
22:50	ThetaPh1	((PartOfSpeechAttribute) target).setPos(pos);
22:50	sarowe	ok
22:50	ThetaPh1	attributes also no longer need to impl toString(), but thats not in the example
22:51	ThetaPh1	they can implement reflectWith for nice debugging output in solr
22:51	ThetaPh1	but thats too much information
22:51	sarowe	:)
22:51	ThetaPh1	just remove the hashCode/equals and toString if they are in the example
22:51	ThetaPh1	a minimum example would be ideal
22:51	sarowe	ok
22:52	ThetaPh1	+<code>AttributeImpl</code> class and therefore implements its abstract methods <code>clear(), copyTo(), equals(), hashCode()</code>.
22:52	ThetaPh1	not sure how this is solved in 3.x
22:52	ThetaPh1	in trunk they are gone
22:52	ThetaPh1	(have to look up)
22:52	sarowe	ok
22:52	ThetaPh1	i only know that in 3.x most attributes that existed before simply implement equals/hashcode
22:52	ThetaPh1	but just for backwards reasons
22:53	sarowe	ok
22:53	ThetaPh1	one thing
22:54	ThetaPh1	you should note for CharTermAttribute that it implements CharSequence and Appendable
22:54	ThetaPh1	i had a code review before
22:54	ThetaPh1	and have seen stupidness like calling toString() needlessly
22:54	sarowe	right
22:54	ThetaPh1	i have seen people doing termAtt.toString().length() < 10 in a lengthfilter-like filter
22:54	sarowe	that's the main reason for CharTermAttr to replace TermAttr, I believe
22:55	ThetaPh1	yes
22:55	ThetaPh1	otherwise I see nothing wrong
22:55	sarowe	cool, thanks for the review
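
Two of the review points above are easier to see in code: copyTo() casting to the attribute interface rather than to the *Impl class, and checking a term's length through CharSequence instead of calling toString(). The following is only a rough sketch, not the code from the patch. So that it compiles without Lucene on the classpath, AttributeImpl, PartOfSpeechAttribute and TokenLikeImpl are simplified stand-ins written for this illustration (the part of speech is a plain String here), and accept(CharSequence, int, int) is a hypothetical helper, not a Lucene method.

```java
// Stand-in types only: they mimic the shape of the attribute API discussed in
// the chat so the sketch runs without Lucene. They are NOT the real classes.
final class AttributeCopySketch {

    /** Hypothetical attribute interface, modelled on the package docs example. */
    interface PartOfSpeechAttribute {
        void setPos(String pos);
        String getPos();
    }

    /** Simplified stand-in for the AttributeImpl base class (assumption). */
    abstract static class AttributeImpl {
        public abstract void clear();
        public abstract void copyTo(AttributeImpl target);
    }

    /** Dedicated implementation of the attribute. */
    static final class PartOfSpeechAttributeImpl extends AttributeImpl
            implements PartOfSpeechAttribute {
        private String pos = "UNKNOWN";

        @Override public void setPos(String pos) { this.pos = pos; }
        @Override public String getPos() { return pos; }
        @Override public void clear() { pos = "UNKNOWN"; }

        @Override public void copyTo(AttributeImpl target) {
            // Cast to the interface, not to PartOfSpeechAttributeImpl, so that
            // any class implementing the attribute can receive the value.
            ((PartOfSpeechAttribute) target).setPos(pos);
        }
    }

    /** A different implementing class, standing in for a Token-like object. */
    static final class TokenLikeImpl extends AttributeImpl
            implements PartOfSpeechAttribute {
        private String pos = "UNKNOWN";

        @Override public void setPos(String pos) { this.pos = pos; }
        @Override public String getPos() { return pos; }
        @Override public void clear() { pos = "UNKNOWN"; }
        @Override public void copyTo(AttributeImpl target) {
            ((PartOfSpeechAttribute) target).setPos(pos);
        }
    }

    /**
     * Hypothetical length check in the style of a LengthFilter accept():
     * the term is read through CharSequence, so no String is allocated.
     */
    static boolean accept(CharSequence term, int min, int max) {
        int len = term.length(); // instead of term.toString().length()
        return len >= min && len <= max;
    }

    public static void main(String[] args) {
        PartOfSpeechAttributeImpl source = new PartOfSpeechAttributeImpl();
        source.setPos("NOUN");

        // Works because copyTo() relies only on the interface; a cast to
        // PartOfSpeechAttributeImpl would throw ClassCastException here.
        TokenLikeImpl target = new TokenLikeImpl();
        source.copyTo(target);
        System.out.println("copied pos: " + target.getPos());

        // StringBuilder stands in for a term attribute: it is also a
        // CharSequence and an Appendable.
        StringBuilder term = new StringBuilder("lucene");
        System.out.println("accepted: " + accept(term, 3, 10));
    }
}
```

With the real classes the same idea would apply to the attribute's own interface and to calling length() directly on the term attribute, but the exact signatures should be checked against the branch being documented.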

                
> Update org.apache.lucene.analysis package summary
> -------------------------------------------------
>
>                 Key: LUCENE-3666
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3666
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: general/javadocs
>    Affects Versions: 3.5
>            Reporter: Steven Rowe
>            Assignee: Steven Rowe
>            Priority: Minor
>         Attachments: LUCENE-3666-branch_3x.patch, LUCENE-3666-branch_3x.patch, LUCENE-3666-branch_3x.patch, LUCENE-3666-trunk.patch
>
>
> {{package.html}} in {{lucene/src/java/org/apache/lucene/analysis/}} is out of date.
> It looks like the contents of the branch_3x version haven't changed substantially since the Lucene 2.9 release, e.g. it refers to {{TermAttribute}} instead of {{CharTermAttribute}}.
> The trunk version is more modern - it refers to {{CharTermAttribute}} - but it also has some issues.  E.g., I can see that the {{LengthFilter}} discussion doesn't refer to {{FilteringTokenFilter}}.
