Posted to dev@lucene.apache.org by "Steven Rowe (Created) (JIRA)" <ji...@apache.org> on 2011/12/23 23:52:31 UTC
[jira] [Created] (LUCENE-3666) Update org.apache.lucene.analysis
package summary
Update org.apache.lucene.analysis package summary
-------------------------------------------------
Key: LUCENE-3666
URL: https://issues.apache.org/jira/browse/LUCENE-3666
Project: Lucene - Java
Issue Type: Improvement
Components: general/javadocs
Affects Versions: 3.5
Reporter: Steven Rowe
Assignee: Steven Rowe
Priority: Minor
{{package.html}} in {{lucene/src/java/org/apache/lucene/analysis/}} is out of date.
It looks like the contents of the branch_3x version haven't changed substantially since the Lucene 2.9 release, e.g. it refers to {{TermAttribute}} instead of {{CharTermAttribute}}.
The trunk version is more modern - it refers to {{CharTermAttrubute}} - but it also has some issues. E.g., I can see that the {{LengthFilter}} discussion doesn't refer to {{FilteringTokenFilter}}.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org
[jira] [Issue Comment Edited] (LUCENE-3666) Update
org.apache.lucene.analysis package summary
Posted by "Steven Rowe (Issue Comment Edited) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/LUCENE-3666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13175616#comment-13175616 ]
Steven Rowe edited comment on LUCENE-3666 at 12/25/11 4:56 AM:
---------------------------------------------------------------
Patch for branch_3x.
Changes:
# Added {{CharStream}}/{{-Filter}} to analysis components discussion
# {{TermAttribute}} -> {{CharTermAttribute}}
# Added {{KeywordAttribute}} to the list of out-of-the-box attributes
# {{Version}} parameter added to analysis component c-tors.
# Custom {{MyAnalyzer}} extends {{ReusableAnalyzerBase}}
# Added {{@Override}} annotation to overridden methods
# {{LengthFilter}} extends {{FilteringTokenFilter}}
was (Author: steve_rowe):
Patch for branch_3x.
Changes:
# Added {{CharStream}}/{{-Filter}} to analysis components discussion
# {{TermAttribute}} -> {{CharTermAttribute}}
# {{Version}} parameter added to analysis component c-tors.
# Custom {{MyAnalyzer}} extends {{ReusableAnalyzerBase}}
# Added {{@Override}} annotation to overridden methods
# {{LengthFilter}} extends {{FilteringTokenFilter}}
[jira] [Updated] (LUCENE-3666) Update org.apache.lucene.analysis
package summary
Posted by "Steven Rowe (Updated) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/LUCENE-3666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Steven Rowe updated LUCENE-3666:
--------------------------------
Attachment: LUCENE-3666-branch_3x.patch
Patch for branch_3x.
Changes:
# Added {{CharStream}}/{{-Filter}} to analysis components discussion
# {{TermAttribute}} -> {{CharTermAttribute}}
# {{Version}} parameter added to analysis component c-tors.
# Custom {{MyAnalyzer}} extends {{ReusableAnalyzerBase}}
# Added {{@Override}} annotation to overridden methods
# {{LengthFilter}} extends {{FilteringTokenFilter}}
[jira] [Commented] (LUCENE-3666) Update org.apache.lucene.analysis
package summary
Posted by "Uwe Schindler (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/LUCENE-3666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13188493#comment-13188493 ]
Uwe Schindler commented on LUCENE-3666:
---------------------------------------
Oh, a few small changes are needed:
This example consumer code is incomplete:
{noformat}
+<PRE class="prettyprint">
+ Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_XY); // or any other analyzer
+ TokenStream ts = analyzer.tokenStream("myfield",new StringReader("some text goes here"));
+ while (ts.incrementToken()) {
+ System.out.println("token: "+ts));
+ }
+</PRE>
{noformat}
- The consumer must call reset() on the TokenStream before incrementing tokens (that's the contract)
- It should call end() after incrementToken() has returned false
- It must call close() when finished (ideally in try/finally)
Finally, TokenStream is no longer required to implement toString(), so this example may produce useless default toString() output (in 4.0 it prints TokenStreamClass@hashcode; in 3.x, for backwards compatibility, it prints the same as reflectAsString).
To get token debug output, use [http://lucene.apache.org/java/3_5_0/api/core/org/apache/lucene/util/AttributeSource.html#reflectAsString(boolean)], e.g. {code}System.out.println("token: "+ts.reflectAsString(true)){code}.
Ideally the example code would use one attribute as an example.
The example attribute impl's copyTo casts to the actual Attribute interface (not the impl), but the interface has no fields, only methods. The copyTo must call the setPos() method of the attribute interface.
That's all.
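The reset()/incrementToken()/end()/close() sequence described above can be sketched without Lucene on the classpath. This is only an illustration under stated assumptions: SketchTokenStream is a hypothetical stand-in written for this example, not Lucene's TokenStream, and its whitespace splitting and term() accessor are invented for the sketch.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a token stream; NOT Lucene's TokenStream. */
final class SketchTokenStream {
    private final String[] tokens;
    private int next = -1;           // -1 means reset() has not been called yet
    private boolean closed = false;
    private String current;

    SketchTokenStream(String text) {
        this.tokens = text.trim().split("\\s+");
    }

    /** Positions the stream at the first token; must precede incrementToken(). */
    void reset() { next = 0; }

    /** Advances to the next token; returns false once the stream is exhausted. */
    boolean incrementToken() {
        if (next < 0) {
            throw new IllegalStateException("reset() must be called before incrementToken()");
        }
        if (next >= tokens.length) {
            return false;
        }
        current = tokens[next++];
        return true;
    }

    String term() { return current; }

    /** End-of-stream bookkeeping would go here (nothing to do in this sketch). */
    void end() { }

    void close() { closed = true; }

    boolean isClosed() { return closed; }
}

final class ConsumerSketch {
    /** Consumes a stream in the order the contract asks for. */
    static List<String> consume(SketchTokenStream ts) {
        List<String> out = new ArrayList<>();
        try {
            ts.reset();                      // required before the first incrementToken()
            while (ts.incrementToken()) {
                out.add(ts.term());
            }
            ts.end();                        // after incrementToken() returned false
        } finally {
            ts.close();                      // always release the stream
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(consume(new SketchTokenStream("some text goes here")));
    }
}
```

In this stand-in, skipping reset() fails fast with an IllegalStateException, which is one way to make a forgotten call visible; how a real TokenStream behaves when reset() is skipped is not covered by this sketch.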
[jira] [Updated] (LUCENE-3666) Update org.apache.lucene.analysis
package summary
Posted by "Steven Rowe (Updated) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/LUCENE-3666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Steven Rowe updated LUCENE-3666:
--------------------------------
Attachment: LUCENE-3666-branch_3x.patch, LUCENE-3666-trunk.patch
Patches incorporating Uwe's suggested changes.
Committing shortly.
[jira] [Commented] (LUCENE-3666) Update org.apache.lucene.analysis
package summary
Posted by "Steven Rowe (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/LUCENE-3666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13188600#comment-13188600 ]
Steven Rowe commented on LUCENE-3666:
-------------------------------------
Committed to branch_3x and trunk.
Thanks Uwe!
[jira] [Updated] (LUCENE-3666) Update org.apache.lucene.analysis
package summary
Posted by "Steven Rowe (Updated) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/LUCENE-3666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Steven Rowe updated LUCENE-3666:
--------------------------------
Attachment: LUCENE-3666-branch_3x.patch
Updated branch_3x patch to remove javadoc warnings about @Override and @Deprecated annotations in sample code by wrapping them with {@literal ...}
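To illustrate the wrapping: javadoc treats a line in a doc comment that begins with @ as a block tag, so a bare @Override inside sample code is reported as an unknown tag. A minimal sketch follows, assuming a hypothetical class name and placing the sample in a class comment rather than in package.html:

```java
/**
 * Hypothetical holder for a documented sample; the class name is illustrative only.
 *
 * <p>Sample code embedded in the documentation:
 * <pre class="prettyprint">
 *   {@literal @Override}
 *   public String toString() {
 *     return "sample";
 *   }
 * </pre>
 */
final class LiteralAnnotationDocSketch {
    // In real code the annotation is written normally; only the copy inside
    // the doc comment needs the {@literal ...} wrapper.
    @Override
    public String toString() {
        return "sample";
    }
}
```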
[jira] [Updated] (LUCENE-3666) Update org.apache.lucene.analysis
package summary
Posted by "Steven Rowe (Updated) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/LUCENE-3666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Steven Rowe updated LUCENE-3666:
--------------------------------
Description:
{{package.html}} in {{lucene/src/java/org/apache/lucene/analysis/}} is out of date.
It looks like the contents of the branch_3x version haven't changed substantially since the Lucene 2.9 release, e.g. it refers to {{TermAttribute}} instead of {{CharTermAttribute}}.
The trunk version is more modern - it refers to {{CharTermAttribute}} - but it also has some issues. E.g., I can see that the {{LengthFilter}} discussion doesn't refer to {{FilteringTokenFilter}}.
was:
{{package.html}} in {{lucene/src/java/org/apache/lucene/analysis/}} is out of date.
It looks like the contents of the branch_3x version haven't changed substantially since the Lucene 2.9 release, e.g. it refers to {{TermAttribute}} instead of {{CharTermAttribute}}.
The trunk version is more modern - it refers to {{CharTermAttrubute}} - but it also has some issues. E.g., I can see that the {{LengthFilter}} discussion doesn't refer to {{FilteringTokenFilter}}.
[jira] [Updated] (LUCENE-3666) Update org.apache.lucene.analysis
package summary
Posted by "Steven Rowe (Updated) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/LUCENE-3666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Steven Rowe updated LUCENE-3666:
--------------------------------
Attachment: LUCENE-3666-branch_3x.patch
Minor fixes to the branch_3x patch.
[jira] [Resolved] (LUCENE-3666) Update org.apache.lucene.analysis
package summary
Posted by "Steven Rowe (Resolved) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/LUCENE-3666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Steven Rowe resolved LUCENE-3666.
---------------------------------
Resolution: Fixed
Fix Version/s: 3.6, 4.0
Committed to branch_3x and trunk.
[jira] [Updated] (LUCENE-3666) Update org.apache.lucene.analysis
package summary
Posted by "Steven Rowe (Updated) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/LUCENE-3666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Steven Rowe updated LUCENE-3666:
--------------------------------
Attachment: LUCENE-3666-trunk.patch
Trunk patch.
[jira] [Commented] (LUCENE-3666) Update org.apache.lucene.analysis
package summary
Posted by "Steven Rowe (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/LUCENE-3666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13175804#comment-13175804 ]
Steven Rowe commented on LUCENE-3666:
-------------------------------------
I think this is ready to commit.
I'll wait a few days before committing, though, to give people a chance to review.
[jira] [Commented] (LUCENE-3666) Update org.apache.lucene.analysis
package summary
Posted by "Steven Rowe (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/LUCENE-3666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13188592#comment-13188592 ]
Steven Rowe commented on LUCENE-3666:
-------------------------------------
bq. This example consumer code is incomplete:
[snip]
The fixed version:
{code:java}
<PRE class="prettyprint">
Version matchVersion = Version.LUCENE_XY; // Substitute desired Lucene version for XY
Analyzer analyzer = new StandardAnalyzer(matchVersion); // or any other analyzer
TokenStream ts = analyzer.tokenStream("myfield", new StringReader("some text goes here"));
OffsetAttribute offsetAtt = ts.addAttribute(OffsetAttribute.class);
try {
ts.reset(); // Resets this stream to the beginning. (Required)
while (ts.incrementToken()) {
// Use {@link org.apache.lucene.util.AttributeSource#reflectAsString(boolean)}
// for token stream debugging.
System.out.println("token: " + ts.reflectAsString(true));
System.out.println("token start offset: " + offsetAtt.startOffset());
System.out.println(" token end offset: " + offsetAtt.endOffset());
}
ts.end(); // Perform end-of-stream operations, e.g. set the final offset.
} finally {
ts.close(); // Release resources associated with this stream.
}
</PRE>
{code}
I also wrapped the other {{TokenStream}} examples with {code:java}try { ... } finally { ts.close(); }{code}
bq. The copyTo must call the setPos() method of the attribute interface.
Here's the fixed version:
{code:java}
{@literal @Override}
public void copyTo(AttributeImpl target) {
((PartOfSpeechAttribute) target).setPartOfSpeech(pos);
}
{code}
I'll commit shortly.
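Why copyTo should cast to the attribute interface rather than to the implementation class can be shown with a Lucene-free sketch. All types below are hypothetical stand-ins invented for this example (they only mirror the names used above); the point is that a copy routed through the interface setter also works when the target is a different implementation of the same attribute.

```java
/** Hypothetical attribute interface, modeled on the PartOfSpeechAttribute named above. */
interface PosAttributeSketch {
    void setPartOfSpeech(String pos);
    String getPartOfSpeech();
}

/** Stand-in for an AttributeImpl-style base class; not Lucene's class. */
abstract class AttributeImplSketch {
    abstract void copyTo(AttributeImplSketch target);
}

/** Dedicated implementation that carries only the part-of-speech value. */
final class PosAttributeImplSketch extends AttributeImplSketch implements PosAttributeSketch {
    private String pos = "UNKNOWN";

    public void setPartOfSpeech(String pos) { this.pos = pos; }
    public String getPartOfSpeech() { return pos; }

    @Override
    void copyTo(AttributeImplSketch target) {
        // Cast to the interface, not to PosAttributeImplSketch: the target may be
        // any class that implements the attribute.
        ((PosAttributeSketch) target).setPartOfSpeech(pos);
    }
}

/** A different implementation that bundles the attribute with other state. */
final class CombinedTokenSketch extends AttributeImplSketch implements PosAttributeSketch {
    private String pos = "UNKNOWN";
    private String term = "";

    public void setPartOfSpeech(String pos) { this.pos = pos; }
    public String getPartOfSpeech() { return pos; }
    void setTerm(String term) { this.term = term; }
    String getTerm() { return term; }

    @Override
    void copyTo(AttributeImplSketch target) {
        ((PosAttributeSketch) target).setPartOfSpeech(pos);
    }
}

final class CopyToSketch {
    public static void main(String[] args) {
        PosAttributeImplSketch source = new PosAttributeImplSketch();
        source.setPartOfSpeech("NOUN");
        CombinedTokenSketch target = new CombinedTokenSketch();
        source.copyTo(target);   // would throw ClassCastException with an impl cast
        System.out.println(target.getPartOfSpeech());
    }
}
```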
[jira] [Commented] (LUCENE-3666) Update org.apache.lucene.analysis
package summary
Posted by "Uwe Schindler (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/LUCENE-3666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13188103#comment-13188103 ]
Uwe Schindler commented on LUCENE-3666:
---------------------------------------
Here are my comments as posted on IRC:
22:38 ThetaPh1 + A CharStream adds character offset correction functionality over
22:38 ThetaPh1 + {@link java.io.Reader}. All Tokenizers accept a CharStream instead of
22:38 ThetaPh1 + Reader as input, which enables arbitrary character based filtering
22:38 ThetaPh1 + before tokenization.
22:39 ThetaPh1 ah charfilters are also there
22:39 ThetaPh1 because that description is a little bit limited, charstreams by themselves are never used
22:40 sarowe right
22:40 ThetaPh1 but some general information about what CharFilters do is missing, at least I don't see it in the patch
22:40 ThetaPh1 the reader simply says: wtf is this charstream good for?
22:40 sarowe good point
22:40 sarowe I'll revisit
22:41 ThetaPh1 in the following para i would replace CharStream by CharFilter
22:41 sarowe (I know more about CharFilter guts after working on HTMLStripCharFilter replacement)
22:41 ThetaPh1 the input is in all cases a Reader
22:41 ThetaPh1 hehe yes
22:41 ThetaPh1 in my opinion the charfilters are horrible by design
22:41 ThetaPh1 we changed it shortly before 2.9 to fix some very bad behaviour
22:41 sarowe right, I recall that - performance fixes
22:41 ThetaPh1 but it's still hard to understand what's going on
22:42 sarowe yes, and no docs
22:42 ThetaPh1 the problem is that they wrap Readers
22:42 ThetaPh1 and instanceof checks in Tokenizer and so on
22:42 sarowe I've added a little more docs in the JFlexHTMLStripCharFilter issue
22:42 ThetaPh1 to prevent those instanceof checks everywhere in code, Tokenizer has a correctOffset method, right?
22:43 sarowe ok, I know about the method, didn't know that was why it was there
22:43 ThetaPh1 + <b>Lucene 2.9 introduced a new TokenStream API. Please see the section "New TokenStream API" below for more details.</b>
22:43 ThetaPh1 we should change the second sentence, there is no old api anymore
22:43 sarowe right
22:44 sarowe in trunk, anyway
22:45 ThetaPh1 in 3.x, the same
22:45 ThetaPh1 and remove "new"
22:45 ThetaPh1 the example with LengthFilter is good
22:45 sarowe cool
22:45 ThetaPh1 as it shows by example how it's implemented (for filtering tokens based on accept())
22:46 ThetaPh1 but also what a conventional filter would look like
22:46 sarowe right
22:47 ThetaPh1 equals and hashCode no longer need to be implemented
22:47 ThetaPh1 it's no longer required
22:47 sarowe ok
22:48 ThetaPh1 + {@literal @Override}
22:48 ThetaPh1 public void copyTo(AttributeImpl target) {
22:48 ThetaPh1 ((PartOfSpeechAttributeImpl) target).pos = pos;
22:48 ThetaPh1 }
22:48 ThetaPh1 this one should not cast to *Impl
22:48 ThetaPh1 it should simply cast to the interface
22:48 sarowe ok
22:48 ThetaPh1 it's done like this in all attributes in lucene, maybe we missed that one in the docs
22:49 sarowe I'll check
22:49 ThetaPh1 the idea is that e.g. a CharTermAttribute can be copied to a good old Token (die,die,die)
22:49 ThetaPh1 so the copy operation should not rely on the type
22:49 ThetaPh1 i mean impl
22:49 sarowe right, the interface instead
22:50 ThetaPh1 ((PartOfSpeechAttributeImpl) target).setPos(pos);
22:50 ThetaPh1 something like that
22:50 ThetaPh1 a without impl
22:50 sarowe :) right
22:50 ThetaPh1 ((PartOfSpeechAttribute) target).setPos(pos);
22:50 sarowe ok
22:50 ThetaPh1 attributes also no longer need to impl toString(), but that's not in the example
22:51 ThetaPh1 they can implement reflectWith for nice debugging output in solr
22:51 ThetaPh1 but that's too much information
22:51 sarowe :)
22:51 ThetaPh1 just remove the hashCode/equals and toString if they are in the example
22:51 ThetaPh1 a minimum example would be ideal
22:51 sarowe ok
22:52 ThetaPh1 +<code>AttributeImpl</code> class and therefore implements its abstract methods <code>clear(), copyTo(), equals(), hashCode()</code>.
22:52 ThetaPh1 not sure how this is solved in 3.x
22:52 ThetaPh1 in trunk they are gone
22:52 ThetaPh1 (have to look up)
22:52 sarowe ok
22:52 ThetaPh1 i only know that in 3.x most attributes that existed before simply implement equals/hashcode
22:52 ThetaPh1 but just for backwards reasons
22:53 sarowe ok
22:53 ThetaPh1 one thing
22:54 ThetaPh1 you should note for CharTermAttribute that it implements CharSequence and Appendable
22:54 ThetaPh1 i had a code review before
22:54 ThetaPh1 and have seen stupidity like useless toString() calls
22:54 sarowe right
22:54 ThetaPh1 i have seen people doing termAtt.toString().length() < 10 in a lengthfilter-like filter
22:54 sarowe that's the main reason for CharTermAttr to replace TermAttr, I believe
22:55 ThetaPh1 yes
22:55 ThetaPh1 otherwise I see nothing wrong
22:55 sarowe cool, thanks for the review
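Two points from the chat, filtering tokens through an accept() hook and reading a term's length straight from a CharSequence instead of calling toString() first, can be sketched together. The classes below are hypothetical stand-ins written for this example and are not Lucene's FilteringTokenFilter, LengthFilter, or CharTermAttribute; the list-based filter() driver in particular is an assumption made to keep the sketch self-contained.

```java
import java.util.ArrayList;
import java.util.List;

/** Stand-in term buffer that implements CharSequence and Appendable. */
final class TermBufferSketch implements CharSequence, Appendable {
    private final StringBuilder buf = new StringBuilder();

    public int length() { return buf.length(); }
    public char charAt(int index) { return buf.charAt(index); }
    public CharSequence subSequence(int start, int end) { return buf.subSequence(start, end); }

    public TermBufferSketch append(CharSequence csq) { buf.append(csq); return this; }
    public TermBufferSketch append(CharSequence csq, int start, int end) {
        buf.append(csq, start, end);
        return this;
    }
    public TermBufferSketch append(char c) { buf.append(c); return this; }

    void setEmpty() { buf.setLength(0); }

    @Override
    public String toString() { return buf.toString(); }
}

/** Stand-in base class: subclasses decide per token via accept(). */
abstract class FilteringSketch {
    protected final TermBufferSketch term = new TermBufferSketch();

    /** Returns true if the token currently held in {@code term} should be kept. */
    protected abstract boolean accept();

    final List<String> filter(List<String> input) {
        List<String> kept = new ArrayList<>();
        for (String s : input) {
            term.setEmpty();
            term.append(s);
            if (accept()) {
                kept.add(term.toString());
            }
        }
        return kept;
    }
}

/** Keeps tokens whose length lies in [min, max]. */
final class LengthFilterSketch extends FilteringSketch {
    private final int min;
    private final int max;

    LengthFilterSketch(int min, int max) {
        this.min = min;
        this.max = max;
    }

    @Override
    protected boolean accept() {
        // length() comes from CharSequence; no intermediate String is created.
        int len = term.length();
        return len >= min && len <= max;
    }
}
```

With this shape, a subclass only states the keep-or-drop rule, and the length check never allocates a String just to measure it.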