You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@lucene.apache.org by "Chris Male (JIRA)" <ji...@apache.org> on 2011/08/23 09:52:29 UTC

[jira] [Created] (LUCENE-3396) Make TokenStream Reuse Mandatory for Analyzers

Make TokenStream Reuse Mandatory for Analyzers
----------------------------------------------

                 Key: LUCENE-3396
                 URL: https://issues.apache.org/jira/browse/LUCENE-3396
             Project: Lucene - Java
          Issue Type: Improvement
          Components: modules/analysis
            Reporter: Chris Male


In LUCENE-2309 it became clear that we'd benefit a lot from Analyzer having to return reusable TokenStreams.  This is a big chunk of work, but its time to bite the bullet.

I plan to attack this in the following way:

- Collapse the logic of ReusableAnalyzerBase into Analyzer
- Add a ReuseStrategy abstraction to Analyzer which controls whether the TokenStreamComponents are reused globally (as they are today) or per-field.
- Convert all Analyzers over to using TokenStreamComponents.  I've already seen that some of the TokenStreams created in tests need some work to be reusable (even if they aren't reused).
- Remove Analyzer.reusableTokenStream and convert everything over to using .tokenStream (which will now be returning reusable TokenStreams).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Updated] (LUCENE-3396) Make TokenStream Reuse Mandatory for Analyzers

Posted by "Chris Male (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Male updated LUCENE-3396:
-------------------------------

    Attachment: LUCENE-3396-remaining-merging.patch

Patch which finalizes the remaining Analyzer changes and merges ReusableAnalyzerBase and Analyzer together.  

Command for the patch:

{code}
# Keep old Analyzer.java temporarily for comparison
svn rename lucene/src/java/org/apache/lucene/analysis/Analyzer.java lucene/src/java/org/apache/lucene/analysis/Analyzer.java.old
svn rename lucene/src/java/org/apache/lucene/analysis/ReusableAnalyzerBase.java lucene/src/java/org/apache/lucene/analysis/Analyzer.java
{code}

I have updated the javadocs to reflect the merging and have removed the assertFinal() assertion since tokenStream() and reusableTokenStream() are always final now.

All tests pass and javadocs build.  I am looking to commit this soon (with a MIGRATE.txt entry).

> Make TokenStream Reuse Mandatory for Analyzers
> ----------------------------------------------
>
>                 Key: LUCENE-3396
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3396
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: modules/analysis
>            Reporter: Chris Male
>         Attachments: LUCENE-3396-forgotten.patch, LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, LUCENE-3396-remaining-analyzers.patch, LUCENE-3396-remaining-merging.patch
>
>
> In LUCENE-2309 it became clear that we'd benefit a lot from Analyzer having to return reusable TokenStreams.  This is a big chunk of work, but its time to bite the bullet.
> I plan to attack this in the following way:
> - Collapse the logic of ReusableAnalyzerBase into Analyzer
> - Add a ReuseStrategy abstraction to Analyzer which controls whether the TokenStreamComponents are reused globally (as they are today) or per-field.
> - Convert all Analyzers over to using TokenStreamComponents.  I've already seen that some of the TokenStreams created in tests need some work to be reusable (even if they aren't reused).
> - Remove Analyzer.reusableTokenStream and convert everything over to using .tokenStream (which will now be returning reusable TokenStreams).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Updated] (LUCENE-3396) Make TokenStream Reuse Mandatory for Analyzers

Posted by "Chris Male (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Male updated LUCENE-3396:
-------------------------------

    Attachment: LUCENE-3396-rab.patch

Patch updated to trunk.

We're good to go now.

> Make TokenStream Reuse Mandatory for Analyzers
> ----------------------------------------------
>
>                 Key: LUCENE-3396
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3396
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: modules/analysis
>            Reporter: Chris Male
>         Attachments: LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, LUCENE-3396-rab.patch
>
>
> In LUCENE-2309 it became clear that we'd benefit a lot from Analyzer having to return reusable TokenStreams.  This is a big chunk of work, but its time to bite the bullet.
> I plan to attack this in the following way:
> - Collapse the logic of ReusableAnalyzerBase into Analyzer
> - Add a ReuseStrategy abstraction to Analyzer which controls whether the TokenStreamComponents are reused globally (as they are today) or per-field.
> - Convert all Analyzers over to using TokenStreamComponents.  I've already seen that some of the TokenStreams created in tests need some work to be reusable (even if they aren't reused).
> - Remove Analyzer.reusableTokenStream and convert everything over to using .tokenStream (which will now be returning reusable TokenStreams).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3396) Make TokenStream Reuse Mandatory for Analyzers

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13114131#comment-13114131 ] 

Robert Muir commented on LUCENE-3396:
-------------------------------------

I think this one can be marked resolved?

by the way, I reviewed the patches here: thanks for putting in all this effort!
this fixes my biggest pet peeve about lucene....awesome.

> Make TokenStream Reuse Mandatory for Analyzers
> ----------------------------------------------
>
>                 Key: LUCENE-3396
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3396
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: modules/analysis
>            Reporter: Chris Male
>         Attachments: LUCENE-3396-forgotten.patch, LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, LUCENE-3396-remaining-analyzers.patch, LUCENE-3396-remaining-merging.patch
>
>
> In LUCENE-2309 it became clear that we'd benefit a lot from Analyzer having to return reusable TokenStreams.  This is a big chunk of work, but its time to bite the bullet.
> I plan to attack this in the following way:
> - Collapse the logic of ReusableAnalyzerBase into Analyzer
> - Add a ReuseStrategy abstraction to Analyzer which controls whether the TokenStreamComponents are reused globally (as they are today) or per-field.
> - Convert all Analyzers over to using TokenStreamComponents.  I've already seen that some of the TokenStreams created in tests need some work to be reusable (even if they aren't reused).
> - Remove Analyzer.reusableTokenStream and convert everything over to using .tokenStream (which will now be returning reusable TokenStreams).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3396) Make TokenStream Reuse Mandatory for Analyzers

Posted by "Chris Male (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13102508#comment-13102508 ] 

Chris Male commented on LUCENE-3396:
------------------------------------

Committed revision 1169654.

> Make TokenStream Reuse Mandatory for Analyzers
> ----------------------------------------------
>
>                 Key: LUCENE-3396
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3396
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: modules/analysis
>            Reporter: Chris Male
>         Attachments: LUCENE-3396-forgotten.patch, LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, LUCENE-3396-rab.patch
>
>
> In LUCENE-2309 it became clear that we'd benefit a lot from Analyzer having to return reusable TokenStreams.  This is a big chunk of work, but its time to bite the bullet.
> I plan to attack this in the following way:
> - Collapse the logic of ReusableAnalyzerBase into Analyzer
> - Add a ReuseStrategy abstraction to Analyzer which controls whether the TokenStreamComponents are reused globally (as they are today) or per-field.
> - Convert all Analyzers over to using TokenStreamComponents.  I've already seen that some of the TokenStreams created in tests need some work to be reusable (even if they aren't reused).
> - Remove Analyzer.reusableTokenStream and convert everything over to using .tokenStream (which will now be returning reusable TokenStreams).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3396) Make TokenStream Reuse Mandatory for Analyzers

Posted by "Simon Willnauer (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13149541#comment-13149541 ] 

Simon Willnauer commented on LUCENE-3396:
-----------------------------------------

chris, this seems to be done no? can you close it?
                
> Make TokenStream Reuse Mandatory for Analyzers
> ----------------------------------------------
>
>                 Key: LUCENE-3396
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3396
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: modules/analysis
>            Reporter: Chris Male
>         Attachments: LUCENE-3396-forgotten.patch, LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, LUCENE-3396-remaining-analyzers.patch, LUCENE-3396-remaining-merging.patch
>
>
> In LUCENE-2309 it became clear that we'd benefit a lot from Analyzer having to return reusable TokenStreams.  This is a big chunk of work, but its time to bite the bullet.
> I plan to attack this in the following way:
> - Collapse the logic of ReusableAnalyzerBase into Analyzer
> - Add a ReuseStrategy abstraction to Analyzer which controls whether the TokenStreamComponents are reused globally (as they are today) or per-field.
> - Convert all Analyzers over to using TokenStreamComponents.  I've already seen that some of the TokenStreams created in tests need some work to be reusable (even if they aren't reused).
> - Remove Analyzer.reusableTokenStream and convert everything over to using .tokenStream (which will now be returning reusable TokenStreams).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Issue Comment Edited] (LUCENE-3396) Make TokenStream Reuse Mandatory for Analyzers

Posted by "Chris Male (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097076#comment-13097076 ] 

Chris Male edited comment on LUCENE-3396 at 9/5/11 9:44 AM:
------------------------------------------------------------

Hi Uwe,

I originally had ReuseStrategy with a generic type but then decided it was overkill since it only benefits implementations, not users of ReuseStrategy.  If we want the extra type safety, I'll happily make the change.

      was (Author: cmale):
    Hi Uwe,

I originally had ReuseStrategy with a generic type but then decided it was overkill.  If we want the extra type safety, I'll happily make the change.
  
> Make TokenStream Reuse Mandatory for Analyzers
> ----------------------------------------------
>
>                 Key: LUCENE-3396
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3396
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: modules/analysis
>            Reporter: Chris Male
>         Attachments: LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, LUCENE-3396-rab.patch
>
>
> In LUCENE-2309 it became clear that we'd benefit a lot from Analyzer having to return reusable TokenStreams.  This is a big chunk of work, but its time to bite the bullet.
> I plan to attack this in the following way:
> - Collapse the logic of ReusableAnalyzerBase into Analyzer
> - Add a ReuseStrategy abstraction to Analyzer which controls whether the TokenStreamComponents are reused globally (as they are today) or per-field.
> - Convert all Analyzers over to using TokenStreamComponents.  I've already seen that some of the TokenStreams created in tests need some work to be reusable (even if they aren't reused).
> - Remove Analyzer.reusableTokenStream and convert everything over to using .tokenStream (which will now be returning reusable TokenStreams).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3396) Make TokenStream Reuse Mandatory for Analyzers

Posted by "Chris Male (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13102443#comment-13102443 ] 

Chris Male commented on LUCENE-3396:
------------------------------------

Committed revision 1169607.

Now attacking the remaining Analyzers.

> Make TokenStream Reuse Mandatory for Analyzers
> ----------------------------------------------
>
>                 Key: LUCENE-3396
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3396
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: modules/analysis
>            Reporter: Chris Male
>         Attachments: LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, LUCENE-3396-rab.patch
>
>
> In LUCENE-2309 it became clear that we'd benefit a lot from Analyzer having to return reusable TokenStreams.  This is a big chunk of work, but its time to bite the bullet.
> I plan to attack this in the following way:
> - Collapse the logic of ReusableAnalyzerBase into Analyzer
> - Add a ReuseStrategy abstraction to Analyzer which controls whether the TokenStreamComponents are reused globally (as they are today) or per-field.
> - Convert all Analyzers over to using TokenStreamComponents.  I've already seen that some of the TokenStreams created in tests need some work to be reusable (even if they aren't reused).
> - Remove Analyzer.reusableTokenStream and convert everything over to using .tokenStream (which will now be returning reusable TokenStreams).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Resolved] (LUCENE-3396) Make TokenStream Reuse Mandatory for Analyzers

Posted by "Chris Male (Resolved) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Male resolved LUCENE-3396.
--------------------------------

       Resolution: Fixed
    Fix Version/s: 4.0
         Assignee: Chris Male

TokenStream reuse is now mandatory
                
> Make TokenStream Reuse Mandatory for Analyzers
> ----------------------------------------------
>
>                 Key: LUCENE-3396
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3396
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: modules/analysis
>            Reporter: Chris Male
>            Assignee: Chris Male
>             Fix For: 4.0
>
>         Attachments: LUCENE-3396-forgotten.patch, LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, LUCENE-3396-remaining-analyzers.patch, LUCENE-3396-remaining-merging.patch
>
>
> In LUCENE-2309 it became clear that we'd benefit a lot from Analyzer having to return reusable TokenStreams.  This is a big chunk of work, but its time to bite the bullet.
> I plan to attack this in the following way:
> - Collapse the logic of ReusableAnalyzerBase into Analyzer
> - Add a ReuseStrategy abstraction to Analyzer which controls whether the TokenStreamComponents are reused globally (as they are today) or per-field.
> - Convert all Analyzers over to using TokenStreamComponents.  I've already seen that some of the TokenStreams created in tests need some work to be reusable (even if they aren't reused).
> - Remove Analyzer.reusableTokenStream and convert everything over to using .tokenStream (which will now be returning reusable TokenStreams).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Updated] (LUCENE-3396) Make TokenStream Reuse Mandatory for Analyzers

Posted by "Chris Male (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Male updated LUCENE-3396:
-------------------------------

    Attachment: LUCENE-3396-forgotten.patch

During the merging of changes I lost some conversions of some test Analyzers.

Patch addresses them (including the unholy TestPayloads).

> Make TokenStream Reuse Mandatory for Analyzers
> ----------------------------------------------
>
>                 Key: LUCENE-3396
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3396
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: modules/analysis
>            Reporter: Chris Male
>         Attachments: LUCENE-3396-forgotten.patch, LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, LUCENE-3396-rab.patch
>
>
> In LUCENE-2309 it became clear that we'd benefit a lot from Analyzer having to return reusable TokenStreams.  This is a big chunk of work, but its time to bite the bullet.
> I plan to attack this in the following way:
> - Collapse the logic of ReusableAnalyzerBase into Analyzer
> - Add a ReuseStrategy abstraction to Analyzer which controls whether the TokenStreamComponents are reused globally (as they are today) or per-field.
> - Convert all Analyzers over to using TokenStreamComponents.  I've already seen that some of the TokenStreams created in tests need some work to be reusable (even if they aren't reused).
> - Remove Analyzer.reusableTokenStream and convert everything over to using .tokenStream (which will now be returning reusable TokenStreams).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3396) Make TokenStream Reuse Mandatory for Analyzers

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13089394#comment-13089394 ] 

Robert Muir commented on LUCENE-3396:
-------------------------------------

can we create a separate sub-tasks to fix the tests (so we can backport at least a majority of the changes)?

> Make TokenStream Reuse Mandatory for Analyzers
> ----------------------------------------------
>
>                 Key: LUCENE-3396
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3396
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: modules/analysis
>            Reporter: Chris Male
>
> In LUCENE-2309 it became clear that we'd benefit a lot from Analyzer having to return reusable TokenStreams.  This is a big chunk of work, but its time to bite the bullet.
> I plan to attack this in the following way:
> - Collapse the logic of ReusableAnalyzerBase into Analyzer
> - Add a ReuseStrategy abstraction to Analyzer which controls whether the TokenStreamComponents are reused globally (as they are today) or per-field.
> - Convert all Analyzers over to using TokenStreamComponents.  I've already seen that some of the TokenStreams created in tests need some work to be reusable (even if they aren't reused).
> - Remove Analyzer.reusableTokenStream and convert everything over to using .tokenStream (which will now be returning reusable TokenStreams).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Updated] (LUCENE-3396) Make TokenStream Reuse Mandatory for Analyzers

Posted by "Chris Male (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Male updated LUCENE-3396:
-------------------------------

    Attachment: LUCENE-3396-rab.patch

Patch updated to reset now returns void.  I'll make sure to note this compat break in the CHANGES.txt.

> Make TokenStream Reuse Mandatory for Analyzers
> ----------------------------------------------
>
>                 Key: LUCENE-3396
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3396
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: modules/analysis
>            Reporter: Chris Male
>         Attachments: LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, LUCENE-3396-rab.patch
>
>
> In LUCENE-2309 it became clear that we'd benefit a lot from Analyzer having to return reusable TokenStreams.  This is a big chunk of work, but its time to bite the bullet.
> I plan to attack this in the following way:
> - Collapse the logic of ReusableAnalyzerBase into Analyzer
> - Add a ReuseStrategy abstraction to Analyzer which controls whether the TokenStreamComponents are reused globally (as they are today) or per-field.
> - Convert all Analyzers over to using TokenStreamComponents.  I've already seen that some of the TokenStreams created in tests need some work to be reusable (even if they aren't reused).
> - Remove Analyzer.reusableTokenStream and convert everything over to using .tokenStream (which will now be returning reusable TokenStreams).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3396) Make TokenStream Reuse Mandatory for Analyzers

Posted by "Chris Male (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13114136#comment-13114136 ] 

Chris Male commented on LUCENE-3396:
------------------------------------

I haven't actually committed yet.  I was hoping you'd have a chance to review before I did.  I'll now commit.

> Make TokenStream Reuse Mandatory for Analyzers
> ----------------------------------------------
>
>                 Key: LUCENE-3396
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3396
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: modules/analysis
>            Reporter: Chris Male
>         Attachments: LUCENE-3396-forgotten.patch, LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, LUCENE-3396-remaining-analyzers.patch, LUCENE-3396-remaining-merging.patch
>
>
> In LUCENE-2309 it became clear that we'd benefit a lot from Analyzer having to return reusable TokenStreams.  This is a big chunk of work, but its time to bite the bullet.
> I plan to attack this in the following way:
> - Collapse the logic of ReusableAnalyzerBase into Analyzer
> - Add a ReuseStrategy abstraction to Analyzer which controls whether the TokenStreamComponents are reused globally (as they are today) or per-field.
> - Convert all Analyzers over to using TokenStreamComponents.  I've already seen that some of the TokenStreams created in tests need some work to be reusable (even if they aren't reused).
> - Remove Analyzer.reusableTokenStream and convert everything over to using .tokenStream (which will now be returning reusable TokenStreams).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3396) Make TokenStream Reuse Mandatory for Analyzers

Posted by "Simon Willnauer (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097069#comment-13097069 ] 

Simon Willnauer commented on LUCENE-3396:
-----------------------------------------

this patch looks good to me! I like the reuse strategy and how you factored out the thread local stuff. 
I think we should commit and let bake this in?

> Make TokenStream Reuse Mandatory for Analyzers
> ----------------------------------------------
>
>                 Key: LUCENE-3396
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3396
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: modules/analysis
>            Reporter: Chris Male
>         Attachments: LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, LUCENE-3396-rab.patch
>
>
> In LUCENE-2309 it became clear that we'd benefit a lot from Analyzer having to return reusable TokenStreams.  This is a big chunk of work, but its time to bite the bullet.
> I plan to attack this in the following way:
> - Collapse the logic of ReusableAnalyzerBase into Analyzer
> - Add a ReuseStrategy abstraction to Analyzer which controls whether the TokenStreamComponents are reused globally (as they are today) or per-field.
> - Convert all Analyzers over to using TokenStreamComponents.  I've already seen that some of the TokenStreams created in tests need some work to be reusable (even if they aren't reused).
> - Remove Analyzer.reusableTokenStream and convert everything over to using .tokenStream (which will now be returning reusable TokenStreams).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3396) Make TokenStream Reuse Mandatory for Analyzers

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13098057#comment-13098057 ] 

Robert Muir commented on LUCENE-3396:
-------------------------------------

patch looks great: one question, should we keep the 'reset can return false and we do not reuse' ?

this seems like it might be obselete, though it would introduce an API break, I think maybe we should change it to void?


> Make TokenStream Reuse Mandatory for Analyzers
> ----------------------------------------------
>
>                 Key: LUCENE-3396
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3396
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: modules/analysis
>            Reporter: Chris Male
>         Attachments: LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, LUCENE-3396-rab.patch
>
>
> In LUCENE-2309 it became clear that we'd benefit a lot from Analyzer having to return reusable TokenStreams.  This is a big chunk of work, but its time to bite the bullet.
> I plan to attack this in the following way:
> - Collapse the logic of ReusableAnalyzerBase into Analyzer
> - Add a ReuseStrategy abstraction to Analyzer which controls whether the TokenStreamComponents are reused globally (as they are today) or per-field.
> - Convert all Analyzers over to using TokenStreamComponents.  I've already seen that some of the TokenStreams created in tests need some work to be reusable (even if they aren't reused).
> - Remove Analyzer.reusableTokenStream and convert everything over to using .tokenStream (which will now be returning reusable TokenStreams).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Updated] (LUCENE-3396) Make TokenStream Reuse Mandatory for Analyzers

Posted by "Chris Male (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Male updated LUCENE-3396:
-------------------------------

    Attachment: LUCENE-3396-rab.patch

Patch updated to trunk.

Generic <T> is removed from ReuseStrategy.

> Make TokenStream Reuse Mandatory for Analyzers
> ----------------------------------------------
>
>                 Key: LUCENE-3396
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3396
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: modules/analysis
>            Reporter: Chris Male
>         Attachments: LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, LUCENE-3396-rab.patch
>
>
> In LUCENE-2309 it became clear that we'd benefit a lot from Analyzer having to return reusable TokenStreams.  This is a big chunk of work, but its time to bite the bullet.
> I plan to attack this in the following way:
> - Collapse the logic of ReusableAnalyzerBase into Analyzer
> - Add a ReuseStrategy abstraction to Analyzer which controls whether the TokenStreamComponents are reused globally (as they are today) or per-field.
> - Convert all Analyzers over to using TokenStreamComponents.  I've already seen that some of the TokenStreams created in tests need some work to be reusable (even if they aren't reused).
> - Remove Analyzer.reusableTokenStream and convert everything over to using .tokenStream (which will now be returning reusable TokenStreams).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3396) Make TokenStream Reuse Mandatory for Analyzers

Posted by "Chris Male (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13098580#comment-13098580 ] 

Chris Male commented on LUCENE-3396:
------------------------------------

Hmmm, I agree that we should change it to void.  If the source cannot be reset, it should throw an Exception.  We need to be able to rely on the fact that we are using reusable components.

I'll update the patch.


> Make TokenStream Reuse Mandatory for Analyzers
> ----------------------------------------------
>
>                 Key: LUCENE-3396
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3396
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: modules/analysis
>            Reporter: Chris Male
>         Attachments: LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, LUCENE-3396-rab.patch
>
>
> In LUCENE-2309 it became clear that we'd benefit a lot from Analyzer having to return reusable TokenStreams.  This is a big chunk of work, but its time to bite the bullet.
> I plan to attack this in the following way:
> - Collapse the logic of ReusableAnalyzerBase into Analyzer
> - Add a ReuseStrategy abstraction to Analyzer which controls whether the TokenStreamComponents are reused globally (as they are today) or per-field.
> - Convert all Analyzers over to using TokenStreamComponents.  I've already seen that some of the TokenStreams created in tests need some work to be reusable (even if they aren't reused).
> - Remove Analyzer.reusableTokenStream and convert everything over to using .tokenStream (which will now be returning reusable TokenStreams).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Updated] (LUCENE-3396) Make TokenStream Reuse Mandatory for Analyzers

Posted by "Chris Male (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Male updated LUCENE-3396:
-------------------------------

    Attachment: LUCENE-3396-rab.patch

Improved patch which does some cleanup and documentation ReuseAnalyzerBase.ReuseStrategy.

I'm seeing one test fail in TestPayloads, this class does some insanely non-reusable stuff with payloads.  I'll keep digging.

> Make TokenStream Reuse Mandatory for Analyzers
> ----------------------------------------------
>
>                 Key: LUCENE-3396
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3396
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: modules/analysis
>            Reporter: Chris Male
>         Attachments: LUCENE-3396-rab.patch, LUCENE-3396-rab.patch
>
>
> In LUCENE-2309 it became clear that we'd benefit a lot from Analyzer having to return reusable TokenStreams.  This is a big chunk of work, but its time to bite the bullet.
> I plan to attack this in the following way:
> - Collapse the logic of ReusableAnalyzerBase into Analyzer
> - Add a ReuseStrategy abstraction to Analyzer which controls whether the TokenStreamComponents are reused globally (as they are today) or per-field.
> - Convert all Analyzers over to using TokenStreamComponents.  I've already seen that some of the TokenStreams created in tests need some work to be reusable (even if they aren't reused).
> - Remove Analyzer.reusableTokenStream and convert everything over to using .tokenStream (which will now be returning reusable TokenStreams).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3396) Make TokenStream Reuse Mandatory for Analyzers

Posted by "Chris Male (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13098045#comment-13098045 ] 

Chris Male commented on LUCENE-3396:
------------------------------------

I'm going to commit this soon and then work on the remaining Analyzers.

> Make TokenStream Reuse Mandatory for Analyzers
> ----------------------------------------------
>
>                 Key: LUCENE-3396
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3396
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: modules/analysis
>            Reporter: Chris Male
>         Attachments: LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, LUCENE-3396-rab.patch
>
>
> In LUCENE-2309 it became clear that we'd benefit a lot from Analyzer having to return reusable TokenStreams.  This is a big chunk of work, but its time to bite the bullet.
> I plan to attack this in the following way:
> - Collapse the logic of ReusableAnalyzerBase into Analyzer
> - Add a ReuseStrategy abstraction to Analyzer which controls whether the TokenStreamComponents are reused globally (as they are today) or per-field.
> - Convert all Analyzers over to using TokenStreamComponents.  I've already seen that some of the TokenStreams created in tests need some work to be reusable (even if they aren't reused).
> - Remove Analyzer.reusableTokenStream and convert everything over to using .tokenStream (which will now be returning reusable TokenStreams).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3396) Make TokenStream Reuse Mandatory for Analyzers

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13114149#comment-13114149 ] 

Robert Muir commented on LUCENE-3396:
-------------------------------------

+1!

> Make TokenStream Reuse Mandatory for Analyzers
> ----------------------------------------------
>
>                 Key: LUCENE-3396
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3396
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: modules/analysis
>            Reporter: Chris Male
>         Attachments: LUCENE-3396-forgotten.patch, LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, LUCENE-3396-remaining-analyzers.patch, LUCENE-3396-remaining-merging.patch
>
>
> In LUCENE-2309 it became clear that we'd benefit a lot from Analyzer having to return reusable TokenStreams.  This is a big chunk of work, but its time to bite the bullet.
> I plan to attack this in the following way:
> - Collapse the logic of ReusableAnalyzerBase into Analyzer
> - Add a ReuseStrategy abstraction to Analyzer which controls whether the TokenStreamComponents are reused globally (as they are today) or per-field.
> - Convert all Analyzers over to using TokenStreamComponents.  I've already seen that some of the TokenStreams created in tests need some work to be reusable (even if they aren't reused).
> - Remove Analyzer.reusableTokenStream and convert everything over to using .tokenStream (which will now be returning reusable TokenStreams).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3396) Make TokenStream Reuse Mandatory for Analyzers

Posted by "Chris Male (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097076#comment-13097076 ] 

Chris Male commented on LUCENE-3396:
------------------------------------

Hi Uwe,

I originally had ReuseStrategy with a generic type but then decided it was overkill.  If we want the extra type safety, I'll happily make the change.

> Make TokenStream Reuse Mandatory for Analyzers
> ----------------------------------------------
>
>                 Key: LUCENE-3396
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3396
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: modules/analysis
>            Reporter: Chris Male
>         Attachments: LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, LUCENE-3396-rab.patch
>
>
> In LUCENE-2309 it became clear that we'd benefit a lot from Analyzer having to return reusable TokenStreams.  This is a big chunk of work, but its time to bite the bullet.
> I plan to attack this in the following way:
> - Collapse the logic of ReusableAnalyzerBase into Analyzer
> - Add a ReuseStrategy abstraction to Analyzer which controls whether the TokenStreamComponents are reused globally (as they are today) or per-field.
> - Convert all Analyzers over to using TokenStreamComponents.  I've already seen that some of the TokenStreams created in tests need some work to be reusable (even if they aren't reused).
> - Remove Analyzer.reusableTokenStream and convert everything over to using .tokenStream (which will now be returning reusable TokenStreams).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Updated] (LUCENE-3396) Make TokenStream Reuse Mandatory for Analyzers

Posted by "Chris Male (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Male updated LUCENE-3396:
-------------------------------

    Attachment: LUCENE-3396-rab.patch

Grrr TestPayloads does some really non-reusable stuff.  The only two options I can see are either using a NoReuseStrategy implementation in the test, or instantiate the Analyzer again every time it does something crazy.

I've chosen the latter since I don't want to start the NoReuseStrategy slippery slope.

We're in good shape now (minus this stupidity).

> Make TokenStream Reuse Mandatory for Analyzers
> ----------------------------------------------
>
>                 Key: LUCENE-3396
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3396
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: modules/analysis
>            Reporter: Chris Male
>         Attachments: LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, LUCENE-3396-rab.patch
>
>
> In LUCENE-2309 it became clear that we'd benefit a lot from Analyzer having to return reusable TokenStreams.  This is a big chunk of work, but its time to bite the bullet.
> I plan to attack this in the following way:
> - Collapse the logic of ReusableAnalyzerBase into Analyzer
> - Add a ReuseStrategy abstraction to Analyzer which controls whether the TokenStreamComponents are reused globally (as they are today) or per-field.
> - Convert all Analyzers over to using TokenStreamComponents.  I've already seen that some of the TokenStreams created in tests need some work to be reusable (even if they aren't reused).
> - Remove Analyzer.reusableTokenStream and convert everything over to using .tokenStream (which will now be returning reusable TokenStreams).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3396) Make TokenStream Reuse Mandatory for Analyzers

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097083#comment-13097083 ] 

Uwe Schindler commented on LUCENE-3396:
---------------------------------------

I agree its somehow overkill. But if not on class level I would even remove the T parameter from ther getter method, because it does not really fit, it is only used there, not even on the setter. There is no type enforcement anywhere, so the extra T is just to remove the casting on the caller of the protected method, but adding a SuppressWarnings on the implementor's side. So either make all T or use Object everywhere.

> Make TokenStream Reuse Mandatory for Analyzers
> ----------------------------------------------
>
>                 Key: LUCENE-3396
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3396
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: modules/analysis
>            Reporter: Chris Male
>         Attachments: LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, LUCENE-3396-rab.patch
>
>
> In LUCENE-2309 it became clear that we'd benefit a lot from Analyzer having to return reusable TokenStreams.  This is a big chunk of work, but its time to bite the bullet.
> I plan to attack this in the following way:
> - Collapse the logic of ReusableAnalyzerBase into Analyzer
> - Add a ReuseStrategy abstraction to Analyzer which controls whether the TokenStreamComponents are reused globally (as they are today) or per-field.
> - Convert all Analyzers over to using TokenStreamComponents.  I've already seen that some of the TokenStreams created in tests need some work to be reusable (even if they aren't reused).
> - Remove Analyzer.reusableTokenStream and convert everything over to using .tokenStream (which will now be returning reusable TokenStreams).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3396) Make TokenStream Reuse Mandatory for Analyzers

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13089414#comment-13089414 ] 

Robert Muir commented on LUCENE-3396:
-------------------------------------

yeah: just thinking it could be a nice cleanup? Some of these are messy :)

> Make TokenStream Reuse Mandatory for Analyzers
> ----------------------------------------------
>
>                 Key: LUCENE-3396
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3396
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: modules/analysis
>            Reporter: Chris Male
>
> In LUCENE-2309 it became clear that we'd benefit a lot from Analyzer having to return reusable TokenStreams.  This is a big chunk of work, but its time to bite the bullet.
> I plan to attack this in the following way:
> - Collapse the logic of ReusableAnalyzerBase into Analyzer
> - Add a ReuseStrategy abstraction to Analyzer which controls whether the TokenStreamComponents are reused globally (as they are today) or per-field.
> - Convert all Analyzers over to using TokenStreamComponents.  I've already seen that some of the TokenStreams created in tests need some work to be reusable (even if they aren't reused).
> - Remove Analyzer.reusableTokenStream and convert everything over to using .tokenStream (which will now be returning reusable TokenStreams).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3396) Make TokenStream Reuse Mandatory for Analyzers

Posted by "Chris Male (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13089417#comment-13089417 ] 

Chris Male commented on LUCENE-3396:
------------------------------------

Done. LUCENE-3397

> Make TokenStream Reuse Mandatory for Analyzers
> ----------------------------------------------
>
>                 Key: LUCENE-3396
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3396
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: modules/analysis
>            Reporter: Chris Male
>
> In LUCENE-2309 it became clear that we'd benefit a lot from Analyzer having to return reusable TokenStreams.  This is a big chunk of work, but its time to bite the bullet.
> I plan to attack this in the following way:
> - Collapse the logic of ReusableAnalyzerBase into Analyzer
> - Add a ReuseStrategy abstraction to Analyzer which controls whether the TokenStreamComponents are reused globally (as they are today) or per-field.
> - Convert all Analyzers over to using TokenStreamComponents.  I've already seen that some of the TokenStreams created in tests need some work to be reusable (even if they aren't reused).
> - Remove Analyzer.reusableTokenStream and convert everything over to using .tokenStream (which will now be returning reusable TokenStreams).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Updated] (LUCENE-3396) Make TokenStream Reuse Mandatory for Analyzers

Posted by "Chris Male (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Male updated LUCENE-3396:
-------------------------------

    Attachment: LUCENE-3396-rab.patch

This is the uber patch which converts all but 8 Analyzers over to using ReusableAnalyzerBase.

This also introduces the ReuseStrategy concept to ReusableAnalyzerBase (will need some documentation before committing).  For simplicity, the ReuseStrategy maintains its own CloseableThreadLocal.

The remaining Analyzers can be converted when I push the ReuseAnalyzerBase logic into Analyzer.

After some small tidy (and a review) I think we should commit this even though we're going to collapse ReusabeAnalyzerBase.

> Make TokenStream Reuse Mandatory for Analyzers
> ----------------------------------------------
>
>                 Key: LUCENE-3396
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3396
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: modules/analysis
>            Reporter: Chris Male
>         Attachments: LUCENE-3396-rab.patch
>
>
> In LUCENE-2309 it became clear that we'd benefit a lot from Analyzer having to return reusable TokenStreams.  This is a big chunk of work, but its time to bite the bullet.
> I plan to attack this in the following way:
> - Collapse the logic of ReusableAnalyzerBase into Analyzer
> - Add a ReuseStrategy abstraction to Analyzer which controls whether the TokenStreamComponents are reused globally (as they are today) or per-field.
> - Convert all Analyzers over to using TokenStreamComponents.  I've already seen that some of the TokenStreams created in tests need some work to be reusable (even if they aren't reused).
> - Remove Analyzer.reusableTokenStream and convert everything over to using .tokenStream (which will now be returning reusable TokenStreams).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3396) Make TokenStream Reuse Mandatory for Analyzers

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097072#comment-13097072 ] 

Uwe Schindler commented on LUCENE-3396:
---------------------------------------

Hi, the ReuseStrategies look fine. I am just confused about the Generics. Why not make the whole abstract ReuseStrategy T-typed? Then also ThreadLocal<T> is used and no casting anywhere. The subclasses for PerField and global then are typed to the correct class (Map<> or TSComponents).

> Make TokenStream Reuse Mandatory for Analyzers
> ----------------------------------------------
>
>                 Key: LUCENE-3396
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3396
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: modules/analysis
>            Reporter: Chris Male
>         Attachments: LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, LUCENE-3396-rab.patch
>
>
> In LUCENE-2309 it became clear that we'd benefit a lot from Analyzer having to return reusable TokenStreams.  This is a big chunk of work, but its time to bite the bullet.
> I plan to attack this in the following way:
> - Collapse the logic of ReusableAnalyzerBase into Analyzer
> - Add a ReuseStrategy abstraction to Analyzer which controls whether the TokenStreamComponents are reused globally (as they are today) or per-field.
> - Convert all Analyzers over to using TokenStreamComponents.  I've already seen that some of the TokenStreams created in tests need some work to be reusable (even if they aren't reused).
> - Remove Analyzer.reusableTokenStream and convert everything over to using .tokenStream (which will now be returning reusable TokenStreams).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3396) Make TokenStream Reuse Mandatory for Analyzers

Posted by "Chris Male (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13114151#comment-13114151 ] 

Chris Male commented on LUCENE-3396:
------------------------------------

Committed revision 1175297.

I don't want to mark this as resolved just yet.  I want to spin off another sub-task to move all consumers over to reusableTokenStream (and then rename it back to tokenStream).

> Make TokenStream Reuse Mandatory for Analyzers
> ----------------------------------------------
>
>                 Key: LUCENE-3396
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3396
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: modules/analysis
>            Reporter: Chris Male
>         Attachments: LUCENE-3396-forgotten.patch, LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, LUCENE-3396-remaining-analyzers.patch, LUCENE-3396-remaining-merging.patch
>
>
> In LUCENE-2309 it became clear that we'd benefit a lot from Analyzer having to return reusable TokenStreams.  This is a big chunk of work, but its time to bite the bullet.
> I plan to attack this in the following way:
> - Collapse the logic of ReusableAnalyzerBase into Analyzer
> - Add a ReuseStrategy abstraction to Analyzer which controls whether the TokenStreamComponents are reused globally (as they are today) or per-field.
> - Convert all Analyzers over to using TokenStreamComponents.  I've already seen that some of the TokenStreams created in tests need some work to be reusable (even if they aren't reused).
> - Remove Analyzer.reusableTokenStream and convert everything over to using .tokenStream (which will now be returning reusable TokenStreams).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3396) Make TokenStream Reuse Mandatory for Analyzers

Posted by "Chris Male (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13089396#comment-13089396 ] 

Chris Male commented on LUCENE-3396:
------------------------------------

Do you mean correcting the TokenStreams in tests so they are reusable?

> Make TokenStream Reuse Mandatory for Analyzers
> ----------------------------------------------
>
>                 Key: LUCENE-3396
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3396
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: modules/analysis
>            Reporter: Chris Male
>
> In LUCENE-2309 it became clear that we'd benefit a lot from Analyzer having to return reusable TokenStreams.  This is a big chunk of work, but its time to bite the bullet.
> I plan to attack this in the following way:
> - Collapse the logic of ReusableAnalyzerBase into Analyzer
> - Add a ReuseStrategy abstraction to Analyzer which controls whether the TokenStreamComponents are reused globally (as they are today) or per-field.
> - Convert all Analyzers over to using TokenStreamComponents.  I've already seen that some of the TokenStreams created in tests need some work to be reusable (even if they aren't reused).
> - Remove Analyzer.reusableTokenStream and convert everything over to using .tokenStream (which will now be returning reusable TokenStreams).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Updated] (LUCENE-3396) Make TokenStream Reuse Mandatory for Analyzers

Posted by "Chris Male (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Male updated LUCENE-3396:
-------------------------------

    Attachment: LUCENE-3396-remaining-analyzers.patch

Patch which converts the last of the Analyzers over to using ReusableAnalyzerBase.  At this stage, RAB is now the only extension of Analyzer.

Patch includes a few unique changes:

- Adds AnalyzerWrapper which is used by Analyzers which wrap other Analyzers.  This is necessary to allow access the TokenStreamComponents of the wrapped Analyzers, without making TSC public.

- IndexSchemaRuntimeFieldTest is made out of IndexSchemaTest due to testRuntimeFieldCreation doing some thread-unsafe changes to the Analyzers stored in  IndexSchema.  When run in IndexSchemaTest, depending on the execution order, the test fails.  When run by itself, there is no problems.  I think this is okay because the actual code being tested is documented as being thread-unsafe and the test also notes it does some tacky things.

I'm not going to look to commit this just yet, as I want to collapse Analyzer and RAB into a single class.

> Make TokenStream Reuse Mandatory for Analyzers
> ----------------------------------------------
>
>                 Key: LUCENE-3396
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3396
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: modules/analysis
>            Reporter: Chris Male
>         Attachments: LUCENE-3396-forgotten.patch, LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, LUCENE-3396-remaining-analyzers.patch
>
>
> In LUCENE-2309 it became clear that we'd benefit a lot from Analyzer having to return reusable TokenStreams.  This is a big chunk of work, but its time to bite the bullet.
> I plan to attack this in the following way:
> - Collapse the logic of ReusableAnalyzerBase into Analyzer
> - Add a ReuseStrategy abstraction to Analyzer which controls whether the TokenStreamComponents are reused globally (as they are today) or per-field.
> - Convert all Analyzers over to using TokenStreamComponents.  I've already seen that some of the TokenStreams created in tests need some work to be reusable (even if they aren't reused).
> - Remove Analyzer.reusableTokenStream and convert everything over to using .tokenStream (which will now be returning reusable TokenStreams).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Updated] (LUCENE-3396) Make TokenStream Reuse Mandatory for Analyzers

Posted by "Chris Male (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Male updated LUCENE-3396:
-------------------------------

    Attachment: LUCENE-3396-rab.patch

Absolutely, I missed that fieldname usage.

Updated patch.  Still all green.

> Make TokenStream Reuse Mandatory for Analyzers
> ----------------------------------------------
>
>                 Key: LUCENE-3396
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3396
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: modules/analysis
>            Reporter: Chris Male
>         Attachments: LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, LUCENE-3396-rab.patch
>
>
> In LUCENE-2309 it became clear that we'd benefit a lot from Analyzer having to return reusable TokenStreams.  This is a big chunk of work, but its time to bite the bullet.
> I plan to attack this in the following way:
> - Collapse the logic of ReusableAnalyzerBase into Analyzer
> - Add a ReuseStrategy abstraction to Analyzer which controls whether the TokenStreamComponents are reused globally (as they are today) or per-field.
> - Convert all Analyzers over to using TokenStreamComponents.  I've already seen that some of the TokenStreams created in tests need some work to be reusable (even if they aren't reused).
> - Remove Analyzer.reusableTokenStream and convert everything over to using .tokenStream (which will now be returning reusable TokenStreams).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3396) Make TokenStream Reuse Mandatory for Analyzers

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13091818#comment-13091818 ] 

Robert Muir commented on LUCENE-3396:
-------------------------------------

just a quick glance: the MockAnalyzer needs to reuse-per-field?

This way we test the case where payloads are enabled for one field but not for another.

> Make TokenStream Reuse Mandatory for Analyzers
> ----------------------------------------------
>
>                 Key: LUCENE-3396
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3396
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: modules/analysis
>            Reporter: Chris Male
>         Attachments: LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, LUCENE-3396-rab.patch
>
>
> In LUCENE-2309 it became clear that we'd benefit a lot from Analyzer having to return reusable TokenStreams.  This is a big chunk of work, but its time to bite the bullet.
> I plan to attack this in the following way:
> - Collapse the logic of ReusableAnalyzerBase into Analyzer
> - Add a ReuseStrategy abstraction to Analyzer which controls whether the TokenStreamComponents are reused globally (as they are today) or per-field.
> - Convert all Analyzers over to using TokenStreamComponents.  I've already seen that some of the TokenStreams created in tests need some work to be reusable (even if they aren't reused).
> - Remove Analyzer.reusableTokenStream and convert everything over to using .tokenStream (which will now be returning reusable TokenStreams).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3396) Make TokenStream Reuse Mandatory for Analyzers

Posted by "Chris Male (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113124#comment-13113124 ] 

Chris Male commented on LUCENE-3396:
------------------------------------

Plan to commit this tomorrow.

> Make TokenStream Reuse Mandatory for Analyzers
> ----------------------------------------------
>
>                 Key: LUCENE-3396
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3396
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: modules/analysis
>            Reporter: Chris Male
>         Attachments: LUCENE-3396-forgotten.patch, LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, LUCENE-3396-remaining-analyzers.patch, LUCENE-3396-remaining-merging.patch
>
>
> In LUCENE-2309 it became clear that we'd benefit a lot from Analyzer having to return reusable TokenStreams.  This is a big chunk of work, but its time to bite the bullet.
> I plan to attack this in the following way:
> - Collapse the logic of ReusableAnalyzerBase into Analyzer
> - Add a ReuseStrategy abstraction to Analyzer which controls whether the TokenStreamComponents are reused globally (as they are today) or per-field.
> - Convert all Analyzers over to using TokenStreamComponents.  I've already seen that some of the TokenStreams created in tests need some work to be reusable (even if they aren't reused).
> - Remove Analyzer.reusableTokenStream and convert everything over to using .tokenStream (which will now be returning reusable TokenStreams).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org