Posted to solr-dev@lucene.apache.org by "Yonik Seeley (JIRA)" <ji...@apache.org> on 2009/08/09 20:14:15 UTC

[jira] Created: (SOLR-1353) implement reusable token streams for all Solr tokenizers / token filters

implement reusable token streams for all Solr tokenizers / token filters
------------------------------------------------------------------------

                 Key: SOLR-1353
                 URL: https://issues.apache.org/jira/browse/SOLR-1353
             Project: Solr
          Issue Type: Bug
    Affects Versions: 1.4
            Reporter: Yonik Seeley
            Assignee: Yonik Seeley
            Priority: Blocker
             Fix For: 1.4


The new Lucene token architecture causes poor indexing performance if reusable token streams are not used.
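Below is a minimal, self-contained Java sketch of the reuse idea. It does not use Lucene. WhitespaceTokenizerSketch and ReusingAnalyzerSketch are hypothetical stand-ins, and their method names only echo the reusableTokenStream/reset pattern discussed in this issue. The analyzer keeps one tokenizer per thread and points it at each new Reader. This avoids allocating a fresh stream, with its attributes, for every field value.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified whitespace tokenizer with one shared, mutable term buffer.
class WhitespaceTokenizerSketch {
    static int instancesCreated = 0;          // makes allocation (or reuse) observable
    private Reader input;
    private final StringBuilder term = new StringBuilder();

    WhitespaceTokenizerSketch(Reader input) {
        this.input = input;
        instancesCreated++;
    }

    // Reuse path: point the existing instance at new input instead of allocating.
    void reset(Reader newInput) {
        this.input = newInput;
    }

    // Advances to the next token; returns false when the input is exhausted.
    boolean incrementToken() throws IOException {
        term.setLength(0);
        int c;
        while ((c = input.read()) != -1) {
            if (!Character.isWhitespace(c)) {
                term.append((char) c);
            } else if (term.length() > 0) {
                return true;
            }
        }
        return term.length() > 0;
    }

    String term() {
        return term.toString();
    }
}

// Hypothetical analyzer: caches one tokenizer per thread and resets it for each new value.
class ReusingAnalyzerSketch {
    private final ThreadLocal<WhitespaceTokenizerSketch> cached = new ThreadLocal<>();

    WhitespaceTokenizerSketch reusableTokenStream(Reader reader) {
        WhitespaceTokenizerSketch ts = cached.get();
        if (ts == null) {
            ts = new WhitespaceTokenizerSketch(reader); // first use on this thread
            cached.set(ts);
        } else {
            ts.reset(reader);                           // later uses: no new stream object
        }
        return ts;
    }

    // Drains a stream into a list of term strings (demonstration only).
    static List<String> terms(WhitespaceTokenizerSketch ts) {
        List<String> out = new ArrayList<>();
        try {
            while (ts.incrementToken()) {
                out.add(ts.term());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }
}
```

Caching per thread keeps the sketch safe when several indexing threads share one analyzer instance. A single stream object holds mutable state and cannot be used concurrently.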

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (SOLR-1353) implement reusable token streams for all Solr tokenizers / token filters

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12743860#action_12743860 ] 

Yonik Seeley commented on SOLR-1353:
------------------------------------

FYI, with all these changes but with reuse turned off, I was seeing 10% slower performance than with the pre-reflection code. Some of that impact could have been due to more mixing of old- and new-style APIs, proper clearing of attributes, etc.
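The "proper clearing of attributes" mentioned above is per-token work. It is needed because attribute state is shared and mutable: a value set for one token stays in place unless something resets it. The sketch below is a hypothetical model in which a tokenizer can optionally clear its attributes before each token, showing what goes wrong when it does not. AttrsSketch, ArrayTokenizerSketch and NumberTypeFilterSketch are illustrative names, not Lucene classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical shared attribute state; real Lucene attributes are separate classes.
class AttrsSketch {
    final StringBuilder term = new StringBuilder();
    String type = "word";

    // Restore every attribute to its default before a new token is produced.
    void clear() {
        term.setLength(0);
        type = "word";
    }
}

// Hypothetical tokenizer over pre-split words; clearing can be switched off to show the bug.
class ArrayTokenizerSketch {
    final AttrsSketch attrs = new AttrsSketch();
    private final String[] words;
    private final boolean clearEachToken;
    private int pos = 0;

    ArrayTokenizerSketch(String[] words, boolean clearEachToken) {
        this.words = words;
        this.clearEachToken = clearEachToken;
    }

    boolean incrementToken() {
        if (pos >= words.length) return false;
        if (clearEachToken) {
            attrs.clear();            // per-token cost, but no stale state
        } else {
            attrs.term.setLength(0);  // only the term is overwritten; "type" leaks forward
        }
        attrs.term.append(words[pos++]);
        return true;
    }
}

// Hypothetical filter: marks all-digit tokens by setting the shared "type" attribute.
class NumberTypeFilterSketch {
    private final ArrayTokenizerSketch input;

    NumberTypeFilterSketch(ArrayTokenizerSketch input) {
        this.input = input;
    }

    boolean incrementToken() {
        if (!input.incrementToken()) return false;
        String t = input.attrs.term.toString();
        if (!t.isEmpty() && t.chars().allMatch(Character::isDigit)) {
            input.attrs.type = "num";
        }
        return true;
    }

    // Collects the "type" value observed for each token.
    static List<String> types(String[] words, boolean clearEachToken) {
        ArrayTokenizerSketch source = new ArrayTokenizerSketch(words, clearEachToken);
        NumberTypeFilterSketch filter = new NumberTypeFilterSketch(source);
        List<String> out = new ArrayList<>();
        while (filter.incrementToken()) {
            out.add(source.attrs.type);
        }
        return out;
    }
}
```

With clearing, the words "abc", "42", "xyz" get the types word, num, word. Without clearing, "xyz" inherits "num" from the previous token.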


[jira] Updated: (SOLR-1353) implement reusable token streams for all Solr tokenizers / token filters

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-1353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yonik Seeley updated SOLR-1353:
-------------------------------

    Attachment: SOLR-1353.patch

Patch implementing reusable analyzers.
Simple filters have been converted to use the new API.
Complex filters such as the synonym filter and WordDelimiterFilter have not been converted.
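The following hypothetical sketch illustrates what converting a simple filter to the new API means. It models a lowercasing filter in the attribute style. The filter advances its input with incrementToken() and rewrites a shared term buffer in place, instead of producing a new token object per call. TermBufferSketch, ListSourceSketch and LowerCaseFilterSketch are illustrative names, not the classes in SOLR-1353.patch.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical term attribute: a growable char[] that is overwritten for every token.
class TermBufferSketch {
    char[] buffer = new char[8];
    int length = 0;

    void setTerm(String s) {
        if (s.length() > buffer.length) {
            buffer = new char[s.length()];  // grow only when needed
        }
        s.getChars(0, s.length(), buffer, 0);
        length = s.length();
    }

    String term() {
        return new String(buffer, 0, length);
    }
}

// Hypothetical source stream that writes each word into the shared buffer.
class ListSourceSketch {
    final TermBufferSketch termAtt = new TermBufferSketch();
    private final Iterator<String> words;

    ListSourceSketch(List<String> words) {
        this.words = words.iterator();
    }

    boolean incrementToken() {
        if (!words.hasNext()) return false;
        termAtt.setTerm(words.next());
        return true;
    }
}

// Attribute-style filter: edits the shared buffer in place, allocating nothing per token.
class LowerCaseFilterSketch {
    private final ListSourceSketch input;
    private final TermBufferSketch termAtt;

    LowerCaseFilterSketch(ListSourceSketch input) {
        this.input = input;
        this.termAtt = input.termAtt; // same object the source writes into
    }

    boolean incrementToken() {
        if (!input.incrementToken()) return false;
        char[] buf = termAtt.buffer;
        for (int i = 0; i < termAtt.length; i++) {
            buf[i] = Character.toLowerCase(buf[i]);
        }
        return true;
    }

    static List<String> run(List<String> words) {
        ListSourceSketch source = new ListSourceSketch(words);
        LowerCaseFilterSketch filter = new LowerCaseFilterSketch(source);
        List<String> out = new ArrayList<>();
        while (filter.incrementToken()) {
            out.add(filter.termAtt.term()); // copy out, since the buffer is overwritten next call
        }
        return out;
    }
}
```

The filter and its source share one buffer, so the demo copies each term out before advancing. Any code that holds on to term text across calls must do the same.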


[jira] Resolved: (SOLR-1353) implement reusable token streams for all Solr tokenizers / token filters

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-1353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yonik Seeley resolved SOLR-1353.
--------------------------------

    Resolution: Fixed

Committed.


[jira] Commented: (SOLR-1353) implement reusable token streams for all Solr tokenizers / token filters

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12743859#action_12743859 ] 

Yonik Seeley commented on SOLR-1353:
------------------------------------

Yes, on my simple short-field test, I got about a 90% increase in performance vs. the pre-reflection (but still attribute-based) code.
I don't know how it compares to the pre-attributes code.


[jira] Commented: (SOLR-1353) implement reusable token streams for all Solr tokenizers / token filters

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12743858#action_12743858 ] 

Robert Muir commented on SOLR-1353:
-----------------------------------

Seems to almost double throughput. How does this compare to the pre-reflection code, etc.? Is it actually any faster?


[jira] Commented: (SOLR-1353) implement reusable token streams for all Solr tokenizers / token filters

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12741134#action_12741134 ] 

Robert Muir commented on SOLR-1353:
-----------------------------------

Yonik, at least in the case of <analyzer class="xxx"> in the schema, I think many Lucene contrib analyzers do not even implement reusableTokenStream (so they create a new copy each time)!
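Robert's point can be modeled in a few lines. Suppose a base class's reusable entry point simply defers to the method that builds a fresh stream. Then any subclass that never overrides it allocates a new stream for every field value. The classes below are hypothetical and only borrow the tokenStream/reusableTokenStream method names. They are not Lucene's Analyzer.

```java
import java.io.Reader;

// Hypothetical token stream that only counts how many instances get built.
class CountingStreamSketch {
    static int created = 0;
    final Reader input;

    CountingStreamSketch(Reader input) {
        this.input = input;
        created++;
    }
}

// Hypothetical base class: the "reusable" entry point falls back to building a new stream.
abstract class BaseAnalyzerSketch {
    abstract CountingStreamSketch tokenStream(String fieldName, Reader reader);

    CountingStreamSketch reusableTokenStream(String fieldName, Reader reader) {
        return tokenStream(fieldName, reader); // no reuse unless a subclass overrides this
    }
}

// Like an analyzer that never overrides reusableTokenStream: every call allocates.
class NonOverridingAnalyzerSketch extends BaseAnalyzerSketch {
    @Override
    CountingStreamSketch tokenStream(String fieldName, Reader reader) {
        return new CountingStreamSketch(reader);
    }
}
```

Indexing five values through NonOverridingAnalyzerSketch therefore builds five streams. An override that caches and resets a stream would build one per thread.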
