Posted to solr-dev@lucene.apache.org by "Yonik Seeley (JIRA)" <ji...@apache.org> on 2009/08/09 20:14:15 UTC
[jira] Created: (SOLR-1353) implement reusable token streams for all Solr tokenizers / token filters
implement reusable token streams for all Solr tokenizers / token filters
------------------------------------------------------------------------
Key: SOLR-1353
URL: https://issues.apache.org/jira/browse/SOLR-1353
Project: Solr
Issue Type: Bug
Affects Versions: 1.4
Reporter: Yonik Seeley
Assignee: Yonik Seeley
Priority: Blocker
Fix For: 1.4
The new Lucene token stream architecture causes poor indexing performance when reusable token streams are not used.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1353) implement reusable token streams for all Solr tokenizers / token filters
Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/SOLR-1353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12743860#action_12743860 ]
Yonik Seeley commented on SOLR-1353:
------------------------------------
FYI, with all these changes, but with reuse turned off, I was seeing 10% slower performance than the pre-reflection code. Some of that performance impact could have been due to more mixing of old and new style APIs, or proper clearing of attributes, etc.
[jira] Updated: (SOLR-1353) implement reusable token streams for all Solr tokenizers / token filters
Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/SOLR-1353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yonik Seeley updated SOLR-1353:
-------------------------------
Attachment: SOLR-1353.patch
Patch implementing reusable analyzers.
Simple filters have been converted to use the new API.
Complex filters such as synonym and WFD have not been converted.
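
For context, the "new API" here is the attribute-based style, in which a stream advances with incrementToken() and writes each term into a shared attribute instead of returning a new token object. The sketch below only models that shape so the difference between simple and complex filters is easier to see. None of these classes are Lucene or Solr classes: TermAttr, Stream, ListSource and LowerCaseSketchFilter are hypothetical names made up for illustration. A one-in, one-out filter like this keeps no state of its own, which is presumably why such filters were quick to convert. Filters that buffer or emit several tokens per input token need more care, although that is an assumption about the reasoning and not something stated in the patch notes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Locale;

class AttributeStyleSketch {
    /** Hypothetical shared attribute: one mutable term slot per stream chain. */
    static final class TermAttr {
        String term;
    }

    /** Hypothetical base type: incrementToken() advances and mutates the shared attribute. */
    static abstract class Stream {
        final TermAttr termAttr;
        Stream(TermAttr termAttr) { this.termAttr = termAttr; }
        abstract boolean incrementToken();
    }

    /** Hypothetical source that replays a fixed list of terms. */
    static final class ListSource extends Stream {
        private Iterator<String> it;
        ListSource(List<String> terms) {
            super(new TermAttr());
            this.it = terms.iterator();
        }
        /** Re-point the same instance at new input so the chain can be reused. */
        void reset(List<String> terms) { this.it = terms.iterator(); }
        @Override
        boolean incrementToken() {
            if (!it.hasNext()) return false;
            termAttr.term = it.next();
            return true;
        }
    }

    /** One-in, one-out filter: no buffered state, so nothing to clear on reuse. */
    static final class LowerCaseSketchFilter extends Stream {
        private final Stream input;
        LowerCaseSketchFilter(Stream input) {
            super(input.termAttr); // share the attribute with the wrapped stream
            this.input = input;
        }
        @Override
        boolean incrementToken() {
            if (!input.incrementToken()) return false;
            termAttr.term = termAttr.term.toLowerCase(Locale.ROOT);
            return true;
        }
    }

    /** Collects every term the stream produces. */
    static List<String> drain(Stream s) {
        List<String> out = new ArrayList<>();
        while (s.incrementToken()) out.add(s.termAttr.term);
        return out;
    }

    public static void main(String[] args) {
        ListSource source = new ListSource(Arrays.asList("Foo", "BAR"));
        Stream chain = new LowerCaseSketchFilter(source);
        System.out.println(drain(chain));
        source.reset(Arrays.asList("Baz")); // same objects, new input
        System.out.println(drain(chain));
    }
}
```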
[jira] Resolved: (SOLR-1353) implement reusable token streams for all Solr tokenizers / token filters
Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/SOLR-1353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yonik Seeley resolved SOLR-1353.
--------------------------------
Resolution: Fixed
Committed.
[jira] Commented: (SOLR-1353) implement reusable token streams for all Solr tokenizers / token filters
Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/SOLR-1353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12743859#action_12743859 ]
Yonik Seeley commented on SOLR-1353:
------------------------------------
Yes, on my simple short field test, I got about a 90% increase in performance vs the pre-reflection (but still attribute based) code.
I don't know how it compares to the code pre-attributes.
[jira] Commented: (SOLR-1353) implement reusable token streams for all Solr tokenizers / token filters
Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/SOLR-1353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12743858#action_12743858 ]
Robert Muir commented on SOLR-1353:
-----------------------------------
Seems to almost double throughput. How does this compare to the pre-reflection code? Is it actually any faster?
[jira] Commented: (SOLR-1353) implement reusable token streams for all Solr tokenizers / token filters
Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/SOLR-1353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12741134#action_12741134 ]
Robert Muir commented on SOLR-1353:
-----------------------------------
Yonik, at least in the case of analyzer class=xxx, I think many Lucene contrib analyzers do not even implement reusableTokenStream (so they create a new token stream each time)!
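
To make the cost being discussed concrete, here is a minimal sketch of the reuse pattern: the analyzer caches one stream per thread and resets it on new input instead of allocating a fresh one for every field. This is a stand-alone model, not the Lucene Analyzer API. SketchAnalyzer, WhitespaceStream and the created counter are hypothetical names introduced for illustration, and the stream takes a String where a real tokenizer would take a Reader.

```java
import java.util.ArrayList;
import java.util.List;

class ReuseSketch {
    /** Hypothetical tokenizer stand-in: splits its input on whitespace. */
    static final class WhitespaceStream {
        private String input;
        private int pos;

        WhitespaceStream(String input) { reset(input); }

        /** Point this existing instance at new input instead of allocating another. */
        void reset(String input) {
            this.input = input;
            this.pos = 0;
        }

        /** Returns the next token, or null once the input is exhausted. */
        String next() {
            int n = input.length();
            while (pos < n && Character.isWhitespace(input.charAt(pos))) pos++;
            if (pos == n) return null;
            int start = pos;
            while (pos < n && !Character.isWhitespace(input.charAt(pos))) pos++;
            return input.substring(start, pos);
        }
    }

    /** Hypothetical analyzer exposing both a fresh-stream and a reusable-stream path. */
    static final class SketchAnalyzer {
        private final ThreadLocal<WhitespaceStream> cached = new ThreadLocal<>();
        int created = 0; // demo-only allocation counter (not thread-safe)

        /** Non-reusing path: a new stream object on every call. */
        WhitespaceStream tokenStream(String text) {
            created++;
            return new WhitespaceStream(text);
        }

        /** Reusing path: one stream per thread, reset for each new input. */
        WhitespaceStream reusableTokenStream(String text) {
            WhitespaceStream ts = cached.get();
            if (ts == null) {
                created++;
                ts = new WhitespaceStream(text);
                cached.set(ts);
            } else {
                ts.reset(text);
            }
            return ts;
        }
    }

    /** Collects every token the stream produces. */
    static List<String> tokens(WhitespaceStream ts) {
        List<String> out = new ArrayList<>();
        for (String t = ts.next(); t != null; t = ts.next()) out.add(t);
        return out;
    }

    public static void main(String[] args) {
        String[] docs = {"foo bar", "baz", "  qux quux "};

        SketchAnalyzer reusing = new SketchAnalyzer();
        for (String d : docs) System.out.println(tokens(reusing.reusableTokenStream(d)));
        System.out.println("streams created with reuse: " + reusing.created);

        SketchAnalyzer fresh = new SketchAnalyzer();
        for (String d : docs) tokens(fresh.tokenStream(d));
        System.out.println("streams created without reuse: " + fresh.created);
    }
}
```

In this model the reusing path allocates one stream for all three inputs on a thread, while the other path allocates one per input. An analyzer that only offers the second path pays that allocation for every field of every document.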