Posted to dev@lucene.apache.org by "Simon Willnauer (JIRA)" <ji...@apache.org> on 2009/06/12 15:00:09 UTC
[jira] Created: (LUCENE-1688) Deprecating StopAnalyzer ENGLISH_STOP_WORDS - General replacement with an immutable Set
Deprecating StopAnalyzer ENGLISH_STOP_WORDS - General replacement with an immutable Set
---------------------------------------------------------------------------------------
Key: LUCENE-1688
URL: https://issues.apache.org/jira/browse/LUCENE-1688
Project: Lucene - Java
Issue Type: Improvement
Reporter: Simon Willnauer
Priority: Minor
Fix For: 2.9, 3.0
StopAnalyzer and StandardAnalyzer use the static final array ENGLISH_STOP_WORDS by default in various places. Internally this array is converted into a mutable set, which looks kind of weird to me.
I think the way to go is to deprecate all use of the static final array and replace it with an immutable implementation of CharArraySet. Inside an analyzer it does not make sense to have a mutable set anyway, and we could avoid creating a new set each time an analyzer is created. With an immutable set we won't have multithreading issues either.
In essence, we get rid of a fair bit of "converting string array to set" code, no longer have a PUBLIC static reference to an array (whose elements are mutable), and reduce the overhead of analyzer creation.
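To make the point concrete, here is a minimal sketch of the two patterns. It uses only the JDK: java.util.Set stands in for CharArraySet, and the names StopWordsSketch, SketchAnalyzer, STOP_WORDS_ARRAY and STOP_WORDS_SET are made up for illustration and are not Lucene API.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Plain-JDK illustration; java.util.Set stands in for Lucene's CharArraySet.
final class StopWordsSketch {

    // Pattern to deprecate: "final" protects the reference, not the elements,
    // so any caller can overwrite entries of this shared array.
    static final String[] STOP_WORDS_ARRAY = {"a", "an", "the"};

    // Proposed pattern: build the set once, wrap it as unmodifiable, share it.
    static final Set<String> STOP_WORDS_SET =
        Collections.unmodifiableSet(new HashSet<>(Arrays.asList("a", "an", "the")));

    // Hypothetical analyzer that keeps a reference to the shared set
    // instead of converting an array into a new set per instance.
    static final class SketchAnalyzer {
        private final Set<String> stopWords;

        SketchAnalyzer() {
            this(STOP_WORDS_SET);
        }

        SketchAnalyzer(Set<String> stopWords) {
            this.stopWords = stopWords;
        }

        boolean isStopWord(String term) {
            return stopWords.contains(term);
        }
    }

    public static void main(String[] args) {
        // Compiles and succeeds: the public array can be changed by anyone.
        STOP_WORDS_ARRAY[0] = "oops";
        System.out.println(Arrays.toString(STOP_WORDS_ARRAY));

        // The unmodifiable set refuses the same kind of write.
        try {
            STOP_WORDS_SET.add("oops");
        } catch (UnsupportedOperationException e) {
            System.out.println("shared set rejected the write");
        }

        System.out.println(new SketchAnalyzer().isStopWord("the"));
    }
}
```

Because the default constructor only stores a reference, creating many analyzers does not create many sets, and no analyzer can change what the others see.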
Let me know what you think and I will create a patch for it.
simon
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org
[jira] Updated: (LUCENE-1688) Deprecating StopAnalyzer ENGLISH_STOP_WORDS - General replacement with an immutable Set
Posted by "Mark Miller (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/LUCENE-1688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mark Miller updated LUCENE-1688:
--------------------------------
Attachment: LUCENE-1688.patch
To trunk. Still needs a bit of a look over.
[jira] Commented: (LUCENE-1688) Deprecating StopAnalyzer ENGLISH_STOP_WORDS - General replacement with an immutable Set
Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/LUCENE-1688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12718863#action_12718863 ]
Michael McCandless commented on LUCENE-1688:
--------------------------------------------
This sounds great Simon!
[jira] Resolved: (LUCENE-1688) Deprecating StopAnalyzer ENGLISH_STOP_WORDS - General replacement with an immutable Set
Posted by "Mark Miller (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/LUCENE-1688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mark Miller resolved LUCENE-1688.
---------------------------------
Resolution: Fixed
Fix Version/s: (was: 3.0)
Lucene Fields: [New, Patch Available] (was: [New])
Thanks Simon!
[jira] Updated: (LUCENE-1688) Deprecating StopAnalyzer ENGLISH_STOP_WORDS - General replacement with an immutable Set
Posted by "Simon Willnauer (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/LUCENE-1688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Simon Willnauer updated LUCENE-1688:
------------------------------------
Attachment: StopWords.patch
Attached a patch that marks ENGLISH_STOP_WORDS as deprecated.
I cleaned up StopAnalyzer a little bit (it is final anyway).
Added an UnmodifiableCharArraySet implementation as a private inner class, plus a test case.
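For readers without the patch at hand, the general shape of an unmodifiable wrapper kept as a private nested class can be sketched as follows. This is not the code from StopWords.patch: TermSet, UnmodifiableTermSet and the unmodifiable factory are hypothetical stand-ins built on java.util.HashSet, and the real CharArraySet may differ in every detail.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for a mutable term set; not Lucene's CharArraySet.
class TermSet {
    private final Set<String> terms = new HashSet<>();

    public boolean add(String term) {
        return terms.add(term);
    }

    public boolean contains(String term) {
        return terms.contains(term);
    }

    public int size() {
        return terms.size();
    }

    // Returns a read-only view that shares storage with the given set.
    public static TermSet unmodifiable(TermSet set) {
        if (set == null) {
            throw new NullPointerException("set must not be null");
        }
        if (set instanceof UnmodifiableTermSet) {
            return set; // already read-only, no need to wrap twice
        }
        return new UnmodifiableTermSet(set);
    }

    // Private nested class: callers only ever see the TermSet type.
    // The inherited backing set stays empty; all reads go to the delegate.
    private static final class UnmodifiableTermSet extends TermSet {
        private final TermSet delegate;

        UnmodifiableTermSet(TermSet delegate) {
            this.delegate = delegate;
        }

        @Override
        public boolean add(String term) {
            throw new UnsupportedOperationException("set is read-only");
        }

        @Override
        public boolean contains(String term) {
            return delegate.contains(term);
        }

        @Override
        public int size() {
            return delegate.size();
        }
    }
}
```

In this sketch the wrapper is a view, not a copy, so the set is only effectively immutable if nothing keeps writing to the wrapped instance after it has been handed to the factory.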
[jira] Commented: (LUCENE-1688) Deprecating StopAnalyzer ENGLISH_STOP_WORDS - General replacement with an immutable Set
Posted by "Mark Miller (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/LUCENE-1688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12719630#action_12719630 ]
Mark Miller commented on LUCENE-1688:
-------------------------------------
If no one else claims this for 2.9, I guess I'll do it.
[jira] Assigned: (LUCENE-1688) Deprecating StopAnalyzer ENGLISH_STOP_WORDS - General replacement with an immutable Set
Posted by "Mark Miller (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/LUCENE-1688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mark Miller reassigned LUCENE-1688:
-----------------------------------
Assignee: Mark Miller
[jira] Commented: (LUCENE-1688) Deprecating StopAnalyzer ENGLISH_STOP_WORDS - General replacement with an immutable Set
Posted by "Mark Miller (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/LUCENE-1688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12723682#action_12723682 ]
Mark Miller commented on LUCENE-1688:
-------------------------------------
All tests pass.
[jira] Updated: (LUCENE-1688) Deprecating StopAnalyzer ENGLISH_STOP_WORDS - General replacement with an immutable Set
Posted by "Mark Miller (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/LUCENE-1688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mark Miller updated LUCENE-1688:
--------------------------------
Attachment: LUCENE-1688.patch