Posted to dev@lucene.apache.org by "Robert Muir (JIRA)" <ji...@apache.org> on 2010/07/26 15:08:53 UTC
[jira] Created: (LUCENE-2564) wordlistloader is inefficient
wordlistloader is inefficient
-----------------------------
Key: LUCENE-2564
URL: https://issues.apache.org/jira/browse/LUCENE-2564
Project: Lucene - Java
Issue Type: Bug
Components: contrib/analyzers
Reporter: Robert Muir
Assignee: Robert Muir
Fix For: 3.1, 4.0
WordListLoader is basically used for loading stopword lists, stem dictionaries, etc.
Unfortunately the API returns Set<String>, and sometimes even HashSet<String> or HashMap<String,String>.
I think we should break it and return CharArraySets and CharArrayMaps (but leave the declared return types as the generic Set and Map).
If someone objects to breaking it in 3.1, then we can do this only in 4.0, but I think it would be good to fix it in both places.
The reason is that if someone does new FooAnalyzer() a lot (probably not uncommon), I think it's doing a bunch of useless copying.
I think we should slap @lucene.internal on this API too, since that's mostly how it's being used.
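To make the copying concern concrete, here is a minimal sketch using only the JDK. LookupSetSketch is a hypothetical stand-in for a specialized set such as CharArraySet, and its from() method assumes that the consumer reuses an argument that already has the specialized type and copies anything else. None of these names are Lucene API, and the copy counter exists only for illustration.

```java
import java.util.AbstractSet;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;

// Hypothetical stand-in for a specialized set type; not Lucene's CharArraySet.
final class LookupSetSketch extends AbstractSet<String> {
    // Counts how many full copies were made (illustration only).
    static int copies = 0;

    private final Set<String> words;

    private LookupSetSketch(Set<String> source) {
        // Copying every element is the cost we want to pay only once.
        this.words = Collections.unmodifiableSet(new HashSet<>(source));
        copies++;
    }

    // Assumption: reuse the argument if it already has the specialized type,
    // otherwise copy it into a new instance.
    static LookupSetSketch from(Set<String> source) {
        if (source instanceof LookupSetSketch) {
            return (LookupSetSketch) source;
        }
        return new LookupSetSketch(source);
    }

    @Override
    public boolean contains(Object o) {
        return words.contains(o);
    }

    @Override
    public Iterator<String> iterator() {
        return words.iterator();
    }

    @Override
    public int size() {
        return words.size();
    }
}

final class CopyDemoSketch {
    public static void main(String[] args) {
        Set<String> plain = new HashSet<>(Arrays.asList("a", "an", "the"));

        // Loader hands out a plain HashSet: each "analyzer construction" copies it.
        int start = LookupSetSketch.copies;
        for (int i = 0; i < 3; i++) {
            LookupSetSketch.from(plain);
        }
        System.out.println(LookupSetSketch.copies - start); // prints 3

        // Loader hands out the final type once, declared as a generic Set.
        start = LookupSetSketch.copies;
        Set<String> loadedOnce = LookupSetSketch.from(plain);
        for (int i = 0; i < 3; i++) {
            LookupSetSketch.from(loadedOnce);
        }
        System.out.println(LookupSetSketch.copies - start); // prints 1
    }
}
```

Keeping the declared type as Set<String> in the second half mirrors the proposal to leave the return value generic while changing what is actually returned.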
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org
[jira] Commented: (LUCENE-2564) wordlistloader is inefficient
Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/LUCENE-2564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12892316#action_12892316 ]
Robert Muir commented on LUCENE-2564:
-------------------------------------
There are more problems with this loader... it uses FileReader (platform-dependent encoding).
I think we should break it to default to UTF-8, too.
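As a rough sketch of what defaulting to UTF-8 could look like, the reader below decodes with an explicit charset instead of relying on the platform default that FileReader uses. Utf8WordListSketch and readWords are hypothetical names, not the WordListLoader API, and the one-word-per-line format with blank lines skipped is an assumption. StandardCharsets requires Java 7 or later.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper; not the real WordListLoader.
final class Utf8WordListSketch {
    // Reads one word per line (assumed format), always decoding as UTF-8
    // regardless of the platform's default encoding.
    static Set<String> readWords(InputStream in) {
        Set<String> words = new HashSet<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                String word = line.trim();
                if (!word.isEmpty()) { // skip blank lines
                    words.add(word);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return words;
    }

    public static void main(String[] args) {
        // Unicode escapes keep this source file independent of its own encoding.
        byte[] data = "f\u00fcr\n\u00fcber\n".getBytes(StandardCharsets.UTF_8);
        Set<String> words = readWords(new ByteArrayInputStream(data));
        System.out.println(words.contains("f\u00fcr")); // prints true
    }
}
```

For a file on disk, the same method could be fed a FileInputStream, which reads raw bytes and leaves the decoding to the explicit charset.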