Posted to dev@lucene.apache.org by "Robert Muir (JIRA)" <ji...@apache.org> on 2010/07/26 15:08:53 UTC

[jira] Created: (LUCENE-2564) wordlistloader is inefficient

wordlistloader is inefficient
-----------------------------

                 Key: LUCENE-2564
                 URL: https://issues.apache.org/jira/browse/LUCENE-2564
             Project: Lucene - Java
          Issue Type: Bug
          Components: contrib/analyzers
            Reporter: Robert Muir
            Assignee: Robert Muir
             Fix For: 3.1, 4.0


WordListLoader is mainly used for loading stopword lists, stem dictionaries, etc.
Unfortunately the API returns Set<String>, and sometimes even the concrete types HashSet<String> or HashMap<String,String>.

I think we should break backwards compatibility and return CharArraySets and CharArrayMaps (but leave the declared return types as the generic Set and Map).

If someone objects to breaking it in 3.1, then we can do this only in 4.0, but I think it would be good to fix it in both places.
The reason is that if someone calls new FooAnalyzer() a lot (probably not uncommon), I think it's doing a bunch of useless copying each time.

I think we should slap @lucene.internal on this API too, since that's mostly how it's being used.


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Commented: (LUCENE-2564) wordlistloader is inefficient

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12892316#action_12892316 ] 

Robert Muir commented on LUCENE-2564:
-------------------------------------

There are more problems with this loader: it uses FileReader, which decodes with the platform default encoding.
I think we should break it to default to UTF-8, too.
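
To illustrate the encoding point, here is a minimal sketch of reading a word list with an explicit UTF-8 decoder instead of FileReader. This is not the WordListLoader code or the eventual patch; the class name Utf8WordListSketch and the method loadWords are hypothetical, it assumes one word per line, and it returns a plain HashSet only to stay self-contained (the issue proposes CharArraySet). StandardCharsets requires Java 7 or later.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch, not part of Lucene.
class Utf8WordListSketch {

    // Reads one word per line, always decoding the bytes as UTF-8
    // so the result does not depend on the platform default charset.
    static Set<String> loadWords(InputStream in) throws IOException {
        Set<String> words = new HashSet<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                String word = line.trim();
                // Skip blank lines; everything else is treated as a word.
                if (!word.isEmpty()) {
                    words.add(word);
                }
            }
        }
        return words;
    }
}
```

A caller would pass something like new FileInputStream(file) to loadWords, so the same stopword file yields the same set on every platform.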


