Posted to dev@lucene.apache.org by "Robert Muir (Commented) (JIRA)" <ji...@apache.org> on 2011/11/09 18:59:51 UTC

[jira] [Commented] (LUCENE-2564) wordlistloader is inefficient

    [ https://issues.apache.org/jira/browse/LUCENE-2564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13147192#comment-13147192 ] 

Robert Muir commented on LUCENE-2564:
-------------------------------------

Patch looks good. I was just referring to Solr's resource loading of stopwords and similar files.

But we don't have to do that here; IMO we should fix the issues here first.

Maybe the javadocs on getReader should explain that, unlike the Java default, it creates
a reader that throws an exception if it detects that the charset is wrong
(so it is good for reading configuration files, as WordListLoader does, but not recommended
for, say, documents crawled from the web).
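
To illustrate the difference the javadocs would describe, here is a minimal sketch using only the JDK. It is not the getReader from the patch: the class name StrictReaderSketch and the helpers strictReader and readAll are hypothetical, and the sample bytes are made up. It assumes the strict behavior comes from configuring a CharsetDecoder with CodingErrorAction.REPORT, which is one standard way to get it.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class StrictReaderSketch {

  /** Hypothetical helper: a reader that reports bad input instead of silently replacing it. */
  static Reader strictReader(InputStream in, Charset charset) {
    CharsetDecoder decoder = charset.newDecoder()
        .onMalformedInput(CodingErrorAction.REPORT)
        .onUnmappableCharacter(CodingErrorAction.REPORT);
    return new InputStreamReader(in, decoder);
  }

  /** Hypothetical helper: drains a reader into a String. */
  static String readAll(Reader reader) throws IOException {
    StringBuilder sb = new StringBuilder();
    char[] buf = new char[1024];
    int n;
    while ((n = reader.read(buf)) != -1) {
      sb.append(buf, 0, n);
    }
    return sb.toString();
  }

  public static void main(String[] args) throws IOException {
    // A word list saved as ISO-8859-1 but read as UTF-8: byte 0xE9 followed by
    // a newline is not a valid UTF-8 sequence.
    byte[] wrongCharset = "caf\u00e9\n".getBytes(StandardCharsets.ISO_8859_1);

    // Java default: the bad byte is replaced with U+FFFD and reading continues.
    String lenient = readAll(
        new InputStreamReader(new ByteArrayInputStream(wrongCharset), StandardCharsets.UTF_8));
    System.out.println("lenient result contains U+FFFD: " + (lenient.indexOf('\uFFFD') >= 0));

    // Strict reader: the same bytes cause an exception.
    try {
      readAll(strictReader(new ByteArrayInputStream(wrongCharset), StandardCharsets.UTF_8));
      System.out.println("strict reader accepted the input");
    } catch (CharacterCodingException e) {
      System.out.println("strict reader rejected the input: " + e);
    }
  }
}
```

Failing fast suits configuration files, where a wrong charset is a setup mistake worth surfacing; for crawled documents the lenient default is usually the safer choice, since one bad byte should not abort the whole read.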

                
> wordlistloader is inefficient
> -----------------------------
>
>                 Key: LUCENE-2564
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2564
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: modules/analysis
>            Reporter: Robert Muir
>            Assignee: Robert Muir
>             Fix For: 3.5, 4.0
>
>         Attachments: LUCENE-2564.patch
>
>
> WordListLoader is basically used for loading up stopword lists, stem dictionaries, etc.
> Unfortunately the API returns Set<String> and sometimes even HashSet<String> or HashMap<String,String>.
> I think we should break it and return CharArraySets and CharArrayMaps (but leave the return value as generic Set,Map).
> If someone objects to breaking it in 3.1, then we can do this only in 4.0, but I think it would be good to fix it in both places.
> The reason is that if someone does new FooAnalyzer() a lot (probably not uncommon), I think it's doing a bunch of useless copying.
> I think we should slap @lucene.internal on this API too, since that's mostly how it's being used.
