You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@lucene.apache.org by "Shai Erera (JIRA)" <ji...@apache.org> on 2014/08/03 08:22:12 UTC
[jira] [Comment Edited] (SOLR-912)
org.apache.solr.common.util.NamedList - Typesafe efficient variant -
ModernNamedList introduced - implementing the same API as NamedList
[ https://issues.apache.org/jira/browse/SOLR-912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14083884#comment-14083884 ]
Shai Erera edited comment on SOLR-912 at 8/3/14 6:21 AM:
---------------------------------------------------------
Resurrecting that issue, I reviewed NamedList and I don't understand why it has to be so complicated:
* Its {{<T>}} generic results in TONS of warnings in eclipse, for no good reason. I don't buy this comment made on the issue that it's better to make it generic than not. If we have a generic API which isn't used like that, it calls for a bad API IMO. From what I can see, NamedList is not supposed to be generic at all, as its main purpose is to allow you to store a heterogeneous list of {{<name,value>}} pairs, where name is always String, and the type of {{value}} may differ. If we want to make it more convenient for people who know e.g. all values are Strings, we can add sugar methods like {{getInt(), getString()...}}. I've also briefly reviewed some classes that use NamedList (outside of tests), and none seem to rely on {{<T>}} at all. So I'd rather we remove that generic from the API signature.
* There is what seems to be a totally redundant {{SimpleOrderedMap}} class, which has contradicting documentation in its class-jdocs:
** _a {{NamedList}} where access by key is more important than maintaining order_
** _This class does not provide efficient lookup by key_
But the class doesn't add any additional data structures, contains only 3 ctors which delegate as-is to NamedList and offers a clone() which is identical to NamedList.clone(). Yet there are 574 references to it (per-eclipse) ... I think this class is just confusing and has to go away.
Leaving performance aside for a second, NamedList could simply hold an internal {{Map<String,List<Object>>}} to enable efficient access by key, remove all values of a key, access a key's values in order etc. It doesn't allow accessing the {{<name,value>}} pairs in any order though, i.e. {{getVal(i)}}. I don't know how important is this functionality though .. i.e. if we replaced it with a {{namesIterator()}}, would it not allow roughly the same thing? I'm kind of sure it does, but there are so many uses of NamedList across the Solr code base that I might be missing a case which won't like it. So I'd like to ask the Solr folks who know this code better than me -- how important is {{getName/Val(i)}}?
Now back to performance, for a sec, in order to not always allocate a {{List<Object>}} when NamedList is used to hold only one value per parameter, we can either:
* Use Collections.singletonList() on first _add_, and change to a concrete List on the second _add_ only.
* Use an {{Object[]}}, it's less expensive than a List object.
* Use a Map<String,Object> internally and do instanceof checks on add/get as appropriate.
BUT, if accessing/removing values by name is not important and it's OK if get(i) is run on O(N), we can simply simplify the class, like Yonik's proposal above, to hold an Object[] array (instead of List). But I think we should remove the generic anyway.
Maybe we should break this down into 3 issues:
* Get rid of SimpleOrderedMap -- if it's important to keep in 4x, I can deprecate and move all uses of it to NamedList directly.
* Remove the generics from NamedList's API. We can add sugar getters for specific types if we want.
* Simplify NamedList internal implementation. On the performance side -- how critical is NamedList on the execution path? I don't like micro-benchmarks too much, so if NamedList is only a fraction of an entire execution path, I'd rather it's even a tad slower but readable and easier to use/maintain, than if it's overly complicated only to buy us a few nanos in the overall request.
was (Author: shaie):
Resurrecting that issue, I reviewed NamedList and I don't understand why it has to be so complicated:
* Its {{<T>}} generic cause nothing but TONS of warnings in eclipse, for no good reason. I don't buy this comment made on the issue that it's better to make it generic than not. If we have a generic API which isn't used like that, it calls for a bad API IMO. From what I can see, NamedList is not supposed to be generic at all, as its main purpose is to allow you to store a heterogeneous list of {{<name,value>}} pairs, where name is always String, and the type of {{value}} may differ. If we want to make it more convenient for people who know e.g. all values are Strings, we can add sugar methods like {{getInt(), getString()...}}. I've also briefly reviewed some classes that use NamedList (outside of tests), and none seem to rely on {{<T>}} at all. So I'd rather we remove that generic from the API signature.
* There is what seems to be a totally redundant {{SimpleOrderedMap}} class, which has contradicting documentation in its ctor:
** _a {{NamedList}} where access by key is more important than maintaining order_
** _This class does not provide efficient lookup by key_
But the class doesn't add any additional data structures, contains only 3 ctors which delegate as-is to NamedList and offers a clone() which is identical to NamedList.clone(). Yet there are 574 references to it (per-eclipse) ... I think this class is just confusing and has to go away.
Leaving performance aside for a second, NamedList could simply hold an internal {{Map<String,List<Object>>}} to enable efficient access by key, remove all values of a key, access a key's values in order etc. It doesn't allow accessing the {{<name,value>}} pairs in any order though, i.e. {{getVal(i)}}. I don't know how important is this functionality though .. i.e. if we replaced it with a {{namesIterator()}}, would it not allow roughly the same thing? I'm kind of sure it does, but there are so many uses of NamedList across the Solr code base that I might be missing a case which won't like it. So I'd like to ask the Solr folks who know this code better than me -- how important is {{getName/Val(i)}}?
Now back to performance, for a sec, in order to not always allocate a {{List<Object>}} when NamedList is used to hold only one value per parameter, we can either:
* Use Collections.singletonList() on first _add_, and change to a concrete List on the second _add_ only.
* Use an {{Object[]}}, it's less expensive than a List object.
* Use a Map<String,Object> internally and do instanceof checks on add/get as appropriate.
BUT, if accessing/removing values by name is not important and it's OK if get(i) is run on O(N), we can simply simplify the class, like Yonik's proposal above, to hold an Object[] array (instead of List). But I think we should remove the generic anyway.
Maybe we should break this down into 3 issues:
* Get rid of SimpleOrderedMap -- if it's important to keep in 4x, I can deprecate and move all uses of it to NamedList directly.
* Remove the generics from NamedList's API. We can add sugar getters for specific types if we want.
* Simplify NamedList internal implementation. On the performance side -- how critical is NamedList on the execution path? I don't like micro-benchmarks too much, so if NamedList is only a fraction of an entire execution path, I'd rather it's even a tad slower but readable and easier to use/maintain, than if it's overly complicated only to buy us a few nanos in the overall request.
> org.apache.solr.common.util.NamedList - Typesafe efficient variant - ModernNamedList introduced - implementing the same API as NamedList
> ----------------------------------------------------------------------------------------------------------------------------------------
>
> Key: SOLR-912
> URL: https://issues.apache.org/jira/browse/SOLR-912
> Project: Solr
> Issue Type: Improvement
> Components: search
> Environment: Tomcat 6, JRE 6, Solr 1.3+ nightlies
> Reporter: Karthik K
> Priority: Minor
> Attachments: NLProfile.java, SOLR-912.patch, SOLR-912.patch
>
> Original Estimate: 72h
> Remaining Estimate: 72h
>
> The implementation of NamedList - while being fast - is not necessarily type-safe. I have implemented an additional implementation of the same - ModernNamedList (a type-safe variation providing the same interface as NamedList) - while preserving the semantics in terms of ordering of elements and allowing null elements for key and values (keys are always Strings , while values correspond to generics ).
--
This message was sent by Atlassian JIRA
(v6.2#6252)
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org