You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@lucene.apache.org by "Shai Erera (JIRA)" <ji...@apache.org> on 2014/08/03 08:20:11 UTC

[jira] [Commented] (SOLR-912) org.apache.solr.common.util.NamedList - Typesafe efficient variant - ModernNamedList introduced - implementing the same API as NamedList

    [ https://issues.apache.org/jira/browse/SOLR-912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14083884#comment-14083884 ] 

Shai Erera commented on SOLR-912:
---------------------------------

Resurrecting that issue, I reviewed NamedList and I don't understand why it has to be so complicated:

* Its {{<T>}} generic cause nothing but TONS of warnings in eclipse, for no good reason. I don't buy this comment made on the issue that it's better to make it generic than not. If we have a generic API which isn't used like that, it calls for a bad API IMO. From what I can see, NamedList is not supposed to be generic at all, as its main purpose is to allow you to store a heterogeneous list of {{<name,value>}} pairs, where name is always String, and the type of {{value}} may differ. If we want to make it more convenient for people who know e.g. all values are Strings, we can add sugar methods like {{getInt(), getString()...}}. I've also briefly reviewed some classes that use NamedList (outside of tests), and none seem to rely on {{<T>}} at all. So I'd rather we remove that generic from the API signature.

* There is what seems to be a totally redundant {{SimpleOrderedMap}} class, which has contradicting documentation in its ctor:
** _a {{NamedList}} where access by key is more important than maintaining order_
** _This class does not provide efficient lookup by key_

But the class doesn't add any additional data structures, contains only 3 ctors which delegate as-is to NamedList and offers a clone() which is identical to NamedList.clone(). Yet there are 574 references to it (per-eclipse) ... I think this class is just confusing and has to go away.

Leaving performance aside for a second, NamedList could simply hold an internal {{Map<String,List<Object>>}} to enable efficient access by key, remove all values of a key, access a key's values in order etc. It doesn't allow accessing the {{<name,value>}} pairs in any order though, i.e. {{getVal(i)}}. I don't know how important is this functionality though .. i.e. if we replaced it with a {{namesIterator()}}, would it not allow roughly the same thing? I'm kind of sure it does, but there are so many uses of NamedList across the Solr code base that I might be missing a case which won't like it. So I'd like to ask the Solr folks who know this code better than me -- how important is {{getName/Val(i)}}?

Now back to performance, for a sec, in order to not always allocate a {{List<Object>}} when NamedList is used to hold only one value per parameter, we can either:

* Use Collections.singletonList() on first _add_, and change to a concrete List on the second _add_ only.
* Use an {{Object[]}}, it's less expensive than a List object.
* Use a Map<String,Object> internally and do instanceof checks on add/get as appropriate.

BUT, if accessing/removing values by name is not important and it's OK if get(i) is run on O(N), we can simply simplify the class, like Yonik's proposal above, to hold an Object[] array (instead of List). But I think we should remove the generic anyway.

Maybe we should break this down into 3 issues:

* Get rid of SimpleOrderedMap -- if it's important to keep in 4x, I can deprecate and move all uses of it to NamedList directly.
* Remove the generics from NamedList's API. We can add sugar getters for specific types if we want.
* Simplify NamedList internal implementation. On the performance side -- how critical is NamedList on the execution path? I don't like micro-benchmarks too much, so if NamedList is only a fraction of an entire execution path, I'd rather it's even a tad slower but readable and easier to use/maintain, than if it's overly complicated only to buy us a few nanos in the overall request.

> org.apache.solr.common.util.NamedList - Typesafe efficient variant - ModernNamedList introduced - implementing the same API as NamedList
> ----------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-912
>                 URL: https://issues.apache.org/jira/browse/SOLR-912
>             Project: Solr
>          Issue Type: Improvement
>          Components: search
>         Environment: Tomcat 6, JRE 6, Solr 1.3+ nightlies 
>            Reporter: Karthik K
>            Priority: Minor
>         Attachments: NLProfile.java, SOLR-912.patch, SOLR-912.patch
>
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> The implementation of NamedList - while being fast - is not necessarily type-safe. I have implemented an additional implementation of the same - ModernNamedList (a type-safe variation providing the same interface as NamedList) - while preserving the semantics in terms of ordering of elements and allowing null elements for key and values (keys are always Strings , while values correspond to generics ). 



--
This message was sent by Atlassian JIRA
(v6.2#6252)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org