You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@commons.apache.org by "Thomas Neidhart (JIRA)" <ji...@apache.org> on 2015/11/27 22:48:12 UTC

[jira] [Closed] (COLLECTIONS-547) Performance issue in SetUtils::isEqualSet

     [ https://issues.apache.org/jira/browse/COLLECTIONS-547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thomas Neidhart closed COLLECTIONS-547.
---------------------------------------

> Performance issue in SetUtils::isEqualSet
> -----------------------------------------
>
>                 Key: COLLECTIONS-547
>                 URL: https://issues.apache.org/jira/browse/COLLECTIONS-547
>             Project: Commons Collections
>          Issue Type: Bug
>          Components: Set
>    Affects Versions: 4.0
>         Environment: Ubuntu 14.04
>            Reporter: Oswaldo Olivo
>            Priority: Minor
>              Labels: perfomance
>
> There seems to be a performance problem with the function isEqualSet of 
> SetUtils when the first parameter is of a collection type that has a slow containsAll/contains method.
> The following is the code of the function:
> {code}
>     /**
>      * Tests two sets for equality as per the <code>equals()</code> contract
>      * in {@link java.util.Set#equals(java.lang.Object)}.
>      * <p>
>      * This method is useful for implementing <code>Set</code> when you cannot
>      * extend AbstractSet. The method takes Collection instances to enable other
>      * collection types to use the Set implementation algorithm.
>      * <p>
>      * The relevant text (slightly paraphrased as this is a static method) is:
>      * <blockquote>
>      * <p>Two sets are considered equal if they have
>      * the same size, and every member of the first set is contained in
>      * the second. This ensures that the <tt>equals</tt> method works
>      * properly across different implementations of the <tt>Set</tt>
>      * interface.</p>
>      *
>      * <p>
>      * This implementation first checks if the two sets are the same object:
>      * if so it returns <tt>true</tt>.  Then, it checks if the two sets are
>      * identical in size; if not, it returns false. If so, it returns
>      * <tt>a.containsAll((Collection) b)</tt>.</p>
>      * </blockquote>
>      *
>      * @see java.util.Set
>      * @param set1  the first set, may be null
>      * @param set2  the second set, may be null
>      * @return whether the sets are equal by value comparison
>      */
>     public static boolean isEqualSet(final Collection<?> set1, final Collection<?> set2) {
>         if (set1 == set2) {
>             return true;
>         }
>         if (set1 == null || set2 == null || set1.size() != set2.size()) {
>             return false;
>         }
>         return set1.containsAll(set2);
>     }
> {code}
> The problem is that in the last return statement, the function relies on the 
> containsAll method of the class of the set1, which can be any type of collection.
> The following test harness shows the performance degradation between 
> using a list and using a set as a first parameter, across different collection sizes.
> {code}
>     public static void 	setUtilsisEqualSetTest(boolean original) {
> 	int N=500000;
> 	ArrayList<Integer> firstArrayList=new ArrayList<Integer>();
> 	ArrayList<Integer> secondArrayList=new ArrayList<Integer>();
> 	for(int i=0;i<N;++i) {
> 	    firstArrayList.add(new Integer(i));
> 	    secondArrayList.add(new Integer((N-1)-i));
> 	    
> 	}
> 	SetUtils.isEqualSet(original?firstArrayList:(new HashSet<Integer>(firstArrayList)),secondArrayList);
> 	
> {code}
> In the following table "Original" corresponds to the time taken by 
> the existing implementation of SetUtils::isEqualSet, "Repaired" to the time 
> taken by the function invoked with the first parameter converted into a 
> set, and "Speed-up" to the speed-up obtained after the repair.
>        N	Original(s)	Repaired(s)	Speed-up(X)
>       10	          1.01		    1.02		     0.99
>      100	          1.02		    1.02		     1
>     1000	          1.04		    1.04		     1
>    10000	          1.15		    1.09		     1.05
>   100000	          9.33		    1.11		     8.40
>   500000	        > 300	            1.26		     > 238.09
> One way to deal with this would be to call the CollectionUtils::containsAll method instead of Collection::containsAll, since it has a linear time implementation instead of quadratic, and can handle the types of isEqualSet.
> Another solution would be to include a warning to the user in the documentation that the first parameter should have a fast containment method when possible.
>  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)