You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@commons.apache.org by "Thomas Neidhart (JIRA)" <ji...@apache.org> on 2015/11/27 22:48:12 UTC
[jira] [Closed] (COLLECTIONS-547) Performance issue in
SetUtils::isEqualSet
[ https://issues.apache.org/jira/browse/COLLECTIONS-547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Thomas Neidhart closed COLLECTIONS-547.
---------------------------------------
> Performance issue in SetUtils::isEqualSet
> -----------------------------------------
>
> Key: COLLECTIONS-547
> URL: https://issues.apache.org/jira/browse/COLLECTIONS-547
> Project: Commons Collections
> Issue Type: Bug
> Components: Set
> Affects Versions: 4.0
> Environment: Ubuntu 14.04
> Reporter: Oswaldo Olivo
> Priority: Minor
> Labels: perfomance
>
> There seems to be a performance problem with the function isEqualSet of
> SetUtils when the first parameter is of a collection type that has a slow containsAll/contains method.
> The following is the code of the function:
> {code}
> /**
> * Tests two sets for equality as per the <code>equals()</code> contract
> * in {@link java.util.Set#equals(java.lang.Object)}.
> * <p>
> * This method is useful for implementing <code>Set</code> when you cannot
> * extend AbstractSet. The method takes Collection instances to enable other
> * collection types to use the Set implementation algorithm.
> * <p>
> * The relevant text (slightly paraphrased as this is a static method) is:
> * <blockquote>
> * <p>Two sets are considered equal if they have
> * the same size, and every member of the first set is contained in
> * the second. This ensures that the <tt>equals</tt> method works
> * properly across different implementations of the <tt>Set</tt>
> * interface.</p>
> *
> * <p>
> * This implementation first checks if the two sets are the same object:
> * if so it returns <tt>true</tt>. Then, it checks if the two sets are
> * identical in size; if not, it returns false. If so, it returns
> * <tt>a.containsAll((Collection) b)</tt>.</p>
> * </blockquote>
> *
> * @see java.util.Set
> * @param set1 the first set, may be null
> * @param set2 the second set, may be null
> * @return whether the sets are equal by value comparison
> */
> public static boolean isEqualSet(final Collection<?> set1, final Collection<?> set2) {
> if (set1 == set2) {
> return true;
> }
> if (set1 == null || set2 == null || set1.size() != set2.size()) {
> return false;
> }
> return set1.containsAll(set2);
> }
> {code}
> The problem is that in the last return statement, the function relies on the
> containsAll method of the class of the set1, which can be any type of collection.
> The following test harness shows the performance degradation between
> using a list and using a set as a first parameter, across different collection sizes.
> {code}
> public static void setUtilsisEqualSetTest(boolean original) {
> int N=500000;
> ArrayList<Integer> firstArrayList=new ArrayList<Integer>();
> ArrayList<Integer> secondArrayList=new ArrayList<Integer>();
> for(int i=0;i<N;++i) {
> firstArrayList.add(new Integer(i));
> secondArrayList.add(new Integer((N-1)-i));
>
> }
> SetUtils.isEqualSet(original?firstArrayList:(new HashSet<Integer>(firstArrayList)),secondArrayList);
>
> {code}
> In the following table "Original" corresponds to the time taken by
> the existing implementation of SetUtils::isEqualSet, "Repaired" to the time
> taken by the function invoked with the first parameter converted into a
> set, and "Speed-up" to the speed-up obtained after the repair.
> N Original(s) Repaired(s) Speed-up(X)
> 10 1.01 1.02 0.99
> 100 1.02 1.02 1
> 1000 1.04 1.04 1
> 10000 1.15 1.09 1.05
> 100000 9.33 1.11 8.40
> 500000 > 300 1.26 > 238.09
> One way to deal with this would be to call the CollectionUtils::containsAll method instead of Collection::containsAll, since it has a linear time implementation instead of quadratic, and can handle the types of isEqualSet.
> Another solution would be to include a warning to the user in the documentation that the first parameter should have a fast containment method when possible.
>
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)