You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@commons.apache.org by "Thomas Neidhart (JIRA)" <ji...@apache.org> on 2015/05/25 21:48:17 UTC
[jira] [Updated] (COLLECTIONS-549) Linear time implementation of retainAll for CollectionUtils

     [ https://issues.apache.org/jira/browse/COLLECTIONS-549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thomas Neidhart updated COLLECTIONS-549:
----------------------------------------
    Fix Version/s:     (was: 4.1)

> Linear time implementation of retainAll for CollectionUtils
> -----------------------------------------------------------
>
>                 Key: COLLECTIONS-549
>                 URL: https://issues.apache.org/jira/browse/COLLECTIONS-549
>             Project: Commons Collections
>          Issue Type: Improvement
>          Components: Collection
>    Affects Versions: 4.0
>         Environment: Ubuntu 14.04
>            Reporter: Oswaldo Olivo
>              Labels: Feature, Performance
>
> CollectionUtils provides a linear time implementation of containsAll using additional linear space.
> There is a linear time implementation of retainAll as follows:
> {code}
>  // Eliminates from coll1 all elements that are not in coll2.
>     // It runs in O(m+n) size, requiring additional O(m) space.
>     public static boolean efficientRetainAll(final Collection<?> coll1,final Collection<?> coll2) {
> 	// If coll2 is empty there are no elements to retain.
> 	if(coll2.isEmpty()) {
> 	    return coll1.removeAll(coll1);
> 	}
> 	// Simple case when we're supposed to retain all elements
> 	// in the first collection.
> 	if(coll1==coll2)
> 	    return false;
> 	
> 	// it1 iterates over coll1 and it2 iterares over coll2,
> 	// and seen contains all the elements before it2, allowing 
> 	// us to never revisit previous elements.
> 	// The algorithm iterates through it1, checks to see if we've 
> 	// already seen the current element of it1 via a Hashset 
> 	// efficient check, or traverses elements of it2 until we find 
> 	// it or it2 ends. At each iteration over it2 we add the
> 	// elements to seen to avoid revisiting items.
> 	Iterator<?> it1 = coll1.iterator();
> 	Iterator<?> it2 = coll2.iterator();
> 	HashSet<Object> seen = new HashSet<Object>();
> 	boolean changed=false;
> 	// Traverse all the elements in coll1.
> 	while(it1.hasNext()) {
> 	    final Object o=it1.next();
> 	    // If we've seen this element in coll2, keep it.
> 	    if(seen.contains(o))
> 		continue;
> 	    // Otherwise, check for its containment in coll2, while
> 	    // adding the elements to seen.
> 	    boolean contained=false;
> 	    while(it2.hasNext()) {
> 		final Object o2=it2.next();
> 		seen.add(o2);
> 		// Found the element in coll2.
> 		if(o2.equals(o)) {
> 		    contained=true;
> 		    break;
> 		}
> 	    }
> 	    // If the element was not found in coll2, remove it from it1.
> 	    if(!contained) {
> 		changed=true;
> 		it1.remove();
> 	    }
> 	}
> 	
> 	return changed;
>     }
> {code}
> Besides providing this in CollectionUtils for the user, this function could be 
> beneficial within other functions, where the existing retainAll exhibits a significant slowdown for large collections:
> 1) https://issues.apache.org/jira/browse/COLLECTIONS-534
> 2) https://issues.apache.org/jira/browse/COLLECTIONS-548



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)