Posted to user@hadoop.apache.org by Rob Styles <ro...@dynamicorange.com> on 2012/12/31 19:39:14 UTC

Pairwise Comparison of Large Datasets

Happy New Year :)

Thought some of you might find this useful.

We've developed an approach to pairwise comparison on large datasets that
never requires the whole dataset to be visible at once. The approach brings
pairs together for comparison by using incrementing coordinates to target
each value at a cell.

http://dynamicorange.com/2012/12/31/pairwise-comparisons-of-large-datasets/
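
For anyone who wants a feel for the idea without clicking through, here is a
minimal Python sketch of one way a coordinate scheme like this could work. It
is an illustration under my own assumptions, not the code from the post: the
names map_record and pairwise are hypothetical, the records are assumed to be
numbered 0..n-1 already, and an in-memory dictionary stands in for the
MapReduce shuffle.

```python
from collections import defaultdict


def map_record(index, value, n):
    """Map step (hypothetical): send one record to every cell it belongs to.

    A cell is the coordinate pair (row, col) with row < col, so each
    unordered pair of records owns exactly one cell. The mapper only needs
    its own record, its index and the total count n.
    """
    for other in range(n):
        if other == index:
            continue
        cell = (min(index, other), max(index, other))
        yield cell, (index, value)


def pairwise(records, compare):
    """Bring pairs together by cell and compare them.

    Assumption: the dict below plays the role of the shuffle phase; in a
    real job each cell would be a reduce key.
    """
    n = len(records)
    cells = defaultdict(list)
    for index, value in enumerate(records):
        for cell, tagged in map_record(index, value, n):
            cells[cell].append(tagged)

    results = {}
    for cell, tagged in cells.items():
        # Reduce step: each cell holds exactly two tagged values.
        # Sorting by index puts the lower-numbered record first.
        (_, a), (_, b) = sorted(tagged)
        results[cell] = compare(a, b)
    return results


if __name__ == "__main__":
    sample = ["a", "bb", "ccc"]
    for cell, score in sorted(pairwise(sample, lambda a, b: abs(len(a) - len(b))).items()):
        print(cell, score)
```

Each record is emitted n-1 times in this sketch, so the shuffled data grows
quadratically with the input; that is the price of letting every reducer see
only two values at a time.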

There is still work to do on making the approach more efficient and trying
to eliminate a pre-processing step. Help gratefully received.

If there's a toolset already out there for doing this, I'd be happy to hear
about that too!

thanks

rob