Posted to issues@spark.apache.org by "Josh Rosen (JIRA)" <ji...@apache.org> on 2015/08/10 20:37:46 UTC
[jira] [Created] (SPARK-9785) HashPartitioning guarantees / compatibleWith violate those methods' contracts
Josh Rosen created SPARK-9785:
---------------------------------
Summary: HashPartitioning guarantees / compatibleWith violate those methods' contracts
Key: SPARK-9785
URL: https://issues.apache.org/jira/browse/SPARK-9785
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 1.5.0
Reporter: Josh Rosen
Assignee: Josh Rosen
Priority: Blocker
HashPartitioning compatibility is defined with respect to the _set_ of expressions, but in other contexts, such as computing the hash code of a row from those expressions, the ordering of the expressions matters. This is illustrated by the following regression test:
{code}
test("HashPartitioning compatibility") {
  val expressions = Seq(Literal(2), Literal(3))
  // Consider two HashPartitionings that have the same _set_ of hash expressions but which are
  // created with different orderings of those expressions:
  val partitioningA = HashPartitioning(expressions, 100)
  val partitioningB = HashPartitioning(expressions.reverse, 100)
  // These partitionings are not considered equal:
  assert(partitioningA != partitioningB)
  // However, they both satisfy the same clustered distribution:
  val distribution = ClusteredDistribution(expressions)
  assert(partitioningA.satisfies(distribution))
  assert(partitioningB.satisfies(distribution))
  // Both partitionings are compatible with and guarantee each other:
  assert(partitioningA.compatibleWith(partitioningB))
  assert(partitioningB.compatibleWith(partitioningA))
  assert(partitioningA.guarantees(partitioningB))
  assert(partitioningB.guarantees(partitioningA))
  // Given all of this, we would expect these partitionings to compute the same hashcode for
  // any given row:
  def computeHashCode(partitioning: HashPartitioning): Int = {
    val hashExprProj = new InterpretedMutableProjection(partitioning.expressions, Seq.empty)
    hashExprProj.apply(InternalRow.empty).hashCode()
  }
  assert(computeHashCode(partitioningA) === computeHashCode(partitioningB))
}
{code}
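To see why the ordering matters once rows are actually hashed, here is a minimal, Spark-free sketch. It is an illustration only: the object name, the combinedHash and partitionId helpers, and the 31 * h + x combining rule are assumptions made for this example and are not Spark's implementation. Plain Ints stand in for the evaluated Literal(2) and Literal(3) expressions.
{code}
object OrderSensitiveHashSketch {
  // Hypothetical combining rule (the classic 31 * h + x polynomial hash).
  // This is NOT Spark's hashing; it is only an order-sensitive stand-in.
  def combinedHash(values: Seq[Int]): Int =
    values.foldLeft(1)((h, v) => 31 * h + v)

  // Map a hash code to a partition id in [0, numPartitions).
  def partitionId(values: Seq[Int], numPartitions: Int): Int =
    java.lang.Math.floorMod(combinedHash(values), numPartitions)

  def main(args: Array[String]): Unit = {
    val a = Seq(2, 3)   // stands in for the evaluated expressions of partitioningA
    val b = a.reverse   // same set of values, opposite order (partitioningB)

    // The two sequences are equal as sets ...
    assert(a.toSet == b.toSet)
    // ... but an order-sensitive hash sends the same logical row to different partitions.
    println(s"a -> hash ${combinedHash(a)}, partition ${partitionId(a, 100)}")
    println(s"b -> hash ${combinedHash(b)}, partition ${partitionId(b, 100)}")
    assert(partitionId(a, 100) != partitionId(b, 100))
  }
}
{code}
Under this assumed hash, rows that agree on the set of key values can still land in different partitions, which is why two partitionings that differ only in expression order cannot safely be treated as interchangeable by compatibleWith or guarantees.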