Posted to issues@spark.apache.org by "Josh Rosen (JIRA)" <ji...@apache.org> on 2015/08/10 20:37:46 UTC

[jira] [Created] (SPARK-9785) HashPartitioning guarantees / compatibleWith violate those methods' contracts

Josh Rosen created SPARK-9785:
---------------------------------

             Summary: HashPartitioning guarantees / compatibleWith violate those methods' contracts
                 Key: SPARK-9785
                 URL: https://issues.apache.org/jira/browse/SPARK-9785
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 1.5.0
            Reporter: Josh Rosen
            Assignee: Josh Rosen
            Priority: Blocker


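HashPartitioning's compatibleWith and guarantees methods are defined with respect to the _set_ of hash expressions, but in other contexts, such as computing the hash code of a row, the ordering of those expressions matters. Two partitionings can therefore be reported as compatible with each other even though they may not hash a given row identically. This is illustrated by the following regression test: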

{code}
  test("HashPartitioning compatibility") {
    val expressions = Seq(Literal(2), Literal(3))
    // Consider two HashPartitionings that have the same _set_ of hash expressions but which are
    // created with different orderings of those expressions:
    val partitioningA = HashPartitioning(expressions, 100)
    val partitioningB = HashPartitioning(expressions.reverse, 100)
    // These partitionings are not considered equal:
    assert(partitioningA != partitioningB)
    // However, they both satisfy the same clustered distribution:
    val distribution = ClusteredDistribution(expressions)
    assert(partitioningA.satisfies(distribution))
    assert(partitioningB.satisfies(distribution))
    // Both partitionings are compatible with and guarantee each other:
    assert(partitioningA.compatibleWith(partitioningB))
    assert(partitioningB.compatibleWith(partitioningA))
    assert(partitioningA.guarantees(partitioningB))
    assert(partitioningB.guarantees(partitioningA))
    // Given all of this, we would expect these partitionings to compute the same hashcode for
    // any given row:
    def computeHashCode(partitioning: HashPartitioning): Int = {
      val hashExprProj = new InterpretedMutableProjection(partitioning.expressions, Seq.empty)
      hashExprProj.apply(InternalRow.empty).hashCode()
    }
    assert(computeHashCode(partitioningA) === computeHashCode(partitioningB))
  }
{code}
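
The mismatch can also be reproduced without Spark. The following is a minimal sketch, not Spark's implementation: ToyHashPartitioning, its rowHash function (a simple 31-based fold), and OrderingDemo are hypothetical names invented for illustration. It assumes only that compatibility is checked on the set of values while the hash is folded over them in order, which is the situation described above.

```scala
// Hypothetical stand-in for HashPartitioning; this is NOT Spark's class.
// Plain Ints play the role of the already-evaluated hash expressions.
final case class ToyHashPartitioning(values: Seq[Int], numPartitions: Int) {

  // Set-based compatibility check: the order of the values is ignored.
  def compatibleWith(other: ToyHashPartitioning): Boolean =
    values.toSet == other.values.toSet && numPartitions == other.numPartitions

  // Order-sensitive combined hash, folded left to right (illustrative only).
  def rowHash: Int = values.foldLeft(17)((h, v) => 31 * h + v)

  // Non-negative partition index derived from the hash.
  def partitionId: Int = java.lang.Math.floorMod(rowHash, numPartitions)
}

object OrderingDemo {
  def main(args: Array[String]): Unit = {
    val a = ToyHashPartitioning(Seq(2, 3), 100)
    val b = ToyHashPartitioning(Seq(2, 3).reverse, 100)

    println(a.compatibleWith(b)) // true: same set of values
    println(a.partitionId)       // 2
    println(b.partitionId)       // 32: same "row", different partition
  }
}
```

In this sketch the two partitionings report themselves as compatible, yet they send the same input to partitions 2 and 32. Either the compatibility check has to respect ordering or the hash has to be order-insensitive; which of the two is the right fix for HashPartitioning is not decided here.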



