Posted to reviews@spark.apache.org by jose-torres <gi...@git.apache.org> on 2018/05/01 05:05:47 UTC

[GitHub] spark pull request #21189: [SPARK-24117][SQL] Unified the getSizePerRow

Github user jose-torres commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21189#discussion_r185167132
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/MemorySinkSuite.scala ---
    @@ -220,11 +220,11 @@ class MemorySinkSuite extends StreamTest with BeforeAndAfter {
     
         sink.addBatch(0, 1 to 3)
         plan.invalidateStatsCache()
    -    assert(plan.stats.sizeInBytes === 12)
    +    assert(plan.stats.sizeInBytes === 36)
     
         sink.addBatch(1, 4 to 6)
         plan.invalidateStatsCache()
    -    assert(plan.stats.sizeInBytes === 24)
    +    assert(plan.stats.sizeInBytes === 72)
    --- End diff --
    
    It shouldn't impact anything, but in the abstract it seems strange that this unification would change the stats. What are we doing differently to cause this, and how confident are we that it won't happen to production sinks?
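
For reference, the numbers in the diff fit one simple reading. This is an assumption, not something confirmed in this thread: the old estimate counted only the default size of each column (4 bytes for the single Int column in this test), while the unified helper also adds a fixed per-row overhead of 8 bytes. The sketch below only reproduces that arithmetic. The object and method names are hypothetical and are not Spark APIs.

```scala
// Illustrative arithmetic only. SizeEstimateSketch and its methods are
// hypothetical names used for this example; they are not part of Spark.
object SizeEstimateSketch {
  // Default size in bytes of one Int column value.
  val IntDefaultSize: Long = 4L

  // ASSUMPTION: the unified helper adds a fixed 8-byte overhead per row.
  val AssumedRowOverhead: Long = 8L

  // Old-style estimate: row count times the summed column sizes.
  def columnsOnly(rows: Long, columnSizes: Seq[Long]): Long =
    rows * columnSizes.sum

  // New-style estimate under the per-row overhead assumption.
  def withRowOverhead(rows: Long, columnSizes: Seq[Long]): Long =
    rows * (AssumedRowOverhead + columnSizes.sum)

  def main(args: Array[String]): Unit = {
    val cols = Seq(IntDefaultSize)
    // After the first batch (3 rows) and the second batch (6 rows in total).
    println(s"old: ${columnsOnly(3, cols)}, ${columnsOnly(6, cols)}")
    println(s"new: ${withRowOverhead(3, cols)}, ${withRowOverhead(6, cols)}")
  }
}
```

If that reading holds, the change is a constant per-row offset rather than a different way of counting rows, so any sink whose estimate moves to the shared helper would shift by the same amount per row.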

