You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by HeartSaVioR <gi...@git.apache.org> on 2018/05/31 14:55:32 UTC

[GitHub] spark pull request #21469: [SPARK-24441][SS] Expose total size of states in ...

GitHub user HeartSaVioR opened a pull request:

    https://github.com/apache/spark/pull/21469

    [SPARK-24441][SS] Expose total size of states in HDFSBackedStateStore…

    …Provider
    
    ## What changes were proposed in this pull request?
    
    This patch exposes the estimation of size of cache (loadedMaps) in HDFSBackedStateStoreProvider as a custom metric of StateStore. While it refers loadedMaps directly, there would be only one StateStoreWriter which refers a StateStoreProvider, so the value is not exposed as well as being aggregated multiple times. Current state metrics are safe to aggregate for the same reason.
    
    ## How was this patch tested?
    
    Tested manually. Below is the snapshot of UI page which is reflected by the patch: 
    
    <img width="596" alt="screen shot 2018-05-31 at 10 13 23 pm" src="https://user-images.githubusercontent.com/1317309/40788976-4ad93d8c-652c-11e8-88f1-337be5162588.png">


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/HeartSaVioR/spark SPARK-24441

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21469.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21469
    
----
commit dc11338650842a246a4bce9280d130607ceca281
Author: Jungtaek Lim <ka...@...>
Date:   2018-05-31T14:38:00Z

    [SPARK-24441][SS] Expose total size of states in HDFSBackedStateStoreProvider
    
    * expose estimation of size of cache (loadMaps) in HDFSBackedStateStoreProvider
      * as a custom metric of StateStore

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21469
  
    **[Test build #92860 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92860/testReport)** for PR 21469 at commit [`c4a6d11`](https://github.com/apache/spark/commit/c4a6d11e230759a0d068024cba04c31a0e7903a9).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21469
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92544/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total size of states in HDFSBac...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21469
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21469
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92860/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21469
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93869/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21469
  
    **[Test build #93869 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93869/testReport)** for PR 21469 at commit [`ed072fc`](https://github.com/apache/spark/commit/ed072fcf057f982275d0daf69787ed812f03e87b).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

Posted by HeartSaVioR <gi...@git.apache.org>.
Github user HeartSaVioR commented on the issue:

    https://github.com/apache/spark/pull/21469
  
    Thanks all for reviewing and thanks @tdas for merging this in!


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

Posted by HeartSaVioR <gi...@git.apache.org>.
Github user HeartSaVioR commented on the issue:

    https://github.com/apache/spark/pull/21469
  
    @tdas Thanks for the feedback! Updated the PR.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21469
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total size of states in HDFSBac...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21469
  
    **[Test build #91347 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91347/testReport)** for PR 21469 at commit [`dc11338`](https://github.com/apache/spark/commit/dc11338650842a246a4bce9280d130607ceca281).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21469
  
    Can one of the admins verify this patch?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

Posted by arunmahadevan <gi...@git.apache.org>.
Github user arunmahadevan commented on the issue:

    https://github.com/apache/spark/pull/21469
  
    Nice, LGTM.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21469
  
    **[Test build #91526 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91526/testReport)** for PR 21469 at commit [`3c80cad`](https://github.com/apache/spark/commit/3c80cad32c056a24a7f5ffd7ab0ae3f7e096a62d).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21469
  
    **[Test build #91486 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91486/testReport)** for PR 21469 at commit [`345397d`](https://github.com/apache/spark/commit/345397dd530898697ef338ec55b81ad11ece4dc0).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `case class StateStoreCustomAverageMetric(name: String, desc: String) extends StateStoreCustomMetric`


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21469: [SPARK-24441][SS] Expose total estimated size of ...

Posted by HeartSaVioR <gi...@git.apache.org>.
Github user HeartSaVioR commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21469#discussion_r206755595
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/streaming/progress.scala ---
    @@ -48,12 +49,24 @@ class StateOperatorProgress private[sql](
       def prettyJson: String = pretty(render(jsonValue))
     
       private[sql] def copy(newNumRowsUpdated: Long): StateOperatorProgress =
    -    new StateOperatorProgress(numRowsTotal, newNumRowsUpdated, memoryUsedBytes)
    +    new StateOperatorProgress(numRowsTotal, newNumRowsUpdated, memoryUsedBytes, customMetrics)
     
       private[sql] def jsonValue: JValue = {
    -    ("numRowsTotal" -> JInt(numRowsTotal)) ~
    -    ("numRowsUpdated" -> JInt(numRowsUpdated)) ~
    -    ("memoryUsedBytes" -> JInt(memoryUsedBytes))
    +    def safeMapToJValue[T](map: ju.Map[String, T], valueToJValue: T => JValue): JValue = {
    +      if (map.isEmpty) return JNothing
    +      val keys = map.keySet.asScala.toSeq.sorted
    +      keys.map { k => k -> valueToJValue(map.get(k)) : JObject }.reduce(_ ~ _)
    +    }
    +
    +    val jsonVal = ("numRowsTotal" -> JInt(numRowsTotal)) ~
    +      ("numRowsUpdated" -> JInt(numRowsUpdated)) ~
    +      ("memoryUsedBytes" -> JInt(memoryUsedBytes))
    +
    +    if (!customMetrics.isEmpty) {
    --- End diff --
    
    Actually didn't notice that. Thanks for letting me know! Will simplify.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21469
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

Posted by jose-torres <gi...@git.apache.org>.
Github user jose-torres commented on the issue:

    https://github.com/apache/spark/pull/21469
  
    (Sorry to comment after so long with such a minor change - I've been busy with spark summit)
    
    metricProviderLoaderMapSize should be metricProviderLoaderMapSizeBytes, both for clarity and consistency with memoryUsedBytes. Other than that LGTM.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

Posted by HeartSaVioR <gi...@git.apache.org>.
Github user HeartSaVioR commented on the issue:

    https://github.com/apache/spark/pull/21469
  
    retest this, please


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21469
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total size of states in HDFSBac...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21469
  
    **[Test build #91375 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91375/testReport)** for PR 21469 at commit [`6c1c30b`](https://github.com/apache/spark/commit/6c1c30b9a2722cfd1eb8ee72575f142c00365fb3).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21469
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91523/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total size of states in HDFSBac...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21469
  
    **[Test build #91347 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91347/testReport)** for PR 21469 at commit [`dc11338`](https://github.com/apache/spark/commit/dc11338650842a246a4bce9280d130607ceca281).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21469
  
    **[Test build #92544 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92544/testReport)** for PR 21469 at commit [`c9aada5`](https://github.com/apache/spark/commit/c9aada520889b87ace0886805910f0d56d099bd2).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `case class StateStoreCustomSumMetric(name: String, desc: String) extends StateStoreCustomMetric`


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21469
  
    **[Test build #93257 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93257/testReport)** for PR 21469 at commit [`32d0418`](https://github.com/apache/spark/commit/32d041878b7dcd20794250853063dab4bac09118).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total size of states in HDFSBac...

Posted by HeartSaVioR <gi...@git.apache.org>.
Github user HeartSaVioR commented on the issue:

    https://github.com/apache/spark/pull/21469
  
    @jose-torres 
    Ah yes I forgot that shallow copy has been occurring, so while new map should hold necessary size of map entries but row object will be shared across versions. Thanks for pointing it out. Will update the description.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21469
  
    **[Test build #91596 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91596/testReport)** for PR 21469 at commit [`3c80cad`](https://github.com/apache/spark/commit/3c80cad32c056a24a7f5ffd7ab0ae3f7e096a62d).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21469: [SPARK-24441][SS] Expose total estimated size of ...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21469#discussion_r206739252
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/metric/SQLMetrics.scala ---
    @@ -81,10 +81,10 @@ class SQLMetric(val metricType: String, initValue: Long = 0L) extends Accumulato
     }
     
     object SQLMetrics {
    -  private val SUM_METRIC = "sum"
    -  private val SIZE_METRIC = "size"
    -  private val TIMING_METRIC = "timing"
    -  private val AVERAGE_METRIC = "average"
    +  val SUM_METRIC = "sum"
    +  val SIZE_METRIC = "size"
    +  val TIMING_METRIC = "timing"
    +  val AVERAGE_METRIC = "average"
    --- End diff --
    
    why this change?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21469
  
    **[Test build #93927 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93927/testReport)** for PR 21469 at commit [`ed072fc`](https://github.com/apache/spark/commit/ed072fcf057f982275d0daf69787ed812f03e87b).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21469
  
    **[Test build #91625 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91625/testReport)** for PR 21469 at commit [`62fc395`](https://github.com/apache/spark/commit/62fc395d29b67a827612e4e7f4bf134f5817a33c).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/21469
  
    ok to test


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21469
  
    **[Test build #95012 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95012/testReport)** for PR 21469 at commit [`545081e`](https://github.com/apache/spark/commit/545081e4b7a9bb37442f7d558bc28db790742610).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21469
  
    **[Test build #92551 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92551/testReport)** for PR 21469 at commit [`c9aada5`](https://github.com/apache/spark/commit/c9aada520889b87ace0886805910f0d56d099bd2).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `case class StateStoreCustomSumMetric(name: String, desc: String) extends StateStoreCustomMetric`


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on the issue:

    https://github.com/apache/spark/pull/21469
  
    Merged to master.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21469
  
    **[Test build #93906 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93906/testReport)** for PR 21469 at commit [`ed072fc`](https://github.com/apache/spark/commit/ed072fcf057f982275d0daf69787ed812f03e87b).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

Posted by arunmahadevan <gi...@git.apache.org>.
Github user arunmahadevan commented on the issue:

    https://github.com/apache/spark/pull/21469
  
    >I didn't add the metric to StateOperatorProgress cause this behavior is specific to HDFSBackedStateStoreProvider
    
    May be this can be reported as a custom metrics and keep it optional and that way its not tied to any specific implementation.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21469
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91503/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21469
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21469
  
    **[Test build #93906 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93906/testReport)** for PR 21469 at commit [`ed072fc`](https://github.com/apache/spark/commit/ed072fcf057f982275d0daf69787ed812f03e87b).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21469
  
    Can one of the admins verify this patch?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

Posted by HeartSaVioR <gi...@git.apache.org>.
Github user HeartSaVioR commented on the issue:

    https://github.com/apache/spark/pull/21469
  
    @arunmahadevan 
    I didn't add the metric to StateOperatorProgress cause this behavior is specific to HDFSBackedStateStoreProvider (though this is only one implementation available in Apache Spark) so not sure this metric can be treated as a general one. (@tdas what do you think about this?)
    
    Btw, the cache is going to clean up when maintenance operation is in progress, so there could be more than 100 versions in map. Not sure why it shows 150x, but I couldn't find missing spot on the patch. Maybe the issue is from SizeEstimator.estimate()?
    
    One thing we need to check is how SizeEstimator.estimate() calculate the memory usage when Unsafe row objects are shared across versions. If SizeEstimator adds the size of object whenever it is referenced, it will report much higher memory usage than actual.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21469
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91596/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21469
  
    **[Test build #93927 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93927/testReport)** for PR 21469 at commit [`ed072fc`](https://github.com/apache/spark/commit/ed072fcf057f982275d0daf69787ed812f03e87b).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

Posted by HeartSaVioR <gi...@git.apache.org>.
Github user HeartSaVioR commented on the issue:

    https://github.com/apache/spark/pull/21469
  
    retest this, please


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

Posted by HeartSaVioR <gi...@git.apache.org>.
Github user HeartSaVioR commented on the issue:

    https://github.com/apache/spark/pull/21469
  
    Also added custom metric for the count of versions stored in loadedMaps.
    
    This is a new screenshot:
    <img width="601" alt="screen shot 2018-06-05 at 10 16 16 pm" src="https://user-images.githubusercontent.com/1317309/40978481-b46ad324-690e-11e8-9b0f-e80528612a62.png">



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

Posted by HeartSaVioR <gi...@git.apache.org>.
Github user HeartSaVioR commented on the issue:

    https://github.com/apache/spark/pull/21469
  
    adding cc. to @zsxwing since he has been reviewing PRs for SS so far.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total size of states in HDFSBac...

Posted by HeartSaVioR <gi...@git.apache.org>.
Github user HeartSaVioR commented on the issue:

    https://github.com/apache/spark/pull/21469
  
    cc. @tdas @jose-torres @jerryshao @HyukjinKwon @arunmahadevan 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21469
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93257/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21469: [SPARK-24441][SS] Expose total estimated size of ...

Posted by arunmahadevan <gi...@git.apache.org>.
Github user arunmahadevan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21469#discussion_r194483603
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/statefulOperators.scala ---
    @@ -112,14 +122,19 @@ trait StateStoreWriter extends StatefulOperator { self: SparkPlan =>
         val storeMetrics = store.metrics
         longMetric("numTotalStateRows") += storeMetrics.numKeys
         longMetric("stateMemory") += storeMetrics.memoryUsedBytes
    -    storeMetrics.customMetrics.foreach { case (metric, value) =>
    -      longMetric(metric.name) += value
    +    storeMetrics.customMetrics.foreach {
    +      case (metric: StateStoreCustomAverageMetric, value) =>
    +        longMetric(metric.name).set(value * 1.0d)
    --- End diff --
    
    How does this get accumulated ? It seems the value last set may get propagated to the driver.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21469
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92551/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21469
  
    **[Test build #91625 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91625/testReport)** for PR 21469 at commit [`62fc395`](https://github.com/apache/spark/commit/62fc395d29b67a827612e4e7f4bf134f5817a33c).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21469
  
    **[Test build #92542 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92542/testReport)** for PR 21469 at commit [`96115bc`](https://github.com/apache/spark/commit/96115bcc48cfe67c4b7bd37315963b9ba1366b3a).
     * This patch **fails Spark unit tests**.
     * This patch **does not merge cleanly**.
     * This patch adds the following public classes _(experimental)_:
      * `case class StateStoreCustomSumMetric(name: String, desc: String) extends StateStoreCustomMetric`


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21469
  
    **[Test build #93224 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93224/testReport)** for PR 21469 at commit [`5b203d4`](https://github.com/apache/spark/commit/5b203d4967eda3a09f7c8d83cf86e7ac6a427182).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21469: [SPARK-24441][SS] Expose total estimated size of ...

Posted by jose-torres <gi...@git.apache.org>.
Github user jose-torres commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21469#discussion_r194232690
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingQueryListenerSuite.scala ---
    @@ -231,7 +231,7 @@ class StreamingQueryListenerSuite extends StreamTest with BeforeAndAfter {
       test("event ordering") {
         val listener = new EventCollector
         withListenerAdded(listener) {
    -      for (i <- 1 to 100) {
    +      for (i <- 1 to 50) {
    --- End diff --
    
    Makes sense, and I agree with the implicit claim that this slowdown isn't too worrying.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21469
  
    **[Test build #92329 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92329/testReport)** for PR 21469 at commit [`62fc395`](https://github.com/apache/spark/commit/62fc395d29b67a827612e4e7f4bf134f5817a33c).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21469
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

Posted by HeartSaVioR <gi...@git.apache.org>.
Github user HeartSaVioR commented on the issue:

    https://github.com/apache/spark/pull/21469
  
    Kindly ping again to @tdas 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

Posted by HeartSaVioR <gi...@git.apache.org>.
Github user HeartSaVioR commented on the issue:

    https://github.com/apache/spark/pull/21469
  
    retest this, please


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21469
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91526/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21469
  
    **[Test build #91535 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91535/testReport)** for PR 21469 at commit [`3c80cad`](https://github.com/apache/spark/commit/3c80cad32c056a24a7f5ffd7ab0ae3f7e096a62d).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on the issue:

    https://github.com/apache/spark/pull/21469
  
    LGTM. 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21469: [SPARK-24441][SS] Expose total estimated size of ...

Posted by HeartSaVioR <gi...@git.apache.org>.
Github user HeartSaVioR commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21469#discussion_r193622940
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingQueryListenerSuite.scala ---
    @@ -231,7 +231,7 @@ class StreamingQueryListenerSuite extends StreamTest with BeforeAndAfter {
       test("event ordering") {
         val listener = new EventCollector
         withListenerAdded(listener) {
    -      for (i <- 1 to 100) {
    +      for (i <- 1 to 50) {
    --- End diff --
    
    After the patch this test starts failing: it just means there's more time needed to run this loop 100 times, and doesn't mean the logic is broken. Decreasing number works for me.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21469
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total size of states in HDFSBac...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21469
  
    **[Test build #91375 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91375/testReport)** for PR 21469 at commit [`6c1c30b`](https://github.com/apache/spark/commit/6c1c30b9a2722cfd1eb8ee72575f142c00365fb3).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21469
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21469
  
    **[Test build #95012 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95012/testReport)** for PR 21469 at commit [`545081e`](https://github.com/apache/spark/commit/545081e4b7a9bb37442f7d558bc28db790742610).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

Posted by HeartSaVioR <gi...@git.apache.org>.
Github user HeartSaVioR commented on the issue:

    https://github.com/apache/spark/pull/21469
  
    @tdas Kindly reminder.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

Posted by HeartSaVioR <gi...@git.apache.org>.
Github user HeartSaVioR commented on the issue:

    https://github.com/apache/spark/pull/21469
  
    Retest this, please


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

Posted by HeartSaVioR <gi...@git.apache.org>.
Github user HeartSaVioR commented on the issue:

    https://github.com/apache/spark/pull/21469
  
    @tdas In case of you are not working on the patch, I'm working on the fix and will provide minor PR.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total size of states in HDFSBac...

Posted by jose-torres <gi...@git.apache.org>.
Github user jose-torres commented on the issue:

    https://github.com/apache/spark/pull/21469
  
    LGTM. To clarify the description, we expect the memory footprint to be much larger than query status reports in situations where the state store is getting a lot of updates?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/21469
  
    @jose-torres is it good to go?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21469
  
    **[Test build #92542 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92542/testReport)** for PR 21469 at commit [`96115bc`](https://github.com/apache/spark/commit/96115bcc48cfe67c4b7bd37315963b9ba1366b3a).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

Posted by HeartSaVioR <gi...@git.apache.org>.
Github user HeartSaVioR commented on the issue:

    https://github.com/apache/spark/pull/21469
  
    @tdas Thanks for the review! Addressed review comments.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total size of states in HDFSBac...

Posted by HeartSaVioR <gi...@git.apache.org>.
Github user HeartSaVioR commented on the issue:

    https://github.com/apache/spark/pull/21469
  
    Thanks @HyukjinKwon for reviewing. Addressed PR title as well as fixing nit.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21469
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21469
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21469
  
    **[Test build #93224 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93224/testReport)** for PR 21469 at commit [`5b203d4`](https://github.com/apache/spark/commit/5b203d4967eda3a09f7c8d83cf86e7ac6a427182).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21469: [SPARK-24441][SS] Expose total estimated size of ...

Posted by arunmahadevan <gi...@git.apache.org>.
Github user arunmahadevan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21469#discussion_r194592510
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/statefulOperators.scala ---
    @@ -112,14 +122,19 @@ trait StateStoreWriter extends StatefulOperator { self: SparkPlan =>
         val storeMetrics = store.metrics
         longMetric("numTotalStateRows") += storeMetrics.numKeys
         longMetric("stateMemory") += storeMetrics.memoryUsedBytes
    -    storeMetrics.customMetrics.foreach { case (metric, value) =>
    -      longMetric(metric.name) += value
    +    storeMetrics.customMetrics.foreach {
    +      case (metric: StateStoreCustomAverageMetric, value) =>
    +        longMetric(metric.name).set(value * 1.0d)
    --- End diff --
    
    Not sure if SQLAppstatusListener comes into play for reporting query progress. (e.g. StreamingQueryWrapper.lastProgress)
    
    https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/ProgressReporter.scala#L193
    
    Based on my understanding, the SQLMetric is an Accumulator so the merged values of the accumulators across all the tasks is returned. The merge operation in SQLMetric just adds the value so it makes sense only for count or size values. We would be able to display the (min, med, max) values for now in the UI and not in the "query status". I was thinking if we make it a count metric, it may work (similar to  number of state rows). I am fine with either way.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total size of states in HDFSBac...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21469
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91375/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total size of states in HDFSBac...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21469
  
    Can one of the admins verify this patch?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21469
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21469
  
    **[Test build #91523 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91523/testReport)** for PR 21469 at commit [`3c80cad`](https://github.com/apache/spark/commit/3c80cad32c056a24a7f5ffd7ab0ae3f7e096a62d).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21469
  
    **[Test build #91523 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91523/testReport)** for PR 21469 at commit [`3c80cad`](https://github.com/apache/spark/commit/3c80cad32c056a24a7f5ffd7ab0ae3f7e096a62d).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21469
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

Posted by HeartSaVioR <gi...@git.apache.org>.
Github user HeartSaVioR commented on the issue:

    https://github.com/apache/spark/pull/21469
  
    My series of patches could be possible based on two metrics: `size for memory usage of latest version` and `size for total memory usage of loaded versions`. SPARK-24717 (#21700) enabled the possibility to tune the overall state memory usage, and if end users have either one metric they couldn't tune it.
    
    IMHO, I'm not 100% sure how much this patch provides confusion to the end users, but if the intention of `memoryUsedBytes` is for measuring overall state partition, what about replacing `memoryUsedBytes` as `size for total memory usage of loaded versions`, but also placing `size for memory usage of latest version` to custom metric?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21469
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93927/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21469: [SPARK-24441][SS] Expose total estimated size of ...

Posted by arunmahadevan <gi...@git.apache.org>.
Github user arunmahadevan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21469#discussion_r194480087
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/HDFSBackedStateStoreProvider.scala ---
    @@ -247,6 +253,14 @@ private[state] class HDFSBackedStateStoreProvider extends StateStoreProvider wit
       private lazy val fm = CheckpointFileManager.create(baseDir, hadoopConf)
       private lazy val sparkConf = Option(SparkEnv.get).map(_.conf).getOrElse(new SparkConf)
     
    +  private lazy val metricProviderLoaderMapSizeBytes: StateStoreCustomSizeMetric =
    +    StateStoreCustomSizeMetric("providerLoadedMapSizeBytes",
    +      "estimated size of states cache in provider")
    +
    +  private lazy val metricProviderLoaderCountOfVersionsInMap: StateStoreCustomAverageMetric =
    --- End diff --
    
    Why is "metricProviderLoaderCountOfVersionsInMap" an average metrics? The other metrics like "numTotalStateRows" and even "providerLoadedMapSizeBytes" is count metric. Shouldn't this be similar?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21469
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92542/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

Posted by HeartSaVioR <gi...@git.apache.org>.
Github user HeartSaVioR commented on the issue:

    https://github.com/apache/spark/pull/21469
  
    Looks like the size is added only once for same identity on SizeEstimator.estimate(), so SizeEstimator.estimate() is working correctly in this case. There might be other valid cases, but not sure.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21469
  
    **[Test build #91596 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91596/testReport)** for PR 21469 at commit [`3c80cad`](https://github.com/apache/spark/commit/3c80cad32c056a24a7f5ffd7ab0ae3f7e096a62d).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21469: [SPARK-24441][SS] Expose total estimated size of ...

Posted by HeartSaVioR <gi...@git.apache.org>.
Github user HeartSaVioR commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21469#discussion_r206754359
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/metric/SQLMetrics.scala ---
    @@ -81,10 +81,10 @@ class SQLMetric(val metricType: String, initValue: Long = 0L) extends Accumulato
     }
     
     object SQLMetrics {
    -  private val SUM_METRIC = "sum"
    -  private val SIZE_METRIC = "size"
    -  private val TIMING_METRIC = "timing"
    -  private val AVERAGE_METRIC = "average"
    +  val SUM_METRIC = "sum"
    +  val SIZE_METRIC = "size"
    +  val TIMING_METRIC = "timing"
    +  val AVERAGE_METRIC = "average"
    --- End diff --
    
    It was to handle exception case while aggregating custom metrics, especially filtering out average since it is not aggregated correctly. Since we remove custom average metric, we no longer need to filter out them. Will revert the change as well as relevant logic.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

Posted by HeartSaVioR <gi...@git.apache.org>.
Github user HeartSaVioR commented on the issue:

    https://github.com/apache/spark/pull/21469
  
    @tdas Kindly reminder.
    @zsxwing Could you take a quick look at this and share your thought? I think the patch is ready to merge, but blocked with slightly conflict of view so more voices would be better.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21469
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92329/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

Posted by HeartSaVioR <gi...@git.apache.org>.
Github user HeartSaVioR commented on the issue:

    https://github.com/apache/spark/pull/21469
  
    retest this please


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on the issue:

    https://github.com/apache/spark/pull/21469
  
    I did. Fixed the import


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21469: [SPARK-24441][SS] Expose total size of states in ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21469#discussion_r192308080
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/HDFSBackedStateStoreProvider.scala ---
    @@ -181,6 +182,12 @@ private[state] class HDFSBackedStateStoreProvider extends StateStoreProvider wit
         }
       }
     
    +  def getCustomMetricsForProvider(): Map[StateStoreCustomMetric, Long] = {
    --- End diff --
    
    tiny nit:
    
    ```scala
    def getCustomMetricsForProvider(): Map[StateStoreCustomMetric, Long] = synchronized {
      Map(metricProviderLoaderMapSize -> SizeEstimator.estimate(loadedMaps))
    }
    ```


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21469: [SPARK-24441][SS] Expose total estimated size of ...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21469#discussion_r206738287
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/streaming/progress.scala ---
    @@ -48,12 +49,24 @@ class StateOperatorProgress private[sql](
       def prettyJson: String = pretty(render(jsonValue))
     
       private[sql] def copy(newNumRowsUpdated: Long): StateOperatorProgress =
    -    new StateOperatorProgress(numRowsTotal, newNumRowsUpdated, memoryUsedBytes)
    +    new StateOperatorProgress(numRowsTotal, newNumRowsUpdated, memoryUsedBytes, customMetrics)
     
       private[sql] def jsonValue: JValue = {
    -    ("numRowsTotal" -> JInt(numRowsTotal)) ~
    -    ("numRowsUpdated" -> JInt(numRowsUpdated)) ~
    -    ("memoryUsedBytes" -> JInt(memoryUsedBytes))
    +    def safeMapToJValue[T](map: ju.Map[String, T], valueToJValue: T => JValue): JValue = {
    +      if (map.isEmpty) return JNothing
    +      val keys = map.keySet.asScala.toSeq.sorted
    +      keys.map { k => k -> valueToJValue(map.get(k)) : JObject }.reduce(_ ~ _)
    +    }
    +
    +    val jsonVal = ("numRowsTotal" -> JInt(numRowsTotal)) ~
    +      ("numRowsUpdated" -> JInt(numRowsUpdated)) ~
    +      ("memoryUsedBytes" -> JInt(memoryUsedBytes))
    +
    +    if (!customMetrics.isEmpty) {
    --- End diff --
    
    You are already handling the case of map being empty in `safeMapToJValue` by adding JNothing. Doesnt JNothing values just get dropped from the json text any way?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21469: [SPARK-24441][SS] Expose total estimated size of ...

Posted by HeartSaVioR <gi...@git.apache.org>.
Github user HeartSaVioR commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21469#discussion_r194613720
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/statefulOperators.scala ---
    @@ -112,14 +122,19 @@ trait StateStoreWriter extends StatefulOperator { self: SparkPlan =>
         val storeMetrics = store.metrics
         longMetric("numTotalStateRows") += storeMetrics.numKeys
         longMetric("stateMemory") += storeMetrics.memoryUsedBytes
    -    storeMetrics.customMetrics.foreach { case (metric, value) =>
    -      longMetric(metric.name) += value
    +    storeMetrics.customMetrics.foreach {
    +      case (metric: StateStoreCustomAverageMetric, value) =>
    +        longMetric(metric.name).set(value * 1.0d)
    --- End diff --
    
    We would be better to think about the actual benefit of exposing the value, rather than how to expose the value to somewhere. If we define it as count and do aggregation as summation, the aggregated value will be `(partition count * versions)` which might be hard for end users to find the meaning from the value.
    
    I'm afraid that exposing this to StreamingQuery as average is not trivial, especially SQLMetric is defined as `AccumulatorV2[Long, Long]` so only single Long value can be passed. Under the restriction, we couldn't define `merge` operation for `average metric`.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21469
  
    **[Test build #92544 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92544/testReport)** for PR 21469 at commit [`c9aada5`](https://github.com/apache/spark/commit/c9aada520889b87ace0886805910f0d56d099bd2).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

Posted by arunmahadevan <gi...@git.apache.org>.
Github user arunmahadevan commented on the issue:

    https://github.com/apache/spark/pull/21469
  
    @HeartSaVioR , may be then this should be reported in the "memoryUsedBytes" in the StateOperatorProgress (value reported in StreamingQueryProgress) because currently the usage reported does not reflect the memory used for the cache.
    
    Question: in the screenshot "Estimated size of states cache in provider total" is 3.3 MB whereas the "memory used by state total" is 20.6 KB with "total number of state rows" = 2. This 150x difference is expected with just 2 rows in the state? Were there 100 versions of the map in the sample output you posted?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21469
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total size of states in HDFSBac...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21469
  
    **[Test build #91382 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91382/testReport)** for PR 21469 at commit [`933fb2e`](https://github.com/apache/spark/commit/933fb2e329b7e76ae5087e6ce58dd3359c78d5f1).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/21469
  
    ok to test


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

Posted by HeartSaVioR <gi...@git.apache.org>.
Github user HeartSaVioR commented on the issue:

    https://github.com/apache/spark/pull/21469
  
    Rebased to fix conflict, and added new commit (last one: c9aada5) to represent cache hit / miss count in HDFS state provider. This is actually helpful for SPARK-24717 to determine proper value of configuration, but with this commit SPARK-24717 should be on top of this PR, so just added it here to avoid rebase hell.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21469
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21469
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92839/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21469
  
    **[Test build #91535 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91535/testReport)** for PR 21469 at commit [`3c80cad`](https://github.com/apache/spark/commit/3c80cad32c056a24a7f5ffd7ab0ae3f7e096a62d).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on the issue:

    https://github.com/apache/spark/pull/21469
  
    @HeartSaVioR I think I agree with a second approach that you suggested. So 
    `memoryUsedBytes` => `size for total memory usage of loaded versions` and
    `customMetric` => `size for memory usage of latest version`


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

Posted by HeartSaVioR <gi...@git.apache.org>.
Github user HeartSaVioR commented on the issue:

    https://github.com/apache/spark/pull/21469
  
    Now I'm thinking about removing "metricProviderLoaderCountOfVersionsInMap" and also removing StateStoreCustomAverageMetric, since the value doesn't look correct with stream-stream join which handles multiple states in an operation.
    
    We may try out SQLMetric to extend AccumulatorV2[(kind of Metric class), (kind of Metric class)] which Metric class can handle merging multiple values correctly according to the kind of metric, but this is beyond the scope of PR, so just removing "metricProviderLoaderCountOfVersionsInMap" would be clearer for this patch for now.
    
    What do you think, @arunmahadevan @jose-torres ?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21469
  
    **[Test build #92839 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92839/testReport)** for PR 21469 at commit [`c4a6d11`](https://github.com/apache/spark/commit/c4a6d11e230759a0d068024cba04c31a0e7903a9).
     * This patch **fails due to an unknown error code, -9**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21469
  
    **[Test build #91486 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91486/testReport)** for PR 21469 at commit [`345397d`](https://github.com/apache/spark/commit/345397dd530898697ef338ec55b81ad11ece4dc0).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21469: [SPARK-24441][SS] Expose total estimated size of ...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/21469


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21469
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93906/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21469
  
    **[Test build #93869 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93869/testReport)** for PR 21469 at commit [`ed072fc`](https://github.com/apache/spark/commit/ed072fcf057f982275d0daf69787ed812f03e87b).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

Posted by HeartSaVioR <gi...@git.apache.org>.
Github user HeartSaVioR commented on the issue:

    https://github.com/apache/spark/pull/21469
  
    @arunmahadevan 
    Added custom metrics in state store to streaming query status as well. You can see `providerLoadedMapSize` is added to `stateOperators.customMetrics` in below output.
    
    I have to exclude `providerLoadedMapCountOfVersions` from the list, since average metric is implemented a bit tricky and doesn't look like easy to aggregate for streaming query status. 
    We may want to reimplement SQLMetric and subclasses to make sure everything works correctly without any tricky approach, but that doesn't look like trivial to address and I think this is out of scope on this PR.
    
    ```
    18/06/06 22:51:23 INFO MicroBatchExecution: Streaming query made progress: {
      "id" : "7564a0b7-e3b2-4d53-b246-b774ab04e586",
      "runId" : "8dd34784-080c-4f86-afaf-ac089902252d",
      "name" : null,
      "timestamp" : "2018-06-06T13:51:15.467Z",
      "batchId" : 4,
      "numInputRows" : 547,
      "inputRowsPerSecond" : 67.15776550030694,
      "processedRowsPerSecond" : 65.94333936106088,
      "durationMs" : {
        "addBatch" : 7944,
        "getBatch" : 1,
        "getEndOffset" : 0,
        "queryPlanning" : 61,
        "setOffsetRange" : 5,
        "triggerExecution" : 8295,
        "walCommit" : 158
      },
      "eventTime" : {
        "avg" : "2018-06-06T13:51:10.313Z",
        "max" : "2018-06-06T13:51:14.250Z",
        "min" : "2018-06-06T13:51:07.098Z",
        "watermark" : "2018-06-06T13:50:36.676Z"
      },
      "stateOperators" : [ {
        "numRowsTotal" : 20,
        "numRowsUpdated" : 16,
        "memoryUsedBytes" : 26679,
        "customMetrics" : {
          "providerLoadedMapSize" : 181911
        }
      } ],
      "sources" : [ {
        "description" : "KafkaV2[Subscribe[apachelogs-v2]]",
        "startOffset" : {
          "apachelogs-v2" : {
            "2" : 489056,
            "4" : 489053,
            "1" : 489055,
            "3" : 489051,
            "0" : 489053
          }
        },
        "endOffset" : {
          "apachelogs-v2" : {
            "2" : 489056,
            "4" : 489053,
            "1" : 489055,
            "3" : 489051,
            "0" : 489053
          }
        },
        "numInputRows" : 547,
        "inputRowsPerSecond" : 67.15776550030694,
        "processedRowsPerSecond" : 65.94333936106088
      } ],
      "sink" : {
        "description" : "org.apache.spark.sql.execution.streaming.ConsoleSinkProvider@60999714"
      }
    }
    ```


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21469
  
    **[Test build #92839 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92839/testReport)** for PR 21469 at commit [`c4a6d11`](https://github.com/apache/spark/commit/c4a6d11e230759a0d068024cba04c31a0e7903a9).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21469
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21469
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21469: [SPARK-24441][SS] Expose total estimated size of ...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21469#discussion_r206738919
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/streaming/progress.scala ---
    @@ -48,12 +49,24 @@ class StateOperatorProgress private[sql](
       def prettyJson: String = pretty(render(jsonValue))
     
       private[sql] def copy(newNumRowsUpdated: Long): StateOperatorProgress =
    -    new StateOperatorProgress(numRowsTotal, newNumRowsUpdated, memoryUsedBytes)
    +    new StateOperatorProgress(numRowsTotal, newNumRowsUpdated, memoryUsedBytes, customMetrics)
     
       private[sql] def jsonValue: JValue = {
    -    ("numRowsTotal" -> JInt(numRowsTotal)) ~
    -    ("numRowsUpdated" -> JInt(numRowsUpdated)) ~
    -    ("memoryUsedBytes" -> JInt(memoryUsedBytes))
    +    def safeMapToJValue[T](map: ju.Map[String, T], valueToJValue: T => JValue): JValue = {
    --- End diff --
    
    T is always Long. Why make a generic function for that? This does not even need a separate function. 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21469
  
    **[Test build #92551 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92551/testReport)** for PR 21469 at commit [`c9aada5`](https://github.com/apache/spark/commit/c9aada520889b87ace0886805910f0d56d099bd2).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21469: [SPARK-24441][SS] Expose total estimated size of ...

Posted by HeartSaVioR <gi...@git.apache.org>.
Github user HeartSaVioR commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21469#discussion_r194585044
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/statefulOperators.scala ---
    @@ -112,14 +122,19 @@ trait StateStoreWriter extends StatefulOperator { self: SparkPlan =>
         val storeMetrics = store.metrics
         longMetric("numTotalStateRows") += storeMetrics.numKeys
         longMetric("stateMemory") += storeMetrics.memoryUsedBytes
    -    storeMetrics.customMetrics.foreach { case (metric, value) =>
    -      longMetric(metric.name) += value
    +    storeMetrics.customMetrics.foreach {
    +      case (metric: StateStoreCustomAverageMetric, value) =>
    +        longMetric(metric.name).set(value * 1.0d)
    --- End diff --
    
    If my understanding is right, the metric object (return of `longMetric()`) is different between each task, so the object will be different for each batch and each task. (TaskMetric is serialized and deserialized so it can't be shared between tasks.)
    
    And actually the metric values are not aggregated into an SQLMetric object. The values are just aggregated and represented in SQLAppStatusListener.
    
    https://github.com/apache/spark/blob/f5af86ea753c446df59a0a8c16c685224690d633/sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SQLAppStatusListener.scala#L162-L174
    
    https://github.com/apache/spark/blob/f5af86ea753c446df59a0a8c16c685224690d633/sql/core/src/main/scala/org/apache/spark/sql/execution/metric/SQLMetrics.scala#L147-L160
    
    <img width="489" alt="screen shot 2018-06-12 at 9 17 14 am" src="https://user-images.githubusercontent.com/1317309/41263432-024efe4a-6e22-11e8-92f9-24d1f73776a9.png">



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21469
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91535/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

Posted by HeartSaVioR <gi...@git.apache.org>.
Github user HeartSaVioR commented on the issue:

    https://github.com/apache/spark/pull/21469
  
    @tdas Yeah, I can check with master branch if you would like to let me handle, and please go ahead if you would like to handle it by yourself.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21469: [SPARK-24441][SS] Expose total estimated size of ...

Posted by HeartSaVioR <gi...@git.apache.org>.
Github user HeartSaVioR commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21469#discussion_r206755538
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/streaming/progress.scala ---
    @@ -48,12 +49,24 @@ class StateOperatorProgress private[sql](
       def prettyJson: String = pretty(render(jsonValue))
     
       private[sql] def copy(newNumRowsUpdated: Long): StateOperatorProgress =
    -    new StateOperatorProgress(numRowsTotal, newNumRowsUpdated, memoryUsedBytes)
    +    new StateOperatorProgress(numRowsTotal, newNumRowsUpdated, memoryUsedBytes, customMetrics)
     
       private[sql] def jsonValue: JValue = {
    -    ("numRowsTotal" -> JInt(numRowsTotal)) ~
    -    ("numRowsUpdated" -> JInt(numRowsUpdated)) ~
    -    ("memoryUsedBytes" -> JInt(memoryUsedBytes))
    +    def safeMapToJValue[T](map: ju.Map[String, T], valueToJValue: T => JValue): JValue = {
    --- End diff --
    
    I've first trying to leverage `StreamingQueryProgress.safeMapToJValue` but can't find proper place to move to be co-used, so I simply copied it. Will simplify the code block and inline.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total size of states in HDFSBac...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21469
  
    **[Test build #91382 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91382/testReport)** for PR 21469 at commit [`933fb2e`](https://github.com/apache/spark/commit/933fb2e329b7e76ae5087e6ce58dd3359c78d5f1).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

Posted by HeartSaVioR <gi...@git.apache.org>.
Github user HeartSaVioR commented on the issue:

    https://github.com/apache/spark/pull/21469
  
    @jose-torres No problem. I expect there would be some inactive moment in Spark community during spark summit. Addressed comment regarding renaming.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total size of states in HDFSBac...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21469
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on the issue:

    https://github.com/apache/spark/pull/21469
  
    I am having a second thoughts about this. Exposing the entire memory usage of all the loaded maps as another custom metric .... just adds more confusion. Rather the point of the the main state metric `memoryUsedBytes` is to capture how much memory is occupied because of the one partition of the state, and that implicitly should cover all the loaded versions of that state partition. So I strongly feel that instead of adding a custom metric, we should change the existing `memoryUsedBytes` to capture all the memory. 
    
    I am fine adding the custom metrics hit and miss counts. No questions about that. 
    
    What do you think?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21469
  
    **[Test build #92329 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92329/testReport)** for PR 21469 at commit [`62fc395`](https://github.com/apache/spark/commit/62fc395d29b67a827612e4e7f4bf134f5817a33c).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21469
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21469
  
    **[Test build #93257 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93257/testReport)** for PR 21469 at commit [`32d0418`](https://github.com/apache/spark/commit/32d041878b7dcd20794250853063dab4bac09118).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21469
  
    **[Test build #91509 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91509/testReport)** for PR 21469 at commit [`7ec3242`](https://github.com/apache/spark/commit/7ec32427cf0fda82d2f936fa0aab62e274a8c034).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

Posted by jose-torres <gi...@git.apache.org>.
Github user jose-torres commented on the issue:

    https://github.com/apache/spark/pull/21469
  
    Sure, I don't mind if we remove that metric.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total size of states in HDFSBac...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/21469
  
    Shall we make the PR title complete? Looks truncated.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21469
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total size of states in HDFSBac...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21469
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91382/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21469
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91486/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total size of states in HDFSBac...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21469
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91347/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21469
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/21469
  
    retest this please


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21469: [SPARK-24441][SS] Expose total estimated size of ...

Posted by HeartSaVioR <gi...@git.apache.org>.
Github user HeartSaVioR commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21469#discussion_r194563959
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/HDFSBackedStateStoreProvider.scala ---
    @@ -247,6 +253,14 @@ private[state] class HDFSBackedStateStoreProvider extends StateStoreProvider wit
       private lazy val fm = CheckpointFileManager.create(baseDir, hadoopConf)
       private lazy val sparkConf = Option(SparkEnv.get).map(_.conf).getOrElse(new SparkConf)
     
    +  private lazy val metricProviderLoaderMapSizeBytes: StateStoreCustomSizeMetric =
    +    StateStoreCustomSizeMetric("providerLoadedMapSizeBytes",
    +      "estimated size of states cache in provider")
    +
    +  private lazy val metricProviderLoaderCountOfVersionsInMap: StateStoreCustomAverageMetric =
    --- End diff --
    
    This should be average to show `min, med, max` in SQL metrics UI, as I pasted capture of UI before. Summing them all doesn't give meaningful value.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21469
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21469
  
    **[Test build #91503 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91503/testReport)** for PR 21469 at commit [`af57f26`](https://github.com/apache/spark/commit/af57f26239ef635cb9837e9ebcf37fbdaa215480).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21469
  
    Build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total size of states in HDFSBac...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21469
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21469
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93224/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21469
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95012/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21469
  
    **[Test build #92860 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92860/testReport)** for PR 21469 at commit [`c4a6d11`](https://github.com/apache/spark/commit/c4a6d11e230759a0d068024cba04c31a0e7903a9).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21469
  
    **[Test build #91503 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91503/testReport)** for PR 21469 at commit [`af57f26`](https://github.com/apache/spark/commit/af57f26239ef635cb9837e9ebcf37fbdaa215480).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on the issue:

    https://github.com/apache/spark/pull/21469
  
    Unfortunately this PR broke the master build. Looks like some import that probably got removed in the other PR I merged, which didnt create any direct conflict.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21469
  
    **[Test build #91509 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91509/testReport)** for PR 21469 at commit [`7ec3242`](https://github.com/apache/spark/commit/7ec32427cf0fda82d2f936fa0aab62e274a8c034).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21469
  
    **[Test build #91526 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91526/testReport)** for PR 21469 at commit [`3c80cad`](https://github.com/apache/spark/commit/3c80cad32c056a24a7f5ffd7ab0ae3f7e096a62d).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21469
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91509/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21469
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91625/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org