You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by davies <gi...@git.apache.org> on 2016/04/15 22:23:06 UTC

[GitHub] spark pull request: [SPARK-14669] [SQL] Fix some SQL metrics in co...

GitHub user davies opened a pull request:

    https://github.com/apache/spark/pull/12425

    [SPARK-14669] [SQL] Fix some SQL metrics in codegen and added more

    ## What changes were proposed in this pull request?
    
    1. Fix the "spill size" of TungstenAggregate and Sort
    2. Rename "data size" to "peak memory" to match the actual meaning (also consistent with task metrics)
    3. Added "data size" for ShuffleExchange and BroadcastExchange
    4. Added some timing for Sort, Aggregate and BroadcastExchange (this requires another patch to work)
    
    
    ## How was this patch tested?
    
    Existing tests.
    ![metrics](https://cloud.githubusercontent.com/assets/40902/14573908/21ad2f00-030d-11e6-9e2c-c544f30039ea.png)


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/davies/spark fix_metrics

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/12425.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #12425
    
----
commit a9b29e2c36ea39653896ffc0f1d6727640ccf9d0
Author: Davies Liu <da...@databricks.com>
Date:   2016-04-15T20:13:32Z

    fix sql metrics

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14669] [SQL] Fix some SQL metrics in co...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12425#issuecomment-213563301
  
    **[Test build #56712 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56712/consoleFull)** for PR 12425 at commit [`faeb593`](https://github.com/apache/spark/commit/faeb593713774a92731cbab5edebba9ca72714f6).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14669] [SQL] Fix some SQL metrics in co...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12425#issuecomment-213563539
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14669] [SQL] Fix some SQL metrics in co...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on the pull request:

    https://github.com/apache/spark/pull/12425#issuecomment-210629360
  
    cc @zsxwing 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14669] [SQL] Fix some SQL metrics in co...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12425#issuecomment-210667740
  
    **[Test build #55957 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55957/consoleFull)** for PR 12425 at commit [`1378dd2`](https://github.com/apache/spark/commit/1378dd21469ce722e4f0bd2feb10c394de59900f).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14669] [SQL] Fix some SQL metrics in co...

Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12425#discussion_r60629466
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/UnsafeRowSerializer.scala ---
    @@ -26,6 +26,7 @@ import com.google.common.io.ByteStreams
     
     import org.apache.spark.serializer.{DeserializationStream, SerializationStream, Serializer, SerializerInstance}
     import org.apache.spark.sql.catalyst.expressions.UnsafeRow
    +import org.apache.spark.sql.execution.metric.LongSQLMetric
    --- End diff --
    
    unused import


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14669] [SQL] Fix some SQL metrics in co...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12425#issuecomment-210670270
  
    **[Test build #55970 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55970/consoleFull)** for PR 12425 at commit [`696aafe`](https://github.com/apache/spark/commit/696aafee07ace0fb8142295e9954bdcd00e29061).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14669] [SQL] Fix some SQL metrics in co...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12425#issuecomment-213263958
  
    **[Test build #56654 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56654/consoleFull)** for PR 12425 at commit [`cc65830`](https://github.com/apache/spark/commit/cc65830c4fb7a673b0e70bbf74672e3106cc7490).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14669] [SQL] Fix some SQL metrics in co...

Posted by ericl <gi...@git.apache.org>.
Github user ericl commented on the pull request:

    https://github.com/apache/spark/pull/12425#issuecomment-210674091
  
    Should we also add a metric for `dataSize`, since peak memory usage might not be quite the same? Though maybe adding up peak memory and spill size is a good approximation.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14669] [SQL] Fix some SQL metrics in co...

Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12425#discussion_r60632404
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/ShuffledHashJoin.scala ---
    @@ -57,8 +59,13 @@ case class ShuffledHashJoin(
         ClusteredDistribution(leftKeys) :: ClusteredDistribution(rightKeys) :: Nil
     
       private def buildHashedRelation(iter: Iterator[InternalRow]): HashedRelation = {
    +    val buildDataSize = longMetric("buildDataSize")
    +    val buildTime = longMetric("buildTime")
    +    val start = System.currentTimeMillis()
         val context = TaskContext.get()
         val relation = HashedRelation(iter, buildKeys, taskMemoryManager = context.taskMemoryManager())
    +    buildTime += System.currentTimeMillis() - start
    --- End diff --
    
    nit: nanoTime


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14669] [SQL] Fix some SQL metrics in co...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12425#issuecomment-213264623
  
    **[Test build #56654 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56654/consoleFull)** for PR 12425 at commit [`cc65830`](https://github.com/apache/spark/commit/cc65830c4fb7a673b0e70bbf74672e3106cc7490).
     * This patch **fails build dependency tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14669] [SQL] Fix some SQL metrics in co...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12425#issuecomment-210638565
  
    **[Test build #55957 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55957/consoleFull)** for PR 12425 at commit [`1378dd2`](https://github.com/apache/spark/commit/1378dd21469ce722e4f0bd2feb10c394de59900f).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14669] [SQL] Fix some SQL metrics in co...

Posted by ericl <gi...@git.apache.org>.
Github user ericl commented on the pull request:

    https://github.com/apache/spark/pull/12425#issuecomment-210741272
  
    I slightly prefer to have `dataSize` in the following stage so all the relevant metrics are together, but having it in Exchange seems ok too.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14669] [SQL] Fix some SQL metrics in co...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12425#issuecomment-213525188
  
    **[Test build #56712 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56712/consoleFull)** for PR 12425 at commit [`faeb593`](https://github.com/apache/spark/commit/faeb593713774a92731cbab5edebba9ca72714f6).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14669] [SQL] Fix some SQL metrics in co...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12425#issuecomment-210697626
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14669] [SQL] Fix some SQL metrics in co...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12425#issuecomment-213301778
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14669] [SQL] Fix some SQL metrics in co...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12425#issuecomment-210667894
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14669] [SQL] Fix some SQL metrics in co...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on the pull request:

    https://github.com/apache/spark/pull/12425#issuecomment-213264516
  
    @zsxwing Addressed you comments.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14669] [SQL] Fix some SQL metrics in co...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12425#issuecomment-210697510
  
    **[Test build #55970 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55970/consoleFull)** for PR 12425 at commit [`696aafe`](https://github.com/apache/spark/commit/696aafee07ace0fb8142295e9954bdcd00e29061).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14669] [SQL] Fix some SQL metrics in co...

Posted by ericl <gi...@git.apache.org>.
Github user ericl commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12425#discussion_r59963085
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/ShuffleExchange.scala ---
    @@ -78,8 +82,15 @@ case class ShuffleExchange(
        * the returned ShuffleDependency will be the input of shuffle.
        */
       private[sql] def prepareShuffleDependency(): ShuffleDependency[Int, InternalRow, InternalRow] = {
    -    ShuffleExchange.prepareShuffleDependency(
    -      child.execute(), child.output, newPartitioning, serializer)
    +    val dataSize = longMetric("dataSize")
    +    val rdd = child.execute().mapPartitionsInternal { iter =>
    +      val localDataSize = dataSize.localValue
    +      iter.map { row =>
    +        localDataSize.add(row.asInstanceOf[UnsafeRow].getSizeInBytes)
    --- End diff --
    
    Isn't this iteration over each row a significant added overhead? Seems it would be better to count the data size in bulk instead where the sort is done.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14669] [SQL] Fix some SQL metrics in co...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/12425


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14669] [SQL] Fix some SQL metrics in co...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12425#issuecomment-213264910
  
    **[Test build #56656 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56656/consoleFull)** for PR 12425 at commit [`1076c75`](https://github.com/apache/spark/commit/1076c7514b6206106a11350a22d910fe8258e12d).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14669] [SQL] Fix some SQL metrics in co...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12425#discussion_r60280314
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/ShuffleExchange.scala ---
    @@ -78,8 +82,15 @@ case class ShuffleExchange(
        * the returned ShuffleDependency will be the input of shuffle.
        */
       private[sql] def prepareShuffleDependency(): ShuffleDependency[Int, InternalRow, InternalRow] = {
    -    ShuffleExchange.prepareShuffleDependency(
    -      child.execute(), child.output, newPartitioning, serializer)
    +    val dataSize = longMetric("dataSize")
    +    val rdd = child.execute().mapPartitionsInternal { iter =>
    +      val localDataSize = dataSize.localValue
    +      iter.map { row =>
    +        localDataSize.add(row.asInstanceOf[UnsafeRow].getSizeInBytes)
    --- End diff --
    
    I also worried the overhead added here, or remove the iterator here and count the size in UsafeRowSerializer (tried in the beginning, less clear than current one)?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14669] [SQL] Fix some SQL metrics in co...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12425#issuecomment-210631350
  
    **[Test build #55954 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55954/consoleFull)** for PR 12425 at commit [`a9b29e2`](https://github.com/apache/spark/commit/a9b29e2c36ea39653896ffc0f1d6727640ccf9d0).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14669] [SQL] Fix some SQL metrics in co...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12425#issuecomment-213301572
  
    **[Test build #56656 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56656/consoleFull)** for PR 12425 at commit [`1076c75`](https://github.com/apache/spark/commit/1076c7514b6206106a11350a22d910fe8258e12d).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14669] [SQL] Fix some SQL metrics in co...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on the pull request:

    https://github.com/apache/spark/pull/12425#issuecomment-212714696
  
    cc @sameeragarwal Could you also take a look?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14669] [SQL] Fix some SQL metrics in co...

Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12425#discussion_r60632391
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/ShuffledHashJoin.scala ---
    @@ -57,8 +59,13 @@ case class ShuffledHashJoin(
         ClusteredDistribution(leftKeys) :: ClusteredDistribution(rightKeys) :: Nil
     
       private def buildHashedRelation(iter: Iterator[InternalRow]): HashedRelation = {
    +    val buildDataSize = longMetric("buildDataSize")
    +    val buildTime = longMetric("buildTime")
    +    val start = System.currentTimeMillis()
    --- End diff --
    
    nit: nanoTime


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14669] [SQL] Fix some SQL metrics in co...

Posted by ericl <gi...@git.apache.org>.
Github user ericl commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12425#discussion_r59948041
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/Sort.scala ---
    @@ -87,10 +89,12 @@ case class Sort(
           // Remember spill data size of this task before execute this operator so that we can
           // figure out how many bytes we spilled for this operator.
           val spillSizeBefore = metrics.memoryBytesSpilled
    +      val beforeSort = System.currentTimeMillis()
    --- End diff --
    
    Should we use nanoTime() instead of currentTimeMillis(), which is not guaranteed to be monotonic?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14669] [SQL] Fix some SQL metrics in co...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12425#issuecomment-210697630
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/55970/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14669] [SQL] Fix some SQL metrics in co...

Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12425#discussion_r60775468
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/UnsafeRowSerializer.scala ---
    @@ -26,6 +26,7 @@ import com.google.common.io.ByteStreams
     
     import org.apache.spark.serializer.{DeserializationStream, SerializationStream, Serializer, SerializerInstance}
     import org.apache.spark.sql.catalyst.expressions.UnsafeRow
    +import org.apache.spark.sql.execution.metric.LongSQLMetric
    --- End diff --
    
    nit: remove this import


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14669] [SQL] Fix some SQL metrics in co...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on the pull request:

    https://github.com/apache/spark/pull/12425#issuecomment-210736839
  
    @ericl Exchange has `dataSize`, should that be enough?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14669] [SQL] Fix some SQL metrics in co...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12425#issuecomment-213264631
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/56654/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14669] [SQL] Fix some SQL metrics in co...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on the pull request:

    https://github.com/apache/spark/pull/12425#issuecomment-213569712
  
    Merging this into master, thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14669] [SQL] Fix some SQL metrics in co...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12425#issuecomment-213563540
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/56712/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14669] [SQL] Fix some SQL metrics in co...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12425#issuecomment-213301781
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/56656/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14669] [SQL] Fix some SQL metrics in co...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12425#issuecomment-210631829
  
    **[Test build #55954 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55954/consoleFull)** for PR 12425 at commit [`a9b29e2`](https://github.com/apache/spark/commit/a9b29e2c36ea39653896ffc0f1d6727640ccf9d0).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14669] [SQL] Fix some SQL metrics in co...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12425#issuecomment-210631834
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/55954/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14669] [SQL] Fix some SQL metrics in co...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12425#issuecomment-210631831
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14669] [SQL] Fix some SQL metrics in co...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12425#issuecomment-213264628
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14669] [SQL] Fix some SQL metrics in co...

Posted by sameeragarwal <gi...@git.apache.org>.
Github user sameeragarwal commented on the pull request:

    https://github.com/apache/spark/pull/12425#issuecomment-213535231
  
    LGTM. +1 on having tests arounds metrics.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14669] [SQL] Fix some SQL metrics in co...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12425#issuecomment-210667895
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/55957/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14669] [SQL] Fix some SQL metrics in co...

Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12425#discussion_r60630187
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/Sort.scala ---
    @@ -87,10 +89,12 @@ case class Sort(
           // Remember spill data size of this task before execute this operator so that we can
           // figure out how many bytes we spilled for this operator.
           val spillSizeBefore = metrics.memoryBytesSpilled
    +      val beforeSort = System.nanoTime()
     
           val sortedIterator = sorter.sort(iter.asInstanceOf[Iterator[UnsafeRow]])
     
    -      dataSize += sorter.getPeakMemoryUsage
    +      sortingTime += (System.nanoTime() - beforeSort) >> 20
    --- End diff --
    
    ">> 20"? I think it should be `/ 1000000`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org