You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by rxin <gi...@git.apache.org> on 2018/11/21 14:59:55 UTC

[GitHub] spark pull request #23105: [SPARK-26140] Pull TempShuffleReadMetrics creatio...

GitHub user rxin opened a pull request:

    https://github.com/apache/spark/pull/23105

    [SPARK-26140] Pull TempShuffleReadMetrics creation out of shuffle reader

    ## What changes were proposed in this pull request?
    This patch defines an internal Spark interface for reporting shuffle metrics and uses that in shuffle reader. Before this patch, shuffle metrics is tied to a specific implementation (using a thread local temporary data structure and accumulators). After this patch, callers that define their own shuffle RDDs can create a custom metrics implementation.
    
    With this patch, we would be able to create a better metrics for the SQL layer, e.g. reporting shuffle metrics in the SQL UI, for each exchange operator.
    
    ## How was this patch tested?
    No behavior change expected, as it is a straightforward refactoring. Updated all existing test cases.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/rxin/spark SPARK-26140

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/23105.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #23105
    
----
commit da253b57c14bc0174f0330ae6fa5d3a61647269b
Author: Reynold Xin <rx...@...>
Date:   2018-11-21T14:56:23Z

    [SPARK-26140] Pull TempShuffleReadMetrics creation out of shuffle reader

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23105: [SPARK-26140] Enable passing in a custom shuffle metrics...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/23105
  
    **[Test build #99129 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99129/testReport)** for PR 23105 at commit [`29ef763`](https://github.com/apache/spark/commit/29ef763088782907ba70ce8408f494f750291e94).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #23105: [SPARK-26140] Enable passing in a custom shuffle ...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23105#discussion_r235420647
  
    --- Diff: core/src/main/scala/org/apache/spark/executor/ShuffleReadMetrics.scala ---
    @@ -122,34 +123,3 @@ class ShuffleReadMetrics private[spark] () extends Serializable {
         }
       }
     }
    -
    -/**
    - * A temporary shuffle read metrics holder that is used to collect shuffle read metrics for each
    - * shuffle dependency, and all temporary metrics will be merged into the [[ShuffleReadMetrics]] at
    - * last.
    - */
    -private[spark] class TempShuffleReadMetrics {
    --- End diff --
    
    this was moved to TempShuffleReadMetrics


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #23105: [SPARK-26140] Enable custom metrics implementatio...

Posted by xuanyuanking <gi...@git.apache.org>.
Github user xuanyuanking commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23105#discussion_r236036258
  
    --- Diff: core/src/main/scala/org/apache/spark/shuffle/metrics.scala ---
    @@ -0,0 +1,52 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.shuffle
    +
    +/**
    + * An interface for reporting shuffle read metrics, for each shuffle. This interface assumes
    + * all the methods are called on a single-threaded, i.e. concrete implementations would not need
    + * to synchronize.
    + *
    + * All methods have additional Spark visibility modifier to allow public, concrete implementations
    + * that still have these methods marked as private[spark].
    + */
    +private[spark] trait ShuffleReadMetricsReporter {
    --- End diff --
    
    https://github.com/apache/spark/pull/23128 :)


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #23105: [SPARK-26140] Enable custom metrics implementatio...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/23105


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23105: [SPARK-26140] Enable custom metrics implementation in sh...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23105
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99128/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23105: [SPARK-26140] Enable custom metrics implementation in sh...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23105
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99131/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23105: [SPARK-26140] Enable custom metrics implementation in sh...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/23105
  
    **[Test build #99131 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99131/testReport)** for PR 23105 at commit [`1a5ce24`](https://github.com/apache/spark/commit/1a5ce2475936f4109a566407330feb9e9635f601).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23105: [SPARK-26140] Enable custom metrics implementation in sh...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the issue:

    https://github.com/apache/spark/pull/23105
  
    cc @jiangxb1987 @squito 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23105: [SPARK-26140] Enable custom metrics implementation in sh...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23105
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23105: [SPARK-26140] Enable custom metrics implementation in sh...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/23105
  
    **[Test build #99129 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99129/testReport)** for PR 23105 at commit [`29ef763`](https://github.com/apache/spark/commit/29ef763088782907ba70ce8408f494f750291e94).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #23105: [SPARK-26140] Enable custom metrics implementatio...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23105#discussion_r235834136
  
    --- Diff: core/src/main/scala/org/apache/spark/shuffle/metrics.scala ---
    @@ -0,0 +1,52 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.shuffle
    +
    +/**
    + * An interface for reporting shuffle read metrics, for each shuffle. This interface assumes
    + * all the methods are called on a single-threaded, i.e. concrete implementations would not need
    + * to synchronize.
    + *
    + * All methods have additional Spark visibility modifier to allow public, concrete implementations
    + * that still have these methods marked as private[spark].
    + */
    +private[spark] trait ShuffleReadMetricsReporter {
    --- End diff --
    
    how do we plan to use this interface later on? It's not used in this PR.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #23105: [SPARK-26140] Enable custom metrics implementatio...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23105#discussion_r235834225
  
    --- Diff: core/src/main/scala/org/apache/spark/shuffle/ShuffleManager.scala ---
    @@ -48,7 +48,8 @@ private[spark] trait ShuffleManager {
           handle: ShuffleHandle,
           startPartition: Int,
           endPartition: Int,
    -      context: TaskContext): ShuffleReader[K, C]
    +      context: TaskContext,
    +      metrics: ShuffleMetricsReporter): ShuffleReader[K, C]
    --- End diff --
    
    IIUC, we should pass a read metrics reporter here, as this method is `getReader` which is called by the reducers to read shuffle files.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23105: [SPARK-26140] Enable custom metrics implementation in sh...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23105
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #23105: [SPARK-26140] Enable custom metrics implementatio...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23105#discussion_r235834088
  
    --- Diff: core/src/main/scala/org/apache/spark/shuffle/ShuffleMetricsReporter.scala ---
    @@ -0,0 +1,33 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.shuffle
    +
    +/**
    + * An interface for reporting shuffle information, for each shuffle. This interface assumes
    --- End diff --
    
    `for each shuffle` -> `for each reducer of a shuffle`?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23105: [SPARK-26140] Enable passing in a custom shuffle metrics...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23105
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23105: [SPARK-26140] Enable passing in a custom shuffle metrics...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23105
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5235/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23105: [SPARK-26140] Enable custom metrics implementation in sh...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23105
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5238/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23105: [SPARK-26140] Enable passing in a custom shuffle metrics...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23105
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23105: [SPARK-26140] Enable passing in a custom shuffle metrics...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/23105
  
    **[Test build #99128 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99128/testReport)** for PR 23105 at commit [`da253b5`](https://github.com/apache/spark/commit/da253b57c14bc0174f0330ae6fa5d3a61647269b).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23105: [SPARK-26140] Enable custom metrics implementation in sh...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/23105
  
    **[Test build #99131 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99131/testReport)** for PR 23105 at commit [`1a5ce24`](https://github.com/apache/spark/commit/1a5ce2475936f4109a566407330feb9e9635f601).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #23105: [SPARK-26140] Enable custom metrics implementatio...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23105#discussion_r235950427
  
    --- Diff: core/src/main/scala/org/apache/spark/shuffle/ShuffleManager.scala ---
    @@ -48,7 +48,8 @@ private[spark] trait ShuffleManager {
           handle: ShuffleHandle,
           startPartition: Int,
           endPartition: Int,
    -      context: TaskContext): ShuffleReader[K, C]
    +      context: TaskContext,
    +      metrics: ShuffleMetricsReporter): ShuffleReader[K, C]
    --- End diff --
    
    It is a read metrics here actually. In the write PR this is renamed ShuffleReadMetricsReporter.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #23105: [SPARK-26140] Enable custom metrics implementatio...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23105#discussion_r236020103
  
    --- Diff: core/src/main/scala/org/apache/spark/shuffle/metrics.scala ---
    @@ -0,0 +1,52 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.shuffle
    +
    +/**
    + * An interface for reporting shuffle read metrics, for each shuffle. This interface assumes
    + * all the methods are called on a single-threaded, i.e. concrete implementations would not need
    + * to synchronize.
    + *
    + * All methods have additional Spark visibility modifier to allow public, concrete implementations
    + * that still have these methods marked as private[spark].
    + */
    +private[spark] trait ShuffleReadMetricsReporter {
    --- End diff --
    
    @xuanyuanking just submitted a PR on how to use it :)


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23105: [SPARK-26140] Enable custom metrics implementation in sh...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23105
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23105: [SPARK-26140] Enable passing in a custom shuffle metrics...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23105
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5234/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23105: [SPARK-26140] Enable custom metrics implementation in sh...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23105
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23105: [SPARK-26140] Enable custom metrics implementation in sh...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/23105
  
    **[Test build #99128 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99128/testReport)** for PR 23105 at commit [`da253b5`](https://github.com/apache/spark/commit/da253b57c14bc0174f0330ae6fa5d3a61647269b).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23105: [SPARK-26140] Enable custom metrics implementation in sh...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23105
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99129/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org