Posted to reviews@spark.apache.org by devaraj-kavali <gi...@git.apache.org> on 2018/04/14 00:13:29 UTC

[GitHub] spark pull request #21071: [SPARK-21962][CORE] Distributed Tracing in Spark

GitHub user devaraj-kavali opened a pull request:

    https://github.com/apache/spark/pull/21071

    [SPARK-21962][CORE] Distributed Tracing in Spark

    ## What changes were proposed in this pull request?
    
    This PR integrates Spark with HTrace: it sends traces for the application and its tasks when span receivers are configured. HTrace settings can be supplied alongside the Spark configuration by adding the prefix `spark.htrace.` to the HTrace configuration names, as below:
    
    `spark.htrace.span.receiver.classes`	org.apache.htrace.core.LocalFileSpanReceiver;org.apache.htrace.impl.HTracedSpanReceiver;org.apache.htrace.impl.ZipkinSpanReceiver
    `spark.htrace.htraced.receiver.address`	IP:PORT
    `spark.htrace.local.file.span.receiver.path`	/path/local-span-file
    `spark.htrace.sampler.classes`	org.apache.htrace.core.AlwaysSampler
    
    It also adds a configuration, `spark.app.spanId`, for passing in a parent span. If `spark.app.spanId` is set, its value is used as the parent span; otherwise a new span is started for each application.
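
The prefix convention described above can be sketched as follows. This is a minimal illustration, not the code from the PR: the object name `HTraceConfSketch` and both helper methods are hypothetical, and a plain `Map[String, String]` stands in for the Spark configuration so the sketch has no Spark or HTrace dependency.

```scala
// Hypothetical helper, not code from the PR.
object HTraceConfSketch {
  val Prefix = "spark.htrace."
  val ParentSpanKey = "spark.app.spanId"

  // Keep only the settings whose key starts with "spark.htrace." and strip
  // that prefix, leaving the plain HTrace configuration names.
  def htraceSettings(sparkSettings: Map[String, String]): Map[String, String] =
    sparkSettings.collect {
      case (key, value) if key.startsWith(Prefix) => key.stripPrefix(Prefix) -> value
    }

  // Some(id) when a parent span id was passed in; None when it is absent or
  // blank, in which case a new span would be started for the application.
  def parentSpanId(sparkSettings: Map[String, String]): Option[String] =
    sparkSettings.get(ParentSpanKey).map(_.trim).filter(_.nonEmpty)
}
```

Under these assumptions, a setting such as `spark.htrace.sampler.classes` would reach HTrace as `sampler.classes`, while unrelated keys such as `spark.app.name` are dropped from the HTrace view.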
    
    ## How was this patch tested?
    
    I ran the existing tests together with the newly added test, and also verified manually in each of the deployment modes below, with the different tracers configured individually and together.
    
    1. Local and local-cluster
    2. Standalone Client and Cluster modes
    3. Yarn Client and Cluster modes
    4. Mesos Client and Cluster modes

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/devaraj-kavali/spark SPARK-21962

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21071.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21071
    
----
commit 254e4ed38411d45cc8c2ba8cdace069da219c359
Author: Devaraj K <de...@...>
Date:   2018-04-14T00:06:36Z

    [SPARK-21962][CORE] Distributed Tracing in Spark

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21071: [SPARK-21962][CORE] Distributed Tracing in Spark

Posted by steveloughran <gi...@git.apache.org>.
Github user steveloughran commented on the issue:

    https://github.com/apache/spark/pull/21071
  
    + @rdblue 


---



[GitHub] spark issue #21071: [SPARK-21962][CORE] Distributed Tracing in Spark

Posted by rdblue <gi...@git.apache.org>.
Github user rdblue commented on the issue:

    https://github.com/apache/spark/pull/21071
  
    Some metrics to convince ourselves that using the null scope has no performance impact would be great.


---



[GitHub] spark issue #21071: [SPARK-21962][CORE] Distributed Tracing in Spark

Posted by rdblue <gi...@git.apache.org>.
Github user rdblue commented on the issue:

    https://github.com/apache/spark/pull/21071
  
    @devaraj-kavali, do you have any measurements to quantify how this impacts overall performance? We would want to know before releasing this for use because using HTrace means having it on all the time to be able to analyze slow-downs. Like @steveloughran suggests, it should also be optional (unless there is no measurable cost to having it on all the time).


---



[GitHub] spark issue #21071: [SPARK-21962][CORE] Distributed Tracing in Spark

Posted by devaraj-kavali <gi...@git.apache.org>.
Github user devaraj-kavali commented on the issue:

    https://github.com/apache/spark/pull/21071
  
    @gatorsmile we need this for K8S as well; I will include it in the SPIP.


---



[GitHub] spark issue #21071: [SPARK-21962][CORE] Distributed Tracing in Spark

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21071
  
    Can one of the admins verify this patch?


---



[GitHub] spark issue #21071: [SPARK-21962][CORE] Distributed Tracing in Spark

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the issue:

    https://github.com/apache/spark/pull/21071
  
    This probably deserves its own SPIP. Also unclear whether we should just support htrace, or have an extension api so users can plug in whatever they want.



---



[GitHub] spark issue #21071: [SPARK-21962][CORE] Distributed Tracing in Spark

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on the issue:

    https://github.com/apache/spark/pull/21071
  
    yap... HTrace is [retired](http://mail-archives.apache.org/mod_mbox/htrace-dev/201804.mbox/%3Cpony-b7497055821405926d63668ab1112e0f108e2346-2561e81afc434e2d237bbeb5b5921941503445e4%40dev.htrace.apache.org%3E).


---



[GitHub] spark pull request #21071: [SPARK-21962][CORE] Distributed Tracing in Spark

Posted by devaraj-kavali <gi...@git.apache.org>.
Github user devaraj-kavali closed the pull request at:

    https://github.com/apache/spark/pull/21071


---



[GitHub] spark issue #21071: [SPARK-21962][CORE] Distributed Tracing in Spark

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the issue:

    https://github.com/apache/spark/pull/21071
  
    @devaraj-kavali can you close this PR first?
    
    Looks like there isn't any reason to really use htrace anymore ...



---



[GitHub] spark pull request #21071: [SPARK-21962][CORE] Distributed Tracing in Spark

Posted by steveloughran <gi...@git.apache.org>.
Github user steveloughran commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21071#discussion_r181810726
  
    --- Diff: core/src/main/scala/org/apache/spark/trace/SparkAppTracer.scala ---
    @@ -0,0 +1,41 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.trace
    +
    +import org.apache.htrace.core.{HTraceConfiguration, Tracer}
    +
    +import org.apache.spark.SparkConf
    +
    +object SparkAppTracer {
    --- End diff --
    
    best to mark as `private[spark]`


---



[GitHub] spark issue #21071: [SPARK-21962][CORE] Distributed Tracing in Spark

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/21071
  
    @devaraj-kavali How about K8S?


---



[GitHub] spark issue #21071: [SPARK-21962][CORE] Distributed Tracing in Spark

Posted by markhamstra <gi...@git.apache.org>.
Github user markhamstra commented on the issue:

    https://github.com/apache/spark/pull/21071
  
    @rxin +1 for each of your sentences.


---



[GitHub] spark issue #21071: [SPARK-21962][CORE] Distributed Tracing in Spark

Posted by steveloughran <gi...@git.apache.org>.
Github user steveloughran commented on the issue:

    https://github.com/apache/spark/pull/21071
  
    I like this, but you'll need people with authority to trigger the builds and reviews.
    
    A discussion was kicked off last week on the ASF incubator about HTrace having been dormant for a while and whether it should be retired. I think I and others would be happy to pull it into hadoop-core itself, given there are dependencies on it in HDFS and other places. That would be as an API for tracing that feeds into things like Zipkin; nothing grand. There's a risk that the Spark project will hold back on this until that has been clarified.
    
    Minor code issues
    * this turns HTrace on always; do you think it should be optional?
    * needs some tests to handle malformed span-ids coming in?
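
A test for malformed span-ids needs some notion of what a well-formed id looks like. The sketch below assumes a span id is exactly 32 hexadecimal characters (128 bits), which is my understanding of the HTrace 4 string form but should be treated as an assumption here. `SpanIdSketch` and `parse` are hypothetical names, not part of HTrace or of this PR.

```scala
// Hypothetical validator, not the HTrace SpanId class.
object SpanIdSketch {
  // Assumption: a well-formed span id is exactly 32 hexadecimal characters.
  private val HexDigits: Set[Char] = "0123456789abcdefABCDEF".toSet

  // Returns the (high, low) 64-bit halves of the id, or None when the input
  // is null, has the wrong length, or contains a non-hexadecimal character.
  def parse(raw: String): Option[(Long, Long)] =
    Option(raw)
      .map(_.trim)
      .filter(s => s.length == 32 && s.forall(HexDigits))
      .map { s =>
        (java.lang.Long.parseUnsignedLong(s.substring(0, 16), 16),
         java.lang.Long.parseUnsignedLong(s.substring(16), 16))
      }
}
```

Returning `None` instead of throwing lets the caller fall back to starting a fresh span when the incoming `spark.app.spanId` value cannot be parsed; whether that fallback or a hard failure is the right behavior is a design question for the PR.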


---





[GitHub] spark issue #21071: [SPARK-21962][CORE] Distributed Tracing in Spark

Posted by devaraj-kavali <gi...@git.apache.org>.
Github user devaraj-kavali commented on the issue:

    https://github.com/apache/spark/pull/21071
  
    Thanks @rxin and @markhamstra for your comments. I will come up with an SPIP design draft and start the discussion.


---



[GitHub] spark issue #21071: [SPARK-21962][CORE] Distributed Tracing in Spark

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/21071
  
    cc @jiangxb1987 @JoshRosen 


---



[GitHub] spark issue #21071: [SPARK-21962][CORE] Distributed Tracing in Spark

Posted by devaraj-kavali <gi...@git.apache.org>.
Github user devaraj-kavali commented on the issue:

    https://github.com/apache/spark/pull/21071
  
    Thanks @steveloughran and @rdblue for looking into this.
    
    bq. this turns HTrace on always; do you think it should be optional
    When no SpanReceivers or Samplers are configured, it operates on a NullScope, which does nothing, so I think it would be OK; otherwise we can add a configuration to enable or disable it.
    
    bq. needs some tests to handle malformed span-ids coming in?
    sure, I will add more tests to handle these scenarios.
    
    @rdblue, 
    I understand it should be optimal. I verified with a few tests and don't see any measurable impact from these changes. I can measure the performance with more tests and post the results if we all agree on this approach.
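
The no-op behavior discussed above can be illustrated with a small sketch of the general pattern. The types below (`Scope`, `NoopScope`, `RecordingScope`, `SketchTracer`) are illustrative stand-ins and are not the HTrace classes; plain `String => Unit` functions stand in for span receivers. The sketch shows the shape of the idea only and says nothing about the actual overhead, which would still need to be measured.

```scala
// Illustrative types only; these are NOT the HTrace classes.
trait Scope extends AutoCloseable {
  def close(): Unit
}

// Shared no-op scope: opening and closing it records nothing.
object NoopScope extends Scope {
  override def close(): Unit = ()
}

// A scope that reports its description and elapsed time to each sink on close.
final class RecordingScope(description: String, sinks: Seq[String => Unit]) extends Scope {
  private val startNanos = System.nanoTime()

  override def close(): Unit = {
    val elapsedMicros = (System.nanoTime() - startNanos) / 1000
    sinks.foreach(sink => sink(s"$description took $elapsedMicros us"))
  }
}

final class SketchTracer(sinks: Seq[String => Unit]) {
  // With no sinks configured, hand back the shared no-op scope, so the
  // untraced path allocates nothing and never reads the clock.
  def newScope(description: String): Scope =
    if (sinks.isEmpty) NoopScope else new RecordingScope(description, sinks)
}
```

In this pattern the cost of leaving tracing wired in but unconfigured is one emptiness check per scope, which is why an explicit enable/disable switch may or may not be worth adding.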


---
