Posted to common-issues@hadoop.apache.org by "Colin P. McCabe (JIRA)" <ji...@apache.org> on 2018/11/30 01:25:00 UTC

[jira] [Commented] (HADOOP-15566) Remove HTrace support

    [ https://issues.apache.org/jira/browse/HADOOP-15566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16704118#comment-16704118 ] 

Colin P. McCabe commented on HADOOP-15566:
------------------------------------------

Hi folks,

I just saw this JIRA while searching for something else.  I was one of the guys who worked on HTrace, both on the Hadoop integration side and on the HTrace project itself.  It is definitely sad that it didn't make it out of the incubator.  There is clearly a need for this kind of work in Hadoop and in other projects.

I don't have a strong opinion about which other tracing API should be used in Hadoop.  I would caution everyone that Hadoop's compatibility shackles are heavy -- very heavy indeed.  Just to give an example, a typical Hadoop installation might have HDFS, HBase, and Phoenix installed.  These projects all have separate developers, PMCs, and release cycles, but they expect to share the same CLASSPATH happily.  Projects often push back very hard on updating library dependencies, especially in "minor" releases.  On top of that, people often stay on older stable versions of Hadoop for years.

In theory, Hadoop vendors offer a snapshot of the full Hadoop stack, carefully configured so that things work together.  In practice, libraries are not always harmonized as well as we would like.  Some users want to mix and match versions of things, or don't use a vendor distribution at all.  This makes setting up end-to-end tracing pretty difficult.

There were some efforts to add better CLASSPATH isolation to Hadoop.  I haven't kept up with those, so I don't know how much this situation has improved.

I do think that the idea of keeping HTrace around as a shim API might make sense for Hadoop.  This would mean that adding support for a new version of the OpenTracing or Zipkin library would only require updating that shim code in hadoop-common, rather than trying to coordinate changes across a dozen Hadoop projects.
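To make the shim idea a bit more concrete, here is a minimal Java sketch. All of the names (TracerShim, SpanBackend, RecordingBackend, traced) are hypothetical and are NOT the real HTrace, OpenTracing, or Zipkin APIs.  The assumption is simply that HDFS, HBase, Phoenix, etc. would call only the shim, so supporting a new tracing library would mean writing one new backend adapter in hadoop-common.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical sketch: none of these names come from Hadoop, HTrace, or OpenTracing.
public class TracingShimDemo {

    // Backend-neutral interface; one adapter per tracing library would implement it.
    interface SpanBackend {
        void begin(String name);
        void end(String name);
    }

    // Stand-in for a real adapter: it just records events instead of exporting them.
    static final class RecordingBackend implements SpanBackend {
        final List<String> events = new ArrayList<>();
        private final String label;

        RecordingBackend(String label) { this.label = label; }

        public void begin(String name) { events.add(label + ":begin:" + name); }
        public void end(String name) { events.add(label + ":end:" + name); }
    }

    // The shim that downstream projects would code against.
    static final class TracerShim {
        private final SpanBackend backend;

        TracerShim(SpanBackend backend) { this.backend = backend; }

        // Runs the work inside a span; end() is called even if the work throws.
        <T> T traced(String name, Supplier<T> work) {
            backend.begin(name);
            try {
                return work.get();
            } finally {
                backend.end(name);
            }
        }
    }

    public static void main(String[] args) {
        // Swapping tracing libraries would only change which backend is constructed here.
        RecordingBackend backend = new RecordingBackend("zipkin");
        TracerShim tracer = new TracerShim(backend);
        int bytesRead = tracer.traced("readBlock", () -> 4096);
        System.out.println(bytesRead + " " + backend.events);
    }
}
```

Callers never import a tracing library directly, which is what keeps the CLASSPATH churn confined to hadoop-common.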

Also, HTrace already has code to export spans to Zipkin, if that helps.  I think it would be relatively straightforward to write the same thing for OpenTracing as well.

> Remove HTrace support
> ---------------------
>
>                 Key: HADOOP-15566
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15566
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: metrics
>    Affects Versions: 3.1.0
>            Reporter: Todd Lipcon
>            Priority: Major
>              Labels: security
>         Attachments: Screen Shot 2018-06-29 at 11.59.16 AM.png, ss-trace-s3a.png
>
>
> The HTrace incubator project has voted to retire itself and won't be making further releases. The Hadoop project currently has various hooks with HTrace. It seems in some cases (e.g., HDFS-13702) these hooks have had measurable performance overhead. Given these two factors, I think we should consider removing the HTrace integration. If there is someone willing to do the work, replacing it with OpenTracing might be a better choice since there is an active community.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
