Posted to dev@htrace.apache.org by "Colin P. McCabe" <cm...@apache.org> on 2015/03/03 04:23:06 UTC

Getting started with Apache HTrace development

A few people have asked how to get started with HTrace development.  It's a
good question and we don't have a great README up about it, so I thought I
would write something.

HTrace is all about tracing distributed systems.  So the best way to get
started is to plug htrace into your favorite distributed system and see what
cool things happen or what bugs pop up.  Since I'm an HDFS developer, that's
the distributed system that I'm most familiar with.  So I will do a quick
writeup about how to use HTrace + HDFS.  (HBase + HTrace is another very
important use-case that I would like to write about later, but one step at a
time.)

Just a quick note: a lot of this software is relatively new.  So there may be
bugs or integration pain points that you encounter.

There has not yet been a stable release of Hadoop that contained Apache HTrace.
There have been releases that contained the pre-Apache version of HTrace, but
that's no fun.  If we want to do development, we want to be able to run the
latest version of the code.  So we will have to build it ourselves.

Building HTrace is not too bad.  First we install the dependencies:

cmccabe@keter:~/> apt-get install java javac google-go leveldb-devel

If you have a different Linux distro this command will vary slightly, of
course.  On Macs, "brew" is a good option.
Next we use Maven to build the source:

 > cmccabe@keter:~/> git clone https://git-wip-us.apache.org/repos/asf/incubator-htrace.git
 > cmccabe@keter:~/> cd incubator-htrace
 > cmccabe@keter:~/> git checkout master
 > cmccabe@keter:~/> mvn install -DskipTests -Dmaven.javadoc.skip=true -Drat.skip

OK.  So htrace is built and installed to the local ~/.m2 directory.

We should see it under ~/.m2:

cmccabe@keter:~/> find ~/.m2 | grep htrace-core
...
 > /home/cmccabe/.m2/repository/org/apache/htrace/htrace-core/3.2.0-SNAPSHOT
 > /home/cmccabe/.m2/repository/org/apache/htrace/htrace-core/3.2.0-SNAPSHOT/htrace-core-3.2.0-SNAPSHOT.jar.lastUpdated
 > /home/cmccabe/.m2/repository/org/apache/htrace/htrace-core/3.2.0-SNAPSHOT/htrace-core-3.2.0-SNAPSHOT.pom.lastUpdated
...

The version you built should be 3.2.0-SNAPSHOT.

Next, we check out Hadoop:

 > cmccabe@keter:~/> git clone https://git-wip-us.apache.org/repos/asf/hadoop.git
 > cmccabe@keter:~/> cd hadoop
 > cmccabe@keter:~/> git checkout branch-2

So we are basically building a pre-release version of Hadoop 2.7, currently
known as branch-2.  We will need to modify Hadoop to use 3.2.0-SNAPSHOT rather
than the stable 3.1.0 release which it would ordinarily use in branch-2.  I
applied this diff to hadoop-project/pom.xml:

 > diff --git a/hadoop-project/pom.xml b/hadoop-project/pom.xml
 > index 569b292..5b7e466 100644
 > --- a/hadoop-project/pom.xml
 > +++ b/hadoop-project/pom.xml
 > @@ -785,7 +785,7 @@
 >        <dependency>
 >          <groupId>org.apache.htrace</groupId>
 >          <artifactId>htrace-core</artifactId>
 > -        <version>3.1.0-incubating</version>
 > +        <version>3.2.0-incubating-SNAPSHOT</version>
 >        </dependency>
 >        <dependency>
 >          <groupId>org.jdom</groupId>

Next, I built Hadoop:

cmccabe@keter:~/> mvn package -Pdist -DskipTests -Dmaven.javadoc.skip=true

You should get a package with Hadoop jars named like so:

...
./hadoop-hdfs-project/hadoop-hdfs/target/hadoop-hdfs-2.7.0-SNAPSHOT/share/hadoop/hdfs/lib/commons-codec-1.4.jar
./hadoop-hdfs-project/hadoop-hdfs/target/hadoop-hdfs-2.7.0-SNAPSHOT/share/hadoop/hdfs/lib/commons-daemon-1.0.13.jar
...

This package should also contain an htrace-3.2.0-SNAPSHOT jar.

OK, so how can we start seeing some trace spans?  The easiest way is to
configure LocalFileSpanReceiver.

Add this to your hdfs-site.xml:

 > <property>
 >   <name>hadoop.htrace.spanreceiver.classes</name>
 >   <value>org.apache.htrace.impl.LocalFileSpanReceiver</value>
 > </property>
 > <property>
 >   <name>hadoop.htrace.sampler</name>
 >   <value>AlwaysSampler</value>
 > </property>
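The sampler controls which requests get traced: AlwaysSampler traces every
request, which is handy for development but noisy for production.  Conceptually
(this is a Python sketch, not the real HTrace Java API), a sampler is just a
per-request yes/no decision:

```python
import random

# Conceptual stand-in for HTrace samplers (not the real Java API):
# a sampler answers "should this request be traced?"
class AlwaysSampler:
    def next(self):
        return True

class ProbabilitySampler:
    def __init__(self, fraction, rng=None):
        self.fraction = fraction
        self.rng = rng or random.Random()

    def next(self):
        return self.rng.random() < self.fraction

# AlwaysSampler says yes every single time:
assert all(AlwaysSampler().next() for _ in range(1000))

# A probability sampler traces only a fraction (seeded for repeatability):
prob = ProbabilitySampler(0.01, rng=random.Random(42))
sampled = sum(prob.next() for _ in range(100_000))
print(sampled)  # roughly 1,000 of 100,000 requests
```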

When you run the Hadoop daemons, you should see them writing to files named
/tmp/${PROCESS_ID} (for each different process).  If this doesn't happen, try
cranking up your log4j level to TRACE to see why the SpanReceiver could not be
created.
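As a quick sanity check that spans are actually landing in those files,
something like this sketch can count the span records in one of them.  The
one-JSON-object-per-line layout and the field names in the demo are
assumptions for illustration; check the actual output of your build:

```python
import json
import os
import tempfile

# Count span records in a LocalFileSpanReceiver-style output file.
# NOTE: the one-JSON-object-per-line layout and the demo field names
# ("s", "b", "e", "d") are assumptions, not a documented format.
def count_spans(path):
    count = 0
    with open(path) as f:
        for line in f:
            if line.strip():
                json.loads(line)  # raises if a line is not valid JSON
                count += 1
    return count

# Demo with a fake span file standing in for /tmp/<pid>:
with tempfile.NamedTemporaryFile("w", suffix=".spans", delete=False) as f:
    f.write('{"s": 1, "b": 10, "e": 20, "d": "DFSOutputStream#write"}\n')
    f.write('{"s": 2, "b": 15, "e": 18, "d": "ClientProtocol#create"}\n')
    path = f.name

n = count_spans(path)
print(n)  # 2
os.remove(path)
```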

You should see something like this in the log4j logs:

 > 13:28:33,885 TRACE SpanReceiverBuilder:94 - Created new span receiver of type org.apache.htrace.impl.LocalFileSpanReceiver
 >        at org.apache.htrace.SpanReceiverBuilder.build(SpanReceiverBuilder.java:92)
 >        at org.apache.hadoop.tracing.SpanReceiverHost.loadInstance(SpanReceiverHost.java:161)
 >        at org.apache.hadoop.tracing.SpanReceiverHost.loadSpanReceivers(SpanReceiverHost.java:147)
 >        at org.apache.hadoop.tracing.SpanReceiverHost.getInstance(SpanReceiverHost.java:82)

Running htraced is easy.  You simply run the binary:

 > cmccabe@keter:~/src/htrace> ./htrace-core/src/go/build/htraced -Dlog.level=TRACE -Ddata.store.clear

You should see messages like this:

 > cmccabe@keter:~/src/htrace> ./htrace-core/src/go/build/htraced -Dlog.level=TRACE -Ddata.store.clear
 > 2015-03-02T19:08:33-08:00 D: HTRACED_CONF_DIR=/home/cmccabe/conf
 > 2015-03-02T19:08:33-08:00 D: data.store.clear = true
 > 2015-03-02T19:08:33-08:00 D: log.level = TRACE
 > 2015-03-02T19:08:33-08:00 I: Cleared existing datastore directory /tmp/htrace1/db
 > 2015-03-02T19:08:33-08:00 D: LevelDB failed to open /tmp/htrace1/db: Invalid argument: /tmp/htrace1/db: does not exist (create_if_missing is false)
 > 2015-03-02T19:08:33-08:00 I: Created new LevelDB instance in /tmp/htrace1/db
 > 2015-03-02T19:08:33-08:00 T: Wrote layout version 2 to shard at /tmp/htrace1/db.
 > 2015-03-02T19:08:33-08:00 I: Cleared existing datastore directory /tmp/htrace2/db
 > 2015-03-02T19:08:33-08:00 D: LevelDB failed to open /tmp/htrace2/db: Invalid argument: /tmp/htrace2/db: does not exist (create_if_missing is false)
 > 2015-03-02T19:08:33-08:00 I: Created new LevelDB instance in /tmp/htrace2/db
 > 2015-03-02T19:08:33-08:00 T: Wrote layout version 2 to shard at /tmp/htrace2/db.
...

Similar to Hadoop daemons, htraced can be configured either through an XML file
named htraced-conf.xml (found in a location pointed to by HTRACED_CONF_DIR), or
by passing -Dkey=value flags on the command line.
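The precedence is the usual one: built-in defaults, then htraced-conf.xml,
then -Dkey=value flags, with later sources winning.  A Python sketch of that
resolution order (the key names are just examples):

```python
import xml.etree.ElementTree as ET

# Sketch of the resolution order described above: start from built-in
# defaults, overlay htraced-conf.xml (Hadoop-style <property> entries),
# then apply -Dkey=value command-line flags last.
def resolve_config(defaults, conf_xml=None, cli_flags=()):
    conf = dict(defaults)
    if conf_xml:
        root = ET.fromstring(conf_xml)
        for prop in root.findall("property"):
            conf[prop.findtext("name")] = prop.findtext("value")
    for flag in cli_flags:              # e.g. "-Dlog.level=TRACE"
        key, _, value = flag[2:].partition("=")
        conf[key] = value or "true"     # a bare -Dkey acts as a boolean
    return conf

conf = resolve_config(
    defaults={"log.level": "INFO"},
    conf_xml="<configuration><property><name>log.level</name>"
             "<value>DEBUG</value></property></configuration>",
    cli_flags=["-Dlog.level=TRACE", "-Ddata.store.clear"],
)
print(conf["log.level"])  # TRACE: the command line wins over the file
```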

Let's check out the htrace command.

 > cmccabe@keter:~/src/htrace> ./htrace-core/src/go/build/htrace serverInfo
 > HTraced server version 3.2.0-incubating-SNAPSHOT (5c0a712c7dd4263f5e2a88d4c61a0facab25953f)

"serverInfo" queries the htraced server via REST and gets back a response.
For help using the htrace command, we can run:

 > cmccabe@keter:~/src/htrace> ./htrace-core/src/go/build/htrace --help
 > usage: ./htrace-core/src/go/build/htrace [<flags>] <command> [<flags>] [<args> ...]
 >
 > The Apache HTrace command-line tool. This tool retrieves and modifies settings and other data on a running htraced daemon.
 >
 > If we find an htraced-conf.xml configuration file in the list of directories specified in HTRACED_CONF_DIR, we will use that configuration; otherwise, the defaults will be used.
 >
 > Flags:
 >   --help       Show help.
 >   --Dmy.key="my.value"
 >                Set configuration key 'my.key' to 'my.value'. Replace 'my.key' with any key you want to set.
 >   --addr=ADDR  Server address.
 >   --verbose    Verbose.
 >
 > Commands:
 >   help [<command>]
 >     Show help for a command.
 > ...

We can load spans into the htraced daemon from a text file using
./build/htrace loadSpans [file-path], and dump the span information using
./build/htrace dumpAll.
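To experiment with loadSpans without a live Hadoop cluster, you can generate a
small span file by hand.  The field names below are hypothetical; consult the
span format your htraced build actually accepts before loading anything real:

```python
import json

# Build a tiny file of spans to feed to "loadSpans". The field names
# here ("s" span id, "b"/"e" begin/end times, "d" description, "p"
# parent ids) are hypothetical stand-ins for the real span format.
def make_span(span_id, begin_ms, end_ms, desc, parents=()):
    return {"s": span_id, "b": begin_ms, "e": end_ms,
            "d": desc, "p": list(parents)}

spans = [
    make_span(1, 1000, 1050, "createFile"),
    make_span(2, 1005, 1020, "allocateBlock", parents=[1]),
]

# One JSON object per line:
with open("spans.json", "w") as f:
    for span in spans:
        f.write(json.dumps(span) + "\n")
```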

Now, at this point, we would like our htraced client (Hadoop) to send spans
directly to htraced, rather than dumping them to a local file.  To make this
work, we will need to put the htrace-htraced jar on the Hadoop CLASSPATH.

There is probably a better way to do it by setting HADOOP_CLASSPATH, but this
simple script just puts the jar on every part of the Hadoop CLASSPATH I could
think of where it might need to be:

 > #!/bin/bash
 >
 > # Copy the installed version of htrace-core to the correct hadoop jar locations
 > cat << EOF | xargs -n 1 cp /home/cmccabe/.m2/repository/org/apache/htrace/htrace-core/3.2.0-incubating-SNAPSHOT/htrace-core-3.2.0-incubating-SNAPSHOT.jar
 > /home/cmccabe/hadoop-install/share/hadoop/httpfs/tomcat/webapps/webhdfs/WEB-INF/lib/htrace-core-3.2.0-incubating-SNAPSHOT.jar
 > /home/cmccabe/hadoop-install/share/hadoop/hdfs/lib/htrace-core-3.2.0-incubating-SNAPSHOT.jar
 > /home/cmccabe/hadoop-install/share/hadoop/tools/lib/htrace-core-3.2.0-incubating-SNAPSHOT.jar
 > /home/cmccabe/hadoop-install/share/hadoop/common/lib/htrace-core-3.2.0-incubating-SNAPSHOT.jar
 > /home/cmccabe/hadoop-install/share/hadoop/kms/tomcat/webapps/kms/WEB-INF/lib/htrace-core-3.2.0-incubating-SNAPSHOT.jar
 > EOF
 >
 > # Copy the installed version of htrace-htraced to the correct hadoop jar locations
 > cat << EOF | xargs -n 1 cp /home/cmccabe/.m2/repository/org/apache/htrace/htrace-htraced/3.2.0-incubating-SNAPSHOT/htrace-htraced-3.2.0-incubating-SNAPSHOT.jar
 > /home/cmccabe/hadoop-install/share/hadoop/hdfs/lib/htrace-htraced-3.2.0-incubating-SNAPSHOT.jar
 > /home/cmccabe/hadoop-install/share/hadoop/tools/lib/htrace-htraced-3.2.0-incubating-SNAPSHOT.jar
 > /home/cmccabe/hadoop-install/share/hadoop/common/lib/htrace-htraced-3.2.0-incubating-SNAPSHOT.jar
 > EOF
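If you'd rather not hard-code every destination, a small script can discover
the lib directories under a Hadoop install and copy the jar into each of them.
This is a sketch under the assumption that every directory literally named
"lib" should get the jar; adjust to taste:

```python
import os
import shutil
import tempfile

# A more general take on the copy script above: walk a Hadoop install,
# find every directory literally named "lib", and drop the jar into each.
# The directory layout in the demo is a stand-in, not a real install.
def copy_jar_everywhere(jar_path, install_root):
    targets = []
    for dirpath, _dirnames, _filenames in os.walk(install_root):
        if os.path.basename(dirpath) == "lib":
            shutil.copy(jar_path, dirpath)
            targets.append(dirpath)
    return targets

# Demo against a throwaway directory tree:
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "share/hadoop/hdfs/lib"))
os.makedirs(os.path.join(root, "share/hadoop/common/lib"))
jar = os.path.join(root, "htrace-core-3.2.0-incubating-SNAPSHOT.jar")
open(jar, "w").close()

dirs = copy_jar_everywhere(jar, root)
print(len(dirs))  # 2
```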

At this point, I changed hdfs-site.xml so that
hadoop.htrace.spanreceiver.classes was set to the htraced span receiver:
 > <property>
 >   <name>hadoop.htrace.spanreceiver.classes</name>
 >   <value>org.apache.htrace.impl.HTracedRESTReceiver</value>
 > </property>
 > <property>
 >   <name>htraced.rest.url</name>
 >   <value>http://lumbergh.initech.com:9095/</value>
 > </property>

Obviously set the htraced.rest.url to the host on which you are running
htraced.
This setup should work for sending spans to htraced.

To see the web UI, point your web browser at
http://lumbergh.initech.com:9095/ (or whatever the host name is for you
where htraced is running).

I hope this helps some folks out.  Hopefully building Hadoop and massaging
the classpath is not too bad.  This install process will improve in the
future, as more projects get stable releases with HTrace. There has also
been some discussion of making docker images, which might help new
developers get started.

best,
Colin

Re: Getting started with Apache HTrace development

Posted by Stack <st...@duboce.net>.
On Mon, Mar 9, 2015 at 2:34 PM, Colin McCabe <cm...@alumni.cmu.edu> wrote:

> On Fri, Mar 6, 2015 at 9:21 AM, Stack <st...@duboce.net> wrote:
> > On Thu, Mar 5, 2015 at 5:41 PM, Lewis John Mcgibbney <
> > lewis.mcgibbney@gmail.com> wrote:
> >
> >> Is the website on CMS?
> >>
> >>
> > No sir. svnpubsub.
> >
> > We could add a page under incubator wiki or add a page here
> > http://wiki.apache.org/hadoop/htrace or ask infra for
> > http://wiki.apache.org/htrace or not use the wiki at all but add the
> setup
> > instruction as a site page (I prefer the latter; wiki rots... doc rots
> too
> > but there is more obligation to keep it current).
> >
> > St.Ack
>
> Can't we have both?  Good docs AND wiki?
>
>
No. Smile. Doc it once only.



> It just feels a little weird adding text about how to build Hadoop to
> a document inside HTrace.  They're separate projects, after all.  I
> would feel a lot more comfortable with that stuff on the wiki, like it
> is in Hadoop.
>
>
Are you actually citing Hadoop doc as an example to follow (smile)?



> I guess I don't feel super-strongly about it but I feel like there
> would be a lot more intro material if we could just throw things up on
> a wiki.  Then again, we could have a section of the project docs
> dedicated to getting started and/or integrating with other projects?
> markdown supports links as well.
>
>
Yeah. Let me make site building turnkey so not onerous adding doc.

St.Ack

Re: Getting started with Apache HTrace development

Posted by Colin McCabe <cm...@alumni.cmu.edu>.
On Fri, Mar 6, 2015 at 9:21 AM, Stack <st...@duboce.net> wrote:
> On Thu, Mar 5, 2015 at 5:41 PM, Lewis John Mcgibbney <
> lewis.mcgibbney@gmail.com> wrote:
>
>> Is the website on CMS?
>>
>>
> No sir. svnpubsub.
>
> We could add a page under incubator wiki or add a page here
> http://wiki.apache.org/hadoop/htrace or ask infra for
> http://wiki.apache.org/htrace or not use the wiki at all but add the setup
> instruction as a site page (I prefer the latter; wiki rots... doc rots too
> but there is more obligation to keep it current).
>
> St.Ack

Can't we have both?  Good docs AND wiki?

It just feels a little weird adding text about how to build Hadoop to
a document inside HTrace.  They're separate projects, after all.  I
would feel a lot more comfortable with that stuff on the wiki, like it
is in Hadoop.

I guess I don't feel super-strongly about it but I feel like there
would be a lot more intro material if we could just throw things up on
a wiki.  Then again, we could have a section of the project docs
dedicated to getting started and/or integrating with other projects?
markdown supports links as well.

Colin

Re: Getting started with Apache HTrace development

Posted by Stack <st...@duboce.net>.
On Thu, Mar 5, 2015 at 5:41 PM, Lewis John Mcgibbney <
lewis.mcgibbney@gmail.com> wrote:

> Is the website on CMS?
>
>
No sir. svnpubsub.

We could add a page under incubator wiki or add a page here
http://wiki.apache.org/hadoop/htrace or ask infra for
http://wiki.apache.org/htrace or not use the wiki at all but add the setup
instruction as a site page (I prefer the latter; wiki rots... doc rots too
but there is more obligation to keep it current).

St.Ack

Re: Getting started with Apache HTrace development

Posted by Lewis John Mcgibbney <le...@gmail.com>.
Is the website on CMS?

On Thu, Mar 5, 2015 at 5:18 PM, Colin P. McCabe <cm...@apache.org> wrote:

> Can we set up a wiki?  Stuff like this needs to be updated
> periodically and it would be nice to have something like the hadoop
> wiki.  Of course there may be some out of date stuff from time to
> time, but it's better than nothing...
>
> On Mon, Mar 2, 2015 at 8:52 PM, Lewis John Mcgibbney
> <le...@gmail.com> wrote:
> > This is dynamite and I think it would be very helpful to have it linked
> to
> > from the website.
> > Although the install and config doesn't appear too bulky, there are a
> > number of steps and this would be non trivial for someone who is not
> > familiarized with Hadoop xml based runtime configuration.
> > I'm finishing off a patch for Chukwa right now, then I will be building
> > HTrace into my Nutch 2.x search stack. My aim is to write something similar
> > for that deployment, as it would also be very helpful to see tracing for
> > Gora data stores as well.
>
> Awesome.
>
> best,
> Colin



-- 
*Lewis*

Re: Getting started with Apache HTrace development

Posted by "Colin P. McCabe" <cm...@apache.org>.
Can we set up a wiki?  Stuff like this needs to be updated
periodically and it would be nice to have something like the hadoop
wiki.  Of course there may be some out of date stuff from time to
time, but it's better than nothing...

On Mon, Mar 2, 2015 at 8:52 PM, Lewis John Mcgibbney
<le...@gmail.com> wrote:
> This is dynamite and I think it would be very helpful to have it linked to
> from the website.
> Although the install and config doesn't appear too bulky, there are a
> number of steps and this would be non trivial for someone who is not
> familiarized with Hadoop xml based runtime configuration.
> I'm finishing off a patch for Chukwa right now, then I will be building
> HTrace into my Nutch 2.x search stack. My aim is to write something similar
> for that deployment, as it would also be very helpful to see tracing for
> Gora data stores as well.

Awesome.

best,
Colin

>
> On Monday, March 2, 2015, Colin P. McCabe <cm...@apache.org> wrote:
>
>> A few people have asked how to get started with HTrace development.  It's a
>> good question and we don't have a great README up about it so I thought I
>> would
>> write something.
>>
>> HTrace is all about tracing distributed systems.  So the best way to get
>> started is to plug htrace into your favorite distributed system and see
>> what
>> cool things happen or what bugs pop up.  Since I'm an HDFS developer,
>> that's
>> the distributed system that I'm most familiar with.  So I will do a quick
>> writeup about how to use HTrace + HDFS.  (HBase + HTrace is another very
>> important use-case that I would like to write about later, but one step at
>> a
>> time.)
>>
>> Just a quick note: a lot of this software is relatively new.  So there may
>> be
>> bugs or integration pain points that you encounter.
>>
>> There has not yet been a stable release of Hadoop that contained Apache
>> HTrace.
>> There have been releases that contained the pre-Apache version of HTrace,
>> but
>> that's no fun.  If we want to do development, we want to be able to run the
>> latest version of the code.  So we will have to build it ourselves.
>>
>> Building HTrace is not too bad.  First we install the dependencies:
>>
>> cmccabe@keter:~/> apt-get install java javac google-go leveldb-devel
>>
>> If you have a different Linux distro this command will vary slightly, of
>> course.  On Macs, "brew" is a good option.
>> Next we use Maven to build the source:
>>
>>  > cmccabe@keter:~/> git clone
>> https://git-wip-us.apache.org/repos/asf/incubator-htrace.git
>>  > cmccabe@keter:~/> cd incubator-htrace
>>  > cmccabe@keter:~/> git checkout master
>>  > cmccabe@keter:~/> mvn install -DskipTests -Dmaven.javadoc.skip=true
>> -Drat.skip
>>
>> OK.  So htrace is built and installed to the local ~/.m2 directory.
>>
>> We should see it under the .m2:
>> cmccabe@keter:~/> find ~/.m2 | grep htrace-core
>> ...
>>  >
>> /home/cmccabe/.m2/repository/org/apache/htrace/htrace-core/3.2.0-SNAPSHOT
>>  >
>>
>> /home/cmccabe/.m2/repository/org/apache/htrace/htrace-core/3.2.0-SNAPSHOT/htrace-core-3.2.0-SNAPSHOT.jar.lastUpdated
>>  >
>>
>> /home/cmccabe/.m2/repository/org/apache/htrace/htrace-core/3.2.0-SNAPSHOT/htrace-core-3.2.0-SNAPSHOT.pom.lastUpdated
>> ...
>>
>> The version you built should be 3.2.0-SNAPSHOT.
>>
>> Next, we check out Hadoop:
>>
>>  > cmccabe@keter:~/> git clone
>> https://git-wip-us.apache.org/repos/asf/hadoop.git
>>  > cmccabe@keter:~/> cd hadoop
>>  > cmccabe@keter:~/> git checkout branch-2
>>
>> So we are basically building a pre-release version of Hadoop 2.7, currently
>> known as branch-2.  We will need to modify Hadoop to use 3.2.0-SNAPSHOT
>> rather
>> than the stable 3.1.0 release which it would ordinarily use in branch-2.  I
>> applied this diff to hadoop-project/pom.xml
>>
>>  > diff --git a/hadoop-project/pom.xml b/hadoop-project/pom.xml
>>  > index 569b292..5b7e466 100644
>>  > --- a/hadoop-project/pom.xml
>>  > +++ b/hadoop-project/pom.xml
>>  > @@ -785,7 +785,7 @@
>>  >        <dependency>
>>  >          <groupId>org.apache.htrace</groupId>
>>  >          <artifactId>htrace-core</artifactId>
>>  > -        <version>3.1.0-incubating</version>
>>  > +        <version>3.2.0-incubating-SNAPSHOT</version>
>>  >        </dependency>
>>  >        <dependency>
>>  >          <groupId>org.jdom</groupId>
>>
>> Next, I built Hadoop:
>>
>> cmccabe@keter:~/> mvn package -Pdist -DskipTests -Dmaven.javadoc.skip=true
>>
>> You should get a package with Hadoop jars named like so:
>>
>> ...
>>
>> ./hadoop-hdfs-project/hadoop-hdfs/target/hadoop-hdfs-2.7.0-SNAPSHOT/share/hadoop/hdfs/lib/commons-codec-1.4.jar
>>
>> ./hadoop-hdfs-project/hadoop-hdfs/target/hadoop-hdfs-2.7.0-SNAPSHOT/share/hadoop/hdfs/lib/commons-daemon-1.0.13.jar
>> ...
>>
>> This package should also contain an htrace-3.2.0-SNAPSHOT jar.
>>
>> OK, so how can we start seeing some trace spans?  The easiest way is to
>> configure LocalFileSpanReceiver.
>>
>> Add this to your hdfs-site.xml:
>>
>>  > <property>
>>  >   <name>hadoop.htrace.spanreceiver.classes</name>
>>  >   <value>org.apache.htrace.impl.LocalFileSpanReceiver</value>
>>  > </property>
>>  > <property>
>>  >   <name>hadoop.htrace.sampler</name>
>>  >   <value>AlwaysSampler</value>
>>  > </property>
>>
>> When you run the Hadoop daemons, you should see them writing to files named
>> /tmp/${PROCESS_ID} (one for each different process).  If this doesn't
>> happen, try cranking up your log4j level to TRACE to see why the
>> SpanReceiver could not be created.
>>
>> You should see something like this in the log4j logs:
>>
>>  > 13:28:33,885 TRACE SpanReceiverBuilder:94 - Created new span receiver of
>> type org.apache.htrace.impl.LocalFileSpanReceiver
>>  >        at
>> org.apache.htrace.SpanReceiverBuilder.build(SpanReceiverBuilder.java:92)
>>  >        at
>>
>> org.apache.hadoop.tracing.SpanReceiverHost.loadInstance(SpanReceiverHost.java:161)
>>  >        at
>>
>> org.apache.hadoop.tracing.SpanReceiverHost.loadSpanReceivers(SpanReceiverHost.java:147)
>>  >        at
>>
>> org.apache.hadoop.tracing.SpanReceiverHost.getInstance(SpanReceiverHost.java:82)
>>
>> Running htraced is easy.  You simply run the binary:
>>
>>  > cmccabe@keter:~/src/htrace> ./htrace-core/src/go/build/htraced
>> -Dlog.level=TRACE -Ddata.store.clear
>>
>> You should see messages like this:
>>
>>  > cmccabe@keter:~/src/htrace> ./htrace-core/src/go/build/htraced
>> -Dlog.level=TRACE -Ddata.store.clear
>>  > 2015-03-02T19:08:33-08:00 D: HTRACED_CONF_DIR=/home/cmccabe/conf
>>  > 2015-03-02T19:08:33-08:00 D: data.store.clear = true
>>  > 2015-03-02T19:08:33-08:00 D: log.level = TRACE
>>  > 2015-03-02T19:08:33-08:00 I: Cleared existing datastore directory
>> /tmp/htrace1/db
>>  > 2015-03-02T19:08:33-08:00 D: LevelDB failed to open /tmp/htrace1/db:
>> Invalid argument: /tmp/htrace1/db: does not exist (create_if_missing is
>> false)
>>  > 2015-03-02T19:08:33-08:00 I: Created new LevelDB instance in
>> /tmp/htrace1/db
>>  > 2015-03-02T19:08:33-08:00 T: Wrote layout version 2 to shard at
>> /tmp/htrace1/db.
>>  > 2015-03-02T19:08:33-08:00 I: Cleared existing datastore directory
>> /tmp/htrace2/db
>>  > 2015-03-02T19:08:33-08:00 D: LevelDB failed to open /tmp/htrace2/db:
>> Invalid argument: /tmp/htrace2/db: does not exist (create_if_missing is
>> false)
>>  > 2015-03-02T19:08:33-08:00 I: Created new LevelDB instance in
>> /tmp/htrace2/db
>>  > 2015-03-02T19:08:33-08:00 T: Wrote layout version 2 to shard at
>> /tmp/htrace2/db.
>> ...
>>
>> Similar to the Hadoop daemons, htraced can be configured either through an
>> XML file named htraced-conf.xml (found in a location pointed to by
>> HTRACED_CONF_DIR), or by passing -Dkey=value flags on the command line.
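For reference, a minimal htraced-conf.xml equivalent to the -D flags used above might look like the following.  The key names log.level and data.store.clear are taken from the log output earlier in this message; I am assuming the file follows the same Hadoop-style <configuration>/<property> layout as hdfs-site.xml, so treat this as a sketch:

```xml
<?xml version="1.0"?>
<!-- Sketch of an htraced-conf.xml matching -Dlog.level=TRACE
     -Ddata.store.clear from the command line above.  Place it in a
     directory listed in HTRACED_CONF_DIR. -->
<configuration>
  <property>
    <name>log.level</name>
    <value>TRACE</value>
  </property>
  <property>
    <name>data.store.clear</name>
    <value>true</value>
  </property>
</configuration>
```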
>>
>> Let's check out the htrace command.
>>
>>  > cmccabe@keter:~/src/htrace> ./htrace-core/src/go/build/htrace
>> serverInfo
>>  > HTraced server version 3.2.0-incubating-SNAPSHOT
>> (5c0a712c7dd4263f5e2a88d4c61a0facab25953f)
>>
>> "serverInfo" queries the htraced server via REST and gets back a response.
>> For help using the htrace command, we can run:
>>
>>  > cmccabe@keter:~/src/htrace> ./htrace-core/src/go/build/htrace --help
>>  > usage: ./htrace-core/src/go/build/htrace [<flags>] <command> [<flags>]
>> [<args> ...]
>>  >
>>  > The Apache HTrace command-line tool. This tool retrieves and modifies
>> settings and other data on a running htraced daemon.
>>  >
>>  > If we find an htraced-conf.xml configuration file in the list of
>> directories specified in HTRACED_CONF_DIR, we will use that configuration;
>> otherwise, the defaults will be used.
>>  >
>>  > Flags:
>>  >   --help       Show help.
>>  >   --Dmy.key="my.value"
>>  >                Set configuration key 'my.key' to 'my.value'. Replace
>> 'my.key' with any key you want to set.
>>  >   --addr=ADDR  Server address.
>>  >   --verbose    Verbose.
>>  >
>>  > Commands:
>>  >   help [<command>]
>>  >     Show help for a command.
>>  > ...
>>
>> We can load spans into the htraced daemon from a text file using
>> ./build/htrace loadSpans [file-path], and dump the span information using
>> ./build/htrace dumpAll.  (Note that loadSpans and dumpAll are subcommands
>> of the htrace client binary, not of the htraced daemon itself.)
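As a quick smoke test, here is a hedged sketch of what a span file for loadSpans might contain: one JSON span object per line.  The short field names ("i" = trace id, "s" = span id, "b"/"e" = begin/end timestamps, "d" = description, "r" = process id, "p" = parent span ids) are my recollection of the 3.2.0 wire format and may be off, and all values below are made up; run dumpAll against a daemon that already holds spans to confirm the exact schema before relying on this:

```json
{"i":"325b66e41da1d053","s":"0e9f2b3a4c5d6e7f","b":1425340113885,"e":1425340113901,"d":"ClientNamenodeProtocol#getFileInfo","r":"namenode/192.168.0.1:8020","p":[]}
```

Save a file of such lines as, say, sample-spans.json and load it with ./htrace-core/src/go/build/htrace loadSpans sample-spans.json.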
>>
>> Now, at this point, we would like our htraced client (Hadoop) to send spans
>> directly to htraced, rather than dumping them to a local file.
>> To make this work, we will need to put the htrace-htraced jar on the hadoop
>> CLASSPATH.
>>
>> There is probably a better way to do it by setting HADOOP_CLASSPATH, but
>> this simple script just puts the jar on every part of the Hadoop CLASSPATH
>> I could think of where it might need to be:
>>
>>  > #!/bin/bash
>>  >
>>  > # Copy the installed version of htrace-core to the correct hadoop jar
>> locations
>>  > cat << EOF | xargs -n 1 cp
>>
>> /home/cmccabe/.m2/repository/org/apache/htrace/htrace-core/3.2.0-incubating-SNAPSHOT/htrace-core-3.2.0-incubating-SNAPSHOT.jar
>>  >
>>
>> /home/cmccabe/hadoop-install/share/hadoop/httpfs/tomcat/webapps/webhdfs/WEB-INF/lib/htrace-core-3.2.0-incubating-SNAPSHOT.jar
>>  >
>>
>> /home/cmccabe/hadoop-install/share/hadoop/hdfs/lib/htrace-core-3.2.0-incubating-SNAPSHOT.jar
>>  >
>>
>> /home/cmccabe/hadoop-install/share/hadoop/tools/lib/htrace-core-3.2.0-incubating-SNAPSHOT.jar
>>  >
>>
>> /home/cmccabe/hadoop-install/share/hadoop/common/lib/htrace-core-3.2.0-incubating-SNAPSHOT.jar
>>  >
>>
>> /home/cmccabe/hadoop-install/share/hadoop/kms/tomcat/webapps/kms/WEB-INF/lib/htrace-core-3.2.0-incubating-SNAPSHOT.jar
>>  > EOF
>>  >
>>  > # Copy the installed version of htrace-htraced to the correct hadoop jar
>> locations
>>  > cat << EOF | xargs -n 1 cp
>>
>> /home/cmccabe/.m2/repository/org/apache/htrace/htrace-htraced/3.2.0-incubating-SNAPSHOT/htrace-htraced-3.2.0-incubating-SNAPSHOT.jar
>>  >
>>
>> /home/cmccabe/hadoop-install/share/hadoop/hdfs/lib/htrace-htraced-3.2.0-incubating-SNAPSHOT.jar
>>  >
>>
>> /home/cmccabe/hadoop-install/share/hadoop/tools/lib/htrace-htraced-3.2.0-incubating-SNAPSHOT.jar
>>  >
>>
>> /home/cmccabe/hadoop-install/share/hadoop/common/lib/htrace-htraced-3.2.0-incubating-SNAPSHOT.jar
>>  > EOF
>>
>> At this point, I changed hdfs-site.xml so that
>> hadoop.htrace.spanreceiver.classes was set to the htraced span receiver:
>>  > <property>
>>  >   <name>hadoop.htrace.spanreceiver.classes</name>
>>  >   <value>org.apache.htrace.impl.HTracedRESTReceiver</value>
>>  > </property>
>>  > <property>
>>  >   <name>htraced.rest.url</name>
>>  >   <value>http://lumbergh.initech.com:9095/</value>
>>  > </property>
>>
>> Obviously, set htraced.rest.url to point at the host on which you are
>> running htraced.
>> This setup should work for sending spans to htraced.
>>
>> To see the web UI, point your web browser at
>> http://lumbergh.initech.com:9095/ (or whatever the host name is for you
>> where htraced is running).
>>
>> I hope this helps some folks out.  Hopefully building Hadoop and massaging
>> the classpath is not too bad.  This install process will improve in the
>> future, as more projects get stable releases with HTrace. There has also
>> been some discussion of making docker images, which might help new
>> developers get started.
>>
>> best,
>> Colin
>>
>
>
> --
> *Lewis*

Re: Getting started with Apache HTrace development

Posted by Lewis John Mcgibbney <le...@gmail.com>.
This is dynamite and I think it would be very helpful to have it linked to
from the website.
Although the install and config doesn't appear too bulky, there are a
number of steps, and this would be non-trivial for someone who is not
familiar with Hadoop's XML-based runtime configuration.
I'm finishing off a patch for Chukwa right now, then I will be building
HTrace into my Nutch 2.x search stack. My aim is to write something similar
for that deployment, as it would also be very helpful to see tracing for
Gora data stores as well.

On Monday, March 2, 2015, Colin P. McCabe <cm...@apache.org> wrote:



-- 
*Lewis*