You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@htrace.apache.org by Mike Miller <mm...@apache.org> on 2017/04/28 17:54:31 UTC

Problematic Trace in Multithreaded Environment

Hello Apache Friends,

I am trying to track down a possible issue that Accumulo is having with a
test in our Shell code that does some tracing.  The test is
ShellServerIT.trace
<https://github.com/apache/accumulo/blob/master/test/src/main/java/org/apache/accumulo/test/ShellServerIT.java#L1612>
and
it will periodically fail looking for the trace "sendMutations" on line
1624.

I can't figure out if this is an issue with Htrace version 3.1.0-incubation
or something crazy we are doing with multithreading.  When the test is run
through Maven, the trace seems to always be unrooted:
The following spans are not rooted (probably due to a parent span of length
0ms):
    2+136    master@0.0.0.0 PopulateMetadata
    1+137    tserver@0.0.0.0 wal
    1+137    tserver@0.0.0.0 update
    4+140    master@0.0.0.0 FinishCreateTable
    1+1340   master@0.0.0.0
org.apache.accumulo.core.client.impl.TabletServerBatchWriter$MutationWriter
1
    1+1340   master@0.0.0.0 sendMutations

But when I perform the same trace through the Accumulo shell, entering the
commands manually, it seems to work OK:
    5+33230          master@0.0.0.0 close
    1+33230            master@0.0.0.0 BinMutations 1
    4+33231              master@0.0.0.0
org.apache.accumulo.core.client.impl.TabletServerBatchWriter$MutationWriter
1
    4+33231                master@0.0.0.0 sendMutations

The code that is doing the trace is in our TabletServerBatchWriter here
<https://github.com/apache/accumulo/blob/master/core/src/main/java/org/apache/accumulo/core/client/impl/TabletServerBatchWriter.java#L858>
.

I have thought of two possible reasons we are having problems...
1 - We are tracing in 2 separate thread pools and htrace gets confused:
In org/apache/accumulo/core/client/impl/TabletServerBatchWriter.java
BinMutations trace:
binningThreadPool.execute(Trace.wrap(new Runnable() {
And sendMutations trace:
sendThreadPool.submit(Trace.wrap(new SendTask(server)));

2 - We change the name of the current thread just before starting the
"sendMutations" trace:
In TabletServerBatchWriter.SendTask.send()
<https://github.com/apache/accumulo/blob/master/core/src/main/java/org/apache/accumulo/core/client/impl/TabletServerBatchWriter.java#L856>


Any ideas/insight would be greatly appreciated.  Thank you!
- Mike

Re: Problematic Trace in Multithreaded Environment

Posted by Colin McCabe <cm...@apache.org>.
On Fri, Apr 28, 2017, at 10:54, Mike Miller wrote:
> Hello Apache Friends,
> 
> I am trying to track down a possible issue that Accumulo is having with a
> test in our Shell code that does some tracing.  The test is
> ShellServerIT.trace
> <https://github.com/apache/accumulo/blob/master/test/src/main/java/org/apache/accumulo/test/ShellServerIT.java#L1612>
> and
> it will periodically fail looking for the trace "sendMutations" on line
> 1624.
> 
> I can't figure out if this is an issue with Htrace version
> 3.1.0-incubation
> or something crazy we are doing with multithreading.  When the test is
> run
> through Maven, the trace seems to always be unrooted:
> The following spans are not rooted (probably due to a parent span of
> length
> 0ms):
>     2+136    master@0.0.0.0 PopulateMetadata
>     1+137    tserver@0.0.0.0 wal
>     1+137    tserver@0.0.0.0 update
>     4+140    master@0.0.0.0 FinishCreateTable
>     1+1340   master@0.0.0.0
> org.apache.accumulo.core.client.impl.TabletServerBatchWriter$MutationWriter
> 1
>     1+1340   master@0.0.0.0 sendMutations
> 
> But when I perform the same trace through the Accumulo shell, entering
> the
> commands manually, it seems to work OK:
>     5+33230          master@0.0.0.0 close
>     1+33230            master@0.0.0.0 BinMutations 1
>     4+33231              master@0.0.0.0
> org.apache.accumulo.core.client.impl.TabletServerBatchWriter$MutationWriter
> 1
>     4+33231                master@0.0.0.0 sendMutations
> 
> The code that is doing the trace is in our TabletServerBatchWriter here
> <https://github.com/apache/accumulo/blob/master/core/src/main/java/org/apache/accumulo/core/client/impl/TabletServerBatchWriter.java#L858>
> .
> 
> I have thought of two possible reasons we are having problems...
> 1 - We are tracing in 2 separate thread pools and htrace gets confused:
> In org/apache/accumulo/core/client/impl/TabletServerBatchWriter.java
> BinMutations trace:
> binningThreadPool.execute(Trace.wrap(new Runnable() {
> And sendMutations trace:
> sendThreadPool.submit(Trace.wrap(new SendTask(server)));
> 
> 2 - We change the name of the current thread just before starting the
> "sendMutations" trace:
> In TabletServerBatchWriter.SendTask.send()
> <https://github.com/apache/accumulo/blob/master/core/src/main/java/org/apache/accumulo/core/client/impl/TabletServerBatchWriter.java#L856>
> 
> 
> Any ideas/insight would be greatly appreciated.  Thank you!
> - Mike

Hi Mike,

Changes in thread name will not be an issue, because HTrace uses
thread-local data to maintain its internal state.  Similarly, parent
spans with length 0, although not great to look at, still do count as
parents.

I think what trips up a lot of people tracing with thread pools is
failing to close traces.  Take a look and make sure you are not
experiencing this too.  In later versions of HTrace, we added warnings
for this kind of thing when we detected it.  Another thing to look at
would be which actual threads are performing which operations -- i.e. if
there are "handoffs" between threads, do the spans get detached and
reattached too?

best,
Colin