You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@accumulo.apache.org by milleruntime <gi...@git.apache.org> on 2017/04/28 14:22:09 UTC

[GitHub] accumulo pull request #255: ACCUMULO-4365: Fix trace test in ShellServerIT

GitHub user milleruntime opened a pull request:

    https://github.com/apache/accumulo/pull/255

    ACCUMULO-4365: Fix trace test in ShellServerIT

    This test has been frustrating for some time now... After spending some time looking into a potential bug (possibly one similar to ACCUMULO-4191) I don't think a bug actually exists.
    
    This test looked for "sendMutations" traces generated from the insert command but ACCUMULO-1755 changed the way mutations were handled in TabletServerBatchWriter. The multithreaded nature of mutations are now too unpredictable and time sensitive to reliably look for "sendMutations" traces here. 
    
    Aside from ignoring the test, this is the best solution I can come up with...

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/milleruntime/accumulo ACCUMULO-4365

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/accumulo/pull/255.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #255
    
----
commit 18f99f38ecfc54c2b062a8f9c08477fc65f91a11
Author: Mike Miller <mm...@apache.org>
Date:   2017-04-28T14:12:12Z

    ACCUMULO-4365: Fix trace test in ShellServerIT

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo issue #255: ACCUMULO-4365: Fix trace test in ShellServerIT

Posted by milleruntime <gi...@git.apache.org>.
Github user milleruntime commented on the issue:

    https://github.com/apache/accumulo/pull/255
  
    Thanks @billierinaldi good idea with the tracer.span.min.ms parameter.  I will give a try see what we get!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo issue #255: ACCUMULO-4365: Fix trace test in ShellServerIT

Posted by milleruntime <gi...@git.apache.org>.
Github user milleruntime commented on the issue:

    https://github.com/apache/accumulo/pull/255
  
    Shell trace output:
    <pre>
    Trace started at 2017/04/28 10:49:35.021
    Time  Start  Service@Location       Name
    25863+0      shell@canx2um-78286w.infosec.tycho.ncsc.mil shell:root
        1+2        tserver@0.0.0.0 listLocalUsers
        3+4499     master@0.0.0.0 beginFateOperation
        1+4508     shell@canx2um-78286w.infosec.tycho.ncsc.mil client:executeFateOperation
        4+4512     master@0.0.0.0 executeFateOperation
        3+4517       master@0.0.0.0 CreateTable
        1+4520       master@0.0.0.0 CreateTable
       13+4525         master@0.0.0.0 SetupPermissions
        3+4541           master@0.0.0.0 PopulateZookeeper
       11+4544           master@0.0.0.0 PopulateZookeeper
        3+4559             master@0.0.0.0 ChooseDir
       13+4565               master@0.0.0.0 CreateDir
       12+4566                 master@0.0.0.0 mkdir
        5+4571                   master@0.0.0.0 ClientNamenodeProtocol#mkdirs
        3+4581                 master@0.0.0.0 PopulateMetadata
        1+4582                   tserver@0.0.0.0 update
        1+4582                     tserver@0.0.0.0 wal
        4+4586                   master@0.0.0.0 FinishCreateTable
        1+4518     shell@canx2um-78286w.infosec.tycho.ncsc.mil client:waitForFateOperation
       88+4519     master@0.0.0.0 waitForFateOperation
        1+4610     shell@canx2um-78286w.infosec.tycho.ncsc.mil client:finishFateOperation
        3+4611     master@0.0.0.0 finishFateOperation
        1+4623     tserver@0.0.0.0 listLocalUsers
       56+8014     shell@canx2um-78286w.infosec.tycho.ncsc.mil close
       46+8015       shell@canx2um-78286w.infosec.tycho.ncsc.mil BinMutations 1
       44+8016         shell@canx2um-78286w.infosec.tycho.ncsc.mil binMutations
        2+8043           shell@canx2um-78286w.infosec.tycho.ncsc.mil client:startScan
        1+8046           tserver@0.0.0.0 startScan
        1+8058           tserver@0.0.0.0 startScan
        1+8058             tserver@0.0.0.0 metadata tablets read ahead 3
        9+8061         shell@canx2um-78286w.infosec.tycho.ncsc.mil org.apache.accumulo.core.client.impl.TabletServerBatchWriter$MutationWriter 1
        9+8061           shell@canx2um-78286w.infosec.tycho.ncsc.mil sendMutations
        1+8066             shell@canx2um-78286w.infosec.tycho.ncsc.mil client:update
        3+8067             tserver@0.0.0.0 update
        1+8068               tserver@0.0.0.0 wal
        1+8069               tserver@0.0.0.0 commit
        1+8071     tserver@0.0.0.0 listLocalUsers
        4+11506    tserver@0.0.0.0 getTableConfiguration
        4+11513    tserver@0.0.0.0 getTableConfiguration
        4+11519    tserver@0.0.0.0 getTableConfiguration
        4+11524    tserver@0.0.0.0 getTableConfiguration
        5+11540    shell@canx2um-78286w.infosec.tycho.ncsc.mil scan
        5+11540      shell@canx2um-78286w.infosec.tycho.ncsc.mil scan:location
        2+11542        tserver@0.0.0.0 startScan
        2+11542          tserver@0.0.0.0 tablet read ahead 1
        3+17872    master@0.0.0.0 beginFateOperation
        4+17876    master@0.0.0.0 executeFateOperation
        3+17881      master@0.0.0.0 DeleteTable
        3+17884      master@0.0.0.0 DeleteTable
        5+17999        master@0.0.0.0 CleanUp
        3+18001          master@0.0.0.0 scan
        3+18001            master@0.0.0.0 scan:location
        2+18002              tserver@0.0.0.0 startScan
        2+18002                tserver@0.0.0.0 metadata tablets read ahead 8
        4+18058        master@0.0.0.0 CleanUp
        3+18059          master@0.0.0.0 scan
        3+18059            master@0.0.0.0 scan:location
        1+18059              master@0.0.0.0 client:startScan
        1+18060              tserver@0.0.0.0 startScan
        1+18060                tserver@0.0.0.0 metadata tablets read ahead 1
       40+18062        master@0.0.0.0 CleanUp
        3+18063          master@0.0.0.0 batch scanner 25- 1
        1+18064            tserver@0.0.0.0 startMultiScan
        1+18064              tserver@0.0.0.0 metadata tablets read ahead 2
        2+18067          master@0.0.0.0 scan
        2+18067            master@0.0.0.0 scan:location
        1+18067              tserver@0.0.0.0 startScan
        1+18067                tserver@0.0.0.0 metadata tablets read ahead 3
        4+18069          master@0.0.0.0 close
        2+18076          master@0.0.0.0 scan
        2+18076            master@0.0.0.0 scan:location
        1+18077              tserver@0.0.0.0 startScan
        1+18077                tserver@0.0.0.0 metadata tablets read ahead 4
        5+18079          master@0.0.0.0 delete
        3+18080            master@0.0.0.0 ClientNamenodeProtocol#delete
      229+17880    master@0.0.0.0 waitForFateOperation
        2+18110    master@0.0.0.0 finishFateOperation
        1+23194    tserver@0.0.0.0 listLocalUsers
    The following spans are not rooted (probably due to a parent span of length 0ms):
        4+18069  master@0.0.0.0 org.apache.accumulo.core.client.impl.TabletServerBatchWriter$MutationWriter 1
        4+18069  master@0.0.0.0 sendMutations
        3+18070  tserver@0.0.0.0 wal
        3+18070  tserver@0.0.0.0 update
    </pre>


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo issue #255: ACCUMULO-4365: Fix trace test in ShellServerIT

Posted by milleruntime <gi...@git.apache.org>.
Github user milleruntime commented on the issue:

    https://github.com/apache/accumulo/pull/255
  
    FAILURE running with Maven:
    <pre>
    Trace started at 2017/04/28 11:06:53.041
    Time  Start  Service@Location       Name
     2251+0      shell@canx2um-78286w.infosec.tycho.ncsc.mil shell:root
        2+1        master@0.0.0.0 beginFateOperation
        2+3        master@0.0.0.0 executeFateOperation
        1+6          master@0.0.0.0 CreateTable
        1+7          master@0.0.0.0 CreateTable
        5+9            master@0.0.0.0 SetupPermissions
        1+16             master@0.0.0.0 PopulateZookeeper
        5+17             master@0.0.0.0 PopulateZookeeper
        2+24               master@0.0.0.0 ChooseDir
       37+6        master@0.0.0.0 waitForFateOperation
        2+43       master@0.0.0.0 finishFateOperation
     1009+49       shell@canx2um-78286w.infosec.tycho.ncsc.mil close
        5+49         shell@canx2um-78286w.infosec.tycho.ncsc.mil BinMutations 1
        2+49           shell@canx2um-78286w.infosec.tycho.ncsc.mil binMutations
        1+50             tserver@0.0.0.0 startScan
        1+50               tserver@0.0.0.0 metadata tablets read ahead 5
        2+1058     tserver@0.0.0.0 getTableConfiguration
        2+1061     tserver@0.0.0.0 getTableConfiguration
        2+1064     tserver@0.0.0.0 getTableConfiguration
        3+1066     tserver@0.0.0.0 getTableConfiguration
        2+1100     shell@canx2um-78286w.infosec.tycho.ncsc.mil scan
        2+1100       shell@canx2um-78286w.infosec.tycho.ncsc.mil scan:location
        2+1100         tserver@0.0.0.0 startScan
        1+1101           tserver@0.0.0.0 tablet read ahead 8
        2+1103     master@0.0.0.0 beginFateOperation
        2+1105     master@0.0.0.0 executeFateOperation
        2+1108       master@0.0.0.0 DeleteTable
        1+1110       master@0.0.0.0 DeleteTable
        2+1221         master@0.0.0.0 CleanUp
        1+1222           master@0.0.0.0 scan
        1+1222             master@0.0.0.0 scan:location
        1+1222               tserver@0.0.0.0 startScan
        1+1222                 tserver@0.0.0.0 metadata tablets read ahead 3
       19+1223         master@0.0.0.0 CleanUp 
        1+1223           master@0.0.0.0 batch scanner 525- 1
        1+1225           master@0.0.0.0 close
        1+1226           master@0.0.0.0 scan
        1+1226             master@0.0.0.0 scan:location
      138+1107     master@0.0.0.0 waitForFateOperation
        1+1246     master@0.0.0.0 finishFateOperation
    The following spans are not rooted (probably due to a parent span of length 0ms):
        1+29     master@0.0.0.0 PopulateMetadata
        1+29     tserver@0.0.0.0 prep
        1+29     tserver@0.0.0.0 update
        3+32     master@0.0.0.0 FinishCreateTable
    </pre>


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo issue #255: ACCUMULO-4365: Fix trace test in ShellServerIT

Posted by ctubbsii <gi...@git.apache.org>.
Github user ctubbsii commented on the issue:

    https://github.com/apache/accumulo/pull/255
  
    Don't forget to close the JIRA if this is done.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo issue #255: ACCUMULO-4365: Fix trace test in ShellServerIT

Posted by ctubbsii <gi...@git.apache.org>.
Github user ctubbsii commented on the issue:

    https://github.com/apache/accumulo/pull/255
  
    The reason this is borked due to the thread pools is probably due to the fact that the parent span id is stored in a thread local variable. If we can create spans by explicitly passing the parent (constructor or static method), that would probably fix the issue. The more complicated fix is to use a trace-aware threadpool or executor service.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo issue #255: ACCUMULO-4365: Fix trace test in ShellServerIT

Posted by milleruntime <gi...@git.apache.org>.
Github user milleruntime commented on the issue:

    https://github.com/apache/accumulo/pull/255
  
    Created a quick attempt at a static parent span to pass to the threads in 7444888ff8582bc31d0c61444103cc4dc9ff362a
    
    Doesn't seem to work though.... whomp whomp


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo issue #255: ACCUMULO-4365: Fix trace test in ShellServerIT

Posted by ctubbsii <gi...@git.apache.org>.
Github user ctubbsii commented on the issue:

    https://github.com/apache/accumulo/pull/255
  
    Also, GitHub appears broke right now... so if you see the same comment repeated 4x sometime later, it's because I tried to submit it as a review comment a few times first, but it didn't work.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo issue #255: ACCUMULO-4365: Fix trace test in ShellServerIT

Posted by joshelser <gi...@git.apache.org>.
Github user joshelser commented on the issue:

    https://github.com/apache/accumulo/pull/255
  
    > Could it be because we have the traces in 2 separate thread pools and htrace gets confused:
    
    > Or because we change the name of the thread (wtf??) in TabletServerBatchWriter.SendTask.send() 
    
    What little I knew about HTrace and it's implementation is lost in the wind, but both seem plausible. May require some new spelunking into their codebase (heck, I don't even know what version we're using of theirs anymore).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo issue #255: ACCUMULO-4365: Fix trace test in ShellServerIT

Posted by milleruntime <gi...@git.apache.org>.
Github user milleruntime commented on the issue:

    https://github.com/apache/accumulo/pull/255
  
    Merged into 1.7, 1.8 and master


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo issue #255: ACCUMULO-4365: Fix trace test in ShellServerIT

Posted by joshelser <gi...@git.apache.org>.
Github user joshelser commented on the issue:

    https://github.com/apache/accumulo/pull/255
  
    Aha!
    
    > The following spans are not rooted (probably due to a parent span of length 0ms):
    
    I don't think this test is even validating the correct thing. That span is supposed to be under..
    
    ```
     1010+168      shell@canx2um-78286w.infosec.tycho.ncsc.mil close
        2+169        shell@canx2um-78286w.infosec.tycho.ncsc.mil BinMutations 1
        2+169          shell@canx2um-78286w.infosec.tycho.ncsc.mil binMutations
    ```
    
    The test is really just verifying that we non-rooted spans are still logged (which is definitely not the intent). It seems more like ACCUMULO-1755 broke the span hierarchy in the TSBW.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo issue #255: ACCUMULO-4365: Fix trace test in ShellServerIT

Posted by milleruntime <gi...@git.apache.org>.
Github user milleruntime commented on the issue:

    https://github.com/apache/accumulo/pull/255
  
    > If we can create spans by explicitly passing the parent (constructor or static method)
    
    I'm not quite sure exactly what you mean... instead of this:
    `Span span = Trace.start("sendMutations");
    ...
    span.stop();`
    We would have something like:
    `Span child = Span.trace(parent, "sendMutations"); 
    ...
    child.stop() `
    Wouldn't the child still be local?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo issue #255: ACCUMULO-4365: Fix trace test in ShellServerIT

Posted by milleruntime <gi...@git.apache.org>.
Github user milleruntime commented on the issue:

    https://github.com/apache/accumulo/pull/255
  
    I think we got it!  Looking good with 22c9481
    
    I will run the IT a few more times just to make sure this is squashed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request #255: ACCUMULO-4365: Fix trace test in ShellServerIT

Posted by milleruntime <gi...@git.apache.org>.
Github user milleruntime closed the pull request at:

    https://github.com/apache/accumulo/pull/255


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo issue #255: ACCUMULO-4365: Fix trace test in ShellServerIT

Posted by milleruntime <gi...@git.apache.org>.
Github user milleruntime commented on the issue:

    https://github.com/apache/accumulo/pull/255
  
    Now why is it rooted correctly when run manually in the shell????
    <pre>
    shell@canx2um-78286w.infosec.tycho.ncsc.mil close
       46+8015       shell@canx2um-78286w.infosec.tycho.ncsc.mil BinMutations 1
       44+8016         shell@canx2um-78286w.infosec.tycho.ncsc.mil binMutations
        2+8043           shell@canx2um-78286w.infosec.tycho.ncsc.mil client:startScan
        1+8046           tserver@0.0.0.0 startScan
        1+8058           tserver@0.0.0.0 startScan
        1+8058             tserver@0.0.0.0 metadata tablets read ahead 3
        9+8061         shell@canx2um-78286w.infosec.tycho.ncsc.mil org.apache.accumulo.core.client.impl.TabletServerBatchWriter$MutationWriter 1
        9+8061           shell@canx2um-78286w.infosec.tycho.ncsc.mil sendMutations
    </pre>


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo issue #255: ACCUMULO-4365: Fix trace test in ShellServerIT

Posted by joshelser <gi...@git.apache.org>.
Github user joshelser commented on the issue:

    https://github.com/apache/accumulo/pull/255
  
    > Now why is it rooted correctly when run manually in the shell????
    
    So, the hierarchy definitely seems goofed to me. MutationWriter+sendMutations are contained beneath BinMutations (i'd expected MutationWriter to be at the same level as BinMutations, beneath `close`). Maybe BinMutations should just be some "BinAndWrite" span (guessing that's the new thread which was introduced).



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo issue #255: ACCUMULO-4365: Fix trace test in ShellServerIT

Posted by billierinaldi <gi...@git.apache.org>.
Github user billierinaldi commented on the issue:

    https://github.com/apache/accumulo/pull/255
  
    I figure there are two possibilities, 1) the sendMutations span isn't always created, which would be bad, or 2) the span is created, but it's dropped sometimes because it's of length 0ms. It could be the latter, since I see sendMutations was 1ms for one of the sample traces posted. We could eliminate the second possibility by configuring the span receiver not to drop 0ms spans for this test. This can be done by setting tracer.span.min.ms to 0 (default value is 1) in the accumulo conf.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo issue #255: ACCUMULO-4365: Fix trace test in ShellServerIT

Posted by milleruntime <gi...@git.apache.org>.
Github user milleruntime commented on the issue:

    https://github.com/apache/accumulo/pull/255
  
    Could it be because we have the traces in 2 separate thread pools and htrace gets confused:
    In org/apache/accumulo/core/client/impl/TabletServerBatchWriter.java
    BinMutations:
    `binningThreadPool.execute(Trace.wrap(new Runnable() {`
    And sendMutations:
    `sendThreadPool.submit(Trace.wrap(new SendTask(server)));`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo issue #255: ACCUMULO-4365: Fix trace test in ShellServerIT

Posted by milleruntime <gi...@git.apache.org>.
Github user milleruntime commented on the issue:

    https://github.com/apache/accumulo/pull/255
  
    Or because we change the name of the thread (wtf??) in TabletServerBatchWriter.SendTask.send() https://github.com/apache/accumulo/blob/master/core/src/main/java/org/apache/accumulo/core/client/impl/TabletServerBatchWriter.java#L856
    before starting the "sendMutations" trace??


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo issue #255: ACCUMULO-4365: Fix trace test in ShellServerIT

Posted by joshelser <gi...@git.apache.org>.
Github user joshelser commented on the issue:

    https://github.com/apache/accumulo/pull/255
  
    > The multithreaded nature of mutations are now too unpredictable and time sensitive to reliably look for "sendMutations" traces here.
    
    Do you have more specifics as to why this is?
    
    ACCUMULO-1755 added that single-thread that will bin and then send mutations. It seems odd that we would lose the sendMutations span (as that code hasn't really changed, just how it gets invoked).
    
    Do you have the actual trace hierarchy handy from a test failure? Maybe it would ring a bell for @billierinaldi. I recall debugging weird tracing with her in a starbucks once upon a time.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo issue #255: ACCUMULO-4365: Fix trace test in ShellServerIT

Posted by ctubbsii <gi...@git.apache.org>.
Github user ctubbsii commented on the issue:

    https://github.com/apache/accumulo/pull/255
  
    Instead of long javadoc, could just leave old behavior in, commented-out with comment line above it explaining that the span doesn't always appear (isn't always rooted properly).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo issue #255: ACCUMULO-4365: Fix trace test in ShellServerIT

Posted by milleruntime <gi...@git.apache.org>.
Github user milleruntime commented on the issue:

    https://github.com/apache/accumulo/pull/255
  
    With the span min set to 0, I am consistently seeing the Test pass and the output:
    <pre>    0+2268               master@0.0.0.0 org.apache.accumulo.core.client.impl.TabletServerBatchWriter$MutationWriter 1
        0+2268                 master@0.0.0.0 sendMutations</pre>
    
    The span taking <1ms would explain why this test was previously failing a lot on my machine


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo issue #255: ACCUMULO-4365: Fix trace test in ShellServerIT

Posted by milleruntime <gi...@git.apache.org>.
Github user milleruntime commented on the issue:

    https://github.com/apache/accumulo/pull/255
  
    > Do you have more specifics as to why this is?
    
    No, only experience with getting different traces and your and Shawn Walkers comments on [PR94](https://github.com/apache/accumulo/pull/94)
    
    > ACCUMULO-1755 added that single-thread that will bin and then send mutations. It seems odd that we would lose the sendMutations span (as that code hasn't really changed, just how it gets invoked).
    
    I don't think sendMutations get lost, just that its a race condition we don't have control over.  Running the trace commands in the shell works fine. And you guys seemed to put a lot of design into ACCUMULO-1755.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo issue #255: ACCUMULO-4365: Fix trace test in ShellServerIT

Posted by milleruntime <gi...@git.apache.org>.
Github user milleruntime commented on the issue:

    https://github.com/apache/accumulo/pull/255
  
    SUCCESS running with Maven:
    <pre>
    Trace started at 2017/04/28 11:01:58.382
    Time  Start  Service@Location       Name
     2375+0      shell@canx2um-78286w.infosec.tycho.ncsc.mil shell:root
        2+101      master@0.0.0.0 beginFateOperation
        2+107      master@0.0.0.0 executeFateOperation
        2+110        master@0.0.0.0 CreateTable
        1+109      shell@canx2um-78286w.infosec.tycho.ncsc.mil client:waitForFateOperation
       47+113      master@0.0.0.0 waitForFateOperation
        2+160      master@0.0.0.0 finishFateOperation
     1010+168      shell@canx2um-78286w.infosec.tycho.ncsc.mil close
        2+169        shell@canx2um-78286w.infosec.tycho.ncsc.mil BinMutations 1
        2+169          shell@canx2um-78286w.infosec.tycho.ncsc.mil binMutations
        1+169            tserver@0.0.0.0 startScan
        3+1178     tserver@0.0.0.0 getTableConfiguration
        3+1181     tserver@0.0.0.0 getTableConfiguration
        2+1184     tserver@0.0.0.0 getTableConfiguration
        3+1186     tserver@0.0.0.0 getTableConfiguration
        2+1219     shell@canx2um-78286w.infosec.tycho.ncsc.mil scan
        2+1219       shell@canx2um-78286w.infosec.tycho.ncsc.mil scan:location
        1+1220         tserver@0.0.0.0 startScan
        1+1220           tserver@0.0.0.0 tablet read ahead 8
        2+1222     master@0.0.0.0 beginFateOperation
        2+1224     master@0.0.0.0 executeFateOperation
        3+1227       master@0.0.0.0 DeleteTable
        1+1230       master@0.0.0.0 DeleteTable
        3+1342         master@0.0.0.0 CleanUp
        2+1343           master@0.0.0.0 scan
        2+1343             master@0.0.0.0 scan:location
       21+1345         master@0.0.0.0 CleanUp
        1+1345           master@0.0.0.0 batch scanner 516- 1
        1+1347           master@0.0.0.0 scan
        1+1347             master@0.0.0.0 scan:location
        1+1347               tserver@0.0.0.0 startScan
        1+1348           master@0.0.0.0 close
        1+1349           master@0.0.0.0 scan
        1+1349             master@0.0.0.0 scan:location
        1+1349               tserver@0.0.0.0 startScan
        1+1349                 tserver@0.0.0.0 metadata tablets read ahead 3
      142+1227     master@0.0.0.0 waitForFateOperation
        2+1369     master@0.0.0.0 finishFateOperation
    The following spans are not rooted (probably due to a parent span of length 0ms):
        8+114    master@0.0.0.0 SetupPermissions
        2+124    master@0.0.0.0 PopulateZookeeper
        8+126    master@0.0.0.0 PopulateZookeeper
        2+136    master@0.0.0.0 ChooseDir
        1+142    master@0.0.0.0 PopulateMetadata
        1+142    tserver@0.0.0.0 prep
        1+142    tserver@0.0.0.0 update
        4+145    master@0.0.0.0 FinishCreateTable
        1+1348   master@0.0.0.0 sendMutations
        1+1348   master@0.0.0.0 org.apache.accumulo.core.client.impl.TabletServerBatchWriter$MutationWriter 1
    </pre>


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo issue #255: ACCUMULO-4365: Fix trace test in ShellServerIT

Posted by milleruntime <gi...@git.apache.org>.
Github user milleruntime commented on the issue:

    https://github.com/apache/accumulo/pull/255
  
    So this is the guy causing us problems in Tracer.java:
    `private static final ThreadLocal<Span> currentSpan = new ThreadLocal<Span>() {..}`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo issue #255: ACCUMULO-4365: Fix trace test in ShellServerIT

Posted by joshelser <gi...@git.apache.org>.
Github user joshelser commented on the issue:

    https://github.com/apache/accumulo/pull/255
  
    > I don't think sendMutations get lost, just that its a race condition we don't have control over. 
    
    Correct me if I'm wrong: but if we just don't understand the problem, I'm not in favor of removing this check. If we know definitively what the issue is and accept why it is something we cannot fix, that's a different story.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo issue #255: ACCUMULO-4365: Fix trace test in ShellServerIT

Posted by joshelser <gi...@git.apache.org>.
Github user joshelser commented on the issue:

    https://github.com/apache/accumulo/pull/255
  
    > So this is the guy causing us problems in Tracer.java:
    
    Ah, so that ThreadLocal is designed so that we can transparently pick up the "current" thread. When we spawn a new thread, we need to pass along the current spanId to that child. I'm pretty sure we already have wrappers for `Runnable` to do this -- maybe they just weren't used?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---