You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@accumulo.apache.org by arielvalentin <gi...@git.apache.org> on 2014/11/09 22:56:16 UTC

[GitHub] accumulo pull request: ACCUMULO-3321 Logging ThriftScanner excepti...

GitHub user arielvalentin opened a pull request:

    https://github.com/apache/accumulo/pull/18

    ACCUMULO-3321 Logging ThriftScanner exceptions at WARN instead of debug

    

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/arielvalentin/accumulo ACCUMULO-3321

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/accumulo/pull/18.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #18
    
----
commit 50947c8d29c8917f497d93503019a042c12ea180
Author: arielvalentin <ar...@arielvalentin.com>
Date:   2014-11-09T21:55:12Z

    ACCUMULO-3321 Logging ThriftScanner exceptions at WARN instead of debug

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request: ACCUMULO-3321 Logging ThriftScanner excepti...

Posted by keith-turner <gi...@git.apache.org>.
Github user keith-turner commented on the pull request:

    https://github.com/apache/accumulo/pull/18#issuecomment-62443647
  
    > there's also the question of knowing to do that.
    
    Thats a con.  Also, the implementation could move to another package in different major versions.
    
    Reading over the conversations made me think about the possibility of using something like dropwizard metrics here.  I have started using that on another project.  Not sure how well it would work, but it would be neat to reach out to an accumulo client process and grab stats on tservers reads, writes, and io errors via jmx.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request: ACCUMULO-3321 Logging ThriftScanner excepti...

Posted by busbey <gi...@git.apache.org>.
Github user busbey commented on the pull request:

    https://github.com/apache/accumulo/pull/18#issuecomment-62419110
  
    Over here: https://github.com/apache/accumulo/blob/master/server/base/src/main/java/org/apache/accumulo/server/master/balancer/TabletBalancer.java#L95
    
    line ~95 - 177
    
    The code can't be used directly as-is. but the same pattern could be factored out and used in both.
    
    Also worth looking at ACCUMULO-3311


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request: ACCUMULO-3321 Logging ThriftScanner excepti...

Posted by arielvalentin <gi...@git.apache.org>.
Github user arielvalentin commented on the pull request:

    https://github.com/apache/accumulo/pull/18#issuecomment-62422047
  
    @busbey Thanks for the hint. @joshelser had mentioned a different ticket [ticket|https://issues.apache.org/jira/browse/ACCUMULO-1268] that tries to address some design changes that the team wants to make to the client that would make it a bit more robust. I feel like the direction you are both preferring to head may fall under the umbrella of the "robust" client, e.g. sophisticated logging thresholds, monitoring, resiliency, etc... 
    Am I reading too much into your comments?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request: ACCUMULO-3321 Logging ThriftScanner excepti...

Posted by arielvalentin <gi...@git.apache.org>.
Github user arielvalentin closed the pull request at:

    https://github.com/apache/accumulo/pull/18


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request: ACCUMULO-3321 Logging ThriftScanner excepti...

Posted by arielvalentin <gi...@git.apache.org>.
Github user arielvalentin commented on the pull request:

    https://github.com/apache/accumulo/pull/18#issuecomment-62402671
  
    @joshelser I agree that because the client may recover from the is strictly not an error case, however I feel it is good to have breadcrumbs available when trying to track down problems. In my specific case we run into some really slow scans/writes and I don't know what the root cause is because I am not logging at debug level. I look at the monitor UI and Accumulo is humming along but then have to bump the client side log level in order to see that the first attempt fails on every scan but the second attempt works just fine. 
    I find that the client side logging of first attempt failures at warn level, as opposed to debug, to be very helpful because it allows our monitoring tools to alert our administrators that our application may be having connectivity issues. That allows them the ability to write reports and see how often these failures occur and potentially track down the root cause. 



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request: ACCUMULO-3321 Logging ThriftScanner excepti...

Posted by keith-turner <gi...@git.apache.org>.
Github user keith-turner commented on the pull request:

    https://github.com/apache/accumulo/pull/18#issuecomment-62444055
  
    What I was thinking about the metrics routes is that it makes the info available to anyone who seeks w/o spamming those who are not interested.  I suppose debug log also does this in a way, but its harder to discover and not as stable.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request: ACCUMULO-3321 Logging ThriftScanner excepti...

Posted by keith-turner <gi...@git.apache.org>.
Github user keith-turner commented on the pull request:

    https://github.com/apache/accumulo/pull/18#issuecomment-62422710
  
    What are the pros and cons of putting the following in your log4j config vs this change?
    
    org.apache.accumulo.core.client.impl.ThriftScanner=DEBUG


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request: ACCUMULO-3321 Logging ThriftScanner excepti...

Posted by arielvalentin <gi...@git.apache.org>.
Github user arielvalentin commented on the pull request:

    https://github.com/apache/accumulo/pull/18#issuecomment-62435090
  
    @keith-turner Normally I would be uncomfortable with it but I think your suggestion will work for me in the meantime. After taking another look at the code the ThriftScanner isn't logging much at all right now; all the messages I want to see are at DEBUG level; and although it shares its logger with the OptTimer, that component only logs TRACE messages. 
    
    Looks like creating a new logger and setting it's level to DEBUG could work for now. Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request: ACCUMULO-3321 Logging ThriftScanner excepti...

Posted by joshelser <gi...@git.apache.org>.
Github user joshelser commented on the pull request:

    https://github.com/apache/accumulo/pull/18#issuecomment-62422466
  
    Short-term, I think improving the logging to help you debug cases like this in the future is a good idea. Lot's of ways to actually do this, with a few suggestions here. Long-term, this is a very prevalent problem throughout the client implementation that would likely be better solved in some uniform manner :)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request: ACCUMULO-3321 Logging ThriftScanner excepti...

Posted by busbey <gi...@git.apache.org>.
Github user busbey commented on the pull request:

    https://github.com/apache/accumulo/pull/18#issuecomment-62410572
  
    Can we reuse the log message back off from the balancer warnings about "I can't balance because foo" ?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request: ACCUMULO-3321 Logging ThriftScanner excepti...

Posted by joshelser <gi...@git.apache.org>.
Github user joshelser commented on the pull request:

    https://github.com/apache/accumulo/pull/18#issuecomment-62410025
  
    That's a good point too. Ultimately, I'm worried about inundating the client with WARN level logging. Maybe warning on the first occurrence of the inability to get the transport and then re-warn every Nth occurrence?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request: ACCUMULO-3321 Logging ThriftScanner excepti...

Posted by joshelser <gi...@git.apache.org>.
Github user joshelser commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/18#discussion_r20067167
  
    --- Diff: core/src/main/java/org/apache/accumulo/core/client/impl/ThriftScanner.java ---
    @@ -115,12 +115,12 @@ public static boolean getBatchFromServer(Instance instance, Credentials credenti
         } catch (TApplicationException tae) {
           throw new AccumuloServerException(server, tae);
         } catch (TooManyFilesException e) {
    -      log.debug("Tablet (" + extent + ") has too many files " + server + " : " + e);
    +      log.warn("Tablet (" + extent + ") has too many files " + server + " : " + e);
         } catch (ThriftSecurityException e) {
           log.warn("Security Violation in scan request to " + server + ": " + e);
           throw new AccumuloSecurityException(e.user, e.code, e);
         } catch (TException e) {
    -      log.debug("Error getting transport to " + server + " : " + e);
    +      log.warn("Error getting transport to " + server + " : " + e);
    --- End diff --
    
    This is tricky. In the case that a tserver fails, this would cause logging on the client. Ideally, the system will recover from a tserver failure and the client wouldn't even have to know that a failure happened. Might it make more sense to leave this at debug for the first N occurrences for a server and then elevate it to warn at the N+1th time?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request: ACCUMULO-3321 Logging ThriftScanner excepti...

Posted by arielvalentin <gi...@git.apache.org>.
Github user arielvalentin commented on the pull request:

    https://github.com/apache/accumulo/pull/18#issuecomment-62414640
  
    @busbey I am afraid I am not sure what you are referring to. Would you be able to elaborate a bit for me?
    @joshelser I empathize with that concern because I don't have an instinct as to how much log output this will generate. Would you feel more comfortable with a more sophisticated pattern, where it logs something like "50 failures have occurred in the past N attempts"? I haven't familiarized myself with the code base enough to know if that is possible.     


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request: ACCUMULO-3321 Logging ThriftScanner excepti...

Posted by joshelser <gi...@git.apache.org>.
Github user joshelser commented on the pull request:

    https://github.com/apache/accumulo/pull/18#issuecomment-62440337
  
    @keith-turner while there is something to be said about just increasing the log-level via log4j, there's also the question of knowing to do that. For one, logging the inability to repeatedly get a transport to a server is not a "debug" level message, and, without looking at code, I would say it would be hard to even know that ThriftScanner's level needed to be increased in the first place.
    
    I think that we have some more we could do in the current API impl to make better log messages (at the proper level), but maybe we could also think about putting together a client-debugging doc? (need to re-read the user manual and see how much is already covered).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request: ACCUMULO-3321 Logging ThriftScanner excepti...

Posted by arielvalentin <gi...@git.apache.org>.
Github user arielvalentin commented on the pull request:

    https://github.com/apache/accumulo/pull/18#issuecomment-62459363
  
    Although I haven't used it before I am curious about how circuit breakers like [hystrix](https://github.com/Netflix/Hystrix) might help in situations like this. It captures performance statistics like the ones I am interested in, e.g. 
    
    ![Hystrix dashboard] (https://raw.githubusercontent.com/wiki/Netflix/Hystrix/images/hystrix-dashboard-single-row.png)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---