Posted to dev@lucene.apache.org by "Kevin Risden (JIRA)" <ji...@apache.org> on 2019/02/02 20:51:00 UTC

[jira] [Commented] (SOLR-7215) non reproducible Suite failures due to excessive sysout due to HDFS lease renewal WARN logs due to connection refused -- even if test doesn't use HDFS (ie: threads leaking between tests)

    [ https://issues.apache.org/jira/browse/SOLR-7215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16759194#comment-16759194 ] 

Kevin Risden commented on SOLR-7215:
------------------------------------

Not sure what the status is here. I would guess this is either:

a) not an issue any more (a lot has changed with HDFS thread cleanup)
b) fixed later due to HDFS thread cleanup
c) still an issue, but it isn't clear that this has happened recently.

Planning to resolve this since I haven't seen this and the last comment was 3+ years ago.

The SOLR-9515 Hadoop 3 upgrade was recent, so I'm trying to clean up old HDFS-related JIRAs when it isn't clear they still happen.

> non reproducible Suite failures due to excessive sysout due to HDFS lease renewal WARN logs due to connection refused -- even if test doesn't use HDFS (ie: threads leaking between tests)
> ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-7215
>                 URL: https://issues.apache.org/jira/browse/SOLR-7215
>             Project: Solr
>          Issue Type: Bug
>            Reporter: Hoss Man
>            Priority: Major
>         Attachments: tests-report.txt_suite-failure-due-to-sysout.txt.zip
>
>
> On my local machine, I've noticed lately a lot of sporadic, non-reproducible failures like these...
> {noformat}
>   2> NOTE: reproduce with: ant test  -Dtestcase=ScriptEngineTest -Dtests.seed=E254A7E69EC7212A -Dtests.slow=true -Dtests.locale=sv -Dtests.timezone=SystemV/CST6 -Dtests.asserts=true -Dtests.file.encoding=UTF-8
> [14:34:23.749] ERROR   0.00s J1 | ScriptEngineTest (suite) <<<
>    > Throwable #1: java.lang.AssertionError: The test or suite printed 10984 bytes to stdout and stderr, even though the limit was set to 8192 bytes. Increase the limit with @Limit, ignore it completely with @SuppressSysoutChecks or run with -Dtests.verbose=true
>    > 	at __randomizedtesting.SeedInfo.seed([E254A7E69EC7212A]:0)
>    > 	at org.apache.lucene.util.TestRuleLimitSysouts.afterIfSuccessful(TestRuleLimitSysouts.java:212)
> {noformat}
> Invariably, looking at the logs of tests that fail for this reason, I see multiple instances of these WARN messages...
> {noformat}
>   2> 601361 T3064 oahh.LeaseRenewer.run WARN Failed to renew lease for [DFSClient_NONMAPREDUCE_-253604438_2947] for 92 seconds.  Will retry shortly ... java.net.ConnectException: Call From frisbee/127.0.1.1 to localhost:40618 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
>   2> 	at sun.reflect.GeneratedConstructorAccessor268.newInstance(Unknown Source)
>   2> 	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>  ...
> {noformat}
> ...the full stack traces of these exceptions are typically 36 lines long (not counting the suppressed "... 17 more" at the end).
> Doing some basic crunching of the "tests-report.txt" file from a recent run of all "solr-core" tests (the run that caused the above failure) leads to some pretty damn disconcerting numbers...
> {noformat}
> hossman@frisbee:~/tmp$ wc -l tests-report.txt_suite-failure-due-to-sysout.txt
> 1049177 tests-report.txt_suite-failure-due-to-sysout.txt
> hossman@frisbee:~/tmp$ grep "Suite: org.apache.solr" tests-report.txt_suite-failure-due-to-sysout.txt | wc -l
> 465
> hossman@frisbee:~/tmp$ grep "LeaseRenewer.run WARN Failed to renew lease" tests-report.txt_suite-failure-due-to-sysout.txt | grep http://wiki.apache.org/hadoop/ConnectionRefused | wc -l
> 1988
> hossman@frisbee:~/tmp$ calc
> 1988 * 36
> 71568
> {noformat}
> So running 465 Solr test suites, we got ~2 thousand of these "Failed to renew lease" WARNings. Of the ~1 million total lines of log messages from all tests, ~70 thousand (~7%) come from these WARNing messages -- which can evidently be safely ignored?
> Something seems broken here.
> Someone who understands this area of the code should either:
> * investigate & fix the code/tests so they don't have these lease renewal problems
> * tweak our test logging configs to suppress these WARN messages
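The counts in the description, and the "threads leaking between tests" theory in the title, can be checked with a small POSIX shell/awk sketch. It assumes (not verified against the attached report) that each suite's output in tests-report.txt follows a line containing "Suite: org.apache.solr..." whose last field is the suite class name, and it reuses the ~36-lines-per-stack-trace estimate from the description. The function names lease_warn_share and lease_warns_per_suite are hypothetical helpers, not part of the Lucene/Solr build.

```shell
#!/bin/sh
# Sketch only: hypothetical helpers for crunching a tests-report.txt file.
# Assumptions: each suite's output follows a line containing
# "Suite: org.apache.solr..." whose last field is the suite class name, and
# each lease-renewal WARN is ~36 lines once its stack trace is included.

# Print "<warnings>, ~<log lines>, <percent> of <total lines>" for the input.
# Matches the same lines as the grep pipeline in the description.
lease_warn_share() {
  awk -v per=36 '
    /LeaseRenewer\.run WARN Failed to renew lease/ &&
    /wiki\.apache\.org\/hadoop\/ConnectionRefused/ { warns++ }
    END {
      if (NR == 0) exit
      printf "%d warnings, ~%d lines, %.1f%% of %d\n",
             warns, warns * per, 100 * warns * per / NR, NR
    }
  ' "$@"
}

# Print "<count> <suite>" per suite, most WARNs first. Each WARN is charged
# to the most recent "Suite:" header, so hits in suites that never touch
# HDFS would suggest lease-renewer threads leaking from an earlier test.
lease_warns_per_suite() {
  awk '
    BEGIN { suite = "(before-first-suite)" }
    /Suite: org\.apache\.solr/ { suite = $NF }
    /LeaseRenewer\.run WARN Failed to renew lease/ &&
    /wiki\.apache\.org\/hadoop\/ConnectionRefused/ { count[suite]++ }
    END { for (s in count) print count[s], s }
  ' "$@" | sort -rn
}
```

With the counts quoted above, `lease_warn_share tests-report.txt` would print roughly "1988 warnings, ~71568 lines, 6.8% of 1049177" (71568 / 1049177 is about 6.8%, consistent with the "~7%" estimate). Running `lease_warns_per_suite tests-report.txt | head` then lists which suites the warnings show up in; any suite near the top that doesn't use HDFS is a candidate for the cross-test thread leak.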



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
