Posted to dev@lucene.apache.org by "Cao Manh Dat (JIRA)" <ji...@apache.org> on 2019/04/04 09:45:00 UTC
[jira] [Comment Edited] (SOLR-13276) Adding Http2 equivalent classes of CloudSolrClient and HttpClusterStateProvider

    [ https://issues.apache.org/jira/browse/SOLR-13276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16798857#comment-16798857 ] 

Cao Manh Dat edited comment on SOLR-13276 at 4/4/19 9:44 AM:
-------------------------------------------------------------

Thanks, Hoss. I tried to reproduce the failure, but even on a Windows machine it is hard to reproduce.
It seems that even SolrCloudTest sees the same failure (log attached), so the failure does not appear to have been introduced by the changes made in this issue.

From the attached log, I suspect the cause of the problem is that IndexFetcher is kicked off while CoreContainer is shutting down, so the core cannot be released.
from: {{thetaphi_Lucene-Solr-8.x-Windows_69.log.txt}}

Two nodes are shutting down:
{code}
   [junit4]   2> 126085 INFO  (jetty-closer-1876-thread-3) [    ] o.a.s.c.CoreContainer Shutting down CoreContainer instance=1698101756
   [junit4]   2> 126085 INFO  (jetty-closer-1876-thread-3) [    ] o.a.s.c.ZkController Remove node as live in ZooKeeper:/live_nodes/127.0.0.1:61571_solr
   [junit4]   2> 126085 INFO  (jetty-closer-1876-thread-2) [    ] o.a.s.c.CoreContainer Shutting down CoreContainer instance=1055741610
   [junit4]   2> 126085 INFO  (jetty-closer-1876-thread-2) [    ] o.a.s.c.ZkController Remove node as live in ZooKeeper:/live_nodes/127.0.0.1:61566_solr
{code}

After that, IndexFetcher failed to close a core, which led to the leak error:
{code}
[junit4]   2> 151088 ERROR (indexFetcher-1096-thread-1) [    ] o.a.s.c.CachingDirectoryFactory Error closing directory:org.apache.solr.common.SolrException: Timeout waiting for all directory ref counts to be released - gave up waiting on CachedDir<<refCount=1;path=C:\Users\jenkins\workspace\Lucene-Solr-8.x-Windows\solr\build\solr-solrj\test\J0\temp\solr.client.solrj.impl.CloudHttp2SolrClientTest_6DA5B1A938CC311D-001\tempDir-006\node1\.\replicaTypesTestColl_shard2_replica_p10\data\index;done=true>>
{code}
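To make the failure mode concrete, here is a minimal Java model of the ref-count wait behind that error. It assumes a simplified directory entry whose closer waits until every reference is released or a timeout expires. `RefCountSketch`, `CachedDir`, and `closeWithTimeout` are hypothetical names for illustration only; this is not Solr's `CachingDirectoryFactory` code. If a fetch that started during shutdown still holds a reference, the close gives up with `refCount=1`, as in the log.

```java
import java.util.concurrent.TimeUnit;

public class RefCountSketch {

    // Hypothetical stand-in for a cached, ref-counted index directory (not Solr's class).
    static final class CachedDir {
        private int refCount = 0;

        synchronized void incRef() {
            refCount++;
        }

        synchronized void decRef() {
            refCount--;
            notifyAll(); // wake a closer that is waiting for refCount to reach 0
        }

        synchronized int refCount() {
            return refCount;
        }

        // Waits up to timeoutMs for every reference to be released.
        // Returns false if it gives up, mirroring the "gave up waiting" error in the log.
        synchronized boolean closeWithTimeout(long timeoutMs) {
            long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
            while (refCount > 0) {
                long remainingNs = deadline - System.nanoTime();
                if (remainingNs <= 0) {
                    return false;
                }
                try {
                    TimeUnit.NANOSECONDS.timedWait(this, remainingNs); // releases the monitor while waiting
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
            return true;
        }
    }

    public static void main(String[] args) {
        CachedDir dir = new CachedDir();
        // A fetch task kicked off during shutdown takes a reference...
        dir.incRef();
        // ...and still holds it when shutdown tries to close the directory.
        boolean closed = dir.closeWithTimeout(100);
        System.out.println("closed=" + closed + ", refCount=" + dir.refCount());
        // prints "closed=false, refCount=1"
    }
}
```

Once the reference is released (here, by calling `decRef()`), the same close returns true instead.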
Therefore, I think SOLR-13339 may solve this failure.
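One general way to prevent this kind of leak is to refuse to start a new index fetch once shutdown has begun, and to wait for fetches already in flight before closing cores. The sketch below shows that idea with a hypothetical `ContainerSketch` that owns its own fetch executor. It is an assumption for illustration, not the SOLR-13339 change, and `tryStartFetch` and `shutdown(long)` are not Solr APIs.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ShutdownGuardSketch {

    // Hypothetical container that owns the executor that background index fetches run on.
    static final class ContainerSketch {
        private final ExecutorService fetchExecutor = Executors.newSingleThreadExecutor();
        private boolean shuttingDown = false; // guarded by "this"

        // Schedules a fetch only if shutdown has not started; returns whether it was scheduled.
        synchronized boolean tryStartFetch(Runnable fetchTask) {
            if (shuttingDown) {
                return false;
            }
            fetchExecutor.execute(fetchTask);
            return true;
        }

        // Blocks new fetches first, then waits (up to timeoutMs) for already-scheduled ones
        // to finish, so a fetch is not left running while cores are being closed.
        boolean shutdown(long timeoutMs) {
            synchronized (this) {
                shuttingDown = true;
                fetchExecutor.shutdown(); // already-queued and running tasks still complete
            }
            try {
                return fetchExecutor.awaitTermination(timeoutMs, TimeUnit.MILLISECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    public static void main(String[] args) {
        ContainerSketch container = new ContainerSketch();
        boolean before = container.tryStartFetch(() -> System.out.println("fetching index"));
        container.shutdown(5000);
        boolean after = container.tryStartFetch(() -> System.out.println("fetching index"));
        System.out.println("before shutdown: " + before + ", after shutdown: " + after);
    }
}
```

Setting the flag and shutting down the executor under the same lock as the check in `tryStartFetch` closes the check-then-act window in which a fetch could otherwise be scheduled just after shutdown starts.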


was (Author: caomanhdat):
Thanks, Hoss. I tried to reproduce the failure, but even on a Windows machine it is hard to reproduce.
It seems that even SolrCloudTest sees the same failure (log attached), so the failure does not appear to have been introduced by the changes made in this issue.

From the attached log, I suspect the cause of the problem is that IndexFetcher is kicked off while CoreContainer is shutting down, so the core cannot be released.
from: {{thetaphi_Lucene-Solr-8.x-Windows_69.log.txt}}

Two nodes are shutting down:
{code}
   [junit4]   2> 126085 INFO  (jetty-closer-1876-thread-3) [    ] o.a.s.c.CoreContainer Shutting down CoreContainer instance=1698101756
   [junit4]   2> 126085 INFO  (jetty-closer-1876-thread-3) [    ] o.a.s.c.ZkController Remove node as live in ZooKeeper:/live_nodes/127.0.0.1:61571_solr
   [junit4]   2> 126085 INFO  (jetty-closer-1876-thread-2) [    ] o.a.s.c.CoreContainer Shutting down CoreContainer instance=1055741610
   [junit4]   2> 126085 INFO  (jetty-closer-1876-thread-2) [    ] o.a.s.c.ZkController Remove node as live in ZooKeeper:/live_nodes/127.0.0.1:61566_solr
{code}

After that, IndexFetcher failed to close a core, which led to the leak error:
{code}
[junit4]   2> 151088 ERROR (indexFetcher-1096-thread-1) [    ] o.a.s.c.CachingDirectoryFactory Error closing directory:org.apache.solr.common.SolrException: Timeout waiting for all directory ref counts to be released - gave up waiting on CachedDir<<refCount=1;path=C:\Users\jenkins\workspace\Lucene-Solr-8.x-Windows\solr\build\solr-solrj\test\J0\temp\solr.client.solrj.impl.CloudHttp2SolrClientTest_6DA5B1A938CC311D-001\tempDir-006\node1\.\replicaTypesTestColl_shard2_replica_p10\data\index;done=true>>
{code}
Therefore, I think SOLR-13336 may solve this failure.

> Adding Http2 equivalent classes of CloudSolrClient and HttpClusterStateProvider 
> --------------------------------------------------------------------------------
>
>                 Key: SOLR-13276
>                 URL: https://issues.apache.org/jira/browse/SOLR-13276
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Cao Manh Dat
>            Assignee: Cao Manh Dat
>            Priority: Major
>             Fix For: 8.1
>
>         Attachments: SOLR-13276.patch, SOLR-13276.patch, SOLR-13276.patch, thetaphi-Lucene-Solr-master-Windows-7810.txt, thetaphi_Lucene-Solr-8.x-Windows_69.log.txt, thetaphi_Lucene-Solr-master-Windows_7754.log.txt
>
>
> Before we can move on and wipe out the usage of Apache HttpClient inside Solr-core, we need to create HTTP/2 equivalent classes of CloudSolrClient and HttpClusterStateProvider.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
