Posted to solr-user@lucene.apache.org by Aman Deep Singh <am...@gmail.com> on 2017/06/20 12:08:22 UTC

Could not load collection from ZK:

I'm facing an issue in Solr: sometimes ZooKeeper fails to load the Solr
collection, stating

org.apache.solr.common.SolrException: Could not load collection from ZK:

My current setup details are:

   1. 5 nodes with 4 cores and 7.6 GB RAM each; every node hosts both a
   Solr node and ZooKeeper
   2. No sharding is used
   3. Index size is around 2.5 GB
   4. Solr node RAM: 4 GB
   5. Average load: 10k RPM
   6. Indexing: at most 1000 docs/minute (around 100 batches of 5 to 10 docs
   each)

The GC log analysis also looks normal:
Node 1-
http://gceasy.io/my-gc-report.jsp?p=c2hhcmVkLzIwMTcvMDYvMjAvLS1zb2xyX2djLmxvZy4yLnppcC0tMTEtNTctMw==
Node 2-
http://gceasy.io/my-gc-report.jsp?p=c2hhcmVkLzIwMTcvMDYvMjAvLS1zb2xyX2djLmxvZy4zLmN1cnJlbnQuemlwLS03LTQ2LTU2
Node 3-
http://gceasy.io/my-gc-report.jsp?p=c2hhcmVkLzIwMTcvMDYvMjAvLS1zb2xyX2djLmxvZy4zLmN1cnJlbnQuemlwLS04LTIzLTM5
Node 4-
http://gceasy.io/my-gc-report.jsp?p=c2hhcmVkLzIwMTcvMDYvMjAvLS1zb2xyX2djLmxvZy4yLnppcC0tOC0yMC01NQ==
Node 5-
http://gceasy.io/my-gc-report.jsp?p=c2hhcmVkLzIwMTcvMDYvMjAvLS1zb2xyX2djLmxvZy41LmN1cnJlbnQuemlwLS04LTE5LTE0


Admin UI image https://pasteboard.co/1Nd7ArAf0.png

Any idea what is causing this problem and how to overcome it?

Thanks,
Aman Deep Singh

Re: Could not load collection from ZK:

Posted by Alessandro Benedetti <a....@sease.io>.
Hi Aman,
I had similar issues in the past, and the cause was attributed to:

SOLR-8868 <https://issues.apache.org/jira/browse/SOLR-8868>  

Which unfortunately is not solved yet.

Did you manage to find a different cause in your case?

Hope that helps.

Regards



-----
---------------
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
Sease Ltd. - www.sease.io

Re: Could not load collection from ZK:

Posted by Matteo Grolla <ma...@gmail.com>.
Hi everybody,
    I'm facing the same problem on Solr 7.3.
Requesting a longer ZK session timeout (the default of 10s seems too
short) will probably solve the problem, but I'm puzzled that SolrJ reports
this error as a SolrException with status code 400 (BAD_REQUEST).
In ZkStateReader:

public static DocCollection getCollectionLive(ZkStateReader zkStateReader, String coll) {
  try {
    return zkStateReader.fetchCollectionState(coll, null);
  } catch (KeeperException e) {
    throw new SolrException(ErrorCode.BAD_REQUEST,
        "Could not load collection from ZK: " + coll, e);
  } catch (InterruptedException e) {
    Thread.currentThread().interrupt();
    throw new SolrException(ErrorCode.BAD_REQUEST,
        "Could not load collection from ZK: " + coll, e);
  }
}


Retrying the request could solve the problem, but a client shouldn't retry a
BAD_REQUEST.
Why isn't this reported as a 503 (SERVICE_UNAVAILABLE)?
I think SolrJ should distinguish the cases:
A: communication problem with ZK -> 503
B: user asked for a non-existent collection -> 400
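
Until the two cases are distinguished by status code, a client could decide
whether to retry by inspecting the cause chain instead. The following is
only a sketch under assumptions: the class ZkErrorClassifier, its Kind enum,
and the nested KeeperException stand-in are hypothetical names so the
example compiles without Solr or ZooKeeper on the classpath. Matching on the
class name is a deliberate shortcut that also copes with shaded packages
such as com.gdn.solr620.*, and treating every KeeperException as transient
is a simplification.

```java
class ZkErrorClassifier {
    /** Hypothetical categories a client could map to its retry policy. */
    enum Kind { TRANSIENT_ZK, OTHER }

    /** Stand-in for org.apache.zookeeper.KeeperException (assumption: used only for this demo). */
    static class KeeperException extends Exception {
        KeeperException(String msg) { super(msg); }
    }

    /**
     * Walks the cause chain (bounded to 32 levels) and reports TRANSIENT_ZK
     * if any cause is an InterruptedException or has "KeeperException" in
     * its class name; otherwise reports OTHER.
     */
    static Kind classify(Throwable error) {
        Throwable cause = error;
        for (int depth = 0; cause != null && depth < 32; depth++) {
            if (cause instanceof InterruptedException
                    || cause.getClass().getName().contains("KeeperException")) {
                return Kind.TRANSIENT_ZK;
            }
            cause = cause.getCause();
        }
        return Kind.OTHER;
    }

    public static void main(String[] args) {
        Exception zkFailure = new RuntimeException(
                "Could not load collection from ZK: productCollection",
                new KeeperException("Session expired"));
        Exception other = new IllegalArgumentException("bad query syntax");
        System.out.println(classify(zkFailure)); // TRANSIENT_ZK
        System.out.println(classify(other));     // OTHER
    }
}
```

A caller would retry (with backoff) only for TRANSIENT_ZK and surface
everything else to the user unchanged.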

Thanks


On Fri, 25 May 2018 at 05:02, Aman Singh <
amandeep.cool99@gmail.com> wrote:

> Hi Shawn & Alessandro,
> We have tried to increase the heap also but we were facing the same issue
> but after removing the ZK from the solr server to their dedicated server
> this problem goes away, Yes when we are facing  this issue the GC activity
> was high around 60-70% out of 400%.
> Regards,
> Aman Deep singh

Re: Could not load collection from ZK:

Posted by Aman Singh <am...@gmail.com>.
Hi Shawn & Alessandro,
We also tried increasing the heap, but we still faced the same issue. After moving ZK off the Solr servers onto its own dedicated servers, the problem went away. And yes, while we were facing this issue, GC activity was high, around 60-70% out of 400%.
Regards,
Aman Deep singh

On 25/05/18, 5:08 AM, "Shawn Heisey" <ap...@elyograg.org> wrote:

    On 6/20/2017 9:46 AM, Aman Deep Singh wrote:
    > Sorry Shawn,
    > It didn't copy entire stacktrace I put the stacktrace at
    > https://www.dropbox.com/s/zf8b87m24ei2ils/solr%20exception2?dl=0
    >
    > Note: I have shaded the solr library under com.gdn.solr620  so all solr
    > class will be appear as com.gdn.solr620.org.apache.solr.*
    
    Wow, I really dropped the ball here.  The thread is nearly a year old. 
    I somehow missed the reply.  I am sorry about that!  Thank Alessandro
    for reviving the thread and making it clear that I never replied.
    
    This is the innermost cause:
    
    Caused by: org.apache.zookeeper.KeeperException$SessionExpiredException:
    KeeperErrorCode = Session expired for
    /collections/productCollection/state.json
    
    Either there are network issues talking to ZooKeeper, or something else
    caused a timeout.  Solr's default ZK client timeout when it is not
    configured is 15 seconds.  In recent versions, the example
    configurations have an explicit setting of 30 seconds.  Solr's
    zkClientTimeout is used to set ZooKeeper's sessionTimeout, and that's
    what is exceeded when a session expires.
    
    When this kind of error happens, it means something has gone VERY wrong
    -- 15 seconds is a REALLY long time when programs are trying to talk to
    each other.
    
    One common cause of problems like this is extreme GC pauses.  Typically
    a pause problem capable of causing a ZK timeout would be due to the heap
    being too small, but it's always possible that it could happen because
    the heap is VERY large.
    
    Errors on the client side may not be as informative as corresponding
    errors in the solr.log file on the server(s).  It would be a good idea
    to check solr.log for errors as well.
    
    Thanks,
    Shawn
    
    



Re: Could not load collection from ZK:

Posted by Shawn Heisey <ap...@elyograg.org>.
On 6/20/2017 9:46 AM, Aman Deep Singh wrote:
> Sorry Shawn,
> It didn't copy the entire stacktrace. I have put the stacktrace at
> https://www.dropbox.com/s/zf8b87m24ei2ils/solr%20exception2?dl=0
>
> Note: I have shaded the Solr library under com.gdn.solr620, so all Solr
> classes will appear as com.gdn.solr620.org.apache.solr.*

Wow, I really dropped the ball here.  The thread is nearly a year old. 
I somehow missed the reply.  I am sorry about that!  Thank Alessandro
for reviving the thread and making it clear that I never replied.

This is the innermost cause:

Caused by: org.apache.zookeeper.KeeperException$SessionExpiredException:
KeeperErrorCode = Session expired for
/collections/productCollection/state.json

Either there are network issues talking to ZooKeeper, or something else
caused a timeout.  Solr's default ZK client timeout when it is not
configured is 15 seconds.  In recent versions, the example
configurations have an explicit setting of 30 seconds.  Solr's
zkClientTimeout is used to set ZooKeeper's sessionTimeout, and that's
what is exceeded when a session expires.
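
One detail worth keeping in mind when raising zkClientTimeout: the ZooKeeper
server bounds the session timeout a client may request. By default the
bounds are 2 x tickTime and 20 x tickTime. The sketch below only illustrates
that arithmetic; the class and method names are made up for the example, and
it assumes minSessionTimeout and maxSessionTimeout have not been overridden
in zoo.cfg.

```java
class ZkSessionTimeout {
    /**
     * Returns the session timeout the server would grant for a requested
     * value, assuming the default bounds of 2 * tickTime and 20 * tickTime.
     */
    static int negotiated(int requestedMs, int tickTimeMs) {
        int min = 2 * tickTimeMs;
        int max = 20 * tickTimeMs;
        return Math.max(min, Math.min(max, requestedMs));
    }

    public static void main(String[] args) {
        int tick = 2000; // assumption: tickTime=2000 as in ZooKeeper's sample zoo.cfg
        System.out.println(negotiated(15000, tick)); // 15000
        System.out.println(negotiated(30000, tick)); // 30000
        System.out.println(negotiated(60000, tick)); // 40000 (capped at 20 * tickTime)
    }
}
```

So with a 2000 ms tickTime, asking for more than 40 seconds has no effect
unless the server-side maximum is raised as well.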

When this kind of error happens, it means something has gone VERY wrong
-- 15 seconds is a REALLY long time when programs are trying to talk to
each other.

One common cause of problems like this is extreme GC pauses.  Typically
a pause problem capable of causing a ZK timeout would be due to the heap
being too small, but it's always possible that it could happen because
the heap is VERY large.
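
To put a number on GC activity without a separate log analysis, the JVM's
standard management beans can be sampled. This is a rough sketch, not a
pause detector: GcTimeProbe and its methods are hypothetical names, the
5-second window is arbitrary, getCollectionTime() reports accumulated
elapsed collection time (which for concurrent collectors is not the same as
stop-the-world time), and the code measures the JVM it runs in, so for Solr
the same beans would have to be read inside the Solr process or over JMX.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

class GcTimeProbe {
    /** Sums accumulated collection time (ms) over all collectors; undefined (-1) values are skipped. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime();
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    /** Fraction of a wall-clock window that was spent collecting. */
    static double gcFraction(long gcMillisBefore, long gcMillisAfter, long windowMillis) {
        if (windowMillis <= 0) {
            throw new IllegalArgumentException("windowMillis must be positive");
        }
        return (double) (gcMillisAfter - gcMillisBefore) / windowMillis;
    }

    public static void main(String[] args) throws InterruptedException {
        long windowMs = 5000; // arbitrary sampling window
        long before = totalGcMillis();
        Thread.sleep(windowMs);
        long after = totalGcMillis();
        System.out.printf("GC share over window: %.1f%%%n",
                100.0 * gcFraction(before, after, windowMs));
    }
}
```

A share that stays high over many windows, or a single window where the
delta approaches the ZK session timeout, would point toward the GC
explanation.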

Errors on the client side may not be as informative as corresponding
errors in the solr.log file on the server(s).  It would be a good idea
to check solr.log for errors as well.

Thanks,
Shawn


Re: Could not load collection from ZK:

Posted by Aman Deep Singh <am...@gmail.com>.
Sorry Shawn,
It didn't copy the entire stacktrace. I have put the stacktrace at
https://www.dropbox.com/s/zf8b87m24ei2ils/solr%20exception2?dl=0

Note: I have shaded the Solr library under com.gdn.solr620, so all Solr
classes will appear as com.gdn.solr620.org.apache.solr.*



On Tue, Jun 20, 2017 at 8:09 PM Shawn Heisey <ap...@elyograg.org> wrote:

> On 6/20/2017 8:25 AM, Aman Deep Singh wrote:
> > This error is coming in the application which is using solrj to
> > communicate to the solr
> > full stacktrace is
> >
> > Request processing failed; nested exception is
> > com.gdn.solr620.org.apache.solr.common.SolrException: Could not load
> > collection from ZK: productCollection
> <snip>
> > Top command images are at
> > https://www.dropbox.com/sh/vxorykk8tmb6amb/AABYIcFuRyfSnlkS6I-Tr5HNa?dl=0
>
> Are there any "Caused by" clauses that come after that stacktrace?  It
> doesn't seem to be complete.  There are no Solr classes shown, so I
> can't see where in Solr code the exception occurred.  If it happened in
> Solr code, then there should be more to the error message.
>
> Thanks,
> Shawn
>
>

Re: Could not load collection from ZK:

Posted by Aman Deep Singh <am...@gmail.com>.
This error occurs in the application that uses SolrJ to communicate with
Solr. The full stacktrace is:

Request processing failed; nested exception is com.gdn.solr620.org.apache.solr.common.SolrException: Could not load collection from ZK: productCollection
org.springframework.web.util.NestedServletException: Request processing failed; nested exception is com.gdn.solr620.org.apache.solr.common.SolrException: Could not load collection from ZK: productCollection
    at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:973) ~[spring-webmvc-4.1.0.RELEASE.jar:4.1.0.RELEASE]
    at org.springframework.web.servlet.FrameworkServlet.doPost(FrameworkServlet.java:863) ~[spring-webmvc-4.1.0.RELEASE.jar:4.1.0.RELEASE]
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:648) ~[servlet-api.jar:na]
    at org.springframework.web.servlet.FrameworkServlet.service(FrameworkServlet.java:837) ~[spring-webmvc-4.1.0.RELEASE.jar:4.1.0.RELEASE]
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:729) ~[servlet-api.jar:na]
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:230) [catalina.jar:8.5.4]
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:165) [catalina.jar:8.5.4]
    at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52) ~[tomcat-websocket.jar:8.5.4]
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:192) [catalina.jar:8.5.4]
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:165) [catalina.jar:8.5.4]
    at org.springframework.web.filter.CharacterEncodingFilter.doFilterInternal(CharacterEncodingFilter.java:88) ~[spring-web-4.1.0.RELEASE.jar:4.1.0.RELEASE]
    at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107) [spring-web-4.1.0.RELEASE.jar:4.1.0.RELEASE]
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:192) [catalina.jar:8.5.4]
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:165) [catalina.jar:8.5.4]
    at com.gdn.x.seoul.web.ui.filter.ContentSecurityPolicyFilter.doFilter(ContentSecurityPolicyFilter.java:64) ~[classes/:na]
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:192) [catalina.jar:8.5.4]
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:165) [catalina.jar:8.5.4]
    at com.gdn.x.seoul.web.ui.filter.RedirectionFilter.doFilter(RedirectionFilter.java:61) [classes/:na]
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:192) [catalina.jar:8.5.4]
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:165) [catalina.jar:8.5.4]
    at com.gdn.x.seoul.web.util.AccessLogFilter.doFilter(AccessLogFilter.java:83) [seoul-common-web-4.11.3.jar:na]
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:192) [catalina.jar:8.5.4]
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:165) [catalina.jar:8.5.4]
    at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:330) [spring-security-web-3.2.5.RELEASE.jar:3.2.5.RELEASE]
    at org.springframework.security.web.access.intercept.FilterSecurityInterceptor.invoke(FilterSecurityInterceptor.java:118) [spring-security-web-3.2.5.RELEASE.jar:3.2.5.RELEASE]
    at org.springframework.security.web.access.intercept.FilterSecurityInterceptor.doFilter(FilterSecurityInterceptor.java:84) [spring-security-web-3.2.5.RELEASE.jar:3.2.5.RELEASE]
    at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:342) [spring-security-web-3.2.5.RELEASE.jar:3.2.5.RELEASE]
    at org.springframework.security.web.access.ExceptionTranslationFilter.doFilter(ExceptionTranslationFilter.java:113) [spring-security-web-3.2.5.RELEASE.jar:3.2.5.RELEASE]
    at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:342) [spring-security-web-3.2.5.RELEASE.jar:3.2.5.RELEASE]
    at org.springframework.security.web.authentication.AnonymousAuthenticationFilter.doFilter(AnonymousAuthenticationFilter.java:113) [spring-security-web-3.2.5.RELEASE.jar:3.2.5.RELEASE]
    at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:342) [spring-security-web-3.2.5.RELEASE.jar:3.2.5.RELEASE]
    at org.springframework.security.web.servletapi.SecurityContextHolderAwareRequestFilter.doFilter(SecurityContextHolderAwareRequestFilter.java:154) [spring-security-web-3.2.5.RELEASE.jar:3.2.5.RELEASE]

Top command images are at
https://www.dropbox.com/sh/vxorykk8tmb6amb/AABYIcFuRyfSnlkS6I-Tr5HNa?dl=0

Re: Could not load collection from ZK:

Posted by Shawn Heisey <ap...@elyograg.org>.
On 6/20/2017 6:08 AM, Aman Deep Singh wrote:
> I'm facing an issue in Solr: sometimes ZooKeeper fails to load the Solr
> collection, stating
>
> org.apache.solr.common.SolrException: Could not load collection from ZK:

This is not the full error message.  It will be dozens of lines long,
and may contain one or more "Caused by" sections with separate
stacktraces.  We need the whole thing, with complete text of all lines
included.  If you are looking at the admin UI to see these messages, you
need to look in the solr.log file instead.  It looks like you've used
the service installer script, so you should find that in /var/solr/logs.

> My current setup details are:
>
>    1. 5 nodes with 4 cores and 7.6 GB RAM each; every node hosts both a
>    Solr node and ZooKeeper
>    2. No sharding is used
>    3. Index size is around 2.5 GB
>    4. Solr node RAM: 4 GB
>    5. Average load: 10k RPM
>    6. Indexing: at most 1000 docs/minute (around 100 batches of 5 to 10 docs
>    each)
>
> The GC log analysis also looks normal:
> Node 1-
> http://gceasy.io/my-gc-report.jsp?p=c2hhcmVkLzIwMTcvMDYvMjAvLS1zb2xyX2djLmxvZy4yLnppcC0tMTEtNTctMw==
> Node 2-
> http://gceasy.io/my-gc-report.jsp?p=c2hhcmVkLzIwMTcvMDYvMjAvLS1zb2xyX2djLmxvZy4zLmN1cnJlbnQuemlwLS03LTQ2LTU2
> Node 3-
> http://gceasy.io/my-gc-report.jsp?p=c2hhcmVkLzIwMTcvMDYvMjAvLS1zb2xyX2djLmxvZy4zLmN1cnJlbnQuemlwLS04LTIzLTM5
> Node 4-
> http://gceasy.io/my-gc-report.jsp?p=c2hhcmVkLzIwMTcvMDYvMjAvLS1zb2xyX2djLmxvZy4yLnppcC0tOC0yMC01NQ==
> Node 5-
> http://gceasy.io/my-gc-report.jsp?p=c2hhcmVkLzIwMTcvMDYvMjAvLS1zb2xyX2djLmxvZy41LmN1cnJlbnQuemlwLS04LTE5LTE0

Overall the GC stats look good.  There are a few long GCs, but the "stop
the world" stats are pretty good.

> Admin UI image https://pasteboard.co/1Nd7ArAf0.png

This screenshot indicates that this system is using swap.  This is
usually an indication that there's an overall memory shortage on the
system ... but if there is no *active* swapping, there's a *small*
chance that this is not a problem.  What I would like to see is a
screenshot with the "top" program running (not htop or any other
variant, I'm after "top" itself), with shift-M pressed to sort the list
by memory usage.
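
As a complement to the "top" screenshot, the amount of swap in use on a
Linux host can be read from /proc/meminfo. The sketch below is only an
illustration: SwapUsage and its methods are hypothetical names, it assumes
the usual "Key:   value kB" layout of that file, and it reports swap that is
occupied, which by itself does not show whether the system is *actively*
swapping.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;

class SwapUsage {
    /** Parses "Key:   12345 kB" lines into a map of key -> value in kB; other lines are ignored. */
    static Map<String, Long> parseMeminfo(String text) {
        Map<String, Long> kb = new HashMap<>();
        for (String line : text.split("\n")) {
            String[] parts = line.trim().split("\\s+");
            if (parts.length >= 2 && parts[0].endsWith(":")) {
                try {
                    String key = parts[0].substring(0, parts[0].length() - 1);
                    kb.put(key, Long.parseLong(parts[1]));
                } catch (NumberFormatException ignored) {
                    // Skip lines whose value is not a plain number.
                }
            }
        }
        return kb;
    }

    /** Swap in use (kB) = SwapTotal - SwapFree; missing keys count as 0. */
    static long swapUsedKb(String meminfoText) {
        Map<String, Long> kb = parseMeminfo(meminfoText);
        return kb.getOrDefault("SwapTotal", 0L) - kb.getOrDefault("SwapFree", 0L);
    }

    public static void main(String[] args) throws IOException {
        // Linux only: /proc/meminfo does not exist on other platforms.
        String text = new String(Files.readAllBytes(Paths.get("/proc/meminfo")));
        System.out.println("Swap used (kB): " + swapUsedKb(text));
    }
}
```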

Thanks,
Shawn