Posted to dev@drill.apache.org by "rymarm (via GitHub)" <gi...@apache.org> on 2023/04/27 21:01:33 UTC

[PR] DRILL-8426: Fix endless retrying zk set data for a large query (drill)

rymarm opened a new pull request, #2796:
URL: https://github.com/apache/drill/pull/2796

   # [DRILL-8426](https://issues.apache.org/jira/browse/DRILL-8426): Fix endless retrying zk set data for a large query
   
   ## Description
   ZooKeeper closes the connection to a client if the client tries to set data larger than `jute.maxbuffer`. By default, this limit is 1 MB.
   
   Drill persists its running queries in ZooKeeper. If you issue a large query (larger than the value of `jute.maxbuffer` on the ZooKeeper server), Drill tries to persist it and [gets a ConnectionLoss exception](https://github.com/mapr/private-zookeeper/blob/1071ebf7f2936414443fab95055775643c6988db/zookeeper-server/src/main/java/org/apache/zookeeper/server/NettyServerCnxn.java#L533). Curator (the client library Drill uses to communicate with ZooKeeper) then retries the set command according to its [RetryPolicy](https://github.com/apache/curator/blob/34055bbaeda55f06b8cd47b99c08d69c4edde72e/curator-client/src/main/java/org/apache/curator/RetryPolicy.java#L53). Drill uses the [RetryNTimes](https://github.com/apache/drill/blob/2204d5f51ed33befe234019e0faa321f02cfc61e/exec/java-exec/src/main/java/org/apache/drill/exec/coord/zk/ZKClusterCoordinator.java#L112) policy, which Drill configures to retry [7200 times](https://github.com/apache/drill/blob/2204d5f51ed33befe234019e0faa321f02cfc61e/exec/java-exec/src/main/resources/drill-module.conf#L91). While Drill keeps retrying to persist the large query, every attempt loses the connection to ZooKeeper (the server cuts it off because the data is too large), and this goes on for about an hour. After that, the client that issued the query receives neither an error nor a result, because the final exception is not handled properly.
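
   To make the timing concrete, here is a small, self-contained simulation of a fixed-count retry policy in the spirit of Curator's `RetryNTimes`. It does not use Curator itself, and `RetrySimulation`, `Outcome` and `simulate` are hypothetical names. It also assumes a 500 ms sleep between retries, which this PR does not state. Under that assumption, 7200 retries add up to 7200 × 500 ms = 3600 s of waiting, consistent with the roughly one hour described above, while 15 retries add up to 7.5 s.

   ```java
   /**
    * Simulates a fixed-count retry policy in the spirit of Curator's RetryNTimes
    * (retry up to N times with a fixed sleep between tries) around an operation
    * that always fails, like a ZooKeeper write larger than the server's
    * jute.maxbuffer. Sleeps are only added up, not performed, so the worst case
    * can be computed instantly.
    */
   public class RetrySimulation {

       /** How a simulated run ended. */
       static final class Outcome {
           final boolean succeeded;
           final int attempts;
           final long totalSleepMs;

           Outcome(boolean succeeded, int attempts, long totalSleepMs) {
               this.succeeded = succeeded;
               this.attempts = attempts;
               this.totalSleepMs = totalSleepMs;
           }
       }

       /** One initial attempt plus up to maxRetries retries, "sleeping" delayMs before each retry. */
       static Outcome simulate(int maxRetries, long delayMs, Runnable operation) {
           long totalSleepMs = 0;
           for (int retries = 0; ; retries++) {
               try {
                   operation.run();
                   return new Outcome(true, retries + 1, totalSleepMs);
               } catch (RuntimeException e) {
                   if (retries >= maxRetries) {
                       // Policy exhausted: a real client would now rethrow the last exception.
                       return new Outcome(false, retries + 1, totalSleepMs);
                   }
                   totalSleepMs += delayMs; // a real policy would Thread.sleep(delayMs) here
               }
           }
       }

       public static void main(String[] args) {
           // Stands in for a set/create whose payload the server rejects every time.
           Runnable oversizedWrite = () -> {
               throw new IllegalStateException("ConnectionLoss");
           };
           long delayMs = 500; // ASSUMPTION: 500 ms between retries (not stated in the PR)
           for (int retryCount : new int[] {7200, 15}) {
               Outcome o = simulate(retryCount, delayMs, oversizedWrite);
               System.out.println("retry count " + retryCount + ": " + o.attempts
                   + " attempts, " + o.totalSleepMs + " ms of sleep");
           }
       }
   }
   ```

   Because every attempt fails the same way, retrying cannot help here; the retry count only decides how long the client waits before the failure surfaces.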
   
   What I change:
   1. Drill compares the size of the data with the client's `jute.maxbuffer` value and, if the data is larger, throws an `IllegalArgumentException` that is wrapped into `UserException.executionError`. This still doesn't fully protect Drill from trying to persist oversized data to ZooKeeper, because a user can manually change `jute.maxbuffer` on the client or server side and end up with inconsistent values (the client's `jute.maxbuffer` differs from the server's). [But, as the ZooKeeper documentation says](https://zookeeper.apache.org/doc/r3.6.2/zookeeperAdmin.html), a user who changes `jute.maxbuffer` should change it on all ZooKeeper servers and clients. So in the general case this check is enough.
   2. Make Foreman properly handle exceptions that may be raised from `queryStateProcessor.moveToState`.
   3. Reduce `drill.exec.zk.retry.count` from 7200 to 15.
   4. Add info logs when the ZooKeeper client raises an exception during a set operation, so that the user knows the size of the data and the `jute.maxbuffer` value Drill uses.
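
   A minimal sketch of the check in item 1, assuming the limit is read from the `jute.maxbuffer` JVM system property with ZooKeeper's documented default of `0xfffff` bytes (just under 1 MB). `ZkPayloadGuard`, `juteMaxBuffer` and `checkDataSize` are hypothetical names, not Drill's actual code. The sketch compares only the raw payload size (the real request also carries some protocol overhead), and the wrapping into `UserException.executionError` is left out so the example runs without Drill on the classpath.

   ```java
   import java.nio.charset.StandardCharsets;

   /**
    * HYPOTHETICAL helper illustrating item 1: refuse to send a payload larger
    * than the client's jute.maxbuffer instead of letting the server drop the
    * connection and the retry policy spin.
    */
   public class ZkPayloadGuard {

       /** ZooKeeper's documented default for jute.maxbuffer: 0xfffff bytes (just under 1 MB). */
       static final int DEFAULT_JUTE_MAXBUFFER = 0xfffff;

       /** Reads jute.maxbuffer from the JVM system properties, falling back to the default. */
       static int juteMaxBuffer() {
           return Integer.getInteger("jute.maxbuffer", DEFAULT_JUTE_MAXBUFFER);
       }

       /**
        * Throws IllegalArgumentException if data is larger than limit.
        * (In Drill, the exception is then wrapped into UserException.executionError; not shown.)
        */
       static void checkDataSize(String path, byte[] data, int limit) {
           if (data.length > limit) {
               throw new IllegalArgumentException(String.format(
                   "Data for %s is %d bytes, which exceeds jute.maxbuffer (%d bytes)",
                   path, data.length, limit));
           }
       }

       public static void main(String[] args) {
           int limit = juteMaxBuffer();
           byte[] small = "select 1".getBytes(StandardCharsets.UTF_8);
           checkDataSize("/drill/running/small-query", small, limit); // passes silently

           byte[] large = new byte[limit + 1];
           try {
               checkDataSize("/drill/running/large-query", large, limit);
           } catch (IllegalArgumentException e) {
               System.out.println("Rejected before contacting ZooKeeper: " + e.getMessage());
           }
       }
   }
   ```

   Failing fast like this turns an hour of silent retries into an immediate error the user can act on, as long as client and server agree on `jute.maxbuffer`.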
   
   
   ## Documentation
   Add some information to the [troubleshooting page](https://drill.apache.org/docs/troubleshooting/) about what to do if you catch an exception like the one below and Drill has not responded for a long time:
   ```
   Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /drill/running/1bb91a06-3afe-8152-f3ce-048dd3bef992
           at org.apache.zookeeper.KeeperException.create(KeeperException.java:102)
           at org.apache.zookeeper.KeeperException.create(KeeperException.java:54)
           at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:1672)
           at org.apache.curator.framework.imps.CreateBuilderImpl$18.call(CreateBuilderImpl.java:1216)
           at org.apache.curator.framework.imps.CreateBuilderImpl$18.call(CreateBuilderImpl.java:1193)
           at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:93)
           at org.apache.curator.framework.imps.CreateBuilderImpl.pathInForeground(CreateBuilderImpl.java:1190)
           at org.apache.curator.framework.imps.CreateBuilderImpl.protectedPathInForeground(CreateBuilderImpl.java:605)
           at org.apache.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:595)
           at org.apache.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:48)
           at org.apache.drill.exec.coord.zk.ZookeeperClient.put(ZookeeperClient.java:294)
           ... 10 common frames omitted
   ```
   
   
   ## Testing
   Manual test: I tried to execute a huge query like this:
   ```
   select full_name from cp.`employee.json` where full_name in ('Sheri Nowmer', 'Sheri Nowmer', ........)
   ```
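
   The huge query can be generated instead of typed by hand. The hypothetical `BigQueryGenerator` below repeats the `'Sheri Nowmer'` literal until the query text alone is larger than ZooKeeper's default `jute.maxbuffer` (`0xfffff` bytes). It assumes that the record Drill persists under `/drill/running/` contains the full query text, so exceeding the limit with the text is enough to reproduce the problem.

   ```java
   import java.nio.charset.StandardCharsets;

   /**
    * HYPOTHETICAL helper for the manual test: builds an IN-list query like the
    * one above, repeating the literal 'Sheri Nowmer' n times.
    */
   public class BigQueryGenerator {

       static final String PREFIX =
           "select full_name from cp.`employee.json` where full_name in (";
       static final String LITERAL = "'Sheri Nowmer'";
       static final String SEPARATOR = ", ";
       static final String SUFFIX = ")";

       /** Builds the query with n copies of LITERAL. */
       static String buildQuery(int n) {
           StringBuilder sb = new StringBuilder(PREFIX);
           for (int i = 0; i < n; i++) {
               if (i > 0) {
                   sb.append(SEPARATOR);
               }
               sb.append(LITERAL);
           }
           return sb.append(SUFFIX).toString();
       }

       /** Byte length of buildQuery(n), computed without building it (the query is pure ASCII). */
       static long queryLength(int n) {
           return PREFIX.length() + (long) n * LITERAL.length()
               + (long) Math.max(0, n - 1) * SEPARATOR.length() + SUFFIX.length();
       }

       /** Smallest n whose query text is larger than limitBytes. */
       static int minLiteralsToExceed(int limitBytes) {
           int n = 1;
           while (queryLength(n) <= limitBytes) {
               n++;
           }
           return n;
       }

       public static void main(String[] args) {
           int limit = 0xfffff; // ZooKeeper's default jute.maxbuffer
           int n = minLiteralsToExceed(limit);
           String query = buildQuery(n);
           System.out.println(n + " literals -> "
               + query.getBytes(StandardCharsets.UTF_8).length + " bytes");
       }
   }
   ```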
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@drill.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] DRILL-8426: Fix endless retrying zk set data for a large query (drill)

Posted by "cgivre (via GitHub)" <gi...@apache.org>.
cgivre commented on PR #2796:
URL: https://github.com/apache/drill/pull/2796#issuecomment-1526729702

   @jnturton Do we want to backport this to the next stable branch?




Re: [PR] DRILL-8426: Fix endless retrying zk set data for a large query (drill)

Posted by "jnturton (via GitHub)" <gi...@apache.org>.
jnturton merged PR #2796:
URL: https://github.com/apache/drill/pull/2796

