Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2021/02/20 01:09:36 UTC

[GitHub] [incubator-pinot] kkrugler opened a new issue #6597: No error logged when broker becomes unresponsive due to Netty transport issue

kkrugler opened a new issue #6597:
URL: https://github.com/apache/incubator-pinot/issues/6597


   On a five-server cluster with one controller and one broker, running Pinot 0.6.0, the following query left the cluster unable to process any further queries: `select distinctcount(column) from table`, where the column in question had very high cardinality (> 1B unique values out of 5B total records).
   
   No errors were logged by the controller, the broker, or any of the 5 server processes. Once the cluster became unresponsive, a new request (e.g. `select * from table limit 20`) would be logged by the broker and the servers, but the broker's log entry showed that it had not received a response from any server (`servers=0/5`, with the `ResponseDelayMs` field at -1 for every server):
   
   ```
   2021/02/19 22:21:53.860 INFO [BaseBrokerRequestHandler] [jersey-server-managed-async-executor-59] requestId=41163,table=crawldata_OFFLINE,timeMs=10000,docs=0/0,entries=0/0,segments(queried/processed/matched/consuming/unavailable):0/0/0/0/0,consumingFreshnessTimeMs=0,servers=0/5,groupLimitReached=false,brokerReduceTimeMs=0,exceptions=0,serverStats=(Server=SubmitDelayMs,ResponseDelayMs,ResponseSize,DeserializationTimeMs);116.202.83.208_O=0,-1,0,0;168.119.147.123_O=0,-1,0,0;168.119.147.125_O=1,-1,0,0;168.119.147.124_O=1,-1,0,0;116.202.52.154_O=1,-1,0,0,query=select * from crawldata limit 20
   ```
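
The `serverStats` portion of that broker log line can be checked mechanically. The following is a minimal sketch, not part of Pinot: the function names `parse_server_stats` and `unresponsive_servers` are hypothetical, the field order is read from the header embedded in the log line itself, and treating a negative `ResponseDelayMs` as "no response received" is an assumption based on how the line above reads (`servers=0/5`).

```python
import re

# The log line carries its own header, e.g.
# serverStats=(Server=SubmitDelayMs,ResponseDelayMs,ResponseSize,DeserializationTimeMs);...
# followed by ';'-separated "server=v1,v2,v3,v4" entries, then ",query=...".
SERVER_STATS_RE = re.compile(
    r"serverStats=\((?P<header>[^)]*)\);(?P<body>.*?),query="
)


def parse_server_stats(log_line):
    """Return {server: {field_name: int_value}} parsed from a broker log line."""
    match = SERVER_STATS_RE.search(log_line)
    if match is None:
        return {}
    # Drop the leading "Server=" and keep the per-server field names.
    field_names = match.group("header").split("=", 1)[1].split(",")
    stats = {}
    for entry in match.group("body").split(";"):
        server, _, values = entry.partition("=")
        numbers = [int(v) for v in values.split(",")]
        stats[server] = dict(zip(field_names, numbers))
    return stats


def unresponsive_servers(log_line):
    """Servers whose ResponseDelayMs is negative (assumed: no response seen)."""
    stats = parse_server_stats(log_line)
    return sorted(s for s, f in stats.items() if f["ResponseDelayMs"] < 0)


if __name__ == "__main__":
    # Shortened copy of the log line above (two of the five servers).
    sample = (
        "requestId=41163,table=crawldata_OFFLINE,timeMs=10000,servers=0/5,"
        "serverStats=(Server=SubmitDelayMs,ResponseDelayMs,ResponseSize,"
        "DeserializationTimeMs);116.202.83.208_O=0,-1,0,0;"
        "168.119.147.123_O=0,-1,0,0,query=select * from crawldata limit 20"
    )
    print(unresponsive_servers(sample))
```

Running a check like this over the broker log could flag the stuck state even though no error was logged, since every server shows up as unresponsive for every request.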
   
   @Jackie-Jiang indicated he thought that the very large response from servers (on the order of 200M unique strings, each 64 characters long, per server) caused an issue with the transport layer, but Netty didn't log any error:
   
   > Based on the log you posted, server side processed the second query without any issue, but broker didn't receive the response, and that's why I suspect something is broken in the transport layer
   
   > We rely on netty to transport data, maybe we hit some limitation in netty, but netty didn’t trigger the exception callback
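
For a sense of scale, the response size implied by the figures above can be estimated with simple arithmetic. This is only a rough sketch: it assumes one byte per character and ignores any serialization overhead, and the comparison against a signed 32-bit length is purely illustrative. Whether the transport actually uses a 32-bit length limit is an assumption here, not something confirmed in this issue.

```python
# Largest value a signed 32-bit integer can hold (about 2 GiB).
# Using it as a frame-size limit is an assumption for illustration only.
INT32_MAX = 2**31 - 1


def estimate_payload_bytes(unique_values, chars_per_value, bytes_per_char=1):
    """Raw string bytes only; no per-value or per-message overhead included."""
    return unique_values * chars_per_value * bytes_per_char


if __name__ == "__main__":
    # Figures from the issue: ~200M unique strings of 64 characters per server.
    per_server = estimate_payload_bytes(200_000_000, 64)
    all_servers = 5 * per_server

    print(f"per server : {per_server / 1e9:.1f} GB")
    print(f"5 servers  : {all_servers / 1e9:.1f} GB")
    print(f"per server exceeds a signed 32-bit length: {per_server > INT32_MAX}")
```

Under these assumptions each server would return roughly 12.8 GB of raw string data, several times larger than anything a signed 32-bit length could describe, which is at least consistent with the suspicion that a transport-layer limit was hit.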
   
   After I restarted the broker process, the cluster worked again, which indicates that the problem was indeed some invalid state in the broker.

