You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pulsar.apache.org by GitBox <gi...@apache.org> on 2020/05/14 10:47:20 UTC

[GitHub] [pulsar] baynes opened a new issue #6961: Borderline too big messages hangs topic

baynes opened a new issue #6961:
URL: https://github.com/apache/pulsar/issues/6961


   **Describe the bug**
   
   If I send a message <= 5242781 bytes then it is handled OK.
   
   If I send a message >= 5242881 bytes then it is rejected as MessageTooBig.
   
   If I send a message between 5242782 and 5242880 bytes then the Pulsar broker starts continually using 100% of a CPU, the message does not appear on the topic and further attempts to send messages to that topic time out.
   
   **To Reproduce**
   My test code looks like this:
   ```
   import pulsar
   
   client = pulsar.Client("pulsar://localhost:6650")
   producer = client.create_producer('persistent://public/default/large')
   
   
   #for i in range(5243000,5242770,-1):
   for i in range(5242770,5243000):
       print( "%d" % i )
       try:
           producer.send( b"." * i)
           #def callback(res, msg):
               #print('Message {} published: {}'.format( msg, str(res) ) )
           #producer.send_async( b"." * i, callback )
       except Exception as e:
           print( "Failed {}".format(e))
   
   client.close()
   
   ```
   Run the above code and when it reaches the magic value the broker will keep using CPU and following sends wait and then timeout.
   
   You can swap the two loop statements to approach the problem from the other size. 
   
   I also experimented with send_async - this shows the same behavior though not so obvious and also seems to cause the broker to use unlimited memory if you keep tying to send.
   
   **Expected behavior**
   A clear and concise description of what you expected to happen.
   
   **Screenshots**
   
   Output from above code:
   ```
   $ python3 tools/producer.py
   2020-05-14 11:41:22.077 INFO  ConnectionPool:85 | Created connection for pulsar://localhost:6650
   2020-05-14 11:41:22.079 INFO  ClientConnection:330 | [[::1]:34916 -> [::1]:6650] Connected to broker
   2020-05-14 11:41:22.099 INFO  HandlerBase:53 | [persistent://public/default/large, ] Getting connection from pool
   2020-05-14 11:41:22.194 INFO  ProducerImpl:151 | [persistent://public/default/large, ] Created producer on broker [[::1]:34916 -> [::1]:6650] 
   5242770
   5242771
   5242772
   5242773
   5242774
   5242775
   5242776
   5242777
   5242778
   5242779
   5242780
   5242781
   5242782
   Failed Pulsar error: TimeOut
   5242783
   ```
   I killed the process at that point.
   
   **Desktop (please complete the following information):**
    - OS: Centos 7
   
   **Additional context**
   Using Python pulsar-client 2.5.1 installed with pip3 with Python 3.6.8 and Pulsar 2.5.0 broker.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] jiazhai commented on issue #6961: Borderline too big messages hangs topic

Posted by GitBox <gi...@apache.org>.
jiazhai commented on issue #6961:
URL: https://github.com/apache/pulsar/issues/6961#issuecomment-629900774


   Hi @baynes What is the broker side log shows? 
   In broker.conf, there is an item: 
   ```
   # Max size of messages.
   maxMessageSize=5242880
   ```
   Usually you need to config it to support large size.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] baynes commented on issue #6961: Borderline too big messages hangs topic

Posted by GitBox <gi...@apache.org>.
baynes commented on issue #6961:
URL: https://github.com/apache/pulsar/issues/6961#issuecomment-638052311


   > 
   > 
   > Hi @baynes What is the broker side log shows?
   
   There is an awful lot in the log. This bit keeps repeating and looks relevant. [This is with pulsar 2.5.0]
   
   ```
   09:36:34.692 [bookie-io-1-1] INFO  org.apache.bookkeeper.proto.BookieRequestHandler - Channel connected  [id: 0x55d7b04e, L:/127.0.0.1:3181 - R:/127.0.0.1:52372]
   09:36:34.692 [bookie-io-1-1] ERROR org.apache.bookkeeper.proto.BookieRequestHandler - Unhandled exception occurred in I/O thread or handler
   io.netty.handler.codec.TooLongFrameException: Adjusted frame length exceeds 5242880: 5242881 - discarded
   	at io.netty.handler.codec.LengthFieldBasedFrameDecoder.fail(LengthFieldBasedFrameDecoder.java:513) ~[io.netty-netty-codec-4.1.43.Final.jar:4.1.43.Final]
   	at io.netty.handler.codec.LengthFieldBasedFrameDecoder.failIfNecessary(LengthFieldBasedFrameDecoder.java:491) ~[io.netty-netty-codec-4.1.43.Final.jar:4.1.43.Final]
   	at io.netty.handler.codec.LengthFieldBasedFrameDecoder.exceededFrameLength(LengthFieldBasedFrameDecoder.java:378) ~[io.netty-netty-codec-4.1.43.Final.jar:4.1.43.Final]
   	at io.netty.handler.codec.LengthFieldBasedFrameDecoder.decode(LengthFieldBasedFrameDecoder.java:421) ~[io.netty-netty-codec-4.1.43.Final.jar:4.1.43.Final]
   	at io.netty.handler.codec.LengthFieldBasedFrameDecoder.decode(LengthFieldBasedFrameDecoder.java:334) ~[io.netty-netty-codec-4.1.43.Final.jar:4.1.43.Final]
   	at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:503) ~[io.netty-netty-codec-4.1.43.Final.jar:4.1.43.Final]
   	at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:442) ~[io.netty-netty-codec-4.1.43.Final.jar:4.1.43.Final]
   	at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:281) ~[io.netty-netty-codec-4.1.43.Final.jar:4.1.43.Final]
   	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374) [io.netty-netty-transport-4.1.43.Final.jar:4.1.43.Final]
   	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360) [io.netty-netty-transport-4.1.43.Final.jar:4.1.43.Final]
   	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:352) [io.netty-netty-transport-4.1.43.Final.jar:4.1.43.Final]
   	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1422) [io.netty-netty-transport-4.1.43.Final.jar:4.1.43.Final]
   	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374) [io.netty-netty-transport-4.1.43.Final.jar:4.1.43.Final]
   	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360) [io.netty-netty-transport-4.1.43.Final.jar:4.1.43.Final]
   	at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:931) [io.netty-netty-transport-4.1.43.Final.jar:4.1.43.Final]
   	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163) [io.netty-netty-transport-4.1.43.Final.jar:4.1.43.Final]
   	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:700) [io.netty-netty-transport-4.1.43.Final.jar:4.1.43.Final]
   	at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:635) [io.netty-netty-transport-4.1.43.Final.jar:4.1.43.Final]
   	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:552) [io.netty-netty-transport-4.1.43.Final.jar:4.1.43.Final]
   	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:514) [io.netty-netty-transport-4.1.43.Final.jar:4.1.43.Final]
   	at io.netty.util.concurrent.SingleThreadEventExecutor$6.run(SingleThreadEventExecutor.java:1050) [io.netty-netty-common-4.1.43.Final.jar:4.1.43.Final]
   	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [io.netty-netty-common-4.1.43.Final.jar:4.1.43.Final]
   	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [io.netty-netty-common-4.1.43.Final.jar:4.1.43.Final]
   	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_252]
   09:36:34.693 [bookie-io-1-1] INFO  org.apache.bookkeeper.proto.BookieRequestHandler - Channels disconnected: [id: 0x55d7b04e, L:/127.0.0.1:3181 ! R:/127.0.0.1:52372]
   09:36:34.693 [bookkeeper-io-41-7] INFO  org.apache.bookkeeper.proto.PerChannelBookieClient - Disconnected from bookie channel [id: 0x7da0db02, L:/127.0.0.1:52372 ! R:/127.0.0.1:3181]
   09:36:34.693 [Thread-1] INFO  org.eclipse.jetty.server.handler.ContextHandler - Stopped o.e.j.s.ServletContextHandler@4607cbe2{/metrics,null,UNAVAILABLE}
   09:36:34.695 [BookKeeperClientWorker-OrderedExecutor-0-0] WARN  org.apache.bookkeeper.client.PendingAddOp - Failed to write entry (696, 0): Bookie handle is not available
   09:36:34.695 [BookKeeperClientWorker-OrderedExecutor-0-0] WARN  org.apache.bookkeeper.client.RackawareEnsemblePlacementPolicyImpl - Failed to find 1 bookies : excludeBookies [<Bookie:127.0.0.1:3181>], allBookies [<Bookie:127.0.0.1:3181>].
   09:36:34.695 [BookKeeperClientWorker-OrderedExecutor-0-0] WARN  org.apache.bookkeeper.client.RackawareEnsemblePlacementPolicyImpl - Failed to choose a bookie: excluded [<Bookie:127.0.0.1:3181>], fallback to choose bookie randomly from the cluster.
   09:36:34.695 [BookKeeperClientWorker-OrderedExecutor-0-0] WARN  org.apache.bookkeeper.client.RackawareEnsemblePlacementPolicyImpl - Failed to find 1 bookies : excludeBookies [<Bookie:127.0.0.1:3181>], allBookies [<Bookie:127.0.0.1:3181>].
   09:36:34.695 [BookKeeperClientWorker-OrderedExecutor-0-0] WARN  org.apache.bookkeeper.client.RackawareEnsemblePlacementPolicyImpl - Failed to find 1 bookies : excludeBookies [<Bookie:127.0.0.1:3181>], allBookies [<Bookie:127.0.0.1:3181>].
   09:36:34.695 [BookKeeperClientWorker-OrderedExecutor-0-0] WARN  org.apache.bookkeeper.client.RackawareEnsemblePlacementPolicyImpl - Failed to choose a bookie: excluded [<Bookie:127.0.0.1:3181>], fallback to choose bookie randomly from the cluster.
   09:36:34.695 [BookKeeperClientWorker-OrderedExecutor-0-0] WARN  org.apache.bookkeeper.client.RackawareEnsemblePlacementPolicyImpl - Failed to find 1 bookies : excludeBookies [<Bookie:127.0.0.1:3181>], allBookies [<Bookie:127.0.0.1:3181>].
   09:36:34.695 [BookKeeperClientWorker-OrderedExecutor-0-0] ERROR org.apache.bookkeeper.client.MetadataUpdateLoop - UpdateLoop(ledgerId=696,loopId=70cd2f85) Exception updating
   org.apache.bookkeeper.client.BKException$BKNotEnoughBookiesException: Not enough non-faulty bookies available
   	at org.apache.bookkeeper.client.RackawareEnsemblePlacementPolicyImpl.selectRandomInternal(RackawareEnsemblePlacementPolicyImpl.java:772) ~[org.apache.bookkeeper-bookkeeper-server-4.10.0.jar:4.10.0]
   	at org.apache.bookkeeper.client.RackawareEnsemblePlacementPolicyImpl.selectRandom(RackawareEnsemblePlacementPolicyImpl.java:690) ~[org.apache.bookkeeper-bookkeeper-server-4.10.0.jar:4.10.0]
   	at org.apache.bookkeeper.client.RackawareEnsemblePlacementPolicyImpl.selectFromNetworkLocation(RackawareEnsemblePlacementPolicyImpl.java:579) ~[org.apache.bookkeeper-bookkeeper-server-4.10.0.jar:4.10.0]
   	at org.apache.bookkeeper.client.RackawareEnsemblePlacementPolicy.selectFromNetworkLocation(RackawareEnsemblePlacementPolicy.java:203) ~[org.apache.bookkeeper-bookkeeper-server-4.10.0.jar:4.10.0]
   	at org.apache.bookkeeper.client.RackawareEnsemblePlacementPolicyImpl.selectFromNetworkLocation(RackawareEnsemblePlacementPolicyImpl.java:539) ~[org.apache.bookkeeper-bookkeeper-server-4.10.0.jar:4.10.0]
   	at org.apache.bookkeeper.client.RackawareEnsemblePlacementPolicy.selectFromNetworkLocation(RackawareEnsemblePlacementPolicy.java:224) ~[org.apache.bookkeeper-bookkeeper-server-4.10.0.jar:4.10.0]
   	at org.apache.bookkeeper.client.RackawareEnsemblePlacementPolicyImpl.replaceBookie(RackawareEnsemblePlacementPolicyImpl.java:467) ~[org.apache.bookkeeper-bookkeeper-server-4.10.0.jar:4.10.0]
   	at org.apache.bookkeeper.client.RackawareEnsemblePlacementPolicy.replaceBookie(RackawareEnsemblePlacementPolicy.java:117) ~[org.apache.bookkeeper-bookkeeper-server-4.10.0.jar:4.10.0]
   	at org.apache.bookkeeper.client.BookieWatcherImpl.replaceBookie(BookieWatcherImpl.java:306) ~[org.apache.bookkeeper-bookkeeper-server-4.10.0.jar:4.10.0]
   	at org.apache.bookkeeper.client.EnsembleUtils.replaceBookiesInEnsemble(EnsembleUtils.java:71) ~[org.apache.bookkeeper-bookkeeper-server-4.10.0.jar:4.10.0]
   	at org.apache.bookkeeper.client.LedgerHandle.lambda$ensembleChangeLoop$2(LedgerHandle.java:1920) ~[org.apache.bookkeeper-bookkeeper-server-4.10.0.jar:4.10.0]
   	at org.apache.bookkeeper.client.MetadataUpdateLoop.writeLoop(MetadataUpdateLoop.java:122) ~[org.apache.bookkeeper-bookkeeper-server-4.10.0.jar:4.10.0]
   	at org.apache.bookkeeper.client.MetadataUpdateLoop.run(MetadataUpdateLoop.java:111) ~[org.apache.bookkeeper-bookkeeper-server-4.10.0.jar:4.10.0]
   	at org.apache.bookkeeper.client.LedgerHandle.ensembleChangeLoop(LedgerHandle.java:1939) ~[org.apache.bookkeeper-bookkeeper-server-4.10.0.jar:4.10.0]
   	at org.apache.bookkeeper.client.LedgerHandle.handleBookieFailure(LedgerHandle.java:1888) ~[org.apache.bookkeeper-bookkeeper-server-4.10.0.jar:4.10.0]
   	at org.apache.bookkeeper.client.PendingAddOp.writeComplete(PendingAddOp.java:377) ~[org.apache.bookkeeper-bookkeeper-server-4.10.0.jar:4.10.0]
   	at org.apache.bookkeeper.proto.PerChannelBookieClient$AddCompletion.writeComplete(PerChannelBookieClient.java:2116) ~[org.apache.bookkeeper-bookkeeper-server-4.10.0.jar:4.10.0]
   	at org.apache.bookkeeper.proto.PerChannelBookieClient$AddCompletion.lambda$errorOut$0(PerChannelBookieClient.java:2139) ~[org.apache.bookkeeper-bookkeeper-server-4.10.0.jar:4.10.0]
   	at org.apache.bookkeeper.proto.PerChannelBookieClient$CompletionValue$1.safeRun(PerChannelBookieClient.java:1638) [org.apache.bookkeeper-bookkeeper-server-4.10.0.jar:4.10.0]
   	at org.apache.bookkeeper.common.util.SafeRunnable.run(SafeRunnable.java:36) [org.apache.bookkeeper-bookkeeper-common-4.10.0.jar:4.10.0]
   	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_252]
   	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_252]
   	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [io.netty-netty-common-4.1.43.Final.jar:4.1.43.Final]
   	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_252]
   09:36:34.696 [BookKeeperClientWorker-OrderedExecutor-0-0] WARN  org.apache.bookkeeper.client.LedgerHandle - [EnsembleChange(ledger:696, change-id:0000000001)][attempt:1] Exception changing ensemble
   org.apache.bookkeeper.client.BKException$BKNotEnoughBookiesException: Not enough non-faulty bookies available
   	at org.apache.bookkeeper.client.RackawareEnsemblePlacementPolicyImpl.selectRandomInternal(RackawareEnsemblePlacementPolicyImpl.java:772) ~[org.apache.bookkeeper-bookkeeper-server-4.10.0.jar:4.10.0]
   	at org.apache.bookkeeper.client.RackawareEnsemblePlacementPolicyImpl.selectRandom(RackawareEnsemblePlacementPolicyImpl.java:690) ~[org.apache.bookkeeper-bookkeeper-server-4.10.0.jar:4.10.0]
   	at org.apache.bookkeeper.client.RackawareEnsemblePlacementPolicyImpl.selectFromNetworkLocation(RackawareEnsemblePlacementPolicyImpl.java:579) ~[org.apache.bookkeeper-bookkeeper-server-4.10.0.jar:4.10.0]
   	at org.apache.bookkeeper.client.RackawareEnsemblePlacementPolicy.selectFromNetworkLocation(RackawareEnsemblePlacementPolicy.java:203) ~[org.apache.bookkeeper-bookkeeper-server-4.10.0.jar:4.10.0]
   	at org.apache.bookkeeper.client.RackawareEnsemblePlacementPolicyImpl.selectFromNetworkLocation(RackawareEnsemblePlacementPolicyImpl.java:539) ~[org.apache.bookkeeper-bookkeeper-server-4.10.0.jar:4.10.0]
   	at org.apache.bookkeeper.client.RackawareEnsemblePlacementPolicy.selectFromNetworkLocation(RackawareEnsemblePlacementPolicy.java:224) ~[org.apache.bookkeeper-bookkeeper-server-4.10.0.jar:4.10.0]
   	at org.apache.bookkeeper.client.RackawareEnsemblePlacementPolicyImpl.replaceBookie(RackawareEnsemblePlacementPolicyImpl.java:467) ~[org.apache.bookkeeper-bookkeeper-server-4.10.0.jar:4.10.0]
   	at org.apache.bookkeeper.client.RackawareEnsemblePlacementPolicy.replaceBookie(RackawareEnsemblePlacementPolicy.java:117) ~[org.apache.bookkeeper-bookkeeper-server-4.10.0.jar:4.10.0]
   	at org.apache.bookkeeper.client.BookieWatcherImpl.replaceBookie(BookieWatcherImpl.java:306) ~[org.apache.bookkeeper-bookkeeper-server-4.10.0.jar:4.10.0]
   	at org.apache.bookkeeper.client.EnsembleUtils.replaceBookiesInEnsemble(EnsembleUtils.java:71) ~[org.apache.bookkeeper-bookkeeper-server-4.10.0.jar:4.10.0]
   	at org.apache.bookkeeper.client.LedgerHandle.lambda$ensembleChangeLoop$2(LedgerHandle.java:1920) ~[org.apache.bookkeeper-bookkeeper-server-4.10.0.jar:4.10.0]
   	at org.apache.bookkeeper.client.MetadataUpdateLoop.writeLoop(MetadataUpdateLoop.java:122) ~[org.apache.bookkeeper-bookkeeper-server-4.10.0.jar:4.10.0]
   	at org.apache.bookkeeper.client.MetadataUpdateLoop.run(MetadataUpdateLoop.java:111) ~[org.apache.bookkeeper-bookkeeper-server-4.10.0.jar:4.10.0]
   	at org.apache.bookkeeper.client.LedgerHandle.ensembleChangeLoop(LedgerHandle.java:1939) ~[org.apache.bookkeeper-bookkeeper-server-4.10.0.jar:4.10.0]
   	at org.apache.bookkeeper.client.LedgerHandle.handleBookieFailure(LedgerHandle.java:1888) ~[org.apache.bookkeeper-bookkeeper-server-4.10.0.jar:4.10.0]
   	at org.apache.bookkeeper.client.PendingAddOp.writeComplete(PendingAddOp.java:377) ~[org.apache.bookkeeper-bookkeeper-server-4.10.0.jar:4.10.0]
   	at org.apache.bookkeeper.proto.PerChannelBookieClient$AddCompletion.writeComplete(PerChannelBookieClient.java:2116) ~[org.apache.bookkeeper-bookkeeper-server-4.10.0.jar:4.10.0]
   	at org.apache.bookkeeper.proto.PerChannelBookieClient$AddCompletion.lambda$errorOut$0(PerChannelBookieClient.java:2139) ~[org.apache.bookkeeper-bookkeeper-server-4.10.0.jar:4.10.0]
   	at org.apache.bookkeeper.proto.PerChannelBookieClient$CompletionValue$1.safeRun(PerChannelBookieClient.java:1638) ~[org.apache.bookkeeper-bookkeeper-server-4.10.0.jar:4.10.0]
   	at org.apache.bookkeeper.common.util.SafeRunnable.run(SafeRunnable.java:36) ~[org.apache.bookkeeper-bookkeeper-common-4.10.0.jar:4.10.0]
   	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_252]
   	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_252]
   	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [io.netty-netty-common-4.1.43.Final.jar:4.1.43.Final]
   	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_252]
   09:36:34.696 [BookKeeperClientWorker-OrderedExecutor-0-0] ERROR org.apache.bookkeeper.client.LedgerHandle - Closing ledger 696 due to NotEnoughBookiesException: Not enough non-faulty bookies available
   09:36:34.696 [BookKeeperClientWorker-OrderedExecutor-0-0] ERROR org.apache.bookkeeper.client.PendingAddOp - Write of ledger entry to quorum failed: L696 E0
   0
   
   ```
   
   
   
   > In broker.conf, there is an item:
   > 
   > ```
   > # Max size of messages.
   > maxMessageSize=5242880
   > ```
   > 
   > Usually you need to config it to support large size.
   
   The default limit is quite large enough. for our needs.  We are happy for exceptional messages to be rejected. We just don't want something exceptional taking the system out completely.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org