Posted to commits@pulsar.apache.org by GitBox <gi...@apache.org> on 2021/02/04 14:04:40 UTC

[GitHub] [pulsar] xiaotongwang1 opened a new issue #9484: pulsar performance help

xiaotongwang1 opened a new issue #9484:
URL: https://github.com/apache/pulsar/issues/9484


   ### VM Server config
   
   1. 9 brokers, each 16C/128G
   2. 9 bookies, each 16C/128G with 9 SSD disks of 500G: data1, data2, and data3 for the journal; data4 through data9 for ledgers
   3. 5 ZooKeeper nodes, each 8C/32G
   
   ### TPS Now
   
   1. 1 topic with 500 partitions
   2. 1500 producer threads
   3. 1 subscription name, with 50 Pulsar consumers created at startup
   4. Message size is 100
   5. Producer TPS is 457K; average time is 9 ms, maximum time is over 500 ms
   6. Consumer TPS is 16916, and the client logs warnings such as the following:
   `2021-02-04 21:55:09,611 [pulsar-client-io-2-1] WARN  (org.apache.pulsar.client.impl.BinaryProtoLookupService:189) - [persistent://MusicUserService/MusicUserTopicService/dmq.zqq.test.500p] failed to get Partitioned metadata : 50689 lookup request timedout after ms 30000
   org.apache.pulsar.client.api.PulsarClientException$TimeoutException: 50689 lookup request timedout after ms 30000
   	at org.apache.pulsar.client.impl.ClientCnx.lambda$addPendingLookupRequests$8(ClientCnx.java:559) ~[pulsar-client-original-2.7.0.jar:2.7.0]
   	at io.netty.util.concurrent.PromiseTask.runTask(PromiseTask.java:98) [netty-common-4.1.47.Final.jar:4.1.47.Final]
   	at io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:170) [netty-common-4.1.47.Final.jar:4.1.47.Final]
   	at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164) [netty-common-4.1.47.Final.jar:4.1.47.Final]
   	at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472) [netty-common-4.1.47.Final.jar:4.1.47.Final]
   	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:497) [netty-transport-4.1.47.Final.jar:4.1.47.Final]
   	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989) [netty-common-4.1.47.Final.jar:4.1.47.Final]
   	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [netty-common-4.1.47.Final.jar:4.1.47.Final]
   	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [netty-common-4.1.47.Final.jar:4.1.47.Final]
   	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_262]
   2021-02-04 21:55:09,612 [pulsar-client-io-2-1] WARN  (org.apache.pulsar.client.impl.BinaryProtoLookupService:189) - [persistent://MusicUserService/MusicUserTopicService/dmq.zqq.test.500p] failed to get Partitioned metadata : 50698 lookup request timedout after ms 30000
   org.apache.pulsar.client.api.PulsarClientException$TimeoutException: 50698 lookup request timedout after ms 30000
   	at org.apache.pulsar.client.impl.ClientCnx.lambda$addPendingLookupRequests$8(ClientCnx.java:559) ~[pulsar-client-original-2.7.0.jar:2.7.0]
   	at io.netty.util.concurrent.PromiseTask.runTask(PromiseTask.java:98) [netty-common-4.1.47.Final.jar:4.1.47.Final]
   	at io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:170) [netty-common-4.1.47.Final.jar:4.1.47.Final]
   	at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164) [netty-common-4.1.47.Final.jar:4.1.47.Final]
   	at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472) [netty-common-4.1.47.Final.jar:4.1.47.Final]
   	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:497) [netty-transport-4.1.47.Final.jar:4.1.47.Final]
   	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989) [netty-common-4.1.47.Final.jar:4.1.47.Final]
   	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [netty-common-4.1.47.Final.jar:4.1.47.Final]
   	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [netty-common-4.1.47.Final.jar:4.1.47.Final]
   	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_262]
   2021-02-04 21:55:09,712 [pulsar-external-listener-3-1] WARN  (org.apache.pulsar.client.impl.PulsarClientImpl:720) - [topic: persistent://MusicUserService/MusicUserTopicService/dmq.zqq.test.500p] Could not get connection while getPartitionedTopicMetadata -- Will try again in 100 ms
   2021-02-04 21:55:09,713 [pulsar-external-listener-3-1] WARN  (org.apache.pulsar.client.impl.PulsarClientImpl:720) - [topic: persistent://MusicUserService/MusicUserTopicService/dmq.zqq.test.500p] Could not get connection while getPartitionedTopicMetadata -- Will try again in 100 ms
   2021-02-04 21:55:30,242 [pulsar-client-io-2-1] WARN  (org.apache.pulsar.client.impl.BinaryProtoLookupService:189) - [persistent://MusicUserService/MusicUserTopicService/dmq.zqq.test.500p] failed to get Partitioned metadata : 50707 lookup request timedout after ms 30000
   org.apache.pulsar.client.api.PulsarClientException$TimeoutException: 50707 lookup request timedout after ms 30000
   	at org.apache.pulsar.client.impl.ClientCnx.lambda$addPendingLookupRequests$8(ClientCnx.java:559) ~[pulsar-client-original-2.7.0.jar:2.7.0]
   	at io.netty.util.concurrent.PromiseTask.runTask(PromiseTask.java:98) [netty-common-4.1.47.Final.jar:4.1.47.Final]
   	at io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:170) [netty-common-4.1.47.Final.jar:4.1.47.Final]
   	at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164) [netty-common-4.1.47.Final.jar:4.1.47.Final]
   	at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472) [netty-common-4.1.47.Final.jar:4.1.47.Final]
   	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:497) [netty-transport-4.1.47.Final.jar:4.1.47.Final]
   	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989) [netty-common-4.1.47.Final.jar:4.1.47.Final]
   	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [netty-common-4.1.47.Final.jar:4.1.47.Final]
   	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [netty-common-4.1.47.Final.jar:4.1.47.Final]
   	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_262]
   2021-02-04 21:55:30,432 [pulsar-external-listener-3-1] WARN  (org.apache.pulsar.client.impl.PulsarClientImpl:720) - [topic: persistent://MusicUserService/MusicUserTopicService/dmq.zqq.test.500p] Could not get connection while getPartitionedTopicMetadata -- Will try again in 188 ms
   2021-02-04 21:55:59,778 [pulsar-client-io-2-1] WARN  (org.apache.pulsar.client.impl.BinaryProtoLookupService:189) - [persistent://MusicUserService/MusicUserTopicService/dmq.zqq.test.500p] failed to get Partitioned metadata : 50716 lookup request timedout after ms 30000
   org.apache.pulsar.client.api.PulsarClientException$TimeoutException: 50716 lookup request timedout after ms 30000
   	at org.apache.pulsar.client.impl.ClientCnx.lambda$addPendingLookupRequests$8(ClientCnx.java:559) ~[pulsar-client-original-2.7.0.jar:2.7.0]
   	at io.netty.util.concurrent.PromiseTask.runTask(PromiseTask.java:98) [netty-common-4.1.47.Final.jar:4.1.47.Final]
   	at io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:170) [netty-common-4.1.47.Final.jar:4.1.47.Final]
   	at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164) [netty-common-4.1.47.Final.jar:4.1.47.Final]
   	at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472) [netty-common-4.1.47.Final.jar:4.1.47.Final]
   	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:497) [netty-transport-4.1.47.Final.jar:4.1.47.Final]
   	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989) [netty-common-4.1.47.Final.jar:4.1.47.Final]
   	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [netty-common-4.1.47.Final.jar:4.1.47.Final]
   	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [netty-common-4.1.47.Final.jar:4.1.47.Final]
   	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_262]
   2021-02-04 21:55:59,879 [pulsar-external-listener-3-1] WARN  (org.apache.pulsar.client.impl.PulsarClientImpl:720) - [topic: persistent://MusicUserService/MusicUserTopicService/dmq.zqq.test.500p] Could not get connection while getPartitionedTopicMetadata -- Will try again in 100 ms
   `
   
   
   ![image](https://user-images.githubusercontent.com/6071261/106903055-7becaa00-6734-11eb-8b34-c267bba161c8.png)
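
As a rough sanity check on the figures above, the sketch below converts them into bandwidth and per-partition rates. It assumes that the message size of 100 means 100 bytes and that load is spread evenly over partitions and bookies. The write quorum of 3 comes from `managedLedgerDefaultWriteQuorum` in the broker.conf in this post. All variable names are illustrative.

```python
# Figures quoted in the post. The byte unit for the message size is an assumption.
producer_msgs_per_sec = 457_000
consumer_msgs_per_sec = 16_916
partitions = 500
message_size_bytes = 100   # ASSUMPTION: "message size is 100" means 100 bytes
write_quorum = 3           # managedLedgerDefaultWriteQuorum in broker.conf
bookies = 9

# Payload bandwidth published by all producers together.
publish_mb_per_sec = producer_msgs_per_sec * message_size_bytes / 1_000_000

# Average publish rate per partition, assuming an even spread.
per_partition_rate = producer_msgs_per_sec / partitions

# Each message is stored write_quorum times; assume an even spread over bookies.
bookie_mb_per_sec = publish_mb_per_sec * write_quorum / bookies

# How fast the backlog grows while consumers lag behind producers.
backlog_growth_per_sec = producer_msgs_per_sec - consumer_msgs_per_sec
consume_ratio = consumer_msgs_per_sec / producer_msgs_per_sec

print(f"publish payload:      {publish_mb_per_sec:.1f} MB/s")
print(f"per-partition rate:   {per_partition_rate:.0f} msg/s")
print(f"per-bookie payload:   {bookie_mb_per_sec:.1f} MB/s")
print(f"backlog growth:       {backlog_growth_per_sec} msg/s")
print(f"consumed / published: {consume_ratio:.1%}")
```

Under these assumptions the consumers drain only a few percent of what is published, so the backlog keeps growing for as long as the test runs.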
   
   
   ### Broker JVM config 
   
   Some JVM properties:
   
   -Djute.maxbuffer=10485760 
   -Djava.net.preferIPv4Stack=true 
   -Dpulsar.allocator.exit_on_oom=true 
   -Dio.netty.recycler.maxCapacity.default=1000
   -Dio.netty.recycler.linkCapacity=1024 
   -Xms10g 
   -Xmx10g 
   -XX:MaxDirectMemorySize=20g 
   -XX:+UseG1GC 
   -XX:MaxGCPauseMillis=10 
   -XX:+ParallelRefProcEnabled 
   -XX:+UnlockExperimentalVMOptions 
   -XX:+DoEscapeAnalysis 
   -XX:ParallelGCThreads=32 
   -XX:ConcGCThreads=32 
   -XX:G1NewSizePercent=50 
   -XX:+DisableExplicitGC 
   -XX:-ResizePLAB
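
To put the broker flags above next to the VM size, here is a small sketch that extracts the heap and direct-memory limits and the GC thread count from a flag string. It assumes that "16C 128G" means 16 cores and 128 GB of memory per broker VM. The helper `size_gb` is hypothetical and handles only the `g` suffix used in this post.

```python
import re

# A subset of the broker JVM flags quoted in the post.
broker_flags = (
    "-Xms10g -Xmx10g -XX:MaxDirectMemorySize=20g "
    "-XX:ParallelGCThreads=32 -XX:ConcGCThreads=32"
)
host_memory_gb = 128  # ASSUMPTION: "128G" is the VM memory in GB
host_cores = 16       # ASSUMPTION: "16C" is the number of cores


def size_gb(prefix, flags):
    """Return the integer GB value of a flag such as -Xmx10g (only 'g' is handled)."""
    match = re.search(re.escape(prefix) + r"(\d+)g\b", flags)
    if match is None:
        raise ValueError(f"flag {prefix} not found")
    return int(match.group(1))


heap_gb = size_gb("-Xmx", broker_flags)
direct_gb = size_gb("-XX:MaxDirectMemorySize=", broker_flags)
gc_threads = int(re.search(r"-XX:ParallelGCThreads=(\d+)", broker_flags).group(1))

# Heap plus direct memory is the upper bound the JVM is allowed to claim here.
jvm_budget_gb = heap_gb + direct_gb

print(f"heap + direct = {jvm_budget_gb} GB of {host_memory_gb} GB")
print(f"parallel GC threads = {gc_threads}, cores = {host_cores}")
```

With these inputs the broker JVM may use at most 30 GB of the VM's memory, and the parallel GC thread count is twice the assumed core count. Whether either matters for this test is not something the post's data can confirm.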
   
   
   
   ### broker.conf
   
   
   `clusterName=dmq2-performance-test
   superUserRoles=101052529,pulsarAdmin
   brokerClientAuthenticationParameters={"credential":"pulsarAdmin", "secret":*********","appid":"101052529","appsecret":"********"}
   bookkeeperNumberOfChannelsPerBookie=64
   limitPrometheusClientIps=127.0.0.1,10.31.4.61
   maxMessageSize=20971520
   dispatcherMaxReadSizeBytes=20971520
   systemTopicEnabled=false
   topicLevelPoliciesEnabled=false
   zookeeperServers=10.33.141.111:2281,10.33.141.45:2281,10.33.141.138:2281,10.33.141.149:2281,10.33.141.240:2281
   globalZookeeperServers=
   configurationStoreServers=10.33.141.111:2281,10.33.141.45:2281,10.33.141.138:2281,10.33.141.149:2281,10.33.141.240:2281
   brokerServicePort=6650
   brokerServicePortTls=
   webServicePort=8080
   webServicePortTls=8443
   bindAddress=0.0.0.0
   advertisedAddress=ip
   keepAliveIntervalSeconds=30
   brokerDeduplicationEnabled=false
   managedLedgerDefaultEnsembleSize=3
   managedLedgerDefaultWriteQuorum=3
   managedLedgerDefaultAckQuorum=2
   managedLedgerNumWorkerThreads=8
   managedLedgerNumSchedulerThreads=8
   defaultRetentionTimeInMinutes=10080
   defaultRetentionSizeInMB=0
   failureDomainsEnabled=false
   bookkeeperClientTimeoutInSeconds=30
   zooKeeperSessionTimeoutMillis=30000
   zooKeeperOperationTimeoutSeconds=30
   zooKeeperCacheExpirySeconds=300
   bookkeeperClientRackawarePolicyEnabled=true
   bookkeeperClientRegionawarePolicyEnabled=false
   exposeTopicLevelMetricsInPrometheus=true
   exposeConsumerLevelMetricsInPrometheus=false
   exposePublisherStats=true
   statsUpdateFrequencyInSecs=60
   statsUpdateInitialDelayInSecs=60
   exposePreciseBacklogInPrometheus=false
   brokerShutdownTimeoutMs=60000
   skipBrokerShutdownOnOOM=false
   backlogQuotaCheckEnabled=true
   backlogQuotaCheckIntervalInSeconds=60
   backlogQuotaDefaultLimitGB=-1
   backlogQuotaDefaultRetentionPolicy=producer_exception
   ttlDurationDefaultInSeconds=604800
   allowAutoTopicCreation=false
   allowAutoTopicCreationType=partitioned
   allowAutoSubscriptionCreation=false
   defaultNumPartitions=1
   brokerDeleteInactiveTopicsEnabled=false
   brokerDeleteInactiveTopicsFrequencySeconds=60
   brokerDeleteInactiveTopicsMode=delete_when_no_subscriptions
   messageExpiryCheckIntervalInMinutes=5
   activeConsumerFailoverDelayTimeMillis=1000
   subscriptionExpirationTimeMinutes=0
   subscriptionRedeliveryTrackerEnabled=true
   subscriptionExpiryCheckIntervalInMinutes=5
   subscriptionKeySharedEnable=true
   subscriptionKeySharedUseConsistentHashing=false
   subscriptionKeySharedConsistentHashingReplicaPoints=100
   brokerDeduplicationMaxNumberOfProducers=10000
   brokerDeduplicationEntriesInterval=1000
   brokerDeduplicationProducerInactivityTimeoutMinutes=360
   defaultNumberOfNamespaceBundles=4
   clientLibraryVersionCheckEnabled=false
   preferLaterVersions=false
   maxUnackedMessagesPerConsumer=50000
   maxUnackedMessagesPerSubscription=200000
   maxUnackedMessagesPerBroker=0
   maxUnackedMessagesPerSubscriptionOnBrokerBlocked=0.16
   topicPublisherThrottlingTickTimeMillis=10
   brokerPublisherThrottlingTickTimeMillis=50
   brokerPublisherThrottlingMaxMessageRate=0
   brokerPublisherThrottlingMaxByteRate=0
   subscribeThrottlingRatePerConsumer=0
   subscribeRatePeriodPerConsumerInSecond=30
   dispatchThrottlingRatePerTopicInMsg=0
   dispatchThrottlingRatePerTopicInByte=0
   dispatchThrottlingRatePerSubscriptionInMsg=0
   dispatchThrottlingRatePerSubscriptionInByte=0
   dispatchThrottlingRatePerReplicatorInMsg=0
   dispatchThrottlingRatePerReplicatorInByte=0
   dispatchThrottlingRateRelativeToPublishRate=false
   dispatchThrottlingOnNonBacklogConsumerEnabled=true
   dispatcherMaxReadBatchSize=100
   dispatcherMinReadBatchSize=1
   dispatcherMaxRoundRobinBatchSize=20
   preciseDispatcherFlowControl=false
   maxConcurrentLookupRequest=50000
   maxConcurrentTopicLoadRequest=5000
   maxConcurrentNonPersistentMessagePerConnection=1000
   numWorkerThreadsForNonPersistentTopic=8
   enablePersistentTopics=true
   enableNonPersistentTopics=true
   enableRunBookieTogether=false
   enableRunBookieAutoRecoveryTogether=false
   maxProducersPerTopic=0
   maxConsumersPerTopic=0
   maxConsumersPerSubscription=0
   brokerServiceCompactionMonitorIntervalInSeconds=60
   delayedDeliveryEnabled=true
   delayedDeliveryTickTimeMillis=1000
   acknowledgmentAtBatchIndexLevelEnabled=false
   enableReplicatedSubscriptions=true
   replicatedSubscriptionsSnapshotFrequencyMillis=1000
   replicatedSubscriptionsSnapshotTimeoutSeconds=30
   replicatedSubscriptionsSnapshotMaxCachedPerSubscription=10
   messagePublishBufferCheckIntervalInMillis=100
   retentionCheckIntervalInSeconds=120
   maxNumPartitionsPerPartitionedTopic=0
   zookeeperSessionExpiredPolicy=shutdown
   authenticateOriginalAuthData=false
   tlsEnabled=false
   tlsCertRefreshCheckDurationSec=300
   
   authenticationEnabled=true
   authenticationProviders=com.huawei.dmq2.security.dmq.broker.server.AuthenticationProviderSCRAM
   authenticationRefreshCheckSeconds=60
   authorizationEnabled=true
   authorizationProvider=org.apache.pulsar.broker.authorization.PulsarAuthorizationProvider
   authorizationAllowWildcardsMatching=false
   brokerClientTlsEnabled=false
   brokerClientAuthenticationPlugin=com.huawei.dmq2.security.dmq.broker.client.AuthenticationSCRAM
   saslJaasClientAllowedIds=.*
   saslJaasBrokerSectionName=PulsarBroker
   httpMaxRequestSize=-1
   bookkeeperMetadataServiceUri=
   bookkeeperClientAuthenticationPlugin=com.huawei.dmq2.security.dmq.bookie.client.SASLClientProviderFactory
   bookkeeperClientSpeculativeReadTimeoutInMillis=0
   bookkeeperUseV2WireProtocol=true
   bookkeeperClientHealthCheckEnabled=true
   bookkeeperClientHealthCheckIntervalSeconds=60
   bookkeeperClientHealthCheckErrorThresholdPerInterval=5
   bookkeeperClientHealthCheckQuarantineTimeInSeconds=1800
   bookkeeperGetBookieInfoIntervalSeconds=86400
   bookkeeperGetBookieInfoRetryIntervalSeconds=60
   bookkeeperClientReorderReadSequenceEnabled=false
   
   bookkeeperEnableStickyReads=false
   bookkeeperDiskWeightBasedPlacementEnabled=false
   bookkeeperExplicitLacIntervalInMills=0
   managedLedgerDigestType=CRC32C
   managedLedgerCacheCopyEntries=false
   managedLedgerCacheEvictionWatermark=0.9
   managedLedgerCacheEvictionFrequency=100.0
   managedLedgerCacheEvictionTimeThresholdMillis=1000
   managedLedgerCursorBackloggedThreshold=1000
   managedLedgerDefaultMarkDeleteRateLimit=1.0
   managedLedgerMaxEntriesPerLedger=50000
   managedLedgerMinLedgerRolloverTimeMinutes=10
   managedLedgerMaxLedgerRolloverTimeMinutes=240
   managedLedgerMaxSizePerLedgerMbytes=2048
   managedLedgerOffloadDeletionLagMs=14400000
   managedLedgerOffloadAutoTriggerSizeThresholdBytes=-1
   managedLedgerCursorMaxEntriesPerLedger=50000
   managedLedgerCursorRolloverTimeInSeconds=14400
   managedLedgerMaxUnackedRangesToPersist=10000
   managedLedgerMaxUnackedRangesToPersistInZooKeeper=1000
   autoSkipNonRecoverableData=false
   managedLedgerMetadataOperationsTimeoutSeconds=60
   managedLedgerReadEntryTimeoutSeconds=0
   managedLedgerAddEntryTimeoutSeconds=0
   managedLedgerPrometheusStatsLatencyRolloverSeconds=60
   managedLedgerTraceTaskExecution=true
   managedLedgerNewEntriesCheckDelayInMillis=10
   loadBalancerEnabled=true
   loadBalancerReportUpdateThresholdPercentage=10
   loadBalancerReportUpdateMaxIntervalMinutes=15
   loadBalancerHostUsageCheckIntervalMinutes=1
   loadBalancerSheddingEnabled=true
   loadBalancerSheddingIntervalMinutes=1
   loadBalancerSheddingGracePeriodMinutes=30
   loadBalancerBrokerMaxTopics=50000
   loadBalancerBrokerOverloadedThresholdPercentage=85
   loadBalancerResourceQuotaUpdateIntervalMinutes=15
   loadBalancerAutoBundleSplitEnabled=true
   loadBalancerAutoUnloadSplitBundlesEnabled=true
   loadBalancerNamespaceBundleMaxTopics=1000
   loadBalancerNamespaceBundleMaxSessions=1000
   loadBalancerNamespaceBundleMaxMsgRate=30000
   loadBalancerNamespaceBundleMaxBandwidthMbytes=100
   loadBalancerNamespaceMaximumBundles=128
   loadBalancerOverrideBrokerNicSpeedGbps=
   loadManagerClassName=org.apache.pulsar.broker.loadbalance.impl.ModularLoadManagerImpl
   supportedNamespaceBundleSplitAlgorithms=range_equally_divide,topic_count_equally_divide
   defaultNamespaceBundleSplitAlgorithm=range_equally_divide
   loadBalancerLoadSheddingStrategy=org.apache.pulsar.broker.loadbalance.impl.OverloadShedder
   loadBalancerBrokerThresholdShedderPercentage=10
   loadBalancerHistoryResourcePercentage=0.9
   loadBalancerBandwithInResourceWeight=1.0
   loadBalancerBandwithOutResourceWeight=1.0
   loadBalancerCPUResourceWeight=1.0
   loadBalancerMemoryResourceWeight=1.0
   loadBalancerDirectMemoryResourceWeight=1.0
   loadBalancerBundleUnloadMinThroughputThreshold=10
   replicationMetricsEnabled=true
   replicationConnectionsPerBroker=16
   replicationProducerQueueSize=1000
   replicatorPrefix=pulsar.repl
   replicatioPolicyCheckDurationSeconds=600
   bootstrapNamespaces=
   webSocketServiceEnabled=false
   webSocketNumIoThreads=8
   webSocketConnectionsPerBroker=8
   webSocketSessionIdleTimeoutMillis=300000
   webSocketMaxTextFrameSize=1048576
   functionsWorkerEnabled=false
   schemaRegistryStorageClassName=org.apache.pulsar.broker.service.schema.BookkeeperSchemaStorageFactory
   isSchemaValidationEnforced=false
   managedLedgerOffloadDriver=
   managedLedgerOffloadMaxThreads=2
   managedLedgerOffloadPrefetchRounds=1
   managedLedgerUnackedRangesOpenCacheSetEnabled=true
   `
   
   ### broker ERROR OutOfDirectMemoryError
   
   ce MusicUserService/MusicUserTopicService has too many bundles: 128
   2021-02-04 21:46:18.563 [pulsar-modular-load-manager-37-1] WARN  o.a.pulsar.broker.loadbalance.BundleSplitStrategy - Could not split namespace bundle MusicUserService/MusicUserTopicService/0x70000000_0x80000000 because namespace MusicUserService/MusicUserTopicService has too many bundles: 128
   2021-02-04 21:46:18.563 [pulsar-modular-load-manager-37-1] WARN  o.a.pulsar.broker.loadbalance.BundleSplitStrategy - Could not split namespace bundle MusicUserService/MusicUserTopicService/0x00000000_0x10000000 because namespace MusicUserService/MusicUserTopicService has too many bundles: 128
   2021-02-04 21:46:18.563 [pulsar-modular-load-manager-37-1] WARN  o.a.pulsar.broker.loadbalance.BundleSplitStrategy - Could not split namespace bundle MusicUserService/MusicUserTopicService/0x18000000_0x20000000 because namespace MusicUserService/MusicUserTopicService has too many bundles: 128
   2021-02-04 21:46:18.564 [pulsar-modular-load-manager-37-1] WARN  o.a.pulsar.broker.loadbalance.BundleSplitStrategy - Could not split namespace bundle MusicUserService/MusicUserTopicService/0xa0000000_0xc0000000 because namespace MusicUserService/MusicUserTopicService has too many bundles: 128
   2021-02-04 21:46:18.564 [pulsar-modular-load-manager-37-1] WARN  o.a.pulsar.broker.loadbalance.BundleSplitStrategy - Could not split namespace bundle MusicUserService/MusicUserTopicService/0x90000000_0xa0000000 because namespace MusicUserService/MusicUserTopicService has too many bundles: 128
   2021-02-04 21:46:18.564 [pulsar-modular-load-manager-37-1] WARN  o.a.pulsar.broker.loadbalance.BundleSplitStrategy - Could not split namespace bundle MusicUserService/MusicUserTopicService/0xc0000000_0xdfffffff because namespace MusicUserService/MusicUserTopicService has too many bundles: 128
   2021-02-04 21:46:18.564 [pulsar-modular-load-manager-37-1] WARN  o.a.pulsar.broker.loadbalance.BundleSplitStrategy - Could not split namespace bundle MusicUserService/MusicUserTopicService/0xdfffffff_0xefffffff because namespace MusicUserService/MusicUserTopicService has too many bundles: 128
   2021-02-04 21:46:18.564 [pulsar-modular-load-manager-37-1] WARN  o.a.pulsar.broker.loadbalance.BundleSplitStrategy - Could not split namespace bundle MusicUserService/MusicUserTopicService/0x20000000_0x40000000 because namespace MusicUserService/MusicUserTopicService has too many bundles: 128
   2021-02-04 21:46:18.564 [pulsar-modular-load-manager-37-1] WARN  o.a.pulsar.broker.loadbalance.BundleSplitStrategy - Could not split namespace bundle MusicUserService/MusicUserTopicService/0x40000000_0x60000000 because namespace MusicUserService/MusicUserTopicService has too many bundles: 128
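
The bundles named in these warnings cover very different slices of the hash range. The sketch below estimates each slice and the publish rate that would land on it. It assumes a 32-bit hash space, topics that hash uniformly, and the 457K msg/s producer rate spread evenly. It compares the estimate with `loadBalancerNamespaceBundleMaxMsgRate=30000` from the broker.conf in this post. Only publish traffic is counted, so the estimates are lower bounds.

```python
# Bundle ranges copied from the "Could not split namespace bundle" warnings.
bundle_ranges = [
    "0x70000000_0x80000000",
    "0x00000000_0x10000000",
    "0x18000000_0x20000000",
    "0xa0000000_0xc0000000",
    "0x90000000_0xa0000000",
    "0xc0000000_0xdfffffff",
    "0xdfffffff_0xefffffff",
    "0x20000000_0x40000000",
    "0x40000000_0x60000000",
]

HASH_SPACE = 2 ** 32        # ASSUMPTION: bundles partition a 32-bit hash range
producer_rate = 457_000     # msg/s, from the post
max_bundle_msg_rate = 30_000  # loadBalancerNamespaceBundleMaxMsgRate


def share(bundle):
    """Fraction of the hash space covered by a 'low_high' bundle range."""
    low, high = (int(bound, 16) for bound in bundle.split("_"))
    return (high - low) / HASH_SPACE


# Bundles whose estimated publish rate alone exceeds the split threshold.
over_threshold = []
for bundle in bundle_ranges:
    estimated_rate = share(bundle) * producer_rate
    if estimated_rate > max_bundle_msg_rate:
        over_threshold.append(bundle)
    print(f"{bundle}: {share(bundle):.4f} of hash space, ~{estimated_rate:.0f} msg/s")

print(f"{len(over_threshold)} of {len(bundle_ranges)} bundles exceed the threshold")
```

For comparison, an even 128-way split would give each bundle 1/128 of the range, which is much narrower than any bundle in these warnings.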
   
   
   ### Bookie JVM config 
   
   -Xms20g -Xmx20g -XX:MaxDirectMemorySize=40g -XX:+UseG1GC -XX:MaxGCPauseMillis=10 -XX:+ParallelRefProcEnabled -XX:+UnlockExperimentalVMOptions -XX:+DoEscapeAnalysis -XX:ParallelGCThreads=32 -XX:ConcGCThreads=32 -XX:G1NewSizePercent=50 -XX:+DisableExplicitGC -XX:-ResizePLAB -Dio.netty.leakDetectionLevel=disabled -Dio.netty.recycler.maxCapacity.default=1000 -Dio.netty.recycler.linkCapacity=1024 
   
   ### Bookie config
   bookiePort=3181
   journalDirectory=/opt/huawei/data1/journal,/opt/huawei/data2/journal,/opt/huawei/data3/journal
   ledgerDirectories=/opt/huawei/data4/ledgers,/opt/huawei/data5/ledgers,/opt/huawei/data6/ledgers,/opt/huawei/data7/ledgers,/opt/huawei/data8/ledgers,/opt/huawei/data9/ledgers
   zkServers=10.33.141.111:2281,10.33.141.45:2281,10.33.141.138:2281,10.33.141.149:2281,10.33.141.240:2281
   zkTimeout=60000
   zkEnableSecurity=true
   journalSyncData=true
   statsProviderClass=com.huawei.dmq2.security.dmq.bookie.metrics.PrometheusMetricsProvider
   prometheusStatsHttpHost=0.0.0.0
   prometheusStatsHttpPort=8000
   dbStorage_writeCacheMaxSizeMb=30000
   dbStorage_readAheadCacheMaxSizeMb=
   dbStorage_readAheadCacheBatchSize=1000
   dbStorage_rocksDB_blockCacheSize=
   dbStorage_rocksDB_writeBufferSizeMB=64
   dbStorage_rocksDB_sstSizeInMB=64
   dbStorage_rocksDB_blockSize=65536
   dbStorage_rocksDB_bloomFilterBitsPerKey=10
   dbStorage_rocksDB_numLevels=-1
   dbStorage_rocksDB_numFilesInLevel0=4
   dbStorage_rocksDB_maxSizeInLevel1MB=256
   ledgerStorageClass=org.apache.bookkeeper.bookie.storage.ldb.DbLedgerStorage
   minUsableSizeForIndexFileCreation=1073741824
   advertisedAddress=
   allowLoopback=false
   bookieDeathWatchInterval=1000
   flushInterval=60000
   useHostNameAsBookieID=false
   bookieAuthProviderFactoryClass=com.huawei.dmq2.security.dmq.bookie.server.SASLBookieAuthProviderFactory
   clientAuthProviderFactoryClass=com.huawei.dmq2.security.dmq.bookie.client.SASLClientProviderFactory
   gcWaitTime=900000
   gcOverreplicatedLedgerWaitTime=86400000
   numAddWorkerThreads=0
   numReadWorkerThreads=8
   numHighPriorityWorkerThreads=8
   maxPendingReadRequestsPerThread=2500
   maxPendingAddRequestsPerThread=10000
   auditorPeriodicBookieCheckInterval=86400
   rereplicationEntryBatchSize=100
   openLedgerRereplicationGracePeriod=30000
   autoRecoveryDaemonEnabled=true
   lostBookieRecoveryDelay=0
   serverTcpNoDelay=true
   nettyMaxFrameSizeBytes=5253120
   journalMaxSizeMB=2048
   journalMaxBackups=5
   journalPreAllocSizeMB=16
   journalWriteBufferSizeKB=64
   journalRemoveFromPageCache=true
   journalAdaptiveGroupWrites=true
   journalMaxGroupWaitMSec=1
   journalBufferedWritesThreshold=524288
   numJournalCallbackThreads=8
   journalAlignmentSize=4096
   journalFlushWhenQueueEmpty=false
   auditorPeriodicCheckInterval=604800
   openFileLimit=0
   pageLimit=0
   zkLedgersRootPath=/ledgers
   logSizeLimit=1073741824
   entryLogFilePreallocationEnabled=true
   flushEntrylogBytes=268435456
   readBufferSizeBytes=4096
   writeBufferSizeBytes=65536
   compactionRate=1000
   minorCompactionThreshold=0.2
   minorCompactionInterval=3600
   compactionMaxOutstandingRequests=100000
   majorCompactionThreshold=0.5
   majorCompactionInterval=86400
   isThrottleByBytes=false
   compactionRateByEntries=1000
   compactionRateByBytes=1000000
   readOnlyModeEnabled=true
   diskUsageThreshold=0.95
   diskCheckInterval=10000
   httpServerEnabled=false
   httpServerPort=8000
   httpServerClass=org.apache.bookkeeper.http.vertx.VertxHttpServer
   
   
   ### Bookie ERROR OutOfDirectMemoryError
   
   
   2021-02-04 21:47:38.066 [bookie-io-1-28] ERROR org.apache.bookkeeper.proto.BookieRequestHandler - Unhandled exception occurred in I/O thread or handler on [id: 0x6a9cf1ad, L:/10.33.141.145:3181 - R:/10.33.141.26:55134]
   io.netty.util.internal.OutOfDirectMemoryError: failed to allocate 16777216 byte(s) of direct memory (used: 42949672956, max: 42949672960)
   	at io.netty.util.internal.PlatformDependent.incrementMemoryCounter(PlatformDependent.java:754)
   	at io.netty.util.internal.PlatformDependent.allocateDirectNoCleaner(PlatformDependent.java:709)
   	at io.netty.buffer.PoolArena$DirectArena.allocateDirect(PoolArena.java:755)
   	at io.netty.buffer.PoolArena$DirectArena.newChunk(PoolArena.java:731)
   	at io.netty.buffer.PoolArena.allocateNormal(PoolArena.java:247)
   	at io.netty.buffer.PoolArena.allocate(PoolArena.java:227)
   	at io.netty.buffer.PoolArena.reallocate(PoolArena.java:394)
   	at io.netty.buffer.PooledByteBuf.capacity(PooledByteBuf.java:118)
   	at io.netty.buffer.AbstractByteBuf.ensureWritable0(AbstractByteBuf.java:306)
   	at io.netty.buffer.AbstractByteBuf.ensureWritable(AbstractByteBuf.java:282)
   	at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:1104)
   	at io.netty.handler.codec.ByteToMessageDecoder$1.cumulate(ByteToMessageDecoder.java:99)
   	at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:274)
   	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
   	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
   	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
   	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
   	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
   	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
   	at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
   	at io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:792)
   	at io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe$1.run(AbstractEpollChannel.java:387)
   	at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
   	at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)
   	at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:384)
   	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
   	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
   	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
   	at java.lang.Thread.run(Thread.java:748)
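
The numbers in this error can be compared with the bookie settings above. The sketch below is a rough budget, not a diagnosis. It assumes that the DbLedgerStorage write cache and read-ahead cache are taken from the same direct-memory limit, and that the empty `dbStorage_readAheadCacheMaxSizeMb` falls back to 25% of direct memory. Both assumptions should be checked against the BookKeeper version in use.

```python
MIB = 1024 * 1024

# Values copied from the OutOfDirectMemoryError message.
max_direct_bytes = 42_949_672_960   # "max"
used_direct_bytes = 42_949_672_956  # "used"
failed_alloc_bytes = 16_777_216     # size of the allocation that failed

# Value copied from the bookie config in the post.
write_cache_mb = 30_000             # dbStorage_writeCacheMaxSizeMb

max_direct_mb = max_direct_bytes / MIB

# ASSUMPTION: an empty dbStorage_readAheadCacheMaxSizeMb means 25% of direct memory.
read_ahead_cache_mb = 0.25 * max_direct_mb

# Memory left for Netty I/O buffers if both caches are fully allocated.
cache_total_mb = write_cache_mb + read_ahead_cache_mb
headroom_mb = max_direct_mb - cache_total_mb

print(f"direct memory limit: {max_direct_mb:.0f} MB")
print(f"caches (assumed):    {cache_total_mb:.0f} MB")
print(f"headroom for I/O:    {headroom_mb:.0f} MB")
print(f"free at failure:     {max_direct_bytes - used_direct_bytes} bytes")
print(f"failed allocation:   {failed_alloc_bytes // MIB} MB")
```

If the assumptions hold, the two caches alone account for almost all of the 40g limit set by `-XX:MaxDirectMemorySize`, which would leave little room for network buffers under load.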
   
   
   ### ZooKeeper JVM config
   -Xmx1524M -Xms1524M -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35 -XX:+ExplicitGCInvokesConcurrent -XX:MaxInlineLevel=15 
   
   
   ### ZooKeeper config
   
   dataDir=/opt/huawei/data1/zookeeperdata
   clientPort=2181
   secureClientPort=2281
   maxClientCnxns=100
   tickTime=2000
   initLimit=10
   syncLimit=5
   autopurge.snapRetainCount=3
   autopurge.purgeInterval=1
   
   
   authProvider.1=org.apache.zookeeper.server.auth.SASLAuthenticationProvider
   requireClientAuthScheme=sasl
   jaasLoginRenew=3600000
   
   admin.enableServer=false
   
   quorum.auth.enableSasl=true
   quorum.auth.learnerRequireSasl=true
   quorum.auth.serverRequireSasl=true
   quorum.auth.learner.loginContext=QuorumLearner
   quorum.auth.server.loginContext=QuorumServer
   quorum.cnxn.threads.size=20
   
   4lw.commands.whitelist==stat,ruok,mntr,stat
   
   forceSync=yes
   
   clientPortAddress=127.0.0.1
   secureClientPortAddress=10.33.141.138
   server.1=10.33.141.111:2888:3888
   server.2=10.33.141.45:2888:3888
   server.3=10.33.141.138:2888:3888
   server.4=10.33.141.149:2888:3888
   server.5=10.33.141.240:2888:3888
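
One detail that follows from `tickTime=2000`: when `minSessionTimeout` and `maxSessionTimeout` are not set, ZooKeeper bounds client session timeouts to between 2 and 20 ticks. The sketch below applies that rule to the timeouts requested by the broker and bookie configs in this post. The function name `negotiated_timeout_ms` is illustrative.

```python
tick_time_ms = 2000  # tickTime from the ZooKeeper config above

# ZooKeeper defaults when minSessionTimeout / maxSessionTimeout are not configured.
min_session_ms = 2 * tick_time_ms
max_session_ms = 20 * tick_time_ms


def negotiated_timeout_ms(requested_ms):
    """Clamp a client's requested session timeout into the server's allowed range."""
    return max(min_session_ms, min(requested_ms, max_session_ms))


# Session timeouts requested by the configs quoted in the post.
requested = {
    "broker zooKeeperSessionTimeoutMillis": 30_000,
    "bookie zkTimeout": 60_000,
}

for name, value in requested.items():
    print(f"{name}: requested {value} ms, negotiated {negotiated_timeout_ms(value)} ms")
```

So the bookie's `zkTimeout=60000` would be reduced by the server unless `maxSessionTimeout` is raised, while the broker's 30000 ms request fits inside the default range.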
   
   
   





[GitHub] [pulsar] xiaotongwang1 commented on issue #9484: pulsar performance help

Posted by GitBox <gi...@apache.org>.
xiaotongwang1 commented on issue #9484:
URL: https://github.com/apache/pulsar/issues/9484#issuecomment-773761385


   [dmq@host-10-33-141-93 arthas]$ jmap -histo:live 20135|head -n 100
   
    num     #instances         #bytes  class name
   ----------------------------------------------
      1:          2728     2908059496  [J
      2:         10303       35648096  [B
      3:         12900       13121224  [Ljava.lang.Object;
      4:         55838        5813320  [C
      5:          5226        3428256  io.netty.util.internal.shaded.org.jctools.queues.MpscArrayQueue
      6:          1525        3106944  [D
      7:           967        2387056  [Lio.netty.util.Recycler$DefaultHandle;
      8:          5336        1508352  [I
      9:         55382        1329168  java.lang.String
     10:          5337         854992  [Ljava.util.HashMap$Node;
     11:          7141         803024  java.lang.Class
     12:         20049         641568  java.util.HashMap$Node
     13:         12251         392032  java.util.concurrent.ConcurrentHashMap$Node
     14:          5515         308840  org.apache.bookkeeper.client.LedgerFragment
     15:          7630         305200  java.util.LinkedHashMap$Entry
     16:          4117         296424  java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask
     17:          9211         294752  io.netty.util.Recycler$DefaultHandle
     18:           162         290336  [Lio.netty.buffer.PoolSubpage;
     19:          5516         264768  java.util.HashMap
     20:          8274         264768  java.util.Hashtable$Entry
     21:          2137         188056  java.lang.reflect.Method
     22:          2786         178304  io.netty.buffer.PoolSubpage
     23:          5131         164192  io.netty.buffer.PoolThreadCache$MemoryRegionCache$Entry
     24:          2885         161560  io.netty.channel.DefaultChannelHandlerContext
     25:          9819         157104  java.lang.Object
     26:          4824         154368  io.netty.buffer.PoolThreadCache$SubPageMemoryRegionCache
     27:          3304         132160  org.apache.bookkeeper.bookie.storage.ldb.DbLedgerStorageDataFormats$LedgerData
     28:          5433         130392  java.util.jar.Attributes$Name
     29:           301         106320  [Ljava.util.concurrent.ConcurrentHashMap$Node;
     30:          1300         104000  org.apache.bookkeeper.proto.PerChannelBookieClient$AddCompletion
     31:          1518          97152  com.yahoo.sketches.quantiles.HeapDoublesSketch
     32:          1519          85064  java.security.Provider$Service
     33:          3364          80736  com.google.protobuf.ByteString$LiteralByteString
     34:           136          77248  io.netty.util.internal.shaded.org.jctools.queues.MpscUnboundedArrayQueue
     35:          3133          73544  [Ljava.lang.Class;
     36:           570          72136  [Ljava.lang.String;
     37:          1000          72000  java.lang.reflect.Field
     38:          4457          71312  java.util.jar.Attributes
     39:          2971          71304  java.security.Provider$ServiceKey
     40:          1669          66760  java.util.WeakHashMap$Entry
     41:            60          64688  [Ljava.util.Hashtable$Entry;
     42:            27          64440  [Ljava.util.concurrent.RunnableScheduledFuture;
     43:            29          62008  [Ljava.nio.ByteBuffer;
     44:           859          54976  java.util.concurrent.ConcurrentHashMap
     45:           680          54400  java.lang.reflect.Constructor
     46:          2048          49152  io.netty.util.HashedWheelTimer$HashedWheelBucket
     47:           768          49152  org.apache.bookkeeper.util.collections.ConcurrentLongLongPairHashMap$Section
     48:           501          48032  [Ljava.util.WeakHashMap$Entry;
     49:           122          46848  io.netty.util.concurrent.FastThreadLocalThread
     50:           363          46464  io.netty.channel.epoll.EpollSocketChannel
     51:           722          46208  sun.security.provider.SHA2$SHA256
     52:          1366          43712  org.apache.bookkeeper.proto.PerChannelBookieClient$V3CompletionKey
     53:          1300          41600  org.apache.bookkeeper.client.LedgerFragmentReplicator$2
     54:          1277          40864  sun.security.util.ObjectIdentifier
     55:          2458          39328  java.util.concurrent.atomic.AtomicBoolean
     56:           694          38864  java.lang.invoke.MemberName
     57:          1204          38528  java.util.concurrent.atomic.LongAdder
     58:           931          37240  java.lang.ref.SoftReference
     59:          1530          36720  java.lang.Long
     60:           553          35392  java.net.URL
     61:          2210          35360  java.util.concurrent.atomic.AtomicInteger
     62:           873          34920  java.lang.ref.Finalizer
     63:           122          34160  java.util.concurrent.atomic.Striped64$Cell
     64:           518          33152  io.netty.util.Recycler$Stack
     65:           690          33120  java.util.concurrent.locks.StampedLock
     66:          1326          31824  java.util.ArrayList
     67:           361          31768  io.netty.handler.codec.LengthFieldBasedFrameDecoder
     68:           773          30920  java.math.BigInteger
     69:           519          29064  java.lang.Class$ReflectionData
     70:           436          27904  io.netty.channel.ChannelOutboundBuffer$Entry
     71:           402          27872  [Lio.netty.buffer.PoolThreadCache$MemoryRegionCache;
     72:           659          26360  java.lang.invoke.MethodType
     73:           821          26272  sun.security.util.DerInputBuffer
     74:           821          26272  sun.security.util.DerValue
     75:           364          26208  io.netty.channel.AdaptiveRecvByteBufAllocator$HandleImpl
     76:           396          25344  org.apache.bookkeeper.util.collections.ConcurrentLongLongHashMap$Section
     77:           450          25200  io.netty.util.Recycler$WeakOrderQueue
     78:           768          24576  io.netty.handler.codec.CodecOutputList
     79:           336          24192  org.apache.bookkeeper.util.collections.ConcurrentLongHashMap$Section
     80:           501          24048  java.util.WeakHashMap
     81:           742          23744  java.net.InetAddress$InetAddressHolder
     82:           367          23488  java.security.SecureRandom
     83:           244          23424  java.util.jar.JarFile$JarFileEntry
     84:           364          23296  io.netty.channel.ChannelOutboundBuffer
     85:           364          23296  io.netty.channel.DefaultChannelPipeline$HeadContext
     86:           363          23232  io.netty.channel.epoll.EpollSocketChannelConfig
     87:           725          23200  java.security.MessageDigest$Delegate
     88:           552          22080  java.util.TreeMap$Entry
     89:           663          21216  java.lang.invoke.MethodType$ConcurrentWeakInternSet$WeakEntry
     90:           260          20992  [Ljava.lang.ThreadLocal$ThreadLocalMap$Entry;
     91:           863          20712  java.util.LinkedList$Node
     92:           369          20664  sun.nio.cs.UTF_8$Encoder
     93:           364          20384  io.netty.channel.DefaultChannelPipeline$TailContext
     94:           363          20328  io.netty.channel.epoll.EpollSocketChannel$EpollSocketChannelUnsafe
     95:           499          19960  java.util.concurrent.locks.StampedLock$WNode
     96:           311          19904  org.apache.bookkeeper.bookie.Journal$QueueEntry
     97:           821          19704  sun.security.util.DerInputStream
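
   The histogram above is dominated by its first row: `[J` is the JVM descriptor for `long[]`, and those arrays account for roughly 2.9 GB of the live heap. A minimal sketch for ranking such output by bytes follows. It assumes the `num: #instances #bytes class name` layout shown above; the helper names `parse_histo` and `top_by_bytes` are hypothetical, not part of any JDK or Pulsar tool.

```python
import re

# One histogram row: "  1:   2728   2908059496  [J"
ROW = re.compile(r"^\s*(\d+):\s+(\d+)\s+(\d+)\s+(\S+)")


def parse_histo(text):
    """Return (class_name, instances, bytes) for each data row; skip headers."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            rows.append((m.group(4), int(m.group(2)), int(m.group(3))))
    return rows


def top_by_bytes(rows, n=3):
    """Return the n largest classes as (class_name, bytes, share_of_parsed_total)."""
    total = sum(size for _, _, size in rows)
    ranked = sorted(rows, key=lambda r: r[2], reverse=True)[:n]
    return [(name, size, size / total) for name, _, size in ranked]


# First four rows copied from the jmap output above.
sample = """\
   1:          2728     2908059496  [J
   2:         10303       35648096  [B
   3:         12900       13121224  [Ljava.lang.Object;
   4:         55838        5813320  [C
"""

for name, size, share in top_by_bytes(parse_histo(sample)):
    print(f"{name:30s} {size / 1024**2:10.1f} MiB  {share:6.1%}")
```

   The share is computed only over the rows that were parsed, so with `head -n 100` it slightly overstates each class's share of the whole heap.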
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] frankjkelly commented on issue #9484: pulsar performance help

Posted by GitBox <gi...@apache.org>.
frankjkelly commented on issue #9484:
URL: https://github.com/apache/pulsar/issues/9484#issuecomment-781619315


   I could be misreading this, but if you have `9 broker with 16C 128G`, then that's `128G/9 = 14.2G` per broker, yet you have configured
   ```
   -Xmx10g
   -XX:MaxDirectMemorySize=20g
   ```
   which requires at least 30g per broker. Am I misunderstanding?
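
   The arithmetic in this comment can be checked with a short sketch. It sums only the two flags quoted above, so the result is a lower bound on the JVM footprint (metaspace, thread stacks, and other native memory are not counted). The function names are hypothetical. Both readings of the hardware line are tested: 128G shared across 9 brokers (14.2G each), and 128G per broker, which is how the original post may also be read.

```python
def parse_jvm_size(value):
    """Convert a JVM size string such as '10g' or '512m' to bytes."""
    units = {"k": 1024, "m": 1024**2, "g": 1024**3}
    suffix = value[-1].lower()
    if suffix in units:
        return int(value[:-1]) * units[suffix]
    return int(value)  # plain number of bytes


def jvm_memory_floor(flags):
    """Sum -Xmx and -XX:MaxDirectMemorySize; other flags are ignored."""
    total = 0
    for flag in flags:
        if flag.startswith("-Xmx"):
            total += parse_jvm_size(flag[len("-Xmx"):])
        elif flag.startswith("-XX:MaxDirectMemorySize="):
            total += parse_jvm_size(flag.split("=", 1)[1])
    return total


flags = ["-Xmx10g", "-XX:MaxDirectMemorySize=20g"]
floor_gib = jvm_memory_floor(flags) / 1024**3

print(f"heap + direct memory: {floor_gib:.0f} GiB")
print(f"fits if 128G is shared by 9 brokers: {floor_gib <= 128 / 9}")
print(f"fits if each broker has 128G:        {floor_gib <= 128}")
```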





[GitHub] [pulsar] codelipenghui commented on issue #9484: pulsar performance help

Posted by GitBox <gi...@apache.org>.
codelipenghui commented on issue #9484:
URL: https://github.com/apache/pulsar/issues/9484#issuecomment-1058893479


   The issue has had no activity for 30 days; marking it with the Stale label.





[GitHub] [pulsar] xiaotongwang1 commented on issue #9484: pulsar performance help

Posted by GitBox <gi...@apache.org>.
xiaotongwang1 commented on issue #9484:
URL: https://github.com/apache/pulsar/issues/9484#issuecomment-773732329


   @codelipenghui, can you help check the config and error log? Thanks.

