Posted to commits@pulsar.apache.org by GitBox <gi...@apache.org> on 2021/02/04 14:04:40 UTC
[GitHub] [pulsar] xiaotongwang1 opened a new issue #9484: pulsar performance help
xiaotongwang1 opened a new issue #9484:
URL: https://github.com/apache/pulsar/issues/9484
### VM Server config
1. 9 brokers, each 16 cores / 128 GB RAM
2. 9 bookies, each 16 cores / 128 GB RAM, with 9 SSD disks of 500 GB each (data1-data3 for the journal, data4-data9 for ledgers)
3. 5 ZooKeeper nodes, each 8 cores / 32 GB RAM
### TPS Now
1. 1 topic with 500 partitions
2. 1500 producer threads
3. 1 subscription name, with 50 Pulsar consumers created at startup
4. message size is 100
5. producer TPS is 457K; average latency 9 ms, max 500+ ms
6. consumer TPS is only 16916, with occasional errors in the log:
`2021-02-04 21:55:09,611 [pulsar-client-io-2-1] WARN (org.apache.pulsar.client.impl.BinaryProtoLookupService:189) - [persistent://MusicUserService/MusicUserTopicService/dmq.zqq.test.500p] failed to get Partitioned metadata : 50689 lookup request timedout after ms 30000
org.apache.pulsar.client.api.PulsarClientException$TimeoutException: 50689 lookup request timedout after ms 30000
at org.apache.pulsar.client.impl.ClientCnx.lambda$addPendingLookupRequests$8(ClientCnx.java:559) ~[pulsar-client-original-2.7.0.jar:2.7.0]
at io.netty.util.concurrent.PromiseTask.runTask(PromiseTask.java:98) [netty-common-4.1.47.Final.jar:4.1.47.Final]
at io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:170) [netty-common-4.1.47.Final.jar:4.1.47.Final]
at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164) [netty-common-4.1.47.Final.jar:4.1.47.Final]
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472) [netty-common-4.1.47.Final.jar:4.1.47.Final]
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:497) [netty-transport-4.1.47.Final.jar:4.1.47.Final]
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989) [netty-common-4.1.47.Final.jar:4.1.47.Final]
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [netty-common-4.1.47.Final.jar:4.1.47.Final]
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [netty-common-4.1.47.Final.jar:4.1.47.Final]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_262]
2021-02-04 21:55:09,612 [pulsar-client-io-2-1] WARN (org.apache.pulsar.client.impl.BinaryProtoLookupService:189) - [persistent://MusicUserService/MusicUserTopicService/dmq.zqq.test.500p] failed to get Partitioned metadata : 50698 lookup request timedout after ms 30000
org.apache.pulsar.client.api.PulsarClientException$TimeoutException: 50698 lookup request timedout after ms 30000
at org.apache.pulsar.client.impl.ClientCnx.lambda$addPendingLookupRequests$8(ClientCnx.java:559) ~[pulsar-client-original-2.7.0.jar:2.7.0]
at io.netty.util.concurrent.PromiseTask.runTask(PromiseTask.java:98) [netty-common-4.1.47.Final.jar:4.1.47.Final]
at io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:170) [netty-common-4.1.47.Final.jar:4.1.47.Final]
at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164) [netty-common-4.1.47.Final.jar:4.1.47.Final]
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472) [netty-common-4.1.47.Final.jar:4.1.47.Final]
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:497) [netty-transport-4.1.47.Final.jar:4.1.47.Final]
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989) [netty-common-4.1.47.Final.jar:4.1.47.Final]
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [netty-common-4.1.47.Final.jar:4.1.47.Final]
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [netty-common-4.1.47.Final.jar:4.1.47.Final]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_262]
2021-02-04 21:55:09,712 [pulsar-external-listener-3-1] WARN (org.apache.pulsar.client.impl.PulsarClientImpl:720) - [topic: persistent://MusicUserService/MusicUserTopicService/dmq.zqq.test.500p] Could not get connection while getPartitionedTopicMetadata -- Will try again in 100 ms
2021-02-04 21:55:09,713 [pulsar-external-listener-3-1] WARN (org.apache.pulsar.client.impl.PulsarClientImpl:720) - [topic: persistent://MusicUserService/MusicUserTopicService/dmq.zqq.test.500p] Could not get connection while getPartitionedTopicMetadata -- Will try again in 100 ms
2021-02-04 21:55:30,242 [pulsar-client-io-2-1] WARN (org.apache.pulsar.client.impl.BinaryProtoLookupService:189) - [persistent://MusicUserService/MusicUserTopicService/dmq.zqq.test.500p] failed to get Partitioned metadata : 50707 lookup request timedout after ms 30000
org.apache.pulsar.client.api.PulsarClientException$TimeoutException: 50707 lookup request timedout after ms 30000
at org.apache.pulsar.client.impl.ClientCnx.lambda$addPendingLookupRequests$8(ClientCnx.java:559) ~[pulsar-client-original-2.7.0.jar:2.7.0]
at io.netty.util.concurrent.PromiseTask.runTask(PromiseTask.java:98) [netty-common-4.1.47.Final.jar:4.1.47.Final]
at io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:170) [netty-common-4.1.47.Final.jar:4.1.47.Final]
at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164) [netty-common-4.1.47.Final.jar:4.1.47.Final]
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472) [netty-common-4.1.47.Final.jar:4.1.47.Final]
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:497) [netty-transport-4.1.47.Final.jar:4.1.47.Final]
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989) [netty-common-4.1.47.Final.jar:4.1.47.Final]
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [netty-common-4.1.47.Final.jar:4.1.47.Final]
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [netty-common-4.1.47.Final.jar:4.1.47.Final]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_262]
2021-02-04 21:55:30,432 [pulsar-external-listener-3-1] WARN (org.apache.pulsar.client.impl.PulsarClientImpl:720) - [topic: persistent://MusicUserService/MusicUserTopicService/dmq.zqq.test.500p] Could not get connection while getPartitionedTopicMetadata -- Will try again in 188 ms
2021-02-04 21:55:59,778 [pulsar-client-io-2-1] WARN (org.apache.pulsar.client.impl.BinaryProtoLookupService:189) - [persistent://MusicUserService/MusicUserTopicService/dmq.zqq.test.500p] failed to get Partitioned metadata : 50716 lookup request timedout after ms 30000
org.apache.pulsar.client.api.PulsarClientException$TimeoutException: 50716 lookup request timedout after ms 30000
at org.apache.pulsar.client.impl.ClientCnx.lambda$addPendingLookupRequests$8(ClientCnx.java:559) ~[pulsar-client-original-2.7.0.jar:2.7.0]
at io.netty.util.concurrent.PromiseTask.runTask(PromiseTask.java:98) [netty-common-4.1.47.Final.jar:4.1.47.Final]
at io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:170) [netty-common-4.1.47.Final.jar:4.1.47.Final]
at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164) [netty-common-4.1.47.Final.jar:4.1.47.Final]
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472) [netty-common-4.1.47.Final.jar:4.1.47.Final]
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:497) [netty-transport-4.1.47.Final.jar:4.1.47.Final]
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989) [netty-common-4.1.47.Final.jar:4.1.47.Final]
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [netty-common-4.1.47.Final.jar:4.1.47.Final]
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [netty-common-4.1.47.Final.jar:4.1.47.Final]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_262]
2021-02-04 21:55:59,879 [pulsar-external-listener-3-1] WARN (org.apache.pulsar.client.impl.PulsarClientImpl:720) - [topic: persistent://MusicUserService/MusicUserTopicService/dmq.zqq.test.500p] Could not get connection while getPartitionedTopicMetadata -- Will try again in 100 ms
`
![image](https://user-images.githubusercontent.com/6071261/106903055-7becaa00-6734-11eb-8b34-c267bba161c8.png)
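The lookup timeouts above may simply be a startup burst: with 500 partitions, each producer and consumer performs lookups per partition before it can attach. A rough estimate (an assumption for illustration only: worst case of one lookup per producer thread per partition, with no metadata caching shared between clients) already exceeds the broker-side `maxConcurrentLookupRequest=50000` set in broker.conf below:

```java
public class LookupBurstEstimate {
    public static void main(String[] args) {
        // Figures taken from the report above; the per-client lookup count is
        // an assumption (worst case, no shared partition-metadata cache).
        int partitions = 500;
        int producerThreads = 1500;
        int consumers = 50;
        int maxConcurrentLookupRequest = 50_000; // from broker.conf

        long worstCaseLookups = (long) (producerThreads + consumers) * partitions;
        System.out.println("worst-case lookup burst: " + worstCaseLookups);
        System.out.println("exceeds broker limit: "
                + (worstCaseLookups > maxConcurrentLookupRequest));
    }
}
```

In practice clients that share a `PulsarClient` instance fetch partitioned metadata once, so the real number is lower; but staggering producer startup, or sharing fewer client instances, is a plausible mitigation to test.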
### Broker JVM config
Selected JVM properties:
-Djute.maxbuffer=10485760
-Djava.net.preferIPv4Stack=true
-Dpulsar.allocator.exit_on_oom=true
-Dio.netty.recycler.maxCapacity.default=1000
-Dio.netty.recycler.linkCapacity=1024
-Xms10g
-Xmx10g
-XX:MaxDirectMemorySize=20g
-XX:+UseG1GC
-XX:MaxGCPauseMillis=10
-XX:+ParallelRefProcEnabled
-XX:+UnlockExperimentalVMOptions
-XX:+DoEscapeAnalysis
-XX:ParallelGCThreads=32
-XX:ConcGCThreads=32
-XX:G1NewSizePercent=50
-XX:+DisableExplicitGC
-XX:-ResizePLAB
### broker.conf
`clusterName=dmq2-performance-test
superUserRoles=101052529,pulsarAdmin
brokerClientAuthenticationParameters={"credential":"pulsarAdmin", "secret":*********","appid":"101052529","appsecret":"********"}
bookkeeperNumberOfChannelsPerBookie=64
limitPrometheusClientIps=127.0.0.1,10.31.4.61
maxMessageSize=20971520
dispatcherMaxReadSizeBytes=20971520
systemTopicEnabled=false
topicLevelPoliciesEnabled=false
zookeeperServers=10.33.141.111:2281,10.33.141.45:2281,10.33.141.138:2281,10.33.141.149:2281,10.33.141.240:2281
globalZookeeperServers=
configurationStoreServers=10.33.141.111:2281,10.33.141.45:2281,10.33.141.138:2281,10.33.141.149:2281,10.33.141.240:2281
brokerServicePort=6650
brokerServicePortTls=
webServicePort=8080
webServicePortTls=8443
bindAddress=0.0.0.0
advertisedAddress=ip
keepAliveIntervalSeconds=30
brokerDeduplicationEnabled=false
managedLedgerDefaultEnsembleSize=3
managedLedgerDefaultWriteQuorum=3
managedLedgerDefaultAckQuorum=2
managedLedgerNumWorkerThreads=8
managedLedgerNumSchedulerThreads=8
defaultRetentionTimeInMinutes=10080
defaultRetentionSizeInMB=0
failureDomainsEnabled=false
bookkeeperClientTimeoutInSeconds=30
zooKeeperSessionTimeoutMillis=30000
zooKeeperOperationTimeoutSeconds=30
zooKeeperCacheExpirySeconds=300
bookkeeperClientRackawarePolicyEnabled=true
bookkeeperClientRegionawarePolicyEnabled=false
exposeTopicLevelMetricsInPrometheus=true
exposeConsumerLevelMetricsInPrometheus=false
exposePublisherStats=true
statsUpdateFrequencyInSecs=60
statsUpdateInitialDelayInSecs=60
exposePreciseBacklogInPrometheus=false
brokerShutdownTimeoutMs=60000
skipBrokerShutdownOnOOM=false
backlogQuotaCheckEnabled=true
backlogQuotaCheckIntervalInSeconds=60
backlogQuotaDefaultLimitGB=-1
backlogQuotaDefaultRetentionPolicy=producer_exception
ttlDurationDefaultInSeconds=604800
allowAutoTopicCreation=false
allowAutoTopicCreationType=partitioned
allowAutoSubscriptionCreation=false
defaultNumPartitions=1
brokerDeleteInactiveTopicsEnabled=false
brokerDeleteInactiveTopicsFrequencySeconds=60
brokerDeleteInactiveTopicsMode=delete_when_no_subscriptions
messageExpiryCheckIntervalInMinutes=5
activeConsumerFailoverDelayTimeMillis=1000
subscriptionExpirationTimeMinutes=0
subscriptionRedeliveryTrackerEnabled=true
subscriptionExpiryCheckIntervalInMinutes=5
subscriptionKeySharedEnable=true
subscriptionKeySharedUseConsistentHashing=false
subscriptionKeySharedConsistentHashingReplicaPoints=100
brokerDeduplicationMaxNumberOfProducers=10000
brokerDeduplicationEntriesInterval=1000
brokerDeduplicationProducerInactivityTimeoutMinutes=360
defaultNumberOfNamespaceBundles=4
clientLibraryVersionCheckEnabled=false
preferLaterVersions=false
maxUnackedMessagesPerConsumer=50000
maxUnackedMessagesPerSubscription=200000
maxUnackedMessagesPerBroker=0
maxUnackedMessagesPerSubscriptionOnBrokerBlocked=0.16
topicPublisherThrottlingTickTimeMillis=10
brokerPublisherThrottlingTickTimeMillis=50
brokerPublisherThrottlingMaxMessageRate=0
brokerPublisherThrottlingMaxByteRate=0
subscribeThrottlingRatePerConsumer=0
subscribeRatePeriodPerConsumerInSecond=30
dispatchThrottlingRatePerTopicInMsg=0
dispatchThrottlingRatePerTopicInByte=0
dispatchThrottlingRatePerSubscriptionInMsg=0
dispatchThrottlingRatePerSubscriptionInByte=0
dispatchThrottlingRatePerReplicatorInMsg=0
dispatchThrottlingRatePerReplicatorInByte=0
dispatchThrottlingRateRelativeToPublishRate=false
dispatchThrottlingOnNonBacklogConsumerEnabled=true
dispatcherMaxReadBatchSize=100
dispatcherMinReadBatchSize=1
dispatcherMaxRoundRobinBatchSize=20
preciseDispatcherFlowControl=false
maxConcurrentLookupRequest=50000
maxConcurrentTopicLoadRequest=5000
maxConcurrentNonPersistentMessagePerConnection=1000
numWorkerThreadsForNonPersistentTopic=8
enablePersistentTopics=true
enableNonPersistentTopics=true
enableRunBookieTogether=false
enableRunBookieAutoRecoveryTogether=false
maxProducersPerTopic=0
maxConsumersPerTopic=0
maxConsumersPerSubscription=0
brokerServiceCompactionMonitorIntervalInSeconds=60
delayedDeliveryEnabled=true
delayedDeliveryTickTimeMillis=1000
acknowledgmentAtBatchIndexLevelEnabled=false
enableReplicatedSubscriptions=true
replicatedSubscriptionsSnapshotFrequencyMillis=1000
replicatedSubscriptionsSnapshotTimeoutSeconds=30
replicatedSubscriptionsSnapshotMaxCachedPerSubscription=10
messagePublishBufferCheckIntervalInMillis=100
retentionCheckIntervalInSeconds=120
maxNumPartitionsPerPartitionedTopic=0
zookeeperSessionExpiredPolicy=shutdown
authenticateOriginalAuthData=false
tlsEnabled=false
tlsCertRefreshCheckDurationSec=300
authenticationEnabled=true
authenticationProviders=com.huawei.dmq2.security.dmq.broker.server.AuthenticationProviderSCRAM
authenticationRefreshCheckSeconds=60
authorizationEnabled=true
authorizationProvider=org.apache.pulsar.broker.authorization.PulsarAuthorizationProvider
authorizationAllowWildcardsMatching=false
brokerClientTlsEnabled=false
brokerClientAuthenticationPlugin=com.huawei.dmq2.security.dmq.broker.client.AuthenticationSCRAM
saslJaasClientAllowedIds=.*
saslJaasBrokerSectionName=PulsarBroker
httpMaxRequestSize=-1
bookkeeperMetadataServiceUri=
bookkeeperClientAuthenticationPlugin=com.huawei.dmq2.security.dmq.bookie.client.SASLClientProviderFactory
bookkeeperClientSpeculativeReadTimeoutInMillis=0
bookkeeperUseV2WireProtocol=true
bookkeeperClientHealthCheckEnabled=true
bookkeeperClientHealthCheckIntervalSeconds=60
bookkeeperClientHealthCheckErrorThresholdPerInterval=5
bookkeeperClientHealthCheckQuarantineTimeInSeconds=1800
bookkeeperGetBookieInfoIntervalSeconds=86400
bookkeeperGetBookieInfoRetryIntervalSeconds=60
bookkeeperClientReorderReadSequenceEnabled=false
bookkeeperEnableStickyReads=false
bookkeeperDiskWeightBasedPlacementEnabled=false
bookkeeperExplicitLacIntervalInMills=0
managedLedgerDigestType=CRC32C
managedLedgerCacheCopyEntries=false
managedLedgerCacheEvictionWatermark=0.9
managedLedgerCacheEvictionFrequency=100.0
managedLedgerCacheEvictionTimeThresholdMillis=1000
managedLedgerCursorBackloggedThreshold=1000
managedLedgerDefaultMarkDeleteRateLimit=1.0
managedLedgerMaxEntriesPerLedger=50000
managedLedgerMinLedgerRolloverTimeMinutes=10
managedLedgerMaxLedgerRolloverTimeMinutes=240
managedLedgerMaxSizePerLedgerMbytes=2048
managedLedgerOffloadDeletionLagMs=14400000
managedLedgerOffloadAutoTriggerSizeThresholdBytes=-1
managedLedgerCursorMaxEntriesPerLedger=50000
managedLedgerCursorRolloverTimeInSeconds=14400
managedLedgerMaxUnackedRangesToPersist=10000
managedLedgerMaxUnackedRangesToPersistInZooKeeper=1000
autoSkipNonRecoverableData=false
managedLedgerMetadataOperationsTimeoutSeconds=60
managedLedgerReadEntryTimeoutSeconds=0
managedLedgerAddEntryTimeoutSeconds=0
managedLedgerPrometheusStatsLatencyRolloverSeconds=60
managedLedgerTraceTaskExecution=true
managedLedgerNewEntriesCheckDelayInMillis=10
loadBalancerEnabled=true
loadBalancerReportUpdateThresholdPercentage=10
loadBalancerReportUpdateMaxIntervalMinutes=15
loadBalancerHostUsageCheckIntervalMinutes=1
loadBalancerSheddingEnabled=true
loadBalancerSheddingIntervalMinutes=1
loadBalancerSheddingGracePeriodMinutes=30
loadBalancerBrokerMaxTopics=50000
loadBalancerBrokerOverloadedThresholdPercentage=85
loadBalancerResourceQuotaUpdateIntervalMinutes=15
loadBalancerAutoBundleSplitEnabled=true
loadBalancerAutoUnloadSplitBundlesEnabled=true
loadBalancerNamespaceBundleMaxTopics=1000
loadBalancerNamespaceBundleMaxSessions=1000
loadBalancerNamespaceBundleMaxMsgRate=30000
loadBalancerNamespaceBundleMaxBandwidthMbytes=100
loadBalancerNamespaceMaximumBundles=128
loadBalancerOverrideBrokerNicSpeedGbps=
loadManagerClassName=org.apache.pulsar.broker.loadbalance.impl.ModularLoadManagerImpl
supportedNamespaceBundleSplitAlgorithms=range_equally_divide,topic_count_equally_divide
defaultNamespaceBundleSplitAlgorithm=range_equally_divide
loadBalancerLoadSheddingStrategy=org.apache.pulsar.broker.loadbalance.impl.OverloadShedder
loadBalancerBrokerThresholdShedderPercentage=10
loadBalancerHistoryResourcePercentage=0.9
loadBalancerBandwithInResourceWeight=1.0
loadBalancerBandwithOutResourceWeight=1.0
loadBalancerCPUResourceWeight=1.0
loadBalancerMemoryResourceWeight=1.0
loadBalancerDirectMemoryResourceWeight=1.0
loadBalancerBundleUnloadMinThroughputThreshold=10
replicationMetricsEnabled=true
replicationConnectionsPerBroker=16
replicationProducerQueueSize=1000
replicatorPrefix=pulsar.repl
replicatioPolicyCheckDurationSeconds=600
bootstrapNamespaces=
webSocketServiceEnabled=false
webSocketNumIoThreads=8
webSocketConnectionsPerBroker=8
webSocketSessionIdleTimeoutMillis=300000
webSocketMaxTextFrameSize=1048576
functionsWorkerEnabled=false
schemaRegistryStorageClassName=org.apache.pulsar.broker.service.schema.BookkeeperSchemaStorageFactory
isSchemaValidationEnforced=false
managedLedgerOffloadDriver=
managedLedgerOffloadMaxThreads=2
managedLedgerOffloadPrefetchRounds=1
managedLedgerUnackedRangesOpenCacheSetEnabled=true
`
### broker ERROR OutOfDirectMemoryError
2021-02-04 21:46:18.563 [pulsar-modular-load-manager-37-1] WARN o.a.pulsar.broker.loadbalance.BundleSplitStrategy - Could not split namespace bundle MusicUserService/MusicUserTopicService/0x70000000_0x80000000 because namespace MusicUserService/MusicUserTopicService has too many bundles: 128
2021-02-04 21:46:18.563 [pulsar-modular-load-manager-37-1] WARN o.a.pulsar.broker.loadbalance.BundleSplitStrategy - Could not split namespace bundle MusicUserService/MusicUserTopicService/0x00000000_0x10000000 because namespace MusicUserService/MusicUserTopicService has too many bundles: 128
2021-02-04 21:46:18.563 [pulsar-modular-load-manager-37-1] WARN o.a.pulsar.broker.loadbalance.BundleSplitStrategy - Could not split namespace bundle MusicUserService/MusicUserTopicService/0x18000000_0x20000000 because namespace MusicUserService/MusicUserTopicService has too many bundles: 128
2021-02-04 21:46:18.564 [pulsar-modular-load-manager-37-1] WARN o.a.pulsar.broker.loadbalance.BundleSplitStrategy - Could not split namespace bundle MusicUserService/MusicUserTopicService/0xa0000000_0xc0000000 because namespace MusicUserService/MusicUserTopicService has too many bundles: 128
2021-02-04 21:46:18.564 [pulsar-modular-load-manager-37-1] WARN o.a.pulsar.broker.loadbalance.BundleSplitStrategy - Could not split namespace bundle MusicUserService/MusicUserTopicService/0x90000000_0xa0000000 because namespace MusicUserService/MusicUserTopicService has too many bundles: 128
2021-02-04 21:46:18.564 [pulsar-modular-load-manager-37-1] WARN o.a.pulsar.broker.loadbalance.BundleSplitStrategy - Could not split namespace bundle MusicUserService/MusicUserTopicService/0xc0000000_0xdfffffff because namespace MusicUserService/MusicUserTopicService has too many bundles: 128
2021-02-04 21:46:18.564 [pulsar-modular-load-manager-37-1] WARN o.a.pulsar.broker.loadbalance.BundleSplitStrategy - Could not split namespace bundle MusicUserService/MusicUserTopicService/0xdfffffff_0xefffffff because namespace MusicUserService/MusicUserTopicService has too many bundles: 128
2021-02-04 21:46:18.564 [pulsar-modular-load-manager-37-1] WARN o.a.pulsar.broker.loadbalance.BundleSplitStrategy - Could not split namespace bundle MusicUserService/MusicUserTopicService/0x20000000_0x40000000 because namespace MusicUserService/MusicUserTopicService has too many bundles: 128
2021-02-04 21:46:18.564 [pulsar-modular-load-manager-37-1] WARN o.a.pulsar.broker.loadbalance.BundleSplitStrategy - Could not split namespace bundle MusicUserService/MusicUserTopicService/0x40000000_0x60000000 because namespace MusicUserService/MusicUserTopicService has too many bundles: 128
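These split refusals line up with the configuration above: broker.conf sets `loadBalancerNamespaceMaximumBundles=128`, and the namespace has already reached that ceiling, so the load manager refuses further splits. A minimal sketch of the implied guard (variable names are illustrative, not the actual `BundleSplitStrategy` code):

```java
public class BundleSplitCheck {
    // Sketch of the guard implied by the warnings above; names are illustrative.
    static boolean canSplit(int currentBundles, int maximumBundles) {
        // Splitting one bundle produces two, so refuse at (or above) the ceiling.
        return currentBundles < maximumBundles;
    }

    public static void main(String[] args) {
        int currentBundles = 128; // "has too many bundles: 128"
        int maximumBundles = 128; // loadBalancerNamespaceMaximumBundles
        System.out.println("split allowed: " + canSplit(currentBundles, maximumBundles));
    }
}
```

Raising `loadBalancerNamespaceMaximumBundles`, or creating the namespace with more initial bundles, would silence these warnings, though 128 bundles for a single namespace is already substantial.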
### Bookie JVM config
-Xms20g -Xmx20g -XX:MaxDirectMemorySize=40g -XX:+UseG1GC -XX:MaxGCPauseMillis=10 -XX:+ParallelRefProcEnabled -XX:+UnlockExperimentalVMOptions -XX:+DoEscapeAnalysis -XX:ParallelGCThreads=32 -XX:ConcGCThreads=32 -XX:G1NewSizePercent=50 -XX:+DisableExplicitGC -XX:-ResizePLAB -Dio.netty.leakDetectionLevel=disabled -Dio.netty.recycler.maxCapacity.default=1000 -Dio.netty.recycler.linkCapacity=1024
### Bookie config
bookiePort=3181
journalDirectory=/opt/huawei/data1/journal,/opt/huawei/data2/journal,/opt/huawei/data3/journal
ledgerDirectories=/opt/huawei/data4/ledgers,/opt/huawei/data5/ledgers,/opt/huawei/data6/ledgers,/opt/huawei/data7/ledgers,/opt/huawei/data8/ledgers,/opt/huawei/data9/ledgers
zkServers=10.33.141.111:2281,10.33.141.45:2281,10.33.141.138:2281,10.33.141.149:2281,10.33.141.240:2281
zkTimeout=60000
zkEnableSecurity=true
journalSyncData=true
statsProviderClass=com.huawei.dmq2.security.dmq.bookie.metrics.PrometheusMetricsProvider
prometheusStatsHttpHost=0.0.0.0
prometheusStatsHttpPort=8000
dbStorage_writeCacheMaxSizeMb=30000
dbStorage_readAheadCacheMaxSizeMb=
dbStorage_readAheadCacheBatchSize=1000
dbStorage_rocksDB_blockCacheSize=
dbStorage_rocksDB_writeBufferSizeMB=64
dbStorage_rocksDB_sstSizeInMB=64
dbStorage_rocksDB_blockSize=65536
dbStorage_rocksDB_bloomFilterBitsPerKey=10
dbStorage_rocksDB_numLevels=-1
dbStorage_rocksDB_numFilesInLevel0=4
dbStorage_rocksDB_maxSizeInLevel1MB=256
ledgerStorageClass=org.apache.bookkeeper.bookie.storage.ldb.DbLedgerStorage
minUsableSizeForIndexFileCreation=1073741824
advertisedAddress=
allowLoopback=false
bookieDeathWatchInterval=1000
flushInterval=60000
useHostNameAsBookieID=false
bookieAuthProviderFactoryClass=com.huawei.dmq2.security.dmq.bookie.server.SASLBookieAuthProviderFactory
clientAuthProviderFactoryClass=com.huawei.dmq2.security.dmq.bookie.client.SASLClientProviderFactory
gcWaitTime=900000
gcOverreplicatedLedgerWaitTime=86400000
numAddWorkerThreads=0
numReadWorkerThreads=8
numHighPriorityWorkerThreads=8
maxPendingReadRequestsPerThread=2500
maxPendingAddRequestsPerThread=10000
auditorPeriodicBookieCheckInterval=86400
rereplicationEntryBatchSize=100
openLedgerRereplicationGracePeriod=30000
autoRecoveryDaemonEnabled=true
lostBookieRecoveryDelay=0
serverTcpNoDelay=true
nettyMaxFrameSizeBytes=5253120
journalMaxSizeMB=2048
journalMaxBackups=5
journalPreAllocSizeMB=16
journalWriteBufferSizeKB=64
journalRemoveFromPageCache=true
journalAdaptiveGroupWrites=true
journalMaxGroupWaitMSec=1
journalBufferedWritesThreshold=524288
numJournalCallbackThreads=8
journalAlignmentSize=4096
journalFlushWhenQueueEmpty=false
auditorPeriodicCheckInterval=604800
openFileLimit=0
pageLimit=0
zkLedgersRootPath=/ledgers
logSizeLimit=1073741824
entryLogFilePreallocationEnabled=true
flushEntrylogBytes=268435456
readBufferSizeBytes=4096
writeBufferSizeBytes=65536
compactionRate=1000
minorCompactionThreshold=0.2
minorCompactionInterval=3600
compactionMaxOutstandingRequests=100000
majorCompactionThreshold=0.5
majorCompactionInterval=86400
isThrottleByBytes=false
compactionRateByEntries=1000
compactionRateByBytes=1000000
readOnlyModeEnabled=true
diskUsageThreshold=0.95
diskCheckInterval=10000
httpServerEnabled=false
httpServerPort=8000
httpServerClass=org.apache.bookkeeper.http.vertx.VertxHttpServer
### Bookie ERROR OutOfDirectMemoryError
2021-02-04 21:47:38.066 [bookie-io-1-28] ERROR org.apache.bookkeeper.proto.BookieRequestHandler - Unhandled exception occurred in I/O thread or handler on [id: 0x6a9cf1ad, L:/10.33.141.145:3181 - R:/10.33.141.26:55134]
io.netty.util.internal.OutOfDirectMemoryError: failed to allocate 16777216 byte(s) of direct memory (used: 42949672956, max: 42949672960)
at io.netty.util.internal.PlatformDependent.incrementMemoryCounter(PlatformDependent.java:754)
at io.netty.util.internal.PlatformDependent.allocateDirectNoCleaner(PlatformDependent.java:709)
at io.netty.buffer.PoolArena$DirectArena.allocateDirect(PoolArena.java:755)
at io.netty.buffer.PoolArena$DirectArena.newChunk(PoolArena.java:731)
at io.netty.buffer.PoolArena.allocateNormal(PoolArena.java:247)
at io.netty.buffer.PoolArena.allocate(PoolArena.java:227)
at io.netty.buffer.PoolArena.reallocate(PoolArena.java:394)
at io.netty.buffer.PooledByteBuf.capacity(PooledByteBuf.java:118)
at io.netty.buffer.AbstractByteBuf.ensureWritable0(AbstractByteBuf.java:306)
at io.netty.buffer.AbstractByteBuf.ensureWritable(AbstractByteBuf.java:282)
at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:1104)
at io.netty.handler.codec.ByteToMessageDecoder$1.cumulate(ByteToMessageDecoder.java:99)
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:274)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
at io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:792)
at io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe$1.run(AbstractEpollChannel.java:387)
at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)
at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:384)
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.lang.Thread.run(Thread.java:748)
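The numbers in the `OutOfDirectMemoryError` match the bookie settings above: the limit 42949672960 bytes is exactly the 40 GiB `-XX:MaxDirectMemorySize=40g`, and `dbStorage_writeCacheMaxSizeMb=30000` alone pins roughly 29.3 GiB of it. `dbStorage_readAheadCacheMaxSizeMb` is left empty; the sketch below assumes DbLedgerStorage then sizes the read-ahead cache as a fraction of direct memory (25% is used here purely as an assumption), which would leave very little headroom for Netty's connection buffers:

```java
public class BookieDirectMemoryBudget {
    public static void main(String[] args) {
        long maxDirect = 40L * 1024 * 1024 * 1024; // -XX:MaxDirectMemorySize=40g
        long writeCache = 30_000L * 1024 * 1024;   // dbStorage_writeCacheMaxSizeMb=30000
        // Assumption: with dbStorage_readAheadCacheMaxSizeMb unset, the read-ahead
        // cache defaults to a fraction of direct memory (25% assumed here).
        long readAheadCache = maxDirect / 4;

        System.out.println("limit in the error (42949672960) == maxDirect? "
                + (maxDirect == 42_949_672_960L));
        long headroom = maxDirect - writeCache - readAheadCache;
        System.out.println("headroom left for Netty buffers (MiB): "
                + headroom / (1024 * 1024));
    }
}
```

Under those assumptions only about 720 MiB remains for everything else allocated off-heap, so capping both caches explicitly (for example, a much smaller `dbStorage_readAheadCacheMaxSizeMb`) is worth testing as a mitigation.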
### Zookeeper JVM config
-Xmx1524M -Xms1524M -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35 -XX:+ExplicitGCInvokesConcurrent -XX:MaxInlineLevel=15
### Zookeeper config
dataDir=/opt/huawei/data1/zookeeperdata
clientPort=2181
secureClientPort=2281
maxClientCnxns=100
tickTime=2000
initLimit=10
syncLimit=5
autopurge.snapRetainCount=3
autopurge.purgeInterval=1
authProvider.1=org.apache.zookeeper.server.auth.SASLAuthenticationProvider
requireClientAuthScheme=sasl
jaasLoginRenew=3600000
admin.enableServer=false
quorum.auth.enableSasl=true
quorum.auth.learnerRequireSasl=true
quorum.auth.serverRequireSasl=true
quorum.auth.learner.loginContext=QuorumLearner
quorum.auth.server.loginContext=QuorumServer
quorum.cnxn.threads.size=20
4lw.commands.whitelist=stat,ruok,mntr
forceSync=yes
clientPortAddress=127.0.0.1
secureClientPortAddress=10.33.141.138
server.1=10.33.141.111:2888:3888
server.2=10.33.141.45:2888:3888
server.3=10.33.141.138:2888:3888
server.4=10.33.141.149:2888:3888
server.5=10.33.141.240:2888:3888
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [pulsar] xiaotongwang1 commented on issue #9484: pulsar performance help
Posted by GitBox <gi...@apache.org>.
xiaotongwang1 commented on issue #9484:
URL: https://github.com/apache/pulsar/issues/9484#issuecomment-773761385
[dmq@host-10-33-141-93 arthas]$ jmap -histo:live 20135|head -n 100
num #instances #bytes class name
----------------------------------------------
1: 2728 2908059496 [J
2: 10303 35648096 [B
3: 12900 13121224 [Ljava.lang.Object;
4: 55838 5813320 [C
5: 5226 3428256 io.netty.util.internal.shaded.org.jctools.queues.MpscArrayQueue
6: 1525 3106944 [D
7: 967 2387056 [Lio.netty.util.Recycler$DefaultHandle;
8: 5336 1508352 [I
9: 55382 1329168 java.lang.String
10: 5337 854992 [Ljava.util.HashMap$Node;
11: 7141 803024 java.lang.Class
12: 20049 641568 java.util.HashMap$Node
13: 12251 392032 java.util.concurrent.ConcurrentHashMap$Node
14: 5515 308840 org.apache.bookkeeper.client.LedgerFragment
15: 7630 305200 java.util.LinkedHashMap$Entry
16: 4117 296424 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask
17: 9211 294752 io.netty.util.Recycler$DefaultHandle
18: 162 290336 [Lio.netty.buffer.PoolSubpage;
19: 5516 264768 java.util.HashMap
20: 8274 264768 java.util.Hashtable$Entry
21: 2137 188056 java.lang.reflect.Method
22: 2786 178304 io.netty.buffer.PoolSubpage
23: 5131 164192 io.netty.buffer.PoolThreadCache$MemoryRegionCache$Entry
24: 2885 161560 io.netty.channel.DefaultChannelHandlerContext
25: 9819 157104 java.lang.Object
26: 4824 154368 io.netty.buffer.PoolThreadCache$SubPageMemoryRegionCache
27: 3304 132160 org.apache.bookkeeper.bookie.storage.ldb.DbLedgerStorageDataFormats$LedgerData
28: 5433 130392 java.util.jar.Attributes$Name
29: 301 106320 [Ljava.util.concurrent.ConcurrentHashMap$Node;
30: 1300 104000 org.apache.bookkeeper.proto.PerChannelBookieClient$AddCompletion
31: 1518 97152 com.yahoo.sketches.quantiles.HeapDoublesSketch
32: 1519 85064 java.security.Provider$Service
33: 3364 80736 com.google.protobuf.ByteString$LiteralByteString
34: 136 77248 io.netty.util.internal.shaded.org.jctools.queues.MpscUnboundedArrayQueue
35: 3133 73544 [Ljava.lang.Class;
36: 570 72136 [Ljava.lang.String;
37: 1000 72000 java.lang.reflect.Field
38: 4457 71312 java.util.jar.Attributes
39: 2971 71304 java.security.Provider$ServiceKey
40: 1669 66760 java.util.WeakHashMap$Entry
41: 60 64688 [Ljava.util.Hashtable$Entry;
42: 27 64440 [Ljava.util.concurrent.RunnableScheduledFuture;
43: 29 62008 [Ljava.nio.ByteBuffer;
44: 859 54976 java.util.concurrent.ConcurrentHashMap
45: 680 54400 java.lang.reflect.Constructor
46: 2048 49152 io.netty.util.HashedWheelTimer$HashedWheelBucket
47: 768 49152 org.apache.bookkeeper.util.collections.ConcurrentLongLongPairHashMap$Section
48: 501 48032 [Ljava.util.WeakHashMap$Entry;
49: 122 46848 io.netty.util.concurrent.FastThreadLocalThread
50: 363 46464 io.netty.channel.epoll.EpollSocketChannel
51: 722 46208 sun.security.provider.SHA2$SHA256
52: 1366 43712 org.apache.bookkeeper.proto.PerChannelBookieClient$V3CompletionKey
53: 1300 41600 org.apache.bookkeeper.client.LedgerFragmentReplicator$2
54: 1277 40864 sun.security.util.ObjectIdentifier
55: 2458 39328 java.util.concurrent.atomic.AtomicBoolean
56: 694 38864 java.lang.invoke.MemberName
57: 1204 38528 java.util.concurrent.atomic.LongAdder
58: 931 37240 java.lang.ref.SoftReference
59: 1530 36720 java.lang.Long
60: 553 35392 java.net.URL
61: 2210 35360 java.util.concurrent.atomic.AtomicInteger
62: 873 34920 java.lang.ref.Finalizer
63: 122 34160 java.util.concurrent.atomic.Striped64$Cell
64: 518 33152 io.netty.util.Recycler$Stack
65: 690 33120 java.util.concurrent.locks.StampedLock
66: 1326 31824 java.util.ArrayList
67: 361 31768 io.netty.handler.codec.LengthFieldBasedFrameDecoder
68: 773 30920 java.math.BigInteger
69: 519 29064 java.lang.Class$ReflectionData
70: 436 27904 io.netty.channel.ChannelOutboundBuffer$Entry
71: 402 27872 [Lio.netty.buffer.PoolThreadCache$MemoryRegionCache;
72: 659 26360 java.lang.invoke.MethodType
73: 821 26272 sun.security.util.DerInputBuffer
74: 821 26272 sun.security.util.DerValue
75: 364 26208 io.netty.channel.AdaptiveRecvByteBufAllocator$HandleImpl
76: 396 25344 org.apache.bookkeeper.util.collections.ConcurrentLongLongHashMap$Section
77: 450 25200 io.netty.util.Recycler$WeakOrderQueue
78: 768 24576 io.netty.handler.codec.CodecOutputList
79: 336 24192 org.apache.bookkeeper.util.collections.ConcurrentLongHashMap$Section
80: 501 24048 java.util.WeakHashMap
81: 742 23744 java.net.InetAddress$InetAddressHolder
82: 367 23488 java.security.SecureRandom
83: 244 23424 java.util.jar.JarFile$JarFileEntry
84: 364 23296 io.netty.channel.ChannelOutboundBuffer
85: 364 23296 io.netty.channel.DefaultChannelPipeline$HeadContext
86: 363 23232 io.netty.channel.epoll.EpollSocketChannelConfig
87: 725 23200 java.security.MessageDigest$Delegate
88: 552 22080 java.util.TreeMap$Entry
89: 663 21216 java.lang.invoke.MethodType$ConcurrentWeakInternSet$WeakEntry
90: 260 20992 [Ljava.lang.ThreadLocal$ThreadLocalMap$Entry;
91: 863 20712 java.util.LinkedList$Node
92: 369 20664 sun.nio.cs.UTF_8$Encoder
93: 364 20384 io.netty.channel.DefaultChannelPipeline$TailContext
94: 363 20328 io.netty.channel.epoll.EpollSocketChannel$EpollSocketChannelUnsafe
95: 499 19960 java.util.concurrent.locks.StampedLock$WNode
96: 311 19904 org.apache.bookkeeper.bookie.Journal$QueueEntry
97: 821 19704 sun.security.util.DerInputStream
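(For reference, a per-class histogram like the one above is what `jmap -histo <pid>` prints: rank, instance count, total bytes, class name. A minimal Python sketch — the sample rows are copied from the dump above, the function name is my own — for pulling out the heaviest classes from such output:)

```python
import re

# One histogram row looks like:
#   "96: 311 19904 org.apache.bookkeeper.bookie.Journal$QueueEntry"
# i.e. rank, instance count, total bytes, fully-qualified class name.
ROW = re.compile(r"^\s*\d+:\s+(\d+)\s+(\d+)\s+(\S+)")

def top_classes(histogram_text, n=3):
    """Return the n classes with the largest total byte counts."""
    rows = []
    for line in histogram_text.splitlines():
        m = ROW.match(line)
        if m:
            count, size, cls = int(m.group(1)), int(m.group(2)), m.group(3)
            rows.append((size, count, cls))
    return sorted(rows, reverse=True)[:n]

sample = """\
52: 1366 43712 org.apache.bookkeeper.proto.PerChannelBookieClient$V3CompletionKey
53: 1300 41600 org.apache.bookkeeper.client.LedgerFragmentReplicator$2
96: 311 19904 org.apache.bookkeeper.bookie.Journal$QueueEntry
"""
print(top_classes(sample, n=2))
```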
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [pulsar] frankjkelly commented on issue #9484: pulsar performance help
Posted by GitBox <gi...@apache.org>.
frankjkelly commented on issue #9484:
URL: https://github.com/apache/pulsar/issues/9484#issuecomment-781619315
I could be misreading this, but if you have `9 broker with 16C 128G` then that's `128G/9 = 14.2G` per broker, yet you have configured
```
-Xmx10g
-XX:MaxDirectMemorySize=20g
```
which requires at least 30G per broker. Am I misunderstanding?
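(To make the arithmetic in that comment explicit — under its reading that the 128G is shared across all 9 brokers, which may not match the actual hardware, the per-broker budget comes up well short of the configured JVM sizes. A quick sketch, numbers taken from the comment:)

```python
# Per-broker memory budget under the reading that 128G of RAM is
# shared across all 9 brokers (figures from the comment above).
total_ram_gb = 128
brokers = 9
heap_gb = 10      # -Xmx10g
direct_gb = 20    # -XX:MaxDirectMemorySize=20g

available_per_broker = total_ram_gb / brokers
# Metaspace, thread stacks, and OS page cache are not even counted here.
required_per_broker = heap_gb + direct_gb

print(f"available: {available_per_broker:.1f}G, "
      f"required: at least {required_per_broker}G")
```

If instead each broker host has its own 128G, the 10G heap + 20G direct memory fits comfortably, so the answer hinges on how "9 broker with 16C 128G" was meant.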
[GitHub] [pulsar] codelipenghui commented on issue #9484: pulsar performance help
Posted by GitBox <gi...@apache.org>.
codelipenghui commented on issue #9484:
URL: https://github.com/apache/pulsar/issues/9484#issuecomment-1058893479
The issue had no activity for 30 days, mark with Stale label.
[GitHub] [pulsar] xiaotongwang1 commented on issue #9484: pulsar performance help
Posted by GitBox <gi...@apache.org>.
xiaotongwang1 commented on issue #9484:
URL: https://github.com/apache/pulsar/issues/9484#issuecomment-773732329
@codelipenghui Can you help check the config and the error log? Thanks.