Posted to dev@pinot.apache.org by Pinot Slack Email Digest <sn...@apache.org> on 2020/09/27 02:00:20 UTC
Apache Pinot Daily Email Digest (2020-09-26)
<h3><u>#general</u></h3><br><strong>@aaplseeds: </strong>@aaplseeds has joined the channel<br><strong>@jahaines82: </strong>@jahaines82 has joined the channel<br><strong>@jahaines82: </strong>Does Pinot offer any support for LDAP security?<br><h3><u>#random</u></h3><br><strong>@aaplseeds: </strong>@aaplseeds has joined the channel<br><strong>@jahaines82: </strong>@jahaines82 has joined the channel<br><h3><u>#lp-pinot-poc</u></h3><br><strong>@andrew: </strong>i don’t see anything crazy in the GC logs, e.g.
<br><strong>@andrew: </strong>```2020-09-26T21:15:09.554+0000: 1122.824: Application time: 11.0016086 seconds
2020-09-26T21:15:09.555+0000: 1122.824: Total time for which application threads were stopped: 0.0002695 seconds, Stopping threads took: 0.0000686 seconds
2020-09-26T21:15:16.822+0000: 1130.092: Application time: 7.2677277 seconds
2020-09-26T21:15:16.823+0000: 1130.092: [GC pause (G1 Evacuation Pause) (young), 0.0044037 secs]
   [Parallel Time: 2.0 ms, GC Workers: 11]
      [GC Worker Start (ms): Min: 1130092.6, Avg: 1130092.8, Max: 1130093.2, Diff: 0.6]
      [Ext Root Scanning (ms): Min: 0.0, Avg: 0.6, Max: 1.8, Diff: 1.8, Sum: 6.1]
      [Update RS (ms): Min: 0.0, Avg: 0.1, Max: 0.3, Diff: 0.3, Sum: 1.5]
         [Processed Buffers: Min: 0, Avg: 3.5, Max: 12, Diff: 12, Sum: 38]
      [Scan RS (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum: 0.2]
      [Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.2, Diff: 0.2, Sum: 0.3]
      [Object Copy (ms): Min: 0.0, Avg: 0.8, Max: 1.1, Diff: 1.1, Sum: 8.7]
      [Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum: 0.5]
         [Termination Attempts: Min: 1, Avg: 1.8, Max: 3, Diff: 2, Sum: 20]
      [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1]
      [GC Worker Total (ms): Min: 1.2, Avg: 1.6, Max: 1.8, Diff: 0.6, Sum: 17.4]
      [GC Worker End (ms): Min: 1130094.4, Avg: 1130094.4, Max: 1130094.4, Diff: 0.0]
   [Code Root Fixup: 0.0 ms]
   [Code Root Purge: 0.0 ms]
   [Clear CT: 0.6 ms]
   [Other: 1.8 ms]
      [Choose CSet: 0.0 ms]
      [Ref Proc: 0.9 ms]
      [Ref Enq: 0.0 ms]
      [Redirty Cards: 0.6 ms]
      [Humongous Register: 0.0 ms]
      [Humongous Reclaim: 0.0 ms]
      [Free CSet: 0.1 ms]
   [Eden: 248.0M(248.0M)->0.0B(248.0M) Survivors: 8192.0K->8192.0K Heap: 447.2M(512.0M)->200.5M(512.0M)]
 [Times: user=0.02 sys=0.01, real=0.00 secs]
2020-09-26T21:15:16.827+0000: 1130.097: Total time for which application threads were stopped: 0.0050380 seconds, Stopping threads took: 0.0000944 seconds
2020-09-26T21:15:16.828+0000: 1130.097: Application time: 0.0002197 seconds
2020-09-26T21:15:16.828+0000: 1130.098: Total time for which application threads were stopped: 0.0004030 seconds, Stopping threads took: 0.0001157 seconds
2020-09-26T21:15:16.828+0000: 1130.098: Application time: 0.0001401 seconds
2020-09-26T21:15:16.828+0000: 1130.098: Total time for which application threads were stopped: 0.0003167 seconds, Stopping threads took: 0.0001029 seconds
2020-09-26T21:15:16.828+0000: 1130.098: Application time: 0.0001051 seconds
2020-09-26T21:15:16.829+0000: 1130.099: Total time for which application threads were stopped: 0.0002839 seconds, Stopping threads took: 0.0001103 seconds
2020-09-26T21:15:16.829+0000: 1130.099: Application time: 0.0001181 seconds
2020-09-26T21:15:16.829+0000: 1130.099: Total time for which application threads were stopped: 0.0002497 seconds, Stopping threads took: 0.0000885 seconds
2020-09-26T21:15:16.829+0000: 1130.099: Application time: 0.0000859 seconds
2020-09-26T21:15:16.830+0000: 1130.100: Total time for which application threads were stopped: 0.0010712 seconds, Stopping threads took: 0.0000713 seconds
2020-09-26T21:15:16.830+0000: 1130.100: Application time: 0.0000697 seconds
2020-09-26T21:15:16.831+0000: 1130.100: Total time for which application threads were stopped: 0.0002133 seconds, Stopping threads took: 0.0000834 seconds
2020-09-26T21:15:16.831+0000: 1130.101: Application time: 0.0000610 seconds
2020-09-26T21:15:16.831+0000: 1130.101: Total time for which application threads were stopped: 0.0002142 seconds, Stopping threads took: 0.0000758 seconds
2020-09-26T21:15:16.970+0000: 1130.240: Application time: 0.1391114 seconds
2020-09-26T21:15:16.970+0000: 1130.240: Total time for which application threads were stopped: 0.0003134 seconds, Stopping threads took: 0.0000673 seconds
2020-09-26T21:15:16.970+0000: 1130.240: Application time: 0.0001312 seconds
2020-09-26T21:15:16.971+0000: 1130.240: Total time for which application threads were stopped: 0.0001755 seconds, Stopping threads took: 0.0000426 seconds
2020-09-26T21:15:18.552+0000: 1131.822: Application time: 1.5812784 seconds
2020-09-26T21:15:18.552+0000: 1131.822: Total time for which application threads were stopped: 0.0003452 seconds, Stopping threads took: 0.0000716 seconds
2020-09-26T21:15:18.552+0000: 1131.822: Application time: 0.0001429 seconds
2020-09-26T21:15:18.553+0000: 1131.822: Total time for which application threads were stopped: 0.0001542 seconds, Stopping threads took: 0.0000333 seconds
2020-09-26T21:15:18.553+0000: 1131.823: Application time: 0.0001986 seconds
2020-09-26T21:15:18.553+0000: 1131.823: Total time for which application threads were stopped: 0.0001635 seconds, Stopping threads took: 0.0000348 seconds
2020-09-26T21:15:18.553+0000: 1131.823: Application time: 0.0001008 seconds
2020-09-26T21:15:18.553+0000: 1131.823: Total time for which application threads were stopped: 0.0001962 seconds, Stopping threads took: 0.0000354 seconds
2020-09-26T21:15:36.964+0000: 1150.234: Application time: 18.4112311 seconds
2020-09-26T21:15:36.965+0000: 1150.235: Total time for which application threads were stopped: 0.0004029 seconds, Stopping threads took: 0.0000877 seconds
2020-09-26T21:15:36.965+0000: 1150.235: Application time: 0.0001418 seconds
2020-09-26T21:15:36.965+0000: 1150.235: Total time for which application threads were stopped: 0.0002371 seconds, Stopping threads took: 0.0000314 seconds
2020-09-26T21:15:37.965+0000: 1151.235: Application time: 1.0001424 seconds
2020-09-26T21:15:37.966+0000: 1151.235: Total time for which application threads were stopped: 0.0002422 seconds, Stopping threads took: 0.0000651 seconds
2020-09-26T21:15:38.552+0000: 1151.822: Application time: 0.5863695 seconds
2020-09-26T21:15:38.552+0000: 1151.822: Total time for which application threads were stopped: 0.0004661 seconds, Stopping threads took: 0.0001339 seconds
2020-09-26T21:15:38.553+0000: 1151.823: Application time: 0.0002154 seconds
2020-09-26T21:15:38.553+0000: 1151.823: Total time for which application threads were stopped: 0.0001287 seconds, Stopping threads took: 0.0000316 seconds
2020-09-26T21:15:38.553+0000: 1151.823: Application time: 0.0001844 seconds
2020-09-26T21:15:38.553+0000: 1151.823: Total time for which application threads were stopped: 0.0001223 seconds, Stopping threads took: 0.0000219 seconds
2020-09-26T21:15:38.553+0000: 1151.823: Application time: 0.0001479 seconds
2020-09-26T21:15:38.553+0000: 1151.823: Total time for which application threads were stopped: 0.0001140 seconds, Stopping threads took: 0.0000213 seconds
2020-09-26T21:15:49.555+0000: 1162.825: Application time: 11.0014105 seconds
2020-09-26T21:15:49.555+0000: 1162.825: Total time for which application threads were stopped: 0.0003067 seconds, Stopping threads took: 0.0001044 seconds
2020-09-26T21:15:56.935+0000: 1170.205: Application time: 7.3800023 seconds
2020-09-26T21:15:56.936+0000: 1170.205: [GC pause (G1 Evacuation Pause) (young), 0.0049281 secs]
   [Parallel Time: 2.1 ms, GC Workers: 11]
      [GC Worker Start (ms): Min: 1170205.5, Avg: 1170205.8, Max: 1170206.0, Diff: 0.5]
      [Ext Root Scanning (ms): Min: 0.2, Avg: 0.6, Max: 2.0, Diff: 1.8, Sum: 6.7]
      [Update RS (ms): Min: 0.0, Avg: 0.2, Max: 0.3, Diff: 0.3, Sum: 2.2]
         [Processed Buffers: Min: 0, Avg: 3.4, Max: 9, Diff: 9, Sum: 37]
      [Scan RS (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1]
      [Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.2, Diff: 0.2, Sum: 0.2]
      [Object Copy (ms): Min: 0.0, Avg: 0.6, Max: 0.8, Diff: 0.8, Sum: 6.6]
      [Termination (ms): Min: 0.0, Avg: 0.3, Max: 0.4, Diff: 0.4, Sum: 3.4]
         [Termination Attempts: Min: 1, Avg: 1.6, Max: 3, Diff: 2, Sum: 18]
      [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1]
      [GC Worker Total (ms): Min: 1.5, Avg: 1.8, Max: 2.0, Diff: 0.5, Sum: 19.5]
      [GC Worker End (ms): Min: 1170207.5, Avg: 1170207.5, Max: 1170207.5, Diff: 0.0]
   [Code Root Fixup: 0.0 ms]
   [Code Root Purge: 0.0 ms]
   [Clear CT: 0.9 ms]
   [Other: 1.9 ms]
      [Choose CSet: 0.0 ms]
      [Ref Proc: 1.0 ms]
      [Ref Enq: 0.0 ms]
      [Redirty Cards: 0.7 ms]
      [Humongous Register: 0.0 ms]
      [Humongous Reclaim: 0.0 ms]
      [Free CSet: 0.1 ms]
   [Eden: 248.0M(248.0M)->0.0B(248.0M) Survivors: 8192.0K->8192.0K Heap: 448.5M(512.0M)->199.2M(512.0M)]
 [Times: user=0.02 sys=0.00, real=0.00 secs] ```<br><strong>@andrew: </strong>server config:
``` pinot-server.conf: |-
    pinot.server.netty.port=8098
    pinot.server.adminapi.port=8097
    pinot.server.instance.dataDir=/var/pinot/server/data/index
    pinot.server.instance.segmentTarDir=/var/pinot/server/data/segment
    pinot.set.instance.id.to.hostname=true
    pinot.server.instance.realtime.alloc.offheap=true```<br><strong>@g.kishore: </strong>What’s the ingestion rate?<br><strong>@g.kishore: </strong>can you add
```pinot.server.instance.realtime.max.parallel.segment.builds=4```<br><strong>@andrew: </strong>right now i’m testing with 50k events/second<br><strong>@g.kishore: </strong>ok, the graph shows 100k<br><strong>@andrew: </strong>yeah, earlier i was going higher<br><strong>@andrew: </strong>ok, trying that<br><strong>@andrew: </strong>I can see the number of parallel segment builds is down from 30-ish to 4 per node. But the number of segments in the table still increases like a step function by 100 or so at a time.<br><strong>@g.kishore: </strong>can you share the table config again<br><strong>@andrew: </strong>```{
  "tableName": "events",
  "tableType": "REALTIME",
  "segmentsConfig": {
    "timeColumnName": "timestampMillis",
    "timeType": "MILLISECONDS",
    "retentionTimeUnit": "DAYS",
    "retentionTimeValue": "3650",
    "segmentPushType": "APPEND",
    "segmentAssignmentStrategy": "BalanceNumSegmentAssignmentStrategy",
    "schemaName": "events",
    "replication": "3",
    "replicasPerPartition": "3"
  },
  "tenants": {
    "broker": "DefaultTenant",
    "server": "DefaultTenant",
    "tagOverrideConfig": {
      "realtimeConsuming": "DefaultTenant_REALTIME",
      "realtimeCompleted": "DefaultTenant_OFFLINE"
    }
  },
  "tableIndexConfig": {
    "loadMode": "MMAP",
    "streamConfigs": {
      "streamType": "kafka",
      "stream.kafka.consumer.type": "LowLevel",
      "stream.kafka.topic.name": "events",
      "stream.kafka.decoder.class.name": "org.apache.pinot.plugin.stream.kafka.KafkaJSONMessageDecoder",
      "stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
      "stream.kafka.hlc.zk.connect.string": "kafka-zookeeper:2181",
      "stream.kafka.zk.broker.url": "kafka-zookeeper:2181",
      "stream.kafka.broker.list": "kafka:9092",
      "realtime.segment.flush.threshold.size": "0",
      "realtime.segment.flush.threshold.time": "1h",
      "realtime.segment.flush.desired.size": "180M",
      "realtime.segment.flush.autotune.initialRows": "100K",
      "stream.kafka.consumer.prop.auto.offset.reset": "smallest"
    },
    "segmentPartitionConfig": {
      "columnPartitionMap": {
        "userId": {
          "functionName": "Murmur",
          "numPartitions": 128
        }
      }
    },
    "sortedColumn": [],
    "invertedIndexColumns": [
      "userId"
    ]
  },
  "routing": {
    "segmentPrunerTypes": ["partition"]
  },
  "metadata": {
    "customConfigs": {}
  }
}```<br><strong>@andrew: </strong>P95 latency is above 100 ms<br><strong>@g.kishore: </strong>can we add userId to sorted instead of inverted<br><strong>@andrew: </strong>I can. it was sorted originally, but I thought the suggestion was to make it inverted<br><strong>@andrew: </strong><br><strong>@g.kishore: </strong>oh, I think we just wanted to compare the two. both worked for us, but sorted will be much more efficient<br><strong>@andrew: </strong>oh<br><strong>@g.kishore: </strong>anyhow, is it holding up after the "pinot.server.instance.realtime.max.parallel.segment.builds=4" change?<br><strong>@andrew: </strong>it seems more stable, but the P95 latency keeps trending upwards. started at 17 ms and now it’s over 200 ms, and continues to increase<br><strong>@andrew: </strong>will try updating the index and recreating the cluster<br><strong>@g.kishore: </strong>ok<br>
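<br><em>Editor's note:</em> the index change discussed above — moving `userId` from the inverted index to the sorted column — would amount to the following edit inside `tableIndexConfig`. This is a sketch of the suggestion from the thread, not a config the participants confirmed applying:<br>```    "sortedColumn": [
      "userId"
    ],
    "invertedIndexColumns": []```<br>With `userId` sorted, each segment stores rows ordered by that column, so a filter like `WHERE userId = ...` can be answered with a binary-search range lookup instead of an inverted-index posting-list scan, which is why it was described as the more efficient option.<br>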