You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@pinot.apache.org by Pinot Slack Email Digest <sn...@apache.org> on 2020/09/27 02:00:20 UTC

Apache Pinot Daily Email Digest (2020-09-26)

<h3><u>#general</u></h3><br><strong>@aaplseeds: </strong>@aaplseeds has joined the channel<br><strong>@jahaines82: </strong>@jahaines82 has joined the channel<br><strong>@jahaines82: </strong>Does Pinot offer any support for ldap security?<br><h3><u>#random</u></h3><br><strong>@aaplseeds: </strong>@aaplseeds has joined the channel<br><strong>@jahaines82: </strong>@jahaines82 has joined the channel<br><h3><u>#lp-pinot-poc</u></h3><br><strong>@andrew: </strong>i don’t see anything crazy in the GC logs, e.g.

<br><strong>@andrew: </strong>```2020-09-26T21:15:09.554+0000: 1122.824: Application time: 11.0016086 seconds
2020-09-26T21:15:09.555+0000: 1122.824: Total time for which application threads were stopped: 0.0002695 seconds, Stopping threads took: 0.0000686 seconds
2020-09-26T21:15:16.822+0000: 1130.092: Application time: 7.2677277 seconds
2020-09-26T21:15:16.823+0000: 1130.092: [GC pause (G1 Evacuation Pause) (young), 0.0044037 secs]
   [Parallel Time: 2.0 ms, GC Workers: 11]
      [GC Worker Start (ms): Min: 1130092.6, Avg: 1130092.8, Max: 1130093.2, Diff: 0.6]
      [Ext Root Scanning (ms): Min: 0.0, Avg: 0.6, Max: 1.8, Diff: 1.8, Sum: 6.1]
      [Update RS (ms): Min: 0.0, Avg: 0.1, Max: 0.3, Diff: 0.3, Sum: 1.5]
         [Processed Buffers: Min: 0, Avg: 3.5, Max: 12, Diff: 12, Sum: 38]
      [Scan RS (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum: 0.2]
      [Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.2, Diff: 0.2, Sum: 0.3]
      [Object Copy (ms): Min: 0.0, Avg: 0.8, Max: 1.1, Diff: 1.1, Sum: 8.7]
      [Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum: 0.5]
         [Termination Attempts: Min: 1, Avg: 1.8, Max: 3, Diff: 2, Sum: 20]
      [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1]
      [GC Worker Total (ms): Min: 1.2, Avg: 1.6, Max: 1.8, Diff: 0.6, Sum: 17.4]
      [GC Worker End (ms): Min: 1130094.4, Avg: 1130094.4, Max: 1130094.4, Diff: 0.0]
   [Code Root Fixup: 0.0 ms]
   [Code Root Purge: 0.0 ms]
   [Clear CT: 0.6 ms]
   [Other: 1.8 ms]
      [Choose CSet: 0.0 ms]
      [Ref Proc: 0.9 ms]
      [Ref Enq: 0.0 ms]
      [Redirty Cards: 0.6 ms]
      [Humongous Register: 0.0 ms]
      [Humongous Reclaim: 0.0 ms]
      [Free CSet: 0.1 ms]
   [Eden: 248.0M(248.0M)-&gt;0.0B(248.0M) Survivors: 8192.0K-&gt;8192.0K Heap: 447.2M(512.0M)-&gt;200.5M(512.0M)]
 [Times: user=0.02 sys=0.01, real=0.00 secs] 
2020-09-26T21:15:16.827+0000: 1130.097: Total time for which application threads were stopped: 0.0050380 seconds, Stopping threads took: 0.0000944 seconds
2020-09-26T21:15:16.828+0000: 1130.097: Application time: 0.0002197 seconds
2020-09-26T21:15:16.828+0000: 1130.098: Total time for which application threads were stopped: 0.0004030 seconds, Stopping threads took: 0.0001157 seconds
2020-09-26T21:15:16.828+0000: 1130.098: Application time: 0.0001401 seconds
2020-09-26T21:15:16.828+0000: 1130.098: Total time for which application threads were stopped: 0.0003167 seconds, Stopping threads took: 0.0001029 seconds
2020-09-26T21:15:16.828+0000: 1130.098: Application time: 0.0001051 seconds
2020-09-26T21:15:16.829+0000: 1130.099: Total time for which application threads were stopped: 0.0002839 seconds, Stopping threads took: 0.0001103 seconds
2020-09-26T21:15:16.829+0000: 1130.099: Application time: 0.0001181 seconds
2020-09-26T21:15:16.829+0000: 1130.099: Total time for which application threads were stopped: 0.0002497 seconds, Stopping threads took: 0.0000885 seconds
2020-09-26T21:15:16.829+0000: 1130.099: Application time: 0.0000859 seconds
2020-09-26T21:15:16.830+0000: 1130.100: Total time for which application threads were stopped: 0.0010712 seconds, Stopping threads took: 0.0000713 seconds
2020-09-26T21:15:16.830+0000: 1130.100: Application time: 0.0000697 seconds
2020-09-26T21:15:16.831+0000: 1130.100: Total time for which application threads were stopped: 0.0002133 seconds, Stopping threads took: 0.0000834 seconds
2020-09-26T21:15:16.831+0000: 1130.101: Application time: 0.0000610 seconds
2020-09-26T21:15:16.831+0000: 1130.101: Total time for which application threads were stopped: 0.0002142 seconds, Stopping threads took: 0.0000758 seconds
2020-09-26T21:15:16.970+0000: 1130.240: Application time: 0.1391114 seconds
2020-09-26T21:15:16.970+0000: 1130.240: Total time for which application threads were stopped: 0.0003134 seconds, Stopping threads took: 0.0000673 seconds
2020-09-26T21:15:16.970+0000: 1130.240: Application time: 0.0001312 seconds
2020-09-26T21:15:16.971+0000: 1130.240: Total time for which application threads were stopped: 0.0001755 seconds, Stopping threads took: 0.0000426 seconds
2020-09-26T21:15:18.552+0000: 1131.822: Application time: 1.5812784 seconds
2020-09-26T21:15:18.552+0000: 1131.822: Total time for which application threads were stopped: 0.0003452 seconds, Stopping threads took: 0.0000716 seconds
2020-09-26T21:15:18.552+0000: 1131.822: Application time: 0.0001429 seconds
2020-09-26T21:15:18.553+0000: 1131.822: Total time for which application threads were stopped: 0.0001542 seconds, Stopping threads took: 0.0000333 seconds
2020-09-26T21:15:18.553+0000: 1131.823: Application time: 0.0001986 seconds
2020-09-26T21:15:18.553+0000: 1131.823: Total time for which application threads were stopped: 0.0001635 seconds, Stopping threads took: 0.0000348 seconds
2020-09-26T21:15:18.553+0000: 1131.823: Application time: 0.0001008 seconds
2020-09-26T21:15:18.553+0000: 1131.823: Total time for which application threads were stopped: 0.0001962 seconds, Stopping threads took: 0.0000354 seconds
2020-09-26T21:15:36.964+0000: 1150.234: Application time: 18.4112311 seconds
2020-09-26T21:15:36.965+0000: 1150.235: Total time for which application threads were stopped: 0.0004029 seconds, Stopping threads took: 0.0000877 seconds
2020-09-26T21:15:36.965+0000: 1150.235: Application time: 0.0001418 seconds
2020-09-26T21:15:36.965+0000: 1150.235: Total time for which application threads were stopped: 0.0002371 seconds, Stopping threads took: 0.0000314 seconds
2020-09-26T21:15:37.965+0000: 1151.235: Application time: 1.0001424 seconds
2020-09-26T21:15:37.966+0000: 1151.235: Total time for which application threads were stopped: 0.0002422 seconds, Stopping threads took: 0.0000651 seconds
2020-09-26T21:15:38.552+0000: 1151.822: Application time: 0.5863695 seconds
2020-09-26T21:15:38.552+0000: 1151.822: Total time for which application threads were stopped: 0.0004661 seconds, Stopping threads took: 0.0001339 seconds
2020-09-26T21:15:38.553+0000: 1151.823: Application time: 0.0002154 seconds
2020-09-26T21:15:38.553+0000: 1151.823: Total time for which application threads were stopped: 0.0001287 seconds, Stopping threads took: 0.0000316 seconds
2020-09-26T21:15:38.553+0000: 1151.823: Application time: 0.0001844 seconds
2020-09-26T21:15:38.553+0000: 1151.823: Total time for which application threads were stopped: 0.0001223 seconds, Stopping threads took: 0.0000219 seconds
2020-09-26T21:15:38.553+0000: 1151.823: Application time: 0.0001479 seconds
2020-09-26T21:15:38.553+0000: 1151.823: Total time for which application threads were stopped: 0.0001140 seconds, Stopping threads took: 0.0000213 seconds
2020-09-26T21:15:49.555+0000: 1162.825: Application time: 11.0014105 seconds
2020-09-26T21:15:49.555+0000: 1162.825: Total time for which application threads were stopped: 0.0003067 seconds, Stopping threads took: 0.0001044 seconds
2020-09-26T21:15:56.935+0000: 1170.205: Application time: 7.3800023 seconds
2020-09-26T21:15:56.936+0000: 1170.205: [GC pause (G1 Evacuation Pause) (young), 0.0049281 secs]
   [Parallel Time: 2.1 ms, GC Workers: 11]
      [GC Worker Start (ms): Min: 1170205.5, Avg: 1170205.8, Max: 1170206.0, Diff: 0.5]
      [Ext Root Scanning (ms): Min: 0.2, Avg: 0.6, Max: 2.0, Diff: 1.8, Sum: 6.7]
      [Update RS (ms): Min: 0.0, Avg: 0.2, Max: 0.3, Diff: 0.3, Sum: 2.2]
         [Processed Buffers: Min: 0, Avg: 3.4, Max: 9, Diff: 9, Sum: 37]
      [Scan RS (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1]
      [Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.2, Diff: 0.2, Sum: 0.2]
      [Object Copy (ms): Min: 0.0, Avg: 0.6, Max: 0.8, Diff: 0.8, Sum: 6.6]
      [Termination (ms): Min: 0.0, Avg: 0.3, Max: 0.4, Diff: 0.4, Sum: 3.4]
         [Termination Attempts: Min: 1, Avg: 1.6, Max: 3, Diff: 2, Sum: 18]
      [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1]
      [GC Worker Total (ms): Min: 1.5, Avg: 1.8, Max: 2.0, Diff: 0.5, Sum: 19.5]
      [GC Worker End (ms): Min: 1170207.5, Avg: 1170207.5, Max: 1170207.5, Diff: 0.0]
   [Code Root Fixup: 0.0 ms]
   [Code Root Purge: 0.0 ms]
   [Clear CT: 0.9 ms]
   [Other: 1.9 ms]
      [Choose CSet: 0.0 ms]
      [Ref Proc: 1.0 ms]
      [Ref Enq: 0.0 ms]
      [Redirty Cards: 0.7 ms]
      [Humongous Register: 0.0 ms]
      [Humongous Reclaim: 0.0 ms]
      [Free CSet: 0.1 ms]
   [Eden: 248.0M(248.0M)-&gt;0.0B(248.0M) Survivors: 8192.0K-&gt;8192.0K Heap: 448.5M(512.0M)-&gt;199.2M(512.0M)]
 [Times: user=0.02 sys=0.00, real=0.00 secs] ```<br><strong>@andrew: </strong>server config:

```  pinot-server.conf: |-
    pinot.server.netty.port=8098
    pinot.server.adminapi.port=8097
    pinot.server.instance.dataDir=/var/pinot/server/data/index
    pinot.server.instance.segmentTarDir=/var/pinot/server/data/segment
    pinot.set.instance.id.to.hostname=true
    pinot.server.instance.realtime.alloc.offheap=true```<br><strong>@g.kishore: </strong>Whats the ingestion rate<br><strong>@g.kishore: </strong>can you add
```pinot.server.instance.realtime.max.parallel.segment.builds=4```<br><strong>@andrew: </strong>right now i’m testing with 50k events/second<br><strong>@g.kishore: </strong>ok, the graph shows 100k<br><strong>@andrew: </strong>yeah, earlier i was going higher<br><strong>@andrew: </strong>ok, trying that<br><strong>@andrew: </strong>I can see the number of parallel segment builds is down from 30-ish to 4 per node. But the number of segments in the table still increases like a step function by 100 or so at a time.<br><strong>@g.kishore: </strong>can you share the table config again<br><strong>@andrew: </strong>```    {
      "tableName": "events",
      "tableType": "REALTIME",
      "segmentsConfig": {
        "timeColumnName": "timestampMillis",
        "timeType": "MILLISECONDS",
        "retentionTimeUnit": "DAYS",
        "retentionTimeValue": "3650",
        "segmentPushType": "APPEND",
        "segmentAssignmentStrategy": "BalanceNumSegmentAssignmentStrategy",
        "schemaName": "events",
        "replication": "3",
        "replicasPerPartition": "3"
      },
      "tenants": {
        "broker": "DefaultTenant",
        "server": "DefaultTenant",
        "tagOverrideConfig": {
          "realtimeConsuming": "DefaultTenant_REALTIME",
          "realtimeCompleted": "DefaultTenant_OFFLINE"
        }
      },
      "tableIndexConfig": {
        "loadMode": "MMAP",
        "streamConfigs": {
          "streamType": "kafka",
          "stream.kafka.consumer.type": "LowLevel",
          "stream.kafka.topic.name": "events",
          "stream.kafka.decoder.class.name": "org.apache.pinot.plugin.stream.kafka.KafkaJSONMessageDecoder",
          "stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
          "stream.kafka.hlc.zk.connect.string": "kafka-zookeeper:2181",
          "stream.kafka.zk.broker.url": "kafka-zookeeper:2181",
          "stream.kafka.broker.list": "kafka:9092",
          "realtime.segment.flush.threshold.size": "0",
          "realtime.segment.flush.threshold.time": "1h",
          "realtime.segment.flush.desired.size": "180M",
          "realtime.segment.flush.autotune.initialRows": "100K",
          "stream.kafka.consumer.prop.auto.offset.reset": "smallest"
        },
        "segmentPartitionConfig": {
          "columnPartitionMap": {
            "userId": {
              "functionName": "Murmur",
              "numPartitions": 128
            }
          }
        },
        "sortedColumn": [
        ],
        "invertedIndexColumns": [
          "userId"
        ]
      },
      "routing": {
        "segmentPrunerTypes": ["partition"]
      },
      "metadata": {
        "customConfigs": {}
      }
    }```<br><strong>@andrew: </strong>P95 latency is above 100 ms<br><strong>@g.kishore: </strong>can we add userId to sorted instead of inverted<br><strong>@andrew: </strong>I can. it was sorted originally, but I thought the suggestion was to make it inverted<br><strong>@andrew: </strong><br><strong>@g.kishore: </strong>oh, I think we just wanted to compare the two. both worked for us, but sorted will be much more efficient<br><strong>@andrew: </strong>oh<br><strong>@g.kishore: </strong>anyhow, is it holding up after the "pinot.server.instance.realtime.max.parallel.segment.builds=4" change?<br><strong>@andrew: </strong>it seems more stable, but the P95 latency keeps trending upwards. started at 17 ms and now it’s over 200 ms, and continues to increase<br><strong>@andrew: </strong>will try updating the index and recreating the cluster<br><strong>@g.kishore: </strong>ok<br>