Posted to dev@pinot.apache.org by Pinot Slack Email Digest <sn...@apache.org> on 2020/10/02 02:00:18 UTC

Apache Pinot Daily Email Digest (2020-10-01)

#general

@tirumalap: Hello Everyone,
Very quick question
I have lots of data in an S3 bucket (parquet format). Can I use Pinot to retrieve or query a single record based on a query or condition?
Is Pinot the right software to do such a thing?

@shane.fitzroy: Hi all, I work at an analytics start-up in Sydney, Australia that has just landed Series A investment. We're now re-architecting a particular product for scale and to support more features - going from an MVP monolith with a Postgres DB to a more distributed system.

I'm looking to understand how Pinot may fit into a new stack and would like to know from an operations perspective how demanding a Pinot deployment might be, and how we might have to scale our engineering teams to support such a deployment in production.

If the context helps, it's a multi-tenant application that orchestrates some ETL+ML data pipelines with Spark/Databricks and the Gurobi optimiser. But essentially the outputs are "results" datasets (parquets on an Azure data lake) comprising 50M~100M rows and 30~40 columns. We're aiming to support at least 500 users spread over several customers/tenants. We expect concurrent users to be generating/experimenting with new datasets regularly throughout the day (hourly). One of the web applications will be a front-end with a UI "data-grid" where users will want to perform exploratory/interactive analysis on results, so aggregations/group-by/filtering/search/counting etc., at reasonably low latency.
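
To make the described workload concrete, the interactive "data-grid" queries would look something like the sketch below. This is purely illustrative: the table name and column names (results, tenant_id, scenario_id, region, product_line, revenue) are hypothetical placeholders, not the actual schema.
```
-- hypothetical exploratory query: filter one tenant/scenario, group and aggregate
SELECT region, product_line, COUNT(*) AS num_rows, SUM(revenue) AS total_revenue
FROM results
WHERE tenant_id = 'acme' AND scenario_id = 'baseline-2020-10'
GROUP BY region, product_line
ORDER BY total_revenue DESC
LIMIT 100
```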

On paper, Pinot looks like a great fit, but is it overkill for us? How many engineers would it take to support a deployment for our volume of data/ingestion? Note that ZooKeeper is not an existing part of our stack yet. Sorry for the wall of text. Any advice or experience from others here would be greatly appreciated. Cheers.

@igor.lema: Hi everyone, I'm looking to leverage Pinot for a simple analytics use case: distinct counts, funnel analysis, and anomaly detection over user click events from our app.

Currently, at our somewhat large company, we are ingesting 100~200K events/second across 300 different (but defined) schemas; the biggest schema has about 40 columns, but the majority have fewer than 20. In this mix there are also at least 10% late events and duplicates (about 2 TB a day).
Today more than 500 users query this data in an exploratory/interactive fashion through our own front-end. With Pinot we hope to achieve sub-minute latency.

Pinot looks like the perfect fit for this use case, since there is no need to join events, but my main doubt is how big this infrastructure needs to be to support this volume, and how hard it is going to be to support a deployment at that scale.

I'm planning on deploying with K8s, using S3 as the segment store for Pinot. I also don't need the Offline Server or any batch ingestion job.

@sudar.parakalan: @sudar.parakalan has joined the channel

@sudarson.pr: @sudarson.pr has joined the channel

#random

@sudar.parakalan: @sudar.parakalan has joined the channel

@sudarson.pr: @sudarson.pr has joined the channel

#feat-text-search

@a_pathak: @a_pathak has joined the channel

#feat-presto-connector

@a_pathak: @a_pathak has joined the channel

#troubleshooting

@a_pathak: @a_pathak has joined the channel

@pradeepgv42: Hi, I am trying to optimize a query of the format
```select colA, distinctCountHll(colb)
from table
where timestamp > X
group by colA```
We added a star-tree with
dimensionsSplitOrder: ["colA"]
and aggregateFunction as DistinctCountHLL__colB

I am not seeing much query-time improvement; I am comparing against an aggregation by colC, which is not part of the star-tree, and see very similar times. I can see that the star-tree index is getting generated.
Wondering if I am missing something?

@g.kishore: @pradeepgv42 output stats?

@pradeepgv42: timeMs=8541,docs=361440597/16479407613,entries=336003488/722881194,segments(queried/processed/matched/consuming/unavailable):5229/227/227/8/0,consumingFreshnessTimeMs=1601574112264,servers=4/4,groupLimitReached=false,exceptions=0,serverStats=(Server=SubmitDelayMs,ResponseDelayMs,ResponseSize,DeserializationTimeMs);172.31.17.90_R=1,8536,68556,0;172.31.30.139_O=1,3,372,0;172.31.34.149_O=1,4,372,0;172.31.24.127_R=1,8470,69174,0,

@pradeepgv42: 2 servers have old data, so they don't match anything

@g.kishore: it's still scanning a lot

@pradeepgv42: yeah

@g.kishore: @jackie.jxt ^^

@ssubrama: we don't build star-tree in the realtime ingestion path, right?

@yash.agarwal: Also, I think timestamp should be part of the dimensionsSplitOrder, right?

@jackie.jxt: Yes, yash gives the answer

@jackie.jxt: Can you try removing the filter on `timestamp` and see the latency?

@pradeepgv42: ah, it's a very big table, it killed our servers last time :slightly_smiling_face:

@pradeepgv42: yeah I thought that could be the reason too, so I included minutes as one of the dimensions yesterday evening

@pradeepgv42: let me print the stats for that

@pradeepgv42: this is for a much shorter time period (4 hours)
```
timeMs=1802,docs=62235061/16481320579,entries=98184718/124470122,segments(queried/processed/matched/consuming/unavailable):5230/57/57/8/0,consumingFreshnessTimeMs=1601574542463,servers=4/4,groupLimitReached=false,exceptions=0,serverStats=(Server=SubmitDelayMs,ResponseDelayMs,ResponseSize,DeserializationTimeMs);172.31.17.90_R=1,1797,68551,0;172.31.30.139_O=0,4,372,0;172.31.34.149_O=0,4,372,0;172.31.24.127_R=1,1563,69169,0,```
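
For reference, the star-tree setup being discussed corresponds roughly to a fragment like the one below inside the table config. This is a sketch using the column names from the thread (colA, colB, epochMinutes), not the actual config; maxLeafRecords is shown with a typical value rather than this table's real setting.
```
"tableIndexConfig": {
  "starTreeIndexConfigs": [
    {
      "dimensionsSplitOrder": ["colA", "epochMinutes"],
      "skipStarNodeCreationForDimensions": [],
      "functionColumnPairs": ["DISTINCTCOUNTHLL__colB"],
      "maxLeafRecords": 10000
    }
  ]
}
```
As yash.agarwal and jackie.jxt note in this thread, any column used in the query filter (epochMinutes here) has to appear in dimensionsSplitOrder for the star-tree to serve the filtered aggregation.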

@pradeepgv42: is there a way to verify that star-tree is being used?

@pradeepgv42: I think based on the results it's not, but not sure why

@jackie.jxt: You included `minutes` in the star-tree config?

@jackie.jxt: What is the query?

@pradeepgv42: ```select colA, distinctCountHll(colb)
from table
where epochMinutes > X
group by colA```

@pradeepgv42: epochMinutes is our minutes column

@jackie.jxt: `dimensionsSplitOrder: ["colA", "epochMinutes"]`?

@pradeepgv42: yup

@jackie.jxt: Then it should be solved by star-tree. Did this table have a star-tree before you made this change?

@pradeepgv42: yes, there was one already present

@pradeepgv42: it contains dimensionsSplitOrder: ["colA", "epochMinutes"] + some more columns, but the aggregate function is just a count

@jackie.jxt: Then reloading won't help in that case..

@pradeepgv42: yeah, that's why I am just looking at the newer data

@jackie.jxt: Can you please check the metadata for the queried segments and see if the new star-tree is generated?

@pradeepgv42: yup, I pinged you the metadata in a personal message

@pradeepgv42: could it be because colA and colB are present in both of the star-tree indices?

@pradeepgv42: we most likely figured out the issue, thanks a lot @jackie.jxt
we have a hybrid table and the timestamp column gets auto-added to the filter, so the star-tree is not used in query execution

#lp-pinot-poc

@fx19880617: here I tried to tune kafka to 32 partitions and the pinot segment size to 400M

@fx19880617: And from 23:00 I changed the gc to `g1gc`

@fx19880617: also I changed `pinot.server.instance.realtime.max.parallel.segment.builds=2`

@fx19880617: seems p99 will increase when consuming half the size of the segment and then drop once all segments are flushed

@fx19880617: p999 is not shooting up during segment build after I changed the gc to `g1gc`

@fx19880617: this is my current jvmOpts:
```-javaagent:/opt/pinot/etc/jmx_prometheus_javaagent/jmx_prometheus_javaagent-0.12.0.jar=80:/opt/pinot/etc/jmx_prometheus_javaagent/configs/pinot.yml
-Xms4G -Xmx16G -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -XX:+PrintGCDetails
-XX:+PrintGCDateStamps -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCApplicationConcurrentTime
-Xloggc:/opt/pinot/gc-pinot-server.log -Dlog4j2.configurationFile=/opt/pinot/conf/pinot-server-log4j2.xml```

@mayanks: I think Xms should be the same as Xmx

@fx19880617: I can make that change as well

@fx19880617: here are some slow queries:

@fx19880617: ```2020/10/01 18:46:07.776 INFO [QueryScheduler] [pqr-6] Processed requestId=8989635,table=events_REALTIME,segments(queried/processed/matched/consuming)=1/1/0/-1,schedulerWaitMs=0,reqDeserMs=490,totalExecMs=0,resSerMs=0,totalTimeMs=490,minConsumingFreshnessMs=-1,broker=Broker_pinot-broker-2.pinot-broker-headless.pinot.svc.cluster.local_8099,numDocsScanned=0,scanInFilter=3,scanPostFilter=0,sched=fcfs
2020/10/01 18:46:07.777 INFO [QueryScheduler] [pqr-0] Processed requestId=8979151,table=events_REALTIME,segments(queried/processed/matched/consuming)=1/1/0/-1,schedulerWaitMs=0,reqDeserMs=520,totalExecMs=1,resSerMs=0,totalTimeMs=521,minConsumingFreshnessMs=-1,broker=Broker_pinot-broker-1.pinot-broker-headless.pinot.svc.cluster.local_8099,numDocsScanned=0,scanInFilter=2,scanPostFilter=0,sched=fcfs
2020/10/01 18:46:07.777 INFO [QueryScheduler] [pqr-8] Processed requestId=8980815,table=events_REALTIME,segments(queried/processed/matched/consuming)=1/1/0/-1,schedulerWaitMs=0,reqDeserMs=535,totalExecMs=1,resSerMs=0,totalTimeMs=536,minConsumingFreshnessMs=-1,broker=Broker_pinot-broker-0.pinot-broker-headless.pinot.svc.cluster.local_8099,numDocsScanned=0,scanInFilter=2,scanPostFilter=0,sched=fcfs
2020/10/01 18:46:07.777 INFO [QueryScheduler] [pqr-2] Processed requestId=8993187,table=events_REALTIME,segments(queried/processed/matched/consuming)=1/1/0/-1,schedulerWaitMs=0,reqDeserMs=543,totalExecMs=0,resSerMs=0,totalTimeMs=543,minConsumingFreshnessMs=-1,broker=Broker_pinot-broker-3.pinot-broker-headless.pinot.svc.cluster.local_8099,numDocsScanned=0,scanInFilter=4,scanPostFilter=0,sched=fcfs
2020/10/01 18:46:07.789 INFO [QueryScheduler] [pqr-3] Processed requestId=8999077,table=events_REALTIME,segments(queried/processed/matched/consuming)=1/1/0/-1,schedulerWaitMs=0,reqDeserMs=564,totalExecMs=1,resSerMs=0,totalTimeMs=565,minConsumingFreshnessMs=-1,broker=Broker_pinot-broker-4.pinot-broker-headless.pinot.svc.cluster.local_8099,numDocsScanned=0,scanInFilter=2,scanPostFilter=0,sched=fcfs```
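
As an aside on mayanks' suggestion above to match Xms to Xmx: applied to the pasted jvmOpts, the change would just be pinning the initial heap to the maximum heap (assuming 16G remains the intended max), e.g.:
```
-Xms16G -Xmx16G -XX:+UseG1GC -XX:MaxGCPauseMillis=200
```
so the JVM allocates the full heap up front instead of growing it from 4G while segments are consumed and built.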

@fx19880617: and gc log:

@fx19880617: ```2020-10-01T16:46:05.571+0000: 39322.336: Application time: 4.9600067 seconds
2020-10-01T16:46:05.571+0000: 39322.336: [GC pause (G1 Evacuation Pause) (young), 1.0534652 secs]
   [Parallel Time: 1048.2 ms, GC Workers: 11]
      [GC Worker Start (ms): Min: 39322336.8, Avg: 39322336.9, Max: 39322337.1, Diff: 0.3]
      [Ext Root Scanning (ms): Min: 0.5, Avg: 0.8, Max: 1.4, Diff: 0.8, Sum: 8.3]
      [Update RS (ms): Min: 24.1, Avg: 24.6, Max: 25.3, Diff: 1.2, Sum: 270.2]
         [Processed Buffers: Min: 14, Avg: 20.3, Max: 32, Diff: 18, Sum: 223]
      [Scan RS (ms): Min: 980.2, Avg: 980.8, Max: 981.0, Diff: 0.8, Sum: 10788.9]
      [Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.2]
      [Object Copy (ms): Min: 41.3, Avg: 41.4, Max: 41.5, Diff: 0.2, Sum: 455.7]
      [Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.2]
         [Termination Attempts: Min: 1, Avg: 18.5, Max: 23, Diff: 22, Sum: 204]
      [GC Worker Other (ms): Min: 0.0, Avg: 0.1, Max: 0.2, Diff: 0.1, Sum: 0.8]
      [GC Worker Total (ms): Min: 1047.5, Avg: 1047.7, Max: 1047.8, Diff: 0.3, Sum: 11524.3]
      [GC Worker End (ms): Min: 39323384.5, Avg: 39323384.6, Max: 39323384.7, Diff: 0.1]
   [Code Root Fixup: 0.0 ms]
   [Code Root Purge: 0.0 ms]
   [Clear CT: 0.7 ms]
   [Other: 4.6 ms]
      [Choose CSet: 0.0 ms]
      [Ref Proc: 0.4 ms]
      [Ref Enq: 0.0 ms]
      [Redirty Cards: 1.5 ms]
      [Humongous Register: 0.1 ms]
      [Humongous Reclaim: 0.0 ms]
      [Free CSet: 1.8 ms]
   [Eden: 744.0M(744.0M)->0.0B(744.0M) Survivors: 73728.0K->73728.0K Heap: 7260.6M(16384.0M)->6525.2M(16384.0M)]
 [Times: user=11.55 sys=0.00, real=1.05 secs]
2020-10-01T16:46:06.625+0000: 39323.390: Total time for which application threads were stopped: 1.0542157 seconds, Stopping threads took: 0.0001375 seconds```

@fx19880617: not sure why `reqDeserMs=490` took this long

@mayanks: P99 increasing with the number of in-memory rows tells me the query still fans out to > 1 RT nodes

@fx19880617: ```2020/10/01 18:27:44.803 INFO [QueryScheduler] [pqr-6] Processed requestId=8781331,table=events_REALTIME,segments(queried/processed/matched/consuming)=1/1/0/1,schedulerWaitMs=10,reqDeserMs=0,totalExecMs=1,resSerMs=0,totalTimeMs=11,minConsumingFreshnessMs=1601576864792,broker=Broker_pinot-broker-1.pinot-broker-headless.pinot.svc.cluster.local_8099,numDocsScanned=0,scanInFilter=16,scanPostFilter=0,sched=fcfs
2020/10/01 18:27:44.804 INFO [QueryScheduler] [pqr-10] Processed requestId=8802075,table=events_REALTIME,segments(queried/processed/matched/consuming)=1/1/1/1,schedulerWaitMs=8,reqDeserMs=1,totalExecMs=2,resSerMs=0,totalTimeMs=11,minConsumingFreshnessMs=1601576864802,broker=Broker_pinot-broker-4.pinot-broker-headless.pinot.svc.cluster.local_8099,numDocsScanned=1,scanInFilter=12,scanPostFilter=0,sched=fcfs
2020/10/01 18:27:48.020 INFO [QueryScheduler] [pqr-6] Processed requestId=8781573,table=events_REALTIME,segments(queried/processed/matched/consuming)=1/1/0/1,schedulerWaitMs=1,reqDeserMs=750,totalExecMs=2,resSerMs=0,totalTimeMs=753,minConsumingFreshnessMs=1601576648403,broker=Broker_pinot-broker-1.pinot-broker-headless.pinot.svc.cluster.local_8099,numDocsScanned=0,scanInFilter=0,scanPostFilter=0,sched=fcfs```

@fx19880617: some queries take more time on schedulerWaitMs

@mayanks: How many RT servers were queried?

@fx19880617: ```2020/10/01 19:06:31.921 INFO [BaseBrokerRequestHandler] [jersey-server-managed-async-executor-570] requestId=9289631,table=events_REALTIME,timeMs=2,docs=0/4781010,entries=30/0,segments(queried/processed/matched/consuming/unavailable):8/8/0/0/0,consumingFreshnessTimeMs=0,servers=3/3,groupLimitReached=false,brokerReduceTimeMs=0,exceptions=0,serverStats=(Server=SubmitDelayMs,ResponseDelayMs,ResponseSize,DeserializationTimeMs);pinot-server-3_R=0,1,336,0;pinot-server-7_R=0,2,338,0;pinot-server-6_R=0,2,337,1,query=SELECT COUNT(*) FROM events WHERE timestampMillis > 0 AND userId = '11739104d64b7441707fe00c' AND eventName = '1050977c37f0'
2020/10/01 19:06:31.928 INFO [BaseBrokerRequestHandler] [jersey-server-managed-async-executor-570] requestId=9289632,table=events_REALTIME,timeMs=2,docs=0/20480009,entries=25/0,segments(queried/processed/matched/consuming/unavailable):18/18/0/1/0,consumingFreshnessTimeMs=1601579191927,servers=5/5,groupLimitReached=false,brokerReduceTimeMs=0,exceptions=0,serverStats=(Server=SubmitDelayMs,ResponseDelayMs,ResponseSize,DeserializationTimeMs);pinot-server-5_R=0,1,337,0;pinot-server-3_R=0,1,337,0;pinot-server-7_R=0,2,337,0;pinot-server-0_R=0,2,422,0;pinot-server-6_R=0,1,337,0,query=SELECT COUNT(*) FROM events WHERE timestampMillis > 0 AND userId = '1857951479fd79dcb8756b28' AND eventName = '164790de7191'
2020/10/01 19:06:31.930 INFO [BaseBrokerRequestHandler] [jersey-server-managed-async-executor-563] requestId=9289633,table=events_REALTIME,timeMs=2,docs=0/4755282,entries=20/0,segments(queried/processed/matched/consuming/unavailable):8/8/0/0/0,consumingFreshnessTimeMs=0,servers=4/4,groupLimitReached=false,brokerReduceTimeMs=0,exceptions=0,serverStats=(Server=SubmitDelayMs,ResponseDelayMs,ResponseSize,DeserializationTimeMs);pinot-server-5_R=0,1,337,0;pinot-server-3_R=0,1,336,0;pinot-server-7_R=0,2,337,0;pinot-server-6_R=0,2,336,0,query=SELECT COUNT(*) FROM events WHERE timestampMillis > 0 AND userId = '132823755b93eb49f1ce2b5b' AND eventName = '12264662576e'
2020/10/01 19:06:31.934 INFO [BaseBrokerRequestHandler] [jersey-server-managed-async-executor-563] requestId=9289634,table=events_REALTIME,timeMs=2,docs=0/4755282,entries=10/0,segments(queried/processed/matched/consuming/unavailable):8/8/0/0/0,consumingFreshnessTimeMs=0,servers=3/3,groupLimitReached=false,brokerReduceTimeMs=0,exceptions=0,serverStats=(Server=SubmitDelayMs,ResponseDelayMs,ResponseSize,DeserializationTimeMs);pinot-server-5_R=0,0,336,0;pinot-server-3_R=0,1,337,0;pinot-server-7_R=0,1,337,0,query=SELECT COUNT(*) FROM events WHERE timestampMillis > 0 AND userId = '1548223777e5a95db4d5a606' AND eventName = '1209099b8c2d'
2020/10/01 19:06:31.936 INFO [BaseBrokerRequestHandler] [jersey-server-managed-async-executor-570] requestId=9289635,table=events_REALTIME,timeMs=2,docs=0/20534651,entries=98/0,segments(queried/processed/matched/consuming/unavailable):18/18/0/1/0,consumingFreshnessTimeMs=1601579191931,servers=5/5,groupLimitReached=false,brokerReduceTimeMs=0,exceptions=0,serverStats=(Server=SubmitDelayMs,ResponseDelayMs,ResponseSize,DeserializationTimeMs);pinot-server-5_R=1,0,336,0;pinot-server-3_R=1,1,338,0;pinot-server-4_R=1,1,423,0;pinot-server-7_R=1,1,339,0;pinot-server-6_R=1,1,338,0,query=SELECT COUNT(*) FROM events WHERE timestampMillis > 0 AND userId = '10995588e3bc063feb373401' AND eventName = '141905eadb6b'```

@fx19880617: I think only 1 server

@fx19880617: there are segments moved to offline servers

@mayanks: why is cpu usage across servers so unbalanced?

@mayanks: are all servers not doing the same/similar work?

@mayanks: Yeah, so the query still fans out to multiple RT nodes: `server-7_R=0,2,337,0;pinot-server-6_R=0,2,336,0`

@mayanks: This is in line with what I have seen before. If the fanout is higher, then as the number of in-memory rows increases, p99 increases. This is because some of the servers in the fanout are doing GC.

@fx19880617: 4 are consuming, 4 are serving offline segments moved from realtime

@fx19880617: servers 0/1/2/4 are consuming

@fx19880617: 3/5/6/7 are just querying

@fx19880617: the cpu usage of 0/1/2/4 is higher

@mayanks: Right, consumption takes more CPU

@mayanks: Ok, this makes sense

@mayanks: However, num docs scanned is quite different across servers too

@mayanks: All offline should be in one range, and all RT should be in another range

@fx19880617: then I think it actually makes sense?

@fx19880617: most of the segments are moved to OFFLINE