Posted to dev@pinot.apache.org by Pinot Slack Email Digest <ap...@gmail.com> on 2022/03/12 02:00:24 UTC

Apache Pinot Daily Email Digest (2022-03-11)

### _#general_

  
 **@az983wp:** @az983wp has joined the channel  
 **@ghanta.vardhan:** @ghanta.vardhan has joined the channel  
 **@kalpeshsaubhri:** @kalpeshsaubhri has joined the channel  
 **@hardik.joshi:** @hardik.joshi has joined the channel  
 **@padma:** @padma has joined the channel  

###  _#random_

  
 **@az983wp:** @az983wp has joined the channel  
 **@ghanta.vardhan:** @ghanta.vardhan has joined the channel  
 **@kalpeshsaubhri:** @kalpeshsaubhri has joined the channel  
 **@hardik.joshi:** @hardik.joshi has joined the channel  
 **@padma:** @padma has joined the channel  

###  _#troubleshooting_

  
 **@az983wp:** @az983wp has joined the channel  
**@srishb:** I am trying to access this page given in the setup information
-- but it looks like this doesn't exist anymore. Is there another page that I
can refer to?  
**@g.kishore:**  
 **@ghanta.vardhan:** @ghanta.vardhan has joined the channel  
 **@kalpeshsaubhri:** @kalpeshsaubhri has joined the channel  
**@prashant.pandey:** Hi team. I am facing a peculiar issue right now with
one of our realtime servers. This realtime server consumes from two tables, as
can be seen from its node data:
```
{
  "id": "backend_entity_view_REALTIME",
  "simpleFields": {
    "BUCKET_SIZE": "0",
    "SESSION_ID": "30162be05be0043",
    "STATE_MODEL_DEF": "SegmentOnlineOfflineStateModel",
    "STATE_MODEL_FACTORY_NAME": "DEFAULT"
  },
  "mapFields": {
    "backend_entity_view__6__1770__20220311T1126Z": { "CURRENT_STATE": "CONSUMING", "END_TIME": "1646997978968", "INFO": "", "PREVIOUS_STATE": "OFFLINE", "START_TIME": "1646997978753", "TRIGGERED_BY": "*" }
  },
  "listFields": {}
}
{
  "id": "service_call_view_REALTIME",
  "simpleFields": {
    "BUCKET_SIZE": "0",
    "SESSION_ID": "30162be05be0043",
    "STATE_MODEL_DEF": "SegmentOnlineOfflineStateModel",
    "STATE_MODEL_FACTORY_NAME": "DEFAULT"
  },
  "mapFields": {
    "service_call_view__4__1268__20220311T1227Z": { "CURRENT_STATE": "ONLINE", "END_TIME": "1647004152026", "INFO": "", "PREVIOUS_STATE": "CONSUMING", "START_TIME": "1647004133127", "TRIGGERED_BY": "*" },
    "service_call_view__4__1269__20220311T1308Z": { "CURRENT_STATE": "CONSUMING", "END_TIME": "1647004133319", "INFO": "", "PREVIOUS_STATE": "OFFLINE", "START_TIME": "1647004133127", "TRIGGERED_BY": "*" }
  },
  "listFields": {}
}
{
  "id": "span_event_view_1_REALTIME",
  "simpleFields": {
    "BUCKET_SIZE": "0",
    "SESSION_ID": "30162be05be0043",
    "STATE_MODEL_DEF": "SegmentOnlineOfflineStateModel",
    "STATE_MODEL_FACTORY_NAME": "DEFAULT"
  },
  "mapFields": {
    "span_event_view_1__1__9751__20220309T1444Z": { "CURRENT_STATE": "OFFLINE", "END_TIME": "1646837055817", "INFO": "", "PREVIOUS_STATE": "CONSUMING", "START_TIME": "1646837055782", "TRIGGERED_BY": "*" },
    "span_event_view_1__1__9865__20220311T1302Z": { "CURRENT_STATE": "ONLINE", "END_TIME": "1647004903102", "INFO": "", "PREVIOUS_STATE": "CONSUMING", "START_TIME": "1647004896155", "TRIGGERED_BY": "*" },
    "span_event_view_1__13__9635__20220311T1303Z": { "CURRENT_STATE": "CONSUMING", "END_TIME": "1647003820644", "INFO": "", "PREVIOUS_STATE": "OFFLINE", "START_TIME": "1647003820427", "TRIGGERED_BY": "*" },
    "span_event_view_1__1__9866__20220311T1321Z": { "CURRENT_STATE": "CONSUMING", "END_TIME": "1647004896393", "INFO": "", "PREVIOUS_STATE": "OFFLINE", "START_TIME": "1647004896155", "TRIGGERED_BY": "*" }
  },
  "listFields": {}
}
```
The server is consuming from all partitions of `span_event_view_1_REALTIME`
just fine, but the lag in just this partition (partition 6) of
`backend_entity_view_REALTIME` is continually increasing. I checked the
controller logs, and see a whole lot of:
```
2022/03/11 13:20:53.141 WARN [ConsumerConfig] [grizzly-http-server-0] The configuration 'stream.kafka.topic.name' was supplied but isn't a known config.
2022/03/11 13:20:53.397 WARN [TopStateHandoffReportStage] [HelixController-pipeline-default-pinot-prod-(0ff7d49b_DEFAULT)] Event 0ff7d49b_DEFAULT : Cannot confirm top state missing start time. Use the current system time as the start time.
2022/03/11 13:21:36.012 WARN [TopStateHandoffReportStage] [HelixController-pipeline-default-pinot-prod-(d95795b6_DEFAULT)] Event d95795b6_DEFAULT : Cannot confirm top state missing start time. Use the current system time as the start time.
2022/03/11 13:21:59.914 WARN [ZkBaseDataAccessor] [HelixController-pipeline-default-pinot-prod-(6831a128_DEFAULT)] Fail to read record for paths: {/pinot-prod/INSTANCES/Broker_broker-1.broker-headless.pinot.svc.cluster.local_8099/MESSAGES/b43eba4e-58fb-4fae-bb8e-72bcd97ed0ea=-101}
2022/03/11 13:22:00.411 WARN [TopStateHandoffReportStage] [HelixController-pipeline-default-pinot-prod-(ed7c9add_DEFAULT)] Event ed7c9add_DEFAULT : Cannot confirm top state missing start time. Use the current system time as the start time.
2022/03/11 13:22:00.899 WARN [TaskGarbageCollectionStage] [TaskJobPurgeWorker-pinot-prod] ResourceControllerDataProvider or HelixManager is null for event 7ed61473_TASK(CurrentStateChange) in cluster pinot-prod. Skip TaskGarbageCollectionStage.
2022/03/11 13:22:12.082 ERROR [ZkBaseDataAccessor] [grizzly-http-server-2] paths is null or empty
2022/03/11 13:23:09.694 WARN [ZkBaseDataAccessor] [HelixController-pipeline-default-pinot-prod-(6e6a3d4e_DEFAULT)] Fail to read record for paths: {/pinot-prod/INSTANCES/Broker_broker-1.broker-headless.pinot.svc.cluster.local_8099/MESSAGES/3f8e92c9-7eb4-49ca-91d1-caa9e868e071=-101}
2022/03/11 13:23:09.694 WARN [ZkBaseDataAccessor] [HelixController-pipeline-task-pinot-prod-(6e6a3d4e_TASK)] Fail to read record for paths: {/pinot-prod/INSTANCES/Broker_broker-1.broker-headless.pinot.svc.cluster.local_8099/MESSAGES/3f8e92c9-7eb4-49ca-91d1-caa9e868e071=-101}
2022/03/11 13:23:10.248 WARN [TopStateHandoffReportStage] [HelixController-pipeline-default-pinot-prod-(9f5bd9ba_DEFAULT)] Event 9f5bd9ba_DEFAULT : Cannot confirm top state missing start time. Use the current system time as the start time.
2022/03/11 13:23:20.224 WARN [ZkBaseDataAccessor] [HelixController-pipeline-task-pinot-prod-(28af3294_TASK)] Fail to read record for paths: {/pinot-prod/INSTANCES/Broker_broker-1.broker-headless.pinot.svc.cluster.local_8099/MESSAGES/0ebed9b5-51b6-41fe-b5df-a8d69c1b717b=-101}
2022/03/11 13:23:20.224 WARN [ZkBaseDataAccessor] [HelixController-pipeline-default-pinot-prod-(28af3294_DEFAULT)] Fail to read record for paths: {/pinot-prod/INSTANCES/Broker_broker-1.broker-headless.pinot.svc.cluster.local_8099/MESSAGES/0ebed9b5-51b6-41fe-b5df-a8d69c1b717b=-101}
2022/03/11 13:23:20.905 WARN [TopStateHandoffReportStage] [HelixController-pipeline-default-pinot-prod-(b4f04a02_DEFAULT)] Event b4f04a02_DEFAULT : Cannot confirm top state missing start time. Use the current system time as the start time.
2022/03/11 13:25:05.373 WARN [ZkBaseDataAccessor] [HelixController-pipeline-default-pinot-prod-(364d2339_DEFAULT)] Fail to read record for paths: {/pinot-prod/INSTANCES/Broker_broker-0.broker-headless.pinot.svc.cluster.local_8099/MESSAGES/aae2fb99-0322-4568-b410-88fddd305fd9=-101}
2022/03/11 13:25:05.373 WARN [ZkBaseDataAccessor] [HelixController-pipeline-task-pinot-prod-(364d2339_TASK)] Fail to read record for paths: {/pinot-prod/INSTANCES/Broker_broker-0.broker-headless.pinot.svc.cluster.local_8099/MESSAGES/aae2fb99-0322-4568-b410-88fddd305fd9=-101}
```
The server has only one conspicuous warning:
```
2022/03/11 12:57:12.755 ERROR [ServerSegmentCompletionProtocolHandler] [span_event_view_1__1__9864__20220311T1243Z] Could not send request
java.net.SocketTimeoutException: Read timed out
  at java.net.SocketInputStream.socketRead0(Native Method) ~[?:?]
  at java.net.SocketInputStream.socketRead(SocketInputStream.java:115) ~[?:?]
  at java.net.SocketInputStream.read(SocketInputStream.java:168) ~[?:?]
  at java.net.SocketInputStream.read(SocketInputStream.java:140) ~[?:?]
  at shaded.org.apache.http.impl.io.SessionInputBufferImpl.streamRead(SessionInputBufferImpl.java:137) ~[pinot-all-0.9.1-jar-with-dependencies.jar:0.9.1-f8ec6f6f8eead03488d3f4d0b9501fc3c4232961]
  at shaded.org.apache.http.impl.io.SessionInputBufferImpl.fillBuffer(SessionInputBufferImpl.java:153) ~[pinot-all-0.9.1-jar-with-dependencies.jar:0.9.1-f8ec6f6f8eead03488d3f4d0b9501fc3c4232961]
  at shaded.org.apache.http.impl.io.SessionInputBufferImpl.readLine(SessionInputBufferImpl.java:282) ~[pinot-all-0.9.1-jar-with-dependencies.jar:0.9.1-f8ec6f6f8eead03488d3f4d0b9501fc3c4232961]
  at shaded.org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:138) ~[pinot-all-0.9.1-jar-with-dependencies.jar:0.9.1-f8ec6f6f8eead03488d3f4d0b9501fc3c4232961]
  at shaded.org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:56) ~[pinot-all-0.9.1-jar-with-dependencies.jar:0.9.1-f8ec6f6f8eead03488d3f4d0b9501fc3c4232961]
  at shaded.org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:259) ~[pinot-all-0.9.1-jar-with-dependencies.jar:0.9.1-f8ec6f6f8eead03488d3f4d0b9501fc3c4232961]
  at shaded.org.apache.http.impl.DefaultBHttpClientConnection.receiveResponseHeader(DefaultBHttpClientConnection.java:163) ~[pinot-all-0.9.1-jar-with-dependencies.jar:0.9.1-f8ec6f6f8eead03488d3f4d0b9501fc3c4232961]
  at shaded.org.apache.http.impl.conn.CPoolProxy.receiveResponseHeader(CPoolProxy.java:157) ~[pinot-all-0.9.1-jar-with-dependencies.jar:0.9.1-f8ec6f6f8eead03488d3f4d0b9501fc3c4232961]
  at shaded.org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:273) ~[pinot-all-0.9.1-jar-with-dependencies.jar:0.9.1-f8ec6f6f8eead03488d3f4d0b9501fc3c4232961]
  at shaded.org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125) ~[pinot-all-0.9.1-jar-with-dependencies.jar:0.9.1-
  at java.lang.Thread.run(Thread.java:829) [?:?]
2022/03/11 12:57:12.756 ERROR [LLRealtimeSegmentDataManager_span_event_view_1__1__9864__20220311T1243Z] [span_event_view_1__1__9864__20220311T1243Z] Holding after response from Controller: {"offset":-1,"streamPartitionMsgOffset":null,"buildTimeSec":-1,"isSplitCommitType":false,"status":"NOT_SENT"}
```
The server’s resource usage is well under limits. Any idea what might be going
on here?  
**@prashant.pandey:** Other partitions of `backend_entity_view_REALTIME` are
being consumed from as usual, so doesn’t look like a problem with the kafka
topic.  
**@prashant.pandey:** Gave a rolling restart to the realtime servers and the
issue got resolved. But I am afraid this will come back.  
**@mayanks:** @npawar @navi.trinity  
**@g.kishore:** It looks like network issue between that server and controller  
**@prashant.pandey:** But @g.kishore the same server is consuming from the
other topic, so lag should’ve gone up for both the tables, isn’t it?  
**@g.kishore:** Ah, am I reading right? That payload was 3gb  
**@g.kishore:**  
**@g.kishore:** One hypothesis is that the payload is too big and it timed out  
**@g.kishore:** And it did not recover from that  
**@prashant.pandey:** If the server couldn’t recover, it should’ve stopped
consuming from all partitions of all topics assigned to it, isn’t it? It was
consuming from `span_event_view_1_REALTIME` all fine but stopped consuming
from `backend_entity_view_REALTIME`.  
**@g.kishore:** Not really.. each partition has its own lifecycle  
**@g.kishore:** Is it still in that state?  
**@g.kishore:** Jstack would be useful  
**@prashant.pandey:** Ah. But then this read timeout happened for
`span_event_view_1_REALTIME` , which was working fine. I can look for similar
logs for `backend_entity_view_REALTIME` in Sumo Logic.  
**@prashant.pandey:** Gave it a rolling restart some time back, so I don’t
think I’ll be able to get it now.  
**@g.kishore:** Okay.. can you share the server and controller logs during
that period..  
**@prashant.pandey:** I am pulling them out now.  
**@prashant.pandey:** Still pulling the logs out, got this meanwhile. Looks
like it ran out of heap: `"1647001516682","11/03/2022 17:55:16.682
+0530","Exception in thread ""backend_entity_view__6__1770__20220311T1126Z""
java.lang.OutOfMemoryError: Java heap
space","{""timestamp"":1647001516682,""log"":""Exception in thread
\""backend_entity_view__6__1770__20220311T1126Z\"" java.lang.OutOfMemoryError:
Java heap
space"",""stream"":""stderr"",""kubernetes"":{""host"":""ip-10-1-136-249.ap-
south-1.compute.internal""}}","PROD","ip-10-1-136-249.ap-
south-1.compute.internal","234","k8s-prod-green-
eks","k8s/prod/pinot/server","prod-green-eks","pinot.server-
realtime-10.server","stderr","1647001516682"`  
**@mayanks:** Was there segment generation happening at the time?  
**@mayanks:** Also what’s the jvm config and cores/mem  
**@prashant.pandey:** @mayanks I couldn’t get that. The config is 64G total
mem with 10G of heap, which seems a bit low to me as I more than tripled the
segment generation threshold for both of these tables.  
**@mayanks:** What’s the segment threshold? Please use defaults  
**@prashant.pandey:** Segments were indeed getting generated. 4 segments at
17:55.  
**@mayanks:** For 64GB mem, I’d set Xms=Xmx=16GB  
**@mayanks:** Yeah so that would explain it  
**@mayanks:** My recommendation is to use defaults  
**@prashant.pandey:** @mayanks Defaults are too low for us. Both of these
tables have high ingress traffic. Commits happen very frequently if I use
default thresholds.  
**@mayanks:** Then use RT2OFF  
**@prashant.pandey:** Sorry, what’s RT2OFF?  
**@mayanks:** Real-time to offline job  
**@prashant.pandey:** Oh yes, we have that. Every hour, it moves these
segments to offline servers.  
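For reference, a minimal sketch of how such a real-time-to-offline job is declared in the real-time table config, using the `RealtimeToOfflineSegmentsTask` key names from the Pinot docs (the period and record-count values below are illustrative, not the ones used in this thread):
```
"task": {
  "taskTypeConfigsMap": {
    "RealtimeToOfflineSegmentsTask": {
      "bucketTimePeriod": "1h",
      "bufferTimePeriod": "2h",
      "maxNumRecordsPerSegment": "5000000"
    }
  }
}
```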
**@mayanks:** What’s the segment size being generated  
**@prashant.pandey:** For BEV, 2G. For SEV, 2.5G  
**@prashant.pandey:** ^ this was recommended by the RTP tool.  
**@mayanks:** Ok definitely not a good idea to generate 2G segments from real-
time servers, especially when you have multiple tables  
**@mayanks:** I am not sure if RTP can handle multiple tables on same server  
**@prashant.pandey:** Yes, we’re kinda stuck b/w very frequent commits (~8m)
and large segment sizes. The RTP tool gave us a segment size of 5.5G for 2h
worth of consumption.  
**@prashant.pandey:** Oh yes, I divided the total available mem by number of
tables as recommended earlier.  
**@mayanks:** Hmm something doesn’t add up, 5GB segments is huge even for
offline.  
**@mayanks:** @moradi.sajjad any idea why RTP is suggesting a 5GB segment size
when the available heap is 10GB, and even 2GB is creating issues?  
**@prashant.pandey:** Our servers are also well within the total mem limits.
They’re 64G machines and total util. is under 30G.  
**@prashant.pandey:** Would you still recommend we use the defaults @mayanks?
Earlier we used them and had around 24k segments for SEV alone :smile:  
**@mayanks:** Will need more context, will dm  
**@moradi.sajjad:** To reduce the size of the RT segments, there are two
options: 1) decrease the consumption time 2) increase the number of Kafka
partitions. RTP gives you the segment size for different combinations of these
parameters. You should choose the parameters to have segments of size less
than 1GB (ideally between 200-500MB).  
**@moradi.sajjad:** Recently I experienced that RT segments of 1GB size take
approximately a couple of minutes to get completed before segment upload to
controller (~1 min for converting to immutable segment and ~1 min for making a
tar.gz before upload). So I'd aim for smaller segments.  
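For context, the commit-frequency vs. segment-size trade-off discussed above is controlled by the flush thresholds in the table’s `streamConfigs`; a minimal sketch, with key names as documented for recent Pinot releases and purely illustrative values:
```
"streamConfigs": {
  "realtime.segment.flush.threshold.rows": "0",
  "realtime.segment.flush.threshold.time": "2h",
  "realtime.segment.flush.threshold.segment.size": "300M"
}
```
Setting the row threshold to 0 lets the size-based threshold drive the commit point, which is roughly how a 200-500MB target per completed segment would be expressed.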
 **@mayanks:** The last warning is for partition 1, so likely not related to
partition 6 lag  
**@prashant.pandey:** Yes, true.  
 **@aaron.weiss:** Hey, just started playing with Trino as I have a need to do
subqueries / joins on Pinot tables. I'm having trouble with array fields
(singleValueField: false) in Pinot when querying through Trino. From reading
through the connector documentation, it seems to support arrays. Here is my
Pinot query that works (service is String array field): ```select service,
count(*) from immutable_unified_events group by service limit 10``` I've tried
this query in Trino using basic and passthrough syntax, but get the same error
either way. ```class java.lang.String cannot be cast to class java.util.List
``` Trino standard query: ```select service, count(*) from
pinot.default.immutable_unified_events group by service limit 10;``` Trino
passthrough query: ```select * from pinot.default."select service, count(*)
from immutable_unified_events group by service limit 10";```  
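For context, the “array field” here is a Pinot multi-value column; in the schema it is declared roughly like this (a minimal sketch using the `service` column from the query above, other details assumed):
```
"dimensionFieldSpecs": [
  {
    "name": "service",
    "dataType": "STRING",
    "singleValueField": false
  }
]
```
The Trino connector is expected to surface such a column as an array, which is presumably where the `String` vs `List` cast error above comes from.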
**@elon.azoulay:** Hi @aaron.weiss - we have a fix for this, I will mention
the pr for it here shortly.  
**@aaron.weiss:** thanks @elon.azoulay, so once that PR gets merged, it would
be live, or that would be a version update?  
**@elon.azoulay:** You can rebase from master once it's landed - we usually
wait until releases to rebase, but you can try it. I'll update this thread so
you can see the fix.  
**@aaron.weiss:** thanks a lot, appreciate it!  
 **@g.kishore:** @elon.azoulay any info on the above^^  
 **@elon.azoulay:** We have a fix for that - added support for array fields
and varbinary fields - let me forward the pull request.  
 **@hardik.joshi:** @hardik.joshi has joined the channel  
**@weixiang.sun:** Does anyone see the following exception with a query
against an upsert table? I do not see the problem with the offline table.
```
Caused by: java.lang.IllegalArgumentException: The datetime zone id 'America/Los_Angeles' is not recognised
  at org.joda.time.DateTimeZone.forID(DateTimeZone.java:247) ~[pinot-all-0.8.0-jar-with-dependencies.jar:0.8.0-d53965c35d75bff2fbe92706129cac9ca563aac3]
  at org.apache.pinot.common.function.scalar.DateTimeFunctions.year(DateTimeFunctions.java:335) ~[pinot-all-0.8.0-jar-with-dependencies.jar:0.8.0-d53965c35d75bff2fbe92706129cac9ca563aac3]
  at jdk.internal.reflect.GeneratedMethodAccessor1636.invoke(Unknown Source) ~[?:?]
  at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:?]
  at java.lang.reflect.Method.invoke(Method.java:566) ~[?:?]
  at org.apache.pinot.common.function.FunctionInvoker.invoke(FunctionInvoker.java:128) ~[pinot-all-0.8.0-jar-with-dependencies.jar:0.8.0-d53965c35d75bff2fbe92706129cac9ca563aac3]
  ... 19 more
```
**@xiaobing:** could you share an example query?  
**@xiaobing:** feels like this may be easy to reproduce on my side.  
**@weixiang.sun:** Thanks a lot for your help! @xiaobing The problem is that
the Pinot server is in a bad state  
 **@padma:** @padma has joined the channel  

###  _#presto-pinot-connector_

  
 **@francois:** @francois has joined the channel  

###  _#getting-started_

  
 **@ghanta.vardhan:** @ghanta.vardhan has joined the channel  
**@francois:** Hi. I’m still working on a POC with Trino / Pinot. I’m able to
do pretty much everything I want on the Pinot side, and I’m looking for a way
to do JSON filtering. On the Pinot side I’ve found the JSON index pretty
efficient with JSON_MATCH, but when filtering on the Trino side I lose all the
power of the JSON index. Is there any way to filter from Trino on JSON using
the power of the JSON index? :confused:  
**@g.kishore:** Trino makes it really hard to push down UDFs to Pinot
dynamically. For every UDF we add to Pinot, we need to change the code in the
pinot-trino connector to get the power of Pinot. We are working with the Trino
folks to change this and make it dynamic. For now, the only option is to
enhance the connector. Can you share the queries and performance numbers for
Pinot directly vs. via Trino?  
**@francois:** There is no performance comparison yet, as I first need to
filter values based on a JSON multival. I will maybe give Presto a try and see
if I can achieve what I expect.  
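For reference, `JSON_MATCH` in Pinot relies on a JSON index being declared on the column in the table config; a minimal sketch (the column name is hypothetical):
```
"tableIndexConfig": {
  "jsonIndexColumns": ["myJsonColumn"]
}
```
The open question in this thread is getting Trino to push such a filter down to Pinot instead of applying it after the rows leave Pinot.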