Posted to dev@pinot.apache.org by Pinot Slack Email Digest <ap...@gmail.com> on 2022/05/28 02:48:55 UTC

Apache Pinot Daily Email Digest (2022-05-27)

### _#general_

  
**@octchristmas:** Hi Team. Some pinot-servers fail to connect to Zookeeper
and then do nothing, although the daemon process is still running. I built a
cluster with 6 pinot-servers on premises. ZooKeeper is a 3-node ensemble. The
cluster has 1 offline table (1 TB) and 2 upsert realtime tables (~30 GB). This
is the last log written by the pinot-server; the server is dead and there are no
logs for several hours. ```
2022-05-27 01:32:43.487 ERROR [HelixTaskExecutor-message_handle_thread] org.apache.helix.manager.zk.ZKHelixManager:checkConnected:376 zkClient is not connected after waiting 10000ms., clusterName: PinotPoC_2, zkAddress:
2022-05-27 01:32:43.487 ERROR [HelixTaskExecutor-message_handle_thread] org.apache.helix.manager.zk.ZKHelixManager:checkConnected:376 zkClient is not connected after waiting 10000ms., clusterName: PinotPoC_2, zkAddress:
2022-05-27 01:32:43.487 ERROR [HelixTaskExecutor-message_handle_thread] org.apache.helix.manager.zk.ZKHelixManager:checkConnected:376 zkClient is not connected after waiting 10000ms., clusterName: PinotPoC_2, zkAddress:
2022-05-27 01:32:43.487 ERROR [HelixTaskExecutor-message_handle_thread] org.apache.helix.messaging.handling.HelixTask:finalCleanup:390 Error to final clean up for message : 723f5e30-1ffa-447b-a61a-72482e5d6ab4
2022-05-27 01:32:43.487 ERROR [HelixTaskExecutor-message_handle_thread] org.apache.helix.messaging.handling.HelixTask:finalCleanup:390 Error to final clean up for message : 47521252-6e9f-49d5-aaf0-1e1169b65cda
2022-05-27 01:32:43.487 ERROR [HelixTaskExecutor-message_handle_thread] org.apache.helix.messaging.handling.HelixTask:finalCleanup:390 Error to final clean up for message : aa0ae67b-6b92-46dd-b19d-b6bda195af18
2022-05-27 01:32:43.487 ERROR [HelixTaskExecutor-message_handle_thread] org.apache.helix.messaging.handling.HelixTask:call:194 Exception after executing a message, msgId: 723f5e30-1ffa-447b-a61a-72482e5d6ab4 org.apache.helix.HelixException: HelixManager is not connected within retry timeout for cluster PinotPoC_2
org.apache.helix.HelixException: HelixManager is not connected within retry timeout for cluster PinotPoC_2
	at org.apache.helix.manager.zk.ZKHelixManager.checkConnected(ZKHelixManager.java:378) ~[pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
	at org.apache.helix.manager.zk.ZKHelixManager.getHelixDataAccessor(ZKHelixManager.java:593) ~[pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
	at org.apache.helix.messaging.handling.HelixTask.call(HelixTask.java:172) [pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
	at org.apache.helix.messaging.handling.HelixTask.call(HelixTask.java:49) [pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
	at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
	at java.lang.Thread.run(Thread.java:834) [?:?]
2022-05-27 01:32:43.487 ERROR [HelixTaskExecutor-message_handle_thread] org.apache.helix.messaging.handling.HelixTask:call:194 Exception after executing a message, msgId: aa0ae67b-6b92-46dd-b19d-b6bda195af18 org.apache.helix.HelixException: HelixManager is not connected within retry timeout for cluster PinotPoC_2
org.apache.helix.HelixException: HelixManager is not connected within retry timeout for cluster PinotPoC_2
	at org.apache.helix.manager.zk.ZKHelixManager.checkConnected(ZKHelixManager.java:378) ~[pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
	at org.apache.helix.manager.zk.ZKHelixManager.getHelixDataAccessor(ZKHelixManager.java:593) ~[pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
	at org.apache.helix.messaging.handling.HelixTask.call(HelixTask.java:172) [pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
	at org.apache.helix.messaging.handling.HelixTask.call(HelixTask.java:49) [pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
	at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
	at java.lang.Thread.run(Thread.java:834) [?:?]
2022-05-27 01:32:43.487 ERROR [HelixTaskExecutor-message_handle_thread] org.apache.helix.messaging.handling.HelixTask:call:194 Exception after executing a message, msgId: 47521252-6e9f-49d5-aaf0-1e1169b65cda org.apache.helix.HelixException: HelixManager is not connected within retry timeout for cluster PinotPoC_2
org.apache.helix.HelixException: HelixManager is not connected within retry timeout for cluster PinotPoC_2
	at org.apache.helix.manager.zk.ZKHelixManager.checkConnected(ZKHelixManager.java:378) ~[pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
	at org.apache.helix.manager.zk.ZKHelixManager.getHelixDataAccessor(ZKHelixManager.java:593) ~[pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
	at org.apache.helix.messaging.handling.HelixTask.call(HelixTask.java:172) [pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
	at org.apache.helix.messaging.handling.HelixTask.call(HelixTask.java:49) [pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
	at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
	at java.lang.Thread.run(Thread.java:834) [?:?]
2022-05-27 01:32:43.494 WARN [HelixTaskExecutor-message_handle_thread] org.apache.helix.manager.zk.ZKHelixManager:checkConnected:369 zkClient to  is not connected, wait for 10000ms.
2022-05-27 01:32:43.494 WARN [HelixTaskExecutor-message_handle_thread] org.apache.helix.manager.zk.ZKHelixManager:checkConnected:369 zkClient to  is not connected, wait for 10000ms.
2022-05-27 01:32:43.494 WARN [HelixTaskExecutor-message_handle_thread] org.apache.helix.manager.zk.ZKHelixManager:checkConnected:369 zkClient to  is not connected, wait for 10000ms.
2022-05-27 01:32:57.146 ERROR [HelixTaskExecutor-message_handle_thread] org.apache.helix.manager.zk.ZKHelixManager:checkConnected:376 zkClient is not connected after waiting 10000ms., clusterName: PinotPoC_2, zkAddress:
2022-05-27 01:32:57.146 ERROR [HelixTaskExecutor-message_handle_thread] org.apache.helix.manager.zk.ZKHelixManager:checkConnected:376 zkClient is not connected after waiting 10000ms., clusterName: PinotPoC_2, zkAddress:
2022-05-27 01:32:57.146 ERROR [HelixTaskExecutor-message_handle_thread] org.apache.helix.manager.zk.ZKHelixManager:checkConnected:376 zkClient is not connected after waiting 10000ms., clusterName: PinotPoC_2, zkAddress:
2022-05-27 01:32:57.146 ERROR [HelixTaskExecutor-message_handle_thread] org.apache.helix.util.StatusUpdateUtil:logMessageStatusUpdateRecord:352 Exception while logging status update
org.apache.helix.HelixException: HelixManager is not connected within retry timeout for cluster PinotPoC_2
	at org.apache.helix.manager.zk.ZKHelixManager.checkConnected(ZKHelixManager.java:378) ~[pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
	at org.apache.helix.manager.zk.ZKHelixManager.getHelixDataAccessor(ZKHelixManager.java:593) ~[pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
	at org.apache.helix.util.StatusUpdateUtil.logMessageStatusUpdateRecord(StatusUpdateUtil.java:348) [pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
	at org.apache.helix.util.StatusUpdateUtil.logError(StatusUpdateUtil.java:392) [pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
	at org.apache.helix.messaging.handling.HelixTask.call(HelixTask.java:195) [pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
	at org.apache.helix.messaging.handling.HelixTask.call(HelixTask.java:49) [pinot-all-0.10.0-jar-with-depe```  
**@xiangfu0:** is zk up?  
**@xiangfu0:** can you check if the zk disk is full or not?  
**@xiangfu0:** from the logs it seems that Pinot cannot connect to zookeeper  
**@xiangfu0:** you may need to set up retention for zookeeper  
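(Editorial aside: ZooKeeper's built-in retention is controlled by the autopurge settings in `zoo.cfg`. A minimal sketch; the values are illustrative and should be tuned to your snapshot/transaction-log volume:)

```
# zoo.cfg - automatic purging of old snapshots and transaction logs
autopurge.snapRetainCount=3   # keep the 3 most recent snapshots
autopurge.purgeInterval=1     # purge every 1 hour (0 disables purging)
```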
**@octchristmas:** @xiangfu0 zookeeper is up. disk is not full. It looks
similar to this bug.  
**@octchristmas:** @xiangfu0 Zookeeper is ok, but the pinot-server stays stuck.  
**@octchristmas:** @xiangfu0 This cluster started crashing after we started
collecting data into two upsert realtime tables. Could using two upsert
realtime tables be the problem?  
**@xiangfu0:** I don’t think that’s a problem  
**@xiangfu0:** cc: @jackie.jxt in case if you got time  
**@xiangfu0:** have you tried restart all zookeeper/controller/broker/servers?  
**@octchristmas:** @xiangfu0 Yes restarted but didn't solve it.  
**@octchristmas:** @xiangfu0 Here are more logs. About 10 minutes after
restarting the pinot-server, the Zookeeper session expires and the connection
is lost. It retries but keeps failing. ```
... 2022-05-27 17:45:20.357 INFO [] org.apache.helix.manager.zk.CallbackHandler:subscribeForChanges:563 Subscribing to path:/PinotPoC_2/INSTANCES/Server_pinot-poc-server6.company.com_8001/MESSAGES took:50945
... 2022-05-27 17:45:54.219 INFO [CallbackHandler-AsycSubscribe-Singleton] org.apache.helix.manager.zk.CallbackHandler:subscribeForChanges:563 Subscribing to path:/PinotPoC_2/INSTANCES/Server_pinot-poc-server6.company.com_8001/MESSAGES took:33862
... 2022-05-27 17:46:28.048 DEBUG [Start a Pinot [SERVER]-SendThread()] org.apache.zookeeper.ClientCnxn:readResponse:923 Reading reply sessionid:0x30106a604ba001c, packet:: clientPath:null serverPath:null finished:false header:: 57006,5 replyHeader:: 57006,17179990511,0 request:: '/PinotPoC_2/INSTANCES/Server_pinot-poc-server6.company.com_8001/CURRENTSTATES/30106a604ba001c/mydata_card_REALTIME,#7ba202022696422203a20226d79646174615f636172645f5245414c54494d45222ca20202273696d706c654669656c647322203a207ba20202020224255434b45545f53495
... 2022-05-27 17:46:28.049 DEBUG [pool-1-thread-3] org.apache.helix.manager.zk.zookeeper.ZkClient:waitForKeeperState:1119 Waiting for keeper state SyncConnected
2022-05-27 17:46:28.051 DEBUG [pool-1-thread-3] org.apache.helix.manager.zk.zookeeper.ZkClient:waitForKeeperState:1129 State is SyncConnected
2022-05-27 17:46:28.051 WARN [Start a Pinot [SERVER]-SendThread()] org.apache.zookeeper.ClientCnxn:run:1190 Client session timed out, have not heard from server in 20948ms for sessionid 0x30106a604ba001c
2022-05-27 17:46:28.051 DEBUG [HelixTaskExecutor-message_handle_thread] org.apache.helix.manager.zk.zookeeper.ZkClient:waitForKeeperState:1119 Waiting for keeper state SyncConnected
2022-05-27 17:46:28.051 DEBUG [HelixTaskExecutor-message_handle_thread] org.apache.helix.manager.zk.zookeeper.ZkClient:waitForKeeperState:1119 Waiting for keeper state SyncConnected
2022-05-27 17:46:28.051 DEBUG [HelixTaskExecutor-message_handle_thread] org.apache.helix.manager.zk.zookeeper.ZkClient:waitForKeeperState:1129 State is SyncConnected
2022-05-27 17:46:28.051 DEBUG [HelixTaskExecutor-message_handle_thread] org.apache.helix.manager.zk.zookeeper.ZkClient:waitForKeeperState:1129 State is SyncConnected
2022-05-27 17:46:28.051 INFO [Start a Pinot [SERVER]-SendThread()] org.apache.zookeeper.ClientCnxn:run:1238 Client session timed out, have not heard from server in 20948ms for sessionid 0x30106a604ba001c, closing socket connection and attempting reconnect
... 2022-05-27 17:46:44.867 WARN [Start a Pinot [SERVER]-SendThread()] org.apache.zookeeper.ClientCnxn:onConnected:1380 Unable to reconnect to ZooKeeper service, session 0x30106a604ba001c has expired
2022-05-27 17:46:44.867 DEBUG [Start a Pinot [SERVER]-EventThread] org.apache.helix.manager.zk.zookeeper.ZkClient:process:641 Received event: WatchedEvent state:Expired type:None path:null
2022-05-27 17:46:44.867 INFO [Start a Pinot [SERVER]-EventThread] org.apache.helix.manager.zk.zookeeper.ZkClient:processStateChanged:836 zookeeper state changed (Expired)
2022-05-27 17:46:44.868 DEBUG [Start a Pinot [SERVER]-EventThread] org.apache.helix.manager.zk.zookeeper.ZkClient:send:92 New event: ZkEvent[State changed to Expired sent to org.apache.helix.manager.zk.ZKHelixManager@7c8fecd9]
2022-05-27 17:46:44.868 INFO [Start a Pinot [SERVER]-SendThread()] org.apache.zookeeper.ClientCnxn:run:1236 Unable to reconnect to ZooKeeper service, session 0x30106a604ba001c has expired, closing socket connection
2022-05-27 17:46:44.870 DEBUG [Start a Pinot [SERVER]-EventThread] org.apache.helix.manager.zk.zookeeper.ZkConnection:reconnect:94 Creating new ZookKeeper instance to reconnect to .
2022-05-27 17:46:49.097 INFO [Start a Pinot [SERVER]-EventThread] org.apache.zookeeper.ZooKeeper:<init>:868 Initiating client connection, connectString= sessionTimeout=30000 watcher=org.apache.helix.manager.zk.ZkClient@4cebec82
2022-05-27 17:46:49.097 INFO [Start a Pinot [SERVER]-EventThread] org.apache.zookeeper.ClientCnxnSocket:initProperties:237 jute.maxbuffer value is 4194304 Bytes
2022-05-27 17:46:49.097 DEBUG [] org.apache.helix.manager.zk.zookeeper.ZkClient:waitForKeeperState:1119 Waiting for keeper state SyncConnected
... 2022-05-27 17:46:53.314 DEBUG [Start a Pinot [SERVER]-SendThread()] org.apache.zookeeper.SaslServerPrincipal:getServerPrincipal:80 Canonicalized address to 
2022-05-27 17:46:53.314 INFO [Start a Pinot [SERVER]-SendThread()] org.apache.zookeeper.ClientCnxn:logStartConnect:1112 Opening socket connection to server . Will not attempt to authenticate using SASL (unknown error)
2022-05-27 17:46:53.315 INFO [Start a Pinot [SERVER]-SendThread()] org.apache.zookeeper.ClientCnxn:primeConnection:959 Socket connection established, initiating session, client: /10.62.34.74:58040, server: 
2022-05-27 17:46:53.315 DEBUG [Start a Pinot [SERVER]-SendThread()] org.apache.zookeeper.ClientCnxn:primeConnection:1027 Session establishment request sent on 
2022-05-27 17:46:53.316 DEBUG [HelixTaskExecutor-message_handle_thread] org.apache.helix.manager.zk.zookeeper.ZkClient:waitForKeeperState:1119 Waiting for keeper state SyncConnected
2022-05-27 17:46:53.316 WARN [HelixTaskExecutor-message_handle_thread] org.apache.helix.manager.zk.ZKHelixManager:checkConnected:369 zkClient to  is not connected, wait for 10000ms.
... 2022-05-27 17:46:53.317 INFO [Start a Pinot [SERVER]-EventThread] org.apache.helix.manager.zk.zookeeper.ZkClient:processStateChanged:836 zookeeper state changed (SyncConnected)
2022-05-27 17:46:53.317 DEBUG [Start a Pinot [SERVER]-EventThread] org.apache.helix.manager.zk.zookeeper.ZkClient:send:92 New event: ZkEvent[State changed to SyncConnected sent to org.apache.helix.manager.zk.ZKHelixManager@7c8fecd9]
2022-05-27 17:46:53.318 DEBUG [Start a Pinot [SERVER]-EventThread] org.apache.helix.manager.zk.zookeeper.ZkClient:process:703 Leaving process event
2022-05-27 17:46:53.318 DEBUG [] org.apache.helix.manager.zk.zookeeper.ZkClient:waitForKeeperState:1129 State is SyncConnected```  
**@jackie.jxt:** @octchristmas Do you see any other suspicious ERROR/WARN logs
besides the ZK disconnection? We want to check whether the server throws
exceptions after reconnecting to ZK. You can search for `zookeeper state
changed` in the log to find the ZK connection state changes  
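(Editorial aside: the log search above can be sketched as a one-liner. `/tmp/pinotServer.log` is a placeholder path; the sample file below only mimics the format of the logs quoted in this thread.)

```shell
# Create a tiny sample log in the format quoted above (placeholder data),
# then pull out the ZK connection state transitions with line numbers.
cat > /tmp/pinotServer.log <<'EOF'
2022-05-27 17:46:44.867 INFO [Start a Pinot [SERVER]-EventThread] org.apache.helix.manager.zk.zookeeper.ZkClient:processStateChanged:836 zookeeper state changed (Expired)
2022-05-27 17:46:53.317 INFO [Start a Pinot [SERVER]-EventThread] org.apache.helix.manager.zk.zookeeper.ZkClient:processStateChanged:836 zookeeper state changed (SyncConnected)
EOF
grep -n "zookeeper state changed" /tmp/pinotServer.log
```

On a real server, point `grep` at the actual server log file instead of the sample.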
 **@valdamarin.d:** @valdamarin.d has joined the channel  
 **@diogo.baeder:** Hi folks! I'm a Pinot user (not a developer) and have been
amazed at the support I can get from the Pinot folks in this Slack workspace,
they're awesome! I'd like to share some suggestions for other users, which
might also help everyone else: 1. I noticed that the
<#C011C9JHN7R|troubleshooting> channel is usually better for fixing issues,
since the Pinot devs are more active there, so that's where I suggest users
ask for help, not here; 2. One good practice I learned here in this workspace
(can't remember from whom) is, whenever I have an issue, to start with a short
question and then put the lengthier information (config files, stacktraces
etc.) in the thread for that same question. This avoids taking massive amounts
of space on the channel wall - it reduces visual pollution while keeping the
content available for the people who want to get involved with that particular
discussion.  
**@ken:** Great input, @diogo.baeder . Another recommendation I’ve got is that
if you see someone ask a question via a GitHub issue, avoid answering it
there. Instead, gently point them at this Slack workspace (and the appropriate
channel), and ask them to close the issue.  
**@xiangfu0:** Agreed. Using threads for discussion will also help each thread
become a single topic, making it easier to search  
**@mayanks:** Slack is great for fast iteration on discussions etc., but GH
issues are great for tracking. My recommendation would be to have GH issues
for tracking and, once the discussions converge on Slack, update the GH issue
with a summary, to get the best of both worlds.  
 **@bagi.priyank:** has anyone tried integrating pinot with looker?  
**@mayanks:** Not that I am aware of, Superset seems a lot more popular for
Pinot. If you have Presto/Trino + Pinot, that could be an option.  
 **@abhinav.wagle1:** Is there a doc available which discusses the differences
between Pinot & Druid from a Cloud cost perspective?  
 **@jorick:** @jorick has joined the channel  
 **@justin:** @justin has joined the channel  

### _#random_

  
 **@valdamarin.d:** @valdamarin.d has joined the channel  
 **@jorick:** @jorick has joined the channel  
 **@justin:** @justin has joined the channel  

### _#troubleshooting_

  
 **@valdamarin.d:** @valdamarin.d has joined the channel  
 **@zotyarex:** Hey Team, How exactly should one interpret this statement ()?
> Pinot currently relies on Pulsar client version 2.7.2. Users should make
sure the Pulsar broker is compatible with this client version. Say I have
a Pulsar cluster of version  in place already; then I won't be able to consume
its traffic with the most up-to-date Pinot binary? Thanks in advance for any
help!  
**@mayanks:** @kharekartik ^^  
**@kharekartik:** Hi Zsolt, the Pulsar docs are unclear on compatibility
across different versions. Hence the line was added so that users can verify
for themselves whether a Pulsar 2.7.x client works with their broker.
Give me a few moments to test it out on my machine  
**@zotyarex:** Sure, thank you!  
**@mayanks:** Let’s also add this context in the doc  
**@zotyarex:** A bit of extra context about my experiments if it helps: •
Pinot is on `0.10.0` • The manually attached (official) Pulsar plugin is on
`0.11.0` • Additionally, I set up one more plugin that has a custom decoder
class to deal with the Pulsar messages. Roughly what I tried: • After sending
the new real-time table configuration (with a Pulsar stream configured) to
Pinot using the admin tools, the connection seems to get established; I see
logs from the Pinot Server like: ```Connected to server Starting Pulsar
consumer status recorder with config: ...``` • The subscription to the
desired topic also seems OK: ```Subscribed to topic on ... Successfully
getLastMessageID xy:-1``` • Then I see something like this twice, which seems
a bit strange: ```[topic_name][Pinot_subscription_id] Closed consumer [id:
some_id, L:/localhost:port ! R:/cluster/node:port] Disconnected``` • Then a
couple of extra error logs: ```Received error from server: Error when resetting
subscription: xy:-1 Failed to reset subscription: Error when resetting
subscription: xy:-1 Error consuming records from Pulsar topic
org.apache.pulsar.client.api.PulsarClientException:
java.util.concurrent.ExecutionException:
org.apache.pulsar.client.api.PulsarClientException: Failed to seek the
subscription reader-ID of the topic topic_name to the message xy:-1:-1```  
**@zotyarex:** (Oh and then this gets into an infinite loop (due to the
reconnection attempts, I presume), so I see the errors/exception stack many
times)  
 **@fb:** [Authentication - SASL_SSL] - Hi everyone, hope you're all having a
good day. I am trying to create a real-time table in Pinot. I have the
following: • `docker-compose.yml`: with zookeeper and the pinot broker,
controller and server • `schema.json`: containing my table schema (after
transforming from csv as pointed out in ) • `table.json`: where I am using
`streamConfigs` to pass the *confluent* authentication keys. Two questions: 1.
Is it wrong to pass the credentials in that file (please disregard security
issues because this is a very local and small test)? 2. I keep getting a 500
response that says the Consumer couldn't be formed. I would really really
appreciate your help. BTW I am following  :sos:  
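(Editorial aside: a hedged sketch of what such a `streamConfigs` block can look like for Confluent Cloud over SASL_SSL. The property names follow the Pinot Kafka ingestion docs but should be verified against your Pinot version; everything in angle brackets is a placeholder, not a real value.)

```json
{
  "streamConfigs": {
    "streamType": "kafka",
    "stream.kafka.topic.name": "<topic>",
    "stream.kafka.broker.list": "<bootstrap-host>:9092",
    "stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
    "security.protocol": "SASL_SSL",
    "sasl.mechanism": "PLAIN",
    "sasl.jaas.config": "org.apache.kafka.common.security.plain.PlainLoginModule required username=\"<api-key>\" password=\"<api-secret>\";"
  }
}
```

Passing credentials inline like this works for a local test; for anything shared, they are usually injected via environment/secret management instead.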
**@mayanks:** Hi, I’d recommend reading this doc  
**@mayanks:** @kharekartik ^^  
 **@ysuo:** Hi, can Pinot ingest non-json format Kafka stream data?
:sweat_smile:  
**@ysuo:** I’d better customize a Decoder. :joy:  
**@npawar:** Can do avro and CSV. What's your format?  
**@ysuo:** Thanks. It’s a plain string, example data is like: “event_value
source=127.0.0.1 service=event_service key1=value1 key2=value2”  
**@npawar:** you could use CSV decoder (available in master though, not in the
release yet), and provide `" "` (space) as the delimiter  
**@npawar:** cc @prashant.pandey who contributed that feature  
**@ysuo:** Thank you. I’ll have a try.  
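(Editorial aside: a hedged sketch of how the CSV decoder with a space delimiter might be wired into `streamConfigs`. The decoder class and the `stream.kafka.decoder.prop.*` property names are assumptions based on the master-branch `pinot-csv` plugin mentioned above; check them against the docs for your build.)

```json
{
  "streamConfigs": {
    "stream.kafka.decoder.class.name": "org.apache.pinot.plugin.inputformat.csv.CSVMessageDecoder",
    "stream.kafka.decoder.prop.delimiter": " ",
    "stream.kafka.decoder.prop.header": "event_value source service key1 key2"
  }
}
```

Note that a space-delimited decoder only splits columns; the `key=value` payloads inside each column would still need a transform or a custom decoder to unpack.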
 **@fb:** Is it possible that these are the same?: ```
"stream.kafka.decoder.prop.schema.registry.rest.url": "",
"stream.kafka.schema.registry.url": "", ```  
**@fb:** fake data, obviously, just to illustrate what I am trying to figure
out  
 **@sderegt838:** Hey folks :wave:, I'm having some issues with the `spark`
Batch Ingestion job when moving from `--master local --deploy-mode client` to
`--master yarn --deploy-mode cluster` (as suggested  for production
environments). I would greatly appreciate some guidance from others who have
successfully configured this spark job. Details in thread :thread:  
**@sderegt838:** Following this , I am able to successfully `spark-submit`
locally using the following command: ```
sudo spark-submit --verbose \
  --class org.apache.pinot.tools.admin.command.LaunchDataIngestionJobCommand \
  --master local --deploy-mode client \
  --conf spark.local.dir=/mnt \
  --conf "spark.driver.extraJavaOptions=-Dplugins.dir=/mnt/pinot/apache-pinot-0.11.0-SNAPSHOT-bin/plugins -Dplugins.include=pinot-s3,pinot-parquet -Dlog4j2.configurationFile=/mnt/pinot/apache-pinot-0.11.0-SNAPSHOT-bin/conf/pinot-ingestion-job-log4j2.xml" \
  --conf "spark.driver.extraClassPath=/mnt/pinot/apache-pinot-0.11.0-SNAPSHOT-bin/plugins-external/pinot-batch-ingestion/pinot-batch-ingestion-spark/pinot-batch-ingestion-spark-0.11.0-SNAPSHOT-shaded.jar:/mnt/pinot/apache-pinot-0.11.0-SNAPSHOT-bin/lib/pinot-all-0.11.0-SNAPSHOT-jar-with-dependencies.jar:/mnt/pinot/apache-pinot-0.11.0-SNAPSHOT-bin/plugins/pinot-file-system/pinot-s3/pinot-s3-0.11.0-SNAPSHOT-shaded.jar:/mnt/pinot/apache-pinot-0.11.0-SNAPSHOT-bin/plugins/pinot-input-format/pinot-parquet/pinot-parquet-0.11.0-SNAPSHOT-shaded.jar" \
  /mnt/pinot/apache-pinot-0.11.0-SNAPSHOT-bin/lib/pinot-all-0.11.0-SNAPSHOT-jar-with-dependencies.jar \
  -jobSpecFile /mnt/pinot/daily_channel_user_metrics_20220502.yaml```  
**@sderegt838:** The problem occurs when switching to `--master yarn`, whether
using `--deploy-mode client` or `--deploy-mode cluster`.  
**@sderegt838:** To rule out issues with `yarn` being misconfigured on my EMR
cluster, I successfully ran this example: ```spark-submit --master yarn
--deploy-mode cluster --class "org.apache.spark.examples.JavaSparkPi"
/usr/lib/spark/examples/jars/spark-examples.jar```  
**@sderegt838:** My current WIP command is this: ```
sudo spark-submit --verbose \
  --class org.apache.pinot.tools.admin.command.LaunchDataIngestionJobCommand \
  --master yarn --deploy-mode cluster \
  --conf spark.local.dir=/mnt \
  --conf "spark.driver.extraJavaOptions=-Dplugins.dir=/mnt/pinot/apache-pinot-0.11.0-SNAPSHOT-bin/plugins -Dplugins.include=pinot-s3,pinot-parquet -Dlog4j2.configurationFile=/mnt/pinot/apache-pinot-0.11.0-SNAPSHOT-bin/conf/pinot-ingestion-job-log4j2.xml" \
  --conf "spark.driver.extraClassPath=/mnt/pinot/apache-pinot-0.11.0-SNAPSHOT-bin/plugins-external/pinot-batch-ingestion/pinot-batch-ingestion-spark/pinot-batch-ingestion-spark-0.11.0-SNAPSHOT-shaded.jar:/mnt/pinot/apache-pinot-0.11.0-SNAPSHOT-bin/lib/pinot-all-0.11.0-SNAPSHOT-jar-with-dependencies.jar:/mnt/pinot/apache-pinot-0.11.0-SNAPSHOT-bin/plugins/pinot-file-system/pinot-s3/pinot-s3-0.11.0-SNAPSHOT-shaded.jar:/mnt/pinot/apache-pinot-0.11.0-SNAPSHOT-bin/plugins/pinot-input-format/pinot-parquet/pinot-parquet-0.11.0-SNAPSHOT-shaded.jar" \
  --conf "spark.executor.extraJavaOptions=-Dplugins.dir=/mnt/pinot/apache-pinot-0.11.0-SNAPSHOT-bin/plugins -Dplugins.include=pinot-s3,pinot-parquet -Dlog4j2.configurationFile=/mnt/pinot/apache-pinot-0.11.0-SNAPSHOT-bin/conf/pinot-ingestion-job-log4j2.xml" \
  --conf "spark.executor.extraClassPath=/mnt/pinot/apache-pinot-0.11.0-SNAPSHOT-bin/plugins-external/pinot-batch-ingestion/pinot-batch-ingestion-spark/pinot-batch-ingestion-spark-0.11.0-SNAPSHOT-shaded.jar:/mnt/pinot/apache-pinot-0.11.0-SNAPSHOT-bin/lib/pinot-all-0.11.0-SNAPSHOT-jar-with-dependencies.jar:/mnt/pinot/apache-pinot-0.11.0-SNAPSHOT-bin/plugins/pinot-file-system/pinot-s3/pinot-s3-0.11.0-SNAPSHOT-shaded.jar:/mnt/pinot/apache-pinot-0.11.0-SNAPSHOT-bin/plugins/pinot-input-format/pinot-parquet/pinot-parquet-0.11.0-SNAPSHOT-shaded.jar" \
  --files /mnt/pinot/daily_channel_user_metrics_20220502.yaml \
  /mnt/pinot/apache-pinot-0.11.0-SNAPSHOT-bin/lib/pinot-all-0.11.0-SNAPSHOT-jar-with-dependencies.jar \
  -jobSpecFile daily_channel_user_metrics_20220502.yaml```
which gets stuck when trying to add executor tasks, complaining that
`ApplicationMaster` has not yet registered: ```
2022/05/27 16:03:34.967 INFO [DAGScheduler] [dag-scheduler-event-loop] Submitting 1000 missing tasks from ResultStage 0 (ParallelCollectionRDD[0] at parallelize at SparkSegmentGenerationJobRunner.java:237) (first 15 tasks are for partitions Vector(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14))
2022/05/27 16:03:34.968 INFO [YarnClusterScheduler] [dag-scheduler-event-loop] Adding task set 0.0 with 1000 tasks
2022/05/27 16:03:39.866 WARN [YarnSchedulerBackend$YarnSchedulerEndpoint] [dispatcher-event-loop-2] Attempted to request executors before the AM has registered!
2022/05/27 16:03:39.867 WARN [ExecutorAllocationManager] [spark-dynamic-executor-allocation] Unable to reach the cluster manager to request 11 total executors!```
It eventually times out and fails.  
**@sderegt838:** I'm thinking it is most likely an issue with providing all
dependencies/jars, but the log messages I'm seeing have not been super
helpful. I'm not seeing any obvious error messages related to
`java.lang.ClassNotFoundException` for pinot libs. It's unclear to me how the
Driver seems able to execute a portion of class main (it lists s3 files and
tries to start tasks) yet the ApplicationMaster fails to boot and register
properly.  
**@xiangfu0:** The currently suggested way is to copy the Pinot jars to the
Spark classpath, if you can add jars when creating a Spark cluster. The
executor worker nodes may not have the corresponding jars when you submit the job  
**@xiangfu0:** Or you can build a fat jar that contains the necessary
dependencies, that will also help  
**@kharekartik:** Hi. Just pass all the jars from
`spark.driver.extraClassPath` via the `--jars` argument as well. That will
solve the issue.  
**@kharekartik:** Also remove `spark.driver.extraJavaOptions=`  
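(Editorial aside: `--jars` expects a comma-separated list while `extraClassPath` is colon-separated, so the same set of paths can be reused with a small conversion. A hypothetical helper; the paths below are abbreviated placeholders, not the full list from this thread.)

```shell
# Convert a colon-separated classpath into the comma-separated
# form that spark-submit's --jars flag expects.
CLASSPATH_STYLE="/mnt/pinot/pinot-batch-ingestion-spark-0.11.0-SNAPSHOT-shaded.jar:/mnt/pinot/pinot-all-0.11.0-SNAPSHOT-jar-with-dependencies.jar"
JARS_STYLE="$(printf '%s' "$CLASSPATH_STYLE" | tr ':' ',')"
echo "$JARS_STYLE"
```

The result can then be passed directly as `--jars "$JARS_STYLE"`.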
**@sderegt838:** Thanks for the responses :bow:, will give it a try  
**@sderegt838:** When removing `spark.driver.extraJavaOptions`, I appear to
lose stdout (I think due to the log4j config file). stderr is logging this
with the new attempt: ```
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.hadoop.yarn.api.records.Resource.newInstance(JJII)Lorg/apache/hadoop/yarn/api/records/Resource;
	at org.apache.spark.deploy.yarn.YarnAllocator.<init>(YarnAllocator.scala:153)
	at org.apache.spark.deploy.yarn.YarnRMClient.createAllocator(YarnRMClient.scala:84)
	at org.apache.spark.deploy.yarn.ApplicationMaster.createAllocator(ApplicationMaster.scala:438)
	at org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:485)
	at $apache$spark$deploy$yarn$ApplicationMaster$$runImpl(ApplicationMaster.scala:308)
	at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply$mcV$sp(ApplicationMaster.scala:248)
	at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply(ApplicationMaster.scala:248)
	at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply(ApplicationMaster.scala:248)
	at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:783)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1926)
	at org.apache.spark.deploy.yarn.ApplicationMaster.doAsUser(ApplicationMaster.scala:782)
	at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:247)
	at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:807)
	at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)```  
**@sderegt838:** passed in the jars like so: ```
sudo spark-submit --verbose \
  --class org.apache.pinot.tools.admin.command.LaunchDataIngestionJobCommand \
  --master yarn --deploy-mode cluster \
  --conf spark.local.dir=/mnt \
  --conf "spark.driver.extraClassPath=/mnt/pinot/apache-pinot-0.11.0-SNAPSHOT-bin/plugins-external/pinot-batch-ingestion/pinot-batch-ingestion-spark/pinot-batch-ingestion-spark-0.11.0-SNAPSHOT-shaded.jar:/mnt/pinot/apache-pinot-0.11.0-SNAPSHOT-bin/lib/pinot-all-0.11.0-SNAPSHOT-jar-with-dependencies.jar:/mnt/pinot/apache-pinot-0.11.0-SNAPSHOT-bin/plugins/pinot-file-system/pinot-s3/pinot-s3-0.11.0-SNAPSHOT-shaded.jar:/mnt/pinot/apache-pinot-0.11.0-SNAPSHOT-bin/plugins/pinot-input-format/pinot-parquet/pinot-parquet-0.11.0-SNAPSHOT-shaded.jar" \
  --conf "spark.executor.extraClassPath=/mnt/pinot/apache-pinot-0.11.0-SNAPSHOT-bin/plugins-external/pinot-batch-ingestion/pinot-batch-ingestion-spark/pinot-batch-ingestion-spark-0.11.0-SNAPSHOT-shaded.jar:/mnt/pinot/apache-pinot-0.11.0-SNAPSHOT-bin/lib/pinot-all-0.11.0-SNAPSHOT-jar-with-dependencies.jar:/mnt/pinot/apache-pinot-0.11.0-SNAPSHOT-bin/plugins/pinot-file-system/pinot-s3/pinot-s3-0.11.0-SNAPSHOT-shaded.jar:/mnt/pinot/apache-pinot-0.11.0-SNAPSHOT-bin/plugins/pinot-input-format/pinot-parquet/pinot-parquet-0.11.0-SNAPSHOT-shaded.jar" \
  --files /mnt/pinot/daily_channel_user_metrics_20220502.yaml \
  --jars /mnt/pinot/apache-pinot-0.11.0-SNAPSHOT-bin/plugins-external/pinot-batch-ingestion/pinot-batch-ingestion-spark/pinot-batch-ingestion-spark-0.11.0-SNAPSHOT-shaded.jar,/mnt/pinot/apache-pinot-0.11.0-SNAPSHOT-bin/lib/pinot-all-0.11.0-SNAPSHOT-jar-with-dependencies.jar,/mnt/pinot/apache-pinot-0.11.0-SNAPSHOT-bin/plugins/pinot-file-system/pinot-s3/pinot-s3-0.11.0-SNAPSHOT-shaded.jar,/mnt/pinot/apache-pinot-0.11.0-SNAPSHOT-bin/plugins/pinot-input-format/pinot-parquet/pinot-parquet-0.11.0-SNAPSHOT-shaded.jar \
  /mnt/pinot/apache-pinot-0.11.0-SNAPSHOT-bin/lib/pinot-all-0.11.0-SNAPSHOT-jar-with-dependencies.jar \
  -jobSpecFile daily_channel_user_metrics_20220502.yaml```  
**@kharekartik:** Seems like HADOOP_CLASSPATH is not set.  
**@sderegt838:** Not seeing `HADOOP_CLASSPATH` in logs here, so this seems
plausible: ```YARN executor launch context: env: CLASSPATH -> ...
SPARK_YARN_CONTAINER_CORES -> ... SPARK_DIST_CLASSPATH -> ...
SPARK_YARN_STAGING_DIR -> ... SPARK_USER -> ... JAVA_HOME -> ...
SPARK_PUBLIC_DNS -> ...```  
**@sderegt838:** I do see that var being set in `/etc/hadoop/conf/hadoop-env.sh` though.  
**@sderegt838:** Running `hadoop classpath` from the EMR master node (where I am submitting the application from) is giving me this result: ```$ hadoop classpath
/etc/hadoop/conf:/usr/lib/hadoop/lib/*:/usr/lib/hadoop/.//*:/usr/lib/hadoop-hdfs/./:/usr/lib/hadoop-hdfs/lib/*:/usr/lib/hadoop-hdfs/.//*:/usr/lib/hadoop-yarn/lib/*:/usr/lib/hadoop-yarn/.//*:/usr/lib/hadoop-mapreduce/lib/*:/usr/lib/hadoop-mapreduce/.//*::/etc/tez/conf:/usr/lib/tez/*:/usr/lib/tez/lib/*:/usr/lib/hadoop-lzo/lib/*:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:/usr/share/aws/emr/ddb/lib/emr-ddb-hadoop.jar:/usr/share/aws/emr/goodies/lib/emr-hadoop-goodies.jar:/usr/share/aws/emr/kinesis/lib/emr-kinesis-hadoop.jar:/usr/share/aws/emr/cloudwatch-sink/lib/*:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*```  
 **@stuart.millholland:** I'm running into trouble with a tiered storage configuration when moving segments from one server to another. Here's the error message I get: ```Segment fetcher is not configured for protocol: http, using default
Download and move segment immutable_events__0__0__20220527T1606Z from peer with scheme http failed.
java.lang.IllegalArgumentException: The input uri list is null or empty```  
 **@jorick:** @jorick has joined the channel  
 **@justin:** @justin has joined the channel  

###  _#pinot-dev_

  
 **@amrish.k.lal:** Hello, is there an upper limit on how big a bitmap inverted index file can be? It seems it can't exceed 2GB at the moment due to integer offsets. I am wondering whether this is accidental or intentional, and whether we could consider larger index files. One potential fix may be to change `BitmapInvertedIndexWriter._bytesWritten` to a `long`? ```Error: java.lang.RuntimeException: java.lang.IllegalArgumentException: Negative position
	at org.apache.pinot.hadoop.job.mappers.SegmentCreationMapper.map(SegmentCreationMapper.java:310)
	at org.apache.pinot.hadoop.job.mappers.SegmentCreationMapper.map(SegmentCreationMapper.java:66)
	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:793)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:177)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1893)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:171)
Caused by: java.lang.IllegalArgumentException: Negative position
	at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:863)
	at org.apache.pinot.segment.local.segment.creator.impl.inv.BitmapInvertedIndexWriter.mapBitmapBuffer(BitmapInvertedIndexWriter.java:102)
	at org.apache.pinot.segment.local.segment.creator.impl.inv.BitmapInvertedIndexWriter.resizeIfNecessary(BitmapInvertedIndexWriter.java:95)
	at org.apache.pinot.segment.local.segment.creator.impl.inv.BitmapInvertedIndexWriter.add(BitmapInvertedIndexWriter.java:73)
	at org.apache.pinot.segment.local.segment.creator.impl.inv.json.OnHeapJsonIndexCreator.seal(OnHeapJsonIndexCreator.java:57)
	at org.apache.pinot.segment.local.segment.creator.impl.SegmentColumnarIndexCreator.seal(SegmentColumnarIndexCreator.java:560)
	at org.apache.pinot.segment.local.segment.creator.impl.SegmentIndexCreationDriverImpl.handlePostCreation(SegmentIndexCreationDriverImpl.java:266)
	at org.apache.pinot.segment.local.segment.creator.impl.SegmentIndexCreationDriverImpl.build(SegmentIndexCreationDriverImpl.java:238)
	at org.apache.pinot.hadoop.job.mappers.SegmentCreationMapper.map(SegmentCreationMapper.java:277)
	... 9 more
	Suppressed: java.lang.IllegalArgumentException: Negative size
		at sun.nio.ch.FileChannelImpl.truncate(FileChannelImpl.java:324)
		at org.apache.pinot.segment.local.segment.creator.impl.inv.BitmapInvertedIndexWriter.close(BitmapInvertedIndexWriter.java:118)
		at org.apache.pinot.segment.local.segment.creator.impl.inv.json.OnHeapJsonIndexCreator.seal(OnHeapJsonIndexCreator.java:50)
		... 13 more```  
**@richard892:** the file format has 32-bit offsets, so the suggested fix
wouldn't work  
**@richard892:** to double the limit to 4GB in a backward compatible way, the
offsets could be treated as `uint32` which would require reading 32 bits as a
`long` and masking with `0xFFFFFFFFL`  
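The unsigned-offset trick can be sketched in isolation. This is an illustrative snippet, not Pinot's actual writer code; the class and method names here are hypothetical:

```java
// Illustrative sketch of treating stored 32-bit offsets as unsigned (uint32),
// which doubles the addressable range from 2 GiB to 4 GiB while keeping the
// on-disk format unchanged. Names are hypothetical, not Pinot's actual code.
public class UnsignedOffsetSketch {

    // Reinterpret a signed 32-bit stored offset as an unsigned value.
    static long readUnsignedOffset(int storedOffset) {
        return storedOffset & 0xFFFFFFFFL;
    }

    public static void main(String[] args) {
        // Offsets below 2 GiB are unchanged by the masking.
        System.out.println(readUnsignedOffset(Integer.MAX_VALUE)); // 2147483647

        // A 3 GiB offset overflows to a negative signed int when narrowed,
        // but masking with 0xFFFFFFFFL recovers the original value.
        long threeGiB = 3L * 1024 * 1024 * 1024;
        int stored = (int) threeGiB;                    // -1073741824 as signed
        System.out.println(readUnsignedOffset(stored)); // 3221225472
    }
}
```

On the write side, the symmetric change would be a bounds check that the file stays below 4 GiB before narrowing a `long` position to an `int`.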
**@amrish.k.lal:** I see, thanks. I just wanted to check; I don't have a
strong reason to pursue this right now.  
**@richard892:** more than 2GB is a very large bitmap index  
**@amrish.k.lal:** true, it was more of a test that I was running rather than
a production usecase.  
**@richard892:** the change is trivial  
**@richard892:** but this is a limitation which has existed since bitmap
indexes were added to pinot  

###  _#announcements_

  
 **@justin:** @justin has joined the channel  

###  _#getting-started_

  
 **@valdamarin.d:** @valdamarin.d has joined the channel  
 **@francois:** Hi, a little getting-started question, as I need more preprocessing
on my fields. I'm facing a data error where I need to replace a few empty
strings with a default date, say 2999. I'm ingesting the majority of my data
using JSONPATH, and I can't directly use Groovy because my column names contain "." —
has anyone faced this kind of issue? By empty I mean the field exists in my
JSON, so it's not interpreted as null  
 **@jorick:** @jorick has joined the channel  
 **@justin:** @justin has joined the channel  

###  _#pinot-docsrus_

  
 **@vvivekiyer:** @vvivekiyer has joined the channel  

###  _#introductions_

  
 **@valdamarin.d:** @valdamarin.d has joined the channel  
 **@jorick:** @jorick has joined the channel  
 **@justin:** @justin has joined the channel  
\--------------------------------------------------------------------- To
unsubscribe, e-mail: dev-unsubscribe@pinot.apache.org For additional commands,
e-mail: dev-help@pinot.apache.org