Posted to dev@pinot.apache.org by Pinot Slack Email Digest <ap...@gmail.com> on 2021/09/08 02:00:28 UTC

Apache Pinot Daily Email Digest (2021-09-07)

### _#general_

  
 **@vutruongxuan99:** @vutruongxuan99 has joined the channel  
 **@dadelcas:** @dadelcas has joined the channel  
 **@dadelcas:** hello everyone :slightly_smiling_face:  
 **@grace.lu:** Hello Pinot experts, I wonder if anyone here running Pinot on
k8s in production has suggestions for a Pinot disaster recovery plan for k8s
cluster downtime. Assume we are in an environment with multiple k8s clusters
running; which of the following would you recommend to make Pinot resilient
to a k8s cluster-level outage or maintenance:
1. Setting up one Pinot cluster across multiple k8s environments, with each of
them holding one replica of the data. (Not sure if this is feasible or easy to
do.)
2. Setting up fully replicated, redundant Pinot clusters in different k8s
environments, also replicating the data ingestion and anything we did in the
main cluster. (Seems costly.)
3. Only running Pinot in one k8s cluster; in the case of a k8s cluster outage,
rebuild the servers, controllers, and brokers in another healthy k8s cluster
and let them pick up the old state from Kafka, ZooKeeper, S3, etc. (How hard is
it for a newly built Pinot cluster to inherit and resume the old state?)
Any experience sharing on handling this in a prod environment is much
appreciated :pray::skin-tone-2:. Thanks in advance!  
**@mayanks:** You could have a Pinot deployment across availability zones.
What's your cloud provider?  
**@xiangfu0:** The current Pinot k8s deployment is one Pinot cluster per k8s
cluster, which means you would have N Pinot clusters across your N k8s
clusters. This is basically the fully replicated, all-active story.  
**@xiangfu0:** I would say keep 2 replicas per k8s cluster and put a load
balancer on top of all the Pinot clusters  
**@xiangfu0:** btw, what's the availability of your k8s? If it's high enough,
you can just run one Pinot cluster on one k8s cluster.  
 **@sina.tamizi:** @sina.tamizi has joined the channel  

###  _#random_

  
 **@vutruongxuan99:** @vutruongxuan99 has joined the channel  
 **@dadelcas:** @dadelcas has joined the channel  
 **@sina.tamizi:** @sina.tamizi has joined the channel  

###  _#troubleshooting_

  
 **@vutruongxuan99:** @vutruongxuan99 has joined the channel  
 **@nadeemsadim:** I can see that *pinot-zookeeper disk usage is high*. What
could be the root cause? The metadata can't be 65 GB out of the 95 GB
provisioned, given only a few million records in the table. Is it database
indexing causing some external views to be stored on the ZooKeeper disk?
@mayanks @xiangfu0 @jackie.jxt @ssubrama @g.kishore  
**@mrpringle:** You need to set up a ZooKeeper cleanup job to remove old
snapshots.  
**@mrpringle:**  
**@mayanks:** Thanks @mrpringle  
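
For reference, a minimal sketch of ZooKeeper's built-in purge settings in
`zoo.cfg`; the values here are illustrative, so adjust retention to your own
needs:

```properties
# zoo.cfg — enable ZooKeeper's built-in purge task (illustrative values)
autopurge.snapRetainCount=3   # keep only the 3 most recent snapshots
autopurge.purgeInterval=1     # run the purge task every hour; 0 disables it
```

Alternatively, ZooKeeper ships a `bin/zkCleanup.sh` script that can be run
periodically (e.g. as a Kubernetes CronJob) to remove old snapshots and
transaction logs.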
**@mrpringle:** I'm trying to use tenant tags with the Kafka low-level
consumer to split consuming vs. consumed partitions across servers. However,
the offline servers don't seem to be getting any segments. Are there
additional steps needed to get this to work? I'm also using the upsert
functionality.  
**@mayanks:** Consumed segments are still part of the realtime table. Offline
is what is pushed from the offline ingestion flow  
**@mayanks:** You probably want to check out  
**@npawar:** Are you trying to use this?  
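
For context, splitting consuming vs. completed segments across differently
tagged servers is typically done with `tagOverrideConfig` under the table's
`tenants` config; completed segments stay in the REALTIME table rather than
moving to the OFFLINE servers, which is the point above. A minimal sketch,
with illustrative tenant/tag names:

```json
"tenants": {
  "broker": "DefaultTenant",
  "server": "DefaultTenant",
  "tagOverrideConfig": {
    "realtimeConsuming": "serverTenantA_REALTIME",
    "realtimeCompleted": "serverTenantA_OFFLINE"
  }
}
```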
**@zsolt:** We are running Pinot in Kubernetes, and noticed that the servers
are considered ready too early, before the server has actually finished
starting. This causes the statefulset rolling restart to restart multiple
servers simultaneously, making segments inaccessible. Should the server API
`/health` endpoint be used for readiness probing?  
**@mayanks:** The broker routes queries to a server only for segments that are
online.  
**@zsolt:** In our case 7 out of 8 servers were restarting at the same time  
**@mayanks:** Are you using replica groups? If so, you could do one replica at
a time?  
**@zsolt:** we are not using it  
**@zsolt:** and we are doing helm upgrades for config changes, so it's not
done manually  
**@mayanks:** @xiangfu0 Any suggestions? IIRC, there are deployments that have
hooks that wait for some time (x minutes) before reporting healthy? cc:
@jackie.jxt  
**@jackie.jxt:** Which version of Pinot are you running? How do you shut down
the servers? We need to ensure the shutdown hook is called when shutting down
the servers  
**@zsolt:** Running 0.7.1 with the helm chart from the repo. When we do a helm
upgrade (e.g. last time I configured S3 retries for the servers), the pods
are restarted by the StatefulSet controller, using the default *RollingUpdate*
strategy. The controller waits for the restarted pod to be Ready, then
proceeds to restart the next one. The standard Kubernetes termination is
SIGTERM followed by SIGKILL after 30s if the process has not terminated.  
**@zsolt:** In the chart the brokers have the `/health` readiness probe;
that's why I'm wondering why the servers don't have it set.  
**@jackie.jxt:** Here is a fix for adding the shutdown hook for the server:  
**@jackie.jxt:** Seems it is not included in `0.8.0`, so you need to try
either the current master or wait for the next release  
**@jackie.jxt:** Adding @xiangfu0 to take a look as well  
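
For anyone following along, a minimal sketch of what a server readiness probe
could look like in the statefulset spec, assuming the server admin API listens
on its default port 8097; the path, port, and timings are illustrative and
should be adapted to your chart/values:

```yaml
# Sketch of a readiness probe for the pinot-server statefulset (illustrative values)
readinessProbe:
  httpGet:
    path: /health
    port: 8097        # pinot-server admin API port (assumed default)
  initialDelaySeconds: 60
  periodSeconds: 10
  failureThreshold: 10
```

Raising `terminationGracePeriodSeconds` above the 30s default also gives the
server's shutdown hook (once available) time to finish before the SIGKILL
mentioned above.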
 **@dadelcas:** @dadelcas has joined the channel  
 **@dadelcas:** Hi everyone, I have an issue with a k8s deployment. Basically,
controllers are discovered twice: once via the headless service and one more
time via the regular service. The one discovered through the regular service
is always reported as "failed", as there is no ZK entry with the FQDN of the
service. Is there any way to fix this?  
**@xiangfu0:** headless svc is used for internal pod discovery, e.g. pinot-
controller-0, pinot-server-2 …  
**@xiangfu0:** zk svc/headless-svc are there but not exposed externally  
**@dadelcas:** I understand, but I don't know why Helix is discovering one
controller per service. This doesn't happen with the brokers or any other
node type, just the controllers  
**@dadelcas:** and I don't know whether this will have side effects  
**@dadelcas:** I'd like to have controllers reported correctly  
**@xiangfu0:** From the Helix side, each controller will register itself  
**@xiangfu0:** On the svc side, it might be the deep store or VIP config?  
**@dadelcas:** I'm not sure if I follow you  
**@dadelcas:** I have deployed the helm chart with the defaults, so the FS is
the node HD  
**@dadelcas:** but I don't know how this could have anything to do with the
controllers showing twice  
**@xiangfu0:** hmm, where did you find the controller showing twice  
**@xiangfu0:** in k8s, logs, or Helix?  
**@dadelcas:** in the /instances API and therefore in the UI as well  
**@dadelcas:** Helix manager returns 2 records for the controller as per my
first message  
**@dadelcas:** there is a single controller POD  
**@xiangfu0:** can you paste a screen shot?  
**@xiangfu0:** we will check on that  
**@dadelcas:** let me see if I can pull a screenshot, I may need to blur some
details though.. bear with me  
**@xiangfu0:** sure, in general, each Pinot pod will register itself, and
that's true in the k8s world; the FQDN is just its pod name, and svc names
are for external access, so they shouldn't be counted here  
**@dadelcas:**  
**@dadelcas:** I agree, using Kubernetes should not change anything here  
**@dadelcas:** Unfortunately I'm still too new to the code and I can't find a
solution myself  
**@dadelcas:** let me know if that helps  
**@xiangfu0:** hmm, seems something went wrong, I'll check  
**@dadelcas:** cheers!  
 **@gqian3:** Hi team, recently we experienced a Pinot server out-of-memory
issue in a 4-server Pinot cluster when issuing one `select distinct id` query
on an entire table with only 1 billion records. We had to manually restart the
Pinot server pods to recover. Is this normal? Is there some index or Pinot
configuration we can add to this id column or to the Pinot cluster to prevent
it from bringing down the entire cluster?  
**@ken:** We ran into a similar issue on 0.6 when doing a `DISTINCTCOUNT`
on a high-cardinality field. Only the broker was jammed, so restarting that
process fixed things. From the stack trace, it seemed like the problem was
caused by very large responses from the server processes causing blockage at
the broker network layer. I don't think there was a way to prevent this from
happening, at least with that version.  
 **@sina.tamizi:** @sina.tamizi has joined the channel  

###  _#pinot-dev_

  
 **@mosiac:** Hello, I'm looking into writing a custom Kafka connector (used
for connecting with a proprietary Kafka service). What would be the best way
of doing this while still being able to pull changes from the public repo? My
first solution was to duplicate the pinot-kafka-2.0 module and adapt that, but
this results in a lot of duplicate files, and if the original module changes
I'll have to reimplement those changes in mine. My changes aren't that big:
only to KafkaPartitionLevelConnectionHandler and
KafkaPartitionLevelStreamConfig, and a version bump for the Kafka client to
2.5  
**@mayanks:** Maybe extend the existing impl and override whatever needs to
be?  
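
If the only changes are to the consumer wiring, another option worth
considering is to ship the custom classes as a small plugin and point the
table's stream config at a custom consumer factory instead of forking the
whole module. A hedged sketch; the `com.mycompany...` class name, topic, and
broker list are hypothetical:

```json
"streamConfigs": {
  "streamType": "kafka",
  "stream.kafka.consumer.type": "lowlevel",
  "stream.kafka.topic.name": "myTopic",
  "stream.kafka.broker.list": "kafka:9092",
  "stream.kafka.consumer.factory.class.name": "com.mycompany.pinot.ProprietaryKafkaConsumerFactory"
}
```

The custom factory can extend or delegate to the existing pinot-kafka-2.0
classes, so upstream changes are picked up by bumping the dependency rather
than re-copying files.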
 **@dadelcas:** @dadelcas has joined the channel  

###  _#getting-started_

  
 **@luisfernandez:** when we insert data into Pinot, how is replication
achieved? Is it when a segment is completed that this data is made available
to other nodes?  
**@kulbir.nijjer:** Depends on the type of table: realtime vs. offline. For
realtime, as many servers (consumers) as the replication factor start
consuming data in parallel from the streaming source. Whenever a segment is
completed, the controller gets notified, picks one of the replica servers to
commit the segment to, and also updates the segment store. For offline tables,
since the segment is already generated earlier, replication simply controls
which servers from the pool host the offline segment, and that's decided by
the controller. More details are defined here:  as well as the offline/batch
data flow.  
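
For concreteness, a minimal sketch of where replication is set in the table
config's `segmentsConfig`; the values are illustrative. `replication` applies
to offline tables and `replicasPerPartition` to realtime tables using the
low-level consumer:

```json
"segmentsConfig": {
  "replication": "3",
  "replicasPerPartition": "3"
}
```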
 **@kangren.chia:** If I wish to use the native Java client but can only have
my broker/controller exposed outside of the cluster, is my only option to use
`ConnectionFactory.fromHostList(brokerUrl)`? I'm not all that familiar with ZK,
and I don't see a way in the API to retrieve broker addresses from the
zookeeper category of APIs exposed by the controller  
**@xiangfu0:** You need to expose the brokers externally and then use the
broker list to query  
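
For reference, a minimal sketch of querying through an externally exposed
broker list with the Java client; the broker address, table, and query are
illustrative:

```java
import org.apache.pinot.client.Connection;
import org.apache.pinot.client.ConnectionFactory;
import org.apache.pinot.client.ResultSet;
import org.apache.pinot.client.ResultSetGroup;

public class BrokerListQueryExample {
  public static void main(String[] args) {
    // Connect directly to the externally exposed broker(s), bypassing ZK-based discovery
    Connection connection = ConnectionFactory.fromHostList("broker-0.example.com:8099");

    // Run a query via the broker and print the first column of each row
    ResultSetGroup response = connection.execute("SELECT id FROM myTable LIMIT 10");
    ResultSet resultSet = response.getResultSet(0);
    for (int row = 0; row < resultSet.getRowCount(); row++) {
      System.out.println(resultSet.getString(row, 0));
    }
  }
}
```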

### _#releases_

  
 **@dadelcas:** @dadelcas has joined the channel  