Posted to dev@pinot.apache.org by Pinot Slack Email Digest <ap...@gmail.com> on 2021/11/11 02:00:22 UTC

Apache Pinot Daily Email Digest (2021-11-10)

### _#general_

  
 **@xiangfu0:** Just fyi, I will prepare the 0.9.0 release candidate based on
commit .  
 **@nicholas.nezis:** @nicholas.nezis has joined the channel  
 **@karinwolok1:** Live in 5 min if anyone wants to join (Intro level on
Apache Pinot)  
**@bowenzhu:** @bowenzhu has joined the channel  
 **@brandon:** @brandon has joined the channel  

###  _#random_

  
 **@nicholas.nezis:** @nicholas.nezis has joined the channel  
 **@bowenzhu:** @bowenzhu has joined the channel  
 **@brandon:** @brandon has joined the channel  

###  _#troubleshooting_

  
 **@alihaydar.atil:** Hey everyone, I have managed to create a hybrid table and
have a few questions on the subject. • Since segments are transferred to the
offline table periodically, is it correct to assume that I don't need those
transferred realtime segments to stay hosted on the servers? • If so, is it
recommended to clean up those transferred segments, and what is the correct
way to do it? What comes to mind is setting the retentionTimeUnit and
retentionTimeValue properties in the realtime table configuration. Does Pinot
have a built-in clean-up mechanism for hybrid tables? Thanks in advance  
**@npawar:** Setting retention is the right way  
**@npawar:** Pinot has a periodic task which cleans up segments from the
table that are older than the retention time  
**@alihaydar.atil:** @npawar thank you :pray::skin-tone-2:  
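For reference, the retention settings discussed above live under
`segmentsConfig` in the realtime table config. A minimal sketch (the table
name and the 5-day value are illustrative, not from the thread):

```json
{
  "tableName": "myTable_REALTIME",
  "tableType": "REALTIME",
  "segmentsConfig": {
    "retentionTimeUnit": "DAYS",
    "retentionTimeValue": "5"
  }
}
```

The controller's RetentionManager periodic task then deletes realtime
segments older than the configured retention; the value should comfortably
exceed the realtime-to-offline transfer window so segments are not removed
before they have been copied to the offline table.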
 **@nicholas.nezis:** @nicholas.nezis has joined the channel  
 **@tony:** We have a Pinot / Kubernetes deployment with 6 controller pods. We
are seeing high CPU on one controller, very low on the others. Restarting pods
does not change this behavior. Our Pinot is now primarily ingesting one fairly
high volume Kafka stream with 128 partitions. Is this expected?  
**@xiangfu0:** It's expected, I think. The Pinot controller doesn't do the
actual ingestion work; it handles the management of segment assignment, so
you should expect the server side to ingest the data. Also, a lot of the
heavy lifting is done by the lead controller, which is elected automatically.
Typically we don't run more than 3 controllers.  
**@tony:** Thanks. I did find that this is the leader. I will switch to 3
controllers but make them larger.  
**@xiangfu0:** :thumbsup:  
**@ssubrama:** How many tables do you have? If you have many tables, the load
will be divided roughly equally amongst the controllers for each table. If you
have only one table, then one of the controllers will be doing all the work,
of course.  
**@ssubrama:** Also, think of the work in the controller as "meta" work. So,
it is roughly proportional to the frequency at which new segments are
added/deleted. And then there are periodic jobs that are roughly proportional
to the _number_ of segments you have in tables (data size does not matter).
It is useful to check whether your realtime tables are creating segments too
frequently.  
**@ssubrama:** @tony ^^  
**@tony:** Currently we have essentially one table (2 tables, but one is 98%
of the volume), though that will change over time. So as we add more tables
we should see the load distributed more evenly across controllers.  
 **@bowenzhu:** @bowenzhu has joined the channel  
 **@brandon:** @brandon has joined the channel  

###  _#pinot-dev_

  
 **@dunithd:** @dunithd has joined the channel  
 **@xiangfu0:** Just fyi, I will prepare the 0.9.0 release candidate based on
commit .  

###  _#announcements_

  
 **@dunithd:** @dunithd has joined the channel  

###  _#presto-pinot-connector_

  
 **@nakkul:** @nakkul has joined the channel  

###  _#pinot-perf-tuning_

  
 **@tony:** Question about server disk size - do server nodes need enough disk
space to store all segments? Or will segments get dropped from local disk and
re-read from deep storage as needed if the disk gets full?  
 **@g.kishore:** it needs enough disk space to store all the segments assigned
to it  
 **@tony:** Thanks. So deep storage is just a backup. Is this the use case
 is meant to address? We have an AWS/EKS deployment and our cost is driven by
server storage (EBS) - it would be ideal to have older data in S3  
 **@ssubrama:** @tony perhaps you are looking for a solution being worked on
in this issue:  
**@ssubrama:** Tiered storage just moves some segments to a different set of
servers, but those servers now need to have enough storage to host these.  
 **@ssubrama:** Even in the issue that I mention, it is expected that storage
use temporarily bumps up on the servers and is then reclaimed when the
segments "age". Pinot does not handle the case of serving data from segments
that cannot be stored on servers.  
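For context, the tiered storage mentioned above is configured per table via
`tierConfigs`. A hedged sketch (the tier name, age threshold, and server tag
are illustrative), assuming a separate set of servers tagged for older data:

```json
"tierConfigs": [
  {
    "name": "coldTier",
    "segmentSelectorType": "time",
    "segmentAge": "30d",
    "storageType": "pinot_server",
    "serverTag": "coldTier_OFFLINE"
  }
]
```

With this, segments older than 30 days are relocated to servers carrying the
`coldTier_OFFLINE` tag; as noted in the thread, those servers still need
enough local disk to host the segments assigned to them.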

###  _#getting-started_

  
 **@dunithd:** @dunithd has joined the channel  
 **@aaron.weiss:** @aaron.weiss has joined the channel  

###  _#releases_

  
 **@dunithd:** @dunithd has joined the channel  
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@pinot.apache.org
For additional commands, e-mail: dev-help@pinot.apache.org