Posted to dev@pinot.apache.org by Pinot Slack Email Digest <sn...@apache.org> on 2021/07/09 02:00:22 UTC

Apache Pinot Daily Email Digest (2021-07-08)

### _#general_

  
 **@arunkumarc2010:** @arunkumarc2010 has joined the channel  
 **@sriramdas.sivasai:** Hello everyone, I have a question about the storage and query side of Pinot. Suppose we have 6 months of data as Pinot segments in deep storage (500GB in size) and I want to run an aggregate query over those 6 months. 1. Does my offline data server need 500GB of memory (RAM) to process the query, or will queries still work efficiently with 100GB of RAM and 500GB of storage? 2. Does my query work at all if I don't have 500GB of storage? 3. Is the memory required to load a segment file from disk the same as the size of the file? I ask because loading a compressed file into memory can blow up RAM usage 3-4x. Also, if I want to read a single record from the previous 6 months, will Pinot load segments on demand from deep storage?
**@mayanks:** Servers maintain a local copy of segments on their disk. You do need local disk big enough to store the per-server data, but segments are memory-mapped, so you don't need a big RAM; you can get by with 64GB of RAM, for example.
**@sriramdas.sivasai:** Understood. If my query is a GROUP BY query over the 6-month dataset, will it work with less memory than the size of the whole dataset?
**@mayanks:** Yes. You don't need RAM as large as the data size.
**@sriramdas.sivasai:** OK, thanks. Just to understand Pinot's performance: were the Pinot benchmarks shown in various places run with memory-mapped segments from disk, or in HEAP mode?
**@mayanks:** MMAP mode only  
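For reference, the memory-mapped vs. heap choice is set per table via `loadMode` under `tableIndexConfig` in the table config (MMAP is the default). A minimal sketch, with the table name, time column, and replication purely illustrative:

```
{
  "tableName": "events",
  "tableType": "OFFLINE",
  "segmentsConfig": {
    "timeColumnName": "ts",
    "replication": "1"
  },
  "tenants": {},
  "tableIndexConfig": {
    "loadMode": "MMAP"
  },
  "metadata": {}
}
```

With MMAP, the OS page cache keeps the hot parts of the segments in memory, which is why the working RAM can be much smaller than the on-disk data size.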
**@liranbri:** Hi everyone, we are evaluating Pinot and one of our requirements is to be able to encrypt our clients' data on disk (in memory it can be decrypted). Is such a thing possible? If so, we may also need to encrypt it with a different encryption key per client (each client's data encrypted with a unique key dedicated to that client). Is there a way to achieve that? Thank you so much.
**@mayanks:** Pinot does support encryption of the data copy on the deep store. However, the local server copies on disk need to be decrypted to maintain low latency. The per-client encryption requirement is an interesting one that I came across in the past and opened an issue to track.
**@liranbri:** It would be a great feature! TBH I'm not sure what you mean by “deepstore”. Is that the storage consumed by Pinot, or the source of data owned by us and ingested into Pinot?
**@mayanks:** Pinot uses the deep store to maintain a golden copy of the ingested data. It supports deep stores like S3/ADLS/GCS/etc. That copy can be encrypted.
**@mayanks:** Pinot servers store a copy of the data on local disk for faster serving (today); that copy does not support encryption.
**@liranbri:** Thanks for the explanation. And are those copies of all the data, or just subsets of it?
**@liranbri:** Because I'm trying to understand what the actual value of deep-store encryption is, if the same data sits decrypted on other disks?
 **@savingoyal:** @savingoyal has joined the channel  
 **@orbit2:** @orbit2 has joined the channel  
 **@carlos:** Hi guys! I have a question regarding Kafka integration with Pinot. If I'm using a secured Kafka with SASL_SSL, is there a way to configure that and use those credentials? Or is there another way of setting up security from Pinot to Kafka for data ingestion? Thanks in advance!
**@xiangfu0:** I think we have an issue open for SASL_SSL support:  
**@carlos:** Posted in troubleshooting as well  
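For anyone who lands here later: a rough sketch of what a realtime table's `streamConfigs` might look like for a SASL_SSL-secured Kafka, assuming the standard Kafka client properties (`security.protocol`, `sasl.mechanism`, `sasl.jaas.config`) are passed through to the consumer; whether they are honored depends on the Pinot version (hence the open issue), and the topic, broker, and credentials below are placeholders:

```
{
  "streamType": "kafka",
  "stream.kafka.topic.name": "my-topic",
  "stream.kafka.broker.list": "broker-1:9093",
  "stream.kafka.consumer.type": "lowlevel",
  "stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
  "stream.kafka.decoder.class.name": "org.apache.pinot.plugin.inputformat.json.JSONMessageDecoder",
  "security.protocol": "SASL_SSL",
  "sasl.mechanism": "PLAIN",
  "sasl.jaas.config": "org.apache.kafka.common.security.plain.PlainLoginModule required username=\"<user>\" password=\"<secret>\";"
}
```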
 **@rkabir:** @rkabir has joined the channel  

###  _#random_

  
 **@arunkumarc2010:** @arunkumarc2010 has joined the channel  
 **@savingoyal:** @savingoyal has joined the channel  
 **@rkabir:** @rkabir has joined the channel  

###  _#feat-upsert_

  
 **@kchavda:** @kchavda has joined the channel  

###  _#troubleshooting_

  
 **@chxing:** Hi all, can Pinot transfer realtime table data to an offline table directly? Thanks.
**@mayanks:** Yes  
**@chxing:** Thx Mayank  
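For completeness, the managed flow for this is the `RealtimeToOfflineSegmentsTask`, configured on the realtime table; it also requires minions to be running and an offline table of the same name to receive the segments. A rough sketch of the task section of the realtime table config, with the time periods purely illustrative:

```
{
  "task": {
    "taskTypeConfigsMap": {
      "RealtimeToOfflineSegmentsTask": {
        "bucketTimePeriod": "1d",
        "bufferTimePeriod": "2d"
      }
    }
  }
}
```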
 **@arunkumarc2010:** @arunkumarc2010 has joined the channel  
 **@ruslanrodriquez:** Hi everyone! I am researching realtime table schema evolution. After updating the Pinot schema and reloading segments, I see the new columns in the table with null values for the old data. But after consuming new data in which the newly added fields are not empty, the new data is also ingested with null values in the new columns. The Kafka messages are in Avro format. When I debug the code I see that AvroRecordExtractor still uses the old set of fields. Can I refresh the field set in AvroRecordExtractor and start consuming messages with the new columns?
**@jackie.jxt:** The current consuming segment won't be able to pick up the new values immediately because the writers for the newly added columns are not set up. The next consuming segment will pick the new fields up.
**@jackie.jxt:** Can you please file an issue for the requirement? One workaround we can do is to drop the current consuming segment and replace it with a new one with all fields set up properly. But that also means the data already consumed within the consuming segment is dropped and will be re-consumed in the replacing segment.
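For readers following along: the usual schema-evolution flow is to add the column to the schema and then trigger a segment reload; completed segments (and, per the above, the in-flight consuming segment) return the column's default null value, and only subsequent consuming segments populate real values. A sketch of the relevant schema fragment for the added field, with the column name and default value purely illustrative:

```
{
  "dimensionFieldSpecs": [
    {
      "name": "newColumn",
      "dataType": "STRING",
      "defaultNullValue": "unknown"
    }
  ]
}
```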
 **@deemish2:** Hello everyone, I would like to understand how to backfill offline data that consists of multiple segments.
**@jackie.jxt:** Do you mean replacing the current segments with a new set of segments? Pinot automatically replaces a segment with the same name when a new segment is pushed. One approach is to replace the segments one by one, each with a new segment of the same name. We are also working on a feature to do atomic batch replacement.
 **@savingoyal:** @savingoyal has joined the channel  
 **@orbit2:** @orbit2 has joined the channel  
 **@carlos:** Hi guys!  
 **@carlos:** I have a question regarding Kafka integration with Pinot  
 **@carlos:** If I'm using a secured Kafka with SASL_SSL, is there a way to configure that and use those credentials? Or is there another way of setting up security from Pinot to Kafka for data ingestion?
**@jackie.jxt:** @slack1 Can you please help answering this?  
**@jackie.jxt:** nvm.. Xiang already replied: I think we have an issue open
for SASL_SSL support:  
**@carlos:** Thanks in advance!  
 **@rkabir:** @rkabir has joined the channel  

###  _#pinot-s3_

  
 **@kchavda:** @kchavda has joined the channel  

###  _#aggregators_

  
 **@kchavda:** @kchavda has joined the channel  

###  _#pinot-dev_

  
 **@agnihotrisuryansh55:** Under the `setting up pinot cluster` section, the manual cluster setup link is not accessible.
**@xiangfu0:** fixed  
 **@atri.sharma:** @mayanks @jackie.jxt please review the distinct PR and let
me know if it looks ok  
**@mayanks:** Will do  

### _#pinot-docs_

  
 **@kchavda:** @kchavda has joined the channel  

###  _#getting-started_

  
 **@kchavda:** @kchavda has joined the channel  

###  _#debug_upsert_

  
 **@deemish2:** Hello everyone, I would like to understand how to backfill offline data that consists of multiple segments.
 **@yupeng:** An upsert table takes realtime data only.