Posted to dev@pinot.apache.org by Pinot Slack Email Digest <sn...@apache.org> on 2020/10/29 02:00:12 UTC

Apache Pinot Daily Email Digest (2020-10-28)

### _#general_

  
 **@djspatoulas:** @djspatoulas has joined the channel  
 **@noahprince8:** Does a segment always consist of `columns.psf,
creation.meta, index_map, metadata.properties` ? I’m thinking for the s3 lazy
loading, it might make sense to have separate caching settings for metadata vs
`columns.psf`. Like you may want to eagerly load all or most of the metadata
since it’s small and means segments can be eliminated quickly.  
**@mayanks:** Yes, all segments have these files. But these are not exposed as
individual files. One issue I can think of with the approach is when a segment
is refreshed, the cached metadata can get out of sync, and would need some
sort of invalidation/reload.  
**@noahprince8:** How does a segment get refreshed? I thought the idea was
that data is immutable?  
**@noahprince8:** And what do you mean they aren’t exposed as individual
files? Do they get compressed at some point?  
**@mayanks:** Having said that, I do see some merit in eager loading of
metadata. Perhaps it would make sense to write down the idea and check against
cases to handle.  
**@mayanks:** As in, the interface doesn’t allow you to query a file from
segment  
**@noahprince8:** Oh. The interface expects the full segment to be there?  
**@mayanks:** I mean there is no api getColumnPsfFile()  
**@noahprince8:** Added it as a comment on the lazy loading issue. I think
first we do lazy loading of the whole segment. Then add this as an
optimization later.  
**@mayanks:** There is getSegmentMetadata() though  
**@mayanks:** Yeah, I think your idea is good. Just saying we need to think
through to design the right apis, and ensure all cases handled  
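
The idea discussed in this thread (eagerly cache the small metadata, lazily load the large column data, and invalidate on refresh) could be sketched roughly as below. This is only an illustration under stated assumptions, not Pinot's API: `SegmentCache`, `fetch_metadata`, `fetch_data`, and the `startTime`/`endTime` keys are all hypothetical names, and the "remote store" is an in-memory dict standing in for S3.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List


@dataclass
class SegmentCache:
    """Hypothetical two-tier cache: eager metadata, lazy column data."""

    fetch_metadata: Callable[[str], dict]  # assumed cheap: small metadata download
    fetch_data: Callable[[str], bytes]     # assumed expensive: full column data download
    _metadata: Dict[str, dict] = field(default_factory=dict, init=False)
    _data: Dict[str, bytes] = field(default_factory=dict, init=False)

    def preload_metadata(self, segment_names: Iterable[str]) -> None:
        # Eager tier: pull metadata for every known segment up front.
        for name in segment_names:
            self._metadata[name] = self.fetch_metadata(name)

    def prune(self, min_time: int, max_time: int) -> List[str]:
        # Eliminate segments using cached metadata only; no column data is touched.
        return [
            name
            for name, meta in self._metadata.items()
            if meta["endTime"] >= min_time and meta["startTime"] <= max_time
        ]

    def get_data(self, name: str) -> bytes:
        # Lazy tier: download column data on first access, then reuse it.
        if name not in self._data:
            self._data[name] = self.fetch_data(name)
        return self._data[name]

    def on_refresh(self, name: str) -> None:
        # A refreshed segment invalidates both tiers so they cannot drift apart.
        self._data.pop(name, None)
        self._metadata[name] = self.fetch_metadata(name)


# In-memory stand-in for the remote store (assumption for the demo).
remote_meta = {
    "seg_0": {"startTime": 0, "endTime": 99},
    "seg_1": {"startTime": 100, "endTime": 199},
}
data_downloads = []  # records which segments had their data fetched


def fetch_data(name: str) -> bytes:
    data_downloads.append(name)
    return b"columns for " + name.encode()


cache = SegmentCache(
    fetch_metadata=lambda n: dict(remote_meta[n]),
    fetch_data=fetch_data,
)
cache.preload_metadata(remote_meta)

candidates = cache.prune(120, 150)  # only seg_1 overlaps this time range
for name in candidates:
    cache.get_data(name)

print(candidates, data_downloads)  # prints ['seg_1'] ['seg_1']
```

In this sketch, `on_refresh` is the hook that addresses the out-of-sync concern raised above: whatever signals a segment refresh would need to call it so the cached metadata is replaced and the stale column data is dropped.
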
 **@ravibabu.chikkam:** @ravibabu.chikkam has joined the channel  
 **@noahprince8:** Is it possible to pause kafka collection on a table, but
not querying? Seems like ChangeTableState disable makes queries return empty
as well as pausing kafka  
**@ssubrama:** That is not possible currently.  
**@ssubrama:** Although, we have had requests for the feature  
**@ssubrama:** what is the use case for you?  
**@noahprince8:** Not really a production use case. I have a 100GB SSD and
it’s about to pop, testing a large dataset locally.  
**@noahprince8:** Though I think certainly it could be useful in production.
Maybe a producer goes haywire and we want to stop consuming that data and do a
repair, while still leaving the table accessible  
**@noahprince8:** Is there an issue for this? I can create one  
 **@g.kishore:** Quick reminder about tomorrow's meetup at 5 PM PST. We have
amazing talks lined up. @tingchen - Uber, @pradeepgv42 - Confluera @afilipchik
@elon.azoulay from City Storage Systems.  

### _#random_

  
 **@djspatoulas:** @djspatoulas has joined the channel  
 **@ravibabu.chikkam:** @ravibabu.chikkam has joined the channel  

### _#troubleshooting_

  
 **@nguyenhoanglam1990:** @nguyenhoanglam1990 has joined the channel  