Posted to dev@pinot.apache.org by Pinot Slack Email Digest <sn...@apache.org> on 2020/11/04 02:00:13 UTC

Apache Pinot Daily Email Digest (2020-11-03)

### _#general_

  
 **@matety:** @matety has joined the channel  

###  _#random_

  
 **@matety:** @matety has joined the channel  
 **@karinwolok1:** Hahaha  

###  _#troubleshooting_

  
 **@ravibabu.chikkam:** @ravibabu.chikkam has joined the channel  

###  _#onboarding_

  
 **@ravibabu.chikkam:** @ravibabu.chikkam has joined the channel  

###  _#community_

  
 **@ravibabu.chikkam:** @ravibabu.chikkam has joined the channel  

###  _#discuss-validation_

  
 **@chinmay.cerebro:** @snlee @mayanks wanted to check with you folks quickly
before I open a PR. Someone pointed me to this really nice validation
technique: (). This allows us to declaratively express our "ideal state" for
things like table config. For example:
```
{
  "$schema": "",
  "$id": "",
  "title": "Product",
  "description": "A product from Acme's catalog",
  "type": "object",
  "properties": {
    "tableName": { "description": "Name of the table", "type": "string" },
    "tableType": { "description": "Type of the table", "type": "string" },
    "quota": {
      "description": "Specifies quota for storage and queries per second",
      "type": "object",
      "properties": {
        "maxQueriesPerSecond": { "type": "integer" },
        "storage": { "type": "string" }
      }
    },
    "routing": {
      "type": "object",
      "properties": {
        "segmentPrunerTypes": {
          "type": "array",
          "items": { "type": "string", "enum": ["partition"] }
        },
        "instanceSelectorType": { "type": "string", "enum": ["replicaGroup"] }
      }
    },
    "segmentsConfig": {
      "type": "object",
      "properties": {
        "schemaName": { "type": "string" },
        "timeColumnName": { "type": "string" },
        "timeType": { "type": "string" },
        "replication": { "type": "string" },
        "retentionTimeUnit": { "type": "string", "enum": ["DAYS", "HOURS", "MINUTES", "SECONDS"] },
        "retentionTimeValue": { "type": "string" },
        "segmentPushFrequency": { "type": "string", "enum": ["HOURLY", "DAILY", "WEEKLY", "MONTHLY"] },
        "segmentPushType": { "type": "string", "enum": ["APPEND", "REFRESH"] }
      }
    },
    "tableIndexConfig": { "type": "object" },
    "tenants": { "type": "object" },
    "ingestionConfig": { "type": "object" },
    "metadata": { "type": "object" }
  },
  "required": ["tableName", "tableType", "segmentsConfig", "tableIndexConfig"]
}
```  
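
To make concrete what a declarative schema like the one above buys, here is a minimal hand-rolled sketch in Python that checks a config against a trimmed copy of that schema. It only understands the `type`, `enum`, `required`, `properties`, and `items` keywords, and the `validate` function, `TABLE_SCHEMA` constant, and sample configs are hypothetical names made up for illustration. A real PR would presumably rely on an existing validator library rather than code like this.

```python
# Hand-rolled sketch covering only: type, enum, required, properties, items.
# It is NOT a complete schema validator; all names here are illustrative.

_TYPES = {"object": dict, "array": list, "string": str, "integer": int}


def validate(value, schema, path="$"):
    """Return a list of error strings; an empty list means the value passed."""
    errors = []
    expected = schema.get("type")
    if expected in _TYPES:
        ok = isinstance(value, _TYPES[expected])
        # bool is a subclass of int in Python, so reject it for "integer".
        if expected == "integer" and isinstance(value, bool):
            ok = False
        if not ok:
            return [f"{path}: expected {expected}"]
    if "enum" in schema and value not in schema["enum"]:
        errors.append(f"{path}: {value!r} not in {schema['enum']}")
    if isinstance(value, dict):
        for key in schema.get("required", []):
            if key not in value:
                errors.append(f"{path}: missing required key {key!r}")
        for key, sub_schema in schema.get("properties", {}).items():
            if key in value:
                errors.extend(validate(value[key], sub_schema, f"{path}.{key}"))
    if isinstance(value, list) and "items" in schema:
        for i, item in enumerate(value):
            errors.extend(validate(item, schema["items"], f"{path}[{i}]"))
    return errors


# Trimmed copy of the schema from the message above.
TABLE_SCHEMA = {
    "type": "object",
    "properties": {
        "tableName": {"type": "string"},
        "tableType": {"type": "string"},
        "segmentsConfig": {
            "type": "object",
            "properties": {
                "retentionTimeUnit": {
                    "type": "string",
                    "enum": ["DAYS", "HOURS", "MINUTES", "SECONDS"],
                },
            },
        },
        "tableIndexConfig": {"type": "object"},
    },
    "required": ["tableName", "tableType", "segmentsConfig", "tableIndexConfig"],
}

# Hypothetical sample configs.
good = {
    "tableName": "myTable",
    "tableType": "OFFLINE",
    "segmentsConfig": {"retentionTimeUnit": "DAYS"},
    "tableIndexConfig": {},
}
bad = {
    "tableName": "myTable",
    "tableType": "OFFLINE",
    "segmentsConfig": {"retentionTimeUnit": "WEEKS"},  # not an allowed unit
}

for error in validate(bad, TABLE_SCHEMA):
    print(error)
```

The `bad` config is reported for two reasons: `tableIndexConfig` is missing and `WEEKS` is not an allowed `retentionTimeUnit`. These are exactly the structural and enum checks the schema expresses declaratively.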
**@ssubrama:** Seems like a neat idea. It also gets easier to ensure that
backward incompatible changes do not go through? (or, is that by manual review
when the schema file is updated)?  
**@chinmay.cerebro:** that's manual review  
**@ssubrama:** Are there plugins to validate one value vs another? (e.g. if
offline table then `replication` needs to be validated, otherwise
`replicasPerPartition`)  
**@chinmay.cerebro:** I'm afraid not, this is mostly syntax, range check, enum
checks etc...  
**@chinmay.cerebro:** we need other validations on top  
**@chinmay.cerebro:** eg: dependent config as you mentioned  
**@chinmay.cerebro:** or things like nodictionary columns and sorted index
columns don't go together, things like that  
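
The dependent checks mentioned in the last few messages could sit in a small function that runs after the structural validation. The following Python sketch is only an illustration: `check_dependent_config` is a hypothetical name, and the `noDictionaryColumns` and `sortedColumn` keys under `tableIndexConfig` are assumptions about where those column lists live in a table config.

```python
def check_dependent_config(table_config):
    """Cross-field checks that a purely structural schema does not express.

    Key names are assumptions modeled on the discussion, not a confirmed API.
    """
    errors = []
    table_type = table_config.get("tableType")
    segments = table_config.get("segmentsConfig", {})
    index = table_config.get("tableIndexConfig", {})

    # Offline tables are validated on `replication`,
    # other tables on `replicasPerPartition`.
    required_key = "replication" if table_type == "OFFLINE" else "replicasPerPartition"
    if required_key not in segments:
        errors.append(
            f"segmentsConfig.{required_key} is required for tableType {table_type}"
        )

    # A column should not be both no-dictionary and a sorted-index column.
    no_dict = set(index.get("noDictionaryColumns") or [])
    sorted_cols = set(index.get("sortedColumn") or [])
    for column in sorted(no_dict & sorted_cols):
        errors.append(f"column {column!r} is both no-dictionary and sorted")
    return errors


# Hypothetical config that violates both rules.
config = {
    "tableType": "REALTIME",
    "segmentsConfig": {"replication": "3"},
    "tableIndexConfig": {
        "noDictionaryColumns": ["ts", "userId"],
        "sortedColumn": ["ts"],
    },
}
for error in check_dependent_config(config):
    print(error)
```

Keeping these rules in code and the structural rules in the schema means each layer stays small: the schema catches typos and bad enum values, and this function only handles relationships between fields.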
 **@chinmay.cerebro:** I think we should adopt this  
 **@chinmay.cerebro:** we obviously need more validations on top of this, but
this will save us a lot of manual effort  
 **@chinmay.cerebro:** thoughts ?  
 **@chinmay.cerebro:** I've also started this Google doc to capture all the
validations we want with table config:  
**@chinmay.cerebro:** please take a look at that as well  
 **@mayanks:** Thanks @chinmay.cerebro , will take a look  

###  _#segment-cold-storage_

  
 **@noahprince8:** Might be a little more difficult than I had originally
imagined. There are really two entry points to downloading a segment,
`SegmentFetcherAndLoader` and `RealtimeTableDataManager`. Unifying those two
seems like it may be difficult, as the realtime use case has some backup peer
downloading.  
 **@jackie.jxt:** The peer downloading should be applicable to both offline
and realtime (might not be the case right now)  
 **@jackie.jxt:** And all segment downloads should be handled within the same
class  
 **@noahprince8:** Yeah, it appears it is not handled that way now. The only way
it knows the URI for the deep store download is from a realtime-specific
metadata class  
 **@noahprince8:** This bit of the codebase could use a refactor, but I’m not
sure I have the time  
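
One possible shape for the single download path suggested in this thread is a function that tries an ordered list of sources, for example deep store first and a peer as backup, regardless of table type. The Python sketch below is purely illustrative: `download_segment`, `SegmentDownloadError`, and the fake fetchers are hypothetical names and do not correspond to existing Pinot classes.

```python
class SegmentDownloadError(Exception):
    """Raised when every configured source fails (hypothetical type)."""


def download_segment(segment_name, sources):
    """Try each (label, fetch) pair in order.

    Returns (label, data) from the first source whose fetch succeeds and
    raises SegmentDownloadError if all of them fail.
    """
    failures = []
    for label, fetch in sources:
        try:
            return label, fetch(segment_name)
        except Exception as exc:  # a sketch; real code would catch narrower errors
            failures.append(f"{label}: {exc}")
    raise SegmentDownloadError(
        f"could not download {segment_name}: " + "; ".join(failures)
    )


# Fake fetchers standing in for a deep store client and a peer server.
def fetch_from_deep_store(segment_name):
    raise IOError("deep store unavailable")


def fetch_from_peer(segment_name):
    return b"segment-bytes-for-" + segment_name.encode()


label, data = download_segment(
    "myTable_0",
    [("deep_store", fetch_from_deep_store), ("peer", fetch_from_peer)],
)
print(label, len(data))
```

With this shape, offline and realtime callers would differ only in the list of sources they pass in, which is one way to read the suggestion that peer download should apply to both.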