Posted to dev@pinot.apache.org by Pinot Slack Email Digest <sn...@apache.org> on 2021/03/31 02:00:20 UTC

Apache Pinot Daily Email Digest (2021-03-30)

### _#general_

  
 **@kmvb.tau:** @kmvb.tau has joined the channel  
 **@joshhighley:** ```If there are multiple controllers, Pinot expects that
all of them are configured with the same back-end storage system so that they
have a common view of the segments (e.g. NFS). Pinot can use other storage
systems such as HDFS or ADLS``` I can't find more info about this. Would any
mountable NFS work? S3 for example?  
**@mayanks:** Yes. NFS would work. So would any deep store  
**@mayanks:** We have implementations for NFS, ADLS, S3 and GCS  
**@g.kishore:** If you have S3, you don’t need NFS  
**@mayanks:** Correct. S3/ADL/GCS as deep-stores can be shared across
controllers without any further need of NFS on top of that.  
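For reference, sharing a deep store across controllers is roughly a matter of pointing `controller.data.dir` at the bucket and enabling the matching PinotFS plugin. A minimal sketch for S3 (bucket name and region below are placeholders):

```
controller.data.dir=s3://my-pinot-bucket/controller-data
controller.local.temp.dir=/tmp/pinot-tmp-data
pinot.controller.storage.factory.class.s3=org.apache.pinot.plugin.filesystem.S3PinotFS
pinot.controller.storage.factory.s3.region=us-west-2
pinot.controller.segment.fetcher.protocols=file,http,s3
pinot.controller.segment.fetcher.s3.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher
```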
**@fx19880617:**  

###  _#random_

  
 **@kmvb.tau:** @kmvb.tau has joined the channel  

###  _#troubleshooting_

  
 **@kmvb.tau:** @kmvb.tau has joined the channel  
 **@chxing:**  
 **@chxing:**  
 **@mayanks:** What's the data type for `webConferenceId`?  
**@chxing:** LONG  
**@mayanks:** Is this a hybrid table?  
**@chxing:**  
**@chxing:** It should be a realtime table  
**@mayanks:** You have 90 days retention, so my guess is there's an offline
component. But that is ok  
**@mayanks:** On the face of it, this seems like a bug  
**@mayanks:** Trying to understand what might be causing it  
**@chxing:** So it should be a bug?  
**@mayanks:** Can you try with `webSiteId`?  
**@chxing:** String Type is ok, let me try again  
**@jackie.jxt:** Can you please paste the entire table config?  
**@chxing:** schema:
```
{
  "schemaName": "realtime_sjc_wmequality_report",
  "dimensionFieldSpecs": [
    { "name": "webexSiteName", "dataType": "STRING" },
    { "name": "webexConferenceId", "dataType": "LONG" },
    { "name": "webexSiteId", "dataType": "LONG" },
    { "name": "correlationId", "dataType": "STRING" },
    { "name": "metadataOsType", "dataType": "STRING" },
    { "name": "metadataOsVersion", "dataType": "STRING" },
    { "name": "metadataBrowserType", "dataType": "STRING" },
    { "name": "metadataClientType", "dataType": "STRING" },
    { "name": "metadataClientVersion", "dataType": "STRING" },
    { "name": "metadataHardwareType", "dataType": "STRING" },
    { "name": "metadataNetworkType", "dataType": "STRING" },
    { "name": "audioMainReportTransportType", "dataType": "STRING" },
    { "name": "videoMainReportTransportType", "dataType": "STRING" },
    { "name": "day", "dataType": "STRING" }
  ],
  "metricFieldSpecs": [
    { "name": "systemAverageCPU", "dataType": "LONG", "defaultNullValue": 0 },
    { "name": "processAverageCPU", "dataType": "LONG", "defaultNullValue": 0 },
    { "name": "osBitWidth", "dataType": "LONG", "defaultNullValue": 0 },
    { "name": "cpuBitWidth", "dataType": "LONG", "defaultNullValue": 0 },
    { "name": "audioMainReportRxE2eLostPercent", "dataType": "FLOAT", "defaultNullValue": 0 },
    { "name": "audioMainReportRxE2eJitter", "dataType": "LONG", "defaultNullValue": 0 },
    { "name": "audioMainReportTxHbhLostPercent", "dataType": "FLOAT", "defaultNullValue": 0 },
    { "name": "audioMainReportTxHbhJitter", "dataType": "LONG", "defaultNullValue": 0 },
    { "name": "audioMainReportRxHbhLostPercent", "dataType": "FLOAT", "defaultNullValue": 0 },
    { "name": "audioMainReportRoundTripTime", "dataType": "LONG", "defaultNullValue": 0 },
    { "name": "videoMainReportRxE2eLostPercent", "dataType": "FLOAT", "defaultNullValue": 0 },
    { "name": "videoMainReportRxE2eJitter", "dataType": "LONG", "defaultNullValue": 0 },
    { "name": "videoMainReportTxHbhLostPercent", "dataType": "FLOAT", "defaultNullValue": 0 },
    { "name": "videoMainReportTxHbhJitter", "dataType": "LONG", "defaultNullValue": 0 },
    { "name": "videoMainReportRxHbhLostPercent", "dataType": "FLOAT", "defaultNullValue": 0 },
    { "name": "videoMainReportRoundTripTime", "dataType": "LONG", "defaultNullValue": 0 }
  ],
  "dateTimeFieldSpecs": [
    {
      "name": "timestamp",
      "dataType": "STRING",
      "format": "1:MILLISECONDS:SIMPLE_DATE_FORMAT:yyyy-MM-dd'T'HH:mm:ss.SSS'Z'",
      "granularity": "1:MILLISECONDS"
    }
  ]
}
```
**@chxing:** table:
```
{
  "tableName": "realtime_sjc_wmequality_report",
  "tableType": "REALTIME",
  "segmentsConfig": {
    "timeColumnName": "timestamp",
    "timeType": "DAYS",
    "retentionTimeUnit": "DAYS",
    "retentionTimeValue": "90",
    "segmentPushType": "APPEND",
    "segmentAssignmentStrategy": "BalanceNumSegmentAssignmentStrategy",
    "schemaName": "realtime_sjc_wmequality_report",
    "replication": "2",
    "replicasPerPartition": "2"
  },
  "tenants": {},
  "tableIndexConfig": {
    "loadMode": "MMAP",
    "streamConfigs": {
      "streamType": "kafka",
      "stream.kafka.consumer.type": "LowLevel",
      "stream.kafka.topic.name": "sj1_mqa_telemetry_wmequality_report",
      "stream.kafka.decoder.class.name": "org.apache.pinot.plugin.stream.kafka.KafkaJSONMessageDecoder",
      "stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
      "stream.kafka.broker.list": "10.241.89.130:9092",
      "realtime.segment.flush.threshold.time": "24h",
      "realtime.segment.flush.threshold.size": "300M",
      "stream.kafka.consumer.prop.auto.offset.reset": "largest"
    },
    "invertedIndexColumns": [
      "webexSiteName", "webexConferenceId", "webexSiteId", "correlationId",
      "metadataOsType", "metadataBrowserType", "metadataClientType",
      "metadataHardwareType", "metadataNetworkType", "audioMainReportTransportType",
      "videoMainReportTransportType", "day"
    ],
    "sortedColumn": ["audioMainReportRxE2eLostPercent", "audioMainReportRxE2eJitter"]
  },
  "metadata": { "customConfigs": {} }
}
```
**@jackie.jxt:** Do you have time for a quick zoom? Need to try more queries
to identify the issue  
**@chxing:** webSiteId is ok, also LONG type  
**@jackie.jxt:** How about `select * from table where websiteId = 8049967
limit 1000`? Want to see if the missing conferenceId is returned here  
**@chxing:** ok  
**@mayanks:** Yeah, that is what I also meant earlier  
**@chxing:**  
**@chxing:** No response?  
**@chxing:**  
**@chxing:** The status of the segments seems normal  
**@jackie.jxt:** Sorry, should be `webexSiteId = 8049967`  
**@chxing:** ok  
**@chxing:**  
**@chxing:** has response  
**@jackie.jxt:** Let's try `select * from realtime_sjc_wmequality_report where
webexConferenceId = '189852985506937900' limit 1000` first to rule out the
possibility of compilation problem  
**@chxing:**  
**@chxing:** No response  
**@jackie.jxt:** @chxing Can you join this zoom? We can try some queries
together to track down the problem  
**@chxing:** Wait a minute, I need to ask my manager  
**@jackie.jxt:** Sure  
**@chxing:** ok  
**@chxing:** Joined  
 **@chxing:** Hi All, I got one issue. I ran `select *` on the db to make sure this item is in the db  
 **@chxing:** But I can't use `select * from realtime_sjc_wmequality_report where webexConferenceId=189852985506937900 limit 1000` to get it  
**@fx19880617:** What's the error?  
**@fx19880617:** Use single quote for big int?  
**@chxing:** webexConferenceId is LONG type, I just used `where webexConferenceId=189852985506937900`  
**@chxing:** but no response  

###  _#pinot-dev_

  
 **@khushbu.agarwal:** Hi, when a server is in a dead state (update of a deployment in Kubernetes), Pinot doesn't rebalance the segments among existing/new servers. Even a manual rebalance is not helping (result: already balanced). Tried deleting the instance; it fails with the error: "server is in ideal state of xyz table". How do I resolve this?  
 **@oren:** @oren has joined the channel  
 **@npawar:** You have to untag that server then rebalance:  
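A minimal sketch of that sequence against the controller REST API (the controller address, instance name, tag value and table name below are placeholders):

```
# Untag the dead server so no tenant tag points at it anymore
curl -X PUT "http://localhost:9000/instances/Server_pinot-server-2_8098/updateTags?tags=untagged"

# Rebalance the affected table so its segments move to the remaining tagged servers
curl -X POST "http://localhost:9000/tables/xyz/rebalance?type=REALTIME&dryRun=false"
```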
**@khushbu.agarwal:** Thanks @npawar this fixed the issue. Although wondering
why it didn't resolve automatically?  
**@npawar:** Rebalance is not designed to adjust automatically  
**@khushbu.agarwal:** On deployment update?  
**@khushbu.agarwal:** What about when a server is in dead state for a long
period?  
**@npawar:** If the server is still in zk, and tagged with the same tag used
by the table, then it will continue to be used. @fx19880617 is there a way to
achieve automatic removal of the server or rebalance when using k8 deployment?
I would guess not?  
**@fx19880617:** no, pinot cannot figure out whether this server is dead for a long time or should be recycled; it has to be human intervention. However, users can build a script or tooling to periodically check and perform the action  
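Such a periodic check could look roughly like the sketch below; the controller address, table name, tag value and the `is_alive` liveness check are all placeholders to be supplied from your own environment (e.g. kubectl):

```
#!/usr/bin/env bash
CONTROLLER="http://localhost:9000"
TABLE="realtime_sjc_wmequality_report"

# List server instances known to the cluster and remediate the ones that are gone
for instance in $(curl -s "$CONTROLLER/instances" | jq -r '.instances[]' | grep '^Server_'); do
  host="${instance#Server_}"; host="${host%_*}"
  if ! is_alive "$host"; then   # is_alive is a placeholder for your own liveness check
    curl -s -X PUT "$CONTROLLER/instances/$instance/updateTags?tags=untagged"
    curl -s -X POST "$CONTROLLER/tables/$TABLE/rebalance?type=REALTIME&dryRun=false"
  fi
done
```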
**@fx19880617:** for k8s, usually the server will be restarted and back to
normal  
**@khushbu.agarwal:** This is the case where after deployment the pod ip
changes  
**@fx19880617:** but service name doesn’t change right?  
**@fx19880617:** k8s should handle the dns  
 **@npawar:** @khushbu.agarwal ^^  
 **@g.kishore:** Let’s repost this in troubleshooting..  
**@amrish.k.lal:** Question about JSON functions described in . Are the
following functions supported in SQL or does the documentation need to be
modified? • TOJSONMAPSTR • JSONFORMAT • JSONPATHLONG • JSONPATHDOUBLE •
JSONPATHSTRING • JSONPATHARRAY They seem to take Java Objects as inputs?  
**@jackie.jxt:** Seems it only works for data ingestion but not in SQL. Can
you please submit an issue about this?  
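For reference, the ingestion-time usage @jackie.jxt mentions goes through `transformConfigs` in the table config, roughly like the fragment below (the column name, source field and JSON path are illustrative):

```
"ingestionConfig": {
  "transformConfigs": [
    {
      "columnName": "conferenceId",
      "transformFunction": "JSONPATHLONG(rawPayload, '$.conference.id')"
    }
  ]
}
```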
**@ken:** @amrish.k.lal thanks for reporting this! Also note there’s a
<#C01822DR7UP|pinot-docs> Slack channel that’s good for doc-related issues
like this.  

###  _#pinot-rack-awareness_

  
 **@jaydesai.jd:** Hey @g.kishore @rkanumul Thanks for reviewing the doc. Can
we sign off on it today ?  
 **@dlavoie:** Design is LGTM! Do we plan on providing an out of the box pinot
property provider in addition to the azure specific provider?  
**@rkanumul:** Not part of the plan atm.. But I thought we might need a Noop
impl probably… With your new suggestion, the out of the box property based
option will just work.. so leaning towards it  
 **@g.kishore:** we need a better name for plugin folder  
 **@g.kishore:** everything else looks good to me  
 **@g.kishore:** what is a good term to describe where Pinot is deployed  
 **@g.kishore:** on prem, k8s, gcp, azure, aws  
 **@dlavoie:** I like the zone awareness term  
 **@g.kishore:** i am looking for a more generic term  
 **@g.kishore:** zoneawareness is a subset of it  
 **@dlavoie:** Zone feels generic to me. Can be a rack, a datacenter room, a
region, a cloud provider or a continent  
 **@g.kishore:** future proof it a bit  
 **@g.kishore:** this will be a pinot-plugin  
 **@g.kishore:** pinot-plugins/pinot-zone/pinot-azure ?  
 **@g.kishore:** i dont think that makes sense  
 **@dlavoie:** For the plugin name, I agree  
 **@dlavoie:** zone-discovery-provider ?  
 **@dlavoie:** Yeah, took a step back and actually I have design comments, I'll share them in the doc.  
 **@jaydesai.jd:** @dlavoie Updated the document with your suggestion. Can u
review it again. Thanks :slightly_smiling_face: cc @g.kishore  
 **@dlavoie:** Looks good :slightly_smiling_face:  
**@jaydesai.jd:** Can u sign off at the bottom of the Document. I have added
your name to the reviewers list. Thanks :slightly_smiling_face:  

###  _#minion-improvements_

  
 **@laxman:** @laxman has joined the channel  
 **@laxman:** @laxman set the channel description: Minion improvements  
 **@g.kishore:** @g.kishore has joined the channel  
 **@npawar:** @npawar has joined the channel  
 **@jackie.jxt:** @jackie.jxt has joined the channel  
 **@fx19880617:** @fx19880617 has joined the channel  
 **@laxman:** Hi Team, I'm Laxman from Traceable. We use Pinot in our system. We want to collaborate on and contribute to the Pinot Minion project. Our major product requirement is "data deletion for a specific filter criteria"  
 **@laxman:** Created this channel and added you people as suggested by
Kishore.  
 **@jackie.jxt:** @fx19880617 How's the progress on the minion pluggable
tasks? This can be modeled as a purge task  
 **@fx19880617:** it’s there  
 **@fx19880617:** you can add your own minion tasks in parallel to the minion built-in tasks:  
**@fx19880617:** just follow examples of existing built-in tasks and create
your shaded jars  
 **@jackie.jxt:** In order to use the existing `PurgeTaskExecutor`, the
`RecordPurgerFactory` and `RecordModifierFactory` need to be registered into
the `MinionContext`, which cannot be done via config yet  
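For anyone following along, that registration is a few lines of Java. The sketch below uses class and method names as best recalled from the current code base and a made-up purge criterion, so treat it as an outline rather than a drop-in:

```
import org.apache.pinot.core.minion.SegmentPurger;
import org.apache.pinot.minion.MinionContext;

public class PurgerRegistration {
  public static void registerPurger() {
    // Purge rows matching a filter criterion; "tenantId"/"tenantToDelete" are made-up examples
    SegmentPurger.RecordPurgerFactory purgerFactory =
        rawTableName -> row -> "tenantToDelete".equals(row.getValue("tenantId"));
    // The built-in PurgeTaskExecutor looks the factory up from the MinionContext singleton
    MinionContext.getInstance().setRecordPurgerFactory(purgerFactory);
  }
}
```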
 **@fx19880617:** you can also follow this PR to see what I touched in the pinot-distribution/assemble.xml file:  
**@laxman:** I see a lot of work done in the  release. I'm trying to catch up by going through the release notes.  
 **@laxman:** Do we have any epic/parent jira where all this Minion work is
tracked?  