Posted to dev@pinot.apache.org by Pinot Slack Email Digest <ap...@gmail.com> on 2021/10/12 02:00:36 UTC

Apache Pinot Daily Email Digest (2021-10-11)

### _#general_

  
 **@otiennosharon:** @otiennosharon has joined the channel  
 **@msoni6226:** Hi Team, my teammates @bajpai.arpita746462 and @vibhor.jain
and I have documented our understanding of how to handle duplicate data in
Pinot in this blog. Please review it and let us know if you have any
feedback.  
 **@msoni6226:** Hi Team, I was going through the Pinot Upsert flow
documentation and YouTube video, and I have a couple of questions regarding
this: 1\. Why do we really need to partition the input stream based on the
primary key? 2\. When we are maintaining a primary key index, records could go
to any of the segments and we could update the primary key index accordingly,
so why should we ensure that records with the same primary key go to the same
segment?  
**@yupeng:** that's a cool blog. thanks for sharing. for your questions, you
can read the design doc on the whys  
**@msoni6226:** Thanks Yupeng for pointing out this document. It has all the
answers to my questions  
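
The thread does not spell out the answer, so the following is only a minimal sketch of the general idea behind partitioning by primary key: if every record for a key lands in the same partition, the map from key to latest record can be kept locally by whoever consumes that partition. The function names, the CRC32 hash, and the in-memory dictionaries are assumptions for illustration and are not Pinot's implementation.

```python
import zlib


def partition_for(primary_key: str, num_partitions: int) -> int:
    """Deterministically map a primary key to a stream partition (hypothetical)."""
    # crc32 is stable across processes, unlike Python's built-in hash().
    return zlib.crc32(primary_key.encode("utf-8")) % num_partitions


def route(records, num_partitions):
    """Group (primary_key, value) records by their assigned partition."""
    partitions = {p: [] for p in range(num_partitions)}
    for key, value in records:
        partitions[partition_for(key, num_partitions)].append((key, value))
    return partitions


def latest_per_key(partition_records):
    """Per-partition key map: a later record overrides an earlier one."""
    latest = {}
    for key, value in partition_records:
        latest[key] = value
    return latest


records = [("order-1", "created"), ("order-2", "created"), ("order-1", "shipped")]
partitions = route(records, num_partitions=4)
views = {p: latest_per_key(recs) for p, recs in partitions.items()}
```

Because both versions of `order-1` hash to the same partition, exactly one local map ever sees that key, and no cross-partition coordination is needed to decide which record is the latest.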
 **@karinwolok1:** :speaker: DeveloperWeek is looking for speakers!!!
Interested in presenting about what you're doing with Pinot? Submit here:  
**@karinwolok1:** DevNexus also looking for speakers! Let's get you all in the
developer speaker circuit! :smile:  
**@karinwolok1:** SIGMOD is another one! All about databases. Get your talk
proposals submitted!  
**@roland.vink:** @roland.vink has joined the channel  
 **@rohan.a.suri:** @rohan.a.suri has joined the channel  

###  _#random_

  
 **@otiennosharon:** @otiennosharon has joined the channel  
 **@roland.vink:** @roland.vink has joined the channel  
 **@rohan.a.suri:** @rohan.a.suri has joined the channel  

###  _#feat-compound-types_

  
 **@otiennosharon:** @otiennosharon has joined the channel  

###  _#apa-16824_

  
 **@otiennosharon:** @otiennosharon has joined the channel  

###  _#troubleshooting_

  
 **@otiennosharon:** @otiennosharon has joined the channel  
 **@zsolt:** We are using the offlineSegmentDelayHours metric for monitoring
if the RealtimeToOffline task is stuck, and since upgrading to 0.8.0 we see
stale values for it. Prior to 0.8.0 the metrics were present only on one
controller, but now they can be on multiple controllers. I've found that 0.8.0
enables Controller Resource by default, so the tables can have different
controllers as leaders. We couldn't find a metric to decide which controller
is the leader for a table, so we can't filter out the stale metric for alerts.
IMO these metrics should be removed for the table once leadership is lost, or
there should be a gauge which can be used to decide if a controller is leader
for a table.  
**@g.kishore:** Good point. @jlli any thoughts on this?  
**@jlli:** Yeah, that sounds fair. Let me try to make the change  
**@jlli:** @zsolt the `PinotLeadControllerRestletResource` specifies the APIs
to check the leadership of pinot tables. Please take a look  
**@g.kishore:** @jlli that may not help. I think we should have a solution
around not emitting metrics for a table if the controller is not the leader
for that table  
**@g.kishore:** otherwise, monitoring and alerting will be hard  
**@g.kishore:** @zsolt what tool are you using for monitoring? will adding up
the metrics across all controllers help?  
**@zsolt:** We are using the JMX metrics with prometheus agent  
**@jlli:** @g.kishore That’s also true. We should also clean up metrics if the
current controller is not the leader for that table. Since controller periodic
tasks are run periodically, we can do the cleanup there  
**@mayanks:** Thanks @jlli, mind filing an issue to capture the problem and
track progress?  
**@jlli:** Sure, will file an issue  
**@zsolt:** Thanks!  
**@jlli:** Here it is:  
**@jlli:** This is the PR for the issue above:  
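
As a rough illustration of the workaround discussed in this thread, the sketch below drops gauge samples reported by controllers that no longer lead a table. It assumes a table-to-leader mapping is available from somewhere (for example, from the leadership APIs mentioned above); the function name, the tuple layout, and the controller names are hypothetical.

```python
def leader_gauges(samples, leaders):
    """Keep only gauge samples reported by each table's current lead controller.

    samples: iterable of (controller, table, value) tuples scraped from all
             controllers (hypothetical layout).
    leaders: dict mapping table name -> controller currently leading it.
    """
    return {
        table: value
        for controller, table, value in samples
        if leaders.get(table) == controller
    }


samples = [
    ("controller-1", "events", 48.0),  # stale value from the former leader
    ("controller-2", "events", 1.0),   # fresh value from the current leader
]
fresh = leader_gauges(samples, {"events": "controller-2"})
```

Summing the gauge across controllers would mix the stale and fresh values, which is why the sketch selects by leadership instead.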
**@deemish2:** Hello, we are using a Spark batch ingestion job to push data
into a Pinot offline table using pinot-0.8.0, and we are getting this
exception: Caused by: groovy.lang.MissingPropertyException: No such property:
date for class: SimpleTemplateScript1 at
org.apache.groovy.runtime.ScriptBytecodeAdapter.unwrap(ScriptBytecodeAdapter.java:66)  
**@xiangfu0:** I think your ingestion job spec has Groovy templates but the
properties are not set  
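
The failure mode in the reply above can be illustrated by analogy. This sketch uses Python's `string.Template`, not Groovy's template engine, and the `render` helper and the sample spec line are hypothetical; it only shows how a `${date}` placeholder with no supplied value fails at render time.

```python
from string import Template


def render(spec_text, values):
    """Substitute ${name} placeholders and fail loudly on any unset one."""
    try:
        return Template(spec_text).substitute(values)
    except KeyError as missing:
        raise ValueError(f"job spec placeholder not set: {missing.args[0]}") from None


spec_line = "inputDirURI: '/data/${date}'"
rendered = render(spec_line, {"date": "2021-10-11"})
```

Calling `render(spec_line, {})` raises an error naming `date`, which mirrors the `No such property: date` message in the Groovy exception.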
 **@roland.vink:** @roland.vink has joined the channel  
 **@rohan.a.suri:** @rohan.a.suri has joined the channel  

###  _#feat-geo-spatial-index_

  
 **@otiennosharon:** @otiennosharon has joined the channel  
 **@kchavda:** Does anyone have a working example of using anything other than
the stPoint & toSphericalGeography functions?  
**@yupeng:** you can read this blog  
**@kchavda:** Thanks for sharing @yupeng.  
**@kchavda:** This query bombs on me from query console ```select
toGeometry(base64Decode('AQEAACDmEAAAT0wojs27YsA0a4TZX5hOQA==')) from
meetupRsvp limit 10``` Am I using the function correctly here?
```org.apache.pinot.sql.parsers.SqlCompilationException: Caught exception
while invoking method: public static byte[]
org.apache.pinot.core.geospatial.transform.function.ScalarFunctions.toGeometry(byte[])
with arguments: [[B@f7f6884]```  
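
The thread does not resolve this question. As a debugging aid, the sketch below inspects the header of the base64 payload from the query, assuming it follows the WKB layout (one byte-order byte followed by a 32-bit type word) with the PostGIS-style SRID flag. Whether `toGeometry` accepts this encoding is not confirmed here; the helper name is hypothetical.

```python
import base64
import struct

EWKB_SRID_FLAG = 0x20000000  # PostGIS-style flag: an SRID follows the type word


def describe_wkb_header(b64_payload):
    """Decode the byte order, geometry type, and optional SRID of a WKB payload."""
    raw = base64.b64decode(b64_payload)
    byte_order = raw[0]  # 1 = little-endian, 0 = big-endian
    endian = "<" if byte_order == 1 else ">"
    (type_word,) = struct.unpack_from(endian + "I", raw, 1)
    has_srid = bool(type_word & EWKB_SRID_FLAG)
    srid = struct.unpack_from(endian + "I", raw, 5)[0] if has_srid else None
    return {
        "length": len(raw),
        "little_endian": byte_order == 1,
        "geometry_type": type_word & 0xFF,  # 1 = Point
        "has_srid": has_srid,
        "srid": srid,
    }


info = describe_wkb_header("AQEAACDmEAAAT0wojs27YsA0a4TZX5hOQA==")
```

For this payload the header describes a little-endian Point that carries SRID 4326, so the bytes look like extended WKB rather than plain WKB, which may be relevant to the exception.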

###  _#docs_

  
 **@otiennosharon:** @otiennosharon has joined the channel  

###  _#aggregators_

  
 **@otiennosharon:** @otiennosharon has joined the channel  

###  _#dhill-date-seg_

  
 **@otiennosharon:** @otiennosharon has joined the channel  

###  _#enable-generic-offsets_

  
 **@otiennosharon:** @otiennosharon has joined the channel  

###  _#community_

  
 **@otiennosharon:** @otiennosharon has joined the channel  

###  _#announcements_

  
 **@otiennosharon:** @otiennosharon has joined the channel  

###  _#discuss-validation_

  
 **@otiennosharon:** @otiennosharon has joined the channel  

###  _#config-tuner_

  
 **@otiennosharon:** @otiennosharon has joined the channel  

###  _#getting-started_

  
 **@sirsh:** I'm running my first batch ingestion job, ingesting from S3
parquet files. The task was kicked off and the 8 rows of the input sample are
read, but then it fails and I'm not sure what the error message is telling me.
What is the illegal argument in this context? I did not get any closer looking
at the source for Segment Name Generator...
```
RecordReader initialized will read a total of 8 records.
at row 0. reading next block
block read in memory in 1 ms. row count = 8
Start building IndexCreator!
Finished records indexing in IndexCreator!
Failed to generate Pinot segment for file -
java.lang.IllegalArgumentException: null
  at shaded.com.google.common.base.Preconditions.checkArgument(Preconditions.java:108) ~[pinot-all-0.9.0-SNAPSHOT-jar-with-dependencies.jar:0.9.0-SNAPSHOT-11f8550b9b2881ede4d105416ed970a5dd708463]
  at org.apache.pinot.segment.spi.creator.name.SimpleSegmentNameGenerator.generateSegmentName(SimpleSegmentNameGenerator.java:53) ~[pinot-all-0.9.0-SNAPSHOT-jar-with-dependencies.jar:0.9.0-SNAPSHOT-11f8550b9b2881ede4d105416ed970a5dd708463]
```
Can anyone suggest what illegal thing I am doing from this error message?
Adding jobSpec in thread...  
**@sirsh:**
```
executionFrameworkSpec:
  name: 'standalone'
  segmentGenerationJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner'
  segmentTarPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner'
  segmentUriPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentUriPushJobRunner'
  segmentMetadataPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentMetadataPushJobRunner'
jobType: SegmentCreationAndUriPush
inputDirURI: 's3://...'
includeFileNamePattern: 'glob:**/*.parquet'
outputDirURI: 's3://...'
overwriteOutput: true
pinotFSSpecs:
  - scheme: s3
    className: org.apache.pinot.plugin.filesystem.S3PinotFS
    configs:
      region: 'us-east-1'
recordReaderSpec:
  dataFormat: 'parquet'
  className: 'org.apache.pinot.plugin.inputformat.parquet.ParquetRecordReader'
tableSpec:
  tableName: 'MY_TABLE'
  schemaURI: ''
  tableConfigURI: ''
pinotClusterSpecs:
  - controllerURI: ''
pushJobSpec:
  pushAttempts: 2
  pushRetryIntervalMillis: 1000
```  
**@sirsh:** OK, I'm missing a `segmentNameGeneratorSpec`. I realize it's
helpful to scan through the logs above the error and observe where some
parameters are null; sometimes it matters!  
 **@roland.vink:** @roland.vink has joined the channel  
 **@sirsh:** When I query Presto and a column is named with a reserved
keyword like `timestamp`, I cannot seem to submit a query that includes
`"timestamp"`, even though the Presto spec suggests that it can be escaped
with double quotes. It might be specific to the clients I am using; I have
tried a freshly downloaded presto-cli and a Python client, and both result in
a PQLParsingError. What should I do in this situation? (This is testing the
presto-pinot connector, so maybe it is not a Pinot question for this channel.)  
**@xiangfu0:** right, the generated Pinot query also needs to be escaped, but
that is not done in the presto-pinot connector  
**@xiangfu0:** so it requires a fix for that  
**@sirsh:** thanks for the context @xiangfu0 :pray:  

###  _#feat-partial-upsert_

  
 **@otiennosharon:** @otiennosharon has joined the channel  

###  _#debug_upsert_

  
 **@otiennosharon:** @otiennosharon has joined the channel  

###  _#complex-type-support_

  
 **@otiennosharon:** @otiennosharon has joined the channel  

###  _#kinesis_help_

  
 **@abhijeet.kushe:** It actually does work, as the moment it consumes the
latest message the iterator age drops to 0. I am not aware how the iterator
age is supposed to behave when the shard iterator is AT_SEQUENCE_NUMBER.  
 **@abhijeet.kushe:**  
 **@npawar:** cool  
 **@npawar:** created issue for this  