Posted to dev@pinot.apache.org by Pinot Slack Email Digest <ap...@gmail.com> on 2021/10/09 02:00:18 UTC

Apache Pinot Daily Email Digest (2021-10-08)

### _#general_

  
 **@brunobrandaovn:** @brunobrandaovn has joined the channel  
 **@nolefp:** @nolefp has joined the channel  
 **@suman:** @suman has joined the channel  
 **@ss68374:** @ss68374 has joined the channel  
 **@hristo:** @hristo has joined the channel  

###  _#random_

  
 **@brunobrandaovn:** @brunobrandaovn has joined the channel  
 **@nolefp:** @nolefp has joined the channel  
 **@suman:** @suman has joined the channel  
 **@ss68374:** @ss68374 has joined the channel  
 **@hristo:** @hristo has joined the channel  

###  _#troubleshooting_

  
 **@brunobrandaovn:** @brunobrandaovn has joined the channel  
 **@nolefp:** @nolefp has joined the channel  
 **@suman:** @suman has joined the channel  
 **@luisfernandez:** hey, I have this query that I’m issuing in Pinot: `select *
from ads_metrics where user_id=x and serve_time >= 1633651200`. When I use the
query like this, `numEntriesScannedInFilter` shoots up quite considerably; if I
don’t filter on serve_time I get 0. Does anyone know why that may be? I
currently have a range index on the `serve_time` column and an inverted index +
partitioning on the user_id  
**@luisfernandez:** ```You should also think about partitioning the incoming
data based on the dimension most heavily used in your filter queries.```
that’s what I see in the documentation about high `numEntriesScannedInFilter`,
but I’m already partitioning by the user_id  
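A likely explanation for the numbers above: an inverted index answers an exact match like `user_id = x` without scanning entries, while a range predicate such as `serve_time >= …` still has to evaluate a range of values, so `numEntriesScannedInFilter` grows with the number of matching rows. For reference, a table config combining all three techniques mentioned in the thread might look roughly like this (column names taken from the query; the partition function and count are placeholder assumptions):

```json
{
  "tableIndexConfig": {
    "invertedIndexColumns": ["user_id"],
    "rangeIndexColumns": ["serve_time"],
    "segmentPartitionConfig": {
      "columnPartitionMap": {
        "user_id": {
          "functionName": "Murmur",
          "numPartitions": 4
        }
      }
    }
  }
}
```

Partitioning on `user_id` prunes segments before the filter runs, but within the surviving segments the range predicate still dominates the scan count.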
**@g.kishore:** We have a new version of range index that will reduce the
latency @richard892 ^^  
**@richard892:** hi @luisfernandez, you can try the new range index by setting
`"rangeIndexVersion" : 2` in the `"tableConfig"`; you'd have to build off OSS
master, though  
**@richard892:** there's no automatic migration path yet because we want to
get some feedback first before we start automating the upgrade  
**@mayanks:** BTW, this is available in latest master and will be part of the
next 0.9.0 release.  
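Based on the suggestion above, the setting lives alongside the range-index column list; a minimal sketch (assuming the `serve_time` column from the earlier question):

```json
{
  "tableIndexConfig": {
    "rangeIndexVersion": 2,
    "rangeIndexColumns": ["serve_time"]
  }
}
```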
 **@ss68374:** @ss68374 has joined the channel  
 **@hristo:** @hristo has joined the channel  
 **@bowenwan:** Hi. How do I know if the star-tree index is ready? There seems
to be no improvement: `numDocsScanned` remains the same. My index config and
query are as follows: ```"starTreeIndexConfigs": [
  {
    "dimensionsSplitOrder": ["id", "A", "B", "C", "D"],
    "functionColumnPairs": ["DISTINCT_COUNT_HLL__id"],
    "maxLeafRecords": 10000
  }
]``` Query: ```SELECT DISTINCTCOUNTHLL(id), A
FROM MyTable
WHERE B = 'a'
GROUP BY A
ORDER BY DISTINCTCOUNTHLL(id) DESC
LIMIT 20```  
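One thing worth noting for the question above: the filter column (`B`), the group-by column (`A`), and the aggregation (`DISTINCT_COUNT_HLL__id`) all appear in the star-tree config, so the query shape looks eligible. However, a star-tree index is built when a segment is generated or loaded, so segments created before the config change need to be reloaded before `numDocsScanned` can drop. A hedged sketch of triggering that via the controller API (host/port and table type are assumptions):

```
curl -X POST "http://localhost:9000/segments/MyTable/reload?type=OFFLINE"
```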

### _#docs_

  
 **@ss68374:** @ss68374 has joined the channel  

###  _#pinot-dev_

  
 **@lrhadoop143:** @lrhadoop143 has joined the channel  
 **@hristo:** @hristo has joined the channel  

###  _#community_

  
 **@ss68374:** @ss68374 has joined the channel  

###  _#announcements_

  
 **@ss68374:** @ss68374 has joined the channel  

###  _#roadmap_

  
 **@ss68374:** @ss68374 has joined the channel  

###  _#getting-started_

  
 **@lrhadoop143:** @lrhadoop143 has joined the channel  
 **@ss68374:** @ss68374 has joined the channel  
 **@kchavda:** When using ingestionconfig > transformconfigs, does it HAVE to
be a new column for the transformFunction? I would like to transform an
existing column from source and keep the same column name for the Pinot table
instead of creating a new column.  
 **@npawar:** it has to be a new name, you cannot transform a column and put
it into the same name  
**@kchavda:** follow-up on this. I'm reading from a kafka topic which has
dates as EPOCH. I want to load as TIMESTAMP in my Pinot realtime table. Do I
have to use a transformfunction and load as a new column? @npawar  
**@npawar:** when you say EPOCH, is it millisSinceEpoch? and what is the
format of TIMESTAMP?  
**@kchavda:** I'm using Debezium with time.precision.mode = connect,
so it's in millisSinceEpoch. Would like to load it as `yyyy-MM-dd HH:mm:ss`
(e.g. 2019-07-02 12:21:04).  
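Given the answer earlier in the thread that the transform must target a new column name, the conversion could be sketched with an ingestion transform using Pinot's `toDateTime` scalar function; the column names here (`created_at` for the source millis, `created_at_str` for the Pinot column) are illustrative assumptions:

```json
{
  "ingestionConfig": {
    "transformConfigs": [
      {
        "columnName": "created_at_str",
        "transformFunction": "toDateTime(created_at, 'yyyy-MM-dd HH:mm:ss')"
      }
    ]
  }
}
```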

###  _#releases_

  
 **@hristo:** @hristo has joined the channel  

###  _#minion-improvements_

  
 **@ss68374:** @ss68374 has joined the channel  

###  _#kinesis_help_

  
 **@abhijeet.kushe:** I did delete the OFFLINE table and removed the
realtime-to-offline task in QA, but I still see the task listed in the
Zookeeper console. Also, the controller keeps printing ```No job to purge for
the queue TaskQueue_RealtimeToOfflineSegmentsTask``` I just restarted the
entire cluster and I still see the above message. I have started the table
again with the shard iterator at AT_SEQUENCE_NUMBER. I see the iterator is
stuck at 66K seconds ago (since we have 1 day retention). I have noticed in the
past that this iterator can fail to shift for a long time. Will update later if
I don't see it change  