Posted to dev@pinot.apache.org by Pinot Slack Email Digest <ap...@gmail.com> on 2021/10/09 02:00:18 UTC
Apache Pinot Daily Email Digest (2021-10-08)
### _#general_
**@brunobrandaovn:** @brunobrandaovn has joined the channel
**@nolefp:** @nolefp has joined the channel
**@suman:** @suman has joined the channel
**@ss68374:** @ss68374 has joined the channel
**@hristo:** @hristo has joined the channel
### _#random_
**@brunobrandaovn:** @brunobrandaovn has joined the channel
**@nolefp:** @nolefp has joined the channel
**@suman:** @suman has joined the channel
**@ss68374:** @ss68374 has joined the channel
**@hristo:** @hristo has joined the channel
### _#troubleshooting_
**@brunobrandaovn:** @brunobrandaovn has joined the channel
**@nolefp:** @nolefp has joined the channel
**@suman:** @suman has joined the channel
**@luisfernandez:** hey, I have this query that I'm issuing in Pinot: `select *
from ads_metrics where user_id=x and serve_time >= 1633651200`. When I use the
query like this, `numEntriesScannedInFilter` shoots up quite considerably; if
I don't use serve_time I get 0. Anyone know why that may be? I currently
have a range index on the `serve_time` column and an inverted index +
partitioning on the user_id
**@luisfernandez:** ```You should also think about partitioning the incoming
data based on the dimension most heavily used in your filter queries.```
that's what I see in the documentation about a high `numEntriesScannedInFilter`,
but I'm already partitioning by the user_id
**@g.kishore:** We have a new version of range index that will reduce the
latency @richard892 ^^
**@richard892:** hi @luisfernandez, you can try the new range index by setting
`"rangeIndexVersion": 2` in the `tableConfig`; you'd have to build off OSS
master though
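For reference, a minimal sketch of where that setting might sit in the table config. The surrounding keys (`tableIndexConfig`, `rangeIndexColumns`) and the column name are this editor's assumptions based on the thread, not confirmed by it; check the Pinot docs for the exact placement:

```json
{
  "tableIndexConfig": {
    "rangeIndexColumns": ["serve_time"],
    "rangeIndexVersion": 2
  }
}
```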
**@richard892:** there's no automatic migration path yet because we want to
get some feedback first before we start automating the upgrade
**@mayanks:** BTW, this is available in latest master and will be part of the
next 0.9.0 release.
**@ss68374:** @ss68374 has joined the channel
**@hristo:** @hristo has joined the channel
**@bowenwan:** Hi. How do I know if the star-tree index is ready? There seems
to be no improvement; `numDocsScanned` remains the same. My index config and
query are as follows:
```
"starTreeIndexConfigs": [
  {
    "dimensionsSplitOrder": ["id", "A", "B", "C", "D"],
    "functionColumnPairs": ["DISTINCT_COUNT_HLL__id"],
    "maxLeafRecords": 10000
  }
]
```
Query:
```
SELECT DISTINCTCOUNTHLL(id), A
FROM MyTable
WHERE B = 'a'
GROUP BY A
ORDER BY DISTINCTCOUNTHLL(id) DESC
LIMIT 20
```
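One thing worth checking (an editorial note, not from the thread): a star-tree added to an existing table generally only takes effect on segments built after the config change, since the star-tree is constructed at segment creation time; in recent Pinot versions a segment reload via the controller REST API can also trigger the build. A sketch, with a hypothetical host/port and table type:

```
# Hypothetical controller address; adjust table name/type to your setup
curl -X POST "http://localhost:9000/segments/MyTable/reload?type=OFFLINE"
```

If the star-tree is being used, `numDocsScanned` should drop for queries whose filter and aggregation columns match `dimensionsSplitOrder` and `functionColumnPairs`.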
### _#docs_
**@ss68374:** @ss68374 has joined the channel
### _#pinot-dev_
**@lrhadoop143:** @lrhadoop143 has joined the channel
**@hristo:** @hristo has joined the channel
### _#community_
**@ss68374:** @ss68374 has joined the channel
### _#announcements_
**@ss68374:** @ss68374 has joined the channel
### _#roadmap_
**@ss68374:** @ss68374 has joined the channel
### _#getting-started_
**@lrhadoop143:** @lrhadoop143 has joined the channel
**@ss68374:** @ss68374 has joined the channel
**@kchavda:** When using ingestionconfig > transformconfigs, does it HAVE to
be a new column for the transformFunction? I would like to transform an
existing column from source and keep the same column name for the Pinot table
instead of creating a new column.
**@npawar:** it has to be a new name, you cannot transform a column and put
it into the same name
**@kchavda:** follow-up on this. I'm reading from a kafka topic which has
dates as EPOCH. I want to load as TIMESTAMP in my Pinot realtime table. Do I
have to use a transformfunction and load as a new column? @npawar
**@npawar:** when you say EPOCH, is it millisSinceEpoch? and what is the
format of TIMESTAMP?
**@kchavda:** I'm using Debezium with time.precision.mode = connect,
so it's in millisSinceEpoch. Would like to load it as `yyyy-MM-dd HH:mm:ss`
(ex: 2019-07-02 12:21:04).
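A sketch of how that could look with `transformConfigs`, per @npawar's earlier point that the output must be a new column name. The column names here are illustrative (the actual source column isn't named in the thread); `toDateTime` is Pinot's built-in function for formatting epoch millis with a Joda-style pattern:

```json
{
  "ingestionConfig": {
    "transformConfigs": [
      {
        "columnName": "created_at_str",
        "transformFunction": "toDateTime(created_at_millis, 'yyyy-MM-dd HH:mm:ss')"
      }
    ]
  }
}
```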
### _#releases_
**@hristo:** @hristo has joined the channel
### _#minion-improvements_
**@ss68374:** @ss68374 has joined the channel
### _#kinesis_help_
**@abhijeet.kushe:** I did delete the OFFLINE table and removed the realtime-
to-offline task in QA, but I still see the task listed in the Zookeeper
console. Also, the controller keeps printing ```No job to purge for the queue
TaskQueue_RealtimeToOfflineSegmentsTask```. I just restarted the entire cluster
and I still see the above message. I have started the table again with the
shard iterator at AT_SEQUENCE_NUMBER. I see the iterator is stuck at 66K
seconds ago (since we have 1 day retention). I have noticed in the past that
this iterator does not shift for a long time. Will update later if I don't see
it change.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@pinot.apache.org
For additional commands, e-mail: dev-help@pinot.apache.org