Posted to dev@pinot.apache.org by Pinot Slack Email Digest <ap...@gmail.com> on 2021/08/27 02:00:22 UTC

Apache Pinot Daily Email Digest (2021-08-26)

### _#general_

  
 **@farheenanjumb786:** @farheenanjumb786 has joined the channel  
 **@zixi.bwang:** @zixi.bwang has joined the channel  
 **@qianbo.wang:** Hi Pinot team, what do you recommend to optimize a range
search on a “date” column? Specifically, we have a column `created_at` that
records the epoch time in seconds, and we have a use case that needs to search
for entries created in the last 30 days, 60 days, 90 days, etc. I wonder if
there is a best practice I could follow to optimize this, for example: 1\. What
type is best for this `created_at` column? Timestamp, Long, etc.? 2\. What kind
of indexing would help? A range index? 3\. Anything else that could help?
Thanks in advance!  
**@g.kishore:** our typical recommendation is to start with no indexes.. try
out the query first, and by looking at the response metadata we can guide you
on the indexing.. You can add/remove indexes dynamically after data is ingested  
**@qianbo.wang:** @g.kishore Thanks. What data type do you suggest for
`created_at` for this kind of use case, considering we need to query something
like “in the past 30 days”? We could achieve it by using epoch time in seconds
like `where created_at > timestamp_30_days_before and created_at <
current_timestamp`. However, I wonder if there is a better way?  
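Assuming `created_at` stores epoch seconds, the bounds for a “last N days” filter like the one above could be computed on the client side before building the query. A minimal Python sketch (the helper name is hypothetical, not a Pinot API):

```python
import time

def last_n_days_bounds(n_days, now=None):
    """Return (lower, upper) epoch-second bounds for a 'last N days' filter."""
    upper = now if now is not None else int(time.time())
    lower = upper - n_days * 24 * 3600
    return lower, upper

lo, hi = last_n_days_bounds(30, now=1_630_000_000)
# The bounds would then be interpolated into a Pinot filter such as:
#   WHERE created_at > {lo} AND created_at < {hi}
```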
**@ken:** For a similar situation we use an int field that represents days.
And if the segment data is sorted by this field, you’ll automatically get a
sorted index, which should make range queries fast.  
**@qianbo.wang:** the number of days since 1970-1-1?  
**@qianbo.wang:** @ken ^  
**@ken:** yes  
**@ken:** If you need a timezone-aware query by days, then the resolution would
have to be in hours, not days (for most timezones, other than wacky timezones
like Nepal, which is 15 minutes off) :slightly_smiling_face:  
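The days-since-epoch encoding @ken describes (and the hour resolution for timezone-aware queries) is just integer division of the epoch seconds; a small sketch, assuming UTC epoch seconds as input:

```python
def epoch_seconds_to_days(epoch_sec):
    """Whole days since 1970-01-01 UTC -- compact int column for day-range queries."""
    return epoch_sec // 86400

def epoch_seconds_to_hours(epoch_sec):
    """Whole hours since the epoch -- finer bucket for timezone-aware day queries."""
    return epoch_sec // 3600
```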
**@qianbo.wang:** very interesting. Thanks very much!  
**@g.kishore:** please use long or int.. my recommendation is to have it
milliseconds (long) but round it to nearest day/hour etc  
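If the rounding @g.kishore suggests means truncating a millisecond timestamp down to the nearest day or hour boundary, one way to do it (a sketch, not Pinot code) is:

```python
DAY_MS = 24 * 3600 * 1000
HOUR_MS = 3600 * 1000

def floor_millis(epoch_ms, granularity_ms):
    """Truncate an epoch-millisecond timestamp down to a bucket boundary,
    e.g. the start of its day or hour."""
    return epoch_ms - (epoch_ms % granularity_ms)

day_start = floor_millis(1_630_000_000_123, DAY_MS)
```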
**@yupeng:** Does the segment pruner take this into account by default, or does
it need some config?  
**@g.kishore:** pruner uses it automatically  
**@yupeng:** :+1:  

### _#random_

  
 **@farheenanjumb786:** @farheenanjumb786 has joined the channel  
 **@zixi.bwang:** @zixi.bwang has joined the channel  

###  _#troubleshooting_

  
 **@farheenanjumb786:** @farheenanjumb786 has joined the channel  
 **@will.gan:** Hi, does anyone know why I might have a weird / corrupted
ideal state after moving a realtime table to a different set of servers (via
rebalance)? The table looks fine from the UI (all the segments are Good, it
says they are all on the new servers, and it can be queried), but I think there
are issues when I try to do things like reloading all the segments.  
**@will.gan:** Screenshot of the ideal state  
**@npawar:** this happens when the znode gets too large. @mayanks, I remember
you were discussing this with someone else on this channel the other day? Do
we have a resolution/issue created?  
**@mayanks:** Yes, this happens as large znodes get compressed. The UI has
been enhanced to handle compressed nodes in the latest master cc @xiangfu0  
**@xiangfu0:** yes:  
**@will.gan:** ok I see thanks guys  
 **@zixi.bwang:** @zixi.bwang has joined the channel  
 **@ken:** Is anyone else using Pinot’s Hadoop map-reduce support for building
segments? Asking because after switching to 0.8 (from 0.7.1) it no longer
works (an issue with not finding the HDFS plugin), plus some other odd issues.  
**@mayanks:** @jlli ^^  
**@jlli:** LinkedIn uses the one in `/pinot-plugins/pinot-batch-
ingestion/v0_deprecated/pinot-hadoop` dir  
**@jlli:** And the latest version we’re using is
`de2f0e04dca8130a09ea902787a75997b70cc16d` in github repo.  
**@ken:** Thanks @jlli  
**@g.kishore:** what’s the issue?  

###  _#pinot-dev_

  
 **@farheenanjumb786:** @farheenanjumb786 has joined the channel  

###  _#community_

  
 **@farheenanjumb786:** @farheenanjumb786 has joined the channel  

###  _#announcements_

  
 **@farheenanjumb786:** @farheenanjumb786 has joined the channel  

###  _#pinot-perf-tuning_

  
 **@farheenanjumb786:** @farheenanjumb786 has joined the channel  

###  _#getting-started_

  
 **@albertobeiz:** @albertobeiz has joined the channel  
 **@thiago.pereira.net:** @thiago.pereira.net has joined the channel  
 **@tiger:** By default, is the broker supposed to limit queries to 10
results?  
**@tiger:** In order to get more than 10 results, I have to add a LIMIT to my
query. One strange thing I noticed is that as I arbitrarily increase this
limit to something like LIMIT 100000000, the query becomes significantly
slower even though it still only returns about 20 rows. Just wondering if this
is expected behavior?  
**@npawar:** is this an ORDER BY query?  
**@tiger:** I see the same behavior with both ORDER BY and without it.  
**@npawar:** but for GROUP BY or just plain selection queries also ?  
**@tiger:** This seems to be the case for GROUP BY only  
**@npawar:** what version are you on?  
**@npawar:** looping in @jackie.jxt for more insights  
**@jackie.jxt:** Please check the version. I remember fixing an issue recently
where we pre-allocate the buffer based on the limit  
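If the issue @jackie.jxt mentions is indeed a buffer pre-allocated to the LIMIT size, a toy Python model (not Pinot internals) shows why the cost would track LIMIT rather than the actual result size:

```python
def collect_rows_preallocated(rows, limit):
    """Toy model: allocate a result buffer sized by LIMIT up front,
    even when far fewer rows actually match."""
    buffer = [None] * limit  # allocation cost grows with LIMIT, not with matches
    n = 0
    for row in rows:
        if n == limit:
            break
        buffer[n] = row
        n += 1
    return buffer[:n]

# 20 matching rows, but LIMIT 100000 still forces a 100000-slot allocation.
result = collect_rows_preallocated(range(20), 100_000)
```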

###  _#releases_

  
 **@farheenanjumb786:** @farheenanjumb786 has joined the channel  

###  _#pinot-docsrus_

  
 **@farheenanjumb786:** @farheenanjumb786 has joined the channel  
\---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@pinot.apache.org
For additional commands, e-mail: dev-help@pinot.apache.org