Posted to dev@pinot.apache.org by Pinot Slack Email Digest <sn...@apache.org> on 2021/01/09 02:00:14 UTC

Apache Pinot Daily Email Digest (2021-01-08)

### _#general_

  
 **@pandey.mayuresh367:** @pandey.mayuresh367 has joined the channel  
 **@karinwolok1:** If you've ever wanted to get into the conference speaking
circuit, now is your chance (if you're using Kafka :smirk: ) :tada:*The Kafka
Summit Europe 2021 CFP is open*! :tada: The deadline is in 9 days. Go ahead
and submit a talk. You have nothing to lose -- only to gain the possibility of
being a thought leader in the innovations around real time analytics.
:wine_glass: :heart: If you need some feedback on your submission, I am happy
to help. Also, the Kafka Summit people are available for real-time feedback on
submissions in one of their slack channels. More details in link below.  
**@karinwolok1:** We also have a bunch of data-centric meetup groups globally
that are looking for speakers. If you're interested in presenting at one of
our future meetups (Pinot or others), please send me a DM :dancer:  

###  _#random_

  
 **@pandey.mayuresh367:** @pandey.mayuresh367 has joined the channel  

###  _#feat-text-search_

  
 **@gamparohit:** @gamparohit has joined the channel  

###  _#feat-presto-connector_

  
 **@gamparohit:** @gamparohit has joined the channel  

###  _#pql-2-calcite_

  
 **@gamparohit:** @gamparohit has joined the channel  

###  _#troubleshooting_

  
 **@yash.agarwal:** Is there a way we can do mode calculation in Pinot ?  
**@g.kishore:** what's the use case? what do you want the result to be if there
are multiple modes?  
**@yash.agarwal:** For our business purpose, we want the smallest value.  
**@g.kishore:** don't think we have a udf for that right now  
**@g.kishore:** two workarounds  
**@g.kishore:** • select count(*) as count, x from T group by x order by count desc limit 10  
**@yash.agarwal:** It would be hard for us to split the query into multiple
queries, would it be possible to create a udf for the same ?  
**@g.kishore:** yes,  
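
Until such a UDF exists, the tie-breaking rule discussed in this thread (the smallest value among multiple modes) can be applied on the client to the rows of a group-by count query. A minimal sketch, assuming the rows arrive as `(value, count)` pairs; `smallest_mode` is a hypothetical helper, not a Pinot API:

```python
from typing import Iterable, Optional, Tuple


def smallest_mode(rows: Iterable[Tuple[int, int]]) -> Optional[int]:
    """Return the most frequent value; on ties, return the smallest one.

    `rows` is assumed to be (value, count) pairs, e.g. the result of a
    group-by count query. Returns None when there are no rows.
    """
    best_value: Optional[int] = None
    best_count = -1
    for value, count in rows:
        # A higher count wins; an equal count wins only with a smaller value.
        if count > best_count or (count == best_count and value < best_value):
            best_value, best_count = value, count
    return best_value


# Hypothetical rows: values 1 and 3 both occur five times.
print(smallest_mode([(3, 5), (1, 5), (2, 4)]))
```

This only gives the right answer if the query returns every group that could be a mode, so a small result limit on a high-cardinality column could cut off a tied value.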
 **@yash.agarwal:** If for a table we have set `nullHandlingEnabled` as true,
and we do distinct count on a column that has nulls, does it filter out the
null values and only show count of non null distinct values ?  
**@g.kishore:** you need to add `column IS NOT NULL` as of now  
**@yash.agarwal:** That would be difficult as we would have other aggregations
in the same query which are not filtered.  
**@yash.agarwal:** I would assume we would be able to add a udf to handle the
same as well ?  
**@yash.agarwal:** or should we change the way distinct count / all other
aggregations work when null handling is enabled.  
**@g.kishore:** yes, but checking for null in the udf will hurt
performance. you can use defaultNullValue and filter it out on the client side  
**@g.kishore:** the problem is it's not clear what the default behavior
should be  
**@yash.agarwal:** we could potentially filter it out, but when it comes to
aggregations like distinct count, we don't have a way to be certain whether the
aggregation counted the null/default value or not, which might skew our metrics.  
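
One way around the ambiguity raised here is to fetch the distinct values themselves rather than the count, and drop the placeholder on the client. A minimal sketch, assuming the client already holds the column's values and knows the configured `defaultNullValue`; the function name and sample data are hypothetical:

```python
from typing import Hashable, Iterable


def distinct_count_excluding_default(
    values: Iterable[Hashable], default_null_value: Hashable
) -> int:
    """Count distinct values, ignoring the placeholder stored for nulls."""
    # Build a set so duplicates collapse, skipping the null placeholder.
    return len({v for v in values if v != default_null_value})


# Hypothetical column values, with "null" assumed as the defaultNullValue.
print(distinct_count_excluding_default(["a", "b", "null", "a"], "null"))
```

A genuine value equal to the placeholder is dropped as well, so this is only safe when the default cannot occur in real data. It also moves the values to the client, which may be impractical for high-cardinality columns.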
 **@pandey.mayuresh367:** @pandey.mayuresh367 has joined the channel  
 **@pabraham.usa:** @pabraham.usa has joined the channel  

###  _#pinot-s3_

  
 **@pabraham.usa:** @pabraham.usa has joined the channel  
 **@pabraham.usa:** Hello, is Pinot's S3 deep storage in the query path, so that
I could store data in S3 and query it as normal?  

###  _#pinot-perf-tuning_

  
 **@elon.azoulay:** PR to use Java 11:  it's been working for us. We also use
`-XX:SoftRefLRUPolicyMSPerMB=0` to fix the issue where soft references are not
sufficiently cleaned up during GC.  