Posted to dev@pinot.apache.org by Pinot Slack Email Digest <ap...@gmail.com> on 2022/05/19 03:05:29 UTC

Apache Pinot Daily Email Digest (2022-05-18)

### _#general_

  
 **@gunnar.morling:** @gunnar.morling has joined the channel  
 **@harrysingh.nitj:** @harrysingh.nitj has joined the channel  
**@harrysingh.nitj:** Hi, I wanted to understand what we should do about
handling "nulls" in aggregation queries. Pinot saves default values
instead of nulls, but this affects the final result wherever the default value
coincides with a real data point. How are other folks handling this, and what
can we do here?  
**@diogo.baeder:** At the company I work for we use null as the null value,
and then it's possible to aggregate values filtered by not null.  
**@mayanks:** If you use default nulls, you will need to filter them out.
There is also native null support, but it does not yet support group-by to
filter them out  
**@diogo.baeder:** @mayanks but we can use `WHERE x IS NOT NULL` to filter
them out even when grouping the non-null values, right?  
**@g.kishore:** You can also use filter in aggregation function  
**@mayanks:** @diogo.baeder I meant implicit filtering. Yes you can always use
explicit Filter clause  
**@diogo.baeder:** Ah, ok then :slightly_smiling_face:  
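
An editorial aside for readers of the digest: the ambiguity discussed in this thread can be sketched in a few lines. This is an illustrative Python sketch, not Pinot code — it only shows how a configured default value (0 is assumed here) standing in for null skews an aggregate unless the sentinel rows are filtered out, which is exactly why 0 coinciding with a real data point is a problem:

```python
# Sketch: Pinot stores a default (here 0) in place of null, so a naive
# average over the column counts those rows; filtering the sentinel out
# recovers the average over real values -- unless 0 is also a legal value,
# which is the ambiguity raised in the thread above.
DEFAULT = 0  # assumed default null value for the column

raw = [10, None, 30, None, 20]                    # source data with nulls
stored = [v if v is not None else DEFAULT for v in raw]

naive_avg = sum(stored) / len(stored)             # sentinel rows included
filtered = [v for v in stored if v != DEFAULT]    # explicit filtering
filtered_avg = sum(filtered) / len(filtered)      # average over real values
```

The explicit-filter approach mirrors the `WHERE x IS NOT NULL` / FILTER-clause suggestions above; it only works cleanly when the default value can never be a legitimate data point.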
 **@sebastian:** @sebastian has joined the channel  
 **@thunderwav:** @thunderwav has joined the channel  
 **@apoorvupadhyaya07:** @apoorvupadhyaya07 has joined the channel  
 **@abanwasi:** @abanwasi has joined the channel  
**@sunhee.bigdata:** Hi all, when stopping Pinot components (controller,
broker, server) using the admin script, sometimes a component is not stopped
and nothing is logged. ```apache-pinot-0.9.3-bin/bin/pinot-admin.sh StopProcess
-controller/-server/-broker``` How do I know why it's not shutting down when
executing the stop command? Also, is there any other way to stop a Pinot
component safely? Thank you in advance.  
**@mayanks:** Do you see anything in the logs?  
**@sunhee.bigdata:** @mayanks when executing the stop-server command, it shows
output like this, but the server is still running, and nothing is logged in
the Pinot server log file.  

###  _#random_

  
 **@gunnar.morling:** @gunnar.morling has joined the channel  
 **@harrysingh.nitj:** @harrysingh.nitj has joined the channel  
 **@sebastian:** @sebastian has joined the channel  
 **@thunderwav:** @thunderwav has joined the channel  
 **@apoorvupadhyaya07:** @apoorvupadhyaya07 has joined the channel  
 **@abanwasi:** @abanwasi has joined the channel  

###  _#troubleshooting_

  
 **@gunnar.morling:** @gunnar.morling has joined the channel  
**@dadelcas:** I'm still not able to get this sorted; if anyone has any
pointers I'd appreciate it  
 **@harrysingh.nitj:** @harrysingh.nitj has joined the channel  
 **@sebastian:** @sebastian has joined the channel  
 **@lars-kristian_svenoy:** Hello team :wave: . I have a problem that I'm
trying to think through. I added a bunch of thoughts around it in my thread
above, but I'll summarise under this thread. Any help appreciated.  
**@lars-kristian_svenoy:** Consider the following segment partitioning
configuration
```
"segmentPartitionConfig": {
  "columnPartitionMap": {
    "customerId": {
      "functionName": "Murmur",
      "numPartitions": 10
    }
  }
}
```
customerId gets logically partitioned into 10 buckets, but in my case some of
these buckets are hotter than others, as some customers produce a lot more
data than others. This means that during batch ingestion the entire pipeline
gets congested (using Flink; the Flink connector can currently only build one
segment at a time). Instead of having a 1:1 relationship of partition -> sink,
I could have multiple sinks per partition. I am currently experimenting with
this using the following approach. Let's say I have 10 partitions and 40
sinks. With this configuration, I could assign 4 sinks per actual segment
partition. The way to achieve this would be
```
var sinksPerPartition = 40 / 10; // 4
var primaryPartition = murmur2(document.getCustomerId()) % 10; // 0 - 9
var secondaryPartition = murmur2(document.getRecordId()) % sinksPerPartition;
var finalPartition = primaryPartition * sinksPerPartition + secondaryPartition;
```
This would assign 4 sinks per actual partition, allowing me to scale ingestion
for large customers. However, there is one problem: for small customers, the
number of records per segment might end up very small. First of all, would
this approach work at all? And even if it does, it would probably be better to
change the Flink sink to be able to build multiple segments concurrently, as
this would have virtually the same benefit without the risk of potentially
ending up with very small segments for some partitions.  
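
An editorial aside: the partition arithmetic in the message above can be sketched end to end. This is an illustrative Python version — `murmur2` here is a deterministic stand-in hash (an MD5 prefix), not Pinot's actual Murmur implementation; only the modulo arithmetic mirrors the proposal:

```python
import hashlib

NUM_PARTITIONS = 10  # matches numPartitions in segmentPartitionConfig
NUM_SINKS = 40       # Flink sink parallelism

def murmur2(key: str) -> int:
    # Stand-in hash: deterministic and well-spread, NOT Pinot's Murmur2.
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:4], "big")

def final_partition(customer_id: str, record_id: str) -> int:
    sinks_per_partition = NUM_SINKS // NUM_PARTITIONS        # 4
    primary = murmur2(customer_id) % NUM_PARTITIONS          # 0..9
    secondary = murmur2(record_id) % sinks_per_partition     # 0..3
    return primary * sinks_per_partition + secondary         # 0..39
```

Note that every record for one customer lands in a contiguous block of 4 sinks, so the underlying Pinot partition is still recoverable as `final_partition // sinks_per_partition` — which is what would keep the segmentPartitionConfig contract intact under this scheme.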
**@lars-kristian_svenoy:** Any thoughts on this?  
**@mayanks:** Interesting. Both approaches make sense to me.  
**@zhixun.hong:** Hello team. I'm trying to integrate my Pinot data into
ThirdEye. I've already uploaded CSV data into a Pinot dataset, but I can't
find a way to ingest it into ThirdEye. I tried this in
data-sources-config.yml.
```
# Please put the mock data source as the first in this configuration.
dataSourceConfigs:
  - className: org.apache.pinot.thirdeye.datasource.pinot.PinotThirdEyeDataSource
    properties:
      zookeeperUrl: 'localhost:2181'
      clusterName: 'PinotCluster'
      controllerConnectionScheme: 'http'
      controllerHost: '127.0.0.1'
      controllerPort: 9000
      cacheLoaderClassName: org.apache.pinot.thirdeye.datasource.pinot.PinotControllerResponseCacheLoader
    metadataSourceConfigs:
      - className: org.apache.pinot.thirdeye.auto.onboard.AutoOnboardPinotMetadataSource
```
And I got the error below.
```
2022-05-17 18:26:33.104 [main] INFO org.apache.pinot.thirdeye.datalayer.util.DaoProviderUtil - Using existing database at 'jdbc:mysql:///thirdeye?autoReconnect=true'
May 17, 2022 6:26:37 PM org.apache.tomcat.jdbc.pool.ConnectionPool init
SEVERE: Unable to create initial connections of pool.
com.mysql.jdbc.exceptions.jdbc4.MySQLNonTransientConnectionException: Could not create connection to database server. Attempted reconnect 3 times. Giving up.
```
I asked about this in other channels, but didn't get any answer. Any
guidance will be really helpful.  
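
An editorial aside for readers hitting the same wall: the stack trace above is a JDBC connection failure against the MySQL database behind `jdbc:mysql:///thirdeye`, not a Pinot or ZooKeeper error. A hedged sketch of a quick reachability probe (the host and MySQL port here are assumptions; adjust to your setup):

```python
import socket

def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Example: probe the default MySQL port before digging into ThirdEye config.
# port_open("127.0.0.1", 3306)
```

If the probe fails, the MySQL instance ThirdEye expects is not running or not reachable, and no amount of data-sources-config.yml tweaking will help until that is fixed.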
 **@thunderwav:** @thunderwav has joined the channel  
**@kathuriakritihp:** Hi all, I am having trouble running Pinot 0.11. I get
the following error, and would appreciate any help
```
Failed to start a Pinot [CONTROLLER] at 7.722 since launch
java.lang.NullPointerException: Cannot invoke "java.lang.reflect.Method.invoke(Object, Object[])" because "com.sun.xml.bind.v2.runtime.reflect.opt.Injector.defineClass" is null
```
**@kathuriakritihp:** Full log:  
**@kathuriakritihp:** I tried adding versions to the dependencies like below,
but that did not fix the error
```
<dependency>
  <groupId>javax.xml.bind</groupId>
  <artifactId>jaxb-api</artifactId>
  <version>2.3.0</version>
</dependency>
<dependency>
  <groupId>com.sun.xml.bind</groupId>
  <artifactId>jaxb-core</artifactId>
  <version>2.3.0</version>
</dependency>
<dependency>
  <groupId>com.sun.xml.bind</groupId>
  <artifactId>jaxb-impl</artifactId>
  <version>2.3.0</version>
</dependency>
```
**@npawar:** is this from the IDE? we were seeing similar issues recently, and
the problem ended up being that the Java used by the IDE run profile was 17.
can you check that?  
**@kathuriakritihp:** No. But my system is using Java 17.  
**@kathuriakritihp:** Missed mentioning this detail. Pinot 0.10 was working
fine with Java 17 though  
**@npawar:** hmm, to get that out of the way, would you be able to confirm if
it works with java 11?  
**@npawar:** then we can check the regression  
**@kathuriakritihp:** I compiled it with java 11  
**@kathuriakritihp:** Got a new error. Posting a bit.  
**@kathuriakritihp:**  
**@kathuriakritihp:** Mine is an M1 Mac. So to ensure that is not the
culprit, I'm currently building on an Ubuntu machine.  
**@kathuriakritihp:** It shouldn't be. Because I was able to run 0.10 on my
mac, with java 17 :woman-shrugging:  
**@kathuriakritihp:** This worked with JDK 11 and on Ubuntu :thumbsup:  
**@npawar:** @kharekartik @haitao there was something that you did to make
this work on M1 right?  
**@haitao:** it's for building M1 binary:  
**@apoorvupadhyaya07:** @apoorvupadhyaya07 has joined the channel  
 **@abanwasi:** @abanwasi has joined the channel  

###  _#getting-started_

  
 **@gunnar.morling:** @gunnar.morling has joined the channel  
 **@harrysingh.nitj:** @harrysingh.nitj has joined the channel  
 **@sebastian:** @sebastian has joined the channel  
 **@thunderwav:** @thunderwav has joined the channel  
 **@apoorvupadhyaya07:** @apoorvupadhyaya07 has joined the channel  
 **@abanwasi:** @abanwasi has joined the channel  

###  _#introductions_

  
 **@gunnar.morling:** @gunnar.morling has joined the channel  
 **@harrysingh.nitj:** @harrysingh.nitj has joined the channel  
 **@sebastian:** @sebastian has joined the channel  
 **@thunderwav:** @thunderwav has joined the channel  
 **@apoorvupadhyaya07:** @apoorvupadhyaya07 has joined the channel  
 **@abanwasi:** @abanwasi has joined the channel  
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@pinot.apache.org
For additional commands, e-mail: dev-help@pinot.apache.org