Posted to by Pinot Slack Email Digest <> on 2021/03/18 02:00:22 UTC

Apache Pinot Daily Email Digest (2021-03-17)

### _#general_

 **@pranasblk:** @pranasblk has joined the channel  
 **@chad.preisler:** I need to transform an encrypted Kafka message before
Pinot processes it. Right now for our stream apps we use a custom serde to do
it. How can I do it in Pinot? Looks like it would be fairly easy to change
Pinot to allow a deserializer to be plugged in. Thoughts?  
**@g.kishore:** Is the custom serde related only to the format?  
**@g.kishore:** Pinot has the ability to write a decoder  
**@chad.preisler:** Does the decoder get the raw message from the topic? We
will need the entire message unaltered. If so are there any docs on how to use
a custom decoder?  
 **@chad.preisler:** Seems like Pinot is stuck on an older version of the JDK
due to its use of off-heap memory APIs that no longer exist. The code does not
compile on JDK 15. Also, the “shade” plugin does not work on JDK 15. I read that
JDK 16 has some new methods for using off-heap memory. Is there a plan to move to
a modern JDK? Is off-heap even necessary now that ZGC can handle 16TB of heap
with little to no pause time?  
**@g.kishore:** We plan to deprecate support for Java 8 and support newer
JDKs. We still need off heap but we can definitely benefit from ZGC  
**@chad.preisler:** I’m curious, what is the reason for off-heap? Is it the
size limitation on arrays?  
**@chad.preisler:** Also, does Pinot build with JDK 15/16? I had trouble
building with all tests using JDK 15. The first issue I saw was with
 **@humengyuk18:** Is there a way to specify the group id for Kafka realtime
ingestion? What should the ingestion config key be?  
**@fx19880617:** low-level consumer mode is per-partition based, so no need to
specify group-id  
**@humengyuk18:** Is there a way to monitor the latency for consuming when
using the low-level consumer?  
**@fx19880617:** if u want to specify the group id you can try:
**@fx19880617:** basically whatever after `stream.[streamType].consumer.prop.`
will be put into kafka consumer configs  
**@humengyuk18:** I see, so the group id has no effect on low-level consumer?  
**@fx19880617:** no, but it may help report the consuming latency  
**@fx19880617:** just you cannot use it to reset the consumer  
**@humengyuk18:** I specified the group id using
`` , it’s not working, that group id is not
shown in kafka.  
**@fx19880617:** I see, then in that case, we need to implement the API
internally to fetch the delay  
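
The prefix rule described in this thread can be sketched as follows. This is a minimal illustration, not Pinot's actual code: the helper name `consumer_props` and the sample values are hypothetical, and the only behavior taken from the discussion is that whatever follows `stream.[streamType].consumer.prop.` is passed on as a Kafka consumer config.

```python
# Hypothetical helper (not part of Pinot): collect the keys that sit under
# the `stream.<streamType>.consumer.prop.` prefix and strip the prefix,
# yielding the properties that would be handed to the Kafka consumer.
def consumer_props(stream_configs: dict, stream_type: str = "kafka") -> dict:
    prefix = f"stream.{stream_type}.consumer.prop."
    return {
        key[len(prefix):]: value
        for key, value in stream_configs.items()
        if key.startswith(prefix)
    }


# Sample streamConfigs; the group id and offset values are made up.
stream_configs = {
    "streamType": "kafka",
    "stream.kafka.consumer.type": "lowlevel",
    "stream.kafka.consumer.prop.group.id": "my-group",
    "stream.kafka.consumer.prop.auto.offset.reset": "smallest",
}

if __name__ == "__main__":
    print(consumer_props(stream_configs))
```

As noted in the thread, setting `group.id` this way does not change how the low-level consumer reads partitions.
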
 **@ronak:** I was exploring TEXT_MATCH functionality with pinot-0.7.0/0.6.0
and had configured one of the columns for it. Is there any configuration for
the refresh time interval for the index? After enabling indexing (with
`index type: Text` and `encoding type: RAW`) on the column and doing
TEXT_MATCH, I was first getting an empty result, but after some time, I was
getting the result. So, what is the initial delay for such a column to be
searchable? Are there any settings/configuration (e.g. num of docs, indexed
size, etc.) for the same?  
**@steotia:** Hi Ronak, The refresh threshold isn't yet configurable. We can
make it though. However, there was a bug I recently fixed. It was causing the
lag. Just realized it's in my branch. Will create a PR  
**@rishbits1994:** So by design/indexing there won’t be any significant lag?  
 **@brianolsen87:** @brianolsen87 has joined the channel  
 **@ali:** @ali has joined the channel  
 **@brianolsen87:** Hey all :wave: Just jumping into this awesome tech called
Pinot! I'm a developer advocate from the Trino project (). Tomorrow we're
having an episode of the  with @fx19880617 and @elon.azoulay about . We're
covering the benefits of Trino + Pinot and why you really need Pinot to speed
up your common aggregation queries for predictable response times but also
gaining the benefit of federated queries over your data lake or other data
sources. We'll cover a bit of the specific limitations and current work going
on in the Trino-Pinot connector, and finally I'll run a simple demo with the
connector! Come watch me crash my docker containers @11am EDT on .  
**@srini:** don’t miss Brian’s guitar solo at the beginning :musical_note:  
**@joshhighley:** will you be discussing the 50000 row limit (evaluated rows,
not returned rows)  Given the size of the datasets intended for Pinot and
Trino, this seems like a _really_ low default limit  
**@elon.azoulay:** Yep  
**@brianolsen87:** @joshhighley We'll be discussing  tomorrow and
@elon.azoulay has a pretty neat solution coming in future versions of Trino.
See you all tomorrow @11am EDT! :rabbit2::rabbit2:  
**@ali:** :wave: Pinot community! I work with the Presto community, wanted to
share PrestoCon Day is next week, feat. some great Pinot talks - @fx19880617
is doing a session on Realtime Analytics with Presto & Apache Pinot and
@g.kishore is on The Presto Ecosystem Panel. Super excited for the event -
it's virtual & reg is free - hope to see many of you :wine_glass: there
**@asif:** @asif has joined the channel  
**@amherman:** @amherman has joined the channel  

###  _#random_

 **@pranasblk:** @pranasblk has joined the channel  
 **@brianolsen87:** @brianolsen87 has joined the channel  
 **@ali:** @ali has joined the channel  
 **@asif:** @asif has joined the channel  
 **@amherman:** @amherman has joined the channel  

###  _#troubleshooting_

 **@pranasblk:** @pranasblk has joined the channel  
 **@ravi.maddi:** *Hi All,* I started quick-start-batch instead of
quick-start-streaming. How do I stop quick-start-batch? Any idea?  
**@fx19880617:** ```➜ ps wuax|grep "" xiangfu
79906 100.1 3.0 9461188 1002132 s007 R+ 12:57AM 0:56.62 /Library/Internet
Plug-Ins/JavaAppletPlugin.plugin/Contents/Home/bin/java -Xms1G -Xmx1G
incubating-0.6.0-bin/plugins -classpath /Users/xiangfu/Downloads/apache-pinot-
-Dbasedir=/Users/xiangfu/Downloads/apache-pinot-incubating-0.6.0-bin ➜ kill 79906```  
**@fx19880617:** also `ctrl` + `c` if you are on the same terminal session  
**@ravi.maddi:** Thanks @fx19880617 or bin/ StopProcess
-controller -server -broker -zooKeeper  
 **@ravi.maddi:** Hi All, I am trying to create a schema, but I am getting
this error: bin/ AddTable -tableConfigFile $PDATA_HOME table-config.json
-schemaFile schema-config.json -controllerPort 9000 -exec
Executing command: AddTable -tableConfigFile table-config.json -schemaFile
schema-config.json -controllerProtocol http -controllerHost
-controllerPort 9000 -exec Sending request:  to controller: localhost,
version: Unknown Got Exception to upload Pinot Schema: myschema
org.apache.pinot.common.exception.HttpErrorStatusException: Got error status
code: 400 (Bad Request) with reason: "Cannot add invalid schema: myschema.
Reason: null" while sending request:  to controller: localhost, version:
Unknown at
dependencies.jar:0.7.0-SNAPSHOT-d87755899eccba3554e9cc39a1439d5ecb53aaac] Need
help :slightly_smiling_face: Table Config:
```
{
  "tableName": "mytable",
  "tableType": "REALTIME",
  "segmentsConfig": {
    "timeColumnName": "_source.sDate",
    "timeType": "MILLISECONDS",
    "schemaName": "myschema",
    "replicasPerPartition": "1"
  },
  "tenants": {},
  "tableIndexConfig": {
    "loadMode": "MMAP",
    "streamConfigs": {
      "streamType": "kafka",
      "stream.kafka.consumer.type": "lowlevel",
      "": "mytopic",
      "":
      "": "localhost:9876",
      "realtime.segment.flush.threshold.time": "3600000",
      "realtime.segment.flush.threshold.size": "50000",
      "": "smallest"
    }
  },
  "metadata": {
    "customConfigs": {}
  }
}
```
And Schema Config:
```
{
  "schemaName": "myschema",
  "eventflow": [
    { "name": "_index", "dataType": "INT" },
    { "name": "_type", "dataType": "STRING" },
    { "name": "id", "dataType": "INT" },
    { "name": "_source.madids", "datatype": "INT", "singleValueField": false },
  ],
  "dateTimeFieldSpecs": [
    {
      "name": "_source.sDate",
      "dataType": "STRING",
      "format": "1:SECONDS:SIMPLE_DATE_FORMAT:yyyy-MM-dd HH:mm:ss",
      "granularity": "1:DAYS"
    }
  ]
}
```
**@ravi.maddi:** Need help Friends  
**@ken:** What’s the `eventflow` field? Shouldn’t that be named
`dimensionFieldSpecs`? And without a `metricFieldSpecs` array of fields for
doing aggregations, I’m not sure how you’re going to effectively use Pinot :)  
**@chinmay.cerebro:** @ravi.maddi here's a good reference for creating a
schema:  (sample schema included)  
**@chinmay.cerebro:** In terms of the "Reason: null", I don't see that on the
latest master. I'm investigating why this could have happened on 0.7  
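
The kind of problem raised in this thread can be caught before uploading with a small local check. The following is a rough sketch only: `lint_schema` is a hypothetical helper, not a Pinot tool, and the set of top-level keys is an assumption limited to the names mentioned in this thread (Pinot's schema documentation has the authoritative list).

```python
import json

# Assumed, partial list of top-level schema keys, taken from this thread.
KNOWN_TOP_LEVEL = {
    "schemaName",
    "dimensionFieldSpecs",
    "metricFieldSpecs",
    "dateTimeFieldSpecs",
}


def lint_schema(text: str) -> list:
    """Return a list of human-readable problems found in a schema JSON string."""
    try:
        schema = json.loads(text)
    except json.JSONDecodeError as exc:
        # Strict JSON parsing rejects things like trailing commas.
        return [f"invalid JSON: {exc.msg}"]

    problems = []
    for key, value in schema.items():
        if key not in KNOWN_TOP_LEVEL:
            problems.append(f"unknown top-level key: {key}")
        if isinstance(value, list):
            for spec in value:
                if not isinstance(spec, dict):
                    continue
                # Key names are case-sensitive: "datatype" is not "dataType".
                for required in ("name", "dataType"):
                    if required not in spec:
                        problems.append(f"{spec.get('name', '?')}: missing {required}")
    return problems


if __name__ == "__main__":
    sample = (
        '{"schemaName": "myschema", '
        '"eventflow": [{"name": "_source.madids", "datatype": "INT"}]}'
    )
    for problem in lint_schema(sample):
        print(problem)
```

Run against a trimmed copy of the schema above, this flags the `eventflow` key and the lowercase `datatype`. A trailing comma inside an array is reported as invalid JSON before any other check runs.
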
 **@matteo.santero:** Hello, is there a document that explains the “cutoff”
time in detail for data handled by time (and by primary key, if applicable)? I
am asking because it seems that I have a record that is present in both OFFLINE
and REALTIME (with the same “primary key”), but when I look for it in the final
table I do not find it at all.  
OFFLINE record TIME 1615891108000 (ms), max 1615939199415, min 1612137600000  
REALTIME record TIME 1615723114000 (ms), max 1615981903000, min 1515496517000  
FINAL record not present, max 1615981930000, min 1612137600000  
**@mayanks:** You can refer to time boundary:  
**@mayanks:** @jackie.jxt are there any limitations/issues with time boundary
computation when time unit is millis?  
**@matteo.santero:** Thank you for the doc  
**@jackie.jxt:** There are no limitations. @matteo.santero Does the OFFLINE
table have the same records as the REALTIME table for the overlapping time?
They should be the same in order to return the correct result  
**@matteo.santero:** offline and realtime have a set of records that are
overlapped during the time, in that overlap some record can have same primary
key but different data and different time  
**@matteo.santero:** in this case the pk was the same in both, the data and
the time were different, and the result in the “final” one was empty  
**@jackie.jxt:** I don't fully follow here. Because you mentioned primary key
here, I assume you enabled upsert for the table? Upsert works only on
realtime-only tables, not on hybrid tables. Also, why do the real-time records
have a much wider span than the offline records? FYI, here is the definition of
the hybrid table:  
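
The time boundary mentioned in this thread can be illustrated with the timestamps quoted above. This is a sketch under stated assumptions, not Pinot's implementation: it assumes the boundary is the offline table's maximum end time minus an offset, that the OFFLINE side answers for timestamps at or below the boundary, and that the REALTIME side answers for timestamps above it. The function names and the two offsets tried (one hour, one day) are hypothetical choices for illustration.

```python
HOUR_MS = 3_600_000
DAY_MS = 86_400_000


def time_boundary(offline_max_end_ms: int, offset_ms: int) -> int:
    # Assumed rule: boundary = max end time of offline data minus an offset.
    return offline_max_end_ms - offset_ms


def is_served(record_time_ms: int, table_type: str, boundary_ms: int) -> bool:
    # Assumed split: OFFLINE serves time <= boundary, REALTIME serves time > boundary.
    if table_type == "OFFLINE":
        return record_time_ms <= boundary_ms
    return record_time_ms > boundary_ms


# Values quoted in the thread.
OFFLINE_MAX_MS = 1615939199415
OFFLINE_RECORD_MS = 1615891108000
REALTIME_RECORD_MS = 1615723114000

if __name__ == "__main__":
    for label, offset in (("1 hour", HOUR_MS), ("1 day", DAY_MS)):
        boundary = time_boundary(OFFLINE_MAX_MS, offset)
        print(
            label,
            boundary,
            is_served(OFFLINE_RECORD_MS, "OFFLINE", boundary),
            is_served(REALTIME_RECORD_MS, "REALTIME", boundary),
        )
```

Under the one-day assumption, the offline copy falls after the boundary and the realtime copy falls before it, so neither side would return the record. Under the one-hour assumption, the offline copy would be returned. Which offset applies to a given table depends on its configuration, so treat this only as a way to reason about the numbers.
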
**@yash.agarwal:** Hello, Is there any performance difference between the
following two queries for pinot `select distinct city from transactions limit
100000` `select city from transactions group by city limit 100000`  
**@ken:** My guess was no, as implementation-wise it’s a similar operation.
But just for grins I tried it on a large dataset (1.7b records) and got
similar performance. I’d guess that memory usage would also be similar.  
**@mayanks:** Hey distinct and group by use different engines internally, even
though semantically they mean the same thing and might end up doing similar
amount of work.  
 **@brianolsen87:** @brianolsen87 has joined the channel  
 **@ali:** @ali has joined the channel  
 **@qiaochu:** @qiaochu has joined the channel  
 **@ujwala.tulshigiri:** @ujwala.tulshigiri has joined the channel  
 **@qiaochu:** Hello team, I got an error after rebasing onto the latest Pinot
master when running `mvn test`. I think it’s related to this dependency:
org.apache.maven.plugins:maven-surefire-plugin:3.0.0-M5:test. `mvn clean install`
works perfectly. Is there a way to fix this error? ```[ERROR] Failed to
execute goal org.apache.maven.plugins:maven-surefire-plugin:3.0.0-M5:test
(default-test) on project pinot-spi: There are test failures. [ERROR] [ERROR]
Please refer to /Users/qiaochu/Fork/incubator-pinot/pinot-spi/target/surefire-
reports for the individual test results. [ERROR] Please refer to dump files
(if any exist) [date].dump, [date]-jvmRun[N].dump and [date].dumpstream.
[ERROR] The forked VM terminated without properly saying goodbye. VM crash or
System.exit called? [ERROR] Command was /bin/sh -c cd
/Users/qiaochu/Fork/incubator-pinot/pinot-spi &&
pinot/pinot-spi/target/jacoco.exec -Xms4g -Xmx4g -jar
2021-03-17T10-07-59_551-jvmRun1 surefire6422534819305887743tmp
surefire_05875304854941226633tmp [ERROR] Error occurred in starting fork,
check output in log [ERROR] Process Exit Code: 134 [ERROR]
org.apache.maven.surefire.booter.SurefireBooterForkException: The forked VM
terminated without properly saying goodbye. VM crash or System.exit called?
[ERROR] Command was /bin/sh -c cd /Users/qiaochu/Fork/incubator-pinot/pinot-
spi &&
pinot/pinot-spi/target/jacoco.exec -Xms4g -Xmx4g -jar
2021-03-17T10-07-59_551-jvmRun1 surefire6422534819305887743tmp
surefire_05875304854941226633tmp [ERROR] Error occurred in starting fork,
check output in log [ERROR] Process Exit Code: 134 [ERROR] at
[ERROR] at
[ERROR] at
[ERROR] at
[ERROR] at
[ERROR] at
[ERROR] at
[ERROR] at
[ERROR] at
[ERROR] at
[ERROR] at
[ERROR] at
[ERROR] at
[ERROR] at
[ERROR] at org.apache.maven.DefaultMaven.doExecute(
[ERROR] at org.apache.maven.DefaultMaven.doExecute(
[ERROR] at org.apache.maven.DefaultMaven.execute(
[ERROR] at org.apache.maven.cli.MavenCli.execute( [ERROR] at
org.apache.maven.cli.MavenCli.doMain( [ERROR] at
org.apache.maven.cli.MavenCli.main( [ERROR] at
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[ERROR] at
[ERROR] at
[ERROR] at java.base/java.lang.reflect.Method.invoke( [ERROR]
[ERROR] at
[ERROR] at
[ERROR] at
[ERROR] [ERROR] -> [Help 1] [ERROR] [ERROR] To see the full stack trace of the
errors, re-run Maven with the -e switch. [ERROR] Re-run Maven using the -X
switch to enable full debug logging. [ERROR] [ERROR] For more information
about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1]  [ERROR] [ERROR] After correcting the problems, you can
resume the build with the command [ERROR] mvn <args> -rf :pinot-spi```  
**@fx19880617:** this is an issue with the test jar using Java 11  
**@fx19880617:** changing the JDK to 8 should solve it  
**@qiaochu:** gotcha, thanks @fx19880617! I recently switched to Java 11 for
another project. I will change back  
**@fx19880617:** sure, I’m thinking of upgrading the project JDK from 8 to 11  
**@fx19880617:** but need some time :stuck_out_tongue:  
**@qiaochu:** gotcha, thanks for information!  
 **@asif:** @asif has joined the channel  
 **@amherman:** @amherman has joined the channel  

###  _#pinot-dev_

 **@ravi.maddi:** @ravi.maddi has joined the channel  
 **@ravi.maddi:** *Hi All,* I am getting an error while starting zookeeper with
pinot admin. zookeeper state changed (SyncConnected) Waiting for keeper state
SyncConnected Terminate ZkClient event thread. Session: 0x10003506d770000
closed Start zookeeper at localhost:2181 in thread main EventThread shut down
for session: 0x10003506d770000 Expiring session 0x10002b33f080005, timeout of
30000ms exceeded Expiring session 0x10002b33f080006, timeout of 30000ms
exceeded Expiring session 0x10002b33f080007, timeout of 30000ms exceeded
Expiring session 0x10002b33f080004, timeout of 30000ms exceeded Expiring
session 0x10002b33f080008, timeout of 30000ms exceeded Expiring session
0x10002b33f080002, timeout of 30000ms exceeded Expiring session
0x10002b33f08000b, timeout of 60000ms exceeded I have been facing this issue
since yesterday morning. And because zookeeper is not ready, other components
are also not working properly. *Need Help* :slightly_smiling_face:  
**@fx19880617:** how do you start zookeeper?  
**@ravi.maddi:** _cd /home/ubuntu/pinot_ _bin/ StartZookeeper
-zkPort 2181_  
**@fx19880617:** it’s on ur local linux  
**@fx19880617:** which java version?  
**@fx19880617:** can you also check if you can start zk on a different port?  
**@ravi.maddi:** openjdk version "1.8.0_282"  
**@ravi.maddi:** I checked with the ps command and found the process running.  
**@fx19880617:** can you kill it then rerun ?  
**@fx19880617:** it could be the port is occupied  
**@ravi.maddi:** I restarted the server and tried, same issue again and again.  
**@ravi.maddi:** still no luck  
**@ravi.maddi:** I started quick-start-batch instead of quick-start-streaming.
How do I stop quick-start-batch? Any idea?  
**@ravi.maddi:** stop cluster  
**@fx19880617:** kill the process  
**@fx19880617:** ctrl+c  
**@fx19880617:** you can run
```bin/ StartZookeeper -zkPort 2181```  
**@fx19880617:** and check the detailed error log  
**@ravi.maddi:** sure, I will check now  
**@ravi.maddi:** ctrl +c -- is to stop the cluster?  
**@fx19880617:** yes  
**@ravi.maddi:** thanks, stopped  
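
The "port is occupied" suspicion from this thread can be checked directly before starting ZooKeeper. The snippet below is a minimal sketch using only the Python standard library; `port_in_use` is a hypothetical helper name, and the host and port are the ones mentioned above.

```python
import socket


def port_in_use(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    # 2181 is the ZooKeeper port used in the thread.
    if port_in_use("127.0.0.1", 2181):
        print("Port 2181 is already taken; stop the old process or pick another port.")
    else:
        print("Port 2181 looks free.")
```

If the port is taken, a leftover quick-start process is one possible cause, which matches the advice above to kill it and rerun.
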
 **@fx19880617:** wanna start a session on this topic:  
**@fx19880617:** upgrade JDK from 8 to 11(?)  
**@dlavoie:** I guess 11 is the only realistic target, since 17 will be
the next LTS  
**@g.kishore:** does this require a code change and it will make it
incompatible with jdk8?  
**@dlavoie:** Yes and yes, lots of modules were removed in Java 11.  
**@fx19880617:** Yes, it requires code changes  
**@fx19880617:** I’m on this PR() to make JDK 11 pass the tests  

###  _#community_

 **@ali:** @ali has joined the channel  
 **@slatermegank:** @slatermegank has joined the channel  

###  _#announcements_

 **@ali:** @ali has joined the channel  

###  _#getting-started_

 **@slatermegank:** @slatermegank has joined the channel  

###  _#segment-write-api_

 **@yupeng:** updated the doc. PTAL  
 **@chinmay.cerebro:** will do  