Posted to dev@pinot.apache.org by Pinot Slack Email Digest <ap...@gmail.com> on 2022/04/05 02:00:28 UTC

Apache Pinot Daily Email Digest (2022-04-04)

### _#general_

  
 **@satyam.raj:** @satyam.raj has joined the channel  
 **@satyam.raj:** Hi everyone! I’ve been doing a POC on Pinot and am currently
facing an issue while ingesting ORC file data into Pinot. Filed a GH issue as
well:  Can anyone help?  
**@kharekartik:** Hi, can you add your schema and table config to the issue as
well? Do remove the secret values.  
**@satyam.raj:** Updated @kharekartik  
**@kharekartik:** It seems to me that the column names in your ORC file and
the column names in your schema file do not match. They should be the same.  
**@satyam.raj:** How can I get the exact column name from the orc file?  
**@kharekartik:** `java -jar orc-tools-X.Y.Z-uber.jar meta your-file.orc`
should print the schema  
**@satyam.raj:** I guess the columns are named as `_col0, _col2` and so on  
**@kharekartik:** `java -jar orc-tools-1.5.5-uber.jar meta 000000_0` this
should work in your case  
**@kharekartik:** Can you paste the metadata you got from command here?  
**@satyam.raj:**
```
➜ batchjob-spec java -jar orc-tools-1.5.5-uber.jar meta 000000_0
log4j:WARN No appenders could be found for logger (org.apache.hadoop.util.Shell).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See  for more info.
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.hadoop.security.authentication.util.KerberosUtil (file:/Users/satyam.raj/dataplatform/pinot-dist/batchjob-spec/orc-tools-1.5.5-uber.jar) to method sun.security.krb5.Config.getInstance()
WARNING: Please consider reporting this to the maintainers of org.apache.hadoop.security.authentication.util.KerberosUtil
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
Processing data file 000000_0 [length: 8321467]
Structure for 000000_0
File Version: 0.12 with HIVE_8732
Rows: 723010
Compression: ZLIB
Compression size: 262144
Type: struct<_col0:string,_col1:string,_col2:string,_col3:string,_col4:string,_col5:string,_col6:string,_col7:string,_col8:string,_col9:string,_col10:string,_col11:string,_col12:string,_col13:int,_col14:int,_col15:int,_col16:string,_col17:string,_col18:date,_col19:date,_col20:date,_col21:string,_col22:string>

Stripe Statistics:
  Stripe 1:
    Column 0: count: 723010 hasNull: false
    Column 1: count: 723010 hasNull: false min: 1000 max: 99999750 sum: 6114370
    Column 2: count: 723010 hasNull: false min: customer max: customer sum: 5784080
    Column 3: count: 723010 hasNull: false min: Birmingham max: wollongong sum: 2285843
```  
**@kharekartik:** Yep, then you will have to use the same column names in the
schema. If you want new column names, you can use `transformConfigs` in the
table config file.  
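
The renaming suggestion above could look roughly like the following. This is a minimal sketch, not a verified config: the target names `customer_id` and `customer_type` are hypothetical, and the Groovy pass-through expression is an assumption about how a plain rename can be written (Groovy transforms may be disabled on some clusters). Only the `_colN` source names come from the ORC metadata in this thread.

```python
import json

# Hypothetical target names; only the _colN keys come from the ORC file.
RENAMES = {"_col0": "customer_id", "_col1": "customer_type"}


def rename_transforms(renames):
    """Build one transformConfigs entry per renamed column.

    Each entry copies the source column into the new column using a
    Groovy pass-through expression of the form Groovy({src}, src).
    """
    return [
        {"columnName": new, "transformFunction": f"Groovy({{{old}}}, {old})"}
        for old, new in renames.items()
    ]


# Fragment to merge into the table config; the rest of the config is omitted.
table_config_fragment = {
    "ingestionConfig": {"transformConfigs": rename_transforms(RENAMES)}
}

print(json.dumps(table_config_fragment, indent=2))
```

With this approach the schema would declare the new names (here `customer_id`, `customer_type`) rather than `_col0`, `_col1`.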
**@satyam.raj:** Alright, thanks! One more question: what data type should I
use for the `date` fields in ORC?  
**@kharekartik:** long  
**@satyam.raj:** It worked :tada:  
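
To summarize the fix from this thread, here is a small sketch that turns the `Type:` line printed by `orc-tools meta` into a Pinot schema that reuses the ORC column names and maps `date` to `LONG`, as suggested above. The schema name `orc_poc` is hypothetical, the parser only handles a flat `struct<...>` like the one in this thread, and placing every column under `dimensionFieldSpecs` is a simplifying assumption (a real schema would likely classify metric and date-time columns separately).

```python
import json
import re

# ORC types seen in this thread, mapped to Pinot data types.
# date -> LONG follows the advice given in the conversation.
ORC_TO_PINOT = {"string": "STRING", "int": "INT", "date": "LONG"}


def parse_orc_struct(type_line):
    """Parse a flat 'struct<name:type,...>' description into (name, type) pairs."""
    match = re.fullmatch(r"struct<(.*)>", type_line.strip())
    if match is None:
        raise ValueError("expected a flat struct<...> type description")
    fields = []
    for item in match.group(1).split(","):
        name, orc_type = item.split(":", 1)
        fields.append((name, orc_type))
    return fields


def pinot_schema(schema_name, fields):
    """Build a schema dict whose column names match the ORC file exactly."""
    return {
        "schemaName": schema_name,
        "dimensionFieldSpecs": [
            {"name": name, "dataType": ORC_TO_PINOT[orc_type]}
            for name, orc_type in fields
        ],
    }


# Shortened version of the Type: line from the metadata dump.
fields = parse_orc_struct("struct<_col0:string,_col13:int,_col18:date>")
print(json.dumps(pinot_schema("orc_poc", fields), indent=2))
```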
 **@goyal3593:** @goyal3593 has joined the channel  
 **@zaikhan:** Does Pinot support using `pinot-jdbc-client` in JMeter, i.e.
performance-testing Pinot queries with JMeter’s JDBC Request Sampler?  
**@mayanks:** Please use pinot-java-client for querying pinot for any perf
testing.  
 **@varun.j:** @varun.j has joined the channel  
 **@satyammast:** @satyammast has joined the channel  
 **@skondapalli:** @skondapalli has joined the channel  
 **@drojas:** @drojas has joined the channel  
 **@tonya:** Hey folks! @dunithd is doing a virtual meetup tomorrow on
analyzing IoT data with Apache Pinot and Kafka. Please register if you can
join us! :speaking_head_in_silhouette: :computer:  

### _#random_

  
 **@satyam.raj:** @satyam.raj has joined the channel  
 **@goyal3593:** @goyal3593 has joined the channel  
 **@varun.j:** @varun.j has joined the channel  
 **@satyammast:** @satyammast has joined the channel  
 **@skondapalli:** @skondapalli has joined the channel  
 **@drojas:** @drojas has joined the channel  

###  _#troubleshooting_

  
 **@jmeyer:** Hello all ^^ I was looking through the Slack history for an
answer to my question but couldn't find one, so here I go. We're doing
standalone (apache/pinot Docker image in a WF) batch integrations, and we're
seeing queries hit Pinot before the integrated data is available ("stale
data"). My use case: we do data integration, fire off a Kafka event (after the
`pinot-admin` step is finished), then query Pinot, and that's where we see
stale data. Is there any way to:  
• Have `./bin/pinot-admin.sh LaunchDataIngestionJob` wait for the data to be
fully queryable?  
• Have Pinot somehow notify when data becomes fully queryable?  
NOTE: Job type is `SegmentCreationAndTarPush`  
**@mayanks:** Is this for production or for testing? If for testing, you
probably have some options:
```
1. Wait for a fixed time (might be a bit brittle).
2. Wait for IS == EV (might need to write some checks for this).
```  
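
Option 2 (waiting until the table's IdealState, IS, matches its ExternalView, EV) could be sketched as a small poller like the one below. Treat it as an illustration under assumptions rather than a supported client: the controller paths `/tables/{table}/idealstate` and `/tables/{table}/externalview` and the `"OFFLINE"` key in their responses are assumptions to check against your controller's Swagger page, and `controller_getters`, `is_converged`, and `wait_until_queryable` are hypothetical helper names.

```python
import json
import time
import urllib.request


def fetch_json(url):
    """GET a URL and decode the JSON response body."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def controller_getters(controller_url, table_name):
    """Return (get_ideal_state, get_external_view) callables for an offline table.

    Assumes the controller exposes both views keyed by table type.
    """
    base = f"{controller_url}/tables/{table_name}"
    return (
        lambda: fetch_json(f"{base}/idealstate").get("OFFLINE"),
        lambda: fetch_json(f"{base}/externalview").get("OFFLINE"),
    )


def is_converged(ideal_state, external_view):
    """True when every segment's replica states in IS are mirrored in EV."""
    if not ideal_state:
        return False  # nothing assigned yet, so there is nothing to compare
    external_view = external_view or {}
    return all(
        external_view.get(segment) == replicas
        for segment, replicas in ideal_state.items()
    )


def wait_until_queryable(get_ideal_state, get_external_view,
                         timeout_s=300.0, poll_s=5.0,
                         sleep=time.sleep, clock=time.monotonic):
    """Poll until IS == EV; return False if the timeout elapses first."""
    deadline = clock() + timeout_s
    while True:
        if is_converged(get_ideal_state(), get_external_view()):
            return True
        if clock() >= deadline:
            return False
        sleep(poll_s)
```

A pipeline step could call `wait_until_queryable(*controller_getters("http://localhost:9000", "myTable"))` after `LaunchDataIngestionJob` returns and only then publish the Kafka event; the URL and table name here are placeholders.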
**@jmeyer:** It is for production  
**@mayanks:** In production, what does it mean to be fully queryable? You will
have data constantly being pushed right?  
**@jmeyer:** We've got batch data that comes in, and some materialized view
that needs to be updated by taking into account the newly integrated data  
**@mayanks:** So you need atomic push? If so, @snlee added this feature?  
**@snlee:** @mayanks the building blocks are there but we need to implement
the client. Also, we support REFRESH only.  
**@jmeyer:**
> So you need atomic push? If so, @snlee added this feature?

Not quite. Data -> Pinot job -> Kafka message saying "new data available" ->
other service queries a service backed by Pinot (which is "stale" until it
completes ingestion / indexing)  
**@snlee:** @jmeyer If you have a realtime table, you won’t have this
staleness, since the data is updated in near real time whenever new data is
ingested. You will have this issue when you have an offline table only. We
currently don’t provide a way to notify that new data is available. A generic
way to provide this functionality would be to expose an interface that lets
the user supply a function to be executed after the offline ingestion. Feel
free to file a GitHub issue for the feature request.  
**@jmeyer:** Yes, my use case is with an offline table. What you propose
sounds like a good way to solve my issue. Thanks @snlee, will do!  
 **@satyam.raj:** @satyam.raj has joined the channel  
 **@goyal3593:** @goyal3593 has joined the channel  
 **@ysuo:** Hi, team: a ‘java.lang.IllegalArgumentException: must provide a
password for admin’ error occurred when I used ‘./bin/pinot-admin.sh
StartBroker -configFileName ./conf/pinot-broker-7011.conf’ to start the Pinot
broker. I have the same config as listed in the example. I started the Pinot
controller according to the example config and it started successfully. Any
idea what’s wrong with the broker config?  
pinot.broker.access.control.class=org.apache.pinot.broker.broker.BasicAuthAccessControlFactory  
pinot.broker.access.control.principals=admin,user  
pinot.broker.access.control.principals.admin.password=verysecret  
pinot.broker.access.control.principals.user.password=secret  
**@mayanks:** From the code, it seems it is unable to find the password for
`admin` in the properties. However, your settings in the conf look correct.
Also, the exact same configs work in the integration test, so this is a bit
confusing. My guess is that something else in your setup is causing the
property to not be set?  
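
One way to rule out a malformed conf file (stray whitespace, a mistyped key, a commented-out line) is to parse it outside Pinot and confirm that every listed principal has a password entry. The sketch below is a standalone diagnostic and does not reproduce Pinot's own validation logic; `parse_properties` and `principals_missing_password` are hypothetical helpers, and the parser only understands simple `key=value` lines.

```python
PREFIX = "pinot.broker.access.control.principals"


def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and # or ! comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def principals_missing_password(props):
    """Return the principals listed under PREFIX that have no password value."""
    names = [n.strip() for n in props.get(PREFIX, "").split(",") if n.strip()]
    return [n for n in names if not props.get(f"{PREFIX}.{n}.password")]


# The config from the message above, inlined so the sketch is self-contained.
conf_text = """
pinot.broker.access.control.class=org.apache.pinot.broker.broker.BasicAuthAccessControlFactory
pinot.broker.access.control.principals=admin,user
pinot.broker.access.control.principals.admin.password=verysecret
pinot.broker.access.control.principals.user.password=secret
"""

print(principals_missing_password(parse_properties(conf_text)))
```

An empty list means the file itself is consistent, which would point toward the broker reading a different file or the values being overridden elsewhere.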
 **@varun.j:** @varun.j has joined the channel  
 **@satyammast:** @satyammast has joined the channel  
 **@skondapalli:** @skondapalli has joined the channel  
 **@drojas:** @drojas has joined the channel  

###  _#pinot-dev_

  
 **@haitao:** @haitao has joined the channel  

###  _#getting-started_

  
 **@satyam.raj:** @satyam.raj has joined the channel  
 **@goyal3593:** @goyal3593 has joined the channel  
 **@varun.j:** @varun.j has joined the channel  
 **@satyammast:** @satyammast has joined the channel  
 **@skondapalli:** @skondapalli has joined the channel  
 **@drojas:** @drojas has joined the channel  