Posted to dev@pinot.apache.org by Pinot Slack Email Digest <ap...@gmail.com> on 2021/10/02 02:00:21 UTC

Apache Pinot Daily Email Digest (2021-10-01)

### _#general_

  
 **@shish:** I have parquet data in S3 under a prefix such as . I want to use
these partitions (year, month, day) to filter data by partition value in
Pinot. How can I do it?  
**@shish:** AWS Athena:  
**@mayanks:** Currently, you need to push data to Pinot to be able to query
it:  
**@shish:** Yes, I am able to push parquet files, but during the push I want
to create partitions based on the S3 prefix (the data is already partitioned
in S3 and I want to take advantage of that: ), e.g. here: year, month and day  
**@shish:** In Athena, during table creation we can pass PARTITIONED BY and it
will handle it. Please check scenario 1 in the doc below. I am looking for a
way to do it in Pinot.  
**@nanda.yugandhar:** This is the general path format supported by Spark; it
won't store the folder-path values (partition values like year, month, day,
country) in the parquet file itself.  
**@nanda.yugandhar:** This also applies to CSV, JSON, Text, etc., not just
parquet.  
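Since the partition values exist only in the folder path, one workaround until something like Athena's PARTITIONED BY is supported natively is to enrich each record with the values parsed from its S3 key before pushing to Pinot. A minimal sketch (the `year=.../month=.../day=...` key layout and the helper name are assumptions, not a Pinot feature):

```python
import re

def partition_values(s3_key):
    """Extract Hive-style partition values (e.g. year=2021/month=10/day=01)
    from an S3 object key. Returns a dict like {'year': '2021', ...}."""
    return dict(re.findall(r"([^/=]+)=([^/]+)", s3_key))

# A typical partitioned parquet key; the extracted values can then be
# written back into each record as regular columns before ingestion.
print(partition_values("data/year=2021/month=10/day=01/part-000.parquet"))
# -> {'year': '2021', 'month': '10', 'day': '01'}
```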
 **@prabha.cloud:** Does Pinot support the ARM architecture, so that it can
leverage the AWS EC2 Graviton 2 processor?  
**@mayanks:** Pinot is built using Java and runs on the JVM, so it should work
as long as you have that.  
**@prabha.cloud:** Will try in a few mins  
**@prabha.cloud:** The docker image needs to be available for arm64 along with
amd64.  
**@mayanks:** I see. @xiangfu0 ^^  
**@prabha.cloud:** something like this: `docker pull trinodb/trino:362-arm64`  
**@prabha.cloud:** quickstart works fine. will evaluate and let you know if
any issues, Thank you @mayanks  
**@mayanks:** You mean on arm64?  
**@prabha.cloud:** yes EC2 with Graviton 2  
**@xiangfu0:** This is interesting. You can also build the docker image
yourself from the docker script:  
**@iamluckysharma.0910:** @iamluckysharma.0910 has joined the channel  
 **@nanda.yugandhar:** @nanda.yugandhar has joined the channel  
 **@son.nguyen.nam:** @son.nguyen.nam has joined the channel  
 **@dadelcas:** I'm looking through the code to see if I can load config
properties from env vars instead of from files. It doesn't seem like this is
supported at the moment; can someone confirm? This is particularly useful for
credentials.  
**@mayanks:** I think @xiangfu0 had added that a while back?  
**@xiangfu0:** for ingestion jobs, you can do that  
**@xiangfu0:** but not for pinot instances  
**@xiangfu0:**  
**@dadelcas:** Yup, I was referring to server configuration. I should raise a
GitHub issue; it may be worth discussing.  
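As a stopgap before any native support, a thin wrapper can overlay properties from env vars onto the loaded config before starting the process. A sketch under an assumed naming convention (`PINOT_CONTROLLER_PORT` -> `controller.port`; the convention and helper are hypothetical, not a Pinot feature):

```python
import os

def apply_env_overrides(conf, prefix="PINOT_"):
    """Overlay config properties with environment variables, using a
    hypothetical convention: PINOT_CONTROLLER_PORT -> controller.port.
    Useful for injecting credentials without writing them to files."""
    for name, value in os.environ.items():
        if name.startswith(prefix):
            key = name[len(prefix):].lower().replace("_", ".")
            conf[key] = value
    return conf

os.environ["PINOT_CONTROLLER_PORT"] = "9000"
conf = apply_env_overrides({"controller.port": "8080"})
print(conf["controller.port"])  # -> 9000
```

Note the convention is lossy (underscores vs. dots in property names), which is one thing a real design discussion on the GitHub issue would need to settle.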

###  _#random_

  
 **@iamluckysharma.0910:** @iamluckysharma.0910 has joined the channel  
 **@nanda.yugandhar:** @nanda.yugandhar has joined the channel  
 **@son.nguyen.nam:** @son.nguyen.nam has joined the channel  

###  _#troubleshooting_

  
 **@amol:** Hi Pinot team, my Pinot cluster is running inside a docker
container. I want to monitor the cluster with Prometheus, and for that I have
tried to configure the Prometheus JMX Exporter inside pinot-controller.conf,
pinot-broker.conf and pinot-server.conf respectively, like:
```
controller.jvmOpts="-javaagent:/opt/pinot/etc/jmx_prometheus_javaagent/jmx_prometheus_javaagent-0.12.0.jar=8008:/opt/pinot/etc/jmx_prometheus_javaagent/configs/pinot.yml -Xms256M -Xmx1G"
broker.jvmOpts="-javaagent:/opt/pinot/etc/jmx_prometheus_javaagent/jmx_prometheus_javaagent-0.12.0.jar=8008:/opt/pinot/etc/jmx_prometheus_javaagent/configs/pinot.yml -Xms256M -Xmx1G"
server.jvmOpts="-javaagent:/opt/pinot/etc/jmx_prometheus_javaagent/jmx_prometheus_javaagent-0.12.0.jar=8008:/opt/pinot/etc/jmx_prometheus_javaagent/configs/pinot.yml -Xms256M -Xmx1G"
```
But I am unable to get the metrics. What should I do? Kindly help. @mayanks  
**@mayanks:** Hey @amol does this doc help:  
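Once the exporter is wired up, the output at the configured port (8008 here) is plain Prometheus exposition text, so a quick scripted check can confirm metrics are actually being exposed. A sketch (the sample metric below is illustrative, not a specific Pinot metric name):

```python
def parse_metrics(text):
    """Parse Prometheus exposition text into {metric: value},
    skipping blank lines and comments (# HELP / # TYPE)."""
    metrics = {}
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue
        # The value is the last space-separated token on the line.
        name, _, value = line.rpartition(" ")
        metrics[name] = float(value)
    return metrics

# In practice the text would come from http://<host>:8008/metrics.
sample = """\
# HELP jvm_memory_bytes_used Used bytes of a given JVM memory area.
# TYPE jvm_memory_bytes_used gauge
jvm_memory_bytes_used{area="heap"} 1.2345678E8
"""
print(parse_metrics(sample))
```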
**@iamluckysharma.0910:** @iamluckysharma.0910 has joined the channel  
 **@nanda.yugandhar:** @nanda.yugandhar has joined the channel  
 **@son.nguyen.nam:** @son.nguyen.nam has joined the channel  
 **@gabuglc:** Hello, is there an optimal table config for upserts? I'm able
to consume my whole kafka topic without the upsert config (93M+ messages).
However, when I put the upsert config on my table it stops consuming at a
certain offset (around 23M+ messages).  
 **@gabuglc:**  
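For reference, a minimal upsert setup needs `primaryKeyColumns` in the schema plus an `upsertConfig` on the realtime table; the sketch below shows the shape of those two fragments (the column name is hypothetical). Note that upsert tables keep an in-memory map of primary keys on the servers, so memory pressure is one possible reason ingestion can stall as offsets grow.

```python
import json

# Schema fragment: upsert requires declaring a primary key
# (the column name here is hypothetical).
schema_fragment = {"primaryKeyColumns": ["order_id"]}

# Realtime table fragment: enable full upsert mode.
table_fragment = {"upsertConfig": {"mode": "FULL"}}

print(json.dumps({**schema_fragment, **table_fragment}, indent=2))
```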
 **@qianbo.wang:** Hi Pinot team, I'm getting this error `Catalog 'pinot' does
not support table property 'time_field'` when creating a table with this query:
```
CREATE IF NOT EXIST ... WITH (
  pinot_table_name = 'enriched_invoices',
  time_field = 'created_at',
  offline_replication = 3,
  offline_retention = 365,
  index_inverted = ARRAY['licensee_id','facility_id'],
  index_bloom_filter = ARRAY['licensee_id','facility_id'],
  index_sorted = 'created_at',
  index_aggregate_metrics = true,
  index_create_during_segment_generation = true,
  index_auto_generated_inverted = false,
  index_enable_default_star_tree = false
);
```
**@qianbo.wang:** I need to double check, but I think it worked in the 0.6.x
version and not now in 0.8.x; were there any changes that could cause this?  
**@mayanks:** Can you EXPLAIN the query to see what the Pinot-side query is?  
**@qianbo.wang:** Hi, sorry, no worries; it is actually caused by an infra
change on our side.  

###  _#feat-geo-spatial-index_

  
 **@kchavda:** @kchavda has joined the channel  
 **@kchavda:** Hi all, I'm working on creating a schema for a realtime table
(using Kafka) and have a geo column which is already formatted in the kafka
topic
```"location_st_point":{"wkb":"AQEAACDmEAAArS5MS1GXXcD0lychov5AQA==","srid":4326}```
Do I need to do a transform on this in the schema? ```{ "dataType": "BYTES",
"name": "location_st_point", "transformFunction":
"toSphericalGeography(point)" },```  
 **@kchavda:** Since not a lot of ppl in here I hope it's okay for  
**@yupeng:** If you do not transform it during ingestion, you can transform it
at query time  
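The `wkb` field above is base64-encoded EWKB, so it can be decoded directly to confirm what is actually inside before choosing a transform. A decoder sketch for the sample value (assuming a little-endian EWKB point with the SRID flag set, which is what the payload suggests):

```python
import base64
import struct

def decode_ewkb_point(b64):
    """Decode a base64 EWKB point: 1 byte-order byte, a 4-byte geometry
    type (with the 0x20000000 SRID flag), the 4-byte SRID, then x/y doubles."""
    raw = base64.b64decode(b64)
    fmt = "<" if raw[0] == 1 else ">"
    (gtype,) = struct.unpack_from(fmt + "I", raw, 1)
    assert gtype & 0x20000000, "expected the EWKB SRID flag"
    srid, x, y = struct.unpack_from(fmt + "Idd", raw, 5)
    return srid, x, y

srid, lon, lat = decode_ewkb_point("AQEAACDmEAAArS5MS1GXXcD0lychov5AQA==")
print(srid, round(lon, 4), round(lat, 4))  # SRID 4326, a point near (-118.36, 33.99)
```

So the topic carries SRID 4326 lon/lat, which lines up with using the spherical-geography functions either via an ingestion transform or at query time.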

###  _#pinot-dev_

  
 **@son.nguyen.nam:** @son.nguyen.nam has joined the channel  

###  _#getting-started_

  
 **@son.nguyen.nam:** @son.nguyen.nam has joined the channel  
 **@tiger:** Is there a way to view the number of nodes that are generated for
a star-tree? (I'm exploring various indexing configs and was wondering how
different setups affect storage and performance.)  
**@tiger:** Also, is there any documentation that goes more in depth on how
the star-tree works and exactly what data is generated?  
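For context on the knobs that drive tree size: the star-tree is configured per table under `tableIndexConfig.starTreeIndexConfigs`, and the node count grows with the cardinalities of the columns in `dimensionsSplitOrder` while `maxLeafRecords` caps how many records a leaf may hold before it is split further. A sketch of such a config (the column names are hypothetical):

```python
import json

star_tree_config = {
    # Higher-cardinality dimensions earlier generally mean more nodes.
    "dimensionsSplitOrder": ["country", "browser", "locale"],
    "skipStarNodeCreationForDimensions": [],
    # Pre-aggregations to materialize at each node.
    "functionColumnPairs": ["SUM__clicks"],
    # Smaller values -> deeper tree, more nodes, larger index on disk.
    "maxLeafRecords": 10000,
}
print(json.dumps(star_tree_config, indent=2))
```

Comparing segment sizes on disk across different `dimensionsSplitOrder` / `maxLeafRecords` settings is a practical way to see the storage effect of each setup.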
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@pinot.apache.org
For additional commands, e-mail: dev-help@pinot.apache.org