Posted to dev@pinot.apache.org by Pinot Slack Email Digest <sn...@apache.org> on 2021/05/21 02:00:20 UTC

Apache Pinot Daily Email Digest (2021-05-20)

### _#general_

  
 **@rraguram:** @rraguram has joined the channel  
 **@neilteng233:** Hey, do we have any example querying/working on the multi-
value column? I cannot find one in the document. e.g. get the first element in
the multi-val column.  
**@mayanks:** How do you want to use it? Note that it is not necessarily
ordered.  
**@neilteng233:** I see. I have a column with a list of addresses of the
customer, where each one is a json.  
**@mayanks:** @jackie.jxt do we preserve MV column order? Or rather, should
clients rely on ordering?  
**@jackie.jxt:** We do preserve the order in MV columns  
**@jackie.jxt:** Currently we don't have a function to pick the first element
within a MV column, but it should be easy to add or plug in  
**@neilteng233:** The ordering is not important. I want to know how to fetch
the first or the second element in the MV column. Do we have a pinot function
to do that?  
**@jackie.jxt:** In `ArrayFunctions` we have several functions that apply to
arrays. You may plug in your own function with the `ScalarFunction` annotation  
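A custom scalar function along those lines can be sketched as below. The class and method names are hypothetical; in an actual Pinot plugin the method would carry the `@ScalarFunction` annotation from `org.apache.pinot.spi.annotations` so the query engine registers it, which is omitted here to keep the sketch self-contained:

```java
public class MvFirstElement {

    // In a real Pinot plugin this method would be annotated with
    // @ScalarFunction, making it callable from queries as mvFirst(col).
    // Pinot passes an MV string column to a scalar function as String[].
    public static String mvFirst(String[] values) {
        // Return the first element, or null for an empty MV cell.
        return (values == null || values.length == 0) ? null : values[0];
    }

    public static void main(String[] args) {
        System.out.println(mvFirst(new String[]{"addr1", "addr2"}));  // prints addr1
    }
}
```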
**@neilteng233:** Thanks, I will check that out. One more question: do you
happen to know the behavior of Trino/PrestoDB when it queries a MV column? Is
there any function I should apply to the result for it to be recognized as an
array in Trino?  
**@jackie.jxt:** @fx19880617 ^^  
**@npawar:** You could use a groovy function to fetch a particular element
from the array  
**@mayanks:** I think the ask is for fetching it during query time.  
**@npawar:** we have groovy for query time too  
**@fx19880617:** Prestodb should treat mv column as an array  
**@npawar:** several examples of getting elements from MV columns using
groovy:  
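As a sketch of the query-time approach, Pinot's built-in `GROOVY` transform can pull out an element directly in a query; the table and column names below are hypothetical, and `arg0` binds to the first column argument:

```sql
SELECT GROOVY(
  '{"returnType":"STRING","isSingleValue":true}',
  'arg0[0]',
  addresses
) FROM myTable
```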
**@mayanks:** Thanks @npawar  
 **@karinwolok1:** :wine_glass: New to Apache Pinot and want to understand the
basics? Join us for Intro to Apache Pinot meetup today! 10am PDT | 1pm EDT
:slightly_smiling_face:  
**@allison:** @allison has joined the channel  
 **@jcwillia:** @jcwillia has joined the channel  
 **@sgarud:** @sgarud has joined the channel  
 **@vbondugula:** @vbondugula has joined the channel  
 **@kelvin:** Hi, i'm streaming from Kafka and would like to have a way to
uniquely identify messages. In Kafka consumer, I can do that with offset. Is
it possible to expose Kafka metadata such as offset/timestamp to Pinot
clients?  
**@mayanks:** I suppose you could write a transform function that reads that
metadata and populates columns in Pinot schema.  
**@fx19880617:** actually I think it’s a good ask; this could be exposed as a
hidden column for messages consumed from Kafka  
**@mapshen:** @mayanks is there an example for reading that metadata? Not sure
what the available fields are.  
**@g.kishore:** This is an amazing idea and easy to add as part of Kafka
decoder  
**@g.kishore:** @mayanks this is not available as part of the data, which
means a transform function cannot do this. What we need is for the Kafka
decoder to read this metadata and add it to the generic row. We already have
access to it and use it for checkpointing, so this should be easy to add. Very
good beginner task  
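The idea above can be sketched as follows: after decoding the Kafka payload, the decoder layer attaches the record's offset and timestamp as extra columns before the row reaches the rest of the pipeline. A plain `Map` stands in here for Pinot's `GenericRow`, and all class, method, and column names are hypothetical:

```java
import java.util.HashMap;
import java.util.Map;

public class OffsetEnrichingDecoder {

    // Copy the decoded payload and add the Kafka record metadata as
    // extra columns (hypothetical hidden-column names).
    public static Map<String, Object> decode(Map<String, Object> decodedPayload,
                                             long offset, long timestampMs) {
        Map<String, Object> row = new HashMap<>(decodedPayload);
        row.put("__kafka_offset", offset);
        row.put("__kafka_timestamp", timestampMs);
        return row;
    }
}
```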
 **@mapshen:** @mapshen has joined the channel  

###  _#random_

  
 **@rraguram:** @rraguram has joined the channel  
 **@allison:** @allison has joined the channel  
 **@jcwillia:** @jcwillia has joined the channel  
 **@sgarud:** @sgarud has joined the channel  
 **@vbondugula:** @vbondugula has joined the channel  
 **@mapshen:** @mapshen has joined the channel  

###  _#feat-presto-connector_

  
 **@nadeemsadim:** @nadeemsadim has joined the channel  

###  _#troubleshooting_

  
 **@rraguram:** @rraguram has joined the channel  
 **@allison:** @allison has joined the channel  
 **@jcwillia:** @jcwillia has joined the channel  
**@jmeyer:** Hello :wave: What's the meaning of `{'errorCode': 410,
'message': 'BrokerResourceMissingError'}`? I got it from the `pinot-db` Python
client. The query runs fine from the UI though  
**@mayanks:** This means that the broker for the table was not found. One
common cause I have seen in the past is an incorrect table name in the query.  
**@jmeyer:** Thanks @mayanks, I'll check that !  
**@mayanks:** I added a bit more detail in the FAQ:  
**@jmeyer:** Perfect  
**@fx19880617:** Just curious, which python client are you using?  
**@jmeyer:** Official pinot-db  
**@jmeyer:** Is there any other recommended Python client ?  
**@fx19880617:** no I think that’s good  
**@jmeyer:** :ok_hand:  
 **@sgarud:** @sgarud has joined the channel  
 **@vbondugula:** @vbondugula has joined the channel  
**@mike.davis:** Hello, are transform configs supported when generating
OFFLINE segments? I'm trying to add a new column via a date transformation and
getting:
```
Caught exception while gathering stats
org.apache.parquet.io.InvalidRecordException: NEW_FIELD_NAME not found in message schema {
```
ingestionConfig:
```
"ingestionConfig": {
  "transformConfigs": [
    {
      "columnName": "NEW_FIELD_NAME",
      "transformFunction": "fromEpochDays(OLD_FIELD_NAME)"
    }
  ]
},
```
**@npawar:** yes it is supported.  
**@npawar:** the exception looks like it’s coming from parquet? can you share
the whole stack trace?  
**@mike.davis:** yeah I thought it might be a parquet issue:
```
Caught exception while gathering stats
org.apache.parquet.io.InvalidRecordException: NEW_FIELD_NAME not found in message schema { <...schema omitted...> }
	at org.apache.parquet.schema.GroupType.getFieldIndex(GroupType.java:175) ~[pinot-all-0.8.0-SNAPSHOT-jar-with-dependencies.jar:0.8.0-SNAPSHOT-5b7023a4e75d91ea75d4f5f575d440b602bf3df6]
	at org.apache.pinot.plugin.inputformat.parquet.ParquetNativeRecordExtractor.extract(ParquetNativeRecordExtractor.java:117) ~[pinot-all-0.8.0-SNAPSHOT-jar-with-dependencies.jar:0.8.0-SNAPSHOT-5b7023a4e75d91ea75d4f5f575d440b602bf3df6]
	at org.apache.pinot.plugin.inputformat.parquet.ParquetNativeRecordReader.next(ParquetNativeRecordReader.java:106) ~[pinot-all-0.8.0-SNAPSHOT-jar-with-dependencies.jar:0.8.0-SNAPSHOT-5b7023a4e75d91ea75d4f5f575d440b602bf3df6]
	at org.apache.pinot.plugin.inputformat.parquet.ParquetRecordReader.next(ParquetRecordReader.java:64) ~[pinot-all-0.8.0-SNAPSHOT-jar-with-dependencies.jar:0.8.0-SNAPSHOT-5b7023a4e75d91ea75d4f5f575d440b602bf3df6]
	at org.apache.pinot.segment.local.segment.creator.RecordReaderSegmentCreationDataSource.gatherStats(RecordReaderSegmentCreationDataSource.java:67) ~[pinot-all-0.8.0-SNAPSHOT-jar-with-dependencies.jar:0.8.0-SNAPSHOT-5b7023a4e75d91ea75d4f5f575d440b602bf3df6]
	at org.apache.pinot.segment.local.segment.creator.RecordReaderSegmentCreationDataSource.gatherStats(RecordReaderSegmentCreationDataSource.java:42) ~[pinot-all-0.8.0-SNAPSHOT-jar-with-dependencies.jar:0.8.0-SNAPSHOT-5b7023a4e75d91ea75d4f5f575d440b602bf3df6]
	at org.apache.pinot.segment.local.segment.creator.impl.SegmentIndexCreationDriverImpl.init(SegmentIndexCreationDriverImpl.java:172) ~[pinot-all-0.8.0-SNAPSHOT-jar-with-dependencies.jar:0.8.0-SNAPSHOT-5b7023a4e75d91ea75d4f5f575d440b602bf3df6]
	at org.apache.pinot.segment.local.segment.creator.impl.SegmentIndexCreationDriverImpl.init(SegmentIndexCreationDriverImpl.java:153) ~[pinot-all-0.8.0-SNAPSHOT-jar-with-dependencies.jar:0.8.0-SNAPSHOT-5b7023a4e75d91ea75d4f5f575d440b602bf3df6]
	at org.apache.pinot.segment.local.segment.creator.impl.SegmentIndexCreationDriverImpl.init(SegmentIndexCreationDriverImpl.java:102) ~[pinot-all-0.8.0-SNAPSHOT-jar-with-dependencies.jar:0.8.0-SNAPSHOT-5b7023a4e75d91ea75d4f5f575d440b602bf3df6]
	at org.apache.pinot.tools.admin.command.CreateSegmentCommand.lambda$execute$0(CreateSegmentCommand.java:247) ~[pinot-all-0.8.0-SNAPSHOT-jar-with-dependencies.jar:0.8.0-SNAPSHOT-5b7023a4e75d91ea75d4f5f575d440b602bf3df6]
	at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_292]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_292]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_292]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_292]
Exception caught: java.util.concurrent.ExecutionException: java.lang.RuntimeException: Caught exception while generating segment from file: /data/data_019c5bcb-0401-e7fc-0019-bd01cc97e583_906_6_0.snappy.parquet
	at java.util.concurrent.FutureTask.report(FutureTask.java:122) ~[?:1.8.0_292]
	at java.util.concurrent.FutureTask.get(FutureTask.java:192) ~[?:1.8.0_292]
	at org.apache.pinot.tools.admin.command.CreateSegmentCommand.execute(CreateSegmentCommand.java:274) ~[pinot-all-0.8.0-SNAPSHOT-jar-with-dependencies.jar:0.8.0-SNAPSHOT-5b7023a4e75d91ea75d4f5f575d440b602bf3df6]
	at org.apache.pinot.tools.admin.PinotAdministrator.execute(PinotAdministrator.java:164) [pinot-all-0.8.0-SNAPSHOT-jar-with-dependencies.jar:0.8.0-SNAPSHOT-5b7023a4e75d91ea75d4f5f575d440b602bf3df6]
	at org.apache.pinot.tools.admin.PinotAdministrator.main(PinotAdministrator.java:184) [pinot-all-0.8.0-SNAPSHOT-jar-with-dependencies.jar:0.8.0-SNAPSHOT-5b7023a4e75d91ea75d4f5f575d440b602bf3df6]
Caused by: java.lang.RuntimeException: Caught exception while generating segment from file: /data/data_019c5bcb-0401-e7fc-0019-bd01cc97e583_906_6_0.snappy.parquet
	at org.apache.pinot.tools.admin.command.CreateSegmentCommand.lambda$execute$0(CreateSegmentCommand.java:265) ~[pinot-all-0.8.0-SNAPSHOT-jar-with-dependencies.jar:0.8.0-SNAPSHOT-5b7023a4e75d91ea75d4f5f575d440b602bf3df6]
	at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_292]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_292]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_292]
	at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_292]
```
**@mike.davis:** specifically I'm using the `ParquetNativeRecordExtractor`  
**@mike.davis:** I can optionally switch to using plain Avro (non-parquet) if
for some reason Native Parquet is lacking some functionality.  
**@mike.davis:** FWIW the original source is a Snowflake table; I'm exporting
it to Parquet purely for ingestion into Pinot, so the format is somewhat
arbitrary.  
**@npawar:** and what’s the pinot schema?  
**@mike.davis:** The new field was part of the Pinot schema as a datetime
field:
```
{
  "name": "NEW_FIELD_NAME",
  "dataType": "LONG",
  "format": "1:MILLISECONDS:EPOCH",
  "granularity": "1:DAYS"
},
```
**@mike.davis:** I can dig into this more on my end. Good to know that
support is there, but maybe there's an issue with the parquet reader.  
 **@mapshen:** @mapshen has joined the channel  

###  _#pinot-dev_

  
 **@nadeemsadim:** @nadeemsadim has joined the channel  
 **@mapshen:** @mapshen has joined the channel  

###  _#community_

  
 **@mapshen:** @mapshen has joined the channel  

###  _#getting-started_

  
 **@lochanie1987:** @lochanie1987 has joined the channel  
 **@nadeemsadim:** @nadeemsadim has joined the channel  
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@pinot.apache.org
For additional commands, e-mail: dev-help@pinot.apache.org