Posted to dev@pinot.apache.org by Pinot Slack Email Digest <ap...@gmail.com> on 2021/10/14 02:00:22 UTC

Apache Pinot Daily Email Digest (2021-10-13)

### _#general_

  
 **@nhas3007:** @nhas3007 has joined the channel  
 **@nemanja:** @nemanja has joined the channel  
 **@lalitbhagtani01:** @lalitbhagtani01 has joined the channel  
 **@lalitbhagtani01:** Hi all, I have a question regarding the failure of a server node. Here is the situation: my Pinot cluster is ingesting data from Kafka, and before a segment completes, the server hosting that consuming segment dies. When the new server comes up, will it start consuming the lost records from Kafka again? And if yes, how will it know the offset in that partition it has to resume from? Thanks  
**@mayanks:** Yes, it will consume from the last checkpoint that Pinot saved,
so no data loss  
**@lalitbhagtani01:** Thanks for the quick response. So I only have to make sure my Kafka retains this data long enough. Thanks  
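For reference, the resume point comes from the offsets Pinot checkpoints in the segment metadata, so the `auto.offset.reset` stream property only applies when there is no checkpoint yet (e.g. a brand-new table). A minimal sketch of the `streamConfigs` entries involved, with an illustrative topic and broker address:
```json
"streamConfigs": {
  "streamType": "kafka",
  "stream.kafka.topic.name": "myTopic",
  "stream.kafka.broker.list": "kafka:9092",
  "stream.kafka.consumer.type": "lowlevel",
  "stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
  "stream.kafka.decoder.class.name": "org.apache.pinot.plugin.stream.kafka.KafkaJSONMessageDecoder",
  "stream.kafka.consumer.prop.auto.offset.reset": "smallest"
}
```
If the topic's retention is shorter than the outage, the checkpointed offset may already have been deleted, in which case those records cannot be replayed.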
 **@courage.noko:** @courage.noko has joined the channel  
 **@singhal.prateek3:** @singhal.prateek3 has joined the channel  
 **@benshahbaz:** @benshahbaz has joined the channel  
 **@singhal.prateek3:** Hi Folks, I have a couple of questions regarding the Star-Tree index: 1. In the image attached, is it possible for me to get D1-V1 and D1-Star as results of the same query? The assumption here is that since nodes in the star-tree index are pre-aggregated, can I somehow pull two of them out in one go? (I guess subqueries with 2 different filter conditions would be one solution, but Pinot does not support that.) 2. In a realtime table, is it possible to use the star-tree index? My understanding is that since the star-tree index requires pre-aggregation, it may not be applicable to realtime tables. If that's the case, is it possible to activate the star-tree index without upserts?  
**@mayanks:** 1. What do you mean by pull out the nodes in one go? 2. You can configure the star-tree index for realtime, but yes, upsert isn't supported with that. cc: @jackie.jxt  
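A minimal sketch of what that could look like in the realtime table config; the dimension, metric, and threshold values below are illustrative, not taken from this thread:
```json
"tableIndexConfig": {
  "starTreeIndexConfigs": [{
    "dimensionsSplitOrder": ["D1", "D2"],
    "skipStarNodeCreationForDimensions": [],
    "functionColumnPairs": ["SUM__metricCol"],
    "maxLeafRecords": 10000
  }]
}
```
The star-tree is built when a consuming segment is committed, so it covers completed realtime segments rather than the in-memory consuming one.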
**@singhal.prateek3:** @mayanks … By pull-out, I basically mean to get D1-V1
and D1-Star as results of the same query  
**@mayanks:** The star node is mutually exclusive with the other nodes, so the answer to 1 is no.  
**@singhal.prateek3:** Thanks for the reply!  
**@g.kishore:** It's better to run two queries, since they will end up using different nodes in the tree anyway  
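To illustrate the two-query approach (table, dimension, and metric names here are hypothetical): the star node for D1 holds the pre-aggregated value across all D1 values, so the "D1-Star" number is simply the same aggregation without a D1 filter.
```sql
-- Served by the D1 = 'V1' branch of the star-tree
SELECT SUM(metricCol) FROM myTable WHERE D1 = 'V1';

-- Served by the D1 star node (no filter on D1)
SELECT SUM(metricCol) FROM myTable;
```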

###  _#random_

  
 **@nhas3007:** @nhas3007 has joined the channel  
 **@nemanja:** @nemanja has joined the channel  
 **@lalitbhagtani01:** @lalitbhagtani01 has joined the channel  
 **@courage.noko:** @courage.noko has joined the channel  
 **@singhal.prateek3:** @singhal.prateek3 has joined the channel  
 **@benshahbaz:** @benshahbaz has joined the channel  
 **@benshahbaz:** @benshahbaz has left the channel  

###  _#troubleshooting_

  
 **@dunithd:** Folks, I have a sample data set like this:
```
"9/1/2014 6:04:00",40.7513,-73.935,"B02512"
"9/1/2014 6:08:00",40.7291,-73.9813,"B02512"
"9/1/2014 6:14:00",40.7674,-73.9841,"B02512"
```
Time is in minute granularity throughout the data set. So I mapped the time column like this in my schema file:
```
"dateTimeFieldSpecs": [{
  "name": "pickupTime",
  "dataType": "STRING",
  "format" : "1:MINUTES:SIMPLE_DATE_FORMAT:MM/dd/yyyy HH:mm:ss",
  "granularity": "1:MINUTES"
}
```
And then in the table configuration:
```
"segmentsConfig" : {
  "timeColumnName": "pickupTime",
  "timeType": "MINUTES",
  "replication" : "1",
  "schemaName" : "pickups"
},
```
Hope this is fine?  
 **@dunithd:** Then my ingestion job failed with this:
```
Failed to generate Pinot segment for file - file:/Users/dunith/Projects/streamlit/rawdata/uber-raw-data-sep14.csv
java.lang.IllegalArgumentException: Invalid format: "null"
    at org.joda.time.format.DateTimeParserBucket.doParseMillis(DateTimeParserBucket.java:187) ~[pinot-all-0.8.0-jar-with-dependencies.jar:0.8.0-c4ceff06d21fc1c1b88469a8dbae742a4b609808]
    at org.joda.time.format.DateTimeFormatter.parseMillis(DateTimeFormatter.java:826) ~[pinot-all-0.8.0-jar-with-dependencies.jar:0.8.0-c4ceff06d21fc1c1b88469a8dbae742a4b609808]
    at org.apache.pinot.segment.local.segment.creator.impl.SegmentColumnarIndexCreator.writeMetadata(SegmentColumnarIndexCreator.java:552) ~[pinot-all-0.8.0-jar-with-dependencies.jar:0.8.0-c4ceff06d21fc1c1b88469a8dbae742a4b609808]
    at org.apache.pinot.segment.local.segment.creator.impl.SegmentColumnarIndexCreator.seal(SegmentColumnarIndexCreator.java:512) ~[pinot-all-0.8.0-jar-with-dependencies.jar:0.8.0-c4ceff06d21fc1c1b88469a8dbae742a4b609808]
    at org.apache.pinot.segment.local.segment.creator.impl.SegmentIndexCreationDriverImpl.handlePostCreation(SegmentIndexCreationDriverImpl.java:284) ~[pinot-all-0.8.0-jar-with-dependencies.jar:0.8.0-c4ceff06d21fc1c1b88469a8dbae742a4b609808]
    at org.apache.pinot.segment.local.segment.creator.impl.SegmentIndexCreationDriverImpl.build(SegmentIndexCreationDriverImpl.java:257) ~[pinot-all-0.8.0-jar-with-dependencies.jar:0.8.0-c4ceff06d21fc1c1b88469a8dbae742a4b609808]
    at org.apache.pinot.plugin.ingestion.batch.common.SegmentGenerationTaskRunner.run(SegmentGenerationTaskRunner.java:111) ~[pinot-all-0.8.0-jar-with-dependencies.jar:0.8.0-c4ceff06d21fc1c1b88469a8dbae742a4b609808]
    at org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner.lambda$submitSegmentGenTask$1(SegmentGenerationJobRunner.java:263) ~[pinot-batch-ingestion-standalone-0.8.0-shaded.jar:0.8.0-9a0f41bc24243ff74315723b0153b534c2596e30]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?]
    at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
    at java.lang.Thread.run(Thread.java:834) [?:?]
```
 **@dunithd:** I can see the schema and table created in the data explorer, but I'm not sure what went wrong. I guess it's something to do with the time formatting?  
**@npawar:** Could there be a null value in the time column in some row? The time column doesn't allow that  
**@npawar:** Also, you might want to change HH to just H in your pattern string, since your values can contain a single or double digit for the hour  
**@dunithd:** I will check the time column for a null value. Also, will do the
HH -> H change and see whether the issue persists. Thanks.  
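A sketch of the adjusted field spec following that suggestion; switching the month/day to M/d as well is an assumption based on the single-digit values in the sample rows (Joda-Time parsing is often lenient about digit counts either way):
```json
"dateTimeFieldSpecs": [{
  "name": "pickupTime",
  "dataType": "STRING",
  "format": "1:MINUTES:SIMPLE_DATE_FORMAT:M/d/yyyy H:mm:ss",
  "granularity": "1:MINUTES"
}]
```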
 **@nhas3007:** @nhas3007 has joined the channel  
 **@msoni6226:** Hi Team, we are running a hybrid table setup in our Pinot cluster. We have configured the RealtimeToOffline task to move data from the realtime table to the offline table. However, we are not seeing any data being moved. On checking the controller logs, I see the errors below.
```
2021-10-13 07:41:57.360 ERROR [ZkBaseDataAccessor] [grizzly-http-server-4] paths is null or empty
2021-10-13 07:41:58.956 ERROR [ZkBaseDataAccessor] [grizzly-http-server-17] paths is null or empty
2021-10-13 00:06:19.325 ERROR [JobDispatcher] [HelixController-pipeline-task-PinotCluster-(275fe39b_TASK)] Job configuration is NULL for TaskQueue_RealtimeToOfflineSegmentsTask_Task_RealtimeToOfflineSegmentsTask_1633995887529
```
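One pattern that can produce the `Job configuration is NULL` error is the task config not being picked up from the realtime table, or the controller's task scheduler not running. A rough sketch of the pieces involved in the realtime table config, with illustrative period and schedule values (this is not a diagnosis of this particular cluster):
```json
"task": {
  "taskTypeConfigsMap": {
    "RealtimeToOfflineSegmentsTask": {
      "bucketTimePeriod": "1d",
      "bufferTimePeriod": "2d",
      "schedule": "0 */10 * * * ?"
    }
  }
}
```
The controller also needs its task scheduler enabled (e.g. `controller.task.scheduler.enabled=true`) so the periodic task generator can create the Helix jobs.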
 **@nemanja:** @nemanja has joined the channel  
 **@lalitbhagtani01:** @lalitbhagtani01 has joined the channel  
 **@courage.noko:** @courage.noko has joined the channel  
 **@singhal.prateek3:** @singhal.prateek3 has joined the channel  
 **@benshahbaz:** @benshahbaz has joined the channel  

###  _#docs_

  
 **@benshahbaz:** @benshahbaz has joined the channel  

###  _#pinot-dev_

  
 **@tharun.3c:** @tharun.3c has joined the channel  
 **@benshahbaz:** @benshahbaz has joined the channel  

###  _#announcements_

  
 **@albertobeiz:** @albertobeiz has joined the channel  
 **@nemanja:** @nemanja has joined the channel  

###  _#getting-started_

  
 **@tharun.3c:** @tharun.3c has joined the channel  
 **@otiennosharon:** @otiennosharon has joined the channel  
 **@otiennosharon:** Hello, I am new to Apache Pinot. I am trying to learn more about Pinot operators. Would anyone help me understand how it works and how to go about it?  
**@mayanks:** Hi @otiennosharon this is a good starting point  
**@karinwolok1:** Welcome, @otiennosharon! Happy to have you here. Let us know
if that documentation is helpful and what else we can do to make your learning
journey smoother. :slightly_smiling_face:  
**@otiennosharon:** Thanks so much @mayanks and @karinwolok1. I believe I can start from here, and when I encounter challenges I will definitely reach out  
 **@nemanja:** @nemanja has joined the channel  
 **@courage.noko:** @courage.noko has joined the channel  
 **@courage.noko:** hey, I deployed Pinot on Kubernetes. Is there a way to set Google Cloud Storage configs such as `pinot.controller.storage.factory.gs.projectId` on the server/controller during deployment, or to update them afterwards?  
**@g.kishore:** Does this help -  
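In case it helps, a sketch of the GCS deep-store properties as they would appear in the controller config, with a placeholder project ID and key path (the matching `pinot.server.*` keys would go on the servers); passing these through the Helm chart's extra controller/server config values is an assumption about the deployment:
```
pinot.controller.storage.factory.class.gs=org.apache.pinot.plugin.filesystem.GcsPinotFS
pinot.controller.storage.factory.gs.projectId=my-gcp-project
pinot.controller.storage.factory.gs.gcpKey=/path/to/service-account-key.json
pinot.controller.segment.fetcher.protocols=file,http,gs
pinot.controller.segment.fetcher.gs.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher
```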

### _#flink-pinot-connector_

  
 **@singhal.prateek3:** @singhal.prateek3 has joined the channel  
 **@singhal.prateek3:** @singhal.prateek3 has left the channel  
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@pinot.apache.org
For additional commands, e-mail: dev-help@pinot.apache.org