Posted to dev@pinot.apache.org by Pinot Slack Email Digest <sn...@apache.org> on 2021/03/20 02:00:19 UTC

Apache Pinot Daily Email Digest (2021-03-19)

### _#general_

  
 **@lam010210:** @lam010210 has joined the channel  
 **@kc2005au:** @kc2005au has joined the channel  
 **@shvetadogra:** @shvetadogra has joined the channel  
 **@saurabh:** @saurabh has joined the channel  
 **@mike.davis:** hello, I see that `0.7.0` was released, congrats! But there
does not appear to be a corresponding `0.7.0-jdk11` image available via docker
hub, only SNAPSHOT versions. Any chance that can get published?  
 **@ken:** Interesting,  also doesn’t show `0.7.0`. Also thinking the 0.1.0
through 0.5.0 downloads could be removed from that page…  
 **@g.kishore:** It’s not yet officially released.. there was a mix-up in the ASF
process.. please stay tuned. We will update soon  
 **@ken:** OK - but it’s in Maven Central :slightly_smiling_face: Should we
avoid upgrading to that version?  
**@g.kishore:** Yes.. please wait. We accidentally pushed it before getting
the approval from ASF. We need official confirmation  
**@aaron:** Does Pinot's batch insert have any way to avoid inserting
duplicate data? Say that every day I want to batch-insert the previous day of
data, and I have multiple batches of data per day (say each batch of data
corresponds to data from a different ice cream flavor). If I'm generating and
batch-inserting yesterday's data for each ice cream flavor in parallel, and
the "strawberry" job fails, so I rerun it, how do I make sure I'm not
batch-inserting "strawberry" data that was already inserted?  
**@g.kishore:** Segment name is unique across the table. As long as you keep
segment names idempotent across multiple runs, it will be fine  
**@g.kishore:** So in your case, make sure you encode value of the flavor in
segment name  
**@g.kishore:** So even if you push the same data again, it will be overridden  
**@aaron:** Ok, super cool. So I just need to make sure I set the segment name
correctly -- like in `segmentNameGeneratorSpec`?  
**@g.kishore:** Right  
**@g.kishore:** We typically use the date and some kind of partition id as
the convention  
**@aaron:** Awesome, thank you  
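A minimal sketch of the idea above: derive the segment name deterministically from the date and the partition key (here the flavor), so rerunning the "strawberry" job yields the same name and replaces the earlier segment instead of adding a duplicate. The naming scheme, `segment_name`, and the in-memory `FakeTable` are hypothetical stand-ins for illustration only; in a real job the name comes from the ingestion job spec's `segmentNameGeneratorSpec`, whose exact options are not shown here.

```python
from __future__ import annotations

from datetime import date


def segment_name(table: str, day: date, flavor: str) -> str:
    """Hypothetical scheme <table>_<yyyy-mm-dd>_<flavor>.

    The same inputs always produce the same name, so a rerun targets the
    same segment rather than creating a new one.
    """
    return f"{table}_{day.isoformat()}_{flavor}"


class FakeTable:
    """Toy stand-in for a table whose segments are keyed by unique name."""

    def __init__(self) -> None:
        self.segments: dict[str, list[dict]] = {}

    def push(self, name: str, rows: list[dict]) -> None:
        # Pushing a name that already exists replaces the old segment.
        self.segments[name] = rows

    def row_count(self) -> int:
        return sum(len(rows) for rows in self.segments.values())


table = FakeTable()
day = date(2021, 3, 18)
vanilla = [{"flavor": "vanilla", "sales": 7}]
strawberry = [{"flavor": "strawberry", "sales": 10}]

table.push(segment_name("icecream", day, "vanilla"), vanilla)
table.push(segment_name("icecream", day, "strawberry"), strawberry)
# The strawberry job is rerun: same name, so the segment is overwritten.
table.push(segment_name("icecream", day, "strawberry"), strawberry)

print(table.row_count())  # 2, not 3
```

If the name instead included something that changes per run (a timestamp or random id), the rerun would land as a second segment and the strawberry rows would be counted twice.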
**@aaron:** Also just for my understanding -- is there any point in time where
partially complete segments or partially overwritten segments are visible to
consumers?  
**@g.kishore:** Is this a hybrid or batch-only table?  
**@aaron:** I'm curious about the answer for both!  
**@g.kishore:** with batch only, it's visible as soon as a segment is pushed  
**@g.kishore:** in hybrid, it's only visible after the time boundary moves from
one day to another.  
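A minimal sketch of why a freshly pushed offline segment is not immediately visible in a hybrid table. It assumes the rule usually described for daily-pushed hybrid tables: the time boundary is the latest end time among OFFLINE segments minus one day, rows at or before the boundary are served from OFFLINE, and later rows from REALTIME. The function names are hypothetical, not Pinot APIs.

```python
from __future__ import annotations

from datetime import date, timedelta


def time_boundary(offline_end_days: list[date]) -> date:
    """Assumed rule: one day before the latest OFFLINE segment end time."""
    return max(offline_end_days) - timedelta(days=1)


def serving_side(row_day: date, boundary: date) -> str:
    """Rows at or before the boundary come from OFFLINE, later rows from REALTIME."""
    return "OFFLINE" if row_day <= boundary else "REALTIME"


# Offline segments have been pushed for Mar 16 and Mar 17.
boundary = time_boundary([date(2021, 3, 16), date(2021, 3, 17)])
print(boundary)                                   # 2021-03-16
print(serving_side(date(2021, 3, 17), boundary))  # REALTIME

# After the Mar 18 segment is pushed the boundary moves forward one day,
# and the Mar 17 offline data is what queries now read.
boundary = time_boundary([date(2021, 3, 16), date(2021, 3, 17), date(2021, 3, 18)])
print(serving_side(date(2021, 3, 17), boundary))  # OFFLINE
```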
 **@sunxiaohui.bj:** @sunxiaohui.bj has joined the channel  
 **@savannahjenglish:** @savannahjenglish has joined the channel  
 **@ken:** My ops guy is trying to validate JMX metrics, and he asked me how
to trigger NUM_MISSING_SEGMENTS. Any suggestions?  
**@g.kishore:** NUM_MISSING_SEGMENT?  
**@ken:** NUM_MISSING_SEGMENTS, I think -  
**@ken:** It looks like this can happen in the window between when a segment
is removed from the server, and a broker sees the ExternalView change. Maybe
this is too challenging to manually trigger, so I should tell ops not to worry
about trying to validate?  
**@g.kishore:** yeah, you can skip this for now  
**@g.kishore:** some of these were probably added when there was a bug, to
monitor how frequently it occurred. It's probably not useful anymore  
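A toy model (hypothetical names, not Pinot code) of the window described above: the broker still routes with a stale ExternalView, so a query asks the server for a segment it has already dropped. Only the difference between what was requested and what the server still hosts would count toward a metric like NUM_MISSING_SEGMENTS, which is why it is hard to trigger on purpose.

```python
from __future__ import annotations


def missing_segments(requested: set[str], hosted: set[str]) -> set[str]:
    """Segments named in a query request that the server no longer hosts."""
    return requested - hosted


# The broker's routing still reflects the old ExternalView...
routed_to_server = {"seg_a", "seg_b"}
# ...but the server has already removed seg_a.
hosted_by_server = {"seg_b"}

missing = missing_segments(routed_to_server, hosted_by_server)
print(sorted(missing))  # ['seg_a']
```

Once the broker picks up the ExternalView change, `routed_to_server` no longer contains the dropped segment and the difference is empty again.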

###  _#random_

  
 **@lam010210:** @lam010210 has joined the channel  
 **@kc2005au:** @kc2005au has joined the channel  
 **@shvetadogra:** @shvetadogra has joined the channel  
 **@saurabh:** @saurabh has joined the channel  
 **@sunxiaohui.bj:** @sunxiaohui.bj has joined the channel  
 **@savannahjenglish:** @savannahjenglish has joined the channel  

###  _#troubleshooting_

  
 **@ravi.maddi:** Hi All, I am trying to ingest data through Kafka from a JSON
file, running this command:
```
bin/kafka-console-producer.sh --broker-list localhost:19092 --topic mytopic < $PDATA_HOME/opt_flatten_json.json
```
_But I am getting this error:_
```
Exception while executing a state transition task mystats__0__0__20210319T0430Z
java.lang.reflect.InvocationTargetException: null
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_282]
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_282]
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_282]
    at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_282]
    at org.apache.helix.messaging.handling.HelixStateTransitionHandler.invoke(HelixStateTransitionHandler.java:404) ~[pinot-all-0.7.0-SNAPSHOT-jar-with-dependencies.jar:0.7.0-SNAPSHOT-d87755899eccba3554e9cc39a1439d5ecb53aaac]
    at org.apache.helix.messaging.handling.HelixStateTransitionHandler.handleMessage(HelixStateTransitionHandler.java:331) [pinot-all-0.7.0-SNAPSHOT-jar-with-dependencies.jar:0.7.0-SNAPSHOT-d87755899eccba3554e9cc39a1439d5ecb53aaac]
    at org.apache.helix.messaging.handling.HelixTask.call(HelixTask.java:97) [pinot-all-0.7.0-SNAPSHOT-jar-with-dependencies.jar:0.7.0-SNAPSHOT-d87755899eccba3554e9cc39a1439d5ecb53aaac]
    at org.apache.helix.messaging.handling.HelixTask.call(HelixTask.java:49) [pinot-all-0.7.0-SNAPSHOT-jar-with-dependencies.jar:0.7.0-SNAPSHOT-d87755899eccba3554e9cc39a1439d5ecb53aaac]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_282]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_282]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_282]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_282]
Caused by: java.lang.OutOfMemoryError: Direct buffer memory
    at java.nio.Bits.reserveMemory(Bits.java:695) ~[?:1.8.0_282]
    at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123) ~[?:1.8.0_282]
    at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311) ~[?:1.8.0_282]
    at org.apache.pinot.core.segment.memory.PinotByteBuffer.allocateDirect(PinotByteBuffer.java:39) ~[pinot-all-0.7.0-SNAPSHOT-jar-with-dependencies.jar:0.7.0-SNAPSHOT-d87755899eccba3554e9cc39a1439d5ecb53aaac]
    at org.apache.pinot.core.segment.memory.PinotDataBuffer.allocateDirect(PinotDataBuffer.java:116) ~[pinot-all-0.7.0-SNAPSHOT-jar-with-dependencies.jar:0.7.0-SNAPSHOT-d87755899eccba3554e9cc39a1439d5ecb53aaac]
    at org.apache.pinot.core.io.writer.impl.DirectMemoryManager.allocateInternal(DirectMemoryManager.java:53) ~[pinot-all-0.7.0-SNAPSHOT-jar-with-dependencies.jar:0.7.0-SNAPSHOT-d87755899eccba3554e9cc39a1439d5ecb53aaac]
    at org.apache.pinot.core.io.readerwriter.RealtimeIndexOffHeapMemoryManager.allocate(RealtimeIndexOffHeapMemoryManager.java:79) ~[pinot-all-0.7.0-SNAPSHOT-jar-with-dependencies.jar:0.7.0-SNAPSHOT-d87755899eccba3554e9cc39a1439d5ecb53aaac]
    at org.apache.pinot.core.realtime.impl.forward.FixedByteMVMutableForwardIndex.addDataBuffer(FixedByteMVMutableForwardIndex.java:162) ~[pinot-all-0.7.0-SNAPSHOT-jar-with-dependencies.jar:0.7.0-SNAPSHOT-d87755899eccba3554e9cc39a1439d5ecb53aaac]
    at org.apache.pinot.core.realtime.impl.forward.FixedByteMVMutableForwardIndex.<init>(FixedByteMVMutableForwardIndex.java:137) ~[pinot-all-0.7.0-SNAPSHOT-jar-with-dependencies.jar:0.7.0-SNAPSHOT-d87755899eccba3554e9cc39a1439d5ecb53aaac]
    at org.apache.pinot.core.indexsegment.mutable.MutableSegmentImpl.<init>(MutableSegmentImpl.java:307) ~[pinot-all-0.7.0-SNAPSHOT-jar-with-dependencies.jar:0.7.0-SNAPSHOT-d87755899eccba3554e9cc39a1439d5ecb53aaac]
    at org.apache.pinot.core.data.manager.realtime.LLRealtimeSegmentDataManager.<init>(LLRealtimeSegmentDataManager.java:1270) ~[pinot-all-0.7.0-SNAPSHOT-jar-with-dependencies.jar:0.7.0-SNAPSHOT-d87755899eccba3554e9cc39a1439d5ecb53aaac]
    at org.apache.pinot.core.data.manager.realtime.RealtimeTableDataManager.addSegment(RealtimeTableDataManager.java:324) ~[pinot-all-0.7.0-SNAPSHOT-jar-with-dependencies.jar:0.7.0-SNAPSHOT-d87755899eccba3554e9cc39a1439d5ecb53aaac]
    at org.apache.pinot.server.starter.helix.HelixInstanceDataManager.addRealtimeSegment(HelixInstanceDataManager.java:132) ~[pinot-all-0.7.0-SNAPSHOT-jar-with-dependencies.jar:0.7.0-SNAPSHOT-d87755899eccba3554e9cc39a1439d5ecb53aaac]
    at org.apache.pinot.server.starter.helix.SegmentOnlineOfflineStateModelFactory$SegmentOnlineOfflineStateModel.onBecomeOnlineFromOffline(SegmentOnlineOfflineStateModelFactory.java:164) ~[pinot-all-0.7.0-SNAPSHOT-jar-with-dependencies.jar:0.7.0-SNAPSHOT-d87755899eccba3554e9cc39a1439d5ecb53aaac]
    at org.apache.pinot.server.starter.helix.SegmentOnlineOfflineStateModelFactory$SegmentOnlineOfflineStateModel.onBecomeConsumingFromOffline(SegmentOnlineOfflineStateModelFactory.java:88) ~[pinot-all-0.7.0-SNAPSHOT-jar-with-dependencies.jar:0.7.0-SNAPSHOT-d87755899eccba3554e9cc39a1439d5ecb53aaac]
    ... 12 more
Default rollback method invoked on error. Error Code: ERROR
Message execution failed. msgId: eed5b297-ea20-437e-a0b5-ad4d0be75c3c, errorMsg: java.lang.reflect.InvocationTargetException
Skip internal error. errCode: ERROR, errMsg: null
Event bcbad381_DEFAULT : Unable to find a next state for resource: mystats_REALTIME partition: mystats__0__0__20210319T0430Z from stateModelDefinitionclass org.apache.helix.model.StateModelDefinition from:ERROR to:CONSUMING
Event c910d226_DEFAULT : Unable to find a next state for resource: mystats_REALTIME partition: mystats__0__0__20210319T0430Z from stateModelDefinitionclass org.apache.helix.model.StateModelDefinition from:ERROR to:CONSUMING
Event d194950f_DEFAULT : Unable to find a next state for resource: mystats_REALTIME partition: mystats__0__0__20210319T0430Z from stateModelDefinitionclass org.apache.helix.model.StateModelDefinition from:ERROR to:CONSUMING
```
Need Help :slightly_smiling_face:  
**@fx19880617:** ```Caused by: java.lang.OutOfMemoryError: Direct buffer
memory``` try to give larger memory  
**@fx19880617:** increase JVM and your VM or container memory setting  
**@ravi.maddi:** Thanks, I increased the memory size and that resolved the issue  
 **@lam010210:** @lam010210 has joined the channel  
 **@ravi.maddi:** Hi Team, *data is not appearing in the Pinot Query Console.* I am
pushing data to Pinot through Kafka with the command `bin/kafka-console-producer.sh
--broker-list localhost:19092 --topic mytopic < $PDATA_HOME/data.json`. I checked
all the logs and there are no exceptions, but my data does not appear in the query tool.
*Need Help* :slightly_smiling_face: My schema looks like this:
```
{
  "schemaName": "eventflowstats",
  "dimensionFieldSpecs": [
    { "name": "_index", "dataType": "STRING" },
    { "name": "_type", "dataType": "STRING", "maxLength": 5 },
    { "name": "_id", "dataType": "STRING" },
    { "name": "_source.aExpIds", "dataType": "INT", "singleValueField": false }
  ],
  "dateTimeFieldSpecs": [
    {
      "name": "_source.sDate",
      "dataType": "LONG",
      "format": "1:SECONDS:SIMPLE_DATE_FORMAT:SECONDS:SIMPLE_DATE_FORMAT",
      "granularity": "1:DAYS"
    }
  ]
}
```
My table config looks like this:
```
{
  "tableName": "mytable",
  "tableType": "REALTIME",
  "tenants": {},
  "segmentsConfig": {
    "timeColumnName": "_source.sDate",
    "timeType": "MILLISECONDS",
    "segmentPushType": "APPEND",
    "replicasPerPartition": "1",
    "retentionTimeUnit": "DAYS",
    "retentionTimeValue": "1"
  },
  "tableIndexConfig": {
    "loadMode": "MMAP",
    "streamConfigs": {
      "streamType": "kafka",
      "stream.kafka.consumer.type": "lowLevel",
      "stream.kafka.topic.name": "mytopic",
      "stream.kafka.decoder.class.name": "org.apache.pinot.plugin.stream.kafka.KafkaJSONMessageDecoder",
      "stream.kafka.hlc.zk.connect.string": "localhost:2191/kafka",
      "stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
      "stream.kafka.zk.broker.url": "localhost:2191/kafka",
      "stream.kafka.broker.list": "localhost:19092"
    }
  },
  "metadata": {
    "customConfigs": {}
  }
}
```
And the data looks like this:
```
{"_index":"dhfkdfkdsjfk","_type":"_doc","_id":"68767677989hjhjkhkjh","_source.aExpIds":[815850,815857,821331],"_source.sDate":"2021-01-04 00:00:00"}
```
I checked all the logs and did not find any exceptions, but the data is not
appearing in the Pinot controller portal.  
**@fx19880617:** The Kafka topic `event-count-stats-topic` you are producing to
and the topic in your table config (`"stream.kafka.topic.name": "mytopic",`)
are not the same  
**@ravi.maddi:** Sorry, I am using the same topic in both places; in this post I
forgot to change both of them to mytopic.  
**@fx19880617:** the schema format is wrong here: ```"dateTimeFieldSpecs": [ {
"name": "_source.sDate", "dataType": "LONG", "format":
"1:SECONDS:SIMPLE_DATE_FORMAT:SECONDS:SIMPLE_DATE_FORMAT", "granularity":
"1:DAYS" }```  
**@ravi.maddi:** Can you correct it for me, please?  
**@fx19880617:** ```"format": "1:DAYS:SIMPLE_DATE_FORMAT:yyyy-MM-dd HH:mm:ss",```  
**@ravi.maddi:** I think `"dataType": "LONG"` is also wrong? It should be STRING,
am I right?  
**@fx19880617:** right  
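A quick local check of why the corrected spec fits the posted record: the `_source.sDate` value is a date string, so it cannot be read as a LONG, but it does parse with the Java `SimpleDateFormat` pattern `yyyy-MM-dd HH:mm:ss`. The sketch uses Python's `strptime` directives `%Y-%m-%d %H:%M:%S`, which we assume are equivalent for this particular pattern; it is only a sanity check, not how Pinot itself parses the column.

```python
from datetime import datetime

# Value of "_source.sDate" in the record posted above.
SAMPLE = "2021-01-04 00:00:00"

# Assumed Python equivalent of the Java pattern "yyyy-MM-dd HH:mm:ss".
PY_PATTERN = "%Y-%m-%d %H:%M:%S"


def parses_as_long(value: str) -> bool:
    """True if the value could be stored in a LONG column as-is."""
    try:
        int(value)
        return True
    except ValueError:
        return False


def parses_with_pattern(value: str) -> bool:
    """True if the value matches the date pattern."""
    try:
        datetime.strptime(value, PY_PATTERN)
        return True
    except ValueError:
        return False


print(parses_as_long(SAMPLE))       # False: LONG is the wrong dataType
print(parses_with_pattern(SAMPLE))  # True: STRING + SIMPLE_DATE_FORMAT fits
```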
 **@kc2005au:** @kc2005au has joined the channel  
 **@ravi.maddi:** *Can you help me?* How can I check whether sample data is valid
against the defined schema?  
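One rough way to sanity-check a sample record against a schema before producing it to Kafka is a small script like the sketch below. It is a hypothetical helper, not a Pinot tool: it only checks that every schema column is present, that multi-value columns (`"singleValueField": false`) receive lists, and that values have a plausible JSON type for the declared `dataType`. It ignores extra fields, `maxLength`, and date formats.

```python
from __future__ import annotations

import json

# JSON value types accepted for each dataType in this simplified check.
_TYPES = {
    "STRING": (str,),
    "INT": (int,),
    "LONG": (int,),
    "FLOAT": (int, float),
    "DOUBLE": (int, float),
}
_SPEC_KEYS = ("dimensionFieldSpecs", "metricFieldSpecs", "dateTimeFieldSpecs")


def _type_ok(value, data_type: str) -> bool:
    allowed = _TYPES.get(data_type)
    if allowed is None:
        return True  # dataType not covered by this sketch; don't flag it
    if isinstance(value, bool):
        return False  # JSON true/false is neither a number nor a string here
    return isinstance(value, allowed)


def validate(record: dict, schema: dict) -> list[str]:
    """Return a list of problems; an empty list means the record looks OK."""
    problems = []
    for key in _SPEC_KEYS:
        for spec in schema.get(key, []):
            name, data_type = spec["name"], spec["dataType"]
            if name not in record:
                problems.append(f"{name}: missing from record")
                continue
            value = record[name]
            multi_value = spec.get("singleValueField", True) is False
            if multi_value != isinstance(value, list):
                kind = "a list" if multi_value else "a single value"
                problems.append(f"{name}: expected {kind}")
                continue
            for v in value if multi_value else [value]:
                if not _type_ok(v, data_type):
                    problems.append(f"{name}: {v!r} is not a {data_type}")
    return problems


record = json.loads(
    '{"_index":"dhfkdfkdsjfk","_type":"_doc","_id":"68767677989hjhjkhkjh",'
    '"_source.aExpIds":[815850,815857,821331],"_source.sDate":"2021-01-04 00:00:00"}'
)
schema = {
    "schemaName": "eventflowstats",
    "dimensionFieldSpecs": [
        {"name": "_index", "dataType": "STRING"},
        {"name": "_type", "dataType": "STRING", "maxLength": 5},
        {"name": "_id", "dataType": "STRING"},
        {"name": "_source.aExpIds", "dataType": "INT", "singleValueField": False},
    ],
    "dateTimeFieldSpecs": [
        {"name": "_source.sDate", "dataType": "STRING",
         "format": "1:DAYS:SIMPLE_DATE_FORMAT:yyyy-MM-dd HH:mm:ss",
         "granularity": "1:DAYS"},
    ],
}

print(validate(record, schema))  # []
```

With the original `"dataType": "LONG"` for `_source.sDate`, the same call reports `_source.sDate: '2021-01-04 00:00:00' is not a LONG`.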
 **@shvetadogra:** @shvetadogra has joined the channel  
 **@1705ayush:** Hi everyone, I am facing an issue while ingesting batch data
into Pinot. The command to ingest the data executes successfully:
```
$ pinot-admin.sh LaunchDataIngestionJob -jobSpecFile /home/ayush/ayush_workspace/iVoyant/analytics/data/hospital_data/job-spec.yml
.....
Pushing segment: hospital to location:  for table hospital
Sending request:  to controller: 4cb684aaf215, version: Unknown
Response for pushing table hospital segment hospital to location  - 200: {"status":"Successfully uploaded segment: hospital of table: hospital"}
```
But the *table status* on the UI turns *BAD*. Here is the error logged in pinot-server:
```
2021/03/19 15:03:24.082 ERROR [SegmentOnlineOfflineStateModelFactory$SegmentOnlineOfflineStateModel] [HelixTaskExecutor-message_handle_thread] Caught exception in state transition from OFFLINE -> ONLINE for resource: hospital_OFFLINE, partition: hospital
java.lang.IllegalStateException: Key separator not found: APR, segment: /tmp/pinotServerData/hospital_OFFLINE/hospital/v3
    at shaded.com.google.common.base.Preconditions.checkState(Preconditions.java:444)
```
Any idea what could be wrong here? I have attached the error log. Any help
is appreciated!  
**@1705ayush:** I did not realize that I was dealing with column names that
contained spaces. Removing the spaces from the column names worked out  
**@fx19880617:** which column has a space? In the schema?  
**@1705ayush:** Most of the column names had spaces in them, and the column
names in the schema had spaces too. The column names were exactly the same in
both the CSV file and schema.json  
**@fx19880617:** oic, then we should prevent creating such a schema in Pinot
:stuck_out_tongue:  
**@fx19880617:** and give an error message  
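Until Pinot rejects such schemas itself, a pre-flight check along these lines can catch the problem before segments are built. This is a minimal sketch: the schema and its column names are hypothetical, and it only flags whitespace, not every character that might cause trouble. As in the thread, any renamed column has to be renamed in the CSV header as well.

```python
from __future__ import annotations

import re

SPEC_KEYS = ("dimensionFieldSpecs", "metricFieldSpecs", "dateTimeFieldSpecs")


def columns_with_whitespace(schema: dict) -> list[str]:
    """Return schema column names that contain any whitespace character."""
    return [
        spec["name"]
        for key in SPEC_KEYS
        for spec in schema.get(key, [])
        if re.search(r"\s", spec["name"])
    ]


# Hypothetical schema with column names copied straight from a CSV header.
schema = {
    "schemaName": "hospital",
    "dimensionFieldSpecs": [
        {"name": "Facility Name", "dataType": "STRING"},
        {"name": "city", "dataType": "STRING"},
    ],
    "metricFieldSpecs": [{"name": "Total Charges", "dataType": "DOUBLE"}],
}

bad = columns_with_whitespace(schema)
if bad:
    print("Rename these columns before creating the schema:", bad)
```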
 **@saurabh:** @saurabh has joined the channel  
 **@sunxiaohui.bj:** @sunxiaohui.bj has joined the channel  
 **@savannahjenglish:** @savannahjenglish has joined the channel  
 **@tisantos:** @tisantos has joined the channel  
 **@pabraham.usa:** Hello, just wondering: is it normal for MMAP usage to go very
high? Also, does this mean I need ~1.5TB of free space to hold the MMAP?  
**@mayanks:** The servers memory map the indexes. So this should reflect the
size of segments you have on the server. Is that not the case?  
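One way to compare the MMAP figure with what the server actually stores is to total the files in the server's data directory. A minimal sketch, assuming segments live under a directory such as `/tmp/pinotServerData` (the path that appears in the error log earlier in this digest; your server's data directory may differ) with one subdirectory per table. If the explanation above holds, the totals should be in the same ballpark as the mapped size.

```python
from __future__ import annotations

import os
import sys


def dir_size_bytes(root: str) -> int:
    """Total size of regular files under root (symlinks are skipped)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for fname in filenames:
            path = os.path.join(dirpath, fname)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def table_sizes(data_dir: str) -> dict[str, int]:
    """Map each table directory (e.g. hospital_OFFLINE) to its on-disk size."""
    return {
        entry.name: dir_size_bytes(entry.path)
        for entry in os.scandir(data_dir)
        if entry.is_dir()
    }


if __name__ == "__main__":
    # Assumed default location; pass the server's data directory as an argument.
    data_dir = sys.argv[1] if len(sys.argv) > 1 else "/tmp/pinotServerData"
    if os.path.isdir(data_dir):
        for table, size in sorted(table_sizes(data_dir).items()):
            print(f"{table}: {size / 1024**3:.2f} GiB")
    else:
        print(f"no such directory: {data_dir}")
```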

###  _#getting-started_

  
 **@brianolsen87:** @brianolsen87 has joined the channel  
 **@kc2005au:** @kc2005au has joined the channel  

###  _#pinot-rack-awareness_

  
 **@xulinnankai:** @xulinnankai has joined the channel  
 **@xulinnankai:** @xulinnankai set the channel purpose: Server Rack Metadata
Retrieval and Persistence on Azure Environment  
 **@ssubrama:** @ssubrama has joined the channel  
 **@rkanumul:** @rkanumul has joined the channel  
 **@docchial:** @docchial has joined the channel  
 **@g.kishore:** @g.kishore has joined the channel  
 **@fx19880617:** @fx19880617 has joined the channel  
 **@dlavoie:** @dlavoie has joined the channel  
 **@pabraham.usa:** @pabraham.usa has joined the channel  
 **@ssubrama:** Thanks for creating the channel, Lin. Can we rename this
channel to (say) pinot-rack-awareness?  
**@xulinnankai:** Sure. Will do. I will invite Jay once he joins the Pinot OSS
Slack.  
 **@xulinnankai:** @xulinnankai has renamed the channel from "issue-6532" to
"pinot-rack-awareness"  