Posted to dev@pinot.apache.org by Pinot Slack Email Digest <ap...@gmail.com> on 2021/11/23 02:00:14 UTC

Apache Pinot Daily Email Digest (2021-11-22)

### _#general_

  
 **@momento.corto:** @momento.corto has joined the channel  
 **@momento.corto:** Hi, do you know if it’s possible to query Pinot from
Apache Drill or Dremio?  
**@mayanks:** Hello, currently no. Pinot currently has connectors to Presto &
Trino on that front.  
**@momento.corto:** Thank you  
 **@xiaoman:** @xiaoman has joined the channel  

###  _#random_

  
 **@momento.corto:** @momento.corto has joined the channel  
 **@xiaoman:** @xiaoman has joined the channel  

###  _#troubleshooting_

  
 **@lrhadoop143:** Hi Team, I'm trying to set up Pinot in Docker and load a
table, but I'm facing issues while loading data into the table. ERROR:
java.lang.RuntimeException: Failed to read from Schema URI - ''. Can you
please help me fix this issue? I'm using this yml file:
executionFrameworkSpec: name: 'standalone'
segmentGenerationJobRunnerClassName:
'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner'
segmentTarPushJobRunnerClassName:
'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner'
segmentUriPushJobRunnerClassName:
'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentUriPushJobRunner'
jobType: SegmentCreationAndTarPush inputDirURI: '/tmp/pinot-quick-
start/rawdata/' includeFileNamePattern: 'glob:**/*.csv' outputDirURI:
'/tmp/pinot-quick-start/segments/' overwriteOutput: true pinotFSSpecs: \-
scheme: file className: org.apache.pinot.spi.filesystem.LocalPinotFS
recordReaderSpec: dataFormat: 'csv' className:
'org.apache.pinot.plugin.inputformat.csv.CSVRecordReader' configClassName:
'org.apache.pinot.plugin.inputformat.csv.CSVRecordReaderConfig' tableSpec:
tableName: 'transcript' schemaURI: '' tableConfigURI: '' pinotClusterSpecs: \-
controllerURI: ''  
**@mark.needham:** Can you see any more information on the error message in
the logs?  
**@lrhadoop143:** at
org.apache.pinot.tools.admin.PinotAdministrator.execute(PinotAdministrator.java:161)
[pinot-all-0.10.0-SNAPSHOT-jar-with-
dependencies.jar:0.10.0-SNAPSHOT-6b33448da58992773ee23b863da029650e9ec37f] at
org.apache.pinot.tools.admin.PinotAdministrator.main(PinotAdministrator.java:192)
[pinot-all-0.10.0-SNAPSHOT-jar-with-
dependencies.jar:0.10.0-SNAPSHOT-6b33448da58992773ee23b863da029650e9ec37f]
Caused by: java.net.ConnectException: Connection refused (Connection refused)
at .PlainSocketImpl.socketConnect(Native Method) ~[?:?] at
.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:399) ~[?:?]  
**@mark.needham:** So does that mean that you also can't navigate to ?  
**@lrhadoop143:** I can navigate to it and open the Pinot UI  
**@lrhadoop143:** but I'm not able to load data into the table, even though I
created the table and schema  
**@mark.needham:** hmmmm ok, I'm not sure why you'd get a connection refused
exception in that case  
**@mark.needham:** Can you try this spec:
```
executionFrameworkSpec:
  name: 'standalone'
  segmentGenerationJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner'
  segmentTarPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner'
  segmentUriPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentUriPushJobRunner'
jobType: SegmentCreationAndTarPush
inputDirURI: '/tmp/pinot-quick-start/rawdata/'
includeFileNamePattern: 'glob:**/*.csv'
outputDirURI: '/tmp/pinot-quick-start/segments/'
overwriteOutput: true
pinotFSSpecs:
  - scheme: file
    className: org.apache.pinot.spi.filesystem.LocalPinotFS
recordReaderSpec:
  dataFormat: 'csv'
  className: 'org.apache.pinot.plugin.inputformat.csv.CSVRecordReader'
  configClassName: 'org.apache.pinot.plugin.inputformat.csv.CSVRecordReaderConfig'
tableSpec:
  tableName: 'transcript'
pinotClusterSpecs:
  - controllerURI: ''
```  
**@mark.needham:** also which command did you run to ingest the data?  
**@mayanks:** @lrhadoop143 If you are unable to access the Pinot console, then
most likely the controller is not even running?  
**@mayanks:** Can you confirm if Pinot is up and running?  
 **@lrhadoop143:** docker run --rm -ti --network=pinot-demo -v /tmp/pinot-
quick-start:/tmp/pinot-quick-start --name pinot-data-ingestion-job
apachepinot/pinot:latest LaunchDataIngestionJob -jobSpecFile /tmp/pinot-quick-
start/docker-job-spec.yml  
**@mark.needham:** looks good to me  
**@lrhadoop143:** Ok  
**@mark.needham:** can you try the spec I posted on the other thread?  
**@lrhadoop143:** No, I will try it and update you. Thanks for replying.  
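
A `Connection refused` error means that nothing accepted the TCP connection at the address the job tried to reach, which can happen even when the UI opens fine from a browser on the host. As a minimal sketch for checking this from wherever the ingestion job runs: `can_connect` is a hypothetical helper, not part of Pinot, and the host and port in the comment are assumptions to be replaced with the values from your own `controllerURI`.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host name and opens a TCP socket.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, DNS failures, and timeouts.
        return False


# Example call (assumed host and port, adjust to your controllerURI):
#   can_connect("pinot-controller", 9000)
```

If this returns `False` from inside the job container but the UI loads on the host, the host name or port in the spec is probably not the one visible on the Docker network.
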
 **@bagi.priyank:** i am running into this stack trace in the log within a second
after adding a real-time table
```
021/11/20 00:18:41.296 ERROR [HelixStateTransitionHandler] [HelixTaskExecutor-message_handle_thread] Exception while executing a state transition task km_mp_play_startree__103__0__20211120T0018Z
java.lang.reflect.InvocationTargetException: null
    at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:?]
    at jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:?]
    at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:?]
    at java.lang.reflect.Method.invoke(Method.java:566) ~[?:?]
    at org.apache.helix.messaging.handling.HelixStateTransitionHandler.invoke(HelixStateTransitionHandler.java:404) ~[pinot-all-0.9.0-jar-with-dependencies.jar:0.9.0-cf8b84e8b0d6ab62374048de586ce7da21132906]
    at org.apache.helix.messaging.handling.HelixStateTransitionHandler.handleMessage(HelixStateTransitionHandler.java:331) [pinot-all-0.9.0-jar-with-dependencies.jar:0.9.0-cf8b84e8b0d6ab62374048de586ce7da21132906]
    at org.apache.helix.messaging.handling.HelixTask.call(HelixTask.java:97) [pinot-all-0.9.0-jar-with-dependencies.jar:0.9.0-cf8b84e8b0d6ab62374048de586ce7da21132906]
    at org.apache.helix.messaging.handling.HelixTask.call(HelixTask.java:49) [pinot-all-0.9.0-jar-with-dependencies.jar:0.9.0-cf8b84e8b0d6ab62374048de586ce7da21132906]
    at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
    at java.lang.Thread.run(Thread.java:829) [?:?]
Caused by: java.lang.OutOfMemoryError: Direct buffer memory
    at java.nio.Bits.reserveMemory(Bits.java:175) ~[?:?]
    at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:118) ~[?:?]
    at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:317) ~[?:?]
    at org.apache.pinot.segment.spi.memory.PinotByteBuffer.allocateDirect(PinotByteBuffer.java:38) ~[pinot-all-0.9.0-jar-with-dependencies.jar:0.9.0-cf8b84e8b0d6ab62374048de586ce7da21132906]
    at org.apache.pinot.segment.spi.memory.PinotDataBuffer.allocateDirect(PinotDataBuffer.java:115) ~[pinot-all-0.9.0-jar-with-dependencies.jar:0.9.0-cf8b84e8b0d6ab62374048de586ce7da21132906]
    at org.apache.pinot.segment.local.io.writer.impl.DirectMemoryManager.allocateInternal(DirectMemoryManager.java:53) ~[pinot-all-0.9.0-jar-with-dependencies.jar:0.9.0-cf8b84e8b0d6ab62374048de586ce7da21132906]
    at org.apache.pinot.segment.local.io.readerwriter.RealtimeIndexOffHeapMemoryManager.allocate(RealtimeIndexOffHeapMemoryManager.java:80) ~[pinot-all-0.9.0-jar-with-dependencies.jar:0.9.0-cf8b84e8b0d6ab62374048de586ce7da21132906]
    at org.apache.pinot.segment.local.realtime.impl.forward.FixedByteSVMutableForwardIndex.addBuffer(FixedByteSVMutableForwardIndex.java:208) ~[pinot-all-0.9.0-jar-with-dependencies.jar:0.9.0-cf8b84e8b0d6ab62374048de586ce7da21132906]
    at org.apache.pinot.segment.local.realtime.impl.forward.FixedByteSVMutableForwardIndex.<init>(FixedByteSVMutableForwardIndex.java:77) ~[pinot-all-0.9.0-jar-with-dependencies.jar:0.9.0-cf8b84e8b0d6ab62374048de586ce7da21132906]
    at org.apache.pinot.segment.local.indexsegment.mutable.MutableSegmentImpl.<init>(MutableSegmentImpl.java:308) ~[pinot-all-0.9.0-jar-with-dependencies.jar:0.9.0-cf8b84e8b0d6ab62374048de586ce7da21132906]
    at org.apache.pinot.core.data.manager.realtime.LLRealtimeSegmentDataManager.<init>(LLRealtimeSegmentDataManager.java:1364) ~[pinot-all-0.9.0-jar-with-dependencies.jar:0.9.0-cf8b84e8b0d6ab62374048de586ce7da21132906]
    at org.apache.pinot.core.data.manager.realtime.RealtimeTableDataManager.addSegment(RealtimeTableDataManager.java:344) ~[pinot-all-0.9.0-jar-with-dependencies.jar:0.9.0-cf8b84e8b0d6ab62374048de586ce7da21132906]
    at org.apache.pinot.server.starter.helix.HelixInstanceDataManager.addRealtimeSegment(HelixInstanceDataManager.java:162) ~[pinot-all-0.9.0-jar-with-dependencies.jar:0.9.0-cf8b84e8b0d6ab62374048de586ce7da21132906]
    at org.apache.pinot.server.starter.helix.SegmentOnlineOfflineStateModelFactory$SegmentOnlineOfflineStateModel.onBecomeOnlineFromOffline(SegmentOnlineOfflineStateModelFactory.java:164) ~[pinot-all-0.9.0-jar-with-dependencies.jar:0.9.0-cf8b84e8b0d6ab62374048de586ce7da21132906]
    at org.apache.pinot.server.starter.helix.SegmentOnlineOfflineStateModelFactory$SegmentOnlineOfflineStateModel.onBecomeConsumingFromOffline(SegmentOnlineOfflineStateModelFactory.java:86) ~[pinot-all-0.9.0-jar-with-dependencies.jar:0.9.0-cf8b84e8b0d6ab62374048de586ce7da21132906]
    ... 12 more
```
i have tried increasing heap size (right now at 16G) and i am still
running into this issue. i am using 5 servers to consume from a topic with 128
partitions, with an event rate of about 7M events per minute. I see 26
segments on 3 servers and 25 on 2 servers in Bad state.  
**@mayanks:** It is running out of direct memory. What JVM configs are you
using, and how much memory is available on the instance running the
Pinot server?  
**@bagi.priyank:** -Xms4G -Xmx16G -Dpinot.admin.system.exit=false  
**@mayanks:** How much memory is available on the instance, @bagi.priyank?  
**@bagi.priyank:** 30.5 G. sorry was grabbing the info  
**@npawar:** i wonder if this is because the offheap.alloc is not set on the
servers @mayanks?  
**@bagi.priyank:** i am using pinot version 0.9.0 on java 11  
**@npawar:** not sure if it is default false or true  
**@mayanks:** Are all 128 partitions on a single server @bagi.priyank?  
**@bagi.priyank:** 5 servers  
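
The 26/25 split of Bad segments reported above is what an even spread of 128 partitions over 5 servers would look like. A small sketch of that arithmetic, assuming a single replica and an even assignment (both are assumptions, and `partitions_per_server` is a hypothetical helper rather than Pinot's assignment logic):

```python
from typing import List


def partitions_per_server(num_partitions: int, num_servers: int) -> List[int]:
    """Spread partitions as evenly as possible; the first servers get the remainder."""
    base, extra = divmod(num_partitions, num_servers)
    return [base + 1 if i < extra else base for i in range(num_servers)]


counts = partitions_per_server(128, 5)
print(counts)  # [26, 26, 26, 25, 25]
```

Under these assumptions each server creates 25 or 26 consuming segments at once, so whatever direct memory one consuming segment reserves up front is multiplied by roughly that factor per server.
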
**@bagi.priyank:** ``` "realtime.segment.flush.threshold.rows": "300000000",
"realtime.segment.flush.threshold.time": "24h",
"realtime.segment.flush.desired.size": "300M",```  
**@mayanks:** @npawar yes we should switch to off heap if not already there.  
**@mayanks:** `realtime.segment.flush.threshold.rows` is too high  
**@mayanks:** Set it to 0, so desired size kicks in  
**@bagi.priyank:** i see, ok  
**@bagi.priyank:** do you think i should try setting
`-XX:MaxDirectMemorySize`?  
**@npawar:** i recommend setting this on servers instead  
**@bagi.priyank:** ok. thanks guys!  
**@bagi.priyank:** i am not seeing oom anymore. however now there are ~200
segments per server in an hour. is that a decent / high / low number?  
**@npawar:** That will keep reducing. When you enable segment size based
thresholds, the first segment will be 100k rows, and then the number of rows
in segments will keep getting bigger and bigger, in order to reach 300M size  
**@bagi.priyank:** got it. i do see 100k rows right now.  
**@npawar:** If you want, we can increase the initial number from 100k to
something more  
**@bagi.priyank:** i am still trying to understand the impact of it. more
segments means more segments to scan for a query right? so trying to
understand if what i have looks like a decent place to start or if there is
anything else i can optimize.  
**@npawar:** 200 per server per hour is certainly high. Might be worth setting
the `realtime.segment.flush.autotune.initialRows` to 1000000 (or something
proportional to make your segment size bigger).  
**@bagi.priyank:** got it. what is the sweet spot in your experience?  
**@npawar:** 300-500M segments is a good number. So just make it
(300*100k)/currentlySeenSegmentSize  
**@npawar:** maybe a little less than that  
**@npawar:** anyway this is all to simply avoid the initial surge of segments. In
2-3 iterations, the segment size should stabilise on its own  
**@bagi.priyank:** got it. thank you once again!  
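
The rule of thumb above, (300*100k)/currentlySeenSegmentSize, scales the initial row count by the ratio of the target segment size to the size currently observed. A rough sketch of that arithmetic follows; `suggested_initial_rows` is a hypothetical helper, and the 30 MB observed size is an invented example value, not a number from the thread.

```python
def suggested_initial_rows(target_size_mb: float,
                           observed_rows: int,
                           observed_size_mb: float) -> int:
    """Scale the row count so a segment lands near the target size."""
    return int(target_size_mb * observed_rows / observed_size_mb)


# Assumed observation: segments of 100k rows come out at about 30 MB.
print(suggested_initial_rows(300, 100_000, 30))  # 1000000
```

With that assumed observation the result happens to match the 1000000 suggested for `realtime.segment.flush.autotune.initialRows`; as noted in the thread, picking a value somewhat below the computed one leaves some headroom.
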
 **@momento.corto:** @momento.corto has joined the channel  
 **@xiaoman:** @xiaoman has joined the channel  