Posted to commits@pinot.apache.org by "andres-torti (via GitHub)" <gi...@apache.org> on 2023/11/08 02:29:00 UTC

[I] Error building segments on real-time table with inverted index [pinot]

andres-torti opened a new issue, #11968:
URL: https://github.com/apache/pinot/issues/11968

   I'm using Apache Pinot 1.0.0 and having trouble creating an inverted index on a multi-value column. This is my table config:
   
   ```
   {
       "tableName": "devices",
       "tableType": "REALTIME",
       "upsertConfig": {
           "mode": "FULL",
           "comparisonColumn": "timestamp",
           "enableSnapshot": true,
           "enablePreload": true
       },
       "tenants": {},
       "segmentsConfig": {
           "timeColumnName": "timestamp",
           "timeType": "SECONDS",
           "retentionTimeUnit": "DAYS",
           "retentionTimeValue": "90",
           "replication": "1"
       },
       "tableIndexConfig": {
           "loadMode": "MMAP",
           "invertedIndexColumns": [
               "segments"
           ],
           "streamConfigs": {
               "streamType": "kafka",
               "stream.kafka.topic.name": "events-realtime",
               "stream.kafka.decoder.class.name": "org.apache.pinot.plugin.stream.kafka.KafkaJSONMessageDecoder",
               "stream.kafka.consumer.type": "simple",
               "stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
               "stream.kafka.broker.list": "{brokers}",
               "security.protocol": "SSL",
               "realtime.segment.flush.threshold.time": "3600000",
               "realtime.segment.flush.threshold.size": "20000"
           }
       },
       "routing": {
           "instanceSelectorType": "strictReplicaGroup"
       },
       "fieldConfigList": [],
       "metadata": {
           "customConfigs": {}
       }
   }
   ```
   
   And this is my schema:
   
   ```
   {
       "metricFieldSpecs": [],
       "primaryKeyColumns": ["device_id"],
       "dimensionFieldSpecs": [
           {
               "name": "country",
               "dataType": "STRING"
           },
           {
               "name": "device_id",
               "dataType": "STRING"
           },
           {
               "name": "device_type",
               "dataType": "STRING"
           },
           {
               "name": "segments",
               "dataType": "INT",
               "singleValueField": false
           },
           {
               "name": "options",
               "dataType": "INT",
               "singleValueField": false
           },
           {
               "name": "relation_id",
               "dataType": "STRING"
           },
           {
               "name": "client",
               "dataType": "INT"
           }
       ],
       "dateTimeFieldSpecs": [
           {
               "name": "timestamp",
               "dataType": "LONG",
               "format": "1:SECONDS:EPOCH",
               "granularity": "1:DAYS"
           }
       ],
       "schemaName": "devices"
   }
   ```
   
   As soon as Pinot starts consuming events from Kafka I get these errors:
   
   ```
   pinot-server        | 2023/11/04 23:41:17.587 ERROR [LLRealtimeSegmentDataManager_devices__5__1__20231104T2341Z] [devices__5__1__20231104T2341Z] Could not build segment
   pinot-server        | java.lang.RuntimeException: Error occurred while reading row during indexing
   pinot-server        |   at org.apache.pinot.segment.local.segment.creator.impl.SegmentIndexCreationDriverImpl.build(SegmentIndexCreationDriverImpl.java:232) ~[pinot-all-1.0.0-jar-with-dependencies.jar:1.0.0-b6bdf6c9686b286a149d2d1aea4a385ee98f3e79]
   pinot-server        |   at org.apache.pinot.segment.local.realtime.converter.RealtimeSegmentConverter.build(RealtimeSegmentConverter.java:121) ~[pinot-all-1.0.0-jar-with-dependencies.jar:1.0.0-b6bdf6c9686b286a149d2d1aea4a385ee98f3e79]
   pinot-server        |   at org.apache.pinot.core.data.manager.realtime.LLRealtimeSegmentDataManager.buildSegmentInternal(LLRealtimeSegmentDataManager.java:935) [pinot-all-1.0.0-jar-with-dependencies.jar:1.0.0-b6bdf6c9686b286a149d2d1aea4a385ee98f3e79]
   pinot-server        |   at org.apache.pinot.core.data.manager.realtime.LLRealtimeSegmentDataManager.buildSegmentForCommit(LLRealtimeSegmentDataManager.java:842) [pinot-all-1.0.0-jar-with-dependencies.jar:1.0.0-b6bdf6c9686b286a149d2d1aea4a385ee98f3e79]
   pinot-server        |   at org.apache.pinot.core.data.manager.realtime.LLRealtimeSegmentDataManager$PartitionConsumer.run(LLRealtimeSegmentDataManager.java:754) [pinot-all-1.0.0-jar-with-dependencies.jar:1.0.0-b6bdf6c9686b286a149d2d1aea4a385ee98f3e79]
   pinot-server        |   at java.lang.Thread.run(Thread.java:829) [?:?]
   pinot-server        | Caused by: java.lang.IndexOutOfBoundsException
   pinot-server        |   at java.nio.Buffer.checkIndex(Buffer.java:693) ~[?:?]
   pinot-server        |   at java.nio.DirectByteBuffer.getInt(DirectByteBuffer.java:758) ~[?:?]
   pinot-server        |   at org.apache.pinot.segment.spi.memory.PinotByteBuffer.getInt(PinotByteBuffer.java:137) ~[pinot-all-1.0.0-jar-with-dependencies.jar:1.0.0-b6bdf6c9686b286a149d2d1aea4a385ee98f3e79]
   pinot-server        |   at org.apache.pinot.segment.local.io.reader.impl.FixedByteSingleValueMultiColReader.getInt(FixedByteSingleValueMultiColReader.java:105) ~[pinot-all-1.0.0-jar-with-dependencies.jar:1.0.0-b6bdf6c9686b286a149d2d1aea4a385ee98f3e79]
   pinot-server        |   at org.apache.pinot.segment.local.realtime.impl.forward.FixedByteMVMutableForwardIndex.getDictIdMV(FixedByteMVMutableForwardIndex.java:250) ~[pinot-all-1.0.0-jar-with-dependencies.jar:1.0.0-b6bdf6c9686b286a149d2d1aea4a385ee98f3e79]
   pinot-server        |   at org.apache.pinot.segment.spi.index.mutable.MutableForwardIndex.getDictIdMV(MutableForwardIndex.java:225) ~[pinot-all-1.0.0-jar-with-dependencies.jar:1.0.0-b6bdf6c9686b286a149d2d1aea4a385ee98f3e79]
   pinot-server        |   at org.apache.pinot.segment.local.segment.readers.PinotSegmentColumnReader.getValue(PinotSegmentColumnReader.java:98) ~[pinot-all-1.0.0-jar-with-dependencies.jar:1.0.0-b6bdf6c9686b286a149d2d1aea4a385ee98f3e79]
   pinot-server        |   at org.apache.pinot.segment.local.segment.readers.PinotSegmentRecordReader.getRecord(PinotSegmentRecordReader.java:227) ~[pinot-all-1.0.0-jar-with-dependencies.jar:1.0.0-b6bdf6c9686b286a149d2d1aea4a385ee98f3e79]
   pinot-server        |   at org.apache.pinot.segment.local.segment.readers.PinotSegmentRecordReader.next(PinotSegmentRecordReader.java:210) ~[pinot-all-1.0.0-jar-with-dependencies.jar:1.0.0-b6bdf6c9686b286a149d2d1aea4a385ee98f3e79]
   pinot-server        |   at org.apache.pinot.segment.local.segment.creator.impl.SegmentIndexCreationDriverImpl.build(SegmentIndexCreationDriverImpl.java:225) ~[pinot-all-1.0.0-jar-with-dependencies.jar:1.0.0-b6bdf6c9686b286a149d2d1aea4a385ee98f3e79]
   pinot-server        |   ... 5 more
   pinot-server        | 2023/11/04 23:41:17.589 ERROR [LLRealtimeSegmentDataManager_devices__5__1__20231104T2341Z] [devices__5__1__20231104T2341Z] Could not build segment for devices__5__1__20231104T2341Z
   pinot-server        | 2023/11/04 23:41:17.635 ERROR [LLRealtimeSegmentDataManager_devices__2__1__20231104T2341Z] [devices__2__1__20231104T2341Z] Could not build segment
   pinot-server        | java.lang.RuntimeException: Error occurred while reading row during indexing
   pinot-server        |   at org.apache.pinot.segment.local.segment.creator.impl.SegmentIndexCreationDriverImpl.build(SegmentIndexCreationDriverImpl.java:232) ~[pinot-all-1.0.0-jar-with-dependencies.jar:1.0.0-b6bdf6c9686b286a149d2d1aea4a385ee98f3e79]
   pinot-server        |   at org.apache.pinot.segment.local.realtime.converter.RealtimeSegmentConverter.build(RealtimeSegmentConverter.java:121) ~[pinot-all-1.0.0-jar-with-dependencies.jar:1.0.0-b6bdf6c9686b286a149d2d1aea4a385ee98f3e79]
   pinot-server        |   at org.apache.pinot.core.data.manager.realtime.LLRealtimeSegmentDataManager.buildSegmentInternal(LLRealtimeSegmentDataManager.java:935) [pinot-all-1.0.0-jar-with-dependencies.jar:1.0.0-b6bdf6c9686b286a149d2d1aea4a385ee98f3e79]
   pinot-server        |   at org.apache.pinot.core.data.manager.realtime.LLRealtimeSegmentDataManager.buildSegmentForCommit(LLRealtimeSegmentDataManager.java:842) [pinot-all-1.0.0-jar-with-dependencies.jar:1.0.0-b6bdf6c9686b286a149d2d1aea4a385ee98f3e79]
   pinot-server        |   at org.apache.pinot.core.data.manager.realtime.LLRealtimeSegmentDataManager$PartitionConsumer.run(LLRealtimeSegmentDataManager.java:754) [pinot-all-1.0.0-jar-with-dependencies.jar:1.0.0-b6bdf6c9686b286a149d2d1aea4a385ee98f3e79]
   pinot-server        |   at java.lang.Thread.run(Thread.java:829) [?:?]
   pinot-server        | Caused by: java.lang.IndexOutOfBoundsException
   pinot-server        |   at java.nio.Buffer.checkIndex(Buffer.java:693) ~[?:?]
   pinot-server        |   at java.nio.DirectByteBuffer.getInt(DirectByteBuffer.java:758) ~[?:?]
   pinot-server        |   at org.apache.pinot.segment.spi.memory.PinotByteBuffer.getInt(PinotByteBuffer.java:137) ~[pinot-all-1.0.0-jar-with-dependencies.jar:1.0.0-b6bdf6c9686b286a149d2d1aea4a385ee98f3e79]
   pinot-server        |   at org.apache.pinot.segment.local.io.reader.impl.FixedByteSingleValueMultiColReader.getInt(FixedByteSingleValueMultiColReader.java:105) ~[pinot-all-1.0.0-jar-with-dependencies.jar:1.0.0-b6bdf6c9686b286a149d2d1aea4a385ee98f3e79]
   pinot-server        |   at org.apache.pinot.segment.local.realtime.impl.forward.FixedByteMVMutableForwardIndex.getDictIdMV(FixedByteMVMutableForwardIndex.java:250) ~[pinot-all-1.0.0-jar-with-dependencies.jar:1.0.0-b6bdf6c9686b286a149d2d1aea4a385ee98f3e79]
   pinot-server        |   at org.apache.pinot.segment.spi.index.mutable.MutableForwardIndex.getDictIdMV(MutableForwardIndex.java:225) ~[pinot-all-1.0.0-jar-with-dependencies.jar:1.0.0-b6bdf6c9686b286a149d2d1aea4a385ee98f3e79]
   pinot-server        |   at org.apache.pinot.segment.local.segment.readers.PinotSegmentColumnReader.getValue(PinotSegmentColumnReader.java:98) ~[pinot-all-1.0.0-jar-with-dependencies.jar:1.0.0-b6bdf6c9686b286a149d2d1aea4a385ee98f3e79]
   pinot-server        |   at org.apache.pinot.segment.local.segment.readers.PinotSegmentRecordReader.getRecord(PinotSegmentRecordReader.java:227) ~[pinot-all-1.0.0-jar-with-dependencies.jar:1.0.0-b6bdf6c9686b286a149d2d1aea4a385ee98f3e79]
   pinot-server        |   at org.apache.pinot.segment.local.segment.readers.PinotSegmentRecordReader.next(PinotSegmentRecordReader.java:210) ~[pinot-all-1.0.0-jar-with-dependencies.jar:1.0.0-b6bdf6c9686b286a149d2d1aea4a385ee98f3e79]
   pinot-server        |   at org.apache.pinot.segment.local.segment.creator.impl.SegmentIndexCreationDriverImpl.build(SegmentIndexCreationDriverImpl.java:225) ~[pinot-all-1.0.0-jar-with-dependencies.jar:1.0.0-b6bdf6c9686b286a149d2d1aea4a385ee98f3e79]
   pinot-server        |   ... 5 more
   ```
   
   The data is ingested into the table anyway, but running a query like `select count(*) from devices where segments = 560` fails with this error:
   
   ```
   Error Code: 200
   
   QueryExecutionError:
   java.lang.IndexOutOfBoundsException
   	at java.base/java.nio.Buffer.checkIndex(Buffer.java:693)
   	at java.base/java.nio.DirectByteBuffer.getInt(DirectByteBuffer.java:758)
   	at org.apache.pinot.segment.spi.memory.PinotByteBuffer.getInt(PinotByteBuffer.java:137)
   	at org.apache.pinot.segment.local.io.reader.impl.FixedByteSingleValueMultiColReader.getInt(FixedByteSingleValueMultiColReader.java:105)
   ```
   
   If I change the table flush size from:
   
   `"realtime.segment.flush.threshold.size": "20000"`
   
   To:
   
   `"realtime.segment.flush.threshold.size": "200000"`
   
   The errors are gone and everything works as expected. Here is some sample data in case it's useful: [sample_data.csv.zip](https://github.com/apache/pinot/files/13291350/sample_data.csv.zip)
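
The flush-threshold change described above can be sketched as a small script. This is only an illustration, assuming the table config is available as a Python dict; `with_flush_threshold` is a hypothetical helper and not part of Pinot, the `config` dict is a trimmed-down stand-in for the full table config, and uploading the updated config to the controller is left out.

```python
import copy
import json

# Stream config key taken from the table config in this report.
FLUSH_ROWS_KEY = "realtime.segment.flush.threshold.size"


def with_flush_threshold(table_config: dict, rows: int) -> dict:
    """Return a copy of table_config with the row-count flush threshold set.

    Hypothetical helper for illustration; the input dict is not modified.
    """
    updated = copy.deepcopy(table_config)
    stream_configs = updated["tableIndexConfig"]["streamConfigs"]
    # The table config stores the threshold as a string, so keep that form.
    stream_configs[FLUSH_ROWS_KEY] = str(rows)
    return updated


# Trimmed-down stand-in for the table config (only the relevant keys).
config = {
    "tableName": "devices",
    "tableIndexConfig": {
        "streamConfigs": {
            FLUSH_ROWS_KEY: "20000",
        }
    },
}

patched = with_flush_threshold(config, 200000)
print(json.dumps(patched["tableIndexConfig"]["streamConfigs"], indent=2))
```

Working on a copy keeps the original config around, which makes it easy to switch back to the failing value when reproducing the error.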
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [I] Error building segments on real-time table with inverted index [pinot]

Posted by "andres-torti (via GitHub)" <gi...@apache.org>.
andres-torti closed issue #11968: Error building segments on real-time table with inverted index
URL: https://github.com/apache/pinot/issues/11968

