Posted to commits@pinot.apache.org by "jhyao (via GitHub)" <gi...@apache.org> on 2023/11/03 14:38:13 UTC

[I] Count number of upsert table is incorrect when a new segment is created [pinot]

jhyao opened a new issue, #11948:
URL: https://github.com/apache/pinot/issues/11948

   We found an issue on Pinot release-1.0.0: when a new segment is created, `select count(*)` on the table can return a value much smaller than the actual count.
   
   In my test, I first ingested 1 million records into Pinot with unique ids from 0 to 999,999, then repeatedly sent records with the same ids (0 to 999,999) but different column values. At the same time, another process ran a `count(*)` query every 100 ms and recorded the result.
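
The polling side of this test might look like the following sketch. It is not the reporter's script: it assumes the broker's `/query/sql` REST endpoint on the default broker port `8099`, and the helper names (`fetch_count`, `poll_counts`, `anomalies`) are hypothetical. The fetch function is passed in so the loop can be exercised without a running cluster.

```python
import json
import time
import urllib.request
from typing import Callable, List

BROKER_URL = "http://localhost:8099/query/sql"  # assumed broker address
QUERY = "select count(*) as count from issuerrisk"


def fetch_count(sql: str = QUERY) -> int:
    """POST a SQL query to the Pinot broker and return the single count value."""
    body = json.dumps({"sql": sql}).encode()
    req = urllib.request.Request(BROKER_URL, data=body,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        result = json.load(resp)
    return int(result["resultTable"]["rows"][0][0])


def poll_counts(fetch: Callable[[], int], n: int, interval_s: float = 0.1) -> List[int]:
    """Call `fetch` n times, sleeping interval_s between calls, and collect the counts."""
    counts = []
    for _ in range(n):
        counts.append(fetch())
        time.sleep(interval_s)
    return counts


def anomalies(counts: List[int], expected: int) -> List[int]:
    """Return the indexes of samples that differ from the expected count."""
    return [i for i, c in enumerate(counts) if c != expected]
```

Against a live broker this would be driven as `anomalies(poll_counts(fetch_count, n=600), 1_000_000)`. That check is only meaningful once the first 1M ids have been ingested; before then the count is still legitimately growing.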
   
   Here are the recorded counts. There are many anomalous values, and each one occurs exactly when a new segment was created.
   ![image](https://github.com/apache/pinot/assets/17529008/9c9d5ec4-0e4b-4308-88bf-842a7cf320c5)
   
   In the broker logs, most of the anomalies happened between the IDEAL_STATE change and the EXTERNAL_VIEW change, so I suspect the problem occurs between these two state changes. Do you have any idea what could cause this?
   ```
   2023/11/03 17:50:06.769 INFO [QueryLogger] [jersey-server-managed-async-executor-7] requestId=621783916000003375,table=issuerrisk,timeMs=1,docs=1000000/1869645,entries=0/0,segments(queried/processed/matched/consumingQueried/consumingProcessed/consumingMatched/unavailable):8/8/4/1/0/0/0,consumingFreshnessTimeMs=1699005001041,servers=1/1,groupLimitReached=false,brokerReduceTimeMs=0,exceptions=0,serverStats=(Server=SubmitDelayMs,ResponseDelayMs,ResponseSize,DeserializationTimeMs,RequestSentDelayMs);192.168.31.226_R=0,1,264,0,0,offlineThreadCpuTimeNs(total/thread/sysActivity/resSer):0/0/0/0,realtimeThreadCpuTimeNs(total/thread/sysActivity/resSer):0/0/0/0,clientIp=unknown,query=select count(*) as count from issuerrisk
   2023/11/03 17:50:06.771 INFO [CallbackHandler] [ZkClient-EventThread-73-localhost:2181] 73 START: CallbackHandler 0, INVOKE /PinotCluster/IDEALSTATES listener: org.apache.pinot.broker.broker.helix.ClusterChangeMediator@465fefa9 type: CALLBACK
   2023/11/03 17:50:06.771 INFO [ClusterChangeMediator] [ZkClient-EventThread-73-localhost:2181] Enqueueing IDEAL_STATE change
   2023/11/03 17:50:06.771 INFO [CallbackHandler] [ZkClient-EventThread-73-localhost:2181] 73 END:INVOKE CallbackHandler 0, /PinotCluster/IDEALSTATES listener: org.apache.pinot.broker.broker.helix.ClusterChangeMediator@465fefa9 type: CALLBACK Took: 0ms
   2023/11/03 17:50:06.771 INFO [ClusterChangeMediator] [ClusterChangeHandlingThread] Start processing IDEAL_STATE change
   2023/11/03 17:50:06.771 INFO [BrokerRoutingManager] [ClusterChangeHandlingThread] Processing segment assignment change
   2023/11/03 17:50:06.774 INFO [BaseInstanceSelector] [ClusterChangeHandlingThread] Got 1 new segments: {issuerrisk__0__8__20231103T0950Z=1699005006774} for table: issuerrisk_REALTIME by processing existing states, current time: 1699005006774
   2023/11/03 17:50:06.774 INFO [BrokerRoutingManager] [ClusterChangeHandlingThread] Processed segment assignment change in 3ms (fetch ideal state and external view stats for 1 tables: 0ms, update routing entry for 1 tables ([issuerrisk_REALTIME]): 3ms)
   2023/11/03 17:50:06.774 INFO [ClusterChangeMediator] [ClusterChangeHandlingThread] Finish handling IDEAL_STATE change for handler: BrokerRoutingManager in 3ms
   2023/11/03 17:50:06.774 INFO [ClusterChangeMediator] [ClusterChangeHandlingThread] Finish processing IDEAL_STATE change in 3ms
   2023/11/03 17:50:06.876 INFO [QueryLogger] [jersey-server-managed-async-executor-7] requestId=621783916000003376,table=issuerrisk,timeMs=0,docs=1000000/1869645,entries=0/0,segments(queried/processed/matched/consumingQueried/consumingProcessed/consumingMatched/unavailable):8/8/4/1/0/0/0,consumingFreshnessTimeMs=1699005001041,servers=1/1,groupLimitReached=false,brokerReduceTimeMs=0,exceptions=0,serverStats=(Server=SubmitDelayMs,ResponseDelayMs,ResponseSize,DeserializationTimeMs,RequestSentDelayMs);192.168.31.226_R=0,1,264,0,0,offlineThreadCpuTimeNs(total/thread/sysActivity/resSer):0/0/0/0,realtimeThreadCpuTimeNs(total/thread/sysActivity/resSer):0/0/0/0,clientIp=unknown,query=select count(*) as count from issuerrisk
   2023/11/03 17:50:06.998 INFO [QueryLogger] [jersey-server-managed-async-executor-7] requestId=621783916000003377,table=issuerrisk,timeMs=1,docs=740101/1869645,entries=0/0,segments(queried/processed/matched/consumingQueried/consumingProcessed/consumingMatched/unavailable):8/8/4/0/0/0/0,consumingFreshnessTimeMs=1699005002130,servers=1/1,groupLimitReached=false,brokerReduceTimeMs=0,exceptions=0,serverStats=(Server=SubmitDelayMs,ResponseDelayMs,ResponseSize,DeserializationTimeMs,RequestSentDelayMs);192.168.31.226_R=0,1,256,0,1,offlineThreadCpuTimeNs(total/thread/sysActivity/resSer):0/0/0/0,realtimeThreadCpuTimeNs(total/thread/sysActivity/resSer):0/0/0/0,clientIp=unknown,query=select count(*) as count from issuerrisk
   2023/11/03 17:50:07.094 INFO [CallbackHandler] [ZkClient-EventThread-73-localhost:2181] 73 START: CallbackHandler 1, INVOKE /PinotCluster/EXTERNALVIEW listener: org.apache.pinot.broker.broker.helix.ClusterChangeMediator@465fefa9 type: CALLBACK
   2023/11/03 17:50:07.094 INFO [ClusterChangeMediator] [ZkClient-EventThread-73-localhost:2181] Enqueueing EXTERNAL_VIEW change
   2023/11/03 17:50:07.094 INFO [CallbackHandler] [ZkClient-EventThread-73-localhost:2181] 73 END:INVOKE CallbackHandler 1, /PinotCluster/EXTERNALVIEW listener: org.apache.pinot.broker.broker.helix.ClusterChangeMediator@465fefa9 type: CALLBACK Took: 0ms
   2023/11/03 17:50:07.094 INFO [ClusterChangeMediator] [ClusterChangeHandlingThread] Start processing EXTERNAL_VIEW change
   2023/11/03 17:50:07.094 INFO [BrokerRoutingManager] [ClusterChangeHandlingThread] Processing segment assignment change
   2023/11/03 17:50:07.096 INFO [BaseInstanceSelector] [ClusterChangeHandlingThread] Got 0 new segments: {} for table: issuerrisk_REALTIME by processing existing states, current time: 1699005007096
   2023/11/03 17:50:07.096 INFO [BrokerRoutingManager] [ClusterChangeHandlingThread] Processed segment assignment change in 2ms (fetch ideal state and external view stats for 1 tables: 1ms, update routing entry for 1 tables ([issuerrisk_REALTIME]): 1ms)
   2023/11/03 17:50:07.096 INFO [ClusterChangeMediator] [ClusterChangeHandlingThread] Finish handling EXTERNAL_VIEW change for handler: BrokerRoutingManager in 2ms
   2023/11/03 17:50:07.096 INFO [HelixExternalViewBasedQueryQuotaManager] [ClusterChangeHandlingThread] Start processing qps quota change.
   2023/11/03 17:50:07.096 INFO [HelixExternalViewBasedQueryQuotaManager] [ClusterChangeHandlingThread] No qps quota change: external view for broker resource remains the same.
   2023/11/03 17:50:07.096 INFO [ClusterChangeMediator] [ClusterChangeHandlingThread] Finish handling EXTERNAL_VIEW change for handler: HelixExternalViewBasedQueryQuotaManager in 0ms
   2023/11/03 17:50:07.096 INFO [ClusterChangeMediator] [ClusterChangeHandlingThread] Finish processing EXTERNAL_VIEW change in 2ms
   2023/11/03 17:50:07.109 INFO [QueryLogger] [jersey-server-managed-async-executor-7] requestId=621783916000003378,table=issuerrisk,timeMs=2,docs=875721/1873069,entries=0/0,segments(queried/processed/matched/consumingQueried/consumingProcessed/consumingMatched/unavailable):9/9/5/1/0/0/0,consumingFreshnessTimeMs=1699005007107,servers=1/1,groupLimitReached=false,brokerReduceTimeMs=0,exceptions=0,serverStats=(Server=SubmitDelayMs,ResponseDelayMs,ResponseSize,DeserializationTimeMs,RequestSentDelayMs);192.168.31.226_R=0,2,264,0,0,offlineThreadCpuTimeNs(total/thread/sysActivity/resSer):0/0/0/0,realtimeThreadCpuTimeNs(total/thread/sysActivity/resSer):0/0/0/0,clientIp=unknown,query=select count(*) as count from issuerrisk
   2023/11/03 17:50:07.218 INFO [QueryLogger] [jersey-server-managed-async-executor-7] requestId=621783916000003379,table=issuerrisk,timeMs=1,docs=999998/1879272,entries=0/0,segments(queried/processed/matched/consumingQueried/consumingProcessed/consumingMatched/unavailable):9/9/5/1/0/0/0,consumingFreshnessTimeMs=1699005007217,servers=1/1,groupLimitReached=false,brokerReduceTimeMs=0,exceptions=0,serverStats=(Server=SubmitDelayMs,ResponseDelayMs,ResponseSize,DeserializationTimeMs,RequestSentDelayMs);192.168.31.226_R=0,1,264,0,0,offlineThreadCpuTimeNs(total/thread/sysActivity/resSer):0/0/0/0,realtimeThreadCpuTimeNs(total/thread/sysActivity/resSer):0/0/0/0,clientIp=unknown,query=select count(*) as count from issuerrisk
   2023/11/03 17:50:07.304 INFO [CallbackHandler] [ZkClient-EventThread-73-localhost:2181] 73 START: CallbackHandler 1, INVOKE /PinotCluster/EXTERNALVIEW listener: org.apache.pinot.broker.broker.helix.ClusterChangeMediator@465fefa9 type: CALLBACK
   2023/11/03 17:50:07.304 INFO [ClusterChangeMediator] [ZkClient-EventThread-73-localhost:2181] Enqueueing EXTERNAL_VIEW change
   2023/11/03 17:50:07.304 INFO [CallbackHandler] [ZkClient-EventThread-73-localhost:2181] 73 END:INVOKE CallbackHandler 1, /PinotCluster/EXTERNALVIEW listener: org.apache.pinot.broker.broker.helix.ClusterChangeMediator@465fefa9 type: CALLBACK Took: 0ms
   2023/11/03 17:50:07.304 INFO [ClusterChangeMediator] [ClusterChangeHandlingThread] Start processing EXTERNAL_VIEW change
   2023/11/03 17:50:07.304 INFO [BrokerRoutingManager] [ClusterChangeHandlingThread] Processing segment assignment change
   2023/11/03 17:50:07.309 INFO [BaseInstanceSelector] [ClusterChangeHandlingThread] Got 0 new segments: {} for table: issuerrisk_REALTIME by processing existing states, current time: 1699005007309
   2023/11/03 17:50:07.309 INFO [BrokerRoutingManager] [ClusterChangeHandlingThread] Processed segment assignment change in 5ms (fetch ideal state and external view stats for 1 tables: 2ms, update routing entry for 1 tables ([issuerrisk_REALTIME]): 3ms)
   2023/11/03 17:50:07.309 INFO [ClusterChangeMediator] [ClusterChangeHandlingThread] Finish handling EXTERNAL_VIEW change for handler: BrokerRoutingManager in 5ms
   2023/11/03 17:50:07.309 INFO [HelixExternalViewBasedQueryQuotaManager] [ClusterChangeHandlingThread] Start processing qps quota change.
   2023/11/03 17:50:07.309 INFO [HelixExternalViewBasedQueryQuotaManager] [ClusterChangeHandlingThread] No qps quota change: external view for broker resource remains the same.
   2023/11/03 17:50:07.309 INFO [ClusterChangeMediator] [ClusterChangeHandlingThread] Finish handling EXTERNAL_VIEW change for handler: HelixExternalViewBasedQueryQuotaManager in 0ms
   2023/11/03 17:50:07.309 INFO [ClusterChangeMediator] [ClusterChangeHandlingThread] Finish processing EXTERNAL_VIEW change in 5ms
   2023/11/03 17:50:07.326 INFO [QueryLogger] [jersey-server-managed-async-executor-7] requestId=621783916000003380,table=issuerrisk,timeMs=1,docs=999998/1886050,entries=0/0,segments(queried/processed/matched/consumingQueried/consumingProcessed/consumingMatched/unavailable):9/9/5/1/0/0/0,consumingFreshnessTimeMs=1699005007326,servers=1/1,groupLimitReached=false,brokerReduceTimeMs=0,exceptions=0,serverStats=(Server=SubmitDelayMs,ResponseDelayMs,ResponseSize,DeserializationTimeMs,RequestSentDelayMs);192.168.31.226_R=0,1,264,0,0,offlineThreadCpuTimeNs(total/thread/sysActivity/resSer):0/0/0/0,realtimeThreadCpuTimeNs(total/thread/sysActivity/resSer):0/0/0/0,clientIp=unknown,query=select count(*) as count from issuerrisk
   2023/11/03 17:50:07.434 INFO [QueryLogger] [jersey-server-managed-async-executor-7] requestId=621783916000003381,table=issuerrisk,timeMs=1,docs=999999/1895348,entries=0/0,segments(queried/processed/matched/consumingQueried/consumingProcessed/consumingMatched/unavailable):9/9/5/1/0/0/0,consumingFreshnessTimeMs=1699005007433,servers=1/1,groupLimitReached=false,brokerReduceTimeMs=0,exceptions=0,serverStats=(Server=SubmitDelayMs,ResponseDelayMs,ResponseSize,DeserializationTimeMs,RequestSentDelayMs);192.168.31.226_R=0,1,264,0,0,offlineThreadCpuTimeNs(total/thread/sysActivity/resSer):0/0/0/0,realtimeThreadCpuTimeNs(total/thread/sysActivity/resSer):0/0/0/0,clientIp=unknown,query=select count(*) as count from issuerrisk
   2023/11/03 17:50:07.545 INFO [QueryLogger] [jersey-server-managed-async-executor-7] requestId=621783916000003382,table=issuerrisk,timeMs=1,docs=999998/1907060,entries=0/0,segments(queried/processed/matched/consumingQueried/consumingProcessed/consumingMatched/unavailable):9/9/5/1/0/0/0,consumingFreshnessTimeMs=1699005007544,servers=1/1,groupLimitReached=false,brokerReduceTimeMs=0,exceptions=0,serverStats=(Server=SubmitDelayMs,ResponseDelayMs,ResponseSize,DeserializationTimeMs,RequestSentDelayMs);192.168.31.226_R=0,1,264,0,0,offlineThreadCpuTimeNs(total/thread/sysActivity/resSer):0/0/0/0,realtimeThreadCpuTimeNs(total/thread/sysActivity/resSer):0/0/0/0,clientIp=unknown,query=select count(*) as count from issuerrisk
   2023/11/03 17:50:07.652 INFO [QueryLogger] [jersey-server-managed-async-executor-7] requestId=621783916000003383,table=issuerrisk,timeMs=1,docs=1000000/1910088,entries=0/0,segments(queried/processed/matched/consumingQueried/consumingProcessed/consumingMatched/unavailable):9/9/5/1/0/0/0,consumingFreshnessTimeMs=1699005007644,servers=1/1,groupLimitReached=false,brokerReduceTimeMs=0,exceptions=0,serverStats=(Server=SubmitDelayMs,ResponseDelayMs,ResponseSize,DeserializationTimeMs,RequestSentDelayMs);192.168.31.226_R=0,1,264,0,0,offlineThreadCpuTimeNs(total/thread/sysActivity/resSer):0/0/0/0,realtimeThreadCpuTimeNs(total/thread/sysActivity/resSer):0/0/0/0,clientIp=unknown,query=select count(*) as count from issuerrisk
   
   ```
   The two log lines with wrong counts are:
   ```
   2023/11/03 17:50:06.998 INFO [QueryLogger] [jersey-server-managed-async-executor-7] requestId=621783916000003377,table=issuerrisk,timeMs=1,docs=740101/1869645,entries=0/0,segments(queried/processed/matched/consumingQueried/consumingProcessed/consumingMatched/unavailable):8/8/4/0/0/0/0,consumingFreshnessTimeMs=1699005002130,servers=1/1,groupLimitReached=false,brokerReduceTimeMs=0,exceptions=0,serverStats=(Server=SubmitDelayMs,ResponseDelayMs,ResponseSize,DeserializationTimeMs,RequestSentDelayMs);192.168.31.226_R=0,1,256,0,1,offlineThreadCpuTimeNs(total/thread/sysActivity/resSer):0/0/0/0,realtimeThreadCpuTimeNs(total/thread/sysActivity/resSer):0/0/0/0,clientIp=unknown,query=select count(*) as count from issuerrisk
   2023/11/03 17:50:07.109 INFO [QueryLogger] [jersey-server-managed-async-executor-7] requestId=621783916000003378,table=issuerrisk,timeMs=2,docs=875721/1873069,entries=0/0,segments(queried/processed/matched/consumingQueried/consumingProcessed/consumingMatched/unavailable):9/9/5/1/0/0/0,consumingFreshnessTimeMs=1699005007107,servers=1/1,groupLimitReached=false,brokerReduceTimeMs=0,exceptions=0,serverStats=(Server=SubmitDelayMs,ResponseDelayMs,ResponseSize,DeserializationTimeMs,RequestSentDelayMs);192.168.31.226_R=0,2,264,0,0,offlineThreadCpuTimeNs(total/thread/sysActivity/resSer):0/0/0/0,realtimeThreadCpuTimeNs(total/thread/sysActivity/resSer):0/0/0/0,clientIp=unknown,query=select count(*) as count from issuerrisk
   
   ```
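To scan a broker log for lines like these, a small parser over the `QueryLogger` format shown above can pull out the `docs=` pair and the seven `segments(...)` counters. This is a sketch that assumes the field layout of these release-1.0.0 log lines (other versions may log differently), and `parse_query_log` is a hypothetical helper. Applied to the first of the two wrong lines, it shows that `consumingQueried` is 0, i.e. no consuming segment was queried for that request.

```python
import re
from typing import Dict, Optional

# Counter names copied from the segments(...) label in the QueryLogger lines above
SEGMENT_FIELDS = ["queried", "processed", "matched", "consumingQueried",
                  "consumingProcessed", "consumingMatched", "unavailable"]

_REQUEST_ID = re.compile(r"requestId=(\d+)")
_DOCS = re.compile(r"docs=(\d+)/(\d+)")
_SEGMENTS = re.compile(r"segments\([^)]*\):([\d/]+)")


def parse_query_log(line: str) -> Optional[Dict[str, int]]:
    """Extract request id, the docs=X/Y pair and the segment counters from a QueryLogger line."""
    if "[QueryLogger]" not in line:
        return None
    rid, docs, segs = _REQUEST_ID.search(line), _DOCS.search(line), _SEGMENTS.search(line)
    if not (rid and docs and segs):
        return None
    parsed = {"requestId": int(rid.group(1)),
              "docsScanned": int(docs.group(1)),
              "totalDocs": int(docs.group(2))}
    # Pair each counter name with its value, e.g. "8/8/4/0/0/0/0"
    parsed.update(zip(SEGMENT_FIELDS, map(int, segs.group(1).split("/"))))
    return parsed
```

Running it over every `QueryLogger` line and flagging `docsScanned` values below the expected 1,000,000 would list the anomalous requests together with how many segments each one actually touched.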
   
   
   Here is my table definition.
   ```json
   {
       "tableName": "issuerrisk",
       "tableType": "REALTIME",
       "segmentsConfig": {
           "schemaName": "issuerrisk",
           "timeColumnName": "UpdatedTime",
           "allowNullTimeValue": false,
           "replicasPerPartition": "1",
           "completionConfig": {
               "completionMode": "DOWNLOAD"
           }
       },
       "tableIndexConfig": {
           "invertedIndexColumns": [],
           "sortedColumn": [],
           "noDictionaryColumns": [
               "JTD1",
               "JTD2",
               "JTD3",
               "JTD4",
               "JTD5",
               "JTD6",
               "JTD7",
               "JTD8",
               "JTD9",
               "JTD10",
               "Content"
           ],
           "loadMode": "MMAP",
           "nullHandlingEnabled": false
       },
       "ingestionConfig": {
           "streamIngestionConfig": {
               "streamConfigMaps": [
                   {
                       "streamType": "kafka",
                       "stream.kafka.consumer.type": "lowlevel",
                       "stream.kafka.topic.name": "issuerrisk",
                       "stream.kafka.decoder.class.name": "org.apache.pinot.plugin.stream.kafka.KafkaJSONMessageDecoder",
                       "stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
                       "stream.kafka.broker.list": "localhost:9092",
                       "realtime.segment.flush.threshold.rows": "0",
                       "realtime.segment.flush.threshold.time": "24h",
                       "realtime.segment.flush.threshold.segment.size": "50M",
                       "stream.kafka.consumer.prop.auto.offset.reset": "largest"
                   }
               ]
           }
       },
       "tenants": {},
       "metadata": {},
       "upsertConfig": {
           "mode": "FULL",
           "comparisonColumn": "UpdatedTime"
       },
       "routing": {
           "instanceSelectorType": "strictReplicaGroup"
       }
   }
   ```
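With `"mode": "FULL"` and `comparisonColumn` set to `UpdatedTime`, each primary key should end up with exactly one valid document, so once all ids are ingested `count(*)` should stay at 1,000,000. The toy model below sketches that bookkeeping: a per-segment valid-doc bitmap plus a key-to-location map. It is an illustration under assumptions, not Pinot's implementation: the primary key is taken to be `UID` (the schema is not shown), ties on the comparison value go to the newer record, and `ToyUpsertTable` is a hypothetical name.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Segment:
    name: str
    valid: List[bool] = field(default_factory=list)  # one valid-doc bit per document


class ToyUpsertTable:
    """Toy model of FULL-mode upsert: only the latest record per primary key is valid."""

    def __init__(self) -> None:
        self.segments: List[Segment] = []
        # primary key -> (segment index, doc id, comparison value)
        self.location: Dict[int, Tuple[int, int, int]] = {}

    def new_segment(self, name: str) -> None:
        self.segments.append(Segment(name))

    def ingest(self, uid: int, updated_time: int) -> None:
        seg_idx = len(self.segments) - 1  # records always land in the newest (consuming) segment
        seg = self.segments[seg_idx]
        doc_id = len(seg.valid)
        seg.valid.append(False)
        old = self.location.get(uid)
        if old is not None and updated_time < old[2]:
            return  # out-of-order older record: stored but never valid
        if old is not None:
            self.segments[old[0]].valid[old[1]] = False  # invalidate the previous version
        seg.valid[doc_id] = True
        self.location[uid] = (seg_idx, doc_id, updated_time)

    def count(self) -> int:
        """What count(*) should see: valid docs summed over all segments."""
        return sum(sum(s.valid) for s in self.segments)

    def total_docs(self) -> int:
        return sum(len(s.valid) for s in self.segments)
```

Replaying the producer's three rounds over 1,000 ids (one new segment per round) leaves `count()` at 1,000 while `total_docs()` reaches 3,000, mirroring how `totalDocs` in the broker logs keeps growing while the expected count stays fixed.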
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [I] Count number of upsert table is incorrect when a new segment is created [pinot]

Posted by "klsince (via GitHub)" <gi...@apache.org>.
klsince commented on issue #11948:
URL: https://github.com/apache/pinot/issues/11948#issuecomment-1799400081

   I was able to reproduce the issue on my side as well. Will debug this further; stay tuned.


Re: [I] Count number of upsert table is incorrect when a new segment is created [pinot]

Posted by "jhyao (via GitHub)" <gi...@apache.org>.
jhyao commented on issue #11948:
URL: https://github.com/apache/pinot/issues/11948#issuecomment-1797769758

   After publishing the first 1M ids, the producer continued to send 2M upsert records with the same ids as the first 1M.
   The producer code looks like this:
   ```python
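   import json
   import random
   import string
   import time
   from kafka import KafkaProducer  # kafka-python, assumed from the send()/flush() calls

   TOPIC = 'issuerrisk'
   CONTENT_LENGTH = 100  # assumed; the real value was not given
   producer = KafkaProducer(bootstrap_servers='localhost:9092')

   # Hypothetical stand-ins for helpers not shown in the original snippet
   def get_time():
       return int(time.time() * 1000)  # epoch millis (assumed time format)

   def generate_random_string(length):
       return ''.join(random.choices(string.ascii_letters, k=length))

   def generate_random_number():
       return random.randint(0, 1_000_000)

   def generate_record(uid):
       record = {
           'UID': uid,
           'UpdatedTime': get_time(),
           'Content': generate_random_string(CONTENT_LENGTH)
       }
       for i in range(1, 11):
           record[f'JTD{i}'] = generate_random_number()
       return json.dumps(record)

   # Round 0 inserts ids 0..999,999; rounds 1 and 2 upsert the same ids (2M more records)
   for round_no in range(3):
       for uid in range(1_000_000):
           producer.send(TOPIC, key=str(uid).encode(), value=generate_record(uid).encode())
           if uid % 10000 == 0:
               print(f'Published {round_no} round, {uid} messages')
               producer.flush()
   producer.flush()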
   ```
   
   
   I tested again without upsert and did not see this issue, so it affects only upsert tables.
   ![image](https://github.com/apache/pinot/assets/17529008/e7847f06-8073-4ffa-bd38-606fc4aa6e10)
   
   


Re: [I] Count number of upsert table is incorrect when a new segment is created [pinot]

Posted by "klsince (via GitHub)" <gi...@apache.org>.
klsince commented on issue #11948:
URL: https://github.com/apache/pinot/issues/11948#issuecomment-1800425938

   While debugging the issue, we found this bug and fixed it here: https://github.com/apache/pinot/pull/11964
   
   But even with this fix, I could still see `count(*)` queries return smaller values than expected when new segments were created. That is caused by another issue in query routing, described here: https://github.com/apache/pinot/issues/11965, which I will address next.
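
Why a routing gap shows up as a much smaller count on an upsert table can be illustrated with a toy calculation (a sketch under assumptions, not Pinot code; `visible_count` and the segment state are hypothetical). When the newer versions of some keys live in a segment the query is not routed to, the older copies have already been invalidated, so neither copy is counted. This is consistent with the first wrong log line in the report, where `consumingQueried` is 0.

```python
from typing import Dict, List, Set


def visible_count(valid_docs: Dict[str, List[bool]], routed: Set[str]) -> int:
    """count(*) as a query sees it: valid docs summed over only the segments it was routed to."""
    return sum(sum(bits) for seg, bits in valid_docs.items() if seg in routed)


# Hypothetical state: 10 keys; keys 0-5 were re-ingested into the new consuming segment,
# so their old copies in the committed segment are already marked invalid.
valid_docs = {
    "committed": [k >= 6 for k in range(10)],  # only keys 6-9 are still valid here
    "consuming": [True] * 6,                    # latest versions of keys 0-5
}
```

Routing to both segments gives 10; dropping the consuming segment from the route gives 4, even though no data was lost.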


Re: [I] Count number of upsert table is incorrect when a new segment is created [pinot]

Posted by "KKcorps (via GitHub)" <gi...@apache.org>.
KKcorps commented on issue #11948:
URL: https://github.com/apache/pinot/issues/11948#issuecomment-1797272697

   Are the producer and the consumption still running after the first 1M ids are published?


Re: [I] Count number of upsert table is incorrect when a new segment is created [pinot]

Posted by "klsince (via GitHub)" <gi...@apache.org>.
klsince commented on issue #11948:
URL: https://github.com/apache/pinot/issues/11948#issuecomment-1797828108

   Is the table config posted above the complete config? There is no `segmentPartitionConfig`, which defines the table partitioning that upsert tables need in order to place segments and route queries properly.
   
   How many servers are hosting segments for the table?


Re: [I] Count number of upsert table is incorrect when a new segment is created [pinot]

Posted by "jhyao (via GitHub)" <gi...@apache.org>.
jhyao commented on issue #11948:
URL: https://github.com/apache/pinot/issues/11948#issuecomment-1797882360

   That's the complete table config. I rely on Kafka partitioning, so I don't need `segmentPartitionConfig`; in my test the Kafka topic has only one partition, and there is only one server.


Re: [I] Count number of upsert table is incorrect when a new segment is created [pinot]

Posted by "klsince (via GitHub)" <gi...@apache.org>.
klsince commented on issue #11948:
URL: https://github.com/apache/pinot/issues/11948#issuecomment-1797811501

   Cool! How about running the test against the upsert table again with this query: `set "skipUpsert"=true; select count(*) ...`, to see whether some critical upsert state was not visible to the query while the segment was being committed.
   
   (I'll try to reproduce this on my side for some clues)


Re: [I] Count number of upsert table is incorrect when a new segment is created [pinot]

Posted by "Jackie-Jiang (via GitHub)" <gi...@apache.org>.
Jackie-Jiang commented on issue #11948:
URL: https://github.com/apache/pinot/issues/11948#issuecomment-1797121368

   Thanks for reporting the issue. We will take a look.
   cc @klsince @KKcorps 


Re: [I] Count number of upsert table is incorrect when a new segment is created [pinot]

Posted by "klsince (via GitHub)" <gi...@apache.org>.
klsince commented on issue #11948:
URL: https://github.com/apache/pinot/issues/11948#issuecomment-1815293485

   Hey @jhyao, thanks for testing that out on your side. Yes, I also noticed the overcounting issue while testing the new changes. Basically, realtime ingestion can keep updating existing segments (for example, the upsert validDocId bitmaps that track which docs in each segment have been superseded by updates) while the query is processing those same segments in parallel, with no deterministic segment processing order.
   
   The query could overcount or undercount, depending on the order in which segments are processed and on whether newly ingested records invalidated existing docs. But this is different from the issue to be fixed by PR https://github.com/apache/pinot/pull/11978: here no segments were missed; they were just updated during query execution.
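
The race described here can be reproduced deterministically in a toy model (a sketch, not Pinot's query engine; all names and the starting state are hypothetical). A query counts valid docs one segment at a time while a single upsert moves a key's latest version from one segment to another between those reads. A fixed read order stands in for the non-deterministic parallel order, and the direction of the error depends only on which segment is read first.

```python
from typing import Dict, List


def upsert_move(valid: Dict[str, List[bool]], src: str, doc: int, dst: str) -> None:
    """Ingest a newer version of a key: add a valid doc to dst, invalidate the old doc in src."""
    valid[dst].append(True)
    valid[src][doc] = False


def count_with_interleaving(order: List[str]) -> int:
    """Count valid docs segment by segment; one upsert lands right after the first segment is read.

    Hypothetical starting state: 'old' holds 3 valid keys and 'consuming' holds 2, so the
    true count is 5 both before and after the upsert.
    """
    valid = {"old": [True, True, True], "consuming": [True, True]}
    total = 0
    for i, seg in enumerate(order):
        total += sum(valid[seg])
        if i == 0:
            upsert_move(valid, "old", 0, "consuming")  # concurrent ingestion between reads
    return total
```

Reading `old` first yields 6 (the moved key is counted in both segments), and reading `consuming` first yields 4 (it is counted in neither), while the correct answer at either instant is 5. This is the same small over/undercount pattern jhyao observed on the #11978 branch.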
   


Re: [I] Count number of upsert table is incorrect when a new segment is created [pinot]

Posted by "klsince (via GitHub)" <gi...@apache.org>.
klsince commented on issue #11948:
URL: https://github.com/apache/pinot/issues/11948#issuecomment-1797391108

   Nice test setup. If you can customize it, could you test with a non-upsert table, just to see whether this issue is specific to upsert tables? Thanks!


Re: [I] Count number of upsert table is incorrect when a new segment is created [pinot]

Posted by "jhyao (via GitHub)" <gi...@apache.org>.
jhyao commented on issue #11948:
URL: https://github.com/apache/pinot/issues/11948#issuecomment-1797885674

   Tested `select count(*)` with `"skipUpsert"=true`: the issue did not occur.
   ![image](https://github.com/apache/pinot/assets/17529008/66ead424-19f3-4ff9-86ea-9d226e3cbd03)
   


Re: [I] Count number of upsert table is incorrect when a new segment is created [pinot]

Posted by "jhyao (via GitHub)" <gi...@apache.org>.
jhyao commented on issue #11948:
URL: https://github.com/apache/pinot/issues/11948#issuecomment-1814029469

   @klsince I tested your fix branch (#11978) and the result looks good: the large count inconsistency is fixed.
   
   ![image](https://github.com/apache/pinot/assets/17529008/247a8545-6121-47bb-9e43-575743c9a342)
   
   One more interesting thing: there are still many small differences, some smaller than the correct count and some larger.
   ![Screen Shot 2023-11-16 at 4 47 34 PM](https://github.com/apache/pinot/assets/17529008/222358cb-031d-44b4-b3f9-ad1585bf5b24)
   
   

