Posted to commits@pinot.apache.org by "shwin (via GitHub)" <gi...@apache.org> on 2023/06/27 22:55:13 UTC

[GitHub] [pinot] shwin opened a new issue, #10986: Count discrepancy among `in`/`=` and `<` queries using timestamp index

shwin opened a new issue, #10986:
URL: https://github.com/apache/pinot/issues/10986

   We're seeing something strange with timestamp indices and segment pruning.
   
   We have a timestamp index like this:
   ```
    {
           "name": "blockTimestamp",
           "encodingType": "DICTIONARY",
           "indexType": "TIMESTAMP",
           "indexTypes": [
             "TIMESTAMP"
           ],
           "timestampConfig": {
             "granularities": [
               "SECOND",
               "MINUTE",
               "HOUR",
               "DAY",
               "WEEK",
               "MONTH",
               "YEAR"
             ]
           },
           "indexes": null
         },
   ```
   
   
   What we're trying to do is count the number of rows in a single day, in our case 2021-05-11.
   
   So we do a query like this, with a date range:
   `select $segmentName, DATETRUNC('day', "blockTimestamp") "blockTimestamp_day", count("transactionFrom") "Count Transaction From Address" from "13059af7-8eab-4196-a7ea-1a170d73c02e" where blockTimestamp_day >= fromDateTime('2021-05-11', 'yyyy-MM-dd') and blockTimestamp_day < fromDateTime('2021-05-12', 'yyyy-MM-dd') group by "blockTimestamp_day", $segmentName order by "blockTimestamp_day"`
   
   We get these results, which look correct:
   ```
   13059af7-8eab-4196-a7ea-1a170d73c02e_OFFLINE_1620454828000_1620702391000_61_2296cf5b-2896-4ff8-bb57-0405bd69a7dc	2021-05-11 00:00:00.0	34699
   13059af7-8eab-4196-a7ea-1a170d73c02e_OFFLINE_1620702401000_1620964585000_62_9f7f4c0f-5d06-4909-99c0-7c009ff53385	2021-05-11 00:00:00.0	247050
   ```
   
   But if we instead change this to an `=` query, which should return exactly the same results, we only pick up 1 segment:
   13059af7-8eab-4196-a7ea-1a170d73c02e_OFFLINE_1620702401000_1620964585000_62_9f7f4c0f-5d06-4909-99c0-7c009ff53385	2021-05-11 00:00:00.0	247050
   
   Clearly, the first query is correct and the 2nd is incorrect, since we're not picking up segments/values that we should be.
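
A minimal Java sketch of why the two predicates should agree. It is not Pinot code: the class and method names are made up, and it assumes the `DAY` granularity column stores epoch milliseconds floored to midnight UTC. The sample timestamps are the start and end times embedded in the two segment names above.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;

// Hypothetical illustration only; none of these names come from Pinot.
final class DayTruncationSketch {
  // Floors an epoch-millis timestamp to midnight UTC of the same day.
  static long truncateToDay(long epochMillis) {
    return Instant.ofEpochMilli(epochMillis).truncatedTo(ChronoUnit.DAYS).toEpochMilli();
  }

  // Range form of the filter: day >= start AND day < nextDay.
  static boolean matchesRange(long day, long start, long nextDay) {
    return day >= start && day < nextDay;
  }

  // Equality form of the filter: day = start.
  static boolean matchesEq(long day, long start) {
    return day == start;
  }

  public static void main(String[] args) {
    long start = Instant.parse("2021-05-11T00:00:00Z").toEpochMilli();
    long nextDay = Instant.parse("2021-05-12T00:00:00Z").toEpochMilli();
    long[] samples = {1620454828000L, 1620702391000L, 1620702401000L, 1620964585000L};
    for (long ts : samples) {
      long day = truncateToDay(ts);
      System.out.println(Instant.ofEpochMilli(ts) + " range=" + matchesRange(day, start, nextDay)
          + " eq=" + matchesEq(day, start));
    }
  }
}
```

Because a day-truncated value can only fall on a day boundary, the half-open range over one day and the equality check select exactly the same rows in this model.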


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] shwin commented on issue #10986: Count discrepancy among `in`/`=` and `<` queries using timestamp index

Posted by "shwin (via GitHub)" <gi...@apache.org>.
shwin commented on issue #10986:
URL: https://github.com/apache/pinot/issues/10986#issuecomment-1612436438

   in case it's helpful, here's some column metadata for the column in question:
   ```
   "segment.start.time": "1620663024000",
     "segment.time.unit": "MILLISECONDS",
     "columns": [
       {
         "columnMaxLength": 0,
         "sorted": false,
         "totalNumberOfEntries": 1169356,
         "fieldSpec": {
           "granularity": "1:DAYS",
           "sampleValue": null,
           "singleValueField": true,
           "format": "TIMESTAMP",
           "virtualColumnProvider": null,
           "defaultNullValueString": "0",
           "name": "$blockTimestamp$DAY",
           "maxLength": 512,
           "dataType": "TIMESTAMP",
           "transformFunction": null,
           "defaultNullValue": 0
         },
         "hasDictionary": true,
         "bitsPerElement": 1,
         "minMaxValueInvalid": false,
         "autoGenerated": false,
         "maxNumberOfMultiValues": 0,
         "indexSizeMap": {
           "dictionary": 24,
           "forward_index": 146178,
           "range_index": 23662
         },
         "totalDocs": 1169356,
         "partitionFunction": null,
         "partitions": null,
         "minValue": 1620604800000,
         "maxValue": 1620691200000,
         "cardinality": 2,
         "singleValue": true,
         "fieldType": "DATE_TIME",
         "columnName": "$blockTimestamp$DAY",
         "dataType": "TIMESTAMP"
       }
     ],
     "segment.size.in.bytes": "405461773",
     "segment.end.time": "1620694659000",
     "segment.total.docs": "1169356",
   
     "indexes": {
       "$blockTimestamp$DAY": {
         "bloom-filter": "NO",
         "dictionary": "YES",
         "forward-index": "YES",
         "inverted-index": "NO",
         "null-value-vector-reader": "NO",
         "range-index": "YES",
         "json-index": "NO"
       }
     },
   ```


[GitHub] [pinot] Jackie-Jiang commented on issue #10986: Count discrepancy among `in`/`=` and `<` queries using timestamp index

Posted by "Jackie-Jiang (via GitHub)" <gi...@apache.org>.
Jackie-Jiang commented on issue #10986:
URL: https://github.com/apache/pinot/issues/10986#issuecomment-1612352457

   Based on the query plan, seems range index doesn't handle EQ predicate properly. Which Pinot version are you running? When you run the second query, is it possible that the first segment just expired?
   
   cc @richardstartin to also take a look


[GitHub] [pinot] richardstartin commented on issue #10986: Count discrepancy among `in`/`=` and `<` queries using timestamp index

Posted by "richardstartin (via GitHub)" <gi...@apache.org>.
richardstartin commented on issue #10986:
URL: https://github.com/apache/pinot/issues/10986#issuecomment-1632961236

   I was suggesting to patch pinot to avoid calling `RangeBitmap.eq` in case it's the cause of this bug. I can try to break `RangeBitmap.eq` and do a release to unblock this, but that's going to be time consuming. Some way to reduce this to something reproducible would be helpful.


[GitHub] [pinot] shwin commented on issue #10986: Count discrepancy among `in`/`=` and `<` queries using timestamp index

Posted by "shwin (via GitHub)" <gi...@apache.org>.
shwin commented on issue #10986:
URL: https://github.com/apache/pinot/issues/10986#issuecomment-1631498213

   @Jackie-Jiang / @richardstartin let me know if you need anything else from me; understand that it might take some time to pick up this fix, but want to make sure you're not waiting for anything from me.
   
   I poked around a little to see if I could make headway on a fix, but got lost; I can try again if the expected timeline for this is a ways out. 


[GitHub] [pinot] Jackie-Jiang closed issue #10986: Count discrepancy among `in`/`=` and `<` queries using timestamp index

Posted by "Jackie-Jiang (via GitHub)" <gi...@apache.org>.
Jackie-Jiang closed issue #10986: Count discrepancy among `in`/`=` and `<` queries using timestamp index
URL: https://github.com/apache/pinot/issues/10986


[GitHub] [pinot] shwin commented on issue #10986: Count discrepancy among `in`/`=` and `<` queries using timestamp index

Posted by "shwin (via GitHub)" <gi...@apache.org>.
shwin commented on issue #10986:
URL: https://github.com/apache/pinot/issues/10986#issuecomment-1641081767

   @Jackie-Jiang `BETWEEN` returns more segments than `=` does (it picks up 3 segments where `=` picked up 2).
   
   If I query for a known $docId like:
   ```
   select $segmentName, DATETRUNC('day', "blockTimestamp") "blockTimestamp_day", $docId from "0c98d17f-fb19-42fa-bca3-54b9f1750e0f_OFFLINE"
   where blockTimestamp_day = fromDateTime('2021-05-11', 'yyyy-MM-dd') and $segmentName = '0c98d17f-fb19-42fa-bca3-54b9f1750e0f_OFFLINE_1620669780000_1620707462000_1469_87cec857-1140-4a46-b722-53d869aa871d' and $docId = '16'
   ```
   
   I get no results, even though that docId showed up in the `between` version of the query.
   
   Here's the plan for the between:
   ```
   
   
   BROKER_REDUCE(limit:10) | 1 | 0
   -- | -- | --
   COMBINE_GROUP_BY | 2 | 1
   PLAN_START(numSegmentsForThisPlan:2) | -1 | -1
   GROUP_BY(groupKeys:$segmentName, $blockTimestamp$DAY, aggregations:count(*)) | 3 | 2
   PROJECT($segmentName, $blockTimestamp$DAY) | 4 | 3
   DOC_ID_SET | 5 | 4
   FILTER_RANGE_INDEX(indexLookUp:range_index,operator:RANGE,predicate:$blockTimestamp$DAY BETWEEN '1620691200000' AND '1620691200000') | 6 | 5
   ```
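
The literal in the `BETWEEN` predicate is an epoch-millisecond value. A small `java.time` check (a sketch; the class and method names are made up) shows the instant it denotes. The second value decoded is the `minValue` from the `$blockTimestamp$DAY` column metadata posted in this thread, whose `maxValue` is the same number as the plan literal.

```java
import java.time.Instant;

// Hypothetical helper for reading epoch-millis literals in a query plan.
final class PlanLiteralSketch {
  // Formats an epoch-millis value as an ISO-8601 UTC instant.
  static String toUtc(long epochMillis) {
    return Instant.ofEpochMilli(epochMillis).toString();
  }

  public static void main(String[] args) {
    System.out.println(toUtc(1620691200000L)); // prints 2021-05-11T00:00:00Z
    System.out.println(toUtc(1620604800000L)); // prints 2021-05-10T00:00:00Z
  }
}
```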
   
   
   
   
   


[GitHub] [pinot] richardstartin commented on issue #10986: Count discrepancy among `in`/`=` and `<` queries using timestamp index

Posted by "richardstartin (via GitHub)" <gi...@apache.org>.
richardstartin commented on issue #10986:
URL: https://github.com/apache/pinot/issues/10986#issuecomment-1612531596

   @Jackie-Jiang is this really related to the index? The plan shows that numSegmentsForThisPlan is 1 for the first query and 2 for the second, so is the segment pruning correct in both cases? In any case, I can take a look at how the range index evaluates equality queries and look for bugs there.


[GitHub] [pinot] shwin commented on issue #10986: Count discrepancy among `in`/`=` and `<` queries using timestamp index

Posted by "shwin (via GitHub)" <gi...@apache.org>.
shwin commented on issue #10986:
URL: https://github.com/apache/pinot/issues/10986#issuecomment-1612404167

   > When you run the second query, is it possible that the first segment just expired?
   
   Hmm, not sure what you mean here; I can repeatedly run both these queries and get consistent results.
   
   We're using the `apachepinot/pinot:0.13.0-SNAPSHOT-e3d74339a4-20230406-11-ms-openjdk-linux-amd64` image in our cluster


[GitHub] [pinot] richardstartin commented on issue #10986: Count discrepancy among `in`/`=` and `<` queries using timestamp index

Posted by "richardstartin (via GitHub)" <gi...@apache.org>.
richardstartin commented on issue #10986:
URL: https://github.com/apache/pinot/issues/10986#issuecomment-1614280375

   the best thing to do right now would be to replace the call to `eq(x)` in the range index reader with `between(x, x)` and see if this makes things better, I'll come back at a later date with a fix for `eq(x)` if this turns out to be a bug in the index (which seems likely given that `eq` was only added recently).


[GitHub] [pinot] Jackie-Jiang commented on issue #10986: Count discrepancy among `in`/`=` and `<` queries using timestamp index

Posted by "Jackie-Jiang (via GitHub)" <gi...@apache.org>.
Jackie-Jiang commented on issue #10986:
URL: https://github.com/apache/pinot/issues/10986#issuecomment-1645085416

   I think I might have found the issue, and it is on the Pinot side. In `BitSlicedRangeIndexReader`:
   ```
     private ImmutableRoaringBitmap queryRangeBitmap(long value, long columnMax) {
       RangeBitmap rangeBitmap = mapRangeBitmap();
       if (Long.compareUnsigned(value, columnMax) < 0) {
         return rangeBitmap.eq(value).toMutableRoaringBitmap();
       } else {
         return new MutableRoaringBitmap();
       }
     }
   ```
   The `if` check should be `<= 0` instead of `< 0`, so that a value equal to the column maximum is still looked up. We probably made the mistake by copying the logic from the range predicate.
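
The boundary case can be illustrated with a simplified Java sketch. It is not the Pinot or RoaringBitmap API: a plain array scan stands in for `rangeBitmap.eq(value)`, the column values are passed as-is, and every name below is hypothetical. Only the comparison against `columnMax` mirrors the snippet quoted above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the range index reader; only the guard mirrors the quoted code.
final class EqGuardSketch {
  // Stand-in for rangeBitmap.eq(value): returns the doc ids whose value equals the target.
  static List<Integer> scanEq(long[] column, long value) {
    List<Integer> docIds = new ArrayList<>();
    for (int docId = 0; docId < column.length; docId++) {
      if (column[docId] == value) {
        docIds.add(docId);
      }
    }
    return docIds;
  }

  // Strict guard ("< 0"): the lookup is skipped when value == columnMax.
  static List<Integer> eqWithStrictGuard(long[] column, long value, long columnMax) {
    if (Long.compareUnsigned(value, columnMax) < 0) {
      return scanEq(column, value);
    }
    return new ArrayList<>();
  }

  // Inclusive guard ("<= 0"): the lookup also runs when value == columnMax.
  static List<Integer> eqWithInclusiveGuard(long[] column, long value, long columnMax) {
    if (Long.compareUnsigned(value, columnMax) <= 0) {
      return scanEq(column, value);
    }
    return new ArrayList<>();
  }

  public static void main(String[] args) {
    // Two distinct day values; the queried day is the column maximum.
    long[] column = {1620604800000L, 1620691200000L, 1620691200000L};
    long columnMax = 1620691200000L;
    System.out.println(eqWithStrictGuard(column, columnMax, columnMax));    // prints []
    System.out.println(eqWithInclusiveGuard(column, columnMax, columnMax)); // prints [1, 2]
  }
}
```

In this model the two guards differ only when the queried value equals the column maximum; for any smaller value they return the same doc ids.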


[GitHub] [pinot] shwin commented on issue #10986: Count discrepancy among `in`/`=` and `<` queries using timestamp index

Posted by "shwin (via GitHub)" <gi...@apache.org>.
shwin commented on issue #10986:
URL: https://github.com/apache/pinot/issues/10986#issuecomment-1645787365

   Great, thanks!


[GitHub] [pinot] shwin commented on issue #10986: Count discrepancy among `in`/`=` and `<` queries using timestamp index

Posted by "shwin (via GitHub)" <gi...@apache.org>.
shwin commented on issue #10986:
URL: https://github.com/apache/pinot/issues/10986#issuecomment-1617292660

   @Jackie-Jiang yep! My latest table returns _3_ segments for the `>=/<` query and _2_ for the `=` query. The table has 3000 segments overall.
   
   For the >=/< query:
   ```
   "exceptions": [],
     "numServersQueried": 2,
     "numServersResponded": 2,
     "numSegmentsQueried": 3000,
     "numSegmentsProcessed": 3000,
     "numSegmentsMatched": 3,
     "numConsumingSegmentsQueried": 0,
     "numConsumingSegmentsProcessed": 0,
     "numConsumingSegmentsMatched": 0,
     "numDocsScanned": 3110482,
     "numEntriesScannedInFilter": 0,
     "numEntriesScannedPostFilter": 6220964,
     "numGroupsLimitReached": false,
     "totalDocs": 3994721240,
     "timeUsedMs": 44,
     "offlineThreadCpuTimeNs": 0,
     "realtimeThreadCpuTimeNs": 0,
     "offlineSystemActivitiesCpuTimeNs": 0,
     "realtimeSystemActivitiesCpuTimeNs": 0,
     "offlineResponseSerializationCpuTimeNs": 0,
     "realtimeResponseSerializationCpuTimeNs": 0,
     "offlineTotalCpuTimeNs": 0,
     "realtimeTotalCpuTimeNs": 0,
     "segmentStatistics": [],
     "traceInfo": {},
     "minConsumingFreshnessTimeMs": 0,
     "numSegmentsPrunedByBroker": 0,
     "numSegmentsPrunedByServer": 0,
     "numSegmentsPrunedInvalid": 0,
     "numSegmentsPrunedByLimit": 0,
     "numSegmentsPrunedByValue": 0,
     "explainPlanNumEmptyFilterSegments": 0,
     "explainPlanNumMatchAllFilterSegments": 0,
     "numRowsResultSet": 3
   ```
   
   So specifically:
   ```
     "numSegmentsProcessed": 3000,
     "numSegmentsMatched": 3,
   ```
   
   
   For the `=` query:
   ```
   "numServersQueried": 2,
     "numServersResponded": 2,
     "numSegmentsQueried": 3000,
     "numSegmentsProcessed": 3000,
     "numSegmentsMatched": 2,
     "numConsumingSegmentsQueried": 0,
     "numConsumingSegmentsProcessed": 0,
     "numConsumingSegmentsMatched": 0,
     "numDocsScanned": 2287773,
     "numEntriesScannedInFilter": 0,
     "numEntriesScannedPostFilter": 4575546,
     "numGroupsLimitReached": false,
     "totalDocs": 3994721240,
     "timeUsedMs": 561,
     "offlineThreadCpuTimeNs": 0,
     "realtimeThreadCpuTimeNs": 0,
     "offlineSystemActivitiesCpuTimeNs": 0,
     "realtimeSystemActivitiesCpuTimeNs": 0,
     "offlineResponseSerializationCpuTimeNs": 0,
     "realtimeResponseSerializationCpuTimeNs": 0,
     "offlineTotalCpuTimeNs": 0,
     "realtimeTotalCpuTimeNs": 0,
     "segmentStatistics": [],
     "traceInfo": {},
     "minConsumingFreshnessTimeMs": 0,
     "numSegmentsPrunedByBroker": 0,
     "numSegmentsPrunedByServer": 0,
     "numSegmentsPrunedInvalid": 0,
     "numSegmentsPrunedByLimit": 0,
     "numSegmentsPrunedByValue": 0,
     "explainPlanNumEmptyFilterSegments": 0,
     "explainPlanNumMatchAllFilterSegments": 0,
     "numRowsResultSet": 2
   ```
   
   So specifically:
   ```
     "numSegmentsProcessed": 3000,
     "numSegmentsMatched": 2,
   ```
   
   
   In both cases I'm a little surprised we're processing all 3000 segments; I guess it's because we're querying the generated timestamp$DAY columns instead of just the main timestamp column? If I do a between query on just my timestamp column (e.g. `blockTimestamp >= something AND blockTimestamp < something`), I get 3 segments processed.
   
   So, in any case, I guess it's not the pruning here since the `numSegmentsProcessed` is identical in both queries.
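
The reasoning above can be written down as a rough heuristic. This is a sketch with made-up names, not anything Pinot provides; it relies only on the definitions given in this thread (`numSegmentsProcessed` counts segments that pass the pruner, `numSegmentsMatched` counts segments with matching records).

```java
// Hypothetical heuristic for comparing the stats of two queries that should be equivalent.
final class StatsDiffSketch {
  enum Suspect { NONE, PRUNING, FILTER }

  // Different processed counts point at pruning; equal processed counts
  // with different matched counts point at filter evaluation.
  static Suspect classify(int processedA, int matchedA, int processedB, int matchedB) {
    if (processedA != processedB) {
      return Suspect.PRUNING;
    }
    if (matchedA != matchedB) {
      return Suspect.FILTER;
    }
    return Suspect.NONE;
  }

  public static void main(String[] args) {
    // Stats from the ">=/<" query and the "=" query above.
    System.out.println(classify(3000, 3, 3000, 2)); // prints FILTER
  }
}
```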
   


[GitHub] [pinot] shwin commented on issue #10986: Count discrepancy among `in`/`=` and `<` queries using timestamp index

Posted by "shwin (via GitHub)" <gi...@apache.org>.
shwin commented on issue #10986:
URL: https://github.com/apache/pinot/issues/10986#issuecomment-1641086151

   @richardstartin understood; I'm not sure how to reduce this to something reproducible, but open to suggestions.
   
   I think this happens in cases like:
   1. let's say you're querying for time column = day N
   2. you have 3 segments, with values ranging from (N-1, N), (N, N), (N, N+1)
   3. `=` will not include that first segment, that is, the values that range from (N-1, N), but will include the other 2 segments
   4. between/using a combo of >= and < will include all 3 segments
   
   So if you have some way to generate such segments, I bet it's reproducible; if you'd like me to generate some CSVs or something and replicate it myself I can do that.
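
One possible way to produce such input, sketched in Java under stated assumptions: the single-column CSV layout, the class name, and the idea of loading one file per segment are all hypothetical and not taken from this thread. The sketch only generates rows whose day values span the three ranges listed above; it does not reproduce the bug by itself.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical generator for CSV input covering the (N-1, N), (N, N), (N, N+1) cases.
final class SegmentCsvSketch {
  static final long DAY_MS = 86_400_000L;

  // Returns a header line plus rowsPerDay rows for each day from firstDay to lastDay inclusive.
  // Both bounds are epoch millis at midnight UTC; rowsPerDay must be positive.
  static List<String> segmentCsv(long firstDay, long lastDay, int rowsPerDay) {
    List<String> lines = new ArrayList<>();
    lines.add("blockTimestamp");
    for (long day = firstDay; day <= lastDay; day += DAY_MS) {
      for (int i = 0; i < rowsPerDay; i++) {
        // Spread the rows evenly across the day.
        lines.add(Long.toString(day + i * (DAY_MS / rowsPerDay)));
      }
    }
    return lines;
  }

  public static void main(String[] args) {
    long n = 1620691200000L; // 2021-05-11T00:00:00Z, the day queried in this issue
    long[][] ranges = {{n - DAY_MS, n}, {n, n}, {n, n + DAY_MS}};
    for (long[] range : ranges) {
      System.out.println(String.join("\n", segmentCsv(range[0], range[1], 4)));
      System.out.println();
    }
  }
}
```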


[GitHub] [pinot] shwin commented on issue #10986: Count discrepancy among `in`/`=` and `<` queries using timestamp index

Posted by "shwin (via GitHub)" <gi...@apache.org>.
shwin commented on issue #10986:
URL: https://github.com/apache/pinot/issues/10986#issuecomment-1613790055

   is there a way I can disable segment pruning or something similar so we can figure that out?
   
   ie, if there is a way to query a single segment and _not_ do segment pruning that might be useful; I did try including the segmentName in the where clause of the query but not sure if that helps.
   
   And for what it's worth, when trying it again, for the `=` query I get:
   ![image](https://github.com/apache/pinot/assets/1700973/bba50ad7-eb0e-4b7c-a0c5-79abe6b34d01)
   
   That is, `PLAN_START(numSegmentsForThisPlan:1)`.
   
   When I run the query on my current table, I get _3_ segments back, so not sure what that number means.
   
   I get the same line for the `>= + <` query:
   
   ![image](https://github.com/apache/pinot/assets/1700973/6c9ec555-78d7-42aa-ab5c-271e3372243a)
   
   When I run the query on my current table, I get _4_ segments back.
   
   So imo not sure if that explain line is super useful here? But y'all would know better than me!


[GitHub] [pinot] Jackie-Jiang commented on issue #10986: Count discrepancy among `in`/`=` and `<` queries using timestamp index

Posted by "Jackie-Jiang (via GitHub)" <gi...@apache.org>.
Jackie-Jiang commented on issue #10986:
URL: https://github.com/apache/pinot/issues/10986#issuecomment-1614275957

   @shwin Can you also share the stats from the query response? In the stats, `numSegmentsProcessed` is the number of segments that pass the pruner, and `numSegmentsMatched` is the number of segments with matching records.


[GitHub] [pinot] shwin commented on issue #10986: Count discrepancy among `in`/`=` and `<` queries using timestamp index

Posted by "shwin (via GitHub)" <gi...@apache.org>.
shwin commented on issue #10986:
URL: https://github.com/apache/pinot/issues/10986#issuecomment-1612411307

   Trying it again, I noticed that the `numSegmentsMatched` is 1 lower in the `=` EQ case than the >= case.
   
   


[GitHub] [pinot] Jackie-Jiang commented on issue #10986: Count discrepancy among `in`/`=` and `<` queries using timestamp index

Posted by "Jackie-Jiang (via GitHub)" <gi...@apache.org>.
Jackie-Jiang commented on issue #10986:
URL: https://github.com/apache/pinot/issues/10986#issuecomment-1663040574

   @shwin Can you confirm?


[GitHub] [pinot] Jackie-Jiang commented on issue #10986: Count discrepancy among `in`/`=` and `<` queries using timestamp index

Posted by "Jackie-Jiang (via GitHub)" <gi...@apache.org>.
Jackie-Jiang commented on issue #10986:
URL: https://github.com/apache/pinot/issues/10986#issuecomment-1632957010

   @shwin As @richardstartin suggested, can you try using BETWEEN with same value and compare the result with EQ? Something like `where blockTimestamp_day between fromDateTime('2021-05-11', 'yyyy-MM-dd') and fromDateTime('2021-05-11', 'yyyy-MM-dd')`?
   You may also query `$segmentName`, `$docId` to find the record location, and see if `where blockTimestamp_day = fromDateTime('2021-05-11', 'yyyy-MM-dd') and $segmentName = <matchingSegment> and $docId = <matchingDocId>` returns the record


[GitHub] [pinot] Jackie-Jiang commented on issue #10986: Count discrepancy among `in`/`=` and `<` queries using timestamp index

Posted by "Jackie-Jiang (via GitHub)" <gi...@apache.org>.
Jackie-Jiang commented on issue #10986:
URL: https://github.com/apache/pinot/issues/10986#issuecomment-1645063168

   @shwin Based on the different result of `BETWEEN` and `EQUAL` (both of them using range index), it is very likely some bug within `RangeBitmap.eq`. If you can reproduce it with some simple records, it should be very helpful for @richardstartin to debug it

